This post is PART-1 of two posts. PART-1 will go through matrix decomposition using Singular Vector Decomposition (SVD), randomized Singular Vector Decomposition (rSVD) and Nonnegative Matrix Factorization (NMF) mainly for image compression.
PART-2 will look at video background removal using randomized SVD and robust PCA.
Singular Vector Decomposition (simple matrix example)
Singular Vector Decomposition (SVD) is a mathematical technique for decomposing a matrix into three separate matrices, each having special properties. In essence, SVD transforms correlated variables into a set of uncorrelated variables.
SVD can be used to better understanding a dataset because SVD will show you the number of important dimensions, it also can simplify the dataset by reducing the number of features removing linearly dependent features from the point of view of Linear Algebra. So SVD can be used to reduce a high-dimensional dataset into fewer dimensions while retaining important information.
Suppose A is a matrix with each variable in a column, and observations in rows, then the SVD are composed by:
A = U · d · VT
Where:
- A: original matrix (m x n)
- U: orthogonal matrix (m × n)
- d: diagonal matrix (n × n)
- V: orthogonal matrix (n × n)
Let’s start with a small example dataset A
: 10 x 3, with 3 features and 10 rows.
1 2 3 4 5 6 7 |
set.seed(0) A <- matrix(runif(30, min=0, max=100), nrow=10) A <- data.frame(A) colnames(A) <- paste0("feature_",1:NCOL(A)) rownames(A) <- paste0("register",1:NROW(A)) A |
1 2 3 4 5 6 7 8 9 10 11 |
## feature_1 feature_2 feature_3 ## register1 89.66972 6.178627 77.744522 ## register2 26.55087 20.597457 93.470523 ## register3 37.21239 17.655675 21.214252 ## register4 57.28534 68.702285 65.167377 ## register5 90.82078 38.410372 12.555510 ## register6 20.16819 76.984142 26.722067 ## register7 89.83897 49.769924 38.611409 ## register8 94.46753 71.761851 1.339033 ## register9 66.07978 99.190609 38.238796 ## register10 62.91140 38.003518 86.969085 |
Next code will transform the A matrix into 3 matrices (diagonal matrix: d, and the u and v) according to SVD decomposition and using the function svd(). Below you can see these matrices:
1 2 3 |
A_svd <- svd(A) A_svd |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
## $d ## [1] 300.06826 113.25773 84.80736 ## ## $u ## [,1] [,2] [,3] ## [1,] -0.3450932 -0.39375868 -0.44297281 ## [2,] -0.2499063 -0.54713537 0.23660896 ## [3,] -0.1517606 -0.01260518 -0.09916967 ## [4,] -0.3595852 -0.06634029 0.27641129 ## [5,] -0.2982925 0.23555537 -0.40124974 ## [6,] -0.2256024 0.19268910 0.52579073 ## [7,] -0.3584251 0.09509590 -0.22232164 ## [8,] -0.3471264 0.47793821 -0.20284986 ## [9,] -0.3897184 0.28669365 0.36108378 ## [10,] -0.3541244 -0.35883174 0.05603342 ## ## $v ## [,1] [,2] [,3] ## [1,] -0.6948113 0.1875190 -0.6943154 ## [2,] -0.5277211 0.5229674 0.6693397 ## [3,] -0.4886182 -0.8314696 0.2644060 |
We have to take into account some aspects about SVD result matrices:
- Each row of u matrix corresponds with each row in the the original matrix.
- The columns of matrix v match with features in original matrix.
- The columns of matrix u must be considered like topic1, topic2, and topic3 and doesn’t match directly with the original matrix columns (feature_1, feature_2, feature_3).
- Diagonal matrix (d) corresponds with each topic, and are ordered by topic importance (first element more important and so on).
According to that, we can put names to each column to clarify it:
1 2 3 4 5 |
U <- data.frame(A_svd$u) colnames(U) <- paste0("topic",1:NCOL(U)) rownames(U) <- paste0("register",1:NROW(A)) U |
1 2 3 4 5 6 7 8 9 10 11 |
## topic1 topic2 topic3 ## register1 -0.3450932 -0.39375868 -0.44297281 ## register2 -0.2499063 -0.54713537 0.23660896 ## register3 -0.1517606 -0.01260518 -0.09916967 ## register4 -0.3595852 -0.06634029 0.27641129 ## register5 -0.2982925 0.23555537 -0.40124974 ## register6 -0.2256024 0.19268910 0.52579073 ## register7 -0.3584251 0.09509590 -0.22232164 ## register8 -0.3471264 0.47793821 -0.20284986 ## register9 -0.3897184 0.28669365 0.36108378 ## register10 -0.3541244 -0.35883174 0.05603342 |
1 2 3 4 5 |
D <- data.frame(diag(A_svd$d)) colnames(D) <- paste0("topic",1:NCOL(U)) rownames(D) <- paste0("topic",1:NCOL(U)) D |
1 2 3 4 |
## topic1 topic2 topic3 ## topic1 300.0683 0.0000 0.00000 ## topic2 0.0000 113.2577 0.00000 ## topic3 0.0000 0.0000 84.80736 |
1 2 3 4 5 |
V <- data.frame(A_svd$v) colnames(V) <- colnames(A) rownames(V) <- paste0("topic",1:NCOL(U)) V |
1 2 3 4 |
## feature_1 feature_2 feature_3 ## topic1 -0.6948113 0.1875190 -0.6943154 ## topic2 -0.5277211 0.5229674 0.6693397 ## topic3 -0.4886182 -0.8314696 0.2644060 |
Observe the correlation between columns of u matrix:
1 2 |
cor(A_svd$u) |
1 2 3 4 |
## [,1] [,2] [,3] ## [1,] 1.0000000 -0.1230321288 0.1185052849 ## [2,] -0.1230321 1.0000000000 0.0007930161 ## [3,] 0.1185053 0.0007930161 1.0000000000 |
As an example, only to see the differences between features (from the original matrix) and topics, we can replicate the A matrix into the object B
, and then put zeros in feature_3:
1 2 3 |
B <- A B[,3] <- 0 |
Now we can calculate the SVD of matrix B
. As you can see below the zero values are visible only in the diagonal matrix, and in the v matrix, but not in the u matrix.
1 2 3 |
SVD_B <- svd(B) SVD_B |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
## $d ## [1] 267.38125 87.74893 0.00000 ## ## $u ## [,1] [,2] [,3] ## [1,] -0.2787209 0.5726378541 -0.172499723 ## [2,] -0.1256765 0.0008567322 -0.012160150 ## [3,] -0.1503636 0.1019880483 0.979107903 ## [4,] -0.3269347 -0.2162331156 -0.031176728 ## [5,] -0.3562184 0.2909811492 -0.051100385 ## [6,] -0.2364834 -0.5506977253 -0.009723958 ## [7,] -0.3794382 0.1819959212 -0.050268679 ## [8,] -0.4436522 0.0167429159 -0.052397739 ## [9,] -0.4229711 -0.4286735662 -0.035489988 ## [10,] -0.2729534 0.0991208907 -0.035126831 ## ## $v ## [,1] [,2] [,3] ## [1,] -0.7887458 0.6147195 0 ## [2,] -0.6147195 -0.7887458 0 ## [3,] 0.0000000 0.0000000 1 |
Now we can reconstruct the original B matrix using u,v and d matrices and computing B=u·d·vT.
1 2 3 |
B_ <- SVD_B$u %*% diag(SVD_B$d) %*% t(SVD_B$v) B_ |
1 2 3 4 5 6 7 8 9 10 11 |
## [,1] [,2] [,3] ## [1,] 89.66972 6.178627 0 ## [2,] 26.55087 20.597457 0 ## [3,] 37.21239 17.655675 0 ## [4,] 57.28534 68.702285 0 ## [5,] 90.82078 38.410372 0 ## [6,] 20.16819 76.984142 0 ## [7,] 89.83897 49.769924 0 ## [8,] 94.46753 71.761851 0 ## [9,] 66.07978 99.190609 0 ## [10,] 62.91140 38.003518 0 |
We can check now if the original B
matrix and the reconstructed B_
matrix are the same (or almost the same) by simply subtracting the matrices. You can see that two matrices match very well.
1 2 3 |
options(scipen=99) B - B_ |
1 2 3 4 5 6 7 8 9 10 11 |
## feature_1 feature_2 feature_3 ## register1 0.000000000000000000000 -0.000000000000056843419 0 ## register2 -0.000000000000003552714 -0.000000000000007105427 0 ## register3 -0.000000000000007105427 -0.000000000000007105427 0 ## register4 -0.000000000000021316282 -0.000000000000028421709 0 ## register5 0.000000000000000000000 0.000000000000000000000 0 ## register6 0.000000000000014210855 -0.000000000000028421709 0 ## register7 -0.000000000000014210855 -0.000000000000007105427 0 ## register8 -0.000000000000014210855 -0.000000000000014210855 0 ## register9 -0.000000000000014210855 0.000000000000000000000 0 ## register10 -0.000000000000014210855 -0.000000000000014210855 0 |
We can also calculate the RMSE of all the errors:
1 2 |
sqrt(sum((B - B_)^2)/NROW(B)) |
1 |
## [1] 0.00000000000002632367 |
Singular Vector Decomposition (image example)
Now let’s suppose we want to reduce the size of an image by compressing it. We can use SVD for this job!.
First of all, we need to install/load the package rsvd, next there is an example image of a tiger than we will use for our purpose, the dimensions are: 1600×1200 grayscaled (8 bit [0-255]/255) image.
1 2 3 4 5 |
# loading example image library(rsvd) data(tiger) dim(tiger) |
1 |
## [1] 1600 1200 |
Next step is to apply svd()
to the tiger image, so we get our three matrices: u, d, and v.
1 2 3 4 |
# Image compression using SVD SVD <- svd(tiger) str(SVD) |
1 2 3 4 |
## List of 3 ## $ d: num [1:1200] 528 119.8 86.1 77 66.1 ... ## $ u: num [1:1600, 1:1200] -0.0134 -0.0133 -0.0133 -0.0134 -0.0133 ... ## $ v: num [1:1200, 1:1200] -0.0357 -0.0357 -0.0355 -0.0356 -0.0357 ... |
Next, we can plot the importance of the topics, remember that are ordered by importance, so the left side of the plot is more important and right side is less important, in order to explain the original image:
1 2 3 |
# topics importance plot(SVD$d^2/sum(SVD$d^2), type="b", pch=16, xlab="topics", ylab="variance explained") |
For this example, we will select the most important 100 topics. Remember that topics are ordered, so it’s as easy as selecting the first 100 columns of the matrices.
So starting from 1200 topics, we will get only the first 100 and then we will reconstruct the image using the reduced matrices:
1 2 3 |
levels <- 100 # the number of topics we will use tiger.SVD = SVD$u[,1:levels] %*% diag(SVD$d[1:levels]) %*% t(SVD$v[,1:levels]) # reconstructing the image |
In essence, we are using only the orange part of the matrices for reconstructing matrix A. See below for a diagram.
Finally, we can display the original and the reduced image:
1 2 3 4 5 6 7 8 9 10 |
# Display orginal and reconstrucuted image # imager library plots faster than image() funtion # imager vignette: https://dahtah.github.io/imager/imager.html library(imager) IM_1 <- as.cimg(tiger[,1200:1]) IM_2 <- as.cimg(tiger.SVD[,1200:1]) plot(imappend(list(IM_1,IM_2),axis="x")) text(10,50,"original", pos=4, col="white") text(1610,50,"SVD reduced [100]", pos=4, col="white") |
And we can estimate below the size of the matrices that have been reduced by 9.52%
1 2 3 |
#Size of original matrices: print(object.size(SVD$u)+object.size(SVD$d)+object.size(SVD$v), units="Mb") |
1 |
## 25.6 Mb |
1 2 3 |
#Size of reduced matrices (SVD [100]): print(object.size(SVD$u[,1:levels])+object.size(SVD$d[1:levels])+object.size(SVD$u[,1:levels]), units="Mb") |
1 |
## 2.4 Mb |
Randomized SVD method (image example)
In the last part, we used SVD method for compressing images. We can also use the randomized SVD method (using the rsvd package) that is equivalent to classical SVD but faster. You can have a look at: PDF for benchmark details.
A simple speed comparative can be found below.
1 2 3 4 5 |
SVD_time <- system.time({ X <- svd(tiger) }) rSVD_time <- system.time({ X <- rsvd(tiger, k=100) }) # classical SVD time calculation: SVD_time |
1 2 |
## user system elapsed ## 8.984 0.050 9.041 |
1 2 3 |
# randomized SVD [100] time calculation: rSVD_time |
1 2 |
## user system elapsed ## 1.594 0.030 1.624 |
As you can see, in this case, the randomize SVD is much faster than the classical SVD. Next, we will see how rSVD works, note that you have to tell to the function the number of topics or the the low-rank decomposition (the k argument of the function) in advance.
1 2 3 4 |
# Image compression using randomized SVD rSVD <- rsvd(tiger, k=100) tiger.rSVD = rSVD$u %*% diag(rSVD$d) %*% t(rSVD$v) # reconstruct image |
Comparing the results of SVD and rSVD, it looks similar.
1 2 3 4 5 6 7 |
# Display orginal, and reconstrucuted images using SVD and rSVD IM_3 <- as.cimg(tiger.rSVD[,1200:1]) plot(imappend(list(IM_1,IM_2,IM_3),axis="x"), axes=FALSE) text(10,60,"original", pos=4, col="white", cex=0.8) text(1610,60,"SVD [100]", pos=4, col="white", cex=0.8) text(3210,60,"rSVD [100]", pos=4, col="white", cex=0.8) |
Calculating the error (RMSE) of the decomposition depending on the k value:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
rSVD2 <- list() tiger.rSVD_2 <- list() result <- list() for(k in c(2,5,10,20,50,100,200,500,1200)){ cat("\nk:",k,"/",1200) rSVD2[[k]] <- rsvd(tiger, k=k) size <- object.size(rSVD2[[k]]$u[,1:k])+object.size(rSVD2[[k]]$d[1:k])+object.size(rSVD2[[k]]$u[,1:k]) # object size ## tiger.rSVD_2[[k]] <- rSVD2[[k]]$u %*% diag(rSVD2[[k]]$d) %*% t(rSVD2[[k]]$v) # reconstructing image error <- sqrt(sum((tiger - tiger.rSVD_2[[k]])^2)/NROW(as.vector(tiger))) # calculating error result[[k]] <- data.frame(k=k, error=round(error,2), size_Mb=as.numeric(as.character(round(size/1024/1024,2)))) } |
1 2 3 4 5 6 7 8 9 10 |
## ## k: 2 / 1200 ## k: 5 / 1200 ## k: 10 / 1200 ## k: 20 / 1200 ## k: 50 / 1200 ## k: 100 / 1200 ## k: 200 / 1200 ## k: 500 / 1200 ## k: 1200 / 1200 |
1 2 |
result <- do.call("rbind",result) |
As we can see below, incrementing the k value (the low-rank decomposition) decreases the RMSE and increases the size of the matrices in memory:
1 2 3 4 |
par(mfrow=c(1,2)) plot(result$k, result$error, type="b", pch=16, ylab="error (RMSE)", xlab="k") plot(result$k, result$size_Mb, type="b", pch=16, ylab="matrices mem size (Mb)", xlab="k", col="blue") |
Below we can observe the different result images for different k values:
1 2 3 4 5 6 7 8 9 10 |
# Display orginal and reconstrucuted image IM <- list() for(k in result$k) IM[[k]] <- as.cimg(tiger.rSVD_2[[k]][,1200:1]) ## par(mfrow=c(3,3), mar=c(0,0,0,0)) for(i in 1:NROW(result)){ plot(IM[[result[i,"k"]]], axes=FALSE) text(10,60,paste0("k = ",result[i,"k"],". mem size = ",result[i,"size_Mb"], " Mb"), pos=4, col="white") } |
Non-negative Matrix Factorization (image example)
Non-negative matrix factorization (NMF) is another method of matrix decomposition. It decomposes a given matrix V into two matrices W and H:
V = W · H
The columns of W are called basis vectors and the columns of H represent the encodings, which are in one-to-one correspondence with the basis vectors of W.
One big difference over SVD is that all three matrices have no negative elements. SDV can have negative values, NMF is not possible to have negative values, non-negative values are easier interpretable than values with negative, so this is the main reason for the popularity of NMF over SVD. In counterpart, NMF will not able to recover the original matrix if does have negative numbers.
You can get more info about NMF package at PDF.
1 2 3 4 5 6 |
# Image compression using NMF r <- 100 library(NMF) NMFd_time <- system.time({ NMFd <- nmf(tiger, rank=r) }) tiger.NMF <- NMFd@fit@W %*% NMFd@fit@H # reconstructing image |
You must take into account that NMF is computationally intensive, and becomes infeasible for massive data. We can see below that the time consumed for processing our tiger image is huge, by 900 times slower than rSVD.
1 2 3 |
# NMF (k=100) time calculation: NMFd_time |
1 2 |
## user system elapsed ## 2157.356 19.122 2178.343 |
Below we can find the original image, and the three transformations used in this post with a low-rank decomposition of 100.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
# Display orginal, and reconstrucuted images using SVD and rSVD IM_4 <- as.cimg(tiger.NMF[,1200:1]) par(mfrow=c(1,1), mar=c(0,0,0,0)) plot(imappend(list(IM_1,IM_2,IM_3,IM_4),axis="x"), axes=FALSE) text(10,60,"original", pos=4, col="white", cex=0.8) text(1610,60,"SVD [100]", pos=4, col="white", cex=0.8) text(3210,60,"rSVD [100]", pos=4, col="white", cex=0.8) text(4810,60,"NMF [100]", pos=4, col="white", cex=0.8) # calculating error text(1610,1000,paste0("SVD RMSE error = ",round(sqrt(sum((tiger - tiger.SVD)^2)/NROW(as.vector(tiger))),6)), pos=4, col="white", cex=0.8) text(3210,1000,paste0("rSVD RMSE error = ",round(sqrt(sum((tiger - tiger.rSVD)^2)/NROW(as.vector(tiger))),6)), pos=4, col="white", cex=0.8) text(4810,1000,paste0("NMF RMSE error = ",round(sqrt(sum((tiger - tiger.NMF)^2)/NROW(as.vector(tiger))),6)), pos=4, col="white", cex=0.8) # calculating size SVD_size <- object.size(SVD$u[,1:levels])+object.size(SVD$d[1:levels])+object.size(SVD$u[,1:levels]) rSVD_size <- object.size(rSVD$u[,1:levels])+object.size(rSVD$d[1:levels])+object.size(rSVD$u[,1:levels]) NMF_size <- object.size(NMFd@fit@W)+object.size(NMFd@fit@H) text(1610,1100,paste0("size = ",round(as.numeric(SVD_size)/1024/1024,2)," Mb"), pos=4, col="white", cex=0.8) text(3210,1100,paste0("size = ",round(as.numeric(rSVD_size)/1024/1024,2)," Mb"), pos=4, col="white", cex=0.8) text(4810,1100,paste0("size = ",round(as.numeric(NMF_size)/1024/1024,2)," Mb"), pos=4, col="white", cex=0.8) |
In this post we have seen the SVD, rSVD and NMF matrix decomposition methods manly applied to images. In PART 2 we will use these techniques in video surveillance for video background removal.
See you in the next post!!
Session Info:
1 2 3 |
------------------------------------ Total R execution time: 39 mins ------------------------------------ |
1 2 3 4 5 6 7 8 9 |
setting value version R version 3.4.3 (2017-11-30) os macOS High Sierra 10.13.5 system x86_64, darwin15.6.0 ui RStudio language (EN) collate es_ES.UTF-8 tz Europe/Madrid date 2018-08-18 |
1 2 |
------------------------------------ Packages: |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
[1] "bitops - 1.0-6 - 2013-08-17 - CRAN (R 3.4.0)" [2] "cluster - 2.0.6 - 2017-03-10 - CRAN (R 3.4.3)" [3] "imager - 0.40.2 - 2017-04-24 - CRAN (R 3.4.0)" [4] "magrittr - 1.5 - 2014-11-22 - CRAN (R 3.4.0)" [5] "NMF - 0.21.0 - 2018-03-06 - CRAN (R 3.4.4)" [6] "pkgmaker - 0.22 - 2014-05-14 - CRAN (R 3.4.0)" [7] "plyr - 1.8.4 - 2016-06-08 - CRAN (R 3.4.0)" [8] "RCurl - 1.95-4.11 - 2018-07-15 - cran (@1.95-4.)" [9] "registry - 0.3 - 2015-07-08 - CRAN (R 3.4.0)" [10] "reshape2 - 1.4.2 - 2016-10-22 - CRAN (R 3.4.0)" [11] "rngtools - 1.2.4 - 2014-03-06 - CRAN (R 3.4.0)" [12] "rsvd - 0.9 - 2017-12-08 - CRAN (R 3.4.3)" [13] "RWordPress - 0.2-3 - 2018-03-04 - Github (duncantl/RWordPress@ce6d2d6)" [14] "stringr - 1.2.0 - 2017-02-18 - CRAN (R 3.4.0)" [15] "XMLRPC - 0.3-1 - 2018-08-17 - Github (duncantl/XMLRPC@add9496)" |
Appendix, all the code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 |
set.seed(0) A <- matrix(runif(30, min = 0, max = 100), nrow = 10) A <- data.frame(A) colnames(A) <- paste0("feature_", 1:NCOL(A)) rownames(A) <- paste0("register", 1:NROW(A)) A A_svd <- svd(A) A_svd U <- data.frame(A_svd$u) colnames(U) <- paste0("topic", 1:NCOL(U)) rownames(U) <- paste0("register", 1:NROW(A)) U D <- data.frame(diag(A_svd$d)) colnames(D) <- paste0("topic", 1:NCOL(U)) rownames(D) <- paste0("topic", 1:NCOL(U)) D V <- data.frame(A_svd$v) colnames(V) <- colnames(A) rownames(V) <- paste0("topic", 1:NCOL(U)) V cor(A_svd$u) B <- A B[, 3] <- 0 SVD_B <- svd(B) SVD_B B_ <- SVD_B$u %*% diag(SVD_B$d) %*% t(SVD_B$v) B_ options(scipen = 99) B - B_ sqrt(sum((B - B_)^2)/NROW(B)) # loading example image library(rsvd) data(tiger) dim(tiger) # Image compression using SVD SVD <- svd(tiger) str(SVD) # topics importance plot(SVD$d^2/sum(SVD$d^2), type = "b", pch = 16, xlab = "topics", ylab = "variance explained") levels <- 100 # the number of topics we will use tiger.SVD = SVD$u[, 1:levels] %*% diag(SVD$d[1:levels]) %*% t(SVD$v[, 1:levels]) # reconstructing the image # Display orginal and reconstrucuted image imager library plots faster than # image() funtion imager vignette: # https://dahtah.github.io/imager/imager.html library(imager) IM_1 <- as.cimg(tiger[, 1200:1]) IM_2 <- as.cimg(tiger.SVD[, 1200:1]) plot(imappend(list(IM_1, IM_2), axis = "x")) text(10, 50, "original", pos = 4, col = "white") text(1610, 50, "SVD reduced [100]", pos = 4, col = "white") # Size of original matrices: print(object.size(SVD$u) + object.size(SVD$d) + object.size(SVD$v), units = "Mb") # Size of reduced matrices (SVD [100]): print(object.size(SVD$u[, 1:levels]) + object.size(SVD$d[1:levels]) + object.size(SVD$u[, 1:levels]), units = "Mb") SVD_time <- system.time({ X <- svd(tiger) }) rSVD_time <- system.time({ X <- rsvd(tiger, k = 100) }) # classical SVD time calculation: SVD_time # randomized SVD [100] time calculation: rSVD_time # Image compression using randomized SVD rSVD <- rsvd(tiger, k = 100) tiger.rSVD = rSVD$u %*% diag(rSVD$d) %*% t(rSVD$v) # reconstruct image # Display orginal, and reconstrucuted images using SVD and rSVD IM_3 <- as.cimg(tiger.rSVD[, 1200:1]) plot(imappend(list(IM_1, IM_2, IM_3), axis = "x"), axes = FALSE) text(10, 60, "original", pos = 4, col = "white", cex = 0.8) text(1610, 60, "SVD [100]", pos = 4, col = "white", cex = 0.8) text(3210, 60, "rSVD [100]", pos = 4, col = "white", cex = 0.8) rSVD2 <- list() tiger.rSVD_2 <- list() result <- list() for (k in c(2, 5, 10, 20, 50, 100, 200, 500, 1200)) { cat("\nk:", k, "/", 1200) rSVD2[[k]] <- rsvd(tiger, k = k) size <- object.size(rSVD2[[k]]$u[, 1:k]) + object.size(rSVD2[[k]]$d[1:k]) + object.size(rSVD2[[k]]$u[, 1:k]) # object size ## tiger.rSVD_2[[k]] <- rSVD2[[k]]$u %*% diag(rSVD2[[k]]$d) %*% t(rSVD2[[k]]$v) # reconstructing image error <- sqrt(sum((tiger - tiger.rSVD_2[[k]])^2)/NROW(as.vector(tiger))) # calculating error result[[k]] <- data.frame(k = k, error = round(error, 2), size_Mb = as.numeric(as.character(round(size/1024/1024, 2)))) } result <- do.call("rbind", result) par(mfrow = c(1, 2)) plot(result$k, result$error, type = "b", pch = 16, ylab = "error (RMSE)", xlab = "k") plot(result$k, result$size_Mb, type = "b", pch = 16, ylab = "matrices mem size (Mb)", xlab = "k", col = "blue") # Display orginal and reconstrucuted image IM <- list() for (k in result$k) IM[[k]] <- as.cimg(tiger.rSVD_2[[k]][, 1200:1]) ## par(mfrow = c(3, 3), mar = c(0, 0, 0, 0)) for (i in 1:NROW(result)) { plot(IM[[result[i, "k"]]], axes = FALSE) text(10, 60, paste0("k = ", result[i, "k"], ". mem size = ", result[i, "size_Mb"], " Mb"), pos = 4, col = "white") } # Image compression using NMF r <- 100 library(NMF) NMFd_time <- system.time({ NMFd <- nmf(tiger, rank = r) }) tiger.NMF <- NMFd@fit@W %*% NMFd@fit@H # reconstructing image # NMF (k=100) time calculation: NMFd_time # Display orginal, and reconstrucuted images using SVD and rSVD IM_4 <- as.cimg(tiger.NMF[, 1200:1]) par(mfrow = c(1, 1), mar = c(0, 0, 0, 0)) plot(imappend(list(IM_1, IM_2, IM_3, IM_4), axis = "x"), axes = FALSE) text(10, 60, "original", pos = 4, col = "white", cex = 0.8) text(1610, 60, "SVD [100]", pos = 4, col = "white", cex = 0.8) text(3210, 60, "rSVD [100]", pos = 4, col = "white", cex = 0.8) text(4810, 60, "NMF [100]", pos = 4, col = "white", cex = 0.8) # calculating error text(1610, 1000, paste0("SVD RMSE error = ", round(sqrt(sum((tiger - tiger.SVD)^2)/NROW(as.vector(tiger))), 6)), pos = 4, col = "white", cex = 0.8) text(3210, 1000, paste0("rSVD RMSE error = ", round(sqrt(sum((tiger - tiger.rSVD)^2)/NROW(as.vector(tiger))), 6)), pos = 4, col = "white", cex = 0.8) text(4810, 1000, paste0("NMF RMSE error = ", round(sqrt(sum((tiger - tiger.NMF)^2)/NROW(as.vector(tiger))), 6)), pos = 4, col = "white", cex = 0.8) # calculating size SVD_size <- object.size(SVD$u[, 1:levels]) + object.size(SVD$d[1:levels]) + object.size(SVD$u[, 1:levels]) rSVD_size <- object.size(rSVD$u[, 1:levels]) + object.size(rSVD$d[1:levels]) + object.size(rSVD$u[, 1:levels]) NMF_size <- object.size(NMFd@fit@W) + object.size(NMFd@fit@H) text(1610, 1100, paste0("size = ", round(as.numeric(SVD_size)/1024/1024, 2), " Mb"), pos = 4, col = "white", cex = 0.8) text(3210, 1100, paste0("size = ", round(as.numeric(rSVD_size)/1024/1024, 2), " Mb"), pos = 4, col = "white", cex = 0.8) text(4810, 1100, paste0("size = ", round(as.numeric(NMF_size)/1024/1024, 2), " Mb"), pos = 4, col = "white", cex = 0.8) |