Title: | Biclustering via Sparse Singular Value Decomposition Incorporating Stability Selection |
---|---|
Description: | The main function s4vd() performs biclustering via sparse singular value decomposition with nested stability selection. The result is a biclust object, so all methods of the biclust package can be applied. |
Authors: | Martin Sill, Sebastian Kaiser |
Maintainer: | Martin Sill <[email protected]> |
License: | GPL-2 |
Version: | 1.1-1 |
Built: | 2025-03-05 03:47:03 UTC |
Source: | https://github.com/mwsill/s4vd |
Heatmap function to plot biclustering results. Overlapping biclusters are indicated by colored rectangles.
BCheatmap(X, res, cexR = 1.5, cexC = 1.25, axisR = FALSE, axisC = TRUE,
          heatcols = maPalette(low = "blue", mid = "white", high = "red", k = 50),
          clustercols = c(1:5), allrows = FALSE, allcolumns = TRUE)
Argument | Description |
---|---|
X | the data matrix |
res | the biclustering result |
cexR | relative font size of the row labels |
cexC | relative font size of the column labels |
axisR | if TRUE the row labels will be plotted |
axisC | if TRUE the column labels will be plotted |
heatcols | a character vector specifying the heatmap colors |
clustercols | a character vector specifying the colors of the rectangles that indicate the rows and columns that belong to a bicluster |
allrows | if FALSE only the rows assigned to any bicluster will be plotted |
allcolumns | if FALSE only the columns assigned to any bicluster will be plotted |
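The effect of allrows and allcolumns can be illustrated with a small base R sketch. It assumes that bicluster membership is available as two logical matrices laid out like the RowxNumber (rows by biclusters) and NumberxCol (biclusters by columns) objects built in the package examples. The helper name assigned_indices is hypothetical and this is not how BCheatmap itself is implemented.

```r
# Hypothetical helper (not part of s4vd): find the rows and columns that
# belong to at least one bicluster, given logical membership matrices.
assigned_indices <- function(RowxNumber, NumberxCol) {
  list(
    rows = which(rowSums(RowxNumber) > 0),  # rows in at least one bicluster
    cols = which(colSums(NumberxCol) > 0)   # columns in at least one bicluster
  )
}

# Toy membership: 6 rows, 5 columns, 2 overlapping biclusters
RowxNumber <- cbind(c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
                    c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE))
NumberxCol <- rbind(c(TRUE, FALSE, FALSE, TRUE, FALSE),
                    c(FALSE, FALSE, FALSE, TRUE, TRUE))

idx <- assigned_indices(RowxNumber, NumberxCol)
X <- matrix(seq_len(30), nrow = 6, ncol = 5)

# Submatrix that would remain with allrows = FALSE and allcolumns = FALSE
X_sub <- X[idx$rows, idx$cols, drop = FALSE]
dim(X_sub)
```

With allrows = TRUE or allcolumns = TRUE the corresponding index would simply be the full range of rows or columns.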
Martin Sill <[email protected]>
# lung cancer data set Bhattacharjee et al. 2001
data(lung200)
set.seed(12)
res1 <- biclust(lung200, method = BCs4vd(), pcerv = 0.5, pceru = 0.01,
                ss.thr = c(0.6, 0.65), start.iter = 3, size = 0.632,
                cols.nc = TRUE, steps = 100, pointwise = TRUE,
                merr = 0.0001, iter = 100, nbiclust = 10, col.overlap = FALSE)
BCheatmap(lung200, res1)
The function performs biclustering of the data matrix by sparse singular value decomposition with nested stability selection.
## S4 method for signature 'matrix,BCs4vd'
biclust(x, method = BCs4vd(), steps = 100, pcerv = 0.05, pceru = 0.05,
        ss.thr = c(0.6, 0.65), size = 0.632, gamm = 0, iter = 100,
        nbiclust = 10, merr = 10^(-4), cols.nc = FALSE, rows.nc = TRUE,
        row.overlap = TRUE, col.overlap = TRUE, row.min = 4, col.min = 4,
        pointwise = TRUE, start.iter = 0, savepath = FALSE)
Argument | Description |
---|---|
x | The matrix to be clustered. |
method | Calls the BCs4vd() method. |
steps | Number of subsamples used to perform the stability selection. |
pcerv | Per-comparison-wise error rate to control the number of falsely selected right singular vector coefficients (columns/samples). |
pceru | Per-comparison-wise error rate to control the number of falsely selected left singular vector coefficients (rows/genes). |
ss.thr | Range of the cutoff threshold (relative selection frequency) for the stability selection. |
size | Size of the subsamples used to perform the stability selection. |
gamm | Weight parameter for the adaptive LASSO, nonnegative constant (default = 0, LASSO). |
iter | Maximal number of iterations to fit a single bicluster. |
nbiclust | Maximal number of biclusters. |
merr | Threshold to decide convergence. |
cols.nc | Allow for negative correlation of columns (samples) over rows (genes). |
rows.nc | Allow for negative correlation of rows (genes) over columns (samples). |
row.overlap | Allow rows to overlap between biclusters. |
col.overlap | Allow columns to overlap between biclusters. |
row.min | Minimal number of rows. |
col.min | Minimal number of columns. |
pointwise | If TRUE performs a fast pointwise stability selection instead of calculating the complete stability path. |
start.iter | Number of starting iterations in which the algorithm is not allowed to converge. |
savepath | Saves the stability path so that it can be plotted with the stabpath function. Note that pointwise needs to be FALSE to save the path, as in the examples. For extremely high-dimensional data sets (e.g. the lung cancer example) the resulting biclust object may exceed the available memory. |
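To make the roles of steps, size and ss.thr more concrete, the following base R sketch runs a simplified pointwise stability selection for the column coefficients. It is an illustration under stated assumptions, not the package's implementation: the left singular vector u is taken as known, the penalty lambda is fixed by hand instead of being derived from pcerv, and the function names are hypothetical.

```r
# Illustrative sketch only; names and the fixed penalty lambda are assumptions.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

selection_frequency <- function(X, u, lambda, steps = 100, size = 0.632) {
  n <- nrow(X)
  m <- floor(size * n)                      # number of rows per subsample
  counts <- numeric(ncol(X))
  for (s in seq_len(steps)) {
    rows <- sample.int(n, m)                # subsample rows without replacement
    z <- drop(crossprod(X[rows, , drop = FALSE], u[rows]))
    counts <- counts + (soft_threshold(z, lambda) != 0)
  }
  counts / steps                            # relative selection frequency
}

# Toy data: a rank-one signal in the first 20 rows and 10 columns plus noise
set.seed(1)
u <- c(rep(1, 20), rep(0, 80)); u <- u / sqrt(sum(u^2))
v <- c(rep(1, 10), rep(0, 40)); v <- v / sqrt(sum(v^2))
X <- 50 * u %*% t(v) + matrix(rnorm(100 * 50), 100, 50)

freq <- selection_frequency(X, u, lambda = 5)
selected <- which(freq >= 0.6)              # lower end of ss.thr = c(0.6, 0.65)
selected
```

Columns that are selected in a large fraction of the subsamples are kept, which is what makes the selection less sensitive to a single noisy fit.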
Returns an object of class Biclust.
Martin Sill <[email protected]>
Martin Sill, Sebastian Kaiser, Axel Benner and Annette Kopp-Schneider "Robust biclustering by sparse singular value decomposition incorporating stability selection", Bioinformatics, 2011
# example data set according to the simulation study in Lee et al. 2010
# generate artificial data set and a corresponding biclust object
u <- c(10,9,8,7,6,5,4,3,rep(2,17),rep(0,75))
v <- c(10,-10,8,-8,5,-5,rep(3,5),rep(-3,5),rep(0,34))
u <- u/sqrt(sum(u^2))
v <- v/sqrt(sum(v^2))
d <- 50
set.seed(1)
X <- (d*u%*%t(v)) + matrix(rnorm(100*50),100,50)
params <- info <- list()
RowxNumber <- matrix(rep(FALSE,100),ncol=1)
NumberxCol <- matrix(rep(FALSE,50),nrow=1)
RowxNumber[u!=0,1] <- TRUE
NumberxCol[1,v!=0] <- TRUE
Number <- 1
ressim <- BiclustResult(params,RowxNumber,NumberxCol,Number,info)
# perform s4vd biclustering
ress4vd <- biclust(X,method=BCs4vd,pcerv=0.5,pceru=0.5,pointwise=FALSE,nbiclust=1,savepath=TRUE)
# perform s4vd biclustering with fast pointwise stability selection
ress4vdpw <- biclust(X,method=BCs4vd,pcerv=0.5,pceru=0.5,pointwise=TRUE,nbiclust=1)
# perform ssvd biclustering
resssvd <- biclust(X,BCssvd,K=1)
# agreement of the results with the simulated bicluster
jaccardind(ressim,ress4vd)
jaccardind(ressim,ress4vdpw)
jaccardind(ressim,resssvd)
The function performs a biclustering of the data matrix by sparse singular value decomposition.
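As a rough illustration of the idea, a sparse rank-one approximation can be computed by alternating soft-thresholded updates of the right and left singular vectors, starting from the ordinary SVD. The sketch below uses base R only and makes simplifying assumptions: the penalties lambda_u and lambda_v are fixed by hand (the package selects them differently), and sparse_rank_one, soft_threshold and unit_norm are hypothetical names rather than package functions.

```r
# Illustrative sketch with fixed penalties; all names here are assumptions.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)
unit_norm <- function(z) {
  nz <- sqrt(sum(z^2))
  if (nz > 0) z / nz else z
}

sparse_rank_one <- function(X, lambda_u, lambda_v, merr = 1e-4, niter = 100) {
  s <- svd(X, nu = 1, nv = 1)               # start from the first SVD layer
  u <- s$u[, 1]
  v <- s$v[, 1]
  for (i in seq_len(niter)) {
    # update v given u, then u given the new v
    v_new <- unit_norm(soft_threshold(drop(crossprod(X, u)), lambda_v))
    u_new <- unit_norm(soft_threshold(drop(X %*% v_new), lambda_u))
    delta <- sqrt(sum((u_new - u)^2)) + sqrt(sum((v_new - v)^2))
    u <- u_new
    v <- v_new
    if (delta < merr) break                 # converged
  }
  list(u = u, v = v, d = drop(t(u) %*% X %*% v), iterations = i)
}

# Simulated rank-one signal plus noise, as in the package examples
u <- c(10, 9, 8, 7, 6, 5, 4, 3, rep(2, 17), rep(0, 75))
v <- c(10, -10, 8, -8, 5, -5, rep(3, 5), rep(-3, 5), rep(0, 34))
u <- u / sqrt(sum(u^2))
v <- v / sqrt(sum(v^2))
set.seed(1)
X <- 50 * u %*% t(v) + matrix(rnorm(100 * 50), 100, 50)

fit <- sparse_rank_one(X, lambda_u = 3, lambda_v = 3)
sum(fit$u != 0)   # rows in the estimated bicluster
sum(fit$v != 0)   # columns in the estimated bicluster
```

The nonzero entries of the two vectors define the rows and columns of the bicluster; further layers would be obtained by repeating the procedure on the residual matrix.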
## S4 method for signature 'matrix,BCssvd'
biclust(x, method = BCssvd(), K = 10, threu = 1, threv = 1, gamu = 0, gamv = 0,
        u0 = svd(X)$u[,1], v0 = svd(X)$v[,1], merr = 10^(-4), niter = 100)
Argument | Description |
---|---|
x | the matrix to be clustered |
method | calls the BCssvd() method |
K | number of SSVD layers |
threu | type of penalty (thresholding rule) for the left singular vector: 1 = (adaptive) LASSO (default), 2 = hard thresholding |
threv | type of penalty (thresholding rule) for the right singular vector: 1 = (adaptive) LASSO (default), 2 = hard thresholding |
gamu | weight parameter in the adaptive LASSO for the left singular vector, nonnegative constant (default = 0, LASSO) |
gamv | weight parameter in the adaptive LASSO for the right singular vector, nonnegative constant (default = 0, LASSO) |
u0 | initial left singular vector |
v0 | initial right singular vector |
merr | threshold to decide convergence |
niter | maximum number of iterations |
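The two thresholding rules and the role of the adaptive LASSO weight can be written down in a few lines of base R. This is a generic sketch of the standard rules, not code taken from the package. The function names are hypothetical, and the weight form |z|^(-gam) is the usual adaptive LASSO choice, assumed here for illustration.

```r
# Soft thresholding (LASSO): shrink every entry towards zero by lambda
# and set entries with magnitude below lambda to zero.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hard thresholding: keep entries with magnitude above lambda unchanged,
# set all others to zero.
hard_threshold <- function(z, lambda) z * (abs(z) > lambda)

# Adaptive LASSO: entry-specific penalties lambda * w with w = |z|^(-gam).
# gam = 0 gives equal weights and reduces to the plain LASSO.
adaptive_soft_threshold <- function(z, lambda, gam = 0) {
  w <- abs(z)^(-gam)
  sign(z) * pmax(abs(z) - lambda * w, 0)
}

z <- c(-3, -0.5, 0, 1, 2.5)
soft_threshold(z, 1)                    # c(-2, 0, 0, 0, 1.5)
hard_threshold(z, 1)                    # c(-3, 0, 0, 0, 2.5)
adaptive_soft_threshold(z, 1, gam = 2)  # large entries are shrunk less
```

With a positive weight parameter, large coefficients are penalized less than small ones, which reduces the shrinkage bias of the plain LASSO on strong signals.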
Returns a Biclust object.
Adaptation of original code from Mihee Lee by Martin Sill <[email protected]>
Mihee Lee, Haipeng Shen, Jianhua Z. Huang and J. S. Marron "Biclustering via Sparse Singular Value Decomposition", Biometrics, 2010
# example data set according to the simulation study in Lee et al. 2010
# generate artificial data set and a corresponding biclust object
u <- c(10,9,8,7,6,5,4,3,rep(2,17),rep(0,75))
v <- c(10,-10,8,-8,5,-5,rep(3,5),rep(-3,5),rep(0,34))
u <- u/sqrt(sum(u^2))
v <- v/sqrt(sum(v^2))
d <- 50
set.seed(1)
X <- (d*u%*%t(v)) + matrix(rnorm(100*50),100,50)
params <- info <- list()
RowxNumber <- matrix(rep(FALSE,100),ncol=1)
NumberxCol <- matrix(rep(FALSE,50),nrow=1)
RowxNumber[u!=0,1] <- TRUE
NumberxCol[1,v!=0] <- TRUE
Number <- 1
ressim <- BiclustResult(params,RowxNumber,NumberxCol,Number,info)
# perform s4vd biclustering
ress4vd <- biclust(X,method=BCs4vd,pcerv=0.5,pceru=0.5,pointwise=FALSE,nbiclust=1,savepath=TRUE)
# perform s4vd biclustering with fast pointwise stability selection
ress4vdpw <- biclust(X,method=BCs4vd,pcerv=0.5,pceru=0.5,pointwise=TRUE,nbiclust=1)
# perform ssvd biclustering
resssvd <- biclust(X,BCssvd,K=1)
# agreement of the results with the simulated bicluster
jaccardind(ressim,ress4vd)
jaccardind(ressim,ress4vdpw)
jaccardind(ressim,resssvd)
The function calculates the pairwise Jaccard coefficients between the biclusters of two biclustering results.
jaccardmat(res1,res2)
Argument | Description |
---|---|
res1 | A biclustering result as an object of class Biclust |
res2 | A biclustering result as an object of class Biclust |
The result is a matrix of pairwise Jaccard coefficients between the biclusters of res1 and res2.
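For intuition, the Jaccard coefficient of two biclusters can be computed by treating each bicluster as the set of matrix cells it covers and dividing the size of the intersection by the size of the union. The base R sketch below follows that assumption and takes logical membership matrices laid out like RowxNumber and NumberxCol in the package examples. The function jaccard_cells is hypothetical and may differ in detail from what jaccardmat does internally.

```r
# Hypothetical helper, not the package's implementation: Jaccard coefficient
# for every pair of biclusters, treating a bicluster as a set of matrix cells.
# rows1/rows2: logical, rows x biclusters; cols1/cols2: logical, biclusters x columns
jaccard_cells <- function(rows1, cols1, rows2, cols2) {
  out <- matrix(NA_real_, nrow = ncol(rows1), ncol = ncol(rows2))
  for (i in seq_len(ncol(rows1))) {
    for (j in seq_len(ncol(rows2))) {
      # shared cells = shared rows times shared columns
      inter <- sum(rows1[, i] & rows2[, j]) * sum(cols1[i, ] & cols2[j, ])
      size1 <- sum(rows1[, i]) * sum(cols1[i, ])
      size2 <- sum(rows2[, j]) * sum(cols2[j, ])
      union <- size1 + size2 - inter
      out[i, j] <- if (union > 0) inter / union else NA_real_
    }
  }
  out
}

# Result A: one bicluster (rows 1-4, columns 1-3) in a 6 x 5 matrix
rowsA <- matrix(1:6 %in% 1:4, ncol = 1)
colsA <- matrix(1:5 %in% 1:3, nrow = 1)
# Result B: two biclusters (rows 3-6 x columns 2-3, and a copy of A's bicluster)
rowsB <- cbind(1:6 %in% 3:6, 1:6 %in% 1:4)
colsB <- rbind(1:5 %in% 2:3, 1:5 %in% 1:3)

jm <- jaccard_cells(rowsA, colsA, rowsB, colsB)
jm   # 1 x 2 matrix: 0.25 for the partial overlap, 1 for the identical bicluster
```

A value of 1 means two biclusters cover exactly the same cells, and 0 means they share none.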
Martin Sill <[email protected]>
jaccardind
# lung cancer data set Bhattacharjee et al. 2001
data(lung200)
set.seed(12)
res1 <- biclust(lung200, method = BCs4vd(), pcerv = 0.5, pceru = 0.01,
                ss.thr = c(0.6, 0.65), start.iter = 3, size = 0.632,
                cols.nc = TRUE, steps = 100, pointwise = TRUE,
                merr = 0.0001, iter = 100, nbiclust = 10, col.overlap = FALSE)
res2 <- biclust(lung200, method = BCPlaid())
jaccardmat(res1, res2)
Lung cancer gene expression data set
data(lung200)
This data set contains 56 samples and the gene expression values of the 200 genes with the highest variance among the 12,625 genes measured using the Affymetrix 95av2 GeneChip. The samples comprise 20 pulmonary carcinoid samples (Carcinoid), 13 colon cancer metastasis samples (Colon), 17 normal lung samples (Normal) and 6 small cell lung carcinoma samples (SmallCell). The row names are Affymetrix gene IDs.
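A variance filter of the kind described above can be sketched in base R as follows. The sketch runs on simulated data, and top_variance_rows is a hypothetical helper. It is not the code that was used to create lung200.

```r
# Hypothetical sketch of a variance filter; not the code used to build lung200.
top_variance_rows <- function(expr, k) {
  v <- apply(expr, 1, var)                  # per-gene variance across samples
  expr[order(v, decreasing = TRUE)[seq_len(k)], , drop = FALSE]
}

# Simulated expression matrix: 1000 genes (rows) by 56 samples (columns)
set.seed(42)
expr <- matrix(rnorm(1000 * 56), nrow = 1000, ncol = 56,
               dimnames = list(paste0("gene", 1:1000), NULL))
expr[1:5, ] <- expr[1:5, ] * 10             # make five genes highly variable

sub <- top_variance_rows(expr, 200)
dim(sub)                                    # 200 genes by 56 samples
```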
http://www.pnas.org/content/98/24/13790/suppl/DC1
Bhattacharjee, A., Richards, W. G., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., Loda, M., Weber, G., Mark, E. J., Lander, E. S., Wong, W., Johnson, B. E., Golub, T. R., Sugarbaker, D. J., and Meyerson, M. (2001). Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proceedings of the National Academy of Sciences of the United States of America.
The function plots the stability path of an S4VD result.
stabpath(res,number)
Argument | Description |
---|---|
res | the S4VD result |
number | the bicluster for which the stability path shall be plotted |
Plots the stability path for the rows and the columns from the last iteration of the s4vd algorithm. Note that if the pointwise error control was used or if savepath=FALSE, only the final selection probabilities for the rows and the columns are plotted.
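What a stability path shows can be illustrated with a simplified base R sketch: for a grid of penalty values, it records how often each column coefficient survives thresholding across random subsamples, and then draws one line per column. The assumptions are the same as for any such toy version: the left singular vector u is treated as known, the penalty grid lambdas is chosen by hand, and stability_path is a hypothetical function that does not reproduce the internals of stabpath.

```r
# Illustrative sketch only; stability_path and the lambda grid are assumptions.
stability_path <- function(X, u, lambdas, steps = 100, size = 0.632) {
  n <- nrow(X)
  m <- floor(size * n)
  path <- matrix(0, nrow = ncol(X), ncol = length(lambdas))
  for (s in seq_len(steps)) {
    rows <- sample.int(n, m)                # subsample rows without replacement
    z <- drop(crossprod(X[rows, , drop = FALSE], u[rows]))
    # column j survives soft thresholding at lambda l exactly when |z_j| > lambda_l
    path <- path + outer(abs(z), lambdas, ">")
  }
  path / steps                              # selection probability per column and lambda
}

# Simulated data as in the package examples
u <- c(10, 9, 8, 7, 6, 5, 4, 3, rep(2, 17), rep(0, 75))
v <- c(10, -10, 8, -8, 5, -5, rep(3, 5), rep(-3, 5), rep(0, 34))
u <- u / sqrt(sum(u^2))
v <- v / sqrt(sum(v^2))
set.seed(1)
X <- 50 * u %*% t(v) + matrix(rnorm(100 * 50), 100, 50)

lambdas <- seq(0, 10, length.out = 20)
path <- stability_path(X, u, lambdas)

# One line per column; the dashed line marks a cutoff of 0.6
matplot(lambdas, t(path), type = "l", lty = 1,
        xlab = "lambda", ylab = "selection probability")
abline(h = 0.6, lty = 2)
```

With pointwise selection only a single penalty value is evaluated, so there is one selection probability per row or column instead of a full curve.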
Martin Sill <[email protected]>
# example data set according to the simulation study in Lee et al. 2010
# generate artificial data set and a corresponding biclust object
u <- c(10,9,8,7,6,5,4,3,rep(2,17),rep(0,75))
v <- c(10,-10,8,-8,5,-5,rep(3,5),rep(-3,5),rep(0,34))
u <- u/sqrt(sum(u^2))
v <- v/sqrt(sum(v^2))
d <- 50
set.seed(1)
X <- (d*u%*%t(v)) + matrix(rnorm(100*50),100,50)
params <- info <- list()
RowxNumber <- matrix(rep(FALSE,100),ncol=1)
NumberxCol <- matrix(rep(FALSE,50),nrow=1)
RowxNumber[u!=0,1] <- TRUE
NumberxCol[1,v!=0] <- TRUE
Number <- 1
ressim <- BiclustResult(params,RowxNumber,NumberxCol,Number,info)
# perform s4vd biclustering
ress4vd <- biclust(X,method=BCs4vd,pcerv=0.5,pceru=0.5,ss.thr=c(0.6,0.65),steps=500,
                   pointwise=FALSE,nbiclust=1,savepath=TRUE)
# perform s4vd biclustering with fast pointwise stability selection
ress4vdpw <- biclust(X,method=BCs4vd,pcerv=0.5,pceru=0.5,ss.thr=c(0.6,0.65),steps=500,
                     pointwise=TRUE,nbiclust=1)
# stability paths
stabpath(ress4vd,1)
# selection probabilities for the pointwise stability selection
stabpath(ress4vdpw,1)