sgcca2.Rd
Variable Selection For Generalized Canonical Correlation Analysis 2 (SGCCA2)
SGCCA2 extends SGCCA so that the same design does not have to be used in all dimensions: a separate design matrix can be supplied for each dimension.
sgcca2(A, C = rep(1 - diag(length(A)), max(ncomp)), c1 = rep(1, length(A)), ncomp = rep(2, length(A)), scheme = "centroid", scale = TRUE, init = "svd", bias = TRUE, tol = .Machine$double.eps, verbose = FALSE)
A | A list that contains the J blocks of variables X1,X2,...,XJ. |
---|---|
C | A list that contains the design matrices, one per dimension, that describe the relationships between blocks (default: complete design for every dimension). |
c1 | Either a 1×J vector or a max(ncomp)×J matrix encoding the L1 constraints applied to the outer weight vectors. Elements of c1 vary between 1/sqrt(pj) and 1 (larger values of c1 correspond to less penalization). If c1 is a vector, the L1 penalties are the same for all the weights corresponding to the same block but different components: $\forall h,\ \lVert a_{j,h} \rVert_{1} \le \mathrm{c1}[j]\,\sqrt{p_j}$, with $p_j$ the number of variables of $X_j$. If c1 is a matrix, each row h defines the constraints applied to the weights corresponding to component h: $\forall h,\ \lVert a_{j,h} \rVert_{1} \le \mathrm{c1}[h,j]\,\sqrt{p_j}$. |
ncomp | A 1×J vector that contains the number of components for each block (default: rep(2, length(A)), which means two components per block). |
scheme | Either "horst", "factorial" or "centroid" (default: "centroid"). |
scale | If scale = TRUE, each block is standardized to zero means and unit variances and then divided by the square root of its number of variables (default: TRUE). |
init | Mode of initialization used in the SGCCA algorithm, either by Singular Value Decomposition ("svd") or random ("random") (default: "svd"). |
bias | A logical value for biased or unbiased estimator of the var/cov (default: TRUE). |
tol | Stopping value for convergence (default: .Machine$double.eps). |
verbose | Will report progress while computing if verbose = TRUE (default: FALSE). |
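The shapes expected for C and c1 can be illustrated with a small base-R sketch. The simulated blocks, the c1 values, and the helper names `c1_min` and `c1_ok` are assumptions made for illustration only; the sketch prepares and checks the inputs described above and does not call sgcca2 itself.

```r
# Toy data (simulated): 3 blocks measured on the same 20 observations.
set.seed(1)
A <- list(matrix(rnorm(20 * 50), 20, 50),
          matrix(rnorm(20 * 30), 20, 30),
          matrix(rnorm(20 * 2), 20, 2))
J <- length(A)
ncomp <- c(2, 2, 1)

# One design matrix per dimension, collected in a list.
C1 <- 1 - diag(J)                                  # complete design
C2 <- matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 0), J, J)   # blocks 1 and 2 linked only to block 3
C <- list(C1, C2)
stopifnot(length(C) == max(ncomp))

# Documented lower bound for c1 in block j: 1 / sqrt(p_j).
p <- sapply(A, ncol)
c1_min <- 1 / sqrt(p)

# Matrix form of c1: row h holds the constraints for component h,
# column j the constraint for block j (values chosen arbitrarily).
c1 <- matrix(c(0.3, 0.4, 1,
               0.2, 0.3, 1), nrow = max(ncomp), byrow = TRUE)

# Check that every entry lies in [1 / sqrt(p_j), 1].
c1_ok <- all(sweep(c1, 2, c1_min, ">=")) && all(c1 <= 1)
```

Checking the bounds before fitting makes it easier to spot a c1 value that is too small for a block with few variables, such as the third block here.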
A list of class sgcca with the following elements:
A list of J elements. Each element of Y is a matrix that contains the SGCCA components for each block.
A list of J elements. Each element of a is a matrix that contains the outer weight vectors for each block.
A list of J elements. Each element of astar is a matrix defined as Y[[j]][, h] = A[[j]] %*% astar[[j]][, h].
A design matrix that describes the relationships between blocks (user specified).
The scheme chosen by the user (user specified).
A vector or matrix that contains the value of c1 applied to each block Xj, j=1,…,J and each dimension (user specified).
A 1×J vector that contains the number of components for each block (user specified).
A vector that contains the values of the objective function at each iteration.
Indicators of model quality based on the Average Variance Explained (AVE): AVE(for one block), AVE(outer model), AVE(inner model).
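As a rough illustration of how the Y, a and astar elements relate, the sketch below checks the identity Y[[j]] = A[[j]] %*% astar[[j]] and counts the non-zero weights per component. The helper `check_astar` is hypothetical (it is not part of the package), and `fit` is a hand-built stand-in for a fitted object so that the code runs without the package. With a real fit and scale = TRUE, the identity presumably holds for the scaled blocks rather than the raw ones.

```r
# Hypothetical helper: largest absolute deviation between Y[[j]] and
# A[[j]] %*% astar[[j]], computed for each block j.
check_astar <- function(A, fit) {
  sapply(seq_along(A), function(j) {
    max(abs(fit$Y[[j]] - A[[j]] %*% fit$astar[[j]]))
  })
}

# Stand-in for a fitted object (simulated blocks, hand-picked sparse weights).
set.seed(1)
A <- list(matrix(rnorm(40), 10, 4), matrix(rnorm(30), 10, 3))
astar <- list(matrix(c(0.8, 0, 0.6, 0,
                       0,   1, 0,   0), 4, 2),
              matrix(c(1, 0, 0), 3, 1))
# For this mock only, a is set equal to astar.
fit <- list(Y = Map(`%*%`, A, astar), a = astar, astar = astar)

dev <- check_astar(A, fit)                                # one deviation per block
nonzero <- lapply(fit$a, function(w) colSums(w != 0))     # selected variables per component
```

On a real result, the same two calls give a quick sanity check of the components and a summary of how many variables each L1 constraint retained.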
Tenenhaus, A., Philippe, C., Guillemot, V., Le Cao, K. A., Grill, J., and Frouin, V., "Variable selection for generalized canonical correlation analysis," Biostatistics, vol. 15, no. 3, pp. 569-583, 2014.
#############
# Example 1 #
#############
if (FALSE) {
# Download the dataset's package at http://biodev.cea.fr/sgcca/.
# --> gliomaData_0.4.tar.gz

require(gliomaData)
data(ge_cgh_locIGR)

A <- ge_cgh_locIGR$multiblocks
Loc <- factor(ge_cgh_locIGR$y)
levels(Loc) <- colnames(ge_cgh_locIGR$multiblocks$y)
C1 <- matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 0), 3, 3)
C2 <- matrix(c(0, 1, 1, 1, 0, 1, 1, 1, 0), 3, 3)
C <- list(C1, C2)
tau = c(1, 1, 0)

# rgcca algorithm using the dual formulation for X1 and X2
# and the dual formulation for X3
A[[3]] = A[[3]][, -3]

# sgcca algorithm
result.sgcca = sgcca2(A, C, c1 = c(.071,.2, 1), ncomp = c(2, 2, 1),
                      scheme = "centroid", verbose = TRUE)

############################
# plot(y1, y2) for (SGCCA) #
############################
layout(t(1:2))
plot(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[2]][, 1], col = "white",
     xlab = "Y1 (GE)", ylab = "Y2 (CGH)", main = "Factorial plan of SGCCA")
text(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[2]][, 1], Loc,
     col = as.numeric(Loc), cex = .6)

plot(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[1]][, 2], col = "white",
     xlab = "Y1 (GE)", ylab = "Y2 (GE) Dim 2.", main = "Factorial plan of SGCCA")
text(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[1]][, 2], Loc,
     col = as.numeric(Loc), cex = .6)

# sgcca algorithm with multiple components and different L1 penalties for each component
# (-> c1 is a matrix)
init = "random"
result.sgcca = sgcca2(A, C, c1 = matrix(c(.071,.2, 1, 0.06, 0.15, 1), nrow = 2, byrow = TRUE),
                      ncomp = c(2, 2, 1), scheme = "factorial", scale = TRUE, bias = TRUE,
                      init = init, verbose = TRUE)

# number of non zero elements per dimension
apply(result.sgcca$a[[1]], 2, function(x) sum(x!=0))
#(-> 145 non zero elements for a11 and 107 non zero elements for a12)
apply(result.sgcca$a[[2]], 2, function(x) sum(x!=0))
#(-> 85 non zero elements for a21 and 52 non zero elements for a22)

init = "svd"
result.sgcca = sgcca2(A, C, c1 = matrix(c(.071,.2, 1, 0.06, 0.15, 1), nrow = 2, byrow = TRUE),
                      ncomp = c(2, 2, 1), scheme = "factorial", scale = TRUE, bias = TRUE,
                      init = init, verbose = TRUE)
}