
Variable Selection For Generalized Canonical Correlation Analysis 2 (SGCCA2)

SGCCA2 extends SGCCA to address the issue of following the same design in all dimensions: the design C is supplied as a list, so that it can differ from one dimension to the next.

sgcca2(A, C = rep(1 - diag(length(A)), max(ncomp)), c1 = rep(1,
  length(A)), ncomp = rep(2, length(A)), scheme = "centroid",
  scale = TRUE, init = "svd", bias = TRUE,
  tol = .Machine$double.eps, verbose = FALSE)

Arguments

A

A list that contains the J blocks of variables X1, X2, ..., XJ.

C

A list of design matrices, one per dimension, each describing the relationships between blocks (default: complete design in every dimension).
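As an illustration of the list form of C, the following sketch builds one design matrix per dimension for a hypothetical setting with three blocks and two dimensions. The object names (C_hier, C_complete) are chosen here for illustration and are not part of the package.

```r
# Hypothetical setting: J = 3 blocks, two dimensions.
J <- 3

# Complete design: every block is connected to every other block.
C_complete <- 1 - diag(J)

# Hierarchical-style design: blocks 1 and 2 are connected only to block 3.
C_hier <- matrix(c(0, 0, 1,
                   0, 0, 1,
                   1, 1, 0), nrow = J, byrow = TRUE)

# One design matrix per dimension, in the order of the dimensions.
C <- list(C_hier, C_complete)
```

Each matrix is symmetric with a zero diagonal, which matches the designs used in the Examples section.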

c1

Either a 1×J vector or a max(ncomp)×J matrix encoding the L1 constraints applied to the outer weight vectors. Elements of c1 vary between 1/sqrt(pj) and 1 (larger values of c1 correspond to less penalization). If c1 is a vector, the L1 penalties are the same for all the weights corresponding to the same block but different components: for all h, $\|a_{j,h}\|_{L_1} \le c1[j] \sqrt{p_j}$, with $p_j$ the number of variables of Xj. If c1 is a matrix, each row h defines the constraints applied to the weights corresponding to component h: for all h, $\|a_{j,h}\|_{L_1} \le c1[h,j] \sqrt{p_j}$.
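The two accepted shapes of c1 and the admissible range can be sketched in base R as follows. The block sizes in p and the c1 values are hypothetical, chosen only to show the check against the lower bound 1/sqrt(pj).

```r
# Hypothetical numbers of variables for J = 3 blocks.
p <- c(200, 50, 3)
lower <- 1 / sqrt(p)   # smallest admissible c1 value for each block

# Vector form: the same constraint for every component of a given block.
c1_vec <- c(0.1, 0.2, 1)

# Matrix form: row h holds the constraints for component h (here 2 components).
c1_mat <- matrix(c(0.10, 0.20, 1,
                   0.08, 0.15, 1), nrow = 2, byrow = TRUE)

# Check that every value lies in [1/sqrt(pj), 1].
ok_vec <- all(c1_vec >= lower & c1_vec <= 1)
ok_mat <- all(sweep(c1_mat, 2, lower, ">=") & c1_mat <= 1)

# Implied upper bounds on the L1 norm of the weight vectors, c1 * sqrt(pj).
l1_bound <- sweep(c1_mat, 2, sqrt(p), "*")
```

A value of 1 for a block leaves its L1 bound at sqrt(pj), which corresponds to the least penalized case described above.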

ncomp

A 1×J vector that contains the number of components for each block (default: rep(2, length(A)), which means two components per block).

scheme

Either "horst", "factorial" or "centroid" (Default: "centroid").
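In the RGCCA literature the three schemes correspond to the functions g(x) = x (horst), g(x) = x^2 (factorial) and g(x) = |x| (centroid), applied to the covariances between connected block components. The helper below, scheme_function, is a hypothetical name used only for this sketch and is not part of the package.

```r
# Hypothetical helper: return the scheme function g for a given scheme name.
scheme_function <- function(scheme = c("centroid", "horst", "factorial")) {
  scheme <- match.arg(scheme)
  switch(scheme,
         horst     = function(x) x,
         factorial = function(x) x^2,
         centroid  = function(x) abs(x))
}

# Made-up covariances between pairs of connected components.
covs <- c(-0.5, 0.2, 0.8)

# Contribution of these covariances to the criterion under each scheme.
crit_by_scheme <- sapply(c("horst", "factorial", "centroid"),
                         function(s) sum(scheme_function(s)(covs)))
```

The horst scheme lets negative covariances reduce the criterion, whereas the factorial and centroid schemes reward covariances of either sign.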

scale

If scale = TRUE, each block is standardized to zero means and unit variances and then divided by the square root of its number of variables (default: TRUE).
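The effect of this scaling can be sketched in base R on a small simulated block. This is an illustration of the description above, not the package's internal code; in particular, base scale() uses the n - 1 divisor, whereas the function may use a different estimator depending on bias.

```r
set.seed(42)
# Hypothetical block: 10 observations, 4 variables.
X <- matrix(rnorm(10 * 4, mean = 5, sd = 2), nrow = 10, ncol = 4)

# Step 1: standardize each column to zero mean and unit variance.
X_std <- scale(X, center = TRUE, scale = TRUE)

# Step 2: divide by the square root of the number of variables.
X_scaled <- X_std / sqrt(ncol(X))

# After scaling, each column has variance 1/ncol(X),
# so the total variance of the block equals 1.
total_variance <- sum(apply(X_scaled, 2, var))
```

Dividing by sqrt(ncol(X)) gives every block the same total variance regardless of how many variables it contains.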

init

Mode of initialization used in the SGCCA algorithm, either by Singular Value Decomposition ("svd") or random ("random") (default: "svd").

bias

A logical value selecting the biased or unbiased estimator of the variance/covariance (default: TRUE).
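The difference between the two estimators is the divisor: n for the biased estimator and n - 1 for the unbiased one. The sketch below shows both on a small made-up vector; it assumes that bias = TRUE selects the n divisor, which the text above does not state explicitly.

```r
# Made-up sample of n = 8 values with mean 5.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Unbiased estimator: base R var() divides by n - 1.
var_unbiased <- var(x)

# Biased estimator: same sum of squares, divided by n.
var_biased <- var_unbiased * (n - 1) / n
```

The two estimates differ by the factor (n - 1) / n, so the choice matters mostly for small samples.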

tol

Stopping value for convergence (default: .Machine$double.eps).

verbose

Will report progress while computing if verbose = TRUE (default: FALSE).

Value

A list of class sgcca with the following elements:

Y

A list of J elements. Each element of Y is a matrix that contains the SGCCA components for each block.

a

A list of J elements. Each element of a is a matrix that contains the outer weight vectors for each block.

astar

A list of J elements. Each element of astar is a matrix defined as Y[[j]][, h] = A[[j]]%*%astar[[j]][, h]

C

A design matrix that describes the relationships between blocks (user specified).

scheme

The scheme chosen by the user (user specified).

c1

A vector or matrix that contains the value of c1 applied to each block Xj, j = 1, ..., J, and each dimension (user specified).

ncomp

A 1×J vector that contains the number of components for each block (user specified).

crit

A vector that contains the values of the objective function at each iteration.

AVE

Indicators of model quality based on the Average Variance Explained (AVE): AVE(for one block), AVE(outer model), AVE(inner model).

References

Tenenhaus, A., Philippe, C., Guillemot, V., Le Cao, K. A., Grill, J., and Frouin, V., "Variable selection for generalized canonical correlation analysis," Biostatistics, vol. 15, no. 3, pp. 569-583, 2014.

Examples

#############
# Example 1 #
#############
if (FALSE) {
# Download the dataset's package at http://biodev.cea.fr/sgcca/.
# --> gliomaData_0.4.tar.gz

require(gliomaData)
data(ge_cgh_locIGR)

A <- ge_cgh_locIGR$multiblocks
Loc <- factor(ge_cgh_locIGR$y) ; levels(Loc) <- colnames(ge_cgh_locIGR$multiblocks$y)
C1 <- matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 0), 3, 3)
C2 <- matrix(c(0, 1, 1, 1, 0, 1, 1, 1, 0), 3, 3)
C <- list(C1, C2)
tau = c(1, 1, 0)

# rgcca algorithm using the dual formulation for X1 and X2
# and the dual formulation for X3
A[[3]] = A[[3]][, -3]

# sgcca algorithm
result.sgcca = sgcca2(A, C, c1 = c(.071,.2, 1), ncomp = c(2, 2, 1),
                      scheme = "centroid", verbose = TRUE)

############################
# plot(y1, y2) for (SGCCA) #
############################
layout(t(1:2))
plot(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[2]][, 1], col = "white",
     xlab = "Y1 (GE)", ylab = "Y2 (CGH)", main = "Factorial plan of SGCCA")
text(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[2]][, 1], Loc,
     col = as.numeric(Loc), cex = .6)

plot(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[1]][, 2], col = "white",
     xlab = "Y1 (GE)", ylab = "Y2 (GE) Dim 2.", main = "Factorial plan of SGCCA")
text(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[1]][, 2], Loc,
     col = as.numeric(Loc), cex = .6)

# sgcca algorithm with multiple components and different L1 penalties for each components
# (-> c1 is a matrix)
init = "random"
result.sgcca = sgcca2(A, C, c1 = matrix(c(.071,.2, 1, 0.06, 0.15, 1), nrow = 2, byrow = TRUE),
                      ncomp = c(2, 2, 1), scheme = "factorial", scale = TRUE, bias = TRUE,
                      init = init, verbose = TRUE)

# number of non zero elements per dimension
apply(result.sgcca$a[[1]], 2, function(x) sum(x!=0))
#(-> 145 non zero elements for a11 and 107 non zero elements for a12)
apply(result.sgcca$a[[2]], 2, function(x) sum(x!=0))
#(-> 85 non zero elements for a21 and 52 non zero elements for a22)

init = "svd"
result.sgcca = sgcca2(A, C, c1 = matrix(c(.071,.2, 1, 0.06, 0.15, 1), nrow = 2, byrow = TRUE),
                      ncomp = c(2, 2, 1), scheme = "factorial", scale = TRUE, bias = TRUE,
                      init = init, verbose = TRUE)
}