Variable Selection For Generalized Canonical Correlation Analysis 2 (SGCCA2)

SGCCA2 extends SGCCA so that the design does not have to be the same in all dimensions: a separate design matrix can be supplied for each dimension.

sgcca2(A, C = rep(1 - diag(length(A)), max(ncomp)), c1 = rep(1,
  length(A)), ncomp = rep(2, length(A)), scheme = "centroid",
  scale = TRUE, init = "svd", bias = TRUE,
  tol = .Machine$double.eps, verbose = FALSE)

Arguments

A	A list that contains the $J$ blocks of variables $X_1, X_2, ..., X_J$.
C	A list of design matrices, one per dimension, each describing the relationships between blocks (default: complete design).
c1	Either a $1 \times J$ vector or a $\max(ncomp) \times J$ matrix encoding the L1 constraints applied to the outer weight vectors. Elements of c1 vary between $1/\sqrt{p_j}$ and 1 (larger values of c1 correspond to less penalization). If c1 is a vector, the L1 penalties are the same for all the weights of a given block, whatever the component: $$\forall h, \ \|a_{j,h}\|_{L_1} \le c_1[j] \sqrt{p_j},$$ with $p_j$ the number of variables of $X_j$. If c1 is a matrix, row $h$ defines the constraints applied to the weights of component $h$: $$\forall h, \ \|a_{j,h}\|_{L_1} \le c_1[h,j] \sqrt{p_j}.$$
ncomp	A $1 \times J$ vector that contains the number of components for each block (default: rep(2, length(A)), which means two components per block).
scheme	Either "horst", "factorial" or "centroid" (Default: "centroid").
scale	If scale = TRUE, each block is standardized to zero mean and unit variance and then divided by the square root of its number of variables (default: TRUE).
init	Mode of initialization used in the SGCCA algorithm, either by Singular Value Decomposition ("svd") or random ("random") (default: "svd").
bias	A logical value selecting the biased or unbiased estimator of the variance/covariance.
tol	Stopping value for convergence.
verbose	Will report progress while computing if verbose = TRUE (default: FALSE).
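
The constraints on c1, the list form of C and the scaling described above can be checked on toy inputs before calling the function. The following is a minimal sketch in base R only; the simulated blocks, their sizes and the helper name check_c1 are assumptions made for illustration and are not part of the package.

```r
set.seed(1)
n <- 20
# Three simulated blocks with 50, 30 and 2 variables (illustrative sizes).
A <- list(matrix(rnorm(n * 50), n, 50),
          matrix(rnorm(n * 30), n, 30),
          matrix(rnorm(n * 2),  n, 2))
J <- length(A)
p <- sapply(A, ncol)          # p_j: number of variables in each block
ncomp <- c(2, 2, 1)

# One design matrix per dimension, collected in a list.
C1 <- matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 0), 3, 3)
C2 <- 1 - diag(J)             # complete design
C <- list(C1, C2)

# c1 as a max(ncomp) x J matrix: row h holds the constraints for component h.
c1 <- matrix(c(0.20, 0.3, 1,
               0.15, 0.2, 1), nrow = max(ncomp), byrow = TRUE)

# Hypothetical helper: TRUE when every entry of c1 lies in [1/sqrt(p_j), 1].
check_c1 <- function(c1, p) {
  c1 <- matrix(c1, ncol = length(p))   # a vector becomes a 1 x J matrix
  lower <- matrix(1 / sqrt(p), nrow(c1), length(p), byrow = TRUE)
  all(c1 >= lower & c1 <= 1)
}

# Upper bound on the L1 norm of each weight vector: c1[h, j] * sqrt(p_j).
l1_bound <- sweep(c1, 2, sqrt(p), "*")

# Scaling as described for scale = TRUE: standardize, then divide by sqrt(p_j).
# Note that scale() uses the n - 1 denominator (the unbiased estimator).
A_scaled <- lapply(A, function(X) scale(X) / sqrt(ncol(X)))
```

With this scaling the column variances of each block sum to one, so blocks with many variables do not dominate blocks with few.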

Value

A list of class sgcca with the following elements:

Y

A list of $J$ elements. Each element of Y is a matrix that contains the SGCCA components for each block.

a

A list of $J$ elements. Each element of a is a matrix that contains the outer weight vectors for each block.

astar

A list of $J$ elements. Each element of astar is a matrix defined by Y[[j]][, h] = A[[j]] %*% astar[[j]][, h].

C

A design matrix that describes the relationships between blocks (user specified).

scheme

The scheme chosen by the user (user specified).

c1

A vector or matrix that contains the value of c1 applied to each block $\mathbf{X}_j$, $j = 1, \ldots, J$, and each dimension (user specified).

ncomp

A $1 \times J$ vector that contains the number of components for each block (user specified).

crit

A vector that contains the values of the objective function at each iteration.

AVE

Indicators of model quality based on the Average Variance Explained (AVE): AVE (for one block), AVE (outer model), AVE (inner model).
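
This page does not state how the AVE is computed. As a hedged illustration, a commonly used definition for a single block is the mean squared correlation between the variables of the block and its component. The sketch below assumes that definition; the simulated block and the weight vector are illustrative and are not output of sgcca2.

```r
set.seed(2)
n <- 30
p <- 8
X <- scale(matrix(rnorm(n * p), n, p))   # one simulated, standardized block
a <- c(0.8, 0.6, rep(0, p - 2))          # illustrative sparse weight vector
y <- X %*% a                             # block component, Y = X a

# Assumed definition: mean squared correlation between each variable and y.
ave_block <- mean(cor(X, y)^2)

# Number of variables selected by the sparse weight vector.
n_selected <- sum(a != 0)
```

Under this definition the value lies between 0 and 1, and a component built from few variables of an uncorrelated block will typically score low.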

References

Tenenhaus, A., Philippe, C., Guillemot, V., Le Cao, K. A., Grill, J., and Frouin, V., "Variable selection for generalized canonical correlation analysis," Biostatistics, vol. 15, no. 3, pp. 569-583, 2014.

Examples


#############
# Example 1 #
#############
if (FALSE) {
# Download the dataset's package at http://biodev.cea.fr/sgcca/.
# --> gliomaData_0.4.tar.gz

require(gliomaData)
data(ge_cgh_locIGR)

A <- ge_cgh_locIGR$multiblocks
Loc <- factor(ge_cgh_locIGR$y) ; levels(Loc) <- colnames(ge_cgh_locIGR$multiblocks$y)
C1 <-  matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 0), 3, 3)
C2 <-  matrix(c(0, 1, 1, 1, 0, 1, 1, 1, 0), 3, 3)
C <- list(C1, C2)
tau = c(1, 1, 0)  # not an argument of sgcca2; unused below

# Drop the third column of the third block
A[[3]] = A[[3]][, -3]
# sgcca algorithm
result.sgcca = sgcca2(A, C, c1 = c(.071,.2, 1), ncomp = c(2, 2, 1),
                     scheme = "centroid", verbose = TRUE)

############################
# plot(y1, y2) for (SGCCA) #
############################
layout(t(1:2))
plot(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[2]][, 1], col = "white", xlab = "Y1 (GE)",
     ylab = "Y2 (CGH)", main = "Factorial plan of SGCCA")
text(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[2]][, 1], Loc, col = as.numeric(Loc), cex = .6)

plot(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[1]][, 2], col = "white", xlab = "Y1 (GE)",
     ylab = "Y2 (GE) Dim 2.", main = "Factorial plan of SGCCA")
text(result.sgcca$Y[[1]][, 1], result.sgcca$Y[[1]][, 2], Loc, col = as.numeric(Loc), cex = .6)

# sgcca algorithm with multiple components and different L1 penalties for each component
# (-> c1 is a matrix)
init = "random"
result.sgcca = sgcca2(A, C, c1 = matrix(c(.071,.2, 1, 0.06, 0.15, 1), nrow = 2, byrow = TRUE),
                     ncomp = c(2, 2, 1), scheme = "factorial", scale = TRUE, bias = TRUE,
                     init = init, verbose = TRUE)
# number of non zero elements per dimension
apply(result.sgcca$a[[1]], 2, function(x) sum(x!=0))
     #(-> 145 non zero elements for a11 and 107 non zero elements for a12)
apply(result.sgcca$a[[2]], 2, function(x) sum(x!=0))
     #(-> 85 non zero elements for a21 and 52 non zero elements for a22)
init = "svd"
result.sgcca = sgcca2(A, C, c1 = matrix(c(.071,.2, 1, 0.06, 0.15, 1), nrow = 2, byrow = TRUE),
                     ncomp = c(2, 2, 1), scheme = "factorial", scale = TRUE, bias = TRUE,
                     init = init, verbose = TRUE)
}
