xeofs.models.CCA

class xeofs.models.CCA(n_modes: int = 2, use_coslat: bool = False, check_nans: bool = True, c: float = 0, pca: bool = True, variance_fraction: float = 0.99, init_pca_modes: float = 0.75, compute: bool = True, eps: float = 1e-06)

Bases: CCABaseModel

Canonical Correlation Analysis.

Canonical Correlation Analysis (CCA) identifies linear combinations of variables from multiple datasets that maximize their mutual correlations. An optional regularisation parameter (ridge regression) can be used to improve the conditioning of the covariance matrix.

The objective function of (regularised) CCA is:

\[
\begin{aligned}
& w_{opt} = \underset{w}{\mathrm{argmax}} \{ w_1^T X_1^T X_2 w_2 \} \\
& \text{subject to:} \\
& (1-c_1) w_1^T X_1^T X_1 w_1 + c_1 w_1^T w_1 = n \\
& (1-c_2) w_2^T X_2^T X_2 w_2 + c_2 w_2^T w_2 = n
\end{aligned}
\]

where \(c_i\) is the regularisation parameter for dataset \(i\).
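
The constrained problem above can be illustrated for two views with plain NumPy. This is a minimal sketch, not the xeofs code path (which follows the MCCA class from cca_zoo): the function name `regularised_cca_first_mode` and the synthetic data are assumptions made for illustration. It whitens each view with the regularised covariance \((1-c_i)X_i^TX_i + c_iI\) and takes the leading singular vectors of the whitened cross-covariance, which yields weights that satisfy both constraints.

```python
import numpy as np

def regularised_cca_first_mode(X1, X2, c1=0.0, c2=0.0):
    """First pair of weight vectors for two-view regularised CCA (illustrative)."""
    n = X1.shape[0]
    X1 = X1 - X1.mean(axis=0)  # centre each view
    X2 = X2 - X2.mean(axis=0)

    def inv_sqrt(A):
        # Inverse matrix square root of a symmetric positive definite matrix
        vals, vecs = np.linalg.eigh(A)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    # Regularised covariance matrices that appear in the constraints
    A1 = (1 - c1) * X1.T @ X1 + c1 * np.eye(X1.shape[1])
    A2 = (1 - c2) * X2.T @ X2 + c2 * np.eye(X2.shape[1])
    A1_is, A2_is = inv_sqrt(A1), inv_sqrt(A2)

    # Leading singular vectors of the whitened cross-covariance
    U, s, Vt = np.linalg.svd(A1_is @ X1.T @ X2 @ A2_is)
    w1 = np.sqrt(n) * A1_is @ U[:, 0]
    w2 = np.sqrt(n) * A2_is @ Vt[0]
    return w1, w2

# Synthetic data: two views driven by one shared signal plus noise
rng = np.random.default_rng(0)
shared = rng.standard_normal((200, 1))
X1 = shared @ rng.standard_normal((1, 4)) + 0.5 * rng.standard_normal((200, 4))
X2 = shared @ rng.standard_normal((1, 3)) + 0.5 * rng.standard_normal((200, 3))

c1 = c2 = 0.1
w1, w2 = regularised_cca_first_mode(X1, X2, c1=c1, c2=c2)

# Evaluate the left-hand side of each constraint; both should equal n = 200
Xc1 = X1 - X1.mean(axis=0)
Xc2 = X2 - X2.mean(axis=0)
lhs1 = (1 - c1) * w1 @ Xc1.T @ Xc1 @ w1 + c1 * w1 @ w1
lhs2 = (1 - c2) * w2 @ Xc2.T @ Xc2 @ w2 + c2 * w2 @ w2
print(lhs1, lhs2)
```

With \(c_i = 0\) the constraints reduce to unit-variance canonical variates, and the leading singular value is then the first canonical correlation.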

Parameters:
  • n_modes (int, optional) – Number of latent dimensions to use, by default 2

  • use_coslat (bool, optional) – Whether to use the square root of the cosine of the latitude as weights, by default False

  • pca (bool, optional) – Whether to perform PCA on the input data, by default True

  • variance_fraction (float, optional) – Fraction of variance to keep when performing PCA, by default 0.99

  • init_pca_modes (int | float, optional) – Number of PCA modes to compute. If float, the number of modes is given by the fraction of maximum number of modes for the given data. A value of 1.0 will perform a full SVD of the data. Choosing a smaller value can increase computation speed. Default 0.75

  • c (Sequence[float] | float, optional) – Regularisation parameter, by default 0 (no regularisation)

  • compute (bool, optional) – Whether to compute the decomposition immediately, by default True
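
As a rough illustration of two of the options above, the sketch below computes square-root-of-cosine-latitude weights (`use_coslat`) and picks the number of PCA modes needed to reach a given `variance_fraction`. The helper names and the `>=` threshold rule are assumptions for illustration only; xeofs may apply these steps differently internally.

```python
import math

def coslat_weights(latitudes_deg):
    """Return sqrt(cos(lat)) for each latitude in degrees (hypothetical helper)."""
    return [math.sqrt(math.cos(math.radians(lat))) for lat in latitudes_deg]

def n_modes_for_variance(explained_variance, variance_fraction=0.99):
    """Smallest number of leading modes whose cumulative variance share
    reaches variance_fraction (assumed rule, for illustration only)."""
    total = sum(explained_variance)
    cumulative = 0.0
    for k, var in enumerate(explained_variance, start=1):
        cumulative += var
        if cumulative / total >= variance_fraction:
            return k
    return len(explained_variance)

# Grid points at the equator and at 60 degrees north
weights = coslat_weights([0.0, 60.0])

# Made-up variances per PCA mode, sorted in descending order
n_keep = n_modes_for_variance([6.0, 3.0, 0.95, 0.05], variance_fraction=0.99)
print(weights, n_keep)
```

The weights shrink towards the poles, which compensates for the smaller area represented by each grid cell on a regular latitude-longitude grid.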

Notes

This implementation is largely based on the MCCA class from the cca_zoo repository [3].

References

Examples

>>> from xeofs.models import CCA
>>> model = CCA(n_modes=5)
>>> model.fit(views, dim="time")  # views: list of datasets; "time" is an assumed sample dimension name
>>> can_loadings = model.components()
__init__(n_modes: int = 2, use_coslat: bool = False, check_nans: bool = True, c: float = 0, pca: bool = True, variance_fraction: float = 0.99, init_pca_modes: float = 0.75, compute: bool = True, eps: float = 1e-06)

Methods

__init__([n_modes, use_coslat, check_nans, ...])

components([normalize]) – Get the canonical loadings for each view.

explained_covariance() – Get the explained covariance.

explained_covariance_ratio() – Get the explained covariance ratio.

explained_variance() – Get the explained variance for each view.

explained_variance_ratio() – Get the explained variance ratio for each view.

fit(views, dim)

get_metadata_routing() – Get metadata routing of this object.

get_params([deep]) – Get parameters for this estimator.

scores() – Get the canonical variates for each view.

set_fit_request(*[, dim, views]) – Request metadata passed to the fit method.

set_params(**params) – Set the parameters of this estimator.

set_transform_request(*[, views]) – Request metadata passed to the transform method.

transform(views) – Transform the input data into the canonical space.

weights()

components(normalize: bool = True) → List[DataArray | Dataset | List[DataArray | Dataset]]

Get the canonical loadings for each view.

explained_covariance() → DataArray

Get the explained covariance.

explained_covariance_ratio() → DataArray

Get the explained covariance ratio.

explained_variance() → List[DataArray]

Get the explained variance for each view.

explained_variance_ratio() → List[DataArray]

Get the explained variance ratio for each view.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

scores() → List[DataArray]

Get the canonical variates for each view.

set_fit_request(*, dim: bool | None | str = '$UNCHANGED$', views: bool | None | str = '$UNCHANGED$') → CCA

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
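
The four request options listed above can be modelled with a small toy function. This is only a sketch of the documented semantics, not scikit-learn's implementation: `route_metadata` is a hypothetical helper, and the `ValueError` stands in for whatever error the real meta-estimator raises.

```python
def route_metadata(request, name, provided):
    """Toy model of one parameter's metadata request.

    request:  True, False, None, or a str alias
    name:     parameter name on the sub-estimator's method (e.g. "dim")
    provided: dict of metadata the user passed to the meta-estimator
    Returns the keyword arguments forwarded to the sub-estimator's method.
    """
    if request is True:
        # Requested: forwarded if provided, silently skipped otherwise
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded
        return {}
    if request is None:
        # Not requested: providing it is treated as an error
        if name in provided:
            raise ValueError(f"{name!r} was provided but not requested")
        return {}
    if isinstance(request, str):
        # Alias: the user supplies the value under the alias name
        return {name: provided[request]} if request in provided else {}
    raise TypeError("request must be True, False, None, or a str alias")

forwarded_true = route_metadata(True, "dim", {"dim": "time"})
forwarded_false = route_metadata(False, "dim", {"dim": "time"})
forwarded_alias = route_metadata("sample_dim", "dim", {"sample_dim": "time"})
print(forwarded_true, forwarded_false, forwarded_alias)
```

Here `"sample_dim"` is a made-up alias: the user passes `sample_dim=...` to the meta-estimator, and the value arrives at the sub-estimator as `dim`.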

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • dim (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for dim parameter in fit.

  • views (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for views parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
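
The `<component>__<parameter>` naming can be illustrated by splitting each key on the first double underscore. The helper below is a hypothetical sketch of that convention, and the step name `"cca"` is an assumption standing in for whatever name a component has inside a nested object.

```python
def split_nested_params(params):
    """Separate plain parameters from '<component>__<parameter>' entries."""
    own, nested = {}, {}
    for key, value in params.items():
        component, delim, sub_key = key.partition("__")
        if delim:
            # Nested form: group the parameter under its component name
            nested.setdefault(component, {})[sub_key] = value
        else:
            # Plain form: belongs to the estimator itself
            own[key] = value
    return own, nested

own, nested = split_nested_params({"n_modes": 5, "cca__c": 0.1, "cca__pca": False})
print(own)
print(nested)
```

Plain keys such as `n_modes` apply to the estimator itself, while `cca__c` would be forwarded as `c` to the component named `cca`.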

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_transform_request(*, views: bool | None | str = '$UNCHANGED$') → CCA

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

views (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for views parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(views: Sequence[DataArray | Dataset | List[DataArray | Dataset]]) → List[DataArray]

Transform the input data into the canonical space.

Parameters:

views (List[DataArray | Dataset]) – Input data to transform
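
Conceptually, transforming into the canonical space means projecting each view onto its weight vectors. The NumPy sketch below shows that idea under stated assumptions: it ignores the optional PCA preprocessing, centres new data with the training means, and uses random stand-ins for fitted weights. The helper `project_to_canonical_space` is hypothetical and is not part of xeofs.

```python
import numpy as np

def project_to_canonical_space(views, means, weights):
    """Centre each view with its training mean and project onto its weights."""
    return [(X - mu) @ W for X, mu, W in zip(views, means, weights)]

rng = np.random.default_rng(1)

# Training data for two views: 50 samples with 4 and 3 features
train = [rng.standard_normal((50, 4)), rng.standard_normal((50, 3))]
means = [X.mean(axis=0) for X in train]

# Random stand-ins for fitted weights (features x modes); a real model would supply these
weights = [rng.standard_normal((4, 2)), rng.standard_normal((3, 2))]

# New data with the same feature layout as the training views
new_views = [rng.standard_normal((10, 4)), rng.standard_normal((10, 3))]
scores = project_to_canonical_space(new_views, means, weights)
print([s.shape for s in scores])
```

Each returned array has one row per new sample and one column per mode, mirroring the list of per-view results that `transform` returns.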