CPCCA

class CPCCA(n_modes: int = 2, alpha: Sequence[float] | float = 0.2, standardize: Sequence[bool] | bool = False, use_coslat: Sequence[bool] | bool = False, use_pca: Sequence[bool] | bool = True, n_pca_modes: Sequence[float | int | str] | float | int | str = 0.999, pca_init_rank_reduction: Sequence[float] | float = 0.3, check_nans: Sequence[bool] | bool = True, compute: bool = True, sample_name: str = 'sample', feature_name: Sequence[str] | str = 'feature', solver: str = 'auto', random_state: Generator | int | None = None, solver_kwargs: dict = {}, **kwargs)

Continuum Power CCA (CPCCA).

CPCCA extends continuum power regression to isolate pairs of coupled patterns, maximizing the squared covariance between partially whitened variables [1] [2].

This method solves the following optimization problem:

\(\max_{q_x, q_y} \left( q_x^T X^T Y q_y \right)\)

subject to the constraints:

\(q_x^T (X^TX)^{1-\alpha_x} q_x = 1, \quad q_y^T (Y^TY)^{1-\alpha_y} q_y = 1\)

where \(\alpha_x\) and \(\alpha_y\) control the degree of whitening applied to the data.
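One way to see how this problem reduces to a singular value decomposition is sketched here, under the assumption (made for illustration only) that \(X^TX\) and \(Y^TY\) are invertible. Substituting \(u = (X^TX)^{(1-\alpha_x)/2} q_x\) and \(v = (Y^TY)^{(1-\alpha_y)/2} q_y\) turns the constraints into \(u^Tu = 1\) and \(v^Tv = 1\), and the objective becomes

\[\max_{u, v} \; u^T (X^TX)^{-(1-\alpha_x)/2} \, X^TY \, (Y^TY)^{-(1-\alpha_y)/2} \, v\]

which is maximized by the leading pair of singular vectors of the partially whitened cross-covariance matrix in the middle. The weight vectors then follow as \(q_x = (X^TX)^{-(1-\alpha_x)/2} u\) and \(q_y = (Y^TY)^{-(1-\alpha_y)/2} v\).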

Parameters:
  • n_modes (int, default=2) – Number of modes to calculate.

  • alpha (Sequence[float] | float, default=0.2) – Degree of whitening applied to the data. If float, the same value is applied to both data sets.

  • standardize (Sequence[bool] | bool, default=False) – Whether to standardize the input data. Generally not recommended as standardization can be managed by the degree of whitening.

  • use_coslat (Sequence[bool] | bool, default=False) – For data on a longitude-latitude grid, whether to correct for varying grid cell areas towards the poles by scaling each grid point with the square root of the cosine of its latitude.

  • use_pca (Sequence[bool] | bool, default=True) – Whether to preprocess each field individually by reducing dimensionality through PCA. The cross-covariance matrix is then computed in the reduced principal component space.

  • n_pca_modes (Sequence[int | float | str] | int | float | str, default=0.999) – Number of modes to retain during the PCA preprocessing step. If int, specifies the exact number of modes; if float, specifies the fraction of variance to retain; if "all", all modes are retained.

  • pca_init_rank_reduction (Sequence[float] | float, default=0.3) – Relevant when use_pca=True and n_pca_modes is a float. Specifies the initial fraction of rank reduction for faster PCA computation via randomized SVD.

  • check_nans (Sequence[bool] | bool, default=True) – Whether to check for NaNs in the input data. Set to False for lazy model evaluation.

  • compute (bool, default=True) – Whether to compute the model elements eagerly. If True, the following are computed sequentially: preprocessor scaler, optional NaN checks, SVD decomposition, scores, and components.

  • random_state (numpy.random.Generator | int | None, default=None) – Seed for the random number generator.

  • sample_name (str, default="sample") – Name for the new sample dimension.

  • feature_name (Sequence[str] | str, default="feature") – Name for the new feature dimension.

  • solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.

  • solver_kwargs (dict, default={}) – Additional keyword arguments passed to the SVD solver function.
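The use_coslat weighting described above can be illustrated with a small NumPy sketch. The latitude values and the array layout are hypothetical, and this is not the xeofs preprocessor itself; it only shows why the square root is taken: variances and covariances are quadratic in the data, so they end up weighted by the cosine of the latitude.

```python
import numpy as np

# Hypothetical grid latitudes in degrees and a (sample, latitude) field
lat = np.array([0.0, 30.0, 60.0, 85.0])
field = np.ones((5, lat.size))

# Square root of cos(lat): squared quantities are then weighted by cos(lat)
weights = np.sqrt(np.cos(np.deg2rad(lat)))
weighted = field * weights  # broadcasts over the sample axis
```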

Notes

Canonical Correlation Analysis (CCA), Maximum Covariance Analysis (MCA) and Redundancy Analysis (RDA) are all special cases of CPCCA depending on the choice of the parameter \(\alpha\).
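These special cases can be checked numerically with a minimal NumPy sketch of the optimization problem stated at the top of this page. It assumes full-rank data with more samples than features and no PCA preprocessing. The helper names sym_power and cpcca_sketch are hypothetical and this is not the xeofs implementation.

```python
import numpy as np


def sym_power(C, p):
    """Raise a symmetric positive-definite matrix C to the real power p."""
    w, V = np.linalg.eigh(C)
    return (V * w**p) @ V.T


def cpcca_sketch(X, Y, alpha_x, alpha_y):
    """Illustrative CPCCA solver (assumes X^T X and Y^T Y are invertible)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Partial whitening operators (X^T X)^{-(1 - alpha) / 2}
    Wx = sym_power(X.T @ X, -(1 - alpha_x) / 2)
    Wy = sym_power(Y.T @ Y, -(1 - alpha_y) / 2)
    # SVD of the partially whitened cross-covariance matrix
    U, s, Vt = np.linalg.svd(Wx @ X.T @ Y @ Wy, full_matrices=False)
    # Map singular vectors back to weight vectors q_x, q_y
    return Wx @ U, Wy @ Vt.T, s


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
Y = X[:, :3] + 0.5 * rng.standard_normal((200, 3))

_, _, s_cca = cpcca_sketch(X, Y, 0.0, 0.0)  # CCA: canonical correlations
_, _, s_mca = cpcca_sketch(X, Y, 1.0, 1.0)  # MCA: singular values of X^T Y
_, _, s_rda = cpcca_sketch(X, Y, 0.0, 1.0)  # RDA: only X is whitened
```

With \(\alpha = 1\) the whitening operators reduce to the identity, so the sketch is a plain SVD of \(X^TY\); with \(\alpha = 0\) both fields are fully whitened and the singular values are bounded by one.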

References

Examples

Perform regular CCA on two data sets (here and below, "time" is assumed to be the name of the sample dimension shared by X and Y; fit requires the dim argument):

>>> model = CPCCA(n_modes=5, alpha=0.0)
>>> model.fit(X, Y, dim="time")

Perform regularized CCA on two data sets:

>>> model = CPCCA(n_modes=5, alpha=0.2)
>>> model.fit(X, Y, dim="time")

Perform Maximum Covariance Analysis:

>>> model = CPCCA(n_modes=5, alpha=1.0)
>>> model.fit(X, Y, dim="time")

Perform Redundancy Analysis:

>>> model = CPCCA(n_modes=5, alpha=[0, 1])
>>> model.fit(X, Y, dim="time")

Make predictions for Y given X:

>>> scores_y_pred = model.predict(X)  # prediction in "PC" space
>>> Y_pred = model.inverse_transform(Y=scores_y_pred)  # prediction in physical space

__init__(n_modes: int = 2, alpha: Sequence[float] | float = 0.2, standardize: Sequence[bool] | bool = False, use_coslat: Sequence[bool] | bool = False, use_pca: Sequence[bool] | bool = True, n_pca_modes: Sequence[float | int | str] | float | int | str = 0.999, pca_init_rank_reduction: Sequence[float] | float = 0.3, check_nans: Sequence[bool] | bool = True, compute: bool = True, sample_name: str = 'sample', feature_name: Sequence[str] | str = 'feature', solver: str = 'auto', random_state: Generator | int | None = None, solver_kwargs: dict = {}, **kwargs)

Methods

  • __init__([n_modes, alpha, standardize, ...])

  • components([normalized]) – Get the components of the model.

  • compute(**kwargs) – Compute and load delayed model results.

  • correlation_coefficients_X() – Get the correlation coefficients for the scores of \(X\).

  • correlation_coefficients_Y() – Get the correlation coefficients for the scores of \(Y\).

  • cross_correlation_coefficients() – Get the cross-correlation coefficients.

  • deserialize(dt) – Deserialize the model and its preprocessors from a DataTree.

  • fit(X, Y, dim[, weights_X, weights_Y]) – Fit the data to the model.

  • fraction_variance_X_explained_by_X() – Get the fraction of variance explained (FVE X).

  • fraction_variance_Y_explained_by_X() – Get the fraction of variance explained (FVE YX).

  • fraction_variance_Y_explained_by_Y() – Get the fraction of variance explained (FVE Y).

  • get_params() – Get the model parameters.

  • get_serialization_attrs() – Get the attributes needed to serialize the model.

  • heterogeneous_patterns([correction, alpha]) – Get the heterogeneous correlation patterns.

  • homogeneous_patterns([correction, alpha]) – Get the homogeneous correlation patterns.

  • inverse_transform([X, Y]) – Reconstruct the original data from transformed data.

  • load(path[, engine]) – Load a saved model.

  • predict(X) – Predict Y from X.

  • save(path[, overwrite, save_data, engine]) – Save the model.

  • scores([normalized]) – Get the scores of the model.

  • serialize() – Serialize a complete model with its preprocessor.

  • squared_covariance_fraction() – Get the squared covariance fraction (SCF).

  • transform([X, Y, normalized]) – Transform the data.

components(normalized=True) -> tuple[DataArray | Dataset | list[DataArray | Dataset], DataArray | Dataset | list[DataArray | Dataset]]

Get the components of the model.

The components may be referred to differently depending on the model type. Common terms include canonical vectors, singular vectors, loadings or spatial patterns.

Parameters:

normalized (bool, default=True) – Whether to return L2 normalized components.

Returns:

Components of X and Y.

Return type:

tuple[DataObject, DataObject]

compute(**kwargs)

Compute and load delayed model results.

Parameters:

**kwargs – Additional keyword arguments to pass to dask.compute().

correlation_coefficients_X()

Get the correlation coefficients for the scores of \(X\).

The correlation coefficients of the scores of \(X\) are given by:

\[c_{x, ij} = \text{corr} \left(\mathbf{r}_{x, i}, \mathbf{r}_{x, j} \right)\]

where \(\mathbf{r}_{x, i}\) and \(\mathbf{r}_{x, j}\) are the \(i\)th and \(j\)th scores of \(X\).

correlation_coefficients_Y()

Get the correlation coefficients for the scores of \(Y\).

The correlation coefficients of the scores of \(Y\) are given by:

\[c_{y, ij} = \text{corr} \left(\mathbf{r}_{y, i}, \mathbf{r}_{y, j} \right)\]

where \(\mathbf{r}_{y, i}\) and \(\mathbf{r}_{y, j}\) are the \(i\)th and \(j\)th scores of \(Y\).

cross_correlation_coefficients()

Get the cross-correlation coefficients.

The cross-correlation coefficients between the scores of X and Y are computed as:

\[c_{xy, i} = \text{corr} \left(\mathbf{r}_{x, i}, \mathbf{r}_{y, i} \right)\]

where \(\mathbf{r}_{x, i}\) and \(\mathbf{r}_{y, i}\) are the \(i\)th scores of \(X\) and \(Y\), respectively.

Notes

When \(\alpha=0\), the cross-correlation coefficients are equivalent to the canonical correlation coefficients.
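The correlation definitions above can be sketched with NumPy for score arrays laid out as (sample, mode). The function name and the synthetic scores are hypothetical; with xeofs the scores would come from model.scores().

```python
import numpy as np


def cross_correlation(scores_x, scores_y):
    """Pearson correlation between matching columns of two (sample, mode) arrays."""
    ax = scores_x - scores_x.mean(axis=0)
    ay = scores_y - scores_y.mean(axis=0)
    denom = np.sqrt((ax**2).sum(axis=0) * (ay**2).sum(axis=0))
    return (ax * ay).sum(axis=0) / denom


rng = np.random.default_rng(1)
rx = rng.standard_normal((100, 2))
# Synthetic Y scores: mode 0 correlates positively with X, mode 1 negatively
ry = rx * np.array([1.0, -1.0]) + 0.5 * rng.standard_normal((100, 2))

c_xy = cross_correlation(rx, ry)     # c_{xy, i}, one value per mode
c_x = np.corrcoef(rx, rowvar=False)  # c_{x, ij}, mode-by-mode matrix for X
```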

classmethod deserialize(dt: DataTree) -> Self

Deserialize the model and its preprocessors from a DataTree.

fit(X: DataArray | Dataset | list[DataArray | Dataset], Y: DataArray | Dataset | list[DataArray | Dataset], dim: Hashable | Sequence[Hashable], weights_X: DataArray | Dataset | list[DataArray | Dataset] | None = None, weights_Y: DataArray | Dataset | list[DataArray | Dataset] | None = None) -> Self

Fit the data to the model.

Parameters:
  • X (DataObject) – Data to be fitted.

  • Y (DataObject) – Data to be fitted.

  • dim (Hashable | Sequence[Hashable]) – Define the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights_X (DataObject | None, default=None) – Weights for X. If None, no weights are used.

  • weights_Y (DataObject | None, default=None) – Weights for Y. If None, no weights are used.

Returns:

Fitted model.

Return type:

xeofs MultiSetModel
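The role of dim can be pictured with a NumPy analogue: the sample dimension is kept and all remaining dimensions are flattened into one feature dimension. The (time, lat, lon) layout below is hypothetical. xeofs operates on labeled xarray objects and handles this step internally, so this is only a sketch of the idea.

```python
import numpy as np

# Hypothetical data laid out as (time, lat, lon); "time" plays the role of dim
data = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)

n_samples = data.shape[0]
# Keep the sample axis and flatten (lat, lon) into a single feature axis
matrix = data.reshape(n_samples, -1)
```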

fraction_variance_X_explained_by_X()

Get the fraction of variance explained (FVE X).

The FVE X is the fraction of variance in \(X\) explained by the scores of \(X\). It is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[FVE_{X|X,i} = 1 - \frac{\|\mathbf{d}_{X,i}\|_F^2}{\|X\|_F^2}\]

where \(\mathbf{d}_{X,i}\) are the residuals of the input data \(X\) after reconstruction by the ith scores of \(X\).

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).
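The FVE formula above can be sketched in NumPy under one explicit assumption: "reconstruction by the \(i\)th scores" is read here as a least-squares regression of \(X\) onto that score vector. This is an interpretation for illustration, not the xeofs implementation, and the function name fve is hypothetical. PCA scores are used as stand-in scores because the expected result is then known in closed form.

```python
import numpy as np


def fve(X, scores):
    """Return 1 - ||d_i||_F^2 / ||X||_F^2 for every column i of scores."""
    total = np.linalg.norm(X) ** 2  # squared Frobenius norm of X
    out = []
    for r in scores.T:
        # Least-squares reconstruction of X from the single score vector r
        X_hat = np.outer(r, r @ X) / (r @ r)
        out.append(1 - np.linalg.norm(X - X_hat) ** 2 / total)
    return np.array(out)


rng = np.random.default_rng(2)
X = rng.standard_normal((50, 5))
X -= X.mean(axis=0)

# Stand-in scores: principal component scores of X
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * s

fve_x = fve(X, scores)
```

For these stand-in scores each value equals the usual explained variance ratio of the corresponding principal component.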

fraction_variance_Y_explained_by_X() -> DataArray

Get the fraction of variance explained (FVE YX).

The FVE YX is the fraction of variance in \(Y\) explained by the scores of \(X\). It is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[FVE_{Y|X,i} = 1 - \frac{\|(X^TX)^{-1/2} \mathbf{d}_{X,i}^T \mathbf{d}_{Y,i}\|_F^2}{\|(X^TX)^{-1/2} X^TY\|_F^2}\]

where \(\mathbf{d}_{X,i}\) and \(\mathbf{d}_{Y,i}\) are the residuals of the input data \(X\) and \(Y\) after reconstruction by the ith scores of \(X\) and \(Y\), respectively.

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

fraction_variance_Y_explained_by_Y()

Get the fraction of variance explained (FVE Y).

The FVE Y is the fraction of variance in \(Y\) explained by the scores of \(Y\). It is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[FVE_{Y|Y,i} = 1 - \frac{\|\mathbf{d}_{Y,i}\|_F^2}{\|Y\|_F^2}\]

where \(\mathbf{d}_{Y,i}\) are the residuals of the input data \(Y\) after reconstruction by the ith scores of \(Y\).

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

get_params() -> dict[str, Any]

Get the model parameters.

get_serialization_attrs() -> dict

Get the attributes needed to serialize the model.

Returns:

Attributes needed to serialize the model.

Return type:

dict

heterogeneous_patterns(correction=None, alpha=0.05)

Get the heterogeneous correlation patterns.

The heterogeneous patterns are the correlation coefficients between the input data and the scores of the other field:

\[G_{X, i} = \text{corr} \left(X, \mathbf{r}_{y,i} \right)\]
\[G_{Y, i} = \text{corr} \left(Y, \mathbf{r}_{x,i} \right)\]

where \(X\) and \(Y\) are the input data, and \(\mathbf{r}_{x,i}\) and \(\mathbf{r}_{y,i}\) are the \(i\)th scores of \(X\) and \(Y\), respectively.

Parameters:
  • correction (str, default=None) – Method to apply a multiple testing correction. If None, no correction is applied. Available methods are:

      - bonferroni : one-step correction
      - sidak : one-step correction
      - holm-sidak : step-down method using Sidak adjustments
      - holm : step-down method using Bonferroni adjustments
      - simes-hochberg : step-up method (independent)
      - hommel : closed method based on Simes tests (non-negative)
      - fdr_bh : Benjamini/Hochberg (non-negative)
      - fdr_by : Benjamini/Yekutieli (negative)
      - fdr_tsbh : two stage fdr correction (non-negative)
      - fdr_tsbky : two stage fdr correction (non-negative)

  • alpha (float, default=0.05) – The desired family-wise error rate. Not used if correction is None.

Returns:

  • tuple[DataObject, DataObject] – Heterogeneous correlation patterns of X and Y.

  • tuple[DataObject, DataObject] – p-values of the heterogeneous correlation patterns of X and Y.
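A heterogeneous pattern as defined above is a feature-wise Pearson correlation between one field and a score of the other field. The NumPy sketch below uses synthetic data and a hypothetical helper name. Passing the field's own score instead would give the corresponding homogeneous pattern.

```python
import numpy as np


def correlation_pattern(data, score):
    """Pearson correlation between every column of data and the vector score."""
    d = data - data.mean(axis=0)
    r = score - score.mean()
    return (d.T @ r) / np.sqrt((d**2).sum(axis=0) * (r @ r))


rng = np.random.default_rng(4)
r_y = rng.standard_normal(200)  # hypothetical ith score of Y
# Synthetic X with one aligned, one anti-aligned and one unrelated feature
X = np.column_stack([r_y, -r_y, rng.standard_normal(200)])

G_X = correlation_pattern(X, r_y)  # G_{X, i} = corr(X, r_{y, i})
```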

homogeneous_patterns(correction=None, alpha=0.05)

Get the homogeneous correlation patterns.

The homogeneous correlation patterns are the correlation coefficients between the input data and the scores. They are defined as:

\[H_{X, i} = \text{corr} \left(X, \mathbf{r}_{x,i} \right)\]
\[H_{Y, i} = \text{corr} \left(Y, \mathbf{r}_{y,i} \right)\]

where \(X\) and \(Y\) are the input data, and \(\mathbf{r}_{x,i}\) and \(\mathbf{r}_{y,i}\) are the \(i\)th scores of \(X\) and \(Y\), respectively.

Parameters:
  • correction (str, default=None) – Method to apply a multiple testing correction. If None, no correction is applied. Available methods are:

      - bonferroni : one-step correction
      - sidak : one-step correction
      - holm-sidak : step-down method using Sidak adjustments
      - holm : step-down method using Bonferroni adjustments
      - simes-hochberg : step-up method (independent)
      - hommel : closed method based on Simes tests (non-negative)
      - fdr_bh : Benjamini/Hochberg (non-negative)
      - fdr_by : Benjamini/Yekutieli (negative)
      - fdr_tsbh : two stage fdr correction (non-negative)
      - fdr_tsbky : two stage fdr correction (non-negative)

  • alpha (float, default=0.05) – The desired family-wise error rate. Not used if correction is None.

Returns:

  • tuple[DataObject, DataObject] – Homogeneous correlation patterns of X and Y.

  • tuple[DataObject, DataObject] – p-values of the homogeneous correlation patterns of X and Y.
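To illustrate what the correction parameter does to the returned p-values, here is a minimal pure-Python sketch of the simplest listed method, the one-step Bonferroni correction. The function and the example p-values are hypothetical, and the sketch makes no claim about how xeofs applies the correction internally.

```python
def bonferroni(pvalues, alpha=0.05):
    """One-step Bonferroni correction: scale each p-value by the number of tests."""
    m = len(pvalues)
    adjusted = [min(1.0, p * m) for p in pvalues]
    # A test is rejected if its adjusted p-value stays at or below alpha
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical p-values for four grid points
adjusted, reject = bonferroni([0.001, 0.02, 0.04, 0.30], alpha=0.05)
```

Only the first p-value stays below alpha after scaling by the four tests, which shows how a correction can remove grid points that look significant in isolation.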

inverse_transform(X: DataArray | None = None, Y: DataArray | None = None) -> Sequence[DataArray | Dataset | list[DataArray | Dataset]] | DataArray | Dataset | list[DataArray | Dataset]

Reconstruct the original data from transformed data.

Parameters:
  • X (DataArray | None) – Transformed data to be reconstructed. At least one of X and Y must be provided.

  • Y (DataArray | None) – Transformed data to be reconstructed. At least one of X and Y must be provided.

Returns:

Reconstructed data.

Return type:

Sequence[DataObject] | DataObject

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) -> Self

Load a saved model.

Parameters:
  • path (str) – Path to the saved model.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.

  • **kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

BaseModel

predict(X: DataArray | Dataset | list[DataArray | Dataset]) -> DataArray

Predict Y from X.

Parameters:

X (DataObject) – Data to be used for prediction.

Returns:

Predicted data in transformed space.

Return type:

DataArray

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".

  • save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.

  • **kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().

scores(normalized=False) -> tuple[DataArray, DataArray]

Get the scores of the model.

The component scores may be referred to differently depending on the model type. Common terms include canonical variates, expansion coefficients, principal component (scores) or temporal patterns.

Parameters:

normalized (bool, default=False) – Whether to return L2 normalized scores.

Returns:

Scores of X and Y.

Return type:

tuple[DataArray, DataArray]

serialize() -> DataTree

Serialize a complete model with its preprocessor.

squared_covariance_fraction()

Get the squared covariance fraction (SCF).

The SCF is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[SCF_{i} = 1 - \frac{\|\mathbf{d}_{X,i}^T \mathbf{d}_{Y,i}\|_F^2}{\|X^TY\|_F^2}\]

where \(\mathbf{d}_{X,i}\) and \(\mathbf{d}_{Y,i}\) are the residuals of the input data \(X\) and \(Y\) after reconstruction by the ith scores of \(X\) and \(Y\), respectively.

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).
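The SCF formula above can be checked numerically for the Maximum Covariance Analysis case (\(\alpha = 1\)). The sketch assumes that "reconstruction by the \(i\)th scores" means projecting each field onto its \(i\)th singular vector. That is an interpretation for illustration, not the xeofs implementation, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((80, 4))
X -= X.mean(axis=0)
Y = rng.standard_normal((80, 3))
Y -= Y.mean(axis=0)

# MCA: singular value decomposition of the cross-covariance matrix X^T Y
C = X.T @ Y
U, s, Vt = np.linalg.svd(C, full_matrices=False)

scf = []
for i in range(s.size):
    u, v = U[:, i], Vt[i]
    # Residuals after removing the part of each field carried by mode i
    dX = X - np.outer(X @ u, u)
    dY = Y - np.outer(Y @ v, v)
    scf.append(1 - np.linalg.norm(dX.T @ dY) ** 2 / np.linalg.norm(C) ** 2)
scf = np.array(scf)
```

Under this assumption the result reduces to the classical expression \(s_i^2 / \sum_k s_k^2\), where \(s_i\) are the singular values of \(X^TY\).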

transform(X: DataArray | Dataset | list[DataArray | Dataset] | None = None, Y: DataArray | Dataset | list[DataArray | Dataset] | None = None, normalized=False) -> Sequence[DataArray] | DataArray

Transform the data.

Parameters:
  • X (DataObject | None) – Data to be transformed. At least one of X and Y must be provided.

  • Y (DataObject | None) – Data to be transformed. At least one of X and Y must be provided.

  • normalized (bool, default=False) – Whether to return L2 normalized scores.

Returns:

Transformed data.

Return type:

Sequence[DataArray] | DataArray