ComplexMCARotator#

class ComplexMCARotator(n_modes: int = 10, power: int = 1, max_iter: int | None = None, rtol: float = 1e-08, compute: bool = True)#

Rotate a solution obtained from xe.cross.ComplexMCA.

Rotate the obtained components and scores of a CPCCA model to increase interpretability. The algorithm here is based on the approach of Cheng & Dunkerton (1995) [1], Elipot et al. (2017) [2] and Rieger et al. (2021).

Parameters:
  • n_modes (int, default=10) – Specify the number of modes to be rotated.

  • power (int, default=1) – Set the power for the Promax rotation. A power value of 1 results in a Varimax rotation.

  • max_iter (int, default=1000) – Determine the maximum number of iterations for the computation of the rotation matrix.

  • rtol (float, default=1e-8) – Define the relative tolerance required to achieve convergence and terminate the iterative process.

  • compute (bool, default=True) – Whether to compute the rotation immediately.
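As a rough illustration of the `power` parameter: in the usual Promax construction, a target matrix is built from Varimax-rotated loadings by raising their magnitudes to `power` while keeping the sign (or complex phase). The helper `promax_target` and the loading values below are hypothetical and are not part of the library; the sketch only shows why `power=1` leaves the Varimax solution untouched.

```python
def promax_target(loadings, power):
    """Element-wise Promax target x * |x|**(power - 1).

    Hypothetical helper for illustration; accepts real or complex entries.
    """
    return [[x * abs(x) ** (power - 1) for x in row] for row in loadings]


# Made-up Varimax-rotated loadings (3 features, 2 modes).
varimax_loadings = [[0.9, 0.1], [0.2, -0.8], [0.5 + 0.5j, 0.1j]]

same = promax_target(varimax_loadings, power=1)     # identical to the input
sharper = promax_target(varimax_loadings, power=2)  # small entries shrink faster

print(same == varimax_loadings)
print(sharper[0])
```

With a higher power, small loadings are pushed towards zero relative to large ones, which is what makes the resulting oblique rotation target simpler.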

References

Examples

Perform a Complex MCA:

>>> model = ComplexMCA(n_modes=10)
>>> model.fit(X, Y, dim='time')

Then, apply Varimax rotation to the first 5 components and scores:

>>> rotator = ComplexMCARotator(n_modes=5)
>>> rotator.fit(model)

Retrieve the rotated components and scores:

>>> rotator.components()
>>> rotator.scores()
__init__(n_modes: int = 10, power: int = 1, max_iter: int | None = None, rtol: float = 1e-08, compute: bool = True)#

Methods

__init__([n_modes, power, max_iter, rtol, ...])

check_needed_module(module)

Check if a necessary non-core dependency is available.

components([normalized])

Get the components of the model.

components_amplitude([normalized])

Get the amplitude of the components.

components_phase([normalized])

Get the phase of the components.

compute(**kwargs)

Compute and load delayed model results.

correlation_coefficients_X()

Get the correlation coefficients for the scores of \(X\).

correlation_coefficients_Y()

Get the correlation coefficients for the scores of \(Y\).

covariance_fraction_CD95()

Get the covariance fraction (CF).

cross_correlation_coefficients()

Get the cross-correlation coefficients.

deserialize(dt)

Deserialize the model and its preprocessors from a DataTree.

fit(model)

Rotate the solution obtained from xe.cross.CPCCA.

fraction_variance_X_explained_by_X()

Get the fraction of variance explained (FVE X).

fraction_variance_Y_explained_by_X()

Get the fraction of variance explained (FVE YX).

fraction_variance_Y_explained_by_Y()

Get the fraction of variance explained (FVE Y).

get_params()

Get the model parameters.

get_serialization_attrs()

Get the attributes needed to serialize the model.

heterogeneous_patterns([correction, alpha])

Get the heterogeneous correlation patterns.

homogeneous_patterns([correction, alpha])

Get the homogeneous correlation patterns.

inverse_transform([X, Y])

Reconstruct the original data from transformed data.

load(path[, engine])

Load a saved model.

predict(X)

Predict Y from X.

save(path[, overwrite, save_data, engine])

Save the model.

scores([normalized])

Get the scores of the model.

scores_amplitude([normalized])

Get the amplitude of the scores.

scores_phase([normalized])

Get the phase of the scores.

serialize()

Serialize a complete model with its preprocessor.

squared_covariance_fraction()

Get the squared covariance fraction (SCF).

transform([X, Y, normalized])

Transform the data.

Attributes

extra_modules

uses_complex

check_needed_module(module: str)#

Check if a necessary non-core dependency is available.

components(normalized=True) tuple[DataArray | Dataset | list[DataArray | Dataset], DataArray | Dataset | list[DataArray | Dataset]]#

Get the components of the model.

The components may be referred to differently depending on the model type. Common terms include canonical vectors, singular vectors, loadings or spatial patterns.

Parameters:

normalized (bool, default=True) – Whether to return L2 normalized components.

Returns:

Components of X and Y.

Return type:

tuple[DataObject, DataObject]

components_amplitude(normalized=True) tuple[DataArray | Dataset | list[DataArray | Dataset], DataArray | Dataset | list[DataArray | Dataset]]#

Get the amplitude of the components.

The amplitudes of the components are defined as

\[A_{x, ij} = |p_{x, ij}|\]
\[A_{y, ij} = |p_{y, ij}|\]

where \(p_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(|\cdot|\) denotes the absolute value.

Returns:

Component amplitudes of \(X\) and \(Y\).

Return type:

tuple[DataObject, DataObject]
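A minimal sketch of the amplitude definition, using plain Python complex numbers instead of xarray objects. The component values are made up for illustration.

```python
# Hypothetical complex components: rows are locations i, columns are modes j.
components_x = [
    [3 + 4j, 1 + 0j],
    [0 - 2j, -1 + 1j],
    [1 + 1j, 0 + 0j],
]

# A_{x, ij} = |p_{x, ij}|: the modulus of each complex entry.
amplitude_x = [[abs(p) for p in row] for row in components_x]

for row in amplitude_x:
    print(row)
```

The amplitudes are real and non-negative; all information about the sign or direction of an entry is carried by its phase.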

components_phase(normalized=True) tuple[DataArray | Dataset | list[DataArray | Dataset], DataArray | Dataset | list[DataArray | Dataset]]#

Get the phase of the components.

The phases of the components are defined as

\[\phi_{x, ij} = \arg(p_{x, ij})\]
\[\phi_{y, ij} = \arg(p_{y, ij})\]

where \(p_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(\arg(\cdot)\) denotes the argument of a complex number.

Returns:

Component phases of \(X\) and \(Y\).

Return type:

tuple[DataObject, DataObject]
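The phase definition can be illustrated with the standard library's `cmath.phase`, which returns the argument of a complex number in radians within \([-\pi, \pi]\). The entries below are made up; whether the library wraps phases in exactly the same interval is an assumption of this sketch.

```python
import cmath
import math

# Hypothetical component entries lying on or near the unit circle.
entries = [1 + 0j, 1j, -1 + 0j, -1j, 1 + 1j]

# phi = arg(p): angle of each entry measured from the positive real axis.
phases = [cmath.phase(p) for p in entries]

for p, phi in zip(entries, phases):
    print(p, round(math.degrees(phi), 1))
```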

compute(**kwargs)#

Compute and load delayed model results.

Parameters:

**kwargs – Additional keyword arguments to pass to dask.compute().

correlation_coefficients_X()#

Get the correlation coefficients for the scores of \(X\).

The correlation coefficients of the scores of \(X\) are given by:

\[c_{x, ij} = \text{corr} \left(\mathbf{r}_{x, i}, \mathbf{r}_{x, j} \right)\]

where \(\mathbf{r}_{x, i}\) and \(\mathbf{r}_{x, j}\) are the \(i\)-th and \(j\)-th scores of \(X\).
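A small sketch of such a correlation matrix, written in plain Python rather than with the library. The `correlation` helper and the score values are hypothetical; the conjugate in the covariance is included so the same function also works for complex series, but how the library itself handles complex scores is not verified here.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equally long series (real or complex)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b).conjugate() for x, y in zip(a, b))
    var_a = sum(abs(x - mean_a) ** 2 for x in a)
    var_b = sum(abs(y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up scores of X for three modes over four time steps.
scores_x = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, -1.0, -1.0, 1.0],
    [2.0, 4.0, 6.0, 8.0],
]

# c_{x, ij} = corr(r_{x, i}, r_{x, j}) for every pair of modes.
corr_matrix = [[correlation(a, b) for b in scores_x] for a in scores_x]

for row in corr_matrix:
    print([round(c, 3) for c in row])
```

Off-diagonal entries close to zero indicate that two modes have nearly uncorrelated scores; entries close to one indicate strongly correlated scores.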

correlation_coefficients_Y()#

Get the correlation coefficients for the scores of \(Y\).

The correlation coefficients of the scores of \(Y\) are given by:

\[c_{y, ij} = \text{corr} \left(\mathbf{r}_{y, i}, \mathbf{r}_{y, j} \right)\]

where \(\mathbf{r}_{y, i}\) and \(\mathbf{r}_{y, j}\) are the \(i\)-th and \(j\)-th scores of \(Y\).

covariance_fraction_CD95()#

Get the covariance fraction (CF).

Cheng and Dunkerton (1995) [3] define the CF as follows:

\[CF_i = \frac{\sigma_i}{\sum_{i=1}^{m} \sigma_i}\]

where \(m\) is the total number of modes and \(\sigma_i\) is the \(i\)-th singular value of the covariance matrix.

This implementation estimates the sum of singular values from the first n modes only, so the denominator is underestimated when few modes are retained. Retain as many modes as possible to get a good estimate of the covariance fraction.
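The effect of truncating the sum can be seen with a small arithmetic sketch. The singular values are made up and `covariance_fraction` is a hypothetical helper, not a library function.

```python
def covariance_fraction(sigmas):
    """CF_i = sigma_i / sum(sigma) over the singular values supplied."""
    total = sum(sigmas)
    return [s / total for s in sigmas]


# Hypothetical full spectrum of singular values (m = 4 modes).
singular_values = [4.0, 3.0, 2.0, 1.0]

cf_all_modes = covariance_fraction(singular_values)      # denominator uses all 4
cf_two_modes = covariance_fraction(singular_values[:2])  # denominator uses only 2

print(cf_all_modes[0])
print(cf_two_modes[0])
```

With all four modes the first mode accounts for 4/10 of the total, while the truncated estimate gives 4/7, an overestimate caused by the missing tail of the spectrum.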

Note

In MCA, the focus is on maximizing the squared covariance (SC). As a result, this quantity is preserved during decomposition, meaning the SC of both datasets remains unchanged before and after decomposition. Each mode explains a fraction of the total SC, and together, all modes can reconstruct the total SC of the cross-covariance matrix. However, the (non-squared) covariance is not invariant in MCA; it is not preserved by the individual modes and cannot be reconstructed from them. Consequently, the squared covariance fraction (SCF) is invariant in MCA and is typically used to assess the relative importance of each mode. In contrast, the covariance fraction (CF) is not invariant. Cheng and Dunkerton [3] introduced the CF to compare the relative importance of modes before and after Varimax rotation in MCA. Notably, when the data fields in MCA are identical, the CF corresponds to the explained variance ratio in Principal Component Analysis (PCA).

References

cross_correlation_coefficients()#

Get the cross-correlation coefficients.

The cross-correlation coefficients between the scores of X and Y are computed as:

\[c_{xy, i} = \text{corr} \left(\mathbf{r}_{x, i}, \mathbf{r}_{y, i} \right)\]

where \(\mathbf{r}_{x, i}\) and \(\mathbf{r}_{y, i}\) are the \(i\)-th scores of \(X\) and \(Y\), respectively.

Notes

When \(\alpha=0\), the cross-correlation coefficients are equivalent to the canonical correlation coefficients.

classmethod deserialize(dt: DataTree) Self#

Deserialize the model and its preprocessors from a DataTree.

fit(model: CPCCA) Self#

Rotate the solution obtained from xe.cross.CPCCA.

Parameters:

model (xe.cross.CPCCA) – The CPCCA model to be rotated.

fraction_variance_X_explained_by_X()#

Get the fraction of variance explained (FVE X).

The FVE X is the fraction of variance in \(X\) explained by the scores of \(X\). It is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[FVE_{X|X,i} = 1 - \frac{\|\mathbf{d}_{X,i}\|_F^2}{\|X\|_F^2}\]

where \(\mathbf{d}_{X,i}\) are the residuals of the input data \(X\) after reconstruction by the ith scores of \(X\).
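A minimal numerical sketch of this ratio, assuming real-valued data, no weighting, and a least-squares reconstruction of each column of \(X\) from a single score series. The helpers `frobenius_sq` and `fraction_variance_explained` are hypothetical and do not reproduce the library's internals.

```python
def frobenius_sq(matrix):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(v * v for row in matrix for v in row)


def fraction_variance_explained(X, score):
    """1 - ||d||_F^2 / ||X||_F^2, with d the residual after regressing X on score."""
    n, p = len(X), len(X[0])
    score_sq = sum(s * s for s in score)
    # Least-squares coefficient of each column of X on the score series.
    coef = [sum(score[t] * X[t][j] for t in range(n)) / score_sq for j in range(p)]
    residual = [[X[t][j] - score[t] * coef[j] for j in range(p)] for t in range(n)]
    return 1.0 - frobenius_sq(residual) / frobenius_sq(X)


# Made-up rank-one data: 3 samples, 2 features.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]

fve_good = fraction_variance_explained(X, [1.0, 2.0, 3.0])   # captures everything
fve_poor = fraction_variance_explained(X, [1.0, 0.0, -1.0])  # captures little

print(fve_good, fve_poor)
```

A score series aligned with the dominant variability leaves no residual, so the fraction approaches one; a poorly aligned series leaves most of the variance in the residual.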

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

fraction_variance_Y_explained_by_X() DataArray#

Get the fraction of variance explained (FVE YX).

The FVE YX is the fraction of variance in \(Y\) explained by the scores of \(X\). It is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[FVE_{Y|X,i} = 1 - \frac{\|(X^TX)^{-1/2} \mathbf{d}_{X,i}^T \mathbf{d}_{Y,i}\|_F^2}{\|(X^TX)^{-1/2} X^TY\|_F^2}\]

where \(\mathbf{d}_{X,i}\) and \(\mathbf{d}_{Y,i}\) are the residuals of the input data \(X\) and \(Y\) after reconstruction by the ith scores of \(X\) and \(Y\), respectively.

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

fraction_variance_Y_explained_by_Y()#

Get the fraction of variance explained (FVE Y).

The FVE Y is the fraction of variance in \(Y\) explained by the scores of \(Y\). It is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[FVE_{Y|Y,i} = 1 - \frac{\|\mathbf{d}_{Y,i}\|_F^2}{\|Y\|_F^2}\]

where \(\mathbf{d}_{Y,i}\) are the residuals of the input data \(Y\) after reconstruction by the ith scores of \(Y\).

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

get_params() dict[str, Any]#

Get the model parameters.

get_serialization_attrs() dict#

Get the attributes needed to serialize the model.

Returns:

Attributes needed to serialize the model.

Return type:

dict

heterogeneous_patterns(correction=None, alpha=0.05)#

Get the heterogeneous correlation patterns.

The heterogeneous patterns are the correlation coefficients between the input data and the scores of the other field:

\[G_{X, i} = \text{corr} \left(X, \mathbf{r}_{y,i} \right)\]
\[G_{Y, i} = \text{corr} \left(Y, \mathbf{r}_{x,i} \right)\]

where \(X\) and \(Y\) are the input data, and \(\mathbf{r}_{x,i}\) and \(\mathbf{r}_{y,i}\) are the \(i\)-th scores of \(X\) and \(Y\), respectively.

Parameters:
  • correction (str, default=None) – Method to apply a multiple testing correction. If None, no correction is applied. Available methods are:

      - bonferroni : one-step correction
      - sidak : one-step correction
      - holm-sidak : step-down method using Sidak adjustments
      - holm : step-down method using Bonferroni adjustments
      - simes-hochberg : step-up method (independent)
      - hommel : closed method based on Simes tests (non-negative)
      - fdr_bh : Benjamini/Hochberg (non-negative)
      - fdr_by : Benjamini/Yekutieli (negative)
      - fdr_tsbh : two-stage FDR correction (non-negative)
      - fdr_tsbky : two-stage FDR correction (non-negative)

  • alpha (float, default=0.05) – The desired family-wise error rate. Not used if correction is None.

Returns:

  • tuple[DataObject, DataObject] – Heterogeneous correlation patterns of X and Y.

  • tuple[DataObject, DataObject] – p-values of the heterogeneous correlation patterns of X and Y.
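To make the role of `correction` and `alpha` concrete, here is a minimal sketch of the one-step Bonferroni correction applied to a handful of p-values. The function `bonferroni` and the p-values are hypothetical; the library's own implementation may differ in details.

```python
def bonferroni(p_values, alpha=0.05):
    """One-step Bonferroni: multiply each p-value by the number of tests."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return reject, adjusted


# Made-up p-values, e.g. one per grid point of a correlation pattern.
p_values = [0.001, 0.02, 0.04, 0.5]

reject, adjusted = bonferroni(p_values, alpha=0.05)
print(reject)
print(adjusted)
```

Three of the four raw p-values are below 0.05, but only the smallest one remains significant once the number of simultaneous tests is taken into account.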

homogeneous_patterns(correction=None, alpha=0.05)#

Get the homogeneous correlation patterns.

The homogeneous correlation patterns are the correlation coefficients between the input data and the scores. They are defined as:

\[H_{X, i} = \text{corr} \left(X, \mathbf{r}_{x,i} \right)\]
\[H_{Y, i} = \text{corr} \left(Y, \mathbf{r}_{y,i} \right)\]

where \(X\) and \(Y\) are the input data, and \(\mathbf{r}_{x,i}\) and \(\mathbf{r}_{y,i}\) are the \(i\)-th scores of \(X\) and \(Y\), respectively.

Parameters:
  • correction (str, default=None) – Method to apply a multiple testing correction. If None, no correction is applied. Available methods are:

      - bonferroni : one-step correction
      - sidak : one-step correction
      - holm-sidak : step-down method using Sidak adjustments
      - holm : step-down method using Bonferroni adjustments
      - simes-hochberg : step-up method (independent)
      - hommel : closed method based on Simes tests (non-negative)
      - fdr_bh : Benjamini/Hochberg (non-negative)
      - fdr_by : Benjamini/Yekutieli (negative)
      - fdr_tsbh : two-stage FDR correction (non-negative)
      - fdr_tsbky : two-stage FDR correction (non-negative)

  • alpha (float, default=0.05) – The desired family-wise error rate. Not used if correction is None.

Returns:

  • tuple[DataObject, DataObject] – Homogeneous correlation patterns of X and Y.

  • tuple[DataObject, DataObject] – p-values of the homogeneous correlation patterns of X and Y.
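As an illustration of a false-discovery-rate method such as `fdr_bh`, the sketch below implements the Benjamini/Hochberg step-up adjustment in plain Python. The function `fdr_bh` here is a hypothetical stand-in written for this example, and the p-values are made up.

```python
def fdr_bh(p_values, alpha=0.05):
    """Benjamini/Hochberg adjustment: p_(k) * m / k, made monotone from the top."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    reject = [p_adj <= alpha for p_adj in adjusted]
    return reject, adjusted


p_values = [0.001, 0.02, 0.04, 0.5]

reject, adjusted = fdr_bh(p_values, alpha=0.05)
print(reject)
print([round(a, 4) for a in adjusted])
```

Controlling the false discovery rate is generally less conservative than controlling the family-wise error rate, so more entries of a pattern tend to remain significant.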

inverse_transform(X: DataArray | None = None, Y: DataArray | None = None) Sequence[DataArray | Dataset | list[DataArray | Dataset]] | DataArray | Dataset | list[DataArray | Dataset]#

Reconstruct the original data from transformed data.

Parameters:
  • X (DataArray | None) – Transformed data of X to be reconstructed. At least one of X and Y must be provided.

  • Y (DataArray | None) – Transformed data of Y to be reconstructed. At least one of X and Y must be provided.

Returns:

Reconstructed data.

Return type:

Sequence[DataObject] | DataObject

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) Self#

Load a saved model.

Parameters:
  • path (str) – Path to the saved model.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.

  • **kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

BaseModel

predict(X: DataArray | Dataset | list[DataArray | Dataset]) DataArray#

Predict Y from X.

Parameters:

X (DataObject) – Data to be used for prediction.

Returns:

Predicted data in transformed space.

Return type:

DataArray

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)#

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".

  • save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.

  • **kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().

scores(normalized=False) tuple[DataArray, DataArray]#

Get the scores of the model.

The component scores may be referred to differently depending on the model type. Common terms include canonical variates, expansion coefficients, principal component (scores) or temporal patterns.

Parameters:

normalized (bool, default=False) – Whether to return L2 normalized scores.

Returns:

Scores of X and Y.

Return type:

tuple[DataArray, DataArray]

scores_amplitude(normalized=False) tuple[DataArray, DataArray]#

Get the amplitude of the scores.

The amplitudes of the scores are defined as

\[A_{x, ij} = |r_{x, ij}|\]
\[A_{y, ij} = |r_{y, ij}|\]

where \(r_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(|\cdot|\) denotes the absolute value.

Returns:

Score amplitudes of \(X\) and \(Y\).

Return type:

tuple[DataArray, DataArray]

scores_phase(normalized=False) tuple[DataArray, DataArray]#

Get the phase of the scores.

The phases of the scores are defined as

\[\phi_{x, ij} = \arg(r_{x, ij})\]
\[\phi_{y, ij} = \arg(r_{y, ij})\]

where \(r_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(\arg(\cdot)\) denotes the argument of a complex number.

Returns:

Score phases of \(X\) and \(Y\).

Return type:

tuple[DataArray, DataArray]
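Amplitude and phase together carry the same information as the complex scores themselves. The sketch below, using made-up score values and only the standard library, splits a short complex series into both parts and rebuilds it with `cmath.rect`.

```python
import cmath

# Hypothetical complex scores of one mode at three time steps.
scores = [2 + 0j, 1j, -1 - 1j]

amplitudes = [abs(r) for r in scores]       # A = |r|
phases = [cmath.phase(r) for r in scores]   # phi = arg(r), in radians

# r = A * exp(i * phi) recovers the original complex values.
rebuilt = [cmath.rect(a, phi) for a, phi in zip(amplitudes, phases)]

for original, again in zip(scores, rebuilt):
    print(original, again)
```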

serialize() DataTree#

Serialize a complete model with its preprocessor.

squared_covariance_fraction()#

Get the squared covariance fraction (SCF).

The SCF is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[SCF_{i} = 1 - \frac{\|\mathbf{d}_{X,i}^T \mathbf{d}_{Y,i}\|_F^2}{\|X^TY\|_F^2}\]

where \(\mathbf{d}_{X,i}\) and \(\mathbf{d}_{Y,i}\) are the residuals of the input data \(X\) and \(Y\) after reconstruction by the ith scores of \(X\) and \(Y\), respectively.
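For orientation, in a standard unrotated MCA the SCF of mode \(i\) is commonly written in terms of singular values as \(\sigma_i^2 / \sum_j \sigma_j^2\). The sketch below evaluates that textbook form for made-up singular values; it does not reproduce the residual-based computation above, and whether both agree numerically for rotated modes is not checked here.

```python
def squared_covariance_fraction(sigmas):
    """Textbook SCF for unrotated MCA: sigma_i**2 / sum_j sigma_j**2."""
    total = sum(s * s for s in sigmas)
    return [s * s / total for s in sigmas]


# Hypothetical singular values of the cross-covariance matrix.
singular_values = [4.0, 3.0, 2.0, 1.0]

scf = squared_covariance_fraction(singular_values)
print([round(v, 4) for v in scf])
```

Because the singular values are squared, the leading modes receive a larger share than they would under a fraction based on the non-squared singular values.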

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

transform(X: DataArray | Dataset | list[DataArray | Dataset] | None = None, Y: DataArray | Dataset | list[DataArray | Dataset] | None = None, normalized: bool = False) DataArray | list[DataArray]#

Transform the data.

Parameters:
  • X (DataObject | None) – Data of X to be transformed. At least one of X and Y must be provided.

  • Y (DataObject | None) – Data of Y to be transformed. At least one of X and Y must be provided.

  • normalized (bool, default=False) – Whether to return L2 normalized scores.

Returns:

Transformed data.

Return type:

Sequence[DataArray] | DataArray