xeofs.models.ComplexMCA#

class xeofs.models.ComplexMCA(n_modes: int = 2, padding: str = 'exp', decay_factor: float = 0.2, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, n_pca_modes: int | None = None, compute: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', solver: str = 'auto', random_state: bool | None = None, solver_kwargs: Dict = {}, **kwargs)#

Bases: MCA

Complex MCA.

Complex MCA, also referred to as Analytical SVD (ASVD) by Elipot et al. (2017) [1], enhances traditional MCA by accommodating both amplitude and phase information. It achieves this by utilizing the Hilbert transform to preprocess the data, thus allowing for a more comprehensive analysis in the subsequent MCA computation.

An optional padding with exponentially decaying values can be applied prior to the Hilbert transform in order to mitigate the impact of spectral leakage.
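As a rough illustration of the padding idea, the sketch below extends a series on both sides with values that decay exponentially away from the boundary samples. It is not xeofs' internal routine: the helper name `pad_exp` and the way `decay_factor` enters the exponent are assumptions made for this example only.

```python
import numpy as np


def pad_exp(y, decay_factor=0.2):
    """Pad a 1-D series on both sides with exponentially decaying values.

    Hypothetical helper: the boundary samples are damped toward zero so the
    padded series has no abrupt jump at either end.
    """
    y = np.asarray(y, dtype=float)
    steps = np.arange(1, y.size + 1)        # distance from the boundary
    decay = np.exp(-decay_factor * steps)   # shrinks with distance
    left = y[0] * decay[::-1]               # farthest padded sample first
    right = y[-1] * decay
    return np.concatenate([left, y, right])


series = np.array([2.0, 1.0, 0.5, 1.5, 3.0])
padded = pad_exp(series, decay_factor=0.2)
core = padded[series.size:2 * series.size]  # the original samples, untouched
```

After the Hilbert transform, only the central segment (`core` above) would be kept, so the padding influences the result only through reduced edge effects.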

Parameters:
  • n_modes (int, default=2) – Number of modes to calculate.

  • padding (str, optional) – Specifies the method used for padding the data prior to applying the Hilbert transform. This can help to mitigate the effect of spectral leakage. Currently, only ‘exp’ for exponential padding is supported. Default is ‘exp’.

  • decay_factor (float, optional) – Specifies the decay factor used in the exponential padding. This parameter is only used if padding='exp'. The recommended value typically ranges between 0.05 and 0.2 but ultimately depends on the variability in the data. A smaller value (e.g. 0.05) is recommended for data with high variability, while a larger value (e.g. 0.2) is recommended for data with low variability. Default is 0.2.

  • center (bool, default=True) – Whether to center the input data.

  • standardize (bool, default=False) – Whether to standardize the input data.

  • use_coslat (bool, default=False) – Whether to use cosine of latitude for scaling.

  • n_pca_modes (int, default=None) – The number of principal components to retain during the PCA preprocessing step applied to both data sets prior to executing MCA. If set to None, PCA preprocessing will be bypassed, and the MCA will be performed on the original datasets. Specifying an integer value greater than 0 for n_pca_modes will trigger the PCA preprocessing, retaining only the specified number of principal components. This reduction in dimensionality can be especially beneficial when dealing with high-dimensional data, where computing the cross-covariance matrix can become computationally intensive or in scenarios where multicollinearity is a concern.

  • compute (bool, default=True) – Whether to compute elements of the model eagerly, or to defer computation. If True, four pieces of the fit will be computed sequentially: 1) the preprocessor scaler, 2) optional NaN checks, 3) SVD decomposition, 4) scores and components.

  • sample_name (str, default="sample") – Name of the new sample dimension.

  • feature_name (str, default="feature") – Name of the new feature dimension.

  • solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.

  • random_state (int, optional) – Random state for randomized SVD solver.

  • solver_kwargs (dict, default={}) – Additional keyword arguments passed to the SVD solver.
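To make the effect of `n_pca_modes` concrete, the following NumPy sketch reduces both fields to a few principal components before forming the cross-covariance matrix. It only illustrates the dimensionality argument and is not taken from the xeofs source; the helper `pca_scores` and the random test data are assumptions.

```python
import numpy as np


def pca_scores(data, n_pca_modes):
    """Return the leading principal-component scores of a (sample, feature) array."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pca_modes] * s[:n_pca_modes]


rng = np.random.default_rng(0)
n_samples = 30
X = rng.standard_normal((n_samples, 50))   # left field: 50 features
Y = rng.standard_normal((n_samples, 40))   # right field: 40 features

# Without PCA preprocessing the cross-covariance matrix is 50 x 40.
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
C_full = Xc.T @ Yc / (n_samples - 1)

# With 10 retained PCs per field it shrinks to 10 x 10.
Px, Py = pca_scores(X, 10), pca_scores(Y, 10)
C_reduced = Px.T @ Py / (n_samples - 1)
```

If every PC with non-zero variance is retained, the singular values of the reduced matrix match those of the full one; truncating further trades some accuracy for a much smaller problem.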

Notes

Complex MCA extends MCA to complex-valued data that contain both magnitude and phase information. The Hilbert transform is used to transform real-valued data to complex-valued data, from which both amplitude and phase can be extracted.
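The step from real to complex data can be sketched with an FFT-based analytic signal, whose imaginary part is the Hilbert transform of the input. This is a generic NumPy illustration, not the routine used inside xeofs; the function name `analytic_signal` and the test series are assumptions.

```python
import numpy as np


def analytic_signal(x):
    """Return x + i * H[x] for a real 1-D series, computed via the FFT."""
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                       # keep the mean
    if n % 2 == 0:
        h[n // 2] = 1.0              # keep the Nyquist term
        h[1:n // 2] = 2.0            # double the positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)  # negative frequencies are zeroed


t = np.linspace(0.0, 1.0, 200, endpoint=False)
x = np.cos(2 * np.pi * 5 * t)        # real-valued input
z = analytic_signal(x)               # complex-valued counterpart

amplitude = np.abs(z)                # envelope of the oscillation
phase = np.angle(z)                  # instantaneous phase in radians
```

For this pure cosine the analytic signal is a complex exponential, so the amplitude is constant and the phase advances linearly (wrapped to the interval from -pi to pi).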

Similar to MCA, Complex MCA is used in climate science to identify coupled patterns of variability between two different climate variables. But unlike MCA, Complex MCA can identify coupled patterns that involve phase shifts.
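A small synthetic case shows why phase information matters. In the sketch below the right field lags the left field by a quarter cycle, so the real-valued cross-covariance vanishes while the complex one does not. The conjugation convention, the helper name, and the data are assumptions for illustration and may differ from what xeofs does internally.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal along axis 0 of a real (sample, feature) array."""
    n = x.shape[0]
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=0) * h[:, None], axis=0)


t = np.linspace(0.0, 1.0, 240, endpoint=False)
omega = 2 * np.pi * 4
X = np.cos(omega * t)[:, None] * np.array([1.0, 0.5, 0.25])      # left field
Y = np.cos(omega * t - np.pi / 2)[:, None] * np.array([2.0, 1.0])  # lags by 90 degrees

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)

# Real-valued MCA sees no covariance between the two fields.
C_real = Xc.T @ Yc / (t.size - 1)

# Complex MCA recovers the coupling, including the phase lag.
Xa, Ya = analytic_signal(Xc), analytic_signal(Yc)
C_complex = Xa.conj().T @ Ya / (t.size - 1)
s = np.linalg.svd(C_complex, compute_uv=False)
lag = np.angle(C_complex[0, 0])      # -pi/2 under this convention
```

Here a single mode carries all of the covariance, and the phase of the complex cross-covariance encodes the quarter-cycle lag that the real-valued analysis misses.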

References

Examples

>>> from xeofs.models import ComplexMCA
>>> model = ComplexMCA(n_modes=5, standardize=True)
>>> model.fit(data1, data2, dim="time")  # assumes "time" is the sample dimension
__init__(n_modes: int = 2, padding: str = 'exp', decay_factor: float = 0.2, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, n_pca_modes: int | None = None, compute: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', solver: str = 'auto', random_state: bool | None = None, solver_kwargs: Dict = {}, **kwargs)#

Methods

__init__([n_modes, padding, decay_factor, ...])

components()

Return the singular vectors of the left and right field.

components_amplitude()

Compute the amplitude of the components.

components_phase()

Compute the phase of the components.

compute([verbose])

Compute and load delayed model results.

covariance_fraction()

Get the covariance fraction (CF).

deserialize(dt)

Deserialize the model and its preprocessors from a DataTree.

fit(data1, data2, dim[, weights1, weights2])

Fit the model to the data.

get_params()

Get the model parameters.

get_serialization_attrs()

heterogeneous_patterns([correction, alpha])

Return the heterogeneous patterns of the left and right field.

homogeneous_patterns([correction, alpha])

Return the homogeneous patterns of the left and right field.

inverse_transform(scores1, scores2)

Reconstruct the original data from transformed data.

load(path[, engine])

Load a saved model.

save(path[, overwrite, save_data, engine])

Save the model.

scores()

Return the scores of the left and right field.

scores_amplitude()

Compute the amplitude of the scores.

scores_phase()

Compute the phase of the scores.

serialize()

Serialize a complete model with its preprocessors.

singular_values()

Get the singular values of the cross-covariance matrix.

squared_covariance()

Get the squared covariance.

squared_covariance_fraction()

Calculate the squared covariance fraction (SCF).

total_covariance()

Get the total covariance.

transform(data1, data2)

Get the expansion coefficients of "unseen" data.

components()#

Return the singular vectors of the left and right field.

Returns:

  • components1 (DataArray | Dataset | List[DataArray]) – Left components of the fitted model.

  • components2 (DataArray | Dataset | List[DataArray]) – Right components of the fitted model.

components_amplitude() Tuple[DataArray | Dataset | List[DataArray | Dataset], DataArray | Dataset | List[DataArray | Dataset]]#

Compute the amplitude of the components.

The amplitude of the components is defined as

\[A_{ij} = |C_{ij}|\]

where \(C_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(|\cdot|\) denotes the absolute value.

Returns:

  • DataObject – Amplitude of the left components.

  • DataObject – Amplitude of the right components.

components_phase() Tuple[DataArray | Dataset | List[DataArray | Dataset], DataArray | Dataset | List[DataArray | Dataset]]#

Compute the phase of the components.

The phase of the components is defined as

\[\phi_{ij} = \arg(C_{ij})\]

where \(C_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(\arg(\cdot)\) denotes the argument of a complex number.

Returns:

  • DataObject – Phase of the left components.

  • DataObject – Phase of the right components.
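The two definitions above amount to a polar decomposition of each complex entry. A minimal NumPy sketch, using a made-up component vector rather than output from a fitted model:

```python
import numpy as np

# Hypothetical complex component: one value per feature for a single mode.
component = np.array([1 + 1j, -2j, -3 + 0j, 0.5 - 0.5j])

amplitude = np.abs(component)    # A = |C|
phase = np.angle(component)      # phi = arg(C), in radians within (-pi, pi]

# Amplitude and phase together recover the complex values: C = A * exp(i * phi).
rebuilt = amplitude * np.exp(1j * phase)
```

The same decomposition applies to the complex scores.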

compute(verbose: bool = False, **kwargs)#

Compute and load delayed model results.

Parameters:
  • verbose (bool) – Whether or not to provide additional information about the computing progress.

  • **kwargs – Additional keyword arguments to pass to dask.compute().

covariance_fraction()#

Get the covariance fraction (CF).

Cheng and Dunkerton (1995) define the CF as follows:

\[CF_i = \frac{\sigma_i}{\sum_{j=1}^{m} \sigma_j}\]

where \(m\) is the total number of modes and \(\sigma_i\) is the \(i\)-th singular value of the cross-covariance matrix.

In this implementation the sum of singular values is estimated from the first n retained modes only, which overestimates the CF whenever modes are truncated. Retain as many modes as possible to get a good estimate of the covariance fraction.

Note

It is important to differentiate the CF from the squared covariance fraction (SCF). While the SCF is an invariant quantity in MCA, the CF is not. Therefore, the SCF is used to assess the relative importance of each mode. Cheng and Dunkerton (1995) introduced the CF in the context of Varimax-rotated MCA to compare the relative importance of each mode before and after rotation. In the special case of both data fields in MCA being identical, the CF is equivalent to the explained variance ratio in EOF analysis.
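The dependence of the CF on the number of retained modes can be seen with a few made-up singular values. The numbers and the helper `covariance_fraction` below are purely illustrative.

```python
import numpy as np

# Hypothetical singular values of a cross-covariance matrix with m = 6 modes.
sigma_all = np.array([5.0, 3.0, 1.0, 0.5, 0.3, 0.2])


def covariance_fraction(sigma):
    """CF_i = sigma_i / sum_j sigma_j over the singular values that are available."""
    return sigma / sigma.sum()


cf_exact = covariance_fraction(sigma_all)          # all m modes known
cf_truncated = covariance_fraction(sigma_all[:2])  # only the first 2 modes known
```

With all six modes the leading CF values are 0.5 and 0.3; estimated from two modes alone they become 0.625 and 0.375, because the missing singular values are left out of the denominator.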

classmethod deserialize(dt: DataTree) Self#

Deserialize the model and its preprocessors from a DataTree.

fit(data1: DataArray | Dataset | List[DataArray | Dataset], data2: DataArray | Dataset | List[DataArray | Dataset], dim: Hashable | Sequence[Hashable], weights1: List[DataArray | Dataset] | DataArray | Dataset | None = None, weights2: List[DataArray | Dataset] | DataArray | Dataset | None = None) Self#

Fit the model to the data.

Parameters:
  • data1 (DataArray | Dataset | List[DataArray]) – Left input data.

  • data2 (DataArray | Dataset | List[DataArray]) – Right input data.

  • dim (Hashable | Sequence[Hashable]) – Define the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights1 (Optional[DataObject]) – Weights to be applied to the left input data.

  • weights2 (Optional[DataObject]) – Weights to be applied to the right input data.
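What `dim` and the weights do conceptually can be sketched in plain NumPy: the sample dimension is kept, all remaining dimensions are flattened into features, and weights are multiplied onto the data beforehand. The array layout (time, lat, lon) and the square-root-of-cosine latitude weights are assumptions chosen for illustration, not a statement about xeofs internals.

```python
import numpy as np

rng = np.random.default_rng(42)
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
field = rng.standard_normal((10, lat.size, lon.size))   # dims: (time, lat, lon)

# Treat "time" as the sample dimension; (lat, lon) are flattened into features.
stacked = field.reshape(field.shape[0], -1)              # shape (10, 12)

# Assumed area weighting: sqrt(cos(lat)), so that covariances scale with cos(lat).
weights = np.sqrt(np.cos(np.deg2rad(lat)))
weighted = field * weights[None, :, None]
weighted_stacked = weighted.reshape(field.shape[0], -1)
```

Grid points at the equator are left unchanged, while those at 60 degrees latitude are scaled by about 0.71.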

get_params() Dict#

Get the model parameters.

heterogeneous_patterns(correction=None, alpha=0.05)#

Return the heterogeneous patterns of the left and right field.

The heterogeneous patterns are the correlation coefficients between the input data and the scores of the other field.

More precisely, the heterogeneous patterns \(r_{het}\) are defined as

\[r_{het, x} = corr \left(X, A_y \right)\]
\[r_{het, y} = corr \left(Y, A_x \right)\]

where \(X\) and \(Y\) are the input data, \(A_x\) and \(A_y\) are the scores of the left and right field, respectively.

Parameters:
  • correction (str, default=None) – Method to apply a multiple testing correction. If None, no correction is applied. Available methods are:

      - bonferroni : one-step correction
      - sidak : one-step correction
      - holm-sidak : step-down method using Sidak adjustments
      - holm : step-down method using Bonferroni adjustments
      - simes-hochberg : step-up method (independent)
      - hommel : closed method based on Simes tests (non-negative)
      - fdr_bh : Benjamini/Hochberg (non-negative)
      - fdr_by : Benjamini/Yekutieli (negative)
      - fdr_tsbh : two stage fdr correction (non-negative)
      - fdr_tsbky : two stage fdr correction (non-negative)

  • alpha (float, default=0.05) – The desired family-wise error rate. Not used if correction is None.
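The correlation definitions above can be sketched for real-valued data with NumPy. The synthetic fields, the helper `corr_with_scores`, and the unnormalized scores are assumptions for illustration; the p-values and the complex-valued case handled by the method are left out.

```python
import numpy as np


def corr_with_scores(data, scores):
    """Pearson correlation between every feature column and every score column."""
    d = (data - data.mean(axis=0)) / data.std(axis=0)
    s = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return d.T @ s / data.shape[0]          # shape (n_features, n_modes)


rng = np.random.default_rng(1)
n = 200
shared = rng.standard_normal(n)             # signal common to both fields
X = np.outer(shared, [1.0, -0.5, 0.2]) + 0.3 * rng.standard_normal((n, 3))
Y = np.outer(shared, [0.8, 0.4]) + 0.3 * rng.standard_normal((n, 2))

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc.T @ Yc / (n - 1), full_matrices=False)
A_x, A_y = Xc @ U, Yc @ Vt.T                # left and right scores

r_het_x = corr_with_scores(X, A_y)          # left data vs. right scores
r_het_y = corr_with_scores(Y, A_x)          # right data vs. left scores
r_hom_x = corr_with_scores(X, A_x)          # for comparison: left data vs. left scores
```

Heterogeneous patterns show how well one field is described by the scores of the other field, which is why they are often read as a measure of the coupling itself.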

homogeneous_patterns(correction=None, alpha=0.05)#

Return the homogeneous patterns of the left and right field.

The homogeneous patterns are the correlation coefficients between the input data and the scores.

More precisely, the homogeneous patterns \(r_{hom}\) are defined as

\[r_{hom, x} = corr \left(X, A_x \right)\]
\[r_{hom, y} = corr \left(Y, A_y \right)\]

where \(X\) and \(Y\) are the input data, \(A_x\) and \(A_y\) are the scores of the left and right field, respectively.

Parameters:
  • correction (str, default=None) – Method to apply a multiple testing correction. If None, no correction is applied. Available methods are:

      - bonferroni : one-step correction
      - sidak : one-step correction
      - holm-sidak : step-down method using Sidak adjustments
      - holm : step-down method using Bonferroni adjustments
      - simes-hochberg : step-up method (independent)
      - hommel : closed method based on Simes tests (non-negative)
      - fdr_bh : Benjamini/Hochberg (non-negative)
      - fdr_by : Benjamini/Yekutieli (negative)
      - fdr_tsbh : two stage fdr correction (non-negative)
      - fdr_tsbky : two stage fdr correction (non-negative)

  • alpha (float, default=0.05) – The desired family-wise error rate. Not used if correction is None.

Returns:

  • patterns1 (DataArray | Dataset | List[DataArray]) – Left homogeneous patterns.

  • patterns2 (DataArray | Dataset | List[DataArray]) – Right homogeneous patterns.

  • pvals1 (DataArray | Dataset | List[DataArray]) – Left p-values.

  • pvals2 (DataArray | Dataset | List[DataArray]) – Right p-values.
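To give a feel for what the `correction` options do to the returned p-values, here is a hand-written NumPy sketch of two of the listed methods, Bonferroni and Benjamini/Hochberg. The function names and p-values are made up, and the code is an illustration of the two procedures rather than the routine the library calls.

```python
import numpy as np


def bonferroni(pvals):
    """One-step correction: multiply each p-value by the number of tests."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(pvals):
    """Step-up FDR correction returning adjusted p-values in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity starting from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


pvals = np.array([0.01, 0.04, 0.03, 0.005])   # hypothetical per-grid-point p-values
alpha = 0.05

significant_bonf = bonferroni(pvals) <= alpha
significant_bh = benjamini_hochberg(pvals) <= alpha
```

With these four values Bonferroni keeps two grid points as significant while Benjamini/Hochberg keeps all four, which reflects the usual trade-off between strict family-wise control and the less conservative false discovery rate.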

inverse_transform(scores1: DataArray, scores2: DataArray) Tuple[DataArray | Dataset | List[DataArray | Dataset], DataArray | Dataset | List[DataArray | Dataset]]#

Reconstruct the original data from transformed data.

Parameters:
  • scores1 (DataObject) – Transformed left field data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a ‘mode’ dimension.

  • scores2 (DataObject) – Transformed right field data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a ‘mode’ dimension.

Returns:

  • Xrec1 (DataArray | Dataset | List[DataArray]) – Reconstructed data of left field.

  • Xrec2 (DataArray | Dataset | List[DataArray]) – Reconstructed data of right field.
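The idea behind the reconstruction can be sketched with orthonormal singular vectors in NumPy: scores are mapped back to feature space by multiplying with the conjugate transpose of the vectors. The sketch works on centered anomalies with unnormalized scores and ignores any scaling or preprocessing that the model undoes; these simplifications and the helper `reconstruct` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = rng.standard_normal((n, 4))
Y = rng.standard_normal((n, 6))
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc.T @ Yc / (n - 1), full_matrices=False)
V = Vt.T


def reconstruct(scores, vectors):
    """Map scores (sample, mode) back to feature space via the singular vectors."""
    return scores @ vectors.conj().T


scores1, scores2 = Xc @ U, Yc @ V          # forward projection onto all 4 modes

# Using a subset of modes gives a low-rank approximation of the field.
k = 2
Xrec_k = reconstruct(scores1[:, :k], U[:, :k])
Xrec_all = reconstruct(scores1, U)         # U is square here, so this is exact
Yrec_all = reconstruct(scores2, V)         # V spans only 4 of 6 directions
```

The left field is recovered exactly from all four modes, whereas the right field, which has more features than there are modes, is only approximated.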

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) Self#

Load a saved model.

Parameters:
  • path (str) – Path to the saved model.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.

  • **kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

_BaseCrossModel

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)#

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".

  • save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.

  • **kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().

scores()#

Return the scores of the left and right field.

The scores in MCA are the projections of the left and right fields onto the left and right singular vectors of the cross-covariance matrix, respectively.

Returns:

  • scores1 (DataArray) – Left scores.

  • scores2 (DataArray) – Right scores.

scores_amplitude() Tuple[DataArray, DataArray]#

Compute the amplitude of the scores.

The amplitude of the scores is defined as

\[A_{ij} = |S_{ij}|\]

where \(S_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(|\cdot|\) denotes the absolute value.

Returns:

  • DataArray – Amplitude of the left scores.

  • DataArray – Amplitude of the right scores.

scores_phase() Tuple[DataArray, DataArray]#

Compute the phase of the scores.

The phase of the scores is defined as

\[\phi_{ij} = \arg(S_{ij})\]

where \(S_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(\arg(\cdot)\) denotes the argument of a complex number.

Returns:

  • DataArray – Phase of the left scores.

  • DataArray – Phase of the right scores.

serialize() DataTree#

Serialize a complete model with its preprocessors.

singular_values()#

Get the singular values of the cross-covariance matrix.

squared_covariance()#

Get the squared covariance.

The squared covariance corresponds to the explained variance in PCA and is given by the squared singular values of the cross-covariance matrix.

squared_covariance_fraction()#

Calculate the squared covariance fraction (SCF).

The SCF is a measure of the proportion of the total squared covariance that is explained by each mode i. It is computed as follows:

\[SCF_i = \frac{\sigma_i^2}{\sum_{j=1}^{m} \sigma_j^2}\]

where m is the total number of modes and \(\sigma_i\) is the ith singular value of the covariance matrix.
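A useful property of the denominator is that the sum of all squared singular values equals the squared Frobenius norm of the cross-covariance matrix, so it can be obtained without computing every mode. The NumPy sketch below checks this on random data; it illustrates the mathematics only and makes no claim about how the library evaluates the sum.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
X = rng.standard_normal((n, 8))
Y = rng.standard_normal((n, 5))
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
C = Xc.T @ Yc / (n - 1)                            # cross-covariance matrix

sigma = np.linalg.svd(C, compute_uv=False)         # all m = 5 singular values
total_squared_covariance = np.sum(np.abs(C) ** 2)  # squared Frobenius norm of C

# SCF of the first two modes, normalized by the exact total.
n_modes = 2
scf = sigma[:n_modes] ** 2 / total_squared_covariance
```

Because the total does not depend on how many modes are retained, the SCF of a given mode stays the same under truncation, unlike the CF.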

total_covariance() DataArray#

Get the total covariance.

This measure follows the definition of Cheng and Dunkerton (1995). Note that this measure is not an invariant in MCA.

transform(data1: DataArray | Dataset | List[DataArray | Dataset], data2: DataArray | Dataset | List[DataArray | Dataset])#

Get the expansion coefficients of “unseen” data.

The expansion coefficients are obtained by projecting data onto the singular vectors.

Parameters:
  • data1 (DataArray | Dataset | List[DataArray]) – Left input data. Must be provided if data2 is not provided.

  • data2 (DataArray | Dataset | List[DataArray]) – Right input data. Must be provided if data1 is not provided.

Returns:

  • scores1 (DataArray | Dataset | List[DataArray]) – Left scores.

  • scores2 (DataArray | Dataset | List[DataArray]) – Right scores.