HilbertMCA#

class HilbertMCA(n_modes: int = 2, padding: Sequence[str] | str | None = 'exp', decay_factor: Sequence[float] | float = 0.2, standardize: Sequence[bool] | bool = False, use_coslat: Sequence[bool] | bool = False, check_nans: Sequence[bool] | bool = True, use_pca: Sequence[bool] | bool = True, n_pca_modes: Sequence[float | int | str] | float | int | str = 0.999, pca_init_rank_reduction: Sequence[float] | float = 0.3, compute: bool = True, sample_name: str = 'sample', feature_name: Sequence[str] | str = 'feature', solver: str = 'auto', random_state: Generator | int | None = None, solver_kwargs: dict = {})#

Hilbert MCA.

Hilbert MCA [1] (also known as Analytical SVD) extends MCA by examining amplitude-phase relationships. It augments the input data with its Hilbert transform, creating a complex-valued field.

This method solves the following optimization problem:

\(\max_{q_x, q_y} \left( q_x^H X^H Y q_y \right)\)

subject to the constraints:

\(q_x^H q_x = 1, \quad q_y^H q_y = 1\)

where \(H\) denotes the conjugate transpose and \(X\) and \(Y\) are the augmented data matrices.
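The optimization problem above is solved by the leading singular vectors of \(X^H Y\). The following is a minimal NumPy sketch of that fact, not the library's implementation: the random complex matrices merely stand in for the augmented fields, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_x, n_y = 50, 6, 4

# Complex-valued stand-ins for the augmented (data + i * Hilbert transform) fields.
X = rng.standard_normal((n_samples, n_x)) + 1j * rng.standard_normal((n_samples, n_x))
Y = rng.standard_normal((n_samples, n_y)) + 1j * rng.standard_normal((n_samples, n_y))

# Matrix X^H Y (normalization omitted; it does not change the maximizers).
C = X.conj().T @ Y

# SVD: C = U diag(s) V^H, with singular values sorted in descending order.
U, s, Vh = np.linalg.svd(C, full_matrices=False)
q_x = U[:, 0]          # unit-norm maximizer for X
q_y = Vh[0].conj()     # unit-norm maximizer for Y

# Objective q_x^H X^H Y q_y evaluated at the leading singular vectors.
objective = q_x.conj() @ C @ q_y
```

At the maximizers the objective is real and equals the largest singular value `s[0]`; any other pair of unit vectors gives a value of smaller or equal magnitude.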

An optional padding with exponentially decaying values can be applied prior to the Hilbert transform in order to mitigate the impact of spectral leakage.
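To illustrate the augmentation step, here is a minimal sketch of building the complex-valued (analytic) signal from a real series with the FFT, the same construction used by `scipy.signal.hilbert`. The helper name `analytic_signal` is hypothetical, and the padding step is deliberately left out: the test signal contains a whole number of cycles, so no spectral leakage occurs.

```python
import numpy as np

def analytic_signal(x):
    """Return x + i * H[x], where H is the Hilbert transform, via the FFT."""
    n = x.shape[0]
    spectrum = np.fft.fft(x)
    # Keep the zero frequency, double positive frequencies, drop negative ones.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

t = np.arange(64)
x = np.cos(2 * np.pi * 4 * t / 64)   # exactly 4 cycles in the window
z = analytic_signal(x)               # complex-valued field: cos + i * sin
```

The real part of `z` reproduces the input and the imaginary part is its Hilbert transform. For series that do not fit a whole number of cycles into the window, the edges of `z` are distorted, which is the effect the optional padding is meant to reduce.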

Parameters:

n_modes (int, default=2) – Number of modes to calculate.
padding (Sequence[str] | str | None, default="exp") – Padding method for the Hilbert transform. Available options are None (no padding) and "exp" (exponential decay).
decay_factor (Sequence[float] | float, default=0.2) – Decay factor for the exponential padding.
standardize (Sequence[bool] | bool, default=False) – Whether to standardize the input data. Generally not recommended as standardization can be managed by the degree of whitening.
use_coslat (Sequence[bool] | bool, default=False) – For data on a longitude-latitude grid, whether to correct for varying grid cell areas towards the poles by scaling each grid point with the square root of the cosine of its latitude.
use_pca (Sequence[bool] | bool, default=True) – Whether to preprocess each field individually by reducing dimensionality through PCA. The cross-covariance matrix is computed in the reduced principal component space.
n_pca_modes (Sequence[int | float | str] | int | float | str, default=0.999) – Number of modes to retain during PCA preprocessing step. If int, specifies the exact number of modes; if float, specifies the fraction of variance to retain; if “all”, all modes are retained.
pca_init_rank_reduction (Sequence[float] | float, default=0.3) – Relevant when use_pca=True and n_pca_modes is a float. Specifies the initial fraction of rank reduction for faster PCA computation via randomized SVD.
check_nans (Sequence[bool] | bool, default=True) – Whether to check for NaNs in the input data. Set to False for lazy model evaluation.
compute (bool, default=True) – Whether to compute the model elements eagerly. If True, the following are computed sequentially: preprocessor scaler, optional NaN checks, SVD decomposition, scores, and components.
random_state (numpy.random.Generator | int | None, default=None) – Seed for the random number generator.
sample_name (str, default="sample") – Name for the new sample dimension.
feature_name (Sequence[str] | str, default="feature") – Name for the new feature dimension.
solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.
solver_kwargs (dict, default={}) – Additional keyword arguments passed to the SVD solver function.
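As an illustration of the `use_coslat` option, the sketch below computes the square-root-of-cosine-latitude weights described above. The helper name `coslat_weights` is hypothetical and this is not the library's code; it only shows the weighting rule.

```python
import math

def coslat_weights(latitudes_deg):
    """Return sqrt(cos(latitude)) for each latitude given in degrees."""
    return [math.sqrt(math.cos(math.radians(lat))) for lat in latitudes_deg]

# Weights shrink towards the poles, where grid cells cover less area.
weights = coslat_weights([0.0, 30.0, 60.0])
```

The square root is used because covariances involve products of two weighted values, so each grid point effectively contributes in proportion to the cosine of its latitude.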

References

Examples

>>> model = HilbertMCA(n_modes=5)
>>> model.fit(X, Y, "time")

__init__(n_modes: int = 2, padding: Sequence[str] | str | None = 'exp', decay_factor: Sequence[float] | float = 0.2, standardize: Sequence[bool] | bool = False, use_coslat: Sequence[bool] | bool = False, check_nans: Sequence[bool] | bool = True, use_pca: Sequence[bool] | bool = True, n_pca_modes: Sequence[float | int | str] | float | int | str = 0.999, pca_init_rank_reduction: Sequence[float] | float = 0.3, compute: bool = True, sample_name: str = 'sample', feature_name: Sequence[str] | str = 'feature', solver: str = 'auto', random_state: Generator | int | None = None, solver_kwargs: dict = {})#

Methods

`__init__`([n_modes, padding, decay_factor, ...])
`components`([normalized])	Get the components of the model.
`components_amplitude`([normalized])	Get the amplitude of the components.
`components_phase`([normalized])	Get the phase of the components.
`compute`(**kwargs)	Compute and load delayed model results.
`correlation_coefficients_X`()	Get the correlation coefficients for the scores of \(X\).
`correlation_coefficients_Y`()	Get the correlation coefficients for the scores of \(Y\).
`covariance_fraction_CD95`()	Get the covariance fraction (CF).
`cross_correlation_coefficients`()	Get the cross-correlation coefficients.
`deserialize`(dt)	Deserialize the model and its preprocessors from a DataTree.
`fit`(X, Y, dim[, weights_X, weights_Y])	Fit the data to the model.
`fraction_variance_X_explained_by_X`()	Get the fraction of variance explained (FVE X).
`fraction_variance_Y_explained_by_X`()	Get the fraction of variance explained (FVE YX).
`fraction_variance_Y_explained_by_Y`()	Get the fraction of variance explained (FVE Y).
`get_params`()	Get the model parameters.
`get_serialization_attrs`()	Get the attributes needed to serialize the model.
`heterogeneous_patterns`([correction, alpha])	Get the heterogeneous correlation patterns.
`homogeneous_patterns`([correction, alpha])	Get the homogeneous correlation patterns.
`inverse_transform`([X, Y])	Reconstruct the original data from transformed data.
`load`(path[, engine])	Load a saved model.
`predict`(X)	Predict Y from X.
`save`(path[, overwrite, save_data, engine])	Save the model.
`scores`([normalized])	Get the scores of the model.
`scores_amplitude`([normalized])	Get the amplitude of the scores.
`scores_phase`([normalized])	Get the phase of the scores.
`serialize`()	Serialize a complete model with its preprocessor.
`squared_covariance_fraction`()	Get the squared covariance fraction (SCF).
`transform`([X, Y, normalized])	Transform the input data into the component space.

components(normalized=True)#

Get the components of the model.

The components may be referred to differently depending on the model type. Common terms include canonical vectors, singular vectors, loadings or spatial patterns.

Parameters:: normalized (bool, default=True) – Whether to return L2 normalized components.
Returns:: Components of X and Y.
Return type:: tuple[DataObject, DataObject]

components_amplitude([normalized])#

Get the amplitude of the components.

The amplitudes of the components are defined as

\[A_{x, ij} = |p_{x, ij}|\]

\[A_{y, ij} = |p_{y, ij}|\]

where \(p_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(|\cdot|\) denotes the absolute value.

Returns:: Component amplitudes of \(X\) and \(Y\).
Return type:: tuple[DataObject, DataObject]

components_phase([normalized])#

Get the phase of the components.

The phases of the components are defined as

\[\phi_{x, ij} = \arg(p_{x, ij})\]

\[\phi_{y, ij} = \arg(p_{y, ij})\]

where \(p_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(\arg(\cdot)\) denotes the argument of a complex number.

Returns:: Component phases of \(X\) and \(Y\).
Return type:: tuple[DataObject, DataObject]
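The amplitude and phase definitions above amount to taking the modulus and the argument of each complex entry. A minimal sketch with the standard library, using made-up component values for three grid points of a single mode:

```python
import cmath

# Hypothetical complex component entries p_ij for three grid points of one mode.
component = [2 + 0j, 0 + 1j, -3 + 0j]

amplitude = [abs(p) for p in component]      # |p_ij|
phase = [cmath.phase(p) for p in component]  # arg(p_ij), in radians within [-pi, pi]
```

Here the second grid point is shifted by a quarter cycle and the third by half a cycle relative to the first, which is the kind of phase relationship the complex components encode.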

compute(**kwargs)#

Compute and load delayed model results.

Parameters:: **kwargs – Additional keyword arguments to pass to dask.compute().

correlation_coefficients_X()#

Get the correlation coefficients for the scores of \(X\).

The correlation coefficients of the scores of \(X\) are given by:

\[c_{x, ij} = \text{corr} \left(\mathbf{r}_{x, i}, \mathbf{r}_{x, j} \right)\]

where \(\mathbf{r}_{x, i}\) and \(\mathbf{r}_{x, j}\) are the \(i\)-th and \(j\)-th scores of \(X\).

correlation_coefficients_Y()#

Get the correlation coefficients for the scores of \(Y\).

The correlation coefficients of the scores of \(Y\) are given by:

\[c_{y, ij} = \text{corr} \left(\mathbf{r}_{y, i}, \mathbf{r}_{y, j} \right)\]

where \(\mathbf{r}_{y, i}\) and \(\mathbf{r}_{y, j}\) are the \(i\)-th and \(j\)-th scores of \(Y\).

covariance_fraction_CD95()#

Get the covariance fraction (CF).

Cheng and Dunkerton (1995) [3] define the CF as follows:

\[CF_i = \frac{\sigma_i}{\sum_{j=1}^{m} \sigma_j}\]

where \(m\) is the total number of modes and \(\sigma_i\) is the \(i\)-th singular value of the cross-covariance matrix.

This implementation estimates the sum of singular values from the first n modes, therefore one should aim to retain as many modes as possible to get a good estimate of the covariance fraction.
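A small worked example of the covariance fraction, using made-up singular values, shows why truncating the sum matters. The function name is illustrative only.

```python
def covariance_fraction(singular_values):
    """CF_i = sigma_i / sum_j sigma_j over the singular values provided."""
    total = sum(singular_values)
    return [s / total for s in singular_values]

# Hypothetical full spectrum of singular values.
sigma = [4.0, 2.0, 1.0, 1.0]

cf_all = covariance_fraction(sigma)            # sum taken over all four modes
cf_truncated = covariance_fraction(sigma[:2])  # sum estimated from two modes only
```

With all four modes the first mode has a CF of 0.5, whereas estimating the sum from only the first two modes inflates it to about 0.67.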

Note

In MCA, the focus is on maximizing the squared covariance (SC). As a result, this quantity is preserved during decomposition - meaning the SC of both datasets remains unchanged before and after decomposition. Each mode explains a fraction of the total SC, and together, all modes can reconstruct the total SC of the cross-covariance matrix. However, the (non-squared) covariance is not invariant in MCA; it is not preserved by the individual modes and cannot be reconstructed from them. Consequently, the squared covariance fraction (SCF) is invariant in MCA and is typically used to assess the relative importance of each mode. In contrast, the covariance fraction (CF) is not invariant. Cheng and Dunkerton [3] introduced the CF to compare the relative importance of modes before and after Varimax rotation in MCA. Notably, when the data fields in MCA are identical, the CF corresponds to the explained variance ratio in Principal Component Analysis (PCA).

References

cross_correlation_coefficients()#

Get the cross-correlation coefficients.

The cross-correlation coefficients between the scores of \(X\) and \(Y\) are computed as:

\[c_{xy, i} = \text{corr} \left(\mathbf{r}_{x, i}, \mathbf{r}_{y, i} \right)\]

where \(\mathbf{r}_{x, i}\) and \(\mathbf{r}_{y, i}\) are the \(i\)-th scores of \(X\) and \(Y\).

Notes

When \(\alpha=0\), the cross-correlation coefficients are equivalent to the canonical correlation coefficients.
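For complex-valued scores, a correlation coefficient carries a phase as well as a magnitude. The sketch below uses one common convention (conjugating the first argument); whether the library uses this exact convention is an assumption, and `complex_corr` is a hypothetical helper.

```python
import numpy as np

def complex_corr(a, b):
    """Correlation of two complex series; the first argument is conjugated."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    # np.vdot conjugates its first argument.
    return np.vdot(a_c, b_c) / (np.linalg.norm(a_c) * np.linalg.norm(b_c))

rng = np.random.default_rng(1)
r_x = rng.standard_normal(200) + 1j * rng.standard_normal(200)

# r_y is r_x rotated in the complex plane by a fixed phase lag.
lag = np.pi / 3
r_y = r_x * np.exp(1j * lag)

c = complex_corr(r_x, r_y)
```

In this constructed case the coefficient has magnitude 1 and its argument recovers the phase lag between the two score series.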

classmethod deserialize(dt: DataTree) → Self#

Deserialize the model and its preprocessors from a DataTree.

fit(X, Y, dim, weights_X=None, weights_Y=None)#

Fit the data to the model.

Parameters:

X (DataObject) – Data to be fitted.
Y (DataObject) – Data to be fitted.
dim (Hashable | Sequence[Hashable]) – Define the sample dimensions. The remaining dimensions will be treated as feature dimensions.
weights_X (DataObject | None, default=None) – Weights for X. If None, no weights are used.
weights_Y (DataObject | None, default=None) – Weights for Y. If None, no weights are used.

Returns:

Fitted model.

Return type:

xeofs MultiSetModel

fraction_variance_X_explained_by_X()#

Get the fraction of variance explained (FVE X).

The FVE X is the fraction of variance in \(X\) explained by the scores of \(X\). It is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[FVE_{X|X,i} = 1 - \frac{\|\mathbf{d}_{X,i}\|_F^2}{\|X\|_F^2}\]

where \(\mathbf{d}_{X,i}\) are the residuals of the input data \(X\) after reconstruction by the \(i\)-th scores of \(X\).
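The residual-based definition above can be sketched with NumPy as follows. Two assumptions are made for illustration: the reconstruction is a least-squares regression of \(X\) on a single score, and PCA scores of \(X\) stand in for the MCA scores so that the result can be checked against the known explained-variance ratio. The function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 5))
X -= X.mean(axis=0)  # center each feature

def fraction_variance_explained(X, score):
    """1 - ||d||_F^2 / ||X||_F^2, with d the residual after regressing X on one score."""
    score = score.reshape(-1, 1)
    # Least-squares pattern for this score, then rank-1 reconstruction (assumed scheme).
    pattern = np.linalg.lstsq(score, X, rcond=None)[0]
    residual = X - score @ pattern
    return 1.0 - np.linalg.norm(residual) ** 2 / np.linalg.norm(X) ** 2

# PCA scores of X used as stand-in scores.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
fve = [fraction_variance_explained(X, U[:, i] * s[i]) for i in range(len(s))]
```

For these stand-in scores each value equals \(\sigma_i^2 / \sum_j \sigma_j^2\), and the values sum to one.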

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

fraction_variance_Y_explained_by_X() → DataArray#

Get the fraction of variance explained (FVE YX).

The FVE YX is the fraction of variance in \(Y\) explained by the scores of \(X\). It is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[FVE_{Y|X,i} = 1 - \frac{\|(X^TX)^{-1/2} \mathbf{d}_{X,i}^T \mathbf{d}_{Y,i}\|_F^2}{\|(X^TX)^{-1/2} X^TY\|_F^2}\]

where \(\mathbf{d}_{X,i}\) and \(\mathbf{d}_{Y,i}\) are the residuals of the input data \(X\) and \(Y\) after reconstruction by the \(i\)-th scores of \(X\) and \(Y\), respectively.

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

fraction_variance_Y_explained_by_Y()#

Get the fraction of variance explained (FVE Y).

The FVE Y is the fraction of variance in \(Y\) explained by the scores of \(Y\). It is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[FVE_{Y|Y,i} = 1 - \frac{\|\mathbf{d}_{Y,i}\|_F^2}{\|Y\|_F^2}\]

where \(\mathbf{d}_{Y,i}\) are the residuals of the input data \(Y\) after reconstruction by the \(i\)-th scores of \(Y\).

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

get_params() → dict[str, Any]#

Get the model parameters.

get_serialization_attrs() → dict#

Get the attributes needed to serialize the model.

Returns:: Attributes needed to serialize the model.
Return type:: dict

heterogeneous_patterns(correction=None, alpha=0.05)#

Get the heterogeneous correlation patterns.

The heterogeneous patterns are the correlation coefficients between the input data and the scores of the other field:

\[G_{X, i} = \text{corr} \left(X, \mathbf{r}_{y,i} \right)\]

\[G_{Y, i} = \text{corr} \left(Y, \mathbf{r}_{x,i} \right)\]

where \(X\) and \(Y\) are the input data, and \(\mathbf{r}_{x,i}\) and \(\mathbf{r}_{y,i}\) are the \(i\)-th scores of \(X\) and \(Y\), respectively.

Parameters:

correction (str, default=None) – Method to apply a multiple testing correction. If None, no correction is applied. Available methods are:
    - bonferroni : one-step correction
    - sidak : one-step correction
    - holm-sidak : step-down method using Sidak adjustments
    - holm : step-down method using Bonferroni adjustments
    - simes-hochberg : step-up method (independent)
    - hommel : closed method based on Simes tests (non-negative)
    - fdr_bh : Benjamini/Hochberg (non-negative)
    - fdr_by : Benjamini/Yekutieli (negative)
    - fdr_tsbh : two-stage FDR correction (non-negative)
    - fdr_tsbky : two-stage FDR correction (non-negative)
alpha (float, default=0.05) – The desired family-wise error rate. Not used if correction is None.

Returns:

tuple[DataObject, DataObject] – Heterogeneous correlation patterns of X and Y.
tuple[DataObject, DataObject] – p-values of the heterogeneous correlation patterns of X and Y.
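To give a feel for what the `correction` options do to the p-values, here is a plain-Python sketch of two of the listed methods, Bonferroni and Holm. It is an illustration of the textbook procedures under the assumption that a hypothesis is rejected when its adjusted p-value is at most `alpha`; it is not the code the library calls, and the p-values are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """One-step correction: multiply each p-value by the number of tests."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return reject, adjusted

def holm(p_values, alpha=0.05):
    """Step-down correction using Bonferroni adjustments."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k), kept monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[idx]))
        adjusted[idx] = running_max
    reject = [p_adj <= alpha for p_adj in adjusted]
    return reject, adjusted

p = [0.01, 0.02, 0.03]  # hypothetical p-values for three grid points
reject_bonf, adj_bonf = bonferroni(p)
reject_holm, adj_holm = holm(p)
```

In this example Bonferroni keeps only the first grid point as significant, while the less conservative Holm procedure keeps all three.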

homogeneous_patterns(correction=None, alpha=0.05)#

Get the homogeneous correlation patterns.

The homogeneous correlation patterns are the correlation coefficients between the input data and the scores. They are defined as:

\[H_{X, i} = \text{corr} \left(X, \mathbf{r}_{x,i} \right)\]

\[H_{Y, i} = \text{corr} \left(Y, \mathbf{r}_{y,i} \right)\]

where \(X\) and \(Y\) are the input data, and \(\mathbf{r}_{x,i}\) and \(\mathbf{r}_{y,i}\) are the \(i\)-th scores of \(X\) and \(Y\), respectively.

Parameters:

correction (str, default=None) – Method to apply a multiple testing correction. If None, no correction is applied. Available methods are:
    - bonferroni : one-step correction
    - sidak : one-step correction
    - holm-sidak : step-down method using Sidak adjustments
    - holm : step-down method using Bonferroni adjustments
    - simes-hochberg : step-up method (independent)
    - hommel : closed method based on Simes tests (non-negative)
    - fdr_bh : Benjamini/Hochberg (non-negative)
    - fdr_by : Benjamini/Yekutieli (negative)
    - fdr_tsbh : two-stage FDR correction (non-negative)
    - fdr_tsbky : two-stage FDR correction (non-negative)
alpha (float, default=0.05) – The desired family-wise error rate. Not used if correction is None.

Returns:

tuple[DataObject, DataObject] – Homogeneous correlation patterns of X and Y.
tuple[DataObject, DataObject] – p-values of the homogeneous correlation patterns of X and Y.
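The difference between homogeneous and heterogeneous patterns is only which score series the data is correlated with. A minimal real-valued NumPy sketch follows; Hilbert MCA itself works with complex scores, and the data, scores and helper name here are made up for illustration.

```python
import numpy as np

def correlation_pattern(data, score):
    """Pearson correlation between every column of `data` and one score series."""
    data_c = data - data.mean(axis=0)
    score_c = score - score.mean()
    cov = data_c.T @ score_c
    return cov / (np.linalg.norm(data_c, axis=0) * np.linalg.norm(score_c))

rng = np.random.default_rng(3)
r_x = rng.standard_normal(100)                    # stand-in score of X
r_y = 0.8 * r_x + 0.6 * rng.standard_normal(100)  # stand-in score of Y, coupled to r_x
X = np.column_stack([r_x, -r_x, rng.standard_normal(100)])

homogeneous_X = correlation_pattern(X, r_x)    # corr(X, r_x): same field
heterogeneous_X = correlation_pattern(X, r_y)  # corr(X, r_y): other field
```

The first two grid points correlate perfectly (with opposite signs) with their own field's score, while their heterogeneous correlations are weaker because `r_y` is only partly coupled to `r_x`.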

inverse_transform([X, Y])#

Reconstruct the original data from transformed data.

Parameters:

X (DataArray | None) – Transformed X data to be reconstructed. At least one of X and Y must be provided.
Y (DataArray | None) – Transformed Y data to be reconstructed. At least one of X and Y must be provided.

Returns:

Reconstructed data.

Return type:

Sequence[DataObject] | DataObject

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) → Self#

Load a saved model.

Parameters:

path (str) – Path to the saved model.
engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.
**kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

BaseModel

predict(X: DataArray | Dataset | list[DataArray | Dataset]) → DataArray#

Predict Y from X.

Parameters:: X (DataObject) – Data to be used for prediction.
Returns:: Predicted data in transformed space.
Return type:: DataArray

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)#

Save the model.

Parameters:

path (str) – Path to save the model.
overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".
save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.
engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.
**kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().

scores(normalized=False) → tuple[DataArray, DataArray]#

Get the scores of the model.

The component scores may be referred to differently depending on the model type. Common terms include canonical variates, expansion coefficients, principal component (PC) scores or temporal patterns.

Parameters:: normalized (bool, default=False) – Whether to return L2 normalized scores.
Returns:: Scores of X and Y.
Return type:: tuple[DataArray, DataArray]

scores_amplitude(normalized=False) → tuple[DataArray, DataArray]#

Get the amplitude of the scores.

The amplitudes of the scores are defined as

\[A_{x, ij} = |r_{x, ij}|\]

\[A_{y, ij} = |r_{y, ij}|\]

where \(r_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(|\cdot|\) denotes the absolute value.

Returns:: Score amplitudes of \(X\) and \(Y\).
Return type:: tuple[DataArray, DataArray]

scores_phase(normalized=False) → tuple[DataArray, DataArray]#

Get the phase of the scores.

The phases of the scores are defined as

\[\phi_{x, ij} = \arg(r_{x, ij})\]

\[\phi_{y, ij} = \arg(r_{y, ij})\]

where \(r_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(\arg(\cdot)\) denotes the argument of a complex number.

Returns:: Score phases of \(X\) and \(Y\).
Return type:: tuple[DataArray, DataArray]
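For a score that behaves like a rotating complex exponential, the amplitude and phase definitions above separate the envelope from the oscillation. A minimal standard-library sketch with a made-up score series:

```python
import cmath

# Hypothetical complex score: constant amplitude 2, phase advancing 0.5 rad per step.
amplitude_true, omega = 2.0, 0.5
score = [amplitude_true * cmath.exp(1j * omega * t) for t in range(5)]

amplitude = [abs(r) for r in score]      # |r_ij|
phase = [cmath.phase(r) for r in score]  # arg(r_ij), in radians within [-pi, pi]
```

The amplitude stays at 2 for every time step while the phase grows linearly. Over longer series the phase wraps around at \(\pm\pi\).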

serialize() → DataTree#

Serialize a complete model with its preprocessor.

squared_covariance_fraction()#

Get the squared covariance fraction (SCF).

The SCF is computed as a weighted mean-square error (see equation (15) in Swenson (2015)):

\[SCF_{i} = 1 - \frac{\|\mathbf{d}_{X,i}^T \mathbf{d}_{Y,i}\|_F^2}{\|X^TY\|_F^2}\]

where \(\mathbf{d}_{X,i}\) and \(\mathbf{d}_{Y,i}\) are the residuals of the input data \(X\) and \(Y\) after reconstruction by the \(i\)-th scores of \(X\) and \(Y\), respectively.
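The residual form of the SCF can be checked numerically against the more familiar ratio of squared singular values. The sketch below uses real-valued random data and assumes that scores are projections onto the singular vectors and that each mode reconstructs the data as the outer product of its score and pattern. It illustrates the formula and is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 5))
Y = rng.standard_normal((60, 4))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Cross-covariance-like matrix (normalization cancels in the ratio) and its SVD.
C = X.T @ Y
U, s, Vt = np.linalg.svd(C, full_matrices=False)

def scf(i):
    """SCF of mode i from the residuals d_X and d_Y."""
    p_x, p_y = U[:, i], Vt[i]          # singular vectors (patterns)
    r_x, r_y = X @ p_x, Y @ p_y        # scores as projections (assumed)
    d_x = X - np.outer(r_x, p_x)       # residual of X after mode-i reconstruction
    d_y = Y - np.outer(r_y, p_y)       # residual of Y after mode-i reconstruction
    return 1.0 - np.linalg.norm(d_x.T @ d_y) ** 2 / np.linalg.norm(C) ** 2

scf_values = np.array([scf(i) for i in range(len(s))])
```

Under these assumptions each value equals \(\sigma_i^2 / \sum_j \sigma_j^2\), so the SCF values of all modes sum to one.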

References

Swenson, E. Continuum Power CCA: A Unified Approach for Isolating Coupled Modes. Journal of Climate 28, 1016–1030 (2015).

transform(X: DataArray | Dataset | list[DataArray | Dataset] | None = None, Y: DataArray | Dataset | list[DataArray | Dataset] | None = None, normalized=False) → Sequence[DataArray]#

Transform the input data into the component space.