xeofs.models.ComplexEOF

class xeofs.models.ComplexEOF(n_modes: int = 2, padding: str = 'exp', decay_factor: float = 0.2, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, verbose: bool = False, random_state: int | None = None, solver: str = 'auto', solver_kwargs: Dict = {}, **kwargs)

Bases: EOF

Complex EOF analysis.

The Complex EOF analysis [1] [2] [3] [4] (also known as Hilbert EOF analysis) applies a Hilbert transform to the data before performing the standard EOF analysis. The Hilbert transform is applied to each feature of the data individually.
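The idea can be sketched with plain NumPy. The snippet below is an illustration under stated assumptions, not the xeofs implementation: the helper analytic_signal is hypothetical, the toy data is made up, and no padding or weighting is applied. It builds the complex-valued (analytic) signal of each feature with an FFT and then runs an SVD on the centered complex data matrix.

```python
import numpy as np


def analytic_signal(x, axis=0):
    """Return x + i*H[x] along `axis`, computed with the FFT.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[axis]
    spectrum = np.fft.fft(x, axis=axis)
    # Frequency-domain filter: keep DC (and Nyquist for even n),
    # double the positive frequencies, zero out the negative ones.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    shape = [1] * x.ndim
    shape[axis] = n
    return np.fft.ifft(spectrum * h.reshape(shape), axis=axis)


# Toy data: 64 samples (rows) x 3 features (columns) of a propagating wave.
t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
phase_shift = np.array([0.0, 0.5, 1.0])
X = np.cos(4.0 * t[:, None] - phase_shift[None, :])

Z = analytic_signal(X, axis=0)  # Hilbert transform applied per feature
Z = Z - Z.mean(axis=0)          # center each feature
U, s, Vh = np.linalg.svd(Z, full_matrices=False)
```

For this toy wave the imaginary part of Z is the sine counterpart of X, and the complex data matrix has rank one: a single complex mode captures the propagating signal, with the phase shifts between features stored in the phase of the component.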

An optional padding with exponentially decaying values can be applied prior to the Hilbert transform in order to mitigate the impact of spectral leakage.
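As a rough sketch of what exponential padding can look like, the hypothetical helper pad_exp below extends a 1-D series on both sides with values that start at the boundary value and decay exponentially toward zero. The padding length (one series length per side) and the exact decay formula are assumptions made for illustration; the xeofs internals may differ.

```python
import numpy as np


def pad_exp(y, decay_factor=0.2):
    """Pad a 1-D series on both sides with exponentially decaying values.

    Hypothetical helper for illustration, not the xeofs implementation.
    """
    y = np.asarray(y, dtype=float)
    k = np.arange(1, y.size + 1)
    decay = np.exp(-decay_factor * k)
    left = (y[0] * decay)[::-1]  # decays toward zero away from the series
    right = y[-1] * decay
    return np.concatenate([left, y, right])


y = np.array([1.0, 2.0, 3.0, 4.0])
padded = pad_exp(y, decay_factor=0.2)
core = padded[y.size : 2 * y.size]  # the original series is the middle third
```

In such a scheme the transform would be applied to the padded series and only the middle part kept, so that the abrupt start and end of the record contribute less spectral leakage. A larger decay_factor makes the padded values fall off faster.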

Parameters:
  • n_modes (int) – Number of modes to calculate.

  • padding (str, optional) – Specifies the method used for padding the data prior to applying the Hilbert transform. This can help to mitigate the effect of spectral leakage. Currently, only ‘exp’ for exponential padding is supported. Default is ‘exp’.

  • decay_factor (float, optional) – Specifies the decay factor used in the exponential padding. This parameter is only used if padding='exp'. Recommended values typically range from 0.05 to 0.2, but the best choice ultimately depends on the variability of the data. A smaller value (e.g. 0.05) is recommended for data with high variability, while a larger value (e.g. 0.2) is recommended for data with low variability. Default is 0.2.

  • center (bool, default=True) – Whether to center the input data.

  • standardize (bool) – Whether to standardize the input data.

  • use_coslat (bool) – Whether to use cosine of latitude for scaling.

  • sample_name (str, default="sample") – Name of the sample dimension.

  • feature_name (str, default="feature") – Name of the feature dimension.

  • compute (bool, default=True) – Whether to compute elements of the model eagerly, or to defer computation. If True, four pieces of the fit will be computed sequentially: 1) the preprocessor scaler, 2) optional NaN checks, 3) SVD decomposition, 4) scores and components.

  • verbose (bool, default=False) – Whether to show a progress bar when computing the decomposition.

  • random_state (Optional[int], default=None) – Seed for the random number generator.

  • solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.

  • solver_kwargs (dict, optional) – Additional keyword arguments to be passed to the SVD solver.

References

Examples

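>>> from xeofs.models import ComplexEOF
>>> model = ComplexEOF(n_modes=5, standardize=True)
>>> model.fit(data, dim="time")  # "time" is an assumed name for the sample dimension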
__init__(n_modes: int = 2, padding: str = 'exp', decay_factor: float = 0.2, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, verbose: bool = False, random_state: int | None = None, solver: str = 'auto', solver_kwargs: Dict = {}, **kwargs)

Methods

__init__([n_modes, padding, decay_factor, ...])

components()

Return the (EOF) components.

components_amplitude()

Return the amplitude of the (EOF) components.

components_phase()

Return the phase of the (EOF) components.

compute([verbose])

Compute and load delayed model results.

deserialize(dt)

Deserialize the model and its preprocessors from a DataTree.

explained_variance()

Return explained variance.

explained_variance_ratio()

Return explained variance ratio.

fit(X, dim[, weights])

Fit the model to the input data.

fit_transform(data, dim[, weights])

Fit the model to the input data and project the data onto the components.

get_params()

Get the model parameters.

get_serialization_attrs()

inverse_transform(scores[, normalized])

Reconstruct the original data from transformed data.

load(path[, engine])

Load a saved model.

save(path[, overwrite, save_data, engine])

Save the model.

scores([normalized])

Return the (PC) scores.

scores_amplitude([normalized])

Return the amplitude of the (PC) scores.

scores_phase()

Return the phase of the (PC) scores.

serialize()

Serialize a complete model with its preprocessor.

singular_values()

Return the singular values of the Singular Value Decomposition.

transform(data[, normalized])

Project data onto the components.

components() → DataArray | Dataset | List[DataArray | Dataset]

Return the (EOF) components.

The components in EOF analysis are the eigenvectors of the covariance/correlation matrix. Other names include the principal components or EOFs.

Returns:

components – Components of the fitted model.

Return type:

DataArray | Dataset | List[DataArray]

components_amplitude() → DataArray | Dataset | List[DataArray | Dataset]

Return the amplitude of the (EOF) components.

The amplitude of the components is defined as

\[A_{ij} = |C_{ij}|\]

where \(C_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(|\cdot|\) denotes the absolute value.
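A minimal NumPy illustration of this definition, using a made-up complex component matrix C (3 features by 2 modes) rather than the output of a fitted model:

```python
import numpy as np

# Hypothetical complex components: 3 features (rows) x 2 modes (columns).
C = np.array(
    [
        [3 + 4j, 1 + 0j],
        [0 + 2j, -1 + 1j],
        [1 - 1j, 0 - 5j],
    ]
)

# Element-wise modulus |C_ij| of each complex entry.
A = np.abs(C)
```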

Returns:

components_amplitude – Amplitude of the components of the fitted model.

Return type:

DataArray | Dataset | List[DataArray]

components_phase() → DataArray | Dataset | List[DataArray | Dataset]

Return the phase of the (EOF) components.

The phase of the components is defined as

\[\phi_{ij} = \arg(C_{ij})\]

where \(C_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(\arg(\cdot)\) denotes the argument of a complex number.
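The same definition in NumPy, again with a made-up complex matrix C instead of a fitted model. np.angle returns the argument in radians in the interval (-pi, pi]:

```python
import numpy as np

# Hypothetical complex components: 2 features (rows) x 2 modes (columns).
C = np.array(
    [
        [1 + 0j, 0 + 1j],
        [-1 + 0j, 1 + 1j],
    ]
)

# Element-wise argument arg(C_ij), in radians.
phi = np.angle(C)
```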

Returns:

components_phase – Phase of the components of the fitted model.

Return type:

DataArray | Dataset | List[DataArray]

compute(verbose: bool = False, **kwargs)

Compute and load delayed model results.

Parameters:
  • verbose (bool) – Whether or not to provide additional information about the computing progress.

  • **kwargs – Additional keyword arguments to pass to dask.compute().

classmethod deserialize(dt: DataTree) → Self

Deserialize the model and its preprocessors from a DataTree.

explained_variance() → DataArray

Return explained variance.

The explained variance \(\lambda_i\) is the variance explained by each mode. It is defined as

\[\lambda_i = \frac{\sigma_i^2}{N-1}\]

where \(\sigma_i\) is the singular value of the \(i\)-th mode and \(N\) is the number of samples. Equivalently, \(\lambda_i\) is the \(i\)-th eigenvalue of the covariance matrix.
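The equivalence between the two definitions can be checked numerically. The sketch below uses random real-valued toy data and plain NumPy (it does not call xeofs): the squared singular values of the centered data matrix, divided by N-1, match the eigenvalues of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))  # N = 50 samples, 4 features
X = X - X.mean(axis=0)            # center each feature

# Explained variance from the singular values of the data matrix.
sigma = np.linalg.svd(X, compute_uv=False)
explained_variance = sigma**2 / (X.shape[0] - 1)

# Eigenvalues of the sample covariance matrix, sorted in descending order.
cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
```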

Returns:

explained_variance – Explained variance.

Return type:

DataArray

explained_variance_ratio() → DataArray

Return explained variance ratio.

The explained variance ratio \(\gamma_i\) is the variance explained by each mode normalized by the total variance. It is defined as

\[\gamma_i = \frac{\lambda_i}{\sum_{j=1}^M \lambda_j}\]

where \(\lambda_i\) is the explained variance of the \(i\)-th mode and \(M\) is the total number of modes.
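A small worked example of the normalization, with made-up explained variances for M = 3 modes:

```python
import numpy as np

# Hypothetical explained variances of all M = 3 modes.
explained_variance = np.array([6.0, 3.0, 1.0])

# Each mode's share of the total variance.
ratio = explained_variance / explained_variance.sum()
```

Here the first mode accounts for 60 % of the total variance, the second for 30 % and the third for 10 %.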

Returns:

explained_variance_ratio – Explained variance ratio.

Return type:

DataArray

fit(X: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None) → Self

Fit the model to the input data.

Parameters:
  • X (DataArray | Dataset | List[DataArray]) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataArray | Dataset | List[DataArray]]) – Weighting factors for the input data.

fit_transform(data: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None, **kwargs) → DataArray

Fit the model to the input data and project the data onto the components.

Parameters:
  • data (DataObject) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataObject]) – Weighting factors for the input data.

  • **kwargs – Additional keyword arguments to pass to the transform method.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray

get_params() → Dict[str, Any]

Get the model parameters.

inverse_transform(scores: DataArray, normalized: bool = True) → DataArray | Dataset | List[DataArray | Dataset]

Reconstruct the original data from transformed data.

Parameters:
  • scores (DataArray) – Transformed data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a ‘mode’ dimension.

  • normalized (bool, default=True) – Whether the scores data have been normalized by the L2 norm.

Returns:

data – Reconstructed data.

Return type:

DataArray | Dataset | List[DataArray]
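The role of the normalized flag can be illustrated with a plain SVD. This is a sketch on random real-valued toy data, assuming centering is the only preprocessing step; the helper reconstruct is hypothetical and is not the xeofs implementation. Normalized scores have unit L2 norm and must be rescaled by the singular values before they are multiplied with the components.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 5))
mean = X.mean(axis=0)
U, s, Vh = np.linalg.svd(X - mean, full_matrices=False)

scores_normalized = U  # each column has unit L2 norm
scores_raw = U * s     # columns scaled by the singular values


def reconstruct(scores, normalized=True, k=None):
    """Rebuild the data from the first k modes (hypothetical helper)."""
    k = s.size if k is None else k
    sc = scores[:, :k] * s[:k] if normalized else scores[:, :k]
    return sc @ Vh[:k] + mean  # undo the centering at the end


X_full = reconstruct(scores_normalized, normalized=True)
X_rank2 = reconstruct(scores_raw, normalized=False, k=2)
```

Using all modes recovers the input exactly, while a subset of modes gives a low-rank approximation whose error is set by the discarded singular values.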

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) → Self

Load a saved model.

Parameters:
  • path (str) – Path to the saved model.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.

  • **kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

_BaseModel

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".

  • save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.

  • **kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().

scores(normalized: bool = True) → DataArray

Return the (PC) scores.

The scores in EOF analysis are the projection of the data matrix onto the eigenvectors of the covariance (or correlation) matrix. Other names include the principal component (PC) scores or just PCs.

Parameters:

normalized (bool, default=True) – Whether to normalize the scores by the L2 norm (singular values).

Returns:

scores – Scores of the fitted model.

Return type:

DataArray

scores_amplitude(normalized=True) → DataArray

Return the amplitude of the (PC) scores.

The amplitude of the scores is defined as

\[A_{ij} = |S_{ij}|\]

where \(S_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(|\cdot|\) denotes the absolute value.

Parameters:

normalized (bool, default=True) – Whether to normalize the scores by the singular values.

Returns:

scores_amplitude – Amplitude of the scores of the fitted model.

Return type:

DataArray

scores_phase() → DataArray

Return the phase of the (PC) scores.

The phase of the scores is defined as

\[\phi_{ij} = \arg(S_{ij})\]

where \(S_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(\arg(\cdot)\) denotes the argument of a complex number.

Returns:

scores_phase – Phase of the scores of the fitted model.

Return type:

DataArray

serialize() → DataTree

Serialize a complete model with its preprocessor.

singular_values() → DataArray

Return the singular values of the Singular Value Decomposition.

Returns:

singular_values – Singular values obtained from the SVD.

Return type:

DataArray

transform(data: List[DataArray | Dataset] | DataArray | Dataset, normalized=True) → DataArray

Project data onto the components.

Parameters:
  • data (DataArray | Dataset | List[DataArray]) – Data to be transformed.

  • normalized (bool, default=True) – Whether to normalize the scores by the L2 norm.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray
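Conceptually, the projection multiplies the preprocessed data with the components. The sketch below shows this with a plain SVD on random real-valued toy data, assuming centering is the only preprocessing step. The helper project is hypothetical and no Hilbert transform is applied, so it illustrates the linear algebra only, not what xeofs does internally.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 6))  # training data: 40 samples, 6 features
mean = X.mean(axis=0)
U, s, Vh = np.linalg.svd(X - mean, full_matrices=False)


def project(data, normalized=True):
    """Project data onto the components (hypothetical helper)."""
    # Apply the centering learned during the fit, then project
    # onto the (conjugated) components.
    scores = (data - mean) @ Vh.conj().T
    # Optionally divide by the singular values to normalize the scores.
    return scores / s if normalized else scores


train_scores = project(X)                          # equals U
new_scores = project(rng.standard_normal((3, 6)))  # unseen samples
```

Projecting the training data with normalized=True reproduces the unit-norm scores of the fit; unseen samples with the same features are mapped into the same mode space.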