xeofs.models.ExtendedEOF#

class xeofs.models.ExtendedEOF(n_modes: int, tau: int, embedding: int, n_pca_modes: int | None = None, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, solver: str = 'auto', random_state: int | None = None, solver_kwargs: dict = {}, **kwargs)#

Bases: EOF

Extended EOF analysis.

Extended EOF (EEOF) analysis [1] [2], often referred to as Multivariate/Multichannel Singular Spectrum Analysis, enhances traditional EOF analysis by identifying propagating signals or oscillations in multivariate datasets. This approach integrates the spatial correlation of EOFs with the temporal auto- and cross-correlation derived from the lagged covariance matrix.

Parameters:
  • n_modes (int) – Number of modes to be computed.

  • tau (int) – Time delay used to construct a time-delayed version of the original time series.

  • embedding (int) – Embedding dimension, i.e. the number of dimensions of the delay-coordinate space used to represent the dynamics of the system. It determines the number of delayed copies of the time series used to construct the delay-coordinate space.

  • n_pca_modes (Optional[int]) – If provided, the input data is first preprocessed using PCA with the specified number of modes. The EEOF analysis is then performed on the resulting PCA scores. This approach can lead to important computational savings.

  • **kwargs – Additional keyword arguments passed to the EOF model.
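The roles of tau and embedding can be illustrated with a minimal delay-embedding sketch in plain NumPy. This is a conceptual illustration, not the xeofs implementation; the helper delay_embed is hypothetical:

```python
import numpy as np

def delay_embed(x, tau, embedding):
    """Stack `embedding` delayed copies of x, each shifted by `tau` steps."""
    n = len(x) - tau * (embedding - 1)
    return np.column_stack([x[i * tau : i * tau + n] for i in range(embedding)])

x = np.arange(10)
X = delay_embed(x, tau=2, embedding=3)
# X has shape (6, 3); each row is [x[t], x[t+2], x[t+4]]
```

EEOF analysis then applies an EOF decomposition to this delay-coordinate matrix, which is why the lagged auto- and cross-correlations enter the covariance structure.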

References

Examples

>>> from xeofs.models import EEOF
>>> model = EEOF(n_modes=5, tau=1, embedding=20, n_pca_modes=20)
>>> model.fit(data, dim="time")

Retrieve the extended empirical orthogonal functions (EEOFs) and their explained variance:

>>> eeofs = model.components()
>>> exp_var = model.explained_variance()

Retrieve the time-dependent coefficients corresponding to the EEOF modes:

>>> scores = model.scores()
__init__(n_modes: int, tau: int, embedding: int, n_pca_modes: int | None = None, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, solver: str = 'auto', random_state: int | None = None, solver_kwargs: dict = {}, **kwargs)#

Methods

__init__(n_modes, tau, embedding[, ...])

components()

Return the (EOF) components.

compute([verbose])

Compute and load delayed model results.

deserialize(dt)

Deserialize the model and its preprocessors from a DataTree.

explained_variance()

Return explained variance.

explained_variance_ratio()

Return explained variance ratio.

fit(X, dim[, weights])

Fit the model to the input data.

fit_transform(data, dim[, weights])

Fit the model to the input data and project the data onto the components.

get_params()

Get the model parameters.

get_serialization_attrs()

inverse_transform(scores[, normalized])

Reconstruct the original data from transformed data.

load(path[, engine])

Load a saved model.

save(path[, overwrite, save_data, engine])

Save the model.

scores([normalized])

Return the (PC) scores.

serialize()

Serialize a complete model with its preprocessor.

singular_values()

Return the singular values of the Singular Value Decomposition.

transform(data[, normalized])

Project data onto the components.

components() DataArray | Dataset | List[DataArray | Dataset]#

Return the (EOF) components.

The components in EOF analysis are the eigenvectors of the covariance/correlation matrix. Other names include the principal components or EOFs.

Returns:

components – Components of the fitted model.

Return type:

DataArray | Dataset | List[DataArray]

compute(verbose: bool = False, **kwargs)#

Compute and load delayed model results.

Parameters:
  • verbose (bool) – Whether or not to provide additional information about the computing progress.

  • **kwargs – Additional keyword arguments to pass to dask.compute().

classmethod deserialize(dt: DataTree) Self#

Deserialize the model and its preprocessors from a DataTree.

explained_variance() DataArray#

Return explained variance.

The explained variance \(\lambda_i\) is the variance explained by each mode. It is defined as

\[\lambda_i = \frac{\sigma_i^2}{N-1}\]

where \(\sigma_i\) is the singular value of the \(i\)-th mode and \(N\) is the number of samples. Equivalently, \(\lambda_i\) is the \(i\)-th eigenvalue of the covariance matrix.
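The relation between singular values and explained variance can be checked numerically with plain NumPy (an illustrative sketch, not the xeofs implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))   # N=100 samples, 5 features
X -= X.mean(axis=0)                 # center the data

_, s, _ = np.linalg.svd(X, full_matrices=False)
expl_var = s**2 / (X.shape[0] - 1)  # lambda_i = sigma_i^2 / (N - 1)

# lambda_i equals the i-th eigenvalue of the covariance matrix
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
```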

Returns:

explained_variance – Explained variance.

Return type:

DataArray

explained_variance_ratio() DataArray#

Return explained variance ratio.

The explained variance ratio \(\gamma_i\) is the variance explained by each mode normalized by the total variance. It is defined as

\[\gamma_i = \frac{\lambda_i}{\sum_{j=1}^M \lambda_j}\]

where \(\lambda_i\) is the explained variance of the \(i\)-th mode and \(M\) is the total number of modes.
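As a quick numerical illustration with made-up variances:

```python
import numpy as np

expl_var = np.array([4.0, 3.0, 2.0, 1.0])  # hypothetical lambda_i per mode
ratio = expl_var / expl_var.sum()          # gamma_i = lambda_i / sum_j lambda_j
# ratio -> [0.4, 0.3, 0.2, 0.1]; the ratios sum to 1 by construction
```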

Returns:

explained_variance_ratio – Explained variance ratio.

Return type:

DataArray

fit(X: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None) Self#

Fit the model to the input data.

Parameters:
  • X (DataArray | Dataset | List[DataArray]) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataArray | Dataset | List[DataArray]]) – Weighting factors for the input data.

fit_transform(data: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None, **kwargs) DataArray#

Fit the model to the input data and project the data onto the components.

Parameters:
  • data (DataObject) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataObject]) – Weighting factors for the input data.

  • **kwargs – Additional keyword arguments to pass to the transform method.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray

get_params() Dict[str, Any]#

Get the model parameters.

inverse_transform(scores: DataArray, normalized: bool = True) DataArray | Dataset | List[DataArray | Dataset]#

Reconstruct the original data from transformed data.

Parameters:
  • scores (DataArray) – Transformed data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a ‘mode’ dimension.

  • normalized (bool, default=True) – Whether the scores data have been normalized by the L2 norm.

Returns:

data – Reconstructed data.

Return type:

DataArray | Dataset | List[DataArray]
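The underlying idea of reconstructing data from a subset of modes can be sketched with a plain-NumPy SVD (a conceptual analogue under the usual EOF assumptions, not the xeofs code path):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 8))
X -= X.mean(axis=0)                 # center, as EOF analysis assumes

U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * s                      # unnormalized PC scores
k = 3
X_rec = scores[:, :k] @ Vt[:k]      # reconstruction from the first k modes only
```

Using all modes recovers the centered data exactly; truncating to k modes gives the best rank-k approximation in the least-squares sense.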

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) Self#

Load a saved model.

Parameters:
  • path (str) – Path to the saved model.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.

  • **kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

_BaseModel

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)#

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".

  • save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.

  • **kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().

scores(normalized: bool = True) DataArray#

Return the (PC) scores.

The scores in EOF analysis are the projection of the data matrix onto the eigenvectors of the covariance (or correlation) matrix. Other names include the principal component (PC) scores or just PCs.

Parameters:

normalized (bool, default=True) – Whether to normalize the scores by the L2 norm (singular values).

Returns:

scores – Scores of the fitted model.

Return type:

DataArray | Dataset | List[DataArray]

serialize() DataTree#

Serialize a complete model with its preprocessor.

singular_values() DataArray#

Return the singular values of the Singular Value Decomposition.

Returns:

singular_values – Singular values obtained from the SVD.

Return type:

DataArray

transform(data: List[DataArray | Dataset] | DataArray | Dataset, normalized=True) DataArray#

Project data onto the components.

Parameters:
  • data (DataArray | Dataset | List[DataArray]) – Data to be transformed.

  • normalized (bool, default=True) – Whether to normalize the scores by the L2 norm.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray
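Projection of unseen data onto fitted components, and the effect of the normalized flag, can be sketched with plain NumPy (illustrative only; the actual projection in xeofs goes through its preprocessor):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((60, 4))
X -= X.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt are the components

X_new = rng.standard_normal((5, 4))   # unseen samples with the same features
proj = X_new @ Vt.T                   # raw projections onto the components
proj_norm = proj / s                  # scores normalized by the singular values
```

For the training data itself, this projection reproduces the unnormalized scores, i.e. X @ Vt.T equals U * s.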