xeofs.models.EOF

class xeofs.models.EOF(n_modes: int = 2, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans=True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, verbose: bool = False, random_state: int | None = None, solver: str = 'auto', solver_kwargs: Dict = {}, **kwargs)

Bases: _BaseModel

EOF analysis.

Empirical Orthogonal Functions (EOF) analysis, more commonly known as Principal Component Analysis (PCA).

Parameters:
  • n_modes (int, default=2) – Number of modes to calculate.

  • center (bool, default=True) – Whether to center the input data.

  • standardize (bool, default=False) – Whether to standardize the input data.

  • use_coslat (bool, default=False) – Whether to use cosine of latitude for scaling.

  • sample_name (str, default="sample") – Name of the sample dimension.

  • feature_name (str, default="feature") – Name of the feature dimension.

  • compute (bool, default=True) – Whether to compute elements of the model eagerly, or to defer computation. If True, four pieces of the fit will be computed sequentially: 1) the preprocessor scaler, 2) optional NaN checks, 3) SVD decomposition, 4) scores and components.

  • verbose (bool, default=False) – Whether to show a progress bar when computing the decomposition.

  • random_state (Optional[int], default=None) – Seed for the random number generator.

  • solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.

  • solver_kwargs (dict, default={}) – Additional keyword arguments to be passed to the SVD solver.
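The preprocessing options above (center, standardize, use_coslat) can be illustrated with a minimal NumPy sketch. It is not the xeofs implementation: the toy data, the `lat` array, and in particular the square-root-of-cosine weighting are assumptions, the latter being a common convention that makes the covariance area-weighted.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical toy data: 20 samples at 3 grid points with known latitudes.
lat = np.array([0.0, 30.0, 60.0])
X = rng.standard_normal((20, 3)) * np.array([1.0, 2.0, 3.0]) + 5.0

# center=True: remove the mean of every feature along the sample axis.
X_centered = X - X.mean(axis=0)

# standardize=True: additionally divide every feature by its standard deviation.
X_standardized = X_centered / X.std(axis=0)

# use_coslat=True (assumed convention): scale every feature by sqrt(cos(latitude)).
weights = np.sqrt(np.cos(np.deg2rad(lat)))
X_weighted = X_centered * weights
```

With this convention, the squared weights equal cos(latitude), so variances and covariances of the weighted data are scaled by the relative grid-cell area.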

Examples

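>>> import xeofs as xe
>>> model = xe.models.EOF(n_modes=5)
>>> # `dim` is required; "time" is assumed to be the sample dimension of `data`
>>> model.fit(data, dim="time")
>>> scores = model.scores()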
__init__(n_modes: int = 2, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans=True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, verbose: bool = False, random_state: int | None = None, solver: str = 'auto', solver_kwargs: Dict = {}, **kwargs)

Methods

__init__([n_modes, center, standardize, ...])

components()

Return the (EOF) components.

compute([verbose])

Compute and load delayed model results.

deserialize(dt)

Deserialize the model and its preprocessors from a DataTree.

explained_variance()

Return explained variance.

explained_variance_ratio()

Return explained variance ratio.

fit(X, dim[, weights])

Fit the model to the input data.

fit_transform(data, dim[, weights])

Fit the model to the input data and project the data onto the components.

get_params()

Get the model parameters.

get_serialization_attrs()

inverse_transform(scores[, normalized])

Reconstruct the original data from transformed data.

load(path[, engine])

Load a saved model.

save(path[, overwrite, save_data, engine])

Save the model.

scores([normalized])

Return the (PC) scores.

serialize()

Serialize a complete model with its preprocessor.

singular_values()

Return the singular values of the Singular Value Decomposition.

transform(data[, normalized])

Project data onto the components.

components() → DataArray | Dataset | List[DataArray | Dataset]

Return the (EOF) components.

The components in EOF analysis are the eigenvectors of the covariance matrix (or of the correlation matrix, when the data are standardized). They are also referred to as EOFs.
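The link between the components and the covariance matrix can be checked with a small NumPy sketch. It uses hypothetical random data and plain `numpy.linalg` routines, not xeofs itself, and assumes a samples-by-features layout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 samples, 4 features with clearly different variances.
X = rng.standard_normal((50, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
X = X - X.mean(axis=0)  # center each feature

# Rows of Vt are the right singular vectors, i.e. the components.
_, _, Vt = np.linalg.svd(X, full_matrices=False)

# Eigenvectors of the sample covariance matrix, reordered to decreasing eigenvalue.
C = X.T @ X / (X.shape[0] - 1)
_, eigvecs = np.linalg.eigh(C)
eigvecs = eigvecs[:, ::-1]

# Absolute inner product between matching vectors; 1 means equal up to sign.
overlap = np.abs(np.sum(Vt.T * eigvecs, axis=0))
```

The comparison uses absolute values because the sign of each eigenvector is arbitrary.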

Returns:

components – Components of the fitted model.

Return type:

DataArray | Dataset | List[DataArray]

compute(verbose: bool = False, **kwargs)

Compute and load delayed model results.

Parameters:
  • verbose (bool) – Whether or not to provide additional information about the computing progress.

  • **kwargs – Additional keyword arguments to pass to dask.compute().

classmethod deserialize(dt: DataTree) → Self

Deserialize the model and its preprocessors from a DataTree.

explained_variance() → DataArray

Return explained variance.

The explained variance \(\lambda_i\) is the variance explained by each mode. It is defined as

\[\lambda_i = \frac{\sigma_i^2}{N-1}\]

where \(\sigma_i\) is the singular value of the \(i\)-th mode and \(N\) is the number of samples. Equivalently, \(\lambda_i\) is the \(i\)-th eigenvalue of the covariance matrix.
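As a numerical illustration of this definition, the following NumPy sketch (hypothetical random data, not the xeofs code path) computes \(\lambda_i\) from the singular values and compares it with the eigenvalues of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical centered data matrix: N = 100 samples, 5 features.
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)
n_samples = X.shape[0]

# Singular values, returned in decreasing order.
sigma = np.linalg.svd(X, compute_uv=False)

# lambda_i = sigma_i^2 / (N - 1)
explained_variance = sigma**2 / (n_samples - 1)

# Eigenvalues of the covariance matrix (np.cov divides by N - 1), sorted decreasingly.
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
```

Both arrays agree to floating-point precision, which is the equivalence stated above.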

Returns:

explained_variance – Explained variance.

Return type:

DataArray

explained_variance_ratio() → DataArray

Return explained variance ratio.

The explained variance ratio \(\gamma_i\) is the variance explained by each mode normalized by the total variance. It is defined as

\[\gamma_i = \frac{\lambda_i}{\sum_{j=1}^M \lambda_j}\]

where \(\lambda_i\) is the explained variance of the \(i\)-th mode and \(M\) is the total number of modes.
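A short worked example of the ratio, using made-up explained variances and assuming the sum runs over all \(M\) modes of the data:

```python
import numpy as np

# Hypothetical explained variances of M = 4 modes.
explained_variance = np.array([4.0, 3.0, 2.0, 1.0])

# gamma_i = lambda_i / sum_j lambda_j
explained_variance_ratio = explained_variance / explained_variance.sum()
```

Here the total variance is 10, so the ratios are 0.4, 0.3, 0.2 and 0.1, and they sum to one.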

Returns:

explained_variance_ratio – Explained variance ratio.

Return type:

DataArray

fit(X: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None) → Self

Fit the model to the input data.

Parameters:
  • X (DataArray | Dataset | List[DataArray]) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataArray | Dataset | List[DataArray]]) – Weighting factors for the input data.

fit_transform(data: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None, **kwargs) → DataArray

Fit the model to the input data and project the data onto the components.

Parameters:
  • data (DataObject) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataObject]) – Weighting factors for the input data.

  • **kwargs – Additional keyword arguments to pass to the transform method.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray

get_params() → Dict[str, Any]

Get the model parameters.

inverse_transform(scores: DataArray, normalized: bool = True) → DataArray | Dataset | List[DataArray | Dataset]

Reconstruct the original data from transformed data.

Parameters:
  • scores (DataArray) – Transformed data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a ‘mode’ dimension.

  • normalized (bool, default=True) – Whether the scores data have been normalized by the L2 norm.

Returns:

data – Reconstructed data.

Return type:

DataArray | Dataset | List[DataArray]
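The idea behind the reconstruction can be sketched in NumPy. This is an illustration on hypothetical data under the assumption that normalized scores are the left singular vectors; it does not reproduce the preprocessing that xeofs applies.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: 30 samples, 4 features.
X = rng.standard_normal((30, 4))
mean = X.mean(axis=0)
U, sigma, Vt = np.linalg.svd(X - mean, full_matrices=False)

# Keep the first k modes; normalized scores are assumed to be the columns of U.
k = 2
scores_normalized = U[:, :k]

# Undo the normalization, project back to feature space, and restore the mean.
X_rec = (scores_normalized * sigma[:k]) @ Vt[:k] + mean

# With all modes retained, the reconstruction is exact.
X_full = (U * sigma) @ Vt + mean
```

With only k modes, the reconstruction error is determined by the discarded singular values.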

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) → Self

Load a saved model.

Parameters:
  • path (str) – Path to the saved model.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.

  • **kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

_BaseModel

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".

  • save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.

  • **kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().

scores(normalized: bool = True) → DataArray

Return the (PC) scores.

The scores in EOF analysis are the projection of the data matrix onto the eigenvectors of the covariance (or correlation) matrix. Other names include the principal component (PC) scores or just PCs.

Parameters:

normalized (bool, default=True) – Whether to normalize the scores by the L2 norm (singular values).

Returns:

scores – Scores of the fitted model.

Return type:

DataArray
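The difference between raw and normalized scores can be illustrated with plain NumPy. This sketch uses hypothetical data and assumes that normalizing means dividing each mode by its singular value.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical centered data: 30 samples, 4 features.
X = rng.standard_normal((30, 4))
X = X - X.mean(axis=0)

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Raw scores: projection of the data onto the components (equals U * sigma).
scores_raw = X @ Vt.T

# Normalized scores: divide each mode by its singular value (equals U).
scores_normalized = scores_raw / sigma

# L2 norm of each normalized score vector along the sample axis.
norms = np.linalg.norm(scores_normalized, axis=0)
```

After normalization every mode has unit L2 norm, so the amplitude information lives entirely in the singular values.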

serialize() → DataTree

Serialize a complete model with its preprocessor.

singular_values() → DataArray

Return the singular values of the Singular Value Decomposition.

Returns:

singular_values – Singular values obtained from the SVD.

Return type:

DataArray

transform(data: List[DataArray | Dataset] | DataArray | Dataset, normalized=True) → DataArray

Project data onto the components.

Parameters:
  • data (DataArray | Dataset | List[DataArray]) – Data to be transformed.

  • normalized (bool, default=True) – Whether to normalize the scores by the L2 norm.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray
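Projecting unseen data follows the same pattern as computing the scores, except that the mean and components come from the fitted data. The NumPy sketch below uses hypothetical training and new data and is only an approximation of what a fitted model does, since it ignores any weighting or NaN handling.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical training data (40 samples) and unseen data (5 samples), 3 features each.
X_train = rng.standard_normal((40, 3))
X_new = rng.standard_normal((5, 3))

# "Fit": the mean and components are estimated from the training data only.
mean = X_train.mean(axis=0)
_, sigma, Vt = np.linalg.svd(X_train - mean, full_matrices=False)

# "Transform": center the new data with the training mean, then project.
projections = (X_new - mean) @ Vt.T

# Optional normalization by the singular values of the training data.
projections_normalized = projections / sigma
```

Because all three modes are retained here, projecting back with `projections @ Vt + mean` recovers `X_new` exactly.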