xeofs.models.OPA#

class xeofs.models.OPA(n_modes: int, tau_max: int, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, n_pca_modes: int = 100, compute: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', solver: str = 'auto', random_state: int | None = None, solver_kwargs: Dict = {})#

Bases: _BaseModel

Optimal Persistence Analysis.

Optimal Persistence Analysis (OPA) [1] [2] identifies the patterns with the largest decorrelation time in a time-varying field, known as optimal persistence patterns or optimally persistent patterns (OPPs).

Parameters:
  • n_modes (int) – Number of optimal persistence patterns (OPP) to be computed.

  • tau_max (int) – Maximum time lag for the computation of the covariance matrix.

  • center (bool, default=True) – Whether to center the input data.

  • standardize (bool, default=False) – Whether to standardize the input data.

  • use_coslat (bool, default=False) – Whether to use cosine of latitude for scaling.

  • check_nans (bool, default=True) – If True, remove full-dimensional NaN features from the data, check that NaN features match the fit data during transform, and check for isolated NaNs; if False, skip all NaN checks.

  • n_pca_modes (int, default=100) – Number of modes to be computed in the pre-processing step using EOF.

  • compute (bool, default=True) – Whether to compute elements of the model eagerly, or to defer computation. If True, four pieces of the fit will be computed sequentially: 1) the preprocessor scaler, 2) optional NaN checks, 3) SVD decomposition, 4) scores and components.

  • sample_name (str, default="sample") – Name of the sample dimension.

  • feature_name (str, default="feature") – Name of the feature dimension.

  • solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.

  • random_state (int | None, default=None) – Seed for the random number generator.

  • solver_kwargs (dict, default={}) – Additional keyword arguments to pass to the solver.

References

[1] DelSole, T. (2001). Optimally Persistent Patterns in Time-Varying Fields. Journal of the Atmospheric Sciences.

[2] DelSole, T. (2006). Low-Frequency Variations of Surface Temperature in Observations and Simulations. Journal of Climate.

Examples

>>> from xeofs.models import OPA
>>> model = OPA(n_modes=10, tau_max=50, n_pca_modes=100)
>>> model.fit(data, dim="time")

Retrieve the optimally persistent patterns (OPP) and their time series:

>>> opp = model.components()
>>> opp_ts = model.scores()

Retrieve the decorrelation time of the OPPs:

>>> decorrelation_time = model.decorrelation_time()
__init__(n_modes: int, tau_max: int, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, n_pca_modes: int = 100, compute: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', solver: str = 'auto', random_state: int | None = None, solver_kwargs: Dict = {})#

Methods

  • __init__(n_modes, tau_max[, center, ...])

  • components() – Return the optimally persistent patterns (OPPs).

  • compute([verbose]) – Compute and load delayed model results.

  • decorrelation_time() – Return the decorrelation time of each optimally persistent pattern (OPP).

  • deserialize(dt) – Deserialize the model and its preprocessors from a DataTree.

  • filter_patterns() – Return the filter patterns.

  • fit(X, dim[, weights]) – Fit the model to the input data.

  • fit_transform(data, dim[, weights]) – Fit the model to the input data and project the data onto the components.

  • get_params() – Get the model parameters.

  • get_serialization_attrs()

  • inverse_transform(scores[, normalized]) – Reconstruct the original data from transformed data.

  • load(path[, engine]) – Load a saved model.

  • save(path[, overwrite, save_data, engine]) – Save the model.

  • scores() – Return the time series of the OPPs.

  • serialize() – Serialize a complete model with its preprocessor.

  • transform(data[, normalized]) – Project data onto the components.

components() DataArray | Dataset | List[DataArray | Dataset]#

Return the optimally persistent patterns (OPPs).

compute(verbose: bool = False, **kwargs)#

Compute and load delayed model results.

Parameters:
  • verbose (bool) – Whether or not to provide additional information about the computing progress.

  • **kwargs – Additional keyword arguments to pass to dask.compute().
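For example, a model created with compute=False keeps its results as delayed dask objects until compute() is called (a minimal sketch; data is assumed to be a dask-backed xarray object with a "time" dimension):

>>> model = OPA(n_modes=10, tau_max=50, n_pca_modes=100, compute=False)
>>> model.fit(data, dim="time")  # builds the dask task graph only
>>> model.compute(verbose=True)  # triggers computation and loads the results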

decorrelation_time() DataArray#

Return the decorrelation time of each optimally persistent pattern (OPP).
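For example, to find the most persistent mode of a fitted model (a sketch; idxmax is the standard xarray reduction):

>>> tau = model.decorrelation_time()
>>> most_persistent = tau.idxmax("mode")  # mode label with the largest decorrelation time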

classmethod deserialize(dt: DataTree) Self#

Deserialize the model and its preprocessors from a DataTree.

filter_patterns() DataArray | Dataset | List[DataArray | Dataset]#

Return the filter patterns.

fit(X: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None) Self#

Fit the model to the input data.

Parameters:
  • X (DataArray | Dataset | List[DataArray]) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataArray | Dataset | List[DataArray]]) – Weighting factors for the input data.
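A minimal sketch of fitting with explicit area weights; the square-root cosine-latitude weights below are one hypothetical choice and assume a "lat" coordinate in degrees (alternatively, use_coslat=True applies a cosine-latitude scaling internally):

>>> import numpy as np
>>> weights = np.cos(np.deg2rad(data.lat)) ** 0.5  # hypothetical area weights
>>> model = OPA(n_modes=10, tau_max=50)
>>> model.fit(data, dim="time", weights=weights)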

fit_transform(data: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None, **kwargs) DataArray#

Fit the model to the input data and project the data onto the components.

Parameters:
  • data (DataObject) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataObject]) – Weighting factors for the input data.

  • **kwargs – Additional keyword arguments to pass to the transform method.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray
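This is a convenience for fitting and projecting in one step; since fit() returns the model itself, the two-step form below is equivalent (a sketch, assuming data has a "time" dimension):

>>> scores = model.fit_transform(data, dim="time")
>>> # equivalent two-step form:
>>> scores = model.fit(data, dim="time").transform(data)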

get_params() Dict[str, Any]#

Get the model parameters.

inverse_transform(scores: DataArray, normalized: bool = True) DataArray | Dataset | List[DataArray | Dataset]#

Reconstruct the original data from transformed data.

Parameters:
  • scores (DataArray) – Transformed data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a 'mode' dimension.

  • normalized (bool, default=True) – Whether the scores data have been normalized by the L2 norm.

Returns:

data – Reconstructed data.

Return type:

DataArray | Dataset | List[DataArray]
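For example, reconstructing the part of the signal carried by the leading modes (a sketch; mode labels starting at 1 are an assumption about the coordinate convention):

>>> scores = model.scores()
>>> leading = scores.sel(mode=slice(1, 3))  # assumes mode labels 1, 2, 3, ...
>>> reconstruction = model.inverse_transform(leading)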

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) Self#

Load a saved model.

Parameters:
  • path (str) – Path to the saved model.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.

  • **kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

Self
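For example (both file paths are hypothetical):

>>> model = OPA.load("opa_model.zarr")
>>> model = OPA.load("opa_model.nc", engine="netcdf4")  # netCDF backend instead of zarr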

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)#

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".

  • save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.

  • **kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().
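A minimal save/load round trip (the path is hypothetical; with the default save_data=False only the fitted results, not the input data, are written):

>>> model.save("opa_model.zarr", overwrite=True)
>>> restored = OPA.load("opa_model.zarr")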

scores() DataArray#

Return the time series of the OPPs.

Each time series maximizes the decorrelation time subject to being uncorrelated with the time series of the preceding modes.

serialize() DataTree#

Serialize a complete model with its preprocessor.
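serialize() pairs with the deserialize() classmethod for in-memory round trips that never touch disk (a minimal sketch):

>>> dt = model.serialize()  # DataTree holding the model state
>>> restored = OPA.deserialize(dt)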

transform(data: List[DataArray | Dataset] | DataArray | Dataset, normalized: bool = True) DataArray#

Project data onto the components.

Parameters:
  • data (DataArray | Dataset | List[DataArray]) – Data to be transformed.

  • normalized (bool, default=True) – Whether to normalize the scores by the L2 norm.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray
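For example, projecting unseen data that shares the feature dimensions of the training data (new_data is hypothetical):

>>> projections = model.transform(new_data)
>>> raw = model.transform(new_data, normalized=False)  # scores without L2 normalization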