xeofs.models.OPA#

class xeofs.models.OPA(n_modes: int, tau_max: int, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, n_pca_modes: int = 100, compute: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', solver: str = 'auto', random_state: int | None = None, solver_kwargs: Dict = {})#

Bases: _BaseModel

Optimal Persistence Analysis.

Optimal Persistence Analysis (OPA) [1] [2] identifies the patterns with the largest decorrelation time in a time-varying field, known as optimal persistence patterns or optimally persistent patterns (OPPs).

Parameters:
  • n_modes (int) – Number of optimal persistence patterns (OPP) to be computed.

  • tau_max (int) – Maximum time lag for the computation of the covariance matrix.

  • center (bool, default=True) – Whether to center the input data.

  • standardize (bool, default=False) – Whether to standardize the input data.

  • use_coslat (bool, default=False) – Whether to use cosine of latitude for scaling.

  • check_nans (bool, default=True) – If True, remove full-dimensional NaN features from the data, check that NaN features match the fit data during transform, and check for isolated NaNs; if False, skip all NaN checks.

  • n_pca_modes (int, default=100) – Number of modes to be computed in the pre-processing step using EOF.

  • compute (bool, default=True) – Whether to compute elements of the model eagerly, or to defer computation. If True, four pieces of the fit will be computed sequentially: 1) the preprocessor scaler, 2) optional NaN checks, 3) SVD decomposition, 4) scores and components.

  • sample_name (str, default="sample") – Name of the sample dimension.

  • feature_name (str, default="feature") – Name of the feature dimension.

  • solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.

  • random_state (int | None, default=None) – Seed for the random number generator.

  • solver_kwargs (dict, default={}) – Additional keyword arguments to pass to the solver.

References

[1] DelSole, T. (2001). Optimally Persistent Patterns in Time-Varying Fields. Journal of the Atmospheric Sciences.

[2] DelSole, T. (2006). Low-Frequency Variations of Surface Temperature in Observations and Simulations. Journal of Climate.

Examples

>>> from xeofs.models import OPA
>>> model = OPA(n_modes=10, tau_max=50, n_pca_modes=100)
>>> model.fit(data, dim="time")

Retrieve the optimally persistent patterns (OPP) and their time series:

>>> opp = model.components()
>>> opp_ts = model.scores()

Retrieve the decorrelation time of the OPPs:

>>> decorrelation_time = model.decorrelation_time()
__init__(n_modes: int, tau_max: int, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, n_pca_modes: int = 100, compute: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', solver: str = 'auto', random_state: int | None = None, solver_kwargs: Dict = {})#

Methods

  • __init__(n_modes, tau_max[, center, ...])

  • components() – Return the optimally persistent patterns (OPPs).

  • compute([verbose]) – Compute and load delayed model results.

  • decorrelation_time() – Return the decorrelation time of each optimally persistent pattern (OPP).

  • deserialize(dt) – Deserialize the model and its preprocessors from a DataTree.

  • filter_patterns() – Return the filter patterns.

  • fit(X, dim[, weights]) – Fit the model to the input data.

  • fit_transform(data, dim[, weights]) – Fit the model to the input data and project the data onto the components.

  • get_params() – Get the model parameters.

  • get_serialization_attrs()

  • inverse_transform(scores[, normalized]) – Reconstruct the original data from transformed data.

  • load(path[, engine]) – Load a saved model.

  • save(path[, overwrite, save_data, engine]) – Save the model.

  • scores() – Return the time series of the OPPs.

  • serialize() – Serialize a complete model with its preprocessor.

  • transform(data[, normalized]) – Project data onto the components.

components() DataArray | Dataset | List[DataArray | Dataset]#

Return the optimally persistent patterns (OPPs).

compute(verbose: bool = False, **kwargs)#

Compute and load delayed model results.

Parameters:
  • verbose (bool) – Whether or not to provide additional information about the computing progress.

  • **kwargs – Additional keyword arguments to pass to dask.compute().
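For example, a model created with compute=False keeps its results as delayed dask objects until compute() is called (a minimal sketch; data is assumed to be a dask-backed xarray object with a "time" dimension):

>>> model = OPA(n_modes=10, tau_max=50, n_pca_modes=100, compute=False)
>>> model.fit(data, dim="time")  # builds the dask task graph only
>>> model.compute(verbose=True)  # triggers computation and loads the results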

decorrelation_time() DataArray#

Return the decorrelation time of each optimally persistent pattern (OPP).
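For example, to find the most persistent mode of a fitted model (a sketch; idxmax is the standard xarray reduction):

>>> tau = model.decorrelation_time()
>>> most_persistent = tau.idxmax("mode")  # mode label with the largest decorrelation time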

classmethod deserialize(dt: DataTree) Self#

Deserialize the model and its preprocessors from a DataTree.

filter_patterns() DataArray | Dataset | List[DataArray | Dataset]#

Return the filter patterns.

fit(X: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None) Self#

Fit the model to the input data.

Parameters:
  • X (DataArray | Dataset | List[DataArray]) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataArray | Dataset | List[DataArray]]) – Weighting factors for the input data.
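A minimal sketch of fitting with explicit area weights; the square-root cosine-latitude weights below are one hypothetical choice and assume a "lat" coordinate in degrees (alternatively, use_coslat=True applies a cosine-latitude scaling internally):

>>> import numpy as np
>>> weights = np.cos(np.deg2rad(data.lat)) ** 0.5  # hypothetical area weights
>>> model = OPA(n_modes=10, tau_max=50)
>>> model.fit(data, dim="time", weights=weights)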

fit_transform(data: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None, **kwargs) DataArray#

Fit the model to the input data and project the data onto the components.

Parameters:
  • data (DataObject) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (Optional[DataObject]) – Weighting factors for the input data.

  • **kwargs – Additional keyword arguments to pass to the transform method.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray
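This is a convenience for fitting and projecting in one step; since fit() returns the model itself, the two-step form below is equivalent (a sketch, assuming data has a "time" dimension):

>>> scores = model.fit_transform(data, dim="time")
>>> # equivalent two-step form:
>>> scores = model.fit(data, dim="time").transform(data)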

get_params() Dict[str, Any]#

Get the model parameters.

inverse_transform(scores: DataArray, normalized: bool = True) DataArray | Dataset | List[DataArray | Dataset]#

Reconstruct the original data from transformed data.

Parameters:
  • scores (DataArray) – Transformed data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a 'mode' dimension.

  • normalized (bool, default=True) – Whether the scores data have been normalized by the L2 norm.

Returns:

data – Reconstructed data.

Return type:

DataArray | Dataset | List[DataArray]
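For example, reconstructing the part of the signal carried by the leading modes (a sketch; mode labels starting at 1 are an assumption about the coordinate convention):

>>> scores = model.scores()
>>> leading = scores.sel(mode=slice(1, 3))  # assumes mode labels 1, 2, 3, ...
>>> reconstruction = model.inverse_transform(leading)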

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) Self#

Load a saved model.

Parameters:
  • path (str) – Path to the saved model.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.

  • **kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

Self
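For example (both file paths are hypothetical):

>>> model = OPA.load("opa_model.zarr")
>>> model = OPA.load("opa_model.nc", engine="netcdf4")  # netCDF backend instead of zarr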

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)#

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".

  • save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.

  • **kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().
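A minimal save/load round trip (the path is hypothetical; with the default save_data=False only the fitted results, not the input data, are written):

>>> model.save("opa_model.zarr", overwrite=True)
>>> restored = OPA.load("opa_model.zarr")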

scores() DataArray#

Return the time series of the OPPs.

Each time series maximizes the decorrelation time subject to being uncorrelated with the time series of the preceding modes.

serialize() DataTree#

Serialize a complete model with its preprocessor.
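serialize() pairs with the deserialize() classmethod for in-memory round trips that never touch disk (a minimal sketch):

>>> dt = model.serialize()  # DataTree holding the model state
>>> restored = OPA.deserialize(dt)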

transform(data: List[DataArray | Dataset] | DataArray | Dataset, normalized: bool = True) DataArray#

Project data onto the components.

Parameters:
  • data (DataArray | Dataset | List[DataArray]) – Data to be transformed.

  • normalized (bool, default=True) – Whether to normalize the scores by the L2 norm.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray
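For example, projecting unseen data that shares the feature dimensions of the training data (new_data is hypothetical):

>>> projections = model.transform(new_data)
>>> raw = model.transform(new_data, normalized=False)  # scores without L2 normalization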