xeofs.models.OPA#
- class xeofs.models.OPA(n_modes: int, tau_max: int, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, n_pca_modes: int = 100, compute: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', solver: str = 'auto', random_state: int | None = None, solver_kwargs: Dict = {})#
Bases: _BaseModel
Optimal Persistence Analysis.
Optimal Persistence Analysis (OPA) [1] [2] identifies the patterns with the largest decorrelation time in a time-varying field, known as optimal persistence patterns or optimally persistent patterns (OPP).
- Parameters:
n_modes (int) – Number of optimal persistence patterns (OPP) to be computed.
tau_max (int) – Maximum time lag for the computation of the covariance matrix.
center (bool, default=True) – Whether to center the input data.
standardize (bool, default=False) – Whether to standardize the input data.
use_coslat (bool, default=False) – Whether to use cosine of latitude for scaling.
check_nans (bool, default=True) – Whether to check the input data for NaNs.
n_pca_modes (int, default=100) – Number of modes to be computed in the pre-processing step using EOF.
compute (bool, default=True) – Whether to compute elements of the model eagerly, or to defer computation. If True, four pieces of the fit will be computed sequentially: 1) the preprocessor scaler, 2) optional NaN checks, 3) SVD decomposition, 4) scores and components.
sample_name (str, default="sample") – Name of the sample dimension.
feature_name (str, default="feature") – Name of the feature dimension.
solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.
random_state (int, optional) – Seed for the random number generator.
solver_kwargs (dict, default={}) – Additional keyword arguments to pass to the solver.
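The n_pca_modes pre-processing step can be pictured as a truncated PCA applied before the persistence analysis: the field is reduced to its leading EOF time series, and OPA then works in that reduced space. A minimal numpy sketch of the idea (illustrative variable names, not the xeofs internals):

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features, n_pca_modes = 500, 300, 20

# synthetic (sample x feature) data, centered along the sample dimension
X = rng.standard_normal((n_samples, n_features))
X -= X.mean(axis=0)

# truncated SVD keeps only the leading n_pca_modes EOFs
U, s, Vt = np.linalg.svd(X, full_matrices=False)
pc_scores = U[:, :n_pca_modes] * s[:n_pca_modes]  # reduced time series
eofs = Vt[:n_pca_modes]                           # spatial patterns

# the persistence analysis then operates on pc_scores instead of X
print(pc_scores.shape)  # (500, 20)
```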
References
Examples
>>> from xeofs.models import OPA
>>> model = OPA(n_modes=10, tau_max=50, n_pca_modes=100)
>>> model.fit(data, dim=("time"))
Retrieve the optimally persistent patterns (OPP) and their time series:
>>> opp = model.components()
>>> opp_ts = model.scores()
Retrieve the decorrelation time of the OPPs:
>>> decorrelation_time = model.decorrelation_time()
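What decorrelation_time() measures can be illustrated on a univariate series. One common definition (DelSole's T2) accumulates squared autocorrelations over lags up to tau_max; for an AR(1) process with coefficient φ it converges to (1 + φ²)/(1 − φ²). A hedged numpy sketch of that quantity (illustrative only, not the xeofs implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n, tau_max = 0.8, 200_000, 50

# simulate an AR(1) process: x_t = phi * x_{t-1} + eps_t
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

x -= x.mean()
var = x @ x / n
# sample autocorrelation at lags 1..tau_max
rho = np.array([(x[:-tau] @ x[tau:]) / (n * var) for tau in range(1, tau_max + 1)])

# T2-style decorrelation time: 1 + 2 * sum of squared autocorrelations
t2 = 1.0 + 2.0 * np.sum(rho**2)
theory = (1 + phi**2) / (1 - phi**2)  # about 4.56 for phi = 0.8
print(round(t2, 2), round(theory, 2))
```

OPA searches for the linear combination of the field that maximizes such a persistence measure, rather than evaluating it for a single predefined series.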
- __init__(n_modes: int, tau_max: int, center: bool = True, standardize: bool = False, use_coslat: bool = False, check_nans: bool = True, n_pca_modes: int = 100, compute: bool = True, sample_name: str = 'sample', feature_name: str = 'feature', solver: str = 'auto', random_state: int | None = None, solver_kwargs: Dict = {})#
Methods

__init__(n_modes, tau_max[, center, ...]) – Initialize the OPA model.
components() – Return the optimally persistent patterns (OPPs).
compute([verbose]) – Compute and load delayed model results.
decorrelation_time() – Return the decorrelation time of the optimal persistence pattern (OPP).
deserialize(dt) – Deserialize the model and its preprocessors from a DataTree.
filter_patterns() – Return the filter patterns.
fit(X, dim[, weights]) – Fit the model to the input data.
fit_transform(data, dim[, weights]) – Fit the model to the input data and project the data onto the components.
get_params() – Get the model parameters.
get_serialization_attrs() – Return the attributes needed for serialization.
inverse_transform(scores[, normalized]) – Reconstruct the original data from transformed data.
load(path[, engine]) – Load a saved model.
save(path[, overwrite, save_data, engine]) – Save the model.
scores() – Return the time series of the OPPs.
serialize() – Serialize a complete model with its preprocessor.
transform(data[, normalized]) – Project data onto the components.
- components() DataArray | Dataset | List[DataArray | Dataset] #
Return the optimally persistent patterns (OPPs).
- compute(verbose: bool = False, **kwargs)#
Compute and load delayed model results.
- Parameters:
verbose (bool) – Whether or not to provide additional information about the computing progress.
**kwargs – Additional keyword arguments to pass to dask.compute().
- decorrelation_time() DataArray #
Return the decorrelation time of the optimal persistence pattern (OPP).
- classmethod deserialize(dt: DataTree) Self #
Deserialize the model and its preprocessors from a DataTree.
- filter_patterns() DataArray | Dataset | List[DataArray | Dataset] #
Return the filter patterns.
- fit(X: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None) Self #
Fit the model to the input data.
- Parameters:
X (DataArray | Dataset | List[DataArray]) – Input data.
dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.
weights (Optional[DataArray | Dataset | List[DataArray]]) – Weighting factors for the input data.
- fit_transform(data: List[DataArray | Dataset] | DataArray | Dataset, dim: Sequence[Hashable] | Hashable, weights: List[DataArray | Dataset] | DataArray | Dataset | None = None, **kwargs) DataArray #
Fit the model to the input data and project the data onto the components.
- Parameters:
data (DataObject) – Input data.
dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.
weights (Optional[DataObject]) – Weighting factors for the input data.
**kwargs – Additional keyword arguments to pass to the transform method.
- Returns:
projections – Projections of the data onto the components.
- Return type:
DataArray
- get_params() Dict[str, Any] #
Get the model parameters.
- inverse_transform(scores: DataArray, normalized: bool = True) DataArray | Dataset | List[DataArray | Dataset] #
Reconstruct the original data from transformed data.
- Parameters:
scores (DataArray) – Transformed data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a ‘mode’ dimension.
normalized (bool, default=True) – Whether the scores data have been normalized by the L2 norm.
- Returns:
data – Reconstructed data.
- Return type:
DataArray | Dataset | List[DataArray]
- classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) Self #
Load a saved model.
- Parameters:
path (str) – Path to the saved model.
engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.
**kwargs – Additional keyword arguments to pass to open_datatree().
- Returns:
model – The loaded model.
- Return type:
_BaseModel
- save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)#
Save the model.
- Parameters:
path (str) – Path to save the model.
overwrite (bool, default=False) – Whether or not to overwrite the existing path if it already exists. Ignored unless engine="zarr".
save_data (bool, default=False) – Whether or not to save the full input data along with the fitted components.
engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.
**kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().
- scores() DataArray #
Return the time series of the OPPs.
The time series maximize the decorrelation time and are mutually uncorrelated.
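The mutual uncorrelatedness of such score time series can be checked numerically. The sketch below uses PCA scores, which share this property, as a stand-in (illustrative only, not the OPA solver):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((1000, 30))
X -= X.mean(axis=0)

# PC time series: projections onto the EOFs; pairwise correlations vanish
U, s, Vt = np.linalg.svd(X, full_matrices=False)
ts = U[:, :5] * s[:5]                    # (sample, mode) score time series

corr = np.corrcoef(ts.T)                 # 5 x 5 correlation matrix
off_diag = corr - np.diag(np.diag(corr))
print(np.max(np.abs(off_diag)) < 1e-10)  # True
```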
- serialize() DataTree #
Serialize a complete model with its preprocessor.
- transform(data: List[DataArray | Dataset] | DataArray | Dataset, normalized=True) DataArray #
Project data onto the components.
- Parameters:
data (DataArray | Dataset | List[DataArray]) – Data to be transformed.
normalized (bool, default=True) – Whether to normalize the scores by the L2 norm.
- Returns:
projections – Projections of the data onto the components.
- Return type:
DataArray