POP#
- class POP(n_modes: int = 2, center: bool = True, standardize: bool = False, use_coslat: bool = False, use_pca: bool = True, n_pca_modes: float | int = 0.999, pca_init_rank_reduction: float = 0.3, check_nans=True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, random_state: int | None = None, solver: str = 'auto', solver_kwargs: dict = {}, **kwargs)#
Principal Oscillation Pattern (POP) analysis.
POP analysis [1] [2] is a linear multivariate technique used to identify and describe dominant oscillatory modes in a dynamical system. POP analysis involves computing the eigenvalues and eigenvectors of the feedback matrix defined as
\[A = C_1 C_0^{-1}\]
where \(C_0\) is the covariance matrix and \(C_1\) is the lag-1 covariance matrix of the input data. The eigenvectors of the feedback matrix are the POPs, and the eigenvalues are related to the damping times and periods of the oscillatory modes.
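The definition of the feedback matrix can be illustrated with a small NumPy sketch. It is not the library's implementation: the damped-rotation system, the variable names, and the use of raw (uncentered) second moments on a noise-free trajectory are all assumptions made here so that the result can be checked exactly.

```python
import numpy as np

# Hypothetical test system: a damped rotation in two dimensions.
r, theta = 0.9, 2 * np.pi / 12  # damping factor and rotation angle per step
A_true = r * np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])

# Noise-free trajectory x_{t+1} = A_true @ x_t, one sample per row.
n_steps = 60
X = np.empty((n_steps, 2))
X[0] = [1.0, 0.0]
for t in range(1, n_steps):
    X[t] = A_true @ X[t - 1]

X0, X1 = X[:-1], X[1:]  # samples at time t and at time t + 1
n = X0.shape[0]
C0 = X0.T @ X0 / n      # lag-0 second-moment matrix (covariance if centered)
C1 = X1.T @ X0 / n      # lag-1 second-moment matrix

# A = C1 C0^{-1}. C0 is symmetric, so solve C0 A^T = C1^T instead of inverting.
A = np.linalg.solve(C0, C1.T).T

# Columns of `pops` are the eigenvectors of A, i.e. the POPs.
eigenvalues, pops = np.linalg.eig(A)
```

For this idealized system the estimated matrix reproduces `A_true`, and the eigenvalues form a complex-conjugate pair with modulus `r` and argument `±theta`. With real, noisy data the data would typically be centered first and the estimate would only be approximate.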
- Parameters:
n_modes (int, default=2) – Number of modes to calculate.
center (bool, default=True) – Whether to center the input data.
standardize (bool, default=False) – Whether to standardize the input data.
use_coslat (bool, default=False) – Whether to use cosine of latitude for scaling.
use_pca (bool, default=True) – If True, perform PCA to reduce the dimensionality of the data.
n_pca_modes (int | float | str, default=0.999) – If int, specifies the number of modes to retain. If float, specifies the fraction of variance in the (whitened) data that should be explained by the retained modes. If “all”, all modes are retained.
pca_init_rank_reduction (float, default=0.3) – Only relevant when use_pca=True and n_pca_modes is a float, in which case it denotes the fraction of the initial rank to which the data are reduced via PCA as a first guess, before the solution is truncated to the desired fraction of explained variance. This allows PCA to be computed faster via randomized SVD and avoids computing the full SVD.
sample_name (str, default="sample") – Name of the sample dimension.
feature_name (str, default="feature") – Name of the feature dimension.
check_nans (bool, default=True) – If True, remove full-dimensional NaN features from the data, check to ensure that NaN features match the original fit data during transform, and check for isolated NaNs. Note: this forces eager computation of dask arrays. If False, skip all NaN checks. In this case, NaNs should be explicitly removed or filled prior to fitting, or SVD will fail.
compute (bool, default=True) – Whether to compute elements of the model eagerly, or to defer computation. If True, four pieces of the fit will be computed sequentially: 1) the preprocessor scaler, 2) optional NaN checks, 3) SVD decomposition, 4) scores and components.
random_state (int, optional) – Seed for the random number generator.
solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.
solver_kwargs (dict, default={}) – Additional keyword arguments to be passed to the SVD solver.
References
Principal Oscillation Patterns: A Review. J. Climate, 8, 377–400, https://doi.org/10.1175/1520-0442(1995)008<0377:POPAR>2.0.CO;2.
Examples
Perform POP analysis in PC space spanned by the first 10 modes:
>>> pop = xe.single.POP(n_modes="all", use_pca=True, n_pca_modes=10)
>>> pop.fit(X, "time")
Get the POPs and associated time coefficients:
>>> patterns = pop.components()
>>> scores = pop.scores()
Reconstruct the original data using a conjugate pair of POPs:
>>> pop_pairs = scores.sel(mode=[1, 2])
>>> X_rec = pop.inverse_transform(pop_pairs)
- __init__(n_modes: int = 2, center: bool = True, standardize: bool = False, use_coslat: bool = False, use_pca: bool = True, n_pca_modes: float | int = 0.999, pca_init_rank_reduction: float = 0.3, check_nans=True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, random_state: int | None = None, solver: str = 'auto', solver_kwargs: dict = {}, **kwargs)#
Methods
__init__([n_modes, center, standardize, ...])
components() – Return the POPs.
components_amplitude() – Return the amplitude of the POP components.
components_phase() – Return the phase of the POP components.
compute(**kwargs) – Compute and load delayed model results.
damping_times() – Return the damping times of the feedback matrix.
deserialize(dt) – Deserialize the model and its preprocessors from a DataTree.
eigenvalues() – Return the eigenvalues of the feedback matrix.
fit(X, dim[, weights]) – Fit the model to the input data.
fit_transform(data, dim[, weights]) – Fit the model to the input data and project the data onto the components.
get_params() – Get the model parameters.
get_serialization_attrs() – Get the attributes to serialize.
inverse_transform(scores[, normalized]) – Reconstruct the original data from transformed data.
load(path[, engine]) – Load a saved model.
periods() – Return the periods of the feedback matrix.
save(path[, overwrite, save_data, engine]) – Save the model.
scores([normalized]) – Return the POP coefficients/scores.
scores_amplitude([normalized]) – Return the amplitude of the POP coefficients/scores.
scores_phase() – Return the phase of the POP coefficients/scores.
serialize() – Serialize a complete model with its preprocessor.
transform(data[, normalized]) – Project data onto the components.
- components() DataArray | Dataset | list[DataArray | Dataset]#
Return the POPs.
The POPs are the eigenvectors of the feedback matrix.
- Returns:
components – Principal Oscillation Patterns (POPs).
- Return type:
DataObject
- components_amplitude() DataArray | Dataset | list[DataArray | Dataset]#
Return the amplitude of the POP components.
The amplitude of the components is defined as
\[A_{ij} = |C_{ij}|\]
where \(C_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(|\cdot|\) denotes the absolute value.
- Returns:
components_amplitude – Amplitude of the components of the fitted model.
- Return type:
DataObject
- components_phase() DataArray | Dataset | list[DataArray | Dataset]#
Return the phase of the POP components.
The phase of the components is defined as
\[\phi_{ij} = \arg(C_{ij})\]
where \(C_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(\arg(\cdot)\) denotes the argument of a complex number.
- Returns:
components_phase – Phase of the components of the fitted model.
- Return type:
DataObject
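The amplitude and phase definitions above amount to the polar form of each complex entry. A minimal sketch using only the standard library, with a hypothetical entry value chosen for illustration:

```python
import cmath
import math

# Hypothetical complex entry C_ij of a POP.
c = complex(1.0, 1.0)

amplitude = abs(c)      # |C_ij|
phase = cmath.phase(c)  # arg(C_ij), in radians within [-pi, pi]

# The entry is recovered from its amplitude and phase in polar form.
reconstructed = amplitude * cmath.exp(1j * phase)
```

Amplitude and phase together carry the same information as the complex entry itself, which is why the two methods are offered as a pair.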
- compute(**kwargs)#
Compute and load delayed model results.
- Parameters:
**kwargs – Additional keyword arguments to pass to dask.compute().
- damping_times() DataArray#
Return the damping times of the feedback matrix.
The damping times are defined as
\[\tau = -\frac{1}{\log(|\lambda|)}\]
where \(\lambda\) is the eigenvalue.
- Returns:
Damping times.
- Return type:
DataArray
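As a worked instance of the damping-time formula, the sketch below evaluates it for a single hypothetical eigenvalue. The helper name `damping_time` is introduced here for illustration only, and the interpretation of the result in units of the sampling interval is an assumption that follows from the feedback matrix acting over one lag.

```python
import cmath
import math


def damping_time(eigenvalue: complex) -> float:
    """Return tau = -1 / log(|lambda|) for a single eigenvalue."""
    # Undefined for |lambda| == 1 (undamped mode), where log(1) == 0.
    return -1.0 / math.log(abs(eigenvalue))


# Hypothetical eigenvalue with modulus 0.9 and argument 2*pi/12.
lam = cmath.rect(0.9, 2 * math.pi / 12)
tau = damping_time(lam)
```

By construction, the modulus raised to the power `tau` equals `1/e`, so `tau` is the e-folding time of the mode's amplitude.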
- classmethod deserialize(dt: DataTree) Self#
Deserialize the model and its preprocessors from a DataTree.
- eigenvalues() DataArray#
Return the eigenvalues of the feedback matrix.
- Returns:
Real or complex eigenvalues.
- Return type:
DataArray
- fit(X: DataArray | Dataset | list[DataArray | Dataset], dim: Sequence[Hashable] | Hashable, weights: DataArray | Dataset | list[DataArray | Dataset] | None = None) Self#
Fit the model to the input data.
- Parameters:
X (DataObject) – Input data.
dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.
weights (DataObject | None, default=None) – Weighting factors for the input data.
- fit_transform(data: DataArray | Dataset | list[DataArray | Dataset], dim: Sequence[Hashable] | Hashable, weights: DataArray | Dataset | list[DataArray | Dataset] | None = None, **kwargs) DataArray#
Fit the model to the input data and project the data onto the components.
- Parameters:
data (DataObject) – Input data.
dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.
weights (DataObject | None, default=None) – Weighting factors for the input data.
**kwargs – Additional keyword arguments to pass to the transform method.
- Returns:
projections – Projections of the data onto the components.
- Return type:
DataArray
- get_params() dict[str, Any]#
Get the model parameters.
- get_serialization_attrs() dict#
Get the attributes to serialize.
- inverse_transform(scores: DataArray, normalized: bool = False) DataArray | Dataset | list[DataArray | Dataset]#
Reconstruct the original data from transformed data.
- Parameters:
scores (DataArray) – Transformed data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a ‘mode’ dimension.
normalized (bool, default=False) – Whether the scores data have been normalized by the L2 norm.
- Returns:
data – Reconstructed data.
- Return type:
DataObject
- classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) Self#
Load a saved model.
- Parameters:
path (str) – Path to the saved model.
engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.
**kwargs – Additional keyword arguments to pass to open_datatree().
- Returns:
model – The loaded model.
- Return type:
BaseModel
- periods() DataArray#
Return the periods of the feedback matrix.
For complex eigenvalues, the periods are defined as
\[T = \frac{2\pi}{\arg(\lambda)}\]
where \(\lambda\) is the eigenvalue. For real eigenvalues, inf is returned.
- Returns:
Periods.
- Return type:
DataArray
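The period formula can be checked by hand for a single eigenvalue. The helper `period` below is a hypothetical illustration of the stated definition, not the library's code, and the eigenvalues are made up for the example.

```python
import cmath
import math


def period(eigenvalue: complex) -> float:
    """Return T = 2*pi / arg(lambda), or inf for a real eigenvalue."""
    if eigenvalue.imag == 0:
        return math.inf
    return 2 * math.pi / cmath.phase(eigenvalue)


# Hypothetical complex eigenvalue rotating by 2*pi/12 per step.
lam = cmath.rect(0.9, 2 * math.pi / 12)
T = period(lam)

# Hypothetical real eigenvalue: a purely damped, non-oscillatory mode.
T_real = period(complex(0.8, 0.0))
```

An eigenvalue whose argument is one twelfth of a full turn completes a cycle in 12 steps. Under this formula the conjugate eigenvalue has a negative argument and therefore a period of the same magnitude with opposite sign.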
- save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)#
Save the model.
- Parameters:
path (str) – Path to save the model.
overwrite (bool, default=False) – Whether to overwrite the path if it already exists. Ignored unless engine="zarr".
save_data (bool, default=False) – Whether to save the full input data along with the fitted components.
engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.
**kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().
- scores(normalized: bool = False) DataArray#
Return the POP coefficients/scores.
- Parameters:
normalized (bool, default=False) – Whether to normalize the scores by the L2 norm.
- Returns:
scores – POP coefficients.
- Return type:
DataObject
- scores_amplitude(normalized=True) DataArray#
Return the amplitude of the POP coefficients/scores.
The amplitude of the scores is defined as
\[A_{ij} = |S_{ij}|\]
where \(S_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(|\cdot|\) denotes the absolute value.
- Parameters:
normalized (bool, default=True) – Whether to normalize the scores by the singular values.
- Returns:
scores_amplitude – Amplitude of the scores of the fitted model.
- Return type:
DataObject
- scores_phase() DataArray#
Return the phase of the POP coefficients/scores.
The phase of the scores is defined as
\[\phi_{ij} = \arg(S_{ij})\]
where \(S_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(\arg(\cdot)\) denotes the argument of a complex number.
- Returns:
scores_phase – Phase of the scores of the fitted model.
- Return type:
DataObject
- serialize() DataTree#
Serialize a complete model with its preprocessor.
- transform(data: DataArray | Dataset | list[DataArray | Dataset], normalized=False) DataArray#
Project data onto the components.
- Parameters:
data (DataObject) – Data to be transformed.
normalized (bool, default=False) – Whether to normalize the scores by the L2 norm.
- Returns:
projections – Projections of the data onto the components.
- Return type:
DataArray