POP

class POP(n_modes: int = 2, center: bool = True, standardize: bool = False, use_coslat: bool = False, use_pca: bool = True, n_pca_modes: float | int = 0.999, pca_init_rank_reduction: float = 0.3, check_nans=True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, random_state: int | None = None, solver: str = 'auto', solver_kwargs: dict = {}, **kwargs)

Principal Oscillation Pattern (POP) analysis.

POP analysis [1] [2] is a linear multivariate technique used to identify and describe dominant oscillatory modes in a dynamical system. POP analysis involves computing the eigenvalues and eigenvectors of the feedback matrix defined as

\[A = C_1 C_0^{-1}\]

where \(C_0\) is the covariance matrix and \(C_1\) is the lag-1 covariance matrix of the input data. The eigenvectors of the feedback matrix are the POPs and the eigenvalues are related to the damping times and periods of the oscillatory modes.
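The definition above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the test system (`A_true`, `r`, `theta`), the noise model, and the sample count are assumptions chosen for illustration, and no PCA pre-reduction is applied.

```python
import numpy as np

# Hypothetical test system: a damped rotation with a period of 12 steps.
r, theta = 0.9, 2 * np.pi / 12
A_true = r * np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta), np.cos(theta)]])

# Simulate x[t+1] = A_true @ x[t] + noise.
rng = np.random.default_rng(0)
n_samples = 5000
X = np.zeros((n_samples, 2))
for t in range(n_samples - 1):
    X[t + 1] = A_true @ X[t] + rng.standard_normal(2)

X = X - X.mean(axis=0)   # center the data, as with center=True
X0, X1 = X[:-1], X[1:]   # samples at time t and at time t + 1

C0 = X0.T @ X0 / (n_samples - 1)  # covariance matrix
C1 = X1.T @ X0 / (n_samples - 1)  # lag-1 covariance matrix
A = C1 @ np.linalg.inv(C0)        # feedback matrix A = C1 C0^{-1}

# Eigenvectors (columns of `pops`) are the POPs; eigenvalues carry
# the damping and period information.
eigenvalues, pops = np.linalg.eig(A)
```

For this synthetic system the estimated eigenvalues should form a complex-conjugate pair with modulus near `r` and argument near `±theta`.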

Parameters:
  • n_modes (int, default=2) – Number of modes to calculate.

  • center (bool, default=True) – Whether to center the input data.

  • standardize (bool, default=False) – Whether to standardize the input data.

  • use_coslat (bool, default=False) – Whether to use cosine of latitude for scaling.

  • use_pca (bool, default=True) – If True, perform PCA to reduce the dimensionality of the data.

  • n_pca_modes (int | float | str, default=0.999) – If int, specifies the number of modes to retain. If float, specifies the fraction of variance in the (whitened) data that should be explained by the retained modes. If “all”, all modes are retained.

  • pca_init_rank_reduction (float, default=0.3) – Only relevant when use_pca=True and n_pca_modes is a float. In that case it sets the fraction of the initial rank to which the data are first reduced via PCA, before the solution is truncated to the desired fraction of explained variance. This allows PCA to be computed faster via randomized SVD and avoids computing the full SVD.

  • sample_name (str, default="sample") – Name of the sample dimension.

  • feature_name (str, default="feature") – Name of the feature dimension.

  • check_nans (bool, default=True) – If True, remove full-dimensional NaN features from the data, check to ensure that NaN features match the original fit data during transform, and check for isolated NaNs. Note: this forces eager computation of dask arrays. If False, skip all NaN checks. In this case, NaNs should be explicitly removed or filled prior to fitting, or SVD will fail.

  • compute (bool, default=True) – Whether to compute elements of the model eagerly, or to defer computation. If True, four pieces of the fit will be computed sequentially: 1) the preprocessor scaler, 2) optional NaN checks, 3) SVD decomposition, 4) scores and components.

  • random_state (int, optional) – Seed for the random number generator.

  • solver ({"auto", "full", "randomized"}, default="auto") – Solver to use for the SVD computation.

  • solver_kwargs (dict, default={}) – Additional keyword arguments to be passed to the SVD solver.
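As an illustration of how a float n_pca_modes can be read as an explained-variance threshold, here is a small sketch in plain Python. The helper name `n_modes_for_variance` and the singular values are hypothetical, and the library's own truncation logic may differ in detail.

```python
from itertools import accumulate


def n_modes_for_variance(singular_values, fraction):
    """Smallest number of leading modes whose explained variance reaches `fraction`."""
    variances = [s ** 2 for s in singular_values]  # variance is proportional to s^2
    total = sum(variances)
    for k, cumulative in enumerate(accumulate(variances), start=1):
        if cumulative / total >= fraction:
            return k
    return len(variances)  # guard against rounding when fraction is close to 1


singular_values = [3.0, 2.0, 1.0]  # hypothetical values in descending order
k = n_modes_for_variance(singular_values, 0.9)
```

Here the cumulative variance fractions are 9/14, 13/14 and 1, so a threshold of 0.9 keeps two modes.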

References

Principal Oscillation Patterns: A Review. J. Climate, 8, 377–400, https://doi.org/10.1175/1520-0442(1995)008<0377:POPAR>2.0.CO;2.

Examples

Perform POP analysis in PC space spanned by the first 10 modes:

>>> import xeofs as xe
>>> pop = xe.single.POP(n_modes="all", use_pca=True, n_pca_modes=10)
>>> pop.fit(X, "time")

Get the POPs and associated time coefficients:

>>> patterns = pop.components()
>>> scores = pop.scores()

Reconstruct the original data using a conjugate pair of POPs:

>>> pop_pairs = scores.sel(mode=[1, 2])
>>> X_rec = pop.inverse_transform(pop_pairs)
__init__(n_modes: int = 2, center: bool = True, standardize: bool = False, use_coslat: bool = False, use_pca: bool = True, n_pca_modes: float | int = 0.999, pca_init_rank_reduction: float = 0.3, check_nans=True, sample_name: str = 'sample', feature_name: str = 'feature', compute: bool = True, random_state: int | None = None, solver: str = 'auto', solver_kwargs: dict = {}, **kwargs)

Methods

__init__([n_modes, center, standardize, ...])

check_needed_module(module)

Check if a necessary non-core dependency is available.

components()

Return the POPs.

components_amplitude()

Return the amplitude of the POP components.

components_phase()

Return the phase of the POP components.

compute(**kwargs)

Compute and load delayed model results.

damping_times()

Return the damping times of the feedback matrix.

deserialize(dt)

Deserialize the model and its preprocessors from a DataTree.

eigenvalues()

Return the eigenvalues of the feedback matrix.

fit(X, dim[, weights])

Fit the model to the input data.

fit_transform(data, dim[, weights])

Fit the model to the input data and project the data onto the components.

get_params()

Get the model parameters.

get_serialization_attrs()

Get the attributes to serialize.

inverse_transform(scores[, normalized])

Reconstruct the original data from transformed data.

load(path[, engine])

Load a saved model.

periods()

Return the periods of the feedback matrix.

save(path[, overwrite, save_data, engine])

Save the model.

scores([normalized])

Return the POP coefficients/scores.

scores_amplitude([normalized])

Return the amplitude of the POP coefficients/scores.

scores_phase()

Return the phase of the POP coefficients/scores.

serialize()

Serialize a complete model with its preprocessor.

transform(data[, normalized])

Project data onto the components.

Attributes

extra_modules

uses_complex

check_needed_module(module: str)

Check if a necessary non-core dependency is available.

components() → DataArray | Dataset | list[DataArray | Dataset]

Return the POPs.

The POPs are the eigenvectors of the feedback matrix.

Returns:

components – Principal Oscillation Patterns (POPs).

Return type:

DataObject

components_amplitude() → DataArray | Dataset | list[DataArray | Dataset]

Return the amplitude of the POP components.

The amplitude of the components is defined as

\[A_{ij} = |C_{ij}|\]

where \(C_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(|\cdot|\) denotes the absolute value.

Returns:

components_amplitude – Amplitude of the components of the fitted model.

Return type:

DataObject

components_phase() → DataArray | Dataset | list[DataArray | Dataset]

Return the phase of the POP components.

The phase of the components is defined as

\[\phi_{ij} = \arg(C_{ij})\]

where \(C_{ij}\) is the \(i\)-th entry of the \(j\)-th component and \(\arg(\cdot)\) denotes the argument of a complex number.

Returns:

components_phase – Phase of the components of the fitted model.

Return type:

DataObject
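The amplitude and phase definitions amount to taking the modulus and the argument of each complex entry. A minimal sketch in plain Python, using hypothetical values rather than output from a fitted model:

```python
import cmath

# Hypothetical complex entries of a single POP (one value per feature).
component = [1 + 1j, -2j, 3 + 0j]

amplitude = [abs(c) for c in component]      # |C_ij|
phase = [cmath.phase(c) for c in component]  # arg(C_ij), in radians within [-pi, pi]
```

The phases here are π/4, -π/2 and 0, which shows that the argument is returned in radians.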

compute(**kwargs)

Compute and load delayed model results.

Parameters:

**kwargs – Additional keyword arguments to pass to dask.compute().

damping_times() → DataArray

Return the damping times of the feedback matrix.

The damping times are defined as

\[\tau = -\frac{1}{\log(|\lambda|)}\]

where \(\lambda\) is the eigenvalue.

Returns:

Damping times.

Return type:

DataArray
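A short sketch of the damping-time formula in plain Python. The helper `damping_time` and the eigenvalue are hypothetical, and the result is assumed to be expressed in units of the sampling step, since the feedback matrix is built from lag-1 covariances.

```python
import cmath
import math


def damping_time(eigenvalue):
    """e-folding time in sampling steps: tau = -1 / log|lambda|."""
    return -1.0 / math.log(abs(eigenvalue))


# Hypothetical eigenvalue with modulus 0.9.
lam = cmath.rect(0.9, 2 * math.pi / 12)
tau = damping_time(lam)
```

With this definition the mode amplitude decays by a factor of e after τ steps, because |λ|^τ = exp(τ log|λ|) = exp(-1). The formula gives a positive τ only for |λ| < 1 and is undefined for |λ| = 1.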

classmethod deserialize(dt: DataTree) → Self

Deserialize the model and its preprocessors from a DataTree.

eigenvalues() → DataArray

Return the eigenvalues of the feedback matrix.

Returns:

Real or complex eigenvalues.

Return type:

DataArray

fit(X: DataArray | Dataset | list[DataArray | Dataset], dim: Sequence[Hashable] | Hashable, weights: DataArray | Dataset | list[DataArray | Dataset] | None = None) → Self

Fit the model to the input data.

Parameters:
  • X (DataObject) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (DataObject | None, default=None) – Weighting factors for the input data.

fit_transform(data: DataArray | Dataset | list[DataArray | Dataset], dim: Sequence[Hashable] | Hashable, weights: DataArray | Dataset | list[DataArray | Dataset] | None = None, **kwargs) → DataArray

Fit the model to the input data and project the data onto the components.

Parameters:
  • data (DataObject) – Input data.

  • dim (Sequence[Hashable] | Hashable) – Specify the sample dimensions. The remaining dimensions will be treated as feature dimensions.

  • weights (DataObject | None, default=None) – Weighting factors for the input data.

  • **kwargs – Additional keyword arguments to pass to the transform method.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray

get_params() → dict[str, Any]

Get the model parameters.

get_serialization_attrs() → dict

Get the attributes to serialize.

inverse_transform(scores: DataArray, normalized: bool = False) → DataArray | Dataset | list[DataArray | Dataset]

Reconstruct the original data from transformed data.

Parameters:
  • scores (DataArray) – Transformed data to be reconstructed. This could be a subset of the scores data of a fitted model, or unseen data. Must have a ‘mode’ dimension.

  • normalized (bool, default=False) – Whether the scores data have been normalized by the L2 norm.

Returns:

data – Reconstructed data.

Return type:

DataObject
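The class example reconstructs data from a conjugate pair of POPs. The sketch below illustrates the arithmetic reason a conjugate pair is a natural choice: the contributions of a complex mode and its conjugate partner sum to a real field. The pattern and score values are hypothetical, and the sketch does not reproduce the library's scaling or preprocessing steps.

```python
# Hypothetical complex POP (one entry per feature) and its score at one time step.
pattern = [1 + 2j, 0.5 - 1j, -3 + 0.5j]
score = 0.8 - 0.3j

# Contribution of the mode and of its complex-conjugate partner.
mode = [score * p for p in pattern]
partner = [score.conjugate() * p.conjugate() for p in pattern]

# Their sum equals 2 * Re(score * p), so the imaginary parts cancel.
reconstruction = [m + q for m, q in zip(mode, partner)]
```

Keeping only one member of the pair would leave a complex-valued field, which is why the two modes are usually selected together.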

classmethod load(path: str, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs) → Self

Load a saved model.

Parameters:
  • path (str) – Path to the saved model.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for reading the saved model.

  • **kwargs – Additional keyword arguments to pass to open_datatree().

Returns:

model – The loaded model.

Return type:

BaseModel

periods() → DataArray

Return the periods of the feedback matrix.

For complex eigenvalues, the periods are defined as

\[T = \frac{2\pi}{\arg(\lambda)}\]

where \(\lambda\) is the eigenvalue. For real eigenvalues, inf is returned.

Returns:

Periods.

Return type:

DataArray
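A sketch of the period formula in plain Python, applied literally. The helper `period` and the eigenvalues are hypothetical, and the period is assumed to be expressed in units of the sampling step.

```python
import cmath
import math


def period(eigenvalue):
    """Oscillation period in sampling steps: T = 2*pi / arg(lambda)."""
    if eigenvalue.imag == 0:
        return math.inf  # real eigenvalue: no oscillation
    return 2 * math.pi / cmath.phase(eigenvalue)


T = period(cmath.rect(0.9, 2 * math.pi / 12))  # eigenvalue with argument 2*pi/12
T_real = period(complex(0.7, 0.0))             # purely real eigenvalue
```

Taken literally, the formula yields a negative value for the conjugate partner of an eigenvalue, since its argument has the opposite sign. Whether the library reports the sign or the absolute value is not stated here.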

save(path: str, overwrite: bool = False, save_data: bool = False, engine: Literal['zarr', 'netcdf4', 'h5netcdf'] = 'zarr', **kwargs)

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • overwrite (bool, default=False) – Whether to overwrite the path if it already exists. Ignored unless engine="zarr".

  • save_data (bool, default=False) – Whether to save the full input data along with the fitted components.

  • engine ({"zarr", "netcdf4", "h5netcdf"}, default="zarr") – Xarray backend engine to use for writing the saved model.

  • **kwargs – Additional keyword arguments to pass to DataTree.to_netcdf() or DataTree.to_zarr().

scores(normalized: bool = False) → DataArray

Return the POP coefficients/scores.

Parameters:

normalized (bool, default=False) – Whether to normalize the scores by the L2 norm.

Returns:

scores – POP coefficients.

Return type:

DataObject
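To make the L2 normalization concrete, the following sketch divides a complex score series by its L2 norm. It assumes the norm is taken over the sample dimension separately for each mode, which this page does not state explicitly. The helper `l2_normalize` and the score values are hypothetical.

```python
import math


def l2_normalize(series):
    """Divide a (possibly complex) score series by its L2 norm."""
    norm = math.sqrt(sum(abs(s) ** 2 for s in series))
    return [s / norm for s in series], norm


scores = [3 + 4j, 0j, -5 + 0j]  # hypothetical scores of one mode over three samples
normalized, norm = l2_normalize(scores)
```

After normalization the squared moduli of the series sum to one, so multiplying by `norm` recovers the original scores.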

scores_amplitude(normalized=True) → DataArray

Return the amplitude of the POP coefficients/scores.

The amplitude of the scores is defined as

\[A_{ij} = |S_{ij}|\]

where \(S_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(|\cdot|\) denotes the absolute value.

Parameters:

normalized (bool, default=True) – Whether to normalize the scores by the singular values.

Returns:

scores_amplitude – Amplitude of the scores of the fitted model.

Return type:

DataObject

scores_phase() → DataArray

Return the phase of the POP coefficients/scores.

The phase of the scores is defined as

\[\phi_{ij} = \arg(S_{ij})\]

where \(S_{ij}\) is the \(i\)-th entry of the \(j\)-th score and \(\arg(\cdot)\) denotes the argument of a complex number.

Returns:

scores_phase – Phase of the scores of the fitted model.

Return type:

DataObject

serialize() → DataTree

Serialize a complete model with its preprocessor.

transform(data: DataArray | Dataset | list[DataArray | Dataset], normalized=False) → DataArray

Project data onto the components.

Parameters:
  • data (DataObject) – Data to be transformed.

  • normalized (bool, default=False) – Whether to normalize the scores by the L2 norm.

Returns:

projections – Projections of the data onto the components.

Return type:

DataArray