Extended EOF analysis

This example demonstrates Extended EOF (EEOF) analysis on xarray tutorial data. EEOF analysis, also known as Multivariate/Multichannel Singular Spectrum Analysis, extends traditional EOF analysis to capture propagating signals or oscillations in multivariate datasets. At its core, it builds a lagged covariance matrix that encodes both spatial and temporal correlations. This matrix is then decomposed into its eigenvectors (components) and eigenvalues (explained variance).

Let’s begin by setting up the required packages and fetching the data:

import xarray as xr
import xeofs as xe
import matplotlib.pyplot as plt

xr.set_options(display_expand_data=False)

<xarray.core.options.set_options object at 0x7ffac7efbc10>

Load the tutorial data.

t2m = xr.tutorial.load_dataset("air_temperature").air

Before running the EEOF analysis, we need to define the structure of the lagged covariance matrix by choosing the time delay tau and the embedding dimension. The former is the interval between successive lagged copies of the time series, while the latter sets the number of time-lagged copies in the delay-coordinate space that represents the system’s dynamics. For illustration, with tau=4 and embedding=40 we generate 40 copies of the time series (the original plus 39 delayed versions), each offset from the previous one by 4 time steps, so the maximum shift is tau x (embedding - 1) = 156 time steps. Given our dataset’s 6-hour intervals, tau=4 corresponds to a 24-hour shift, and the full embedding window spans 39 days.

Constructing and decomposing the lagged covariance matrix in this way can be computationally expensive. For example, given our dataset’s dimensions,

t2m.shape

(2920, 25, 53)

the extended dataset would have 40 x 25 x 53 = 53000 features, far more than the original dataset’s 25 x 53 = 1325 features. To mitigate this, we can first compress the data with PCA / EOF analysis and then perform the EEOF analysis on the resulting PCA / EOF scores. Here, we use n_pca_modes=50 to retain the first 50 PCA modes, which leaves 40 x 50 = 2000 (latent) features. With these parameters set, we instantiate the ExtendedEOF model and fit it to our data.

model = xe.models.ExtendedEOF(
    n_modes=10, tau=4, embedding=40, n_pca_modes=50, use_coslat=True
)
model.fit(t2m, dim="time")
scores = model.scores()
components = model.components()
components

<xarray.DataArray 'components' (mode: 10, embedding: 40, lat: 25, lon: 53)> Size: 4MB
0.0003854 0.0003646 0.000357 0.0003562 ... -0.001459 -0.00105 -0.0006424
Coordinates:
  * lat        (lat) float32 100B 15.0 17.5 20.0 22.5 ... 67.5 70.0 72.5 75.0
  * lon        (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * embedding  (embedding) int64 320B 0 4 8 12 16 20 ... 136 140 144 148 152 156
  * mode       (mode) int64 80B 1 2 3 4 5 6 7 8 9 10
Attributes: (12/16)
    model:          Extended EOF Analysis
    software:       xeofs
    version:        2.3.2
    date:           2024-03-31 20:34:10
    n_modes:        10
    center:         True
    ...             ...
    feature_name:   feature
    random_state:   None
    verbose:        False
    compute:        True
    solver:         auto
    solver_kwargs:  {}
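
To make the delay embedding and the PCA pre-compression described above more concrete, here is a minimal NumPy sketch on random toy data. The helper delay_embed is hypothetical and not part of xeofs; it simply drops the time steps for which not every lag is available, whereas xeofs may treat the edges differently. The toy sizes (400 time steps, 30 features, 6 retained modes) are assumptions chosen only to keep the example small.

```python
import numpy as np


def delay_embed(X, tau, embedding):
    """Stack `embedding` time-shifted copies of X (time x features) side by side.

    Hypothetical helper for illustration; not the xeofs implementation.
    """
    lags = tau * np.arange(embedding)      # 0, tau, 2*tau, ...
    n_valid = X.shape[0] - lags[-1]        # rows for which every lag exists
    if n_valid <= 0:
        raise ValueError("time series too short for this tau/embedding")
    # Block j holds X shifted forward by lags[j] time steps.
    blocks = [X[lag:lag + n_valid] for lag in lags]
    return lags, np.hstack(blocks)


rng = np.random.default_rng(0)
X = rng.standard_normal((400, 30))         # toy data: 400 time steps, 30 features
X = X - X.mean(axis=0)                     # centre each feature

# PCA pre-compression: keep only the leading k PC scores (time x k)
k = 6
U, s, Vt = np.linalg.svd(X, full_matrices=False)
pcs = U[:, :k] * s[:k]

lags, X_ext = delay_embed(pcs, tau=4, embedding=40)
print(lags[:3], lags[-1])   # [0 4 8] 156
print(X_ext.shape)          # (244, 240): 400 - 156 rows, 40 x 6 latent features
```

The same arithmetic applied to the tutorial data gives 40 x 50 = 2000 latent features instead of 40 x 1325 = 53000.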

A notable difference from standard EOF analysis is the extra embedding dimension in the components. Otherwise, the workflow mirrors that of traditional EOF analysis. The results can, for instance, be assessed by examining the explained variance ratio.

model.explained_variance_ratio().plot()
plt.show()
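
As a rough illustration of where such a ratio comes from, the sketch below builds a delay-embedded matrix from a synthetic travelling wave and derives the explained variance ratio from its singular values. Everything here (the wave, the noise level, tau=2, embedding=5) is an assumption made for the example, and the ratio is normalised by the variance of all modes, which need not match the normalisation xeofs uses internally.

```python
import numpy as np

rng = np.random.default_rng(42)
n_time, n_space = 300, 20
t = np.arange(n_time)[:, None]
x = np.arange(n_space)[None, :]

# Synthetic travelling wave (period 25 steps, wavelength 10 grid points) plus weak noise
data = np.cos(2 * np.pi * (x / 10 - t / 25))
data = data + 0.1 * rng.standard_normal(data.shape)

# Delay embedding: 5 copies, each shifted by 2 time steps relative to the previous one
tau, embedding = 2, 5
lags = tau * np.arange(embedding)
n_valid = n_time - lags[-1]
extended = np.hstack([data[lag:lag + n_valid] for lag in lags])
extended = extended - extended.mean(axis=0)   # centre each extended feature

# Squared singular values are proportional to the variance carried by each mode
singular_values = np.linalg.svd(extended, compute_uv=False)
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

print(explained_variance_ratio[:4].round(3))
```

In this toy case the propagating wave is captured almost entirely by the two leading modes, while the remaining modes carry only noise.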

We can also inspect the scores; let’s look at mode 4.

scores.sel(mode=4).plot()
plt.show()

To wrap up, we visualize the corresponding EEOF component of mode 4. For visualization purposes, we focus on the component at a single latitude, here 60 degrees north, which leaves a two-dimensional field over the embedding and longitude dimensions.

components.sel(mode=4, lat=60).plot()
plt.show()

