Should I Use This?
Short answer: It depends.
You may not need to use xeofs if:

- The method you need is already available in another package.
- Your data is naturally 2D and unlabeled.

For example, scikit-learn offers a wide variety of well-established models for 2D data, cca-zoo provides multiple CCA options, and pyEOF supports Varimax-rotated PCA.
For multi-dimensional data in xarray and dask, popular tools like eofs (for PCA/EOF analysis) and xMCA (for Maximum Covariance Analysis) might already cover your needs. Specifically, eofs by Andrew Dawson offers basic support for xarray and dask, but only for a single 1D sample dimension.
However, consider using xeofs if:

- You need efficient computation in Python using randomized linear algebra (~10x faster than eofs for medium to large datasets).
- Your data exceeds memory limits and requires dask for processing.
- You prefer to work within the familiar ecosystem of xarray DataArrays and Datasets.
- Your data is naturally N-dimensional (e.g., time, longitude, latitude, steps, sensors, variables).
- You want to perform analysis along an N-dimensional sample dimension such as in T-mode EOF analysis.
- You need specialized dimensionality reduction methods for climate science (e.g. Hilbert PCA, POP analysis and more).

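To make the point about N-dimensional data concrete, here is a minimal NumPy-only sketch of what an EOF analysis has to do by hand when the data has more than two dimensions: flatten the feature dimensions, decompose, and reshape the patterns back. It does not use the xeofs API; the toy field, its dimension sizes, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy field with dimensions (time, lat, lon)
n_time, n_lat, n_lon = 50, 4, 6
data = rng.standard_normal((n_time, n_lat, n_lon))

# Flatten the feature dimensions (lat, lon) into one axis: (time, lat * lon)
X = data.reshape(n_time, n_lat * n_lon)

# Remove the mean along the sample dimension (time)
X = X - X.mean(axis=0)

# EOF analysis amounts to an SVD of the centered data matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)

n_modes = 3
# Spatial patterns (EOFs), reshaped back to (mode, lat, lon)
eofs = Vt[:n_modes].reshape(n_modes, n_lat, n_lon)
# Time series (PC scores) with shape (time, mode)
pcs = U[:, :n_modes] * s[:n_modes]
# Fraction of the total variance explained by each retained mode
explained_variance_ratio = s[:n_modes] ** 2 / np.sum(s ** 2)
```

With labeled N-dimensional data, this reshaping, plus the bookkeeping for coordinates and missing values, is the part that a package built on xarray can take over.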
Below is an overview of some features where xeofs stands out compared to other packages:
| | xeofs | eofs | pyEOF | xMCA |
|---|---|---|---|---|
| xarray Interface | ✅ | ✅ | ❌ | ✅ |
| Dask Support | ✅ | ✅ | ❌ | ❌ |
| Multi-Dimensional | ✅ | Only 1D sample dim | 2D input only | Only 1D sample dim |
| Missing Values | ✅ | ✅ | ❌ | ✅ |
| Support for | ✅ | ❌ | ❌ | ❌ |
| Algorithm¹ | Randomized SVD | Full SVD | Randomized SVD | Full SVD |
| Extensible Code Structure | ✅ | ❌ | ❌ | ❌ |
| Validation | | | | |
| Bootstrapping | ✅ | ❌ | ❌ | ❌ |

¹ Note on the algorithm: A full SVD of an m × n matrix costs O(min(mn², m²n)). A randomized SVD, which computes only the first k singular values, reduces this to O(mn log(k)), which makes the randomized SVD used by xeofs better suited to large datasets. For details, see the sklearn docs on PCA.
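
The idea behind a randomized SVD can be sketched in a few lines of NumPy. This is an illustrative version of the general technique (random range finder with power iterations), not the implementation used by xeofs or scikit-learn. The function name `randomized_svd` and the parameters `n_oversamples`, `n_iter`, and `seed` are assumptions chosen for this example. Note that this plain Gaussian variant costs on the order of m·n·k operations; the log(k) bound relies on structured random projections.

```python
import numpy as np


def randomized_svd(A, k, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the leading k singular triplets of A (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]

    # Project A onto a small random subspace to capture its dominant range
    omega = rng.standard_normal((n, k + n_oversamples))
    Q, _ = np.linalg.qr(A @ omega)

    # Power iterations sharpen the subspace; QR keeps it numerically stable
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Solve a small exact SVD in the reduced space, then map back
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k]


# Usage: compare against a full SVD on a synthetic rank-5 matrix
rng = np.random.default_rng(42)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))
U, s, Vt = randomized_svd(A, k=5)
s_full = np.linalg.svd(A, compute_uv=False)[:5]
```

The expensive decomposition is only ever applied to the small matrix `B`, whose row count is k plus the oversampling margin, which is where the savings over a full SVD come from when k is much smaller than m and n.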