Hilbert MCA [1] (also known as Analytical SVD) extends MCA by
examining amplitude-phase relationships. It augments the input data with its
Hilbert transform, creating a complex-valued field.
This method solves the following optimization problem:
\(\max_{q_x, q_y} \left( q_x^H X^H Y q_y \right)\)
subject to the constraints:
\(q_x^H q_x = 1, \quad q_y^H q_y = 1\)
where \(H\) denotes the conjugate transpose and \(X\) and \(Y\)
are the augmented data matrices.
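The optimization problem above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the helper `analytic_signal` and the toy fields are assumptions made for illustration, and any scaling of the cross-covariance matrix is omitted to match the formula as written. The leading singular vectors of \(X^H Y\) attain the maximum, and their modulus and argument give the amplitude and phase of each pattern.

```python
import numpy as np


def analytic_signal(data):
    """Return data + 1j * Hilbert(data), transforming along axis 0 (samples)."""
    n = data.shape[0]
    # One-sided spectrum: keep the mean, double the positive frequencies,
    # drop the negative ones (the Nyquist term is kept once when n is even).
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1 : (n + 1) // 2] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    return np.fft.ifft(np.fft.fft(data, axis=0) * weights[:, None], axis=0)


# Hypothetical toy fields (samples x features) sharing a propagating wave.
rng = np.random.default_rng(0)
t = np.arange(200)[:, None]
X = np.cos(2 * np.pi * t / 20 - np.linspace(0, np.pi, 5))
X = X + 0.1 * rng.standard_normal((200, 5))
Y = np.sin(2 * np.pi * t / 20 - np.linspace(0, np.pi, 4))
Y = Y + 0.1 * rng.standard_normal((200, 4))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Augment each field with its Hilbert transform (complex-valued fields).
Xc = analytic_signal(X)
Yc = analytic_signal(Y)

# Complex cross-covariance matrix X^H Y and its SVD.
C = Xc.conj().T @ Yc
U, s, Vh = np.linalg.svd(C, full_matrices=False)

# Leading pair of unit-norm singular vectors.
q_x = U[:, 0]
q_y = Vh[0].conj()

# The objective q_x^H X^H Y q_y equals the leading singular value.
objective = q_x.conj() @ C @ q_y

# Amplitude and phase of the first X pattern.
amplitude = np.abs(q_x)
phase = np.angle(q_x)
```

The real part of each augmented field is the original data, so the real-valued MCA problem is contained in the complex one.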
An optional padding with exponentially decaying values can be applied prior
to the Hilbert transform in order to mitigate the impact of spectral
leakage.
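The idea behind the padding can be sketched as follows. The function `pad_exp`, the choice of padding each side with as many values as the series has, and the way `decay_factor` enters the exponent are all assumptions for illustration; the library's actual padding scheme may differ. The series is extended at both ends with values that decay towards zero, the analytic signal is computed on the padded series, and the padding is trimmed afterwards.

```python
import numpy as np


def analytic_signal_1d(x):
    """Return x + 1j * Hilbert(x) for a 1D real series, using the FFT."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1 : (n + 1) // 2] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def pad_exp(series, decay_factor=0.2):
    """Hypothetical padding: extend both ends with exponentially decaying values."""
    n = series.size
    decay = np.exp(-decay_factor * np.arange(1, n + 1))
    left = series[0] * decay[::-1]   # rises from ~0 up to the first value
    right = series[-1] * decay       # falls from the last value down to ~0
    return np.concatenate([left, series, right])


n = 100
t = np.arange(n)
# A period of 17 does not divide the window length, so the series is not
# periodic over the window and the unpadded transform is prone to leakage.
series = np.cos(2 * np.pi * t / 17)

padded = pad_exp(series, decay_factor=0.2)
# Transform the padded series, then keep only the original segment.
analytic = analytic_signal_1d(padded)[n : 2 * n]
```

Because the padded series tapers to nearly zero at both ends, the implicit periodic extension assumed by the FFT no longer has a jump at the window boundary.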
Parameters:
n_modes (int, default=2) – Number of modes to calculate.
padding (Sequence[str] | str | None, default="exp") – Padding method for the Hilbert transform. Available options are:
- None: no padding
- "exp": exponential decay
decay_factor (Sequence[float] | float, default=0.2) – Decay factor for the exponential padding.
standardize (Sequence[bool] | bool, default=False) – Whether to standardize the input data. Generally not recommended as
standardization can be managed by the degree of whitening.
use_coslat (Sequence[bool] | bool, default=False) – For data on a longitude-latitude grid, whether to correct for varying
grid cell areas towards the poles by scaling each grid point with the
square root of the cosine of its latitude.
use_pca (Sequence[bool] | bool, default=False) – Whether to preprocess each field individually by reducing dimensionality
through PCA. The cross-covariance matrix is computed in the reduced
principal component space.
n_pca_modes (Sequence[int | float | str] | int | float | str, default=0.999) – Number of modes to retain during PCA preprocessing step. If int,
specifies the exact number of modes; if float, specifies the fraction of
variance to retain; if “all”, all modes are retained.
pca_init_rank_reduction (Sequence[float] | float, default=0.3) – Relevant when use_pca=True and n_pca_modes is a float. Specifies the
initial fraction of rank reduction for faster PCA computation via
randomized SVD.
check_nans (Sequence[bool] | bool, default=True) – Whether to check for NaNs in the input data. Set to False for lazy model
evaluation.
compute (bool, default=True) – Whether to compute the model elements eagerly. If True, the following
are computed sequentially: preprocessor scaler, optional NaN checks, SVD
decomposition, scores, and components.
random_state (numpy.random.Generator | int | None, default=None) – Seed for the random number generator.
sample_name (str, default="sample") – Name for the new sample dimension.
feature_name (Sequence[str] | str, default="feature") – Name for the new feature dimension.
solver ({"auto", "full", "randomized"}) – Solver to use for the SVD computation.
solver_kwargs (dict, default={}) – Additional keyword arguments passed to the SVD solver function.
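The weighting described for use_coslat can be sketched in a few lines of NumPy. The latitude values and the array layout (samples by latitude) are assumptions for illustration. The square root is used because covariances involve products of two weighted values, so each grid point ends up weighted by the cosine of its latitude, which is proportional to the grid cell area.

```python
import numpy as np

# Hypothetical latitudes in degrees and a field of shape (samples, latitude).
lat = np.array([-60.0, 0.0, 60.0])
data = np.ones((4, lat.size))

# Square root of cos(latitude); clip guards against tiny negative values
# from rounding at the poles.
weights = np.sqrt(np.clip(np.cos(np.deg2rad(lat)), 0.0, None))

# Scale each grid point (column) by its weight.
weighted = data * weights

# Squared (variance-like) contributions are then proportional to cos(lat).
contribution = (weighted**2).mean(axis=0)
```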
The components may be referred to differently depending on the model
type. Common terms include canonical vectors, singular vectors, loadings
or spatial patterns.
Parameters:
normalized (bool, default=True) – Whether to return L2 normalized components.
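The covariance fraction (CF) of mode \(i\) is defined as
\(CF_i = \frac{\sigma_i}{\sum_{j=1}^{m} \sigma_j}\)
where \(m\) is the total number of modes and \(\sigma_i\) is the
\(i\)th singular value of the covariance matrix.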
This implementation estimates the sum of singular values from the first
n modes only. Because the omitted singular values are missing from the
denominator, the covariance fraction is overestimated; retain as many
modes as possible to get a good estimate.
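The effect of truncation on the estimate can be seen in a short NumPy sketch. The random fields and the choice of three retained modes are assumptions for illustration. Each singular value is divided by the sum of singular values, once using all modes and once using only the first n.

```python
import numpy as np

# Hypothetical centered fields of shape (samples, features).
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 8))
Y = rng.standard_normal((100, 6))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Singular values of the cross-covariance matrix.
C = X.T @ Y / (X.shape[0] - 1)
s = np.linalg.svd(C, compute_uv=False)

# Covariance fraction using all m modes.
cf_full = s / s.sum()

# Estimate using only the first n modes: the denominator is too small,
# so every retained fraction is inflated.
n = 3
cf_truncated = s[:n] / s[:n].sum()
```

With only three of six modes retained, the truncated fractions sum to one even though the retained modes account for only part of the total.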
Note
In MCA, the focus is on maximizing the squared covariance (SC). As a
result, this quantity is preserved during decomposition, meaning the SC
of both datasets remains unchanged before and after decomposition. Each
mode explains a fraction of the total SC, and together, all modes can
reconstruct the total SC of the cross-covariance matrix. However, the
(non-squared) covariance is not invariant in MCA; it is not preserved by
the individual modes and cannot be reconstructed from them.
Consequently, the squared covariance fraction (SCF) is invariant in MCA
and is typically used to assess the relative importance of each mode. In
contrast, the covariance fraction (CF) is not invariant. Cheng and
Dunkerton [3] introduced the CF to compare the relative importance of
modes before and after Varimax rotation in MCA. Notably, when the data
fields in MCA are identical, the CF corresponds to the explained
variance ratio in Principal Component Analysis (PCA).
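Two statements from this note can be checked numerically. The sketch below uses random real-valued fields, which are an assumption for illustration. It verifies that the squared singular values sum to the total squared covariance (the squared Frobenius norm of the cross-covariance matrix), and that for identical fields the CF matches the explained variance ratio of PCA.

```python
import numpy as np

# Hypothetical centered fields of shape (samples, features).
rng = np.random.default_rng(1)
n_samples = 150
X = rng.standard_normal((n_samples, 6))
Y = rng.standard_normal((n_samples, 4))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Squared covariance: the modes together reconstruct the total SC.
C_xy = X.T @ Y / (n_samples - 1)
s_xy = np.linalg.svd(C_xy, compute_uv=False)
total_sc = np.sum(C_xy**2)          # squared Frobenius norm
scf = s_xy**2 / total_sc            # squared covariance fraction per mode

# Identical fields: the cross-covariance matrix is the covariance matrix,
# whose singular values are the variances of the principal components.
C_xx = X.T @ X / (n_samples - 1)
s_xx = np.linalg.svd(C_xx, compute_uv=False)
cf = s_xx / s_xx.sum()

# Explained variance ratio from a PCA of X (via the SVD of the data).
sv_data = np.linalg.svd(X, compute_uv=False)
explained_variance_ratio = sv_data**2 / np.sum(sv_data**2)
```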
The FVE X is the fraction of variance in \(X\) explained by the
scores of \(X\). It is computed as a weighted mean-square error (see
equation (15) in Swenson (2015)):
The FVE YX is the fraction of variance in \(Y\) explained by the
scores of \(X\). It is computed as a weighted mean-square error (see
equation (15) in Swenson (2015)):
where \(\mathbf{d}_{X,i}\) and \(\mathbf{d}_{Y,i}\) are the
residuals of the input data \(X\) and \(Y\) after reconstruction
by the ith scores of \(X\) and \(Y\), respectively.
References
Swenson, E. Continuum Power CCA: A Unified Approach for Isolating
Coupled Modes. Journal of Climate 28, 1016–1030 (2015).
The FVE Y is the fraction of variance in \(Y\) explained by the
scores of \(Y\). It is computed as a weighted mean-square error (see
equation (15) in Swenson (2015)):
The component scores may be referred to differently depending on the
model type. Common terms include canonical variates, expansion
coefficients, principal components (scores) or temporal patterns.
Parameters:
normalized (bool, default=False) – Whether to return L2 normalized scores.