.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/2multi/plot_cca.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_2multi_plot_cca.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_2multi_plot_cca.py:


Canonical Correlation Analysis
==============================

In this example, we're going to perform a Canonical Correlation Analysis (CCA)
on three datasets using the ERSSTv5 monthly sea surface temperature (SST) data
from 1970 to 2022. We divide this data into three areas: the Indian Ocean,
the Pacific Ocean, and the Atlantic Ocean. Our goal is to perform CCA on these
regions.

First, we'll import the necessary modules.

.. GENERATED FROM PYTHON SOURCE LINES 13-21

.. code-block:: Python

    import xarray as xr
    import xeofs as xe

    import matplotlib.pyplot as plt
    from matplotlib.gridspec import GridSpec
    import cartopy.crs as ccrs

.. GENERATED FROM PYTHON SOURCE LINES 22-24

Next, we load the data and compute the SST anomalies. This removes the
monthly climatologies, so the seasonal cycle doesn't impact our CCA.

.. GENERATED FROM PYTHON SOURCE LINES 24-29

.. code-block:: Python

    sst = xr.tutorial.load_dataset("ersstv5").sst
    sst = sst.groupby("time.month") - sst.groupby("time.month").mean("time")

.. GENERATED FROM PYTHON SOURCE LINES 30-31

Now, we define the three regions of interest and store them in a list.

.. GENERATED FROM PYTHON SOURCE LINES 31-38

.. code-block:: Python

    indian = sst.sel(lon=slice(35, 115), lat=slice(30, -30))
    pacific = sst.sel(lon=slice(130, 290), lat=slice(30, -30))
    atlantic = sst.sel(lon=slice(320, 360), lat=slice(70, 10))

    data_list = [indian, pacific, atlantic]

.. GENERATED FROM PYTHON SOURCE LINES 39-58

We now perform CCA.
Since we are dealing with a high-dimensional feature space, we first
perform PCA to reduce the dimensionality by setting ``pca=True``; this acts
as a form of regularized CCA. With the ``variance_fraction`` keyword argument,
we specify that we want to keep as many PCA modes as are needed to explain 90%
of the variance in each of the three data sets.

An important parameter is ``init_pca_modes``. It specifies the number
of PCA modes that are initially computed before they are truncated to those
that account for 90% of the variance. If this number is small enough,
randomized PCA, which is much faster than a full SVD, is performed instead.
We can also specify ``init_pca_modes`` as a float (0 < x <= 1), in which case
the number of PCA modes is given by that fraction of the data matrix's rank.
The default is 0.75, which ensures that randomized PCA is used.

Given the nature of SST data, we might lower it to something like 0.3,
since we expect most of the variance in the data to be explained by a small
number of PCA modes.

Note that if the initial PCA modes don't reach the 90% variance target,
``xeofs`` will give a warning.

.. GENERATED FROM PYTHON SOURCE LINES 58-70

.. code-block:: Python

    model = xe.models.CCA(
        n_modes=2,
        use_coslat=True,
        pca=True,
        variance_fraction=0.9,
        init_pca_modes=0.30,
    )
    model.fit(data_list, dim="time")
    components = model.components()
    scores = model.scores()

.. GENERATED FROM PYTHON SOURCE LINES 71-72

Let's look at the canonical loadings (components) of the first mode.

.. GENERATED FROM PYTHON SOURCE LINES 72-95

.. code-block:: Python

    mode = 1

    # Center each map projection on the median longitude of its own region.
    central_longitudes = [
        indian.lon.median().item(),
        pacific.lon.median().item(),
        atlantic.lon.median().item(),
    ]
    projections = [ccrs.PlateCarree(central_longitude=lon) for lon in central_longitudes]

    fig = plt.figure(figsize=(12, 2.5))
    gs = GridSpec(1, 4, figure=fig, width_ratios=[2, 4, 1, 0.2])
    axes = [fig.add_subplot(gs[0, i], projection=projections[i]) for i in range(3)]
    cax = fig.add_subplot(1, 4, 4)
    kwargs = dict(transform=ccrs.PlateCarree(), vmin=-1, vmax=1, cmap="RdBu_r", cbar_ax=cax)
    components[0].sel(mode=mode).plot(ax=axes[0], **kwargs)
    components[1].sel(mode=mode).plot(ax=axes[1], **kwargs)
    im = components[2].sel(mode=mode).plot(ax=axes[2], **kwargs)
    fig.colorbar(im, cax=cax, orientation="vertical")
    for ax in axes:
        ax.coastlines()
        ax.set_title("")

.. image-sg:: /auto_examples/2multi/images/sphx_glr_plot_cca_001.png
   :alt: plot cca
   :srcset: /auto_examples/2multi/images/sphx_glr_plot_cca_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 96-97

And lastly, we'll check out the canonical variates (scores) of the first mode.

.. GENERATED FROM PYTHON SOURCE LINES 97-103

.. code-block:: Python

    fig, ax = plt.subplots(figsize=(12, 4))
    scores[0].sel(mode=mode).plot(ax=ax, label="Indian Ocean")
    scores[1].sel(mode=mode).plot(ax=ax, label="Central Pacific")
    scores[2].sel(mode=mode).plot(ax=ax, label="North Atlantic")
    ax.legend()

.. image-sg:: /auto_examples/2multi/images/sphx_glr_plot_cca_002.png
   :alt: mode = 1
   :srcset: /auto_examples/2multi/images/sphx_glr_plot_cca_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.632 seconds)

.. _sphx_glr_download_auto_examples_2multi_plot_cca.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_cca.ipynb <plot_cca.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_cca.py <plot_cca.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
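
As a rough illustration of the ``variance_fraction`` truncation used in this
example, the sketch below picks the smallest number of leading PCA modes whose
cumulative variance reaches a target fraction of the total variance. The helper
``n_modes_for_variance`` and the numbers are hypothetical and not part of
``xeofs``; the library's internal implementation may differ.

.. code-block:: Python

    def n_modes_for_variance(mode_variances, total_variance, variance_fraction=0.9):
        """Return the smallest number of leading modes reaching the target.

        ``mode_variances`` holds the variance of each computed mode in
        descending order. Returns ``None`` if the computed modes never reach
        ``variance_fraction`` of ``total_variance``.
        """
        cumulative = 0.0
        for k, variance in enumerate(mode_variances, start=1):
            cumulative += variance
            if cumulative >= variance_fraction * total_variance:
                return k
        return None


    # Hypothetical variances of the first five modes of a field whose
    # total variance is 10.0 (illustrative numbers only).
    mode_variances = [5.0, 2.5, 1.0, 0.75, 0.25]

    n_keep = n_modes_for_variance(mode_variances, total_variance=10.0)
    too_few = n_modes_for_variance(mode_variances[:3], total_variance=10.0)

Here ``n_keep`` is 4, because the first four modes are the first to reach 90%
of the total variance. ``too_few`` is ``None``, because three initial modes
cannot reach the target. That is the kind of situation in which a larger
``init_pca_modes`` would be needed.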