.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/2multi/plot_cca.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_2multi_plot_cca.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_2multi_plot_cca.py:


Canonical Correlation Analysis
==============================

In this example, we perform a Canonical Correlation Analysis (CCA) on three
datasets derived from the ERSSTv5 monthly sea surface temperature (SST) data
from 1970 to 2022. We divide the data into three regions (the Indian Ocean,
the Pacific Ocean, and the North Atlantic) and use CCA to find modes of SST
variability that are maximally correlated across them.

First, we'll import the necessary modules.

.. GENERATED FROM PYTHON SOURCE LINES 13-21

.. code-block:: Python


    import xarray as xr
    import xeofs as xe

    import matplotlib.pyplot as plt
    from matplotlib.gridspec import GridSpec
    import cartopy.crs as ccrs


.. GENERATED FROM PYTHON SOURCE LINES 22-24

Next, we load the data and compute SST anomalies by subtracting the monthly
climatology, so that the seasonal cycle does not dominate the CCA.
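
The same idea in plain NumPy, as a minimal sketch: it assumes a complete record
that starts in January and spans a whole number of years, and the synthetic
series is hypothetical rather than ERSSTv5 data. Reshaping to (year, month) and
subtracting each calendar month's mean leaves anomalies whose monthly means are zero.

.. code-block:: Python

    import numpy as np

    # Hypothetical monthly series: 10 years of a seasonal cycle plus noise.
    rng = np.random.default_rng(0)
    n_years = 10
    months = np.arange(n_years * 12)
    series = 5.0 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.5, months.size)

    # Reshape to (year, calendar month) so each column holds one calendar month.
    by_month = series.reshape(n_years, 12)

    # Monthly climatology: mean over all years for each calendar month.
    climatology = by_month.mean(axis=0)

    # Anomalies: subtract the matching month's climatology, then flatten back to time.
    anomalies = (by_month - climatology).reshape(-1)

    # Each calendar month now has (numerically) zero mean.
    print(np.abs(anomalies.reshape(n_years, 12).mean(axis=0)).max())
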

.. GENERATED FROM PYTHON SOURCE LINES 24-29

.. code-block:: Python


    sst = xr.tutorial.load_dataset("ersstv5").sst
    sst = sst.groupby("time.month") - sst.groupby("time.month").mean("time")


.. GENERATED FROM PYTHON SOURCE LINES 30-31

Now, we define the three regions of interest and store them in a list.

.. GENERATED FROM PYTHON SOURCE LINES 31-38

.. code-block:: Python


    indian = sst.sel(lon=slice(35, 115), lat=slice(30, -30))
    pacific = sst.sel(lon=slice(130, 290), lat=slice(30, -30))
    atlantic = sst.sel(lon=slice(320, 360), lat=slice(70, 10))

    data_list = [indian, pacific, atlantic]
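
Label-based slices in xarray follow the order in which a coordinate is stored,
which is why the latitude bounds above are written north first
(``slice(30, -30)``): the dataset's latitude axis runs from north to south, and
the reversed order would give an empty selection. The NumPy sketch below mimics
that behaviour on a hypothetical 2-degree latitude axis; ``label_slice`` is an
illustrative helper, not an xarray function.

.. code-block:: Python

    import numpy as np

    # Hypothetical latitude axis stored north to south.
    lat = np.arange(88.0, -90.0, -2.0)


    def label_slice(coord, start, stop):
        """Indices of coord from label `start` to `stop`, walking in stored order.

        Illustrative helper only: it mimics label-based slicing on a monotonic
        coordinate and is not part of xarray.
        """
        if coord[0] > coord[-1]:  # descending coordinate
            mask = (coord <= start) & (coord >= stop)
        else:  # ascending coordinate
            mask = (coord >= start) & (coord <= stop)
        return np.flatnonzero(mask)


    north_first = label_slice(lat, 30, -30)  # 30N down to 30S
    south_first = label_slice(lat, -30, 30)  # wrong order for a descending axis
    print(lat[north_first][[0, -1]], south_first.size)
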


.. GENERATED FROM PYTHON SOURCE LINES 39-58

We now perform the CCA. Because the feature space (the number of grid points) is
high-dimensional, we set ``pca=True`` so that each dataset is first reduced with PCA;
this acts as a form of regularization of the CCA. The ``variance_fraction`` keyword
argument sets how many PCA modes are kept: here, enough modes to explain 90% of the
variance in each of the three datasets.
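
A minimal sketch of that PCA step, assuming a centred (time x space) matrix: take
the SVD, compute the cumulative explained-variance fraction from the squared
singular values, and keep the smallest number of modes that reaches the target.
This is one common way to do it, not necessarily how ``xeofs`` implements it
internally, and the random matrix is a hypothetical stand-in for one region.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(42)
    # Hypothetical region: 200 time steps x 50 grid points built from three patterns.
    n_time, n_space = 200, 50
    X = rng.normal(size=(n_time, 3)) @ rng.normal(size=(3, n_space)) * 3.0
    X += rng.normal(scale=0.5, size=(n_time, n_space))
    X -= X.mean(axis=0)  # centre each grid point in time

    # Squared singular values are proportional to the variance explained by each mode.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)

    variance_fraction = 0.9
    # Smallest number of modes whose cumulative explained variance reaches the target.
    n_keep = int(np.searchsorted(cumulative, variance_fraction) + 1)

    # PC scores of the retained modes: the reduced (time x n_keep) input to CCA.
    pcs = U[:, :n_keep] * s[:n_keep]
    print(n_keep, pcs.shape)
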

An important parameter is ``init_pca_modes``. It specifies the number
of PCA modes that are initially computed before they are truncated to the 90% target.
If this number is small enough, a randomized PCA is performed instead of a full SVD,
which is much faster. ``init_pca_modes`` can also be given as a float (0 < x <= 1),
in which case the number of PCA modes is that fraction of the data matrix's rank.
The default of 0.75 ensures that randomized PCAs are performed.

Given the nature of SST data, we can lower it to something like 0.3, since
we expect most of the variance in each region to be explained by a small
number of PCA modes.

Note that if our initial PCA modes don't hit the 90% variance target, ``xeofs``
will give a warning.
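
The interplay between ``init_pca_modes`` and ``variance_fraction`` can be sketched
as follows. This is a hypothetical illustration, not the ``xeofs`` code: the helper
``choose_pca_modes`` and its rounding rule are assumptions, and a full SVD stands in
for the randomized solver. One detail matters: with truncated modes, explained
variance has to be measured against the total variance of the data, which is what
the sketch uses.

.. code-block:: Python

    import warnings

    import numpy as np


    def choose_pca_modes(X, init_pca_modes, variance_fraction):
        """Hypothetical helper: decide how many PCA modes to keep for one dataset.

        Assumes X is centred (time x space). A float init_pca_modes is read as a
        fraction of the matrix rank; the rounding rule is an assumption.
        """
        rank = np.linalg.matrix_rank(X)
        if isinstance(init_pca_modes, float):
            n_init = max(1, int(round(init_pca_modes * rank)))
        else:
            n_init = int(init_pca_modes)

        # Keep only the first n_init singular values (a randomized solver would
        # compute just these; a full SVD is used here for brevity).
        s = np.linalg.svd(X, compute_uv=False)[:n_init]
        # Fractions are relative to the total variance of the data.
        cumulative = np.cumsum(s**2) / np.sum(X**2)

        if cumulative[-1] < variance_fraction:
            warnings.warn(
                f"{n_init} initial PCA modes explain only {cumulative[-1]:.0%} of the "
                f"variance, below the target of {variance_fraction:.0%}."
            )
            return n_init
        # Smallest number of modes that reaches the target.
        return int(np.searchsorted(cumulative, variance_fraction) + 1)


    rng = np.random.default_rng(1)
    # Hypothetical region with three dominant patterns plus weak noise.
    X_signal = rng.normal(size=(120, 3)) @ rng.normal(size=(3, 40)) * 3.0
    X_signal += rng.normal(scale=0.5, size=(120, 40))
    X_signal -= X_signal.mean(axis=0)
    # Hypothetical region of pure noise: variance is spread over many modes.
    X_noise = rng.normal(size=(120, 40))
    X_noise -= X_noise.mean(axis=0)

    n_signal = choose_pca_modes(X_signal, 0.75, 0.9)  # enough initial modes
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        n_noise = choose_pca_modes(X_noise, 0.1, 0.9)  # too few initial modes: warns
    print(n_signal, n_noise, len(caught))
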

.. GENERATED FROM PYTHON SOURCE LINES 58-70

.. code-block:: Python


    model = xe.models.CCA(
        n_modes=2,
        use_coslat=True,
        pca=True,
        variance_fraction=0.9,
        init_pca_modes=0.30,
    )
    model.fit(data_list, dim="time")
    components = model.components()
    scores = model.scores()
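
To make the fit step less of a black box: CCA looks for weight vectors whose
projections of two datasets are maximally correlated. The NumPy sketch below shows
the classical two-dataset version on hypothetical, already reduced inputs: whiten
each set, take the SVD of the whitened cross-covariance, and map the singular
vectors back to weights. ``xeofs`` accepts three or more datasets, which requires a
multi-set generalisation that this sketch does not attempt, so its numbers will
not match this code.

.. code-block:: Python

    import numpy as np


    def inv_sqrt(C, eps=1e-10):
        """Inverse matrix square root of a symmetric positive-definite matrix."""
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T


    def cca_two_sets(X, Y, n_modes=2):
        """Classical CCA of two centred (time x feature) matrices via whitening + SVD."""
        n = X.shape[0]
        Cxx = X.T @ X / (n - 1)
        Cyy = Y.T @ Y / (n - 1)
        Cxy = X.T @ Y / (n - 1)
        Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
        # Singular values of the whitened cross-covariance are the canonical correlations.
        U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
        A = Wx @ U[:, :n_modes]  # canonical weights for X
        B = Wy @ Vt.T[:, :n_modes]  # canonical weights for Y
        return A, B, s[:n_modes]


    rng = np.random.default_rng(0)
    n_time = 500
    shared = rng.normal(size=n_time)  # hypothetical signal common to both sets
    X = np.column_stack([shared + rng.normal(scale=0.5, size=n_time),
                         rng.normal(size=(n_time, 3))])
    Y = np.column_stack([rng.normal(size=(n_time, 2)),
                         -shared + rng.normal(scale=0.5, size=n_time)])
    X -= X.mean(axis=0)
    Y -= Y.mean(axis=0)

    A, B, corr = cca_two_sets(X, Y, n_modes=2)
    u, v = X @ A, Y @ B  # canonical variates (scores)
    print(corr[0], np.corrcoef(u[:, 0], v[:, 0])[0, 1])

By construction the variates of each set have unit variance and the correlation
of the first pair of variates equals the first canonical correlation.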


.. GENERATED FROM PYTHON SOURCE LINES 71-72

Let's look at the canonical loadings (components) of the first mode.
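
Canonical loadings are often defined as the correlation between each original
variable and the canonical variate, which would make a symmetric colour range from
-1 to 1 a natural choice for the maps. The sketch below computes such correlation
loadings for a hypothetical data matrix and score series; whether ``components()``
in ``xeofs`` uses exactly this normalisation is an assumption, not something this
example confirms.

.. code-block:: Python

    import numpy as np


    def correlation_loadings(X, score):
        """Correlation between each column of X (time x space) and a score series."""
        Xc = X - X.mean(axis=0)
        sc = score - score.mean()
        cov = Xc.T @ sc / (len(sc) - 1)
        return cov / (Xc.std(axis=0, ddof=1) * sc.std(ddof=1))


    rng = np.random.default_rng(3)
    score = rng.normal(size=300)  # hypothetical canonical variate
    pattern = np.array([1.0, 0.5, 0.0, -0.8])  # hypothetical spatial pattern
    X = np.outer(score, pattern) + rng.normal(scale=0.5, size=(300, 4))
    loadings = correlation_loadings(X, score)
    print(np.round(loadings, 2))
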

.. GENERATED FROM PYTHON SOURCE LINES 72-95

.. code-block:: Python


    mode = 1

    # Centre each map projection on the median longitude of its region.
    central_longitudes = [
        indian.lon.median().item(),
        pacific.lon.median().item(),
        atlantic.lon.median().item(),
    ]
    projections = [ccrs.PlateCarree(central_longitude=lon) for lon in central_longitudes]

    fig = plt.figure(figsize=(12, 2.5))
    # Three map panels plus a narrow fourth column for the shared colorbar.
    gs = GridSpec(1, 4, figure=fig, width_ratios=[2, 4, 1, 0.2])
    axes = [fig.add_subplot(gs[0, i], projection=projections[i]) for i in range(3)]
    cax = fig.add_subplot(gs[0, 3])
    # Suppress per-panel colorbars; a single shared colorbar is drawn in cax below.
    kwargs = dict(transform=ccrs.PlateCarree(), vmin=-1, vmax=1, cmap="RdBu_r", add_colorbar=False)
    components[0].sel(mode=mode).plot(ax=axes[0], **kwargs)
    components[1].sel(mode=mode).plot(ax=axes[1], **kwargs)
    im = components[2].sel(mode=mode).plot(ax=axes[2], **kwargs)
    fig.colorbar(im, cax=cax, orientation="vertical")
    for ax in axes:
        ax.coastlines()
        ax.set_title("")


.. image-sg:: /auto_examples/2multi/images/sphx_glr_plot_cca_001.png
   :alt: plot cca
   :srcset: /auto_examples/2multi/images/sphx_glr_plot_cca_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 96-97

Finally, we plot the canonical variates (scores) of the first mode for each region.
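
Since CCA maximises the correlation between the variates of a given mode, a quick
numerical check is the pairwise correlation of the three score series. The sketch
below shows that computation on hypothetical series; with the real model you would
first extract the values from ``scores[i].sel(mode=mode)``, which we assume are
one-dimensional time series.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(7)
    n_time = 240
    common = rng.normal(size=n_time)  # hypothetical shared signal
    # Hypothetical canonical variates for the three regions.
    variates = np.vstack([
        common + rng.normal(scale=0.4, size=n_time),
        common + rng.normal(scale=0.4, size=n_time),
        common + rng.normal(scale=0.8, size=n_time),
    ])

    # Rows are variables, so this returns the 3 x 3 matrix of pairwise correlations.
    corr = np.corrcoef(variates)
    labels = ["Indian Ocean", "Central Pacific", "North Atlantic"]
    for i in range(3):
        for j in range(i + 1, 3):
            print(f"{labels[i]} vs {labels[j]}: r = {corr[i, j]:.2f}")
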

.. GENERATED FROM PYTHON SOURCE LINES 97-103

.. code-block:: Python


    fig, ax = plt.subplots(figsize=(12, 4))
    scores[0].sel(mode=mode).plot(ax=ax, label="Indian Ocean")
    scores[1].sel(mode=mode).plot(ax=ax, label="Central Pacific")
    scores[2].sel(mode=mode).plot(ax=ax, label="North Atlantic")
    ax.legend()


.. image-sg:: /auto_examples/2multi/images/sphx_glr_plot_cca_002.png
   :alt: mode = 1
   :srcset: /auto_examples/2multi/images/sphx_glr_plot_cca_002.png
   :class: sphx-glr-single-img



.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.632 seconds)


.. _sphx_glr_download_auto_examples_2multi_plot_cca.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_cca.ipynb <plot_cca.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_cca.py <plot_cca.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_