{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Quickstart" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We begin with a straightforward example: PCA (EOF analysis) of a 3D `xarray.Dataset`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import the package\n", "We start by importing the ``xarray`` and ``xeofs`` package." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "import xeofs as xe\n", "\n", "xr.set_options(display_expand_attrs=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the data\n", "Next, we fetch the data from the ``xarray`` tutorial repository. The data is a 3D dataset of 6 hourly surface air temperature over North America between 2013 and 2014. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t2m = xr.tutorial.open_dataset('air_temperature')\n", "t2m" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Fit the model\n", "\n", "In order to apply PCA to the data, we first have to create an `EOF` object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = xe.models.EOF()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now fit the model to the data. If you've worked with `sklearn` before, this process will seem familiar. However, there's an important difference: while `sklearn` `fit` method typically assumes 2D input data shaped as (`sample` x `feature`), our scenario is less straightforward. For any model, including PCA, we must specify the sample dimension. With this information, `xeofs` will interpret all other dimensions as feature dimensions.\n", "\n", "In climate science, it's common to maximize variance along the time dimension when applying PCA. Yet, this isn't the sole approach. For instance, [Compagnucci & Richmann (2007)](https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/joc.1574) discuss alternative applications.\n", "\n", "`xeofs` offers flexibility in this aspect. You can designate multiple sample dimensions, provided at least one feature dimension remains. For our purposes, we'll set `time` as our sample dimension and then fit the model to our data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.fit(t2m, dim='time')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inspect the results" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now that the model has been fitted, we can examine the result. For example, one typically starts by looking at the explained variance (ratio). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.explained_variance_ratio()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can next examine the spatial patterns, which are the eigenvectors of the covariance matrix, often referred to as EOFs or principal components." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> **_NOTE:_** The `xeofs` library aims to adhere to the convention where the primary patterns obtained from dimensionality reduction (which typically exclude the sample dimension) are termed components (akin to principal components). When data is projected onto these patterns, for instance using the `transform` method, the outcome is termed `scores` (similar to principal component scores). However, this terminology is more of a guideline than a strict rule." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "components = model.components()\n", "components" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "You'll observe that the result is an `xr.Dataset`, mirroring the format of our original input data. To visualize the components, we can use typical methods for `xarray` objects. Now, let's inspect the first component." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> **_NOTE:_** `xeofs` is designed to match the data type of its input. For instance, if you provide an `xr.DataArray` as input, the components will also be of type `xr.DataArray`. Similarly, if the input is an `xr.Dataset`, the components will mirror that as an `xr.Dataset`. The same principle applies if the input is a `list`; the output components will be presented in a `list` format. This consistent behavior is maintained across all `xeofs` methods." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "components[\"air\"].sel(mode=1).plot()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can also examine the principal component scores, which represent the corresponding time series." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> **_NOTE:_** When comparing the scores from `xeofs` to outputs from other PCA implementations like `sklearn` or `eofs`, you might spot discrepancies in the absolute values. This arises because `xeofs` typically returns scores normalized by the L2 norm. However, if you prefer unnormalized scores, simply set `normalized=False` when using the `scores` or `transform` method.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.scores().sel(mode=1).plot()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 2 }