{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quickstart"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We begin with a straightforward example: PCA (EOF analysis) of a 3D `xarray.Dataset`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import the package\n",
    "We start by importing the xarray and xeofs package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "import xeofs as xe\n",
    "\n",
    "xr.set_options(display_expand_attrs=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the data\n",
    "Next, we fetch the data from the xarray tutorial repository. The data is a 3D dataset of 6 hourly surface air temperature over North America between 2013 and 2014. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t2m = xr.tutorial.open_dataset(\"air_temperature\")\n",
    "t2m"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fit the model\n",
    "\n",
    "To apply PCA to the data, we first need to create an `EOF` object. Since our data is on a regular latitude-longitude grid, we need to weigh each grid cell by the square root of the cosine of the latitude. This can be enabled by setting `use_coslat=True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = xe.single.EOF(use_coslat=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will now fit the model to the data. If you've worked with scikit-learn before, this process will seem familiar. However, there's an important difference: while scikit-learn's `fit` method typically assumes 2D input data shaped as (`sample` x `feature`), our scenario is less straightforward. For any model, including PCA, we must specify the sample dimension. With this information, xeofs will interpret all other dimensions as feature dimensions (more on that [here](core_functionalities/labeled_data.rst)).\n",
    "\n",
    "In climate science, it's common to maximize variance along the time dimension when applying PCA. Yet, this isn't the sole approach. For instance, [Compagnucci & Richmann (2007)](https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/joc.1574) discuss alternative applications. `xeofs` offers flexibility in this aspect. You can designate multiple sample dimensions, provided at least one feature dimension remains. For our purposes, we'll set `time` as our sample dimension and then fit the model to our data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.fit(t2m, dim=\"time\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspect the results"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the model has been fitted, we can examine the result. For example, one typically starts by looking at the explained variance (ratio). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.explained_variance_ratio()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can next examine the spatial patterns, which are the eigenvectors of the covariance matrix, often referred to as EOFs or principal components."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **_NOTE:_** The xeofs library aims to adhere to the convention where the primary patterns obtained from dimensionality reduction (which typically exclude the sample dimension) are termed components (akin to principal components). When data is projected onto these patterns, for instance using the `transform` method, the outcome is termed `scores` (similar to principal component scores). However, this terminology is more of a guideline than a strict rule."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "components = model.components()\n",
    "components"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You'll observe that the result is an `xr.Dataset`, mirroring the format of our original input data. To visualize the components, we can use typical methods for xarray objects. Now, let's inspect the first component."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **_NOTE:_**  xeofs is designed to match the data type of its input. For instance, if you provide an `xr.DataArray` as input, the components will also be of type `xr.DataArray`. Similarly, if the input is an `xr.Dataset`, the components will mirror that as an `xr.Dataset`. The same principle applies if the input is a `list`; the output components will be presented in a `list` format. This consistent behavior is maintained across all xeofs methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "components[\"air\"].sel(mode=1).plot()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also examine the principal component scores, which represent the corresponding time series."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **_NOTE:_**  When comparing the scores from xeofs to outputs from other PCA implementations like [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) or [eofs](https://ajdawson.github.io/eofs/latest/), you might spot discrepancies in the absolute values. This arises because xeofs typically returns scores normalized by the L2 norm. However, if you prefer unnormalized scores, simply set `normalized=False` when using the `scores` or `transform` method.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.scores().sel(mode=1).plot()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}