.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/1single/plot_gwpca.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_1single_plot_gwpca.py: Geographically weighted PCA =========================== Geographically Weighted Principal Component Analysis (GWPCA) is a spatial analysis method that identifies and visualizes local spatial patterns and relationships in multivariate datasets across various geographic areas. It operates by applying PCA within a moving window over a geographical region, which enables the extraction of local principal components that can differ across locations. TIn this demonstration, we'll apply GWPCA to a dataset detailing the chemical compositions of soils from countries around the Baltic Sea [1]_. This example is inspired by a tutorial originally crafted and published by Chris Brunsdon [2]_. The dataset comprises 10 variables (chemical elements) and spans 768 samples. Here, each sample refers to a pair of latitude and longitude coordinates, representing specific sampling stations. .. [1] Reimann, C. et al. Baltic soil survey: total concentrations of major and selected trace elements in arable soils from 10 countries around the Baltic Sea. Science of The Total Environment 257, 155–170 (2000). .. [2] https://rpubs.com/chrisbrunsdon/99675 .. note:: The dataset we're using is found in the R package `mvoutlier `_. To access it, we'll employ the Python package `rpy2 `_ which facilitates interaction with R packages from within Python. .. note:: Presently, there's no support for ``xarray.Dataset`` lacking an explicit feature dimension. As a workaround, ``xarray.DataArray.to_array`` can be used to convert the ``Dataset`` to an ``DataArray``. .. warning:: Bear in mind that GWPCA requires significant computational power. The ``xeofs`` implementation is optimized for CPU efficiency and is best suited for smaller to medium data sets. For more extensive datasets where parallel processing becomes essential, it's advisable to turn to the R package `GWmodel `_. This package harnesses CUDA to enable GPU-accelerated GWPCA for optimized performance. Let's import the necessary packages. .. GENERATED FROM PYTHON SOURCE LINES 33-48 .. code-block:: Python # For the analysis import numpy as np import xarray as xr import xeofs as xe # For visualization import matplotlib.pyplot as plt import seaborn as sns # For accessing R packages import rpy2.robjects as ro from rpy2.robjects.packages import importr from rpy2.robjects import pandas2ri .. GENERATED FROM PYTHON SOURCE LINES 49-51 Next, we'll install the R package `mvoutlier `_ using the `rpy2 `_ package. .. GENERATED FROM PYTHON SOURCE LINES 51-57 .. code-block:: Python xr.set_options(display_expand_data=False) utils = importr("utils") utils.chooseCRANmirror(ind=1) utils.install_packages("mvoutlier") .. rst-class:: sphx-glr-script-out .. code-block:: none R[write to console]: also installing the dependencies ‘DEoptimR’, ‘sgeostat’, ‘robustbase’ R[write to console]: trying URL 'https://cloud.r-project.org/src/contrib/DEoptimR_1.1-3.tar.gz' R[write to console]: Content type 'application/x-gzip' R[write to console]: length 19222 bytes (18 KB) R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: R[write to console]: downloaded 18 KB R[write to console]: trying URL 'https://cloud.r-project.org/src/contrib/sgeostat_1.0-27.tar.gz' R[write to console]: Content type 'application/x-gzip' R[write to console]: length 35685 bytes (34 KB) R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: R[write to console]: downloaded 34 KB R[write to console]: trying URL 'https://cloud.r-project.org/src/contrib/robustbase_0.99-2.tar.gz' R[write to console]: Content type 'application/x-gzip' R[write to console]: length 2292086 bytes (2.2 MB) R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: R[write to console]: downloaded 2.2 MB R[write to console]: trying URL 'https://cloud.r-project.org/src/contrib/mvoutlier_2.1.1.tar.gz' R[write to console]: Content type 'application/x-gzip' R[write to console]: length 476636 bytes (465 KB) R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: = R[write to console]: R[write to console]: downloaded 465 KB R[write to console]: R[write to console]: R[write to console]: The downloaded source packages are in ‘/tmp/Rtmpr8iXHn/downloaded_packages’ R[write to console]: R[write to console]: R[write to console]: Updating HTML index of packages in '.Library' R[write to console]: Making 'packages.html' ... R[write to console]: done [0] .. GENERATED FROM PYTHON SOURCE LINES 58-61 Let's load the dataset and convert it into a ``pandas.DataFrame``. Alongside, we'll also load the background data that outlines the borders of countries in the Baltic Sea region. This will help us visually represent the GWPCA results. .. GENERATED FROM PYTHON SOURCE LINES 61-75 .. code-block:: Python ro.r( """ require("mvoutlier") data(bsstop) Data <- bsstop[,1:14] background <- bss.background """ ) with (ro.default_converter + pandas2ri.converter).context(): data_df = ro.conversion.get_conversion().rpy2py(ro.r["Data"]) background_df = ro.conversion.get_conversion().rpy2py(ro.r["background"]) data_df.head() .. rst-class:: sphx-glr-script-out .. code-block:: none R[write to console]: Loading required package: mvoutlier R[write to console]: Loading required package: sgeostat .. raw:: html
ID CNo XCOO YCOO SiO2_T TiO2_T Al2O3_T Fe2O3_T MnO_T MgO_T CaO_T Na2O_T K2O_T P2O5_T
1 5001.0 60.0 -619656.5 6805304.1 43.61 1.290 13.07 12.25 0.167 3.22 2.48 1.14 2.01 0.481
2 5002.0 120.0 214714.1 7745546.6 58.73 0.913 14.78 6.48 0.105 2.47 3.08 2.19 1.78 0.298
3 5003.0 33.0 -368415.5 7065039.2 58.14 0.902 11.89 5.70 0.126 2.44 3.17 2.13 1.16 0.408
4 5004.0 39.0 226609.0 6922431.0 43.98 0.524 10.00 4.08 0.052 1.00 1.37 1.60 1.82 0.395
5 5005.0 103.0 544050.0 7808760.0 60.90 0.702 13.20 6.37 0.079 2.59 3.13 2.97 1.35 0.139


.. GENERATED FROM PYTHON SOURCE LINES 76-77 Since ``xeofs`` uses ``xarray``, we convert the data into an ``xarray.DataArray``. .. GENERATED FROM PYTHON SOURCE LINES 77-86 .. code-block:: Python data_df = data_df.rename(columns={"ID": "station"}).set_index("station") data = data_df.to_xarray() data = data.rename({"XCOO": "x", "YCOO": "y"}) data = data.set_index(station=("x", "y")) data = data.drop_vars("CNo") da = data.to_array(dim="element") da .. raw:: html
<xarray.DataArray (element: 10, station: 768)> Size: 61kB
    43.61 58.73 58.14 43.98 60.9 54.0 82.72 ... 0.196 0.202 0.207 0.109 0.141 0.185
    Coordinates:
      * station  (station) object 6kB MultiIndex
      * x        (station) float64 6kB -6.197e+05 2.147e+05 ... -2.82e+05 -1.273e+05
      * y        (station) float64 6kB 6.805e+06 7.746e+06 ... 5.796e+06 6.523e+06
      * element  (element) object 80B 'SiO2_T' 'TiO2_T' ... 'K2O_T' 'P2O5_T'


.. GENERATED FROM PYTHON SOURCE LINES 87-98 Let's dive into the GWPCA. First, initialize a ``GWPCA`` instance and fit it to the data. The ``station`` dimension serves as our sample dimension, along which the local PCAs will be applied. Since these PCAs need to gauge distances to adjacent stations, we must specify a distance metric. Our station data includes coordinates in meters, so we'll choose the ``euclidean`` metric. If you have coordinates in degrees (like latitude and longitude), choose the ``haversine`` metric instead. We're also using a ``bisquare`` kernel with a bandwidth of 1000 km. Note that the bandwidth unit always follows input data (which is in meters here), except when using the ``haversine`` metric, which always gives distances in kilometers. Lastly, we'll standardize the input to ensure consistent scales for the chemical elements. .. GENERATED FROM PYTHON SOURCE LINES 98-110 .. code-block:: Python gwpca = xe.models.GWPCA( n_modes=5, standardize=True, metric="euclidean", kernel="bisquare", bandwidth=1000000.0, ) gwpca.fit(da, "station") gwpca.components() .. raw:: html
<xarray.DataArray 'components' (mode: 5, element: 10, station: 768)> Size: 307kB
    0.1813 -0.3584 0.1243 0.2 -0.3812 ... 0.1229 -0.2865 0.4732 0.4197 -0.4249
    Coordinates:
      * mode     (mode) int64 40B 1 2 3 4 5
      * element  (element) object 80B 'SiO2_T' 'TiO2_T' ... 'K2O_T' 'P2O5_T'
      * station  (station) object 6kB MultiIndex
      * x        (station) float64 6kB -6.197e+05 2.147e+05 ... -2.82e+05 -1.273e+05
      * y        (station) float64 6kB 6.805e+06 7.746e+06 ... 5.796e+06 6.523e+06
    Attributes: (12/16)
        model:          GWPCA
        software:       xeofs
        version:        2.3.2
        date:           2024-03-31 21:13:03
        n_modes:        5
        center:         True
        ...             ...
        feature_name:   feature
        random_state:   None
        verbose:        False
        compute:        True
        solver:         auto
        solver_kwargs:  {}


.. GENERATED FROM PYTHON SOURCE LINES 111-116 The ``components`` method returns the local principal components for each station. Note that the dimensionality of the returned array is ``[station, element, mode]``, so in practice we don't really have reduced the dimensionality of the data set. However, we can extract the largest locally weighted components for each station which tells us which chemical elements dominate the local PCAs. .. GENERATED FROM PYTHON SOURCE LINES 116-120 .. code-block:: Python llwc = gwpca.largest_locally_weighted_components() llwc .. raw:: html
<xarray.DataArray 'largest_locally_weighted_components' (mode: 5, station: 768)> Size: 31kB
    'MgO_T' 'Al2O3_T' 'MgO_T' 'TiO2_T' ... 'K2O_T' 'Fe2O3_T' 'Fe2O3_T' 'CaO_T'
    Coordinates:
      * mode     (mode) int64 40B 1 2 3 4 5
      * station  (station) object 6kB MultiIndex
      * x        (station) float64 6kB -6.197e+05 2.147e+05 ... -2.82e+05 -1.273e+05
      * y        (station) float64 6kB 6.805e+06 7.746e+06 ... 5.796e+06 6.523e+06


.. GENERATED FROM PYTHON SOURCE LINES 121-126 Let's visualize the spatial patterns of the chemical elements. As the stations are positioned on a irregular grid, we'll transform the ``llwc`` ``DataArray`` into a ``pandas.DataFrame``. After that, we can easily visualize it using the ``scatter`` method. For demonstation, we'll concentrate on the first mode: .. GENERATED FROM PYTHON SOURCE LINES 126-157 .. code-block:: Python llwc1_df = llwc.sel(mode=1).to_dataframe() elements = da.element.values n_elements = len(elements) colors = np.arange(n_elements) col_dict = {el: col for el, col in zip(elements, colors)} llwc1_df["colors"] = llwc1_df["largest_locally_weighted_components"].map(col_dict) cmap = sns.color_palette("tab10", n_colors=n_elements, as_cmap=True) fig = plt.figure(figsize=(10, 10)) ax = fig.add_subplot(111) background_df.plot.scatter(ax=ax, x="V1", y="V2", color=".3", marker=".", s=1) s = ax.scatter( x=llwc1_df["x"], y=llwc1_df["y"], c=llwc1_df["colors"], ec="w", s=40, cmap=cmap, vmin=-0.5, vmax=n_elements - 0.5, ) cbar = fig.colorbar(mappable=s, ax=ax, label="Largest locally weighted component") cbar.set_ticks(colors) cbar.set_ticklabels(elements) ax.set_title("Largest locally weighted element", loc="left", weight=800) plt.show() .. image-sg:: /auto_examples/1single/images/sphx_glr_plot_gwpca_001.png :alt: Largest locally weighted element :srcset: /auto_examples/1single/images/sphx_glr_plot_gwpca_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 158-163 In the final step, let's examine the explained variance. Like standard PCA, this gives us insight into the variance explained by each mode. But with a local PCA for every station, the explained variance varies spatially. Notably, the first mode's explained variance differs across countries, ranging from roughly 40% to 70%. .. GENERATED FROM PYTHON SOURCE LINES 163-176 .. code-block:: Python exp_var_ratio = gwpca.explained_variance_ratio() evr1_df = exp_var_ratio.sel(mode=1).to_dataframe() fig = plt.figure(figsize=(10, 10)) ax = fig.add_subplot(111) background_df.plot.scatter(ax=ax, x="V1", y="V2", color=".3", marker=".", s=1) evr1_df.plot.scatter( ax=ax, x="x", y="y", c="explained_variance_ratio", vmin=0.4, vmax=0.7 ) ax.set_title("Fraction of locally explained variance", loc="left", weight=800) plt.show() .. image-sg:: /auto_examples/1single/images/sphx_glr_plot_gwpca_002.png :alt: Fraction of locally explained variance :srcset: /auto_examples/1single/images/sphx_glr_plot_gwpca_002.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 34.593 seconds) .. _sphx_glr_download_auto_examples_1single_plot_gwpca.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_gwpca.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_gwpca.py ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_