
Introduction to Xarray

Overview

This tutorial covers the fundamentals of working with gridded, labeled data using Xarray. Xarray adds abstractions, such as named coordinates and metadata, on top of otherwise ordinary data arrays. The examples show how using these abstractions properly generally leads to simpler, more robust code.

The following topics will be covered in this tutorial:

Create a DataArray, one of the core object types in Xarray
Understand how to use named coordinates and metadata in a DataArray
Combine individual DataArrays into a Dataset, the other core object type in Xarray
Subset, slice, and interpolate the data using named coordinates
Open netCDF data using Xarray
Perform basic subsetting and aggregation of a Dataset
Get a brief introduction to plotting with Xarray
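As a rough preview of the first few topics, the sketch below builds two DataArrays on a shared grid, combines them into a Dataset, and then selects and aggregates by name. All values, variable names, and coordinate names here are made up for illustration; they are not the data used later in the tutorial.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic grid: 3 time steps, 2 latitudes, 4 longitudes (illustrative values only)
coords = {
    "time": pd.date_range("2018-01-01", periods=3),
    "lat": np.array([25.0, 40.0]),
    "lon": np.array([-120.0, -100.0, -80.0, -60.0]),
}
dims = ["time", "lat", "lon"]
data = 283.0 + np.arange(24, dtype=float).reshape(3, 2, 4)

# Two DataArrays that share the same named coordinates
temperature = xr.DataArray(
    data, coords=coords, dims=dims, name="temperature", attrs={"units": "kelvin"}
)
pressure = xr.DataArray(
    1000.0 - data, coords=coords, dims=dims, name="pressure", attrs={"units": "hPa"}
)

# Combine both variables into a single Dataset
ds = xr.Dataset({"temperature": temperature, "pressure": pressure})

# Select by coordinate value rather than by integer position
series = ds["temperature"].sel(lat=40.0, lon=-100.0)

# Aggregate over a dimension by name
time_mean = ds["temperature"].mean(dim="time")
```

Note that neither the selection nor the aggregation depends on knowing the axis order of the underlying array.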

Prerequisites

Concepts	Importance	Notes
NumPy Basics	Necessary
Intermediate NumPy	Helpful	Familiarity with indexing and slicing arrays
NumPy Broadcasting	Helpful	Familiarity with array arithmetic and broadcasting
Introduction to Pandas	Helpful	Familiarity with labeled data
Datetime	Helpful	Familiarity with time formats and the `timedelta` object
Understanding of NetCDF	Helpful	Familiarity with metadata structure

Time to learn: 40 minutes

Imports

In earlier tutorials, we explained the abbreviation of commonly used scientific Python package names in import statements. Just as numpy is abbreviated np, and just as pandas is abbreviated pd, the name xarray is often abbreviated xr in import statements. In addition, we also import pythia_datasets, which provides sample data used in these examples.

from datetime import timedelta

import numpy as np
import pandas as pd
import xarray as xr
from pythia_datasets import DATASETS


Plotting with Xarray

As demonstrated earlier in this tutorial, there are many benefits to storing data as Xarray DataArrays and Datasets. In this section, we will cover another major benefit: Xarray greatly simplifies plotting of data stored as DataArrays and Datasets. One advantage of this is that many common plot elements, such as axis labels, are automatically generated and optimized for the data being plotted. The next set of examples demonstrates this and provides a general overview of plotting with Xarray.

Simple visualization with `.plot()`

Like Pandas, Xarray includes a built-in plotting interface that uses Matplotlib behind the scenes. To use this interface, call the .plot() method, which every DataArray provides.

In this example, we create a basic plot from the prof DataArray defined above, which contains a Colorado mean temperature profile.

prof.plot()

[Figure: line plot of the mean temperature profile]

In the figure shown above, Xarray has generated a line plot, which uses the mean temperature profile and the 'isobaric1' coordinate variable as axes. In addition, the axis labels and unit information have been read automatically from the DataArray’s metadata.
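To make the link between metadata and axis labels concrete, here is a minimal, self-contained sketch. It assumes a hypothetical stand-in called prof_demo with made-up values and attributes, since the real prof DataArray is built from the tutorial's sample data. The non-interactive Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Hypothetical stand-in for `prof`: made-up temperatures on five pressure levels
levels = np.array([1000.0, 850.0, 700.0, 500.0, 300.0])
prof_demo = xr.DataArray(
    np.array([288.0, 281.0, 273.0, 256.0, 229.0]),
    coords={"isobaric1": ("isobaric1", levels, {"units": "hPa"})},
    dims=["isobaric1"],
    name="Temperature_isobaric",
    attrs={"long_name": "Air temperature", "units": "K"},
)

fig, ax = plt.subplots()
lines = prof_demo.plot(ax=ax)  # 1-D data, so Xarray draws a line plot

# The labels are assembled from names and from the long_name/units attributes
print("x label:", ax.get_xlabel())
print("y label:", ax.get_ylabel())
```

If the attributes are removed from prof_demo, the labels fall back to the bare variable and coordinate names, which is one reason to keep metadata attached to the data.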

Customizing the plot

As mentioned above, the .plot() method of Xarray DataArrays uses Matplotlib behind the scenes. Therefore, knowledge of Matplotlib can help you more easily customize plots generated by Xarray.

In this example, we need to customize the air temperature profile plot created above. There are two changes that need to be made:

swap the axes, so that the Y (vertical) axis corresponds to isobaric levels
invert the Y axis to match the model of air pressure decreasing at higher altitudes

We can make these changes by adding certain keyword arguments when calling .plot(), as shown below:

prof.plot(y="isobaric1", yincrease=False)

[Figure: the temperature profile with isobaric1 on an inverted Y axis]
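The sketch below combines the two keyword arguments with ordinary Matplotlib customization. It again uses a hypothetical prof_demo array with made-up values; the color, line style, title, and grid are arbitrary choices for illustration, not part of the tutorial's own figure.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Hypothetical stand-in for `prof`: made-up temperatures on five pressure levels
prof_demo = xr.DataArray(
    np.array([288.0, 281.0, 273.0, 256.0, 229.0]),
    coords={"isobaric1": np.array([1000.0, 850.0, 700.0, 500.0, 300.0])},
    dims=["isobaric1"],
    attrs={"long_name": "Air temperature", "units": "K"},
)

fig, ax = plt.subplots()

# y= puts the coordinate on the vertical axis; yincrease=False inverts that axis.
# Other keyword arguments (color, linestyle) are passed through to Matplotlib.
prof_demo.plot(ax=ax, y="isobaric1", yincrease=False, color="tab:red", linestyle="--")

# Because the result is a normal Matplotlib Axes, it can be adjusted further
ax.set_title("Mean temperature profile (synthetic data)")
ax.grid(True)
```

Passing an existing Axes through ax= is optional, but it keeps a handle on the plot for later Matplotlib calls.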

Plotting 2-D data

In the previous example, .plot() generated a line plot from 1-D data. The next example shows basic plotting of a 2-D array:

temps.sel(isobaric1=1000).plot()

[Figure: pcolormesh map of air temperature on the 1000 hPa isobaric surface]

The figure above is generated by Matplotlib’s pcolormesh method, which was automatically called by Xarray’s plot method. This occurred because the DataArray object calling the plot method is two-dimensional: .plot() chooses a line plot for 1-D data and pcolormesh for 2-D data.

The plot generated by the above example is a map of air temperatures over North America, on the 1000 hPa isobaric surface. If a different map projection or added geographic features are needed on this plot, the plot can easily be modified using Cartopy.
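The way .plot() picks a plot type from the number of dimensions can be checked with a small sketch. The temps_demo array below is a hypothetical, synthetic stand-in for temps with made-up values and a much coarser grid; only the coordinate name isobaric1 is taken from the tutorial.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Hypothetical stand-in for `temps`: 3 pressure levels on a 4 x 5 lat/lon grid
temps_demo = xr.DataArray(
    250.0 + np.arange(60, dtype=float).reshape(3, 4, 5),
    coords={
        "isobaric1": np.array([1000.0, 850.0, 700.0]),
        "lat": np.linspace(20.0, 50.0, 4),
        "lon": np.linspace(-130.0, -70.0, 5),
    },
    dims=["isobaric1", "lat", "lon"],
)

# Selecting one level leaves 2-D (lat, lon) data, so pcolormesh is used
fig2d, ax2d = plt.subplots()
mesh = temps_demo.sel(isobaric1=1000).plot(ax=ax2d, cmap="coolwarm")

# Selecting one level and one latitude leaves 1-D data, so a line plot is used
fig1d, ax1d = plt.subplots()
lines = temps_demo.sel(isobaric1=1000, lat=20.0).plot(ax=ax1d)

print(type(mesh).__name__)
print(type(lines[0]).__name__)
```

The cmap argument is forwarded to Matplotlib, so the usual colormap names apply.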

Summary

Xarray expands on Pandas’ labeled-data functionality, bringing the usefulness of labeled data operations to N-dimensional data. As such, it has become a central workhorse in the geoscience community for the analysis of gridded datasets. Xarray allows us to open self-describing NetCDF files and make full use of the coordinate axes, labels, units, and other metadata. By making use of labeled coordinates, our code is often easier to write, easier to read, and more robust.
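The robustness claim can be illustrated with a short comparison. In this sketch, which uses made-up values and a hypothetical temps_demo array, the positional NumPy code silently depends on the axis order, while the label-based Xarray code gives the same answer after the dimensions are transposed.

```python
import numpy as np
import xarray as xr

# Synthetic 2-D field: 3 pressure levels by 3 latitudes (illustrative values only)
values = 250.0 + np.arange(9, dtype=float).reshape(3, 3)
temps_demo = xr.DataArray(
    values,
    coords={
        "isobaric1": np.array([1000.0, 850.0, 700.0]),
        "lat": np.array([30.0, 40.0, 50.0]),
    },
    dims=["isobaric1", "lat"],
)

# Positional: the reader must know that axis 0 is pressure and index 1 is 850 hPa
by_position = values[1, :].mean()

# Labeled: the intent is stated in the code itself
by_label = temps_demo.sel(isobaric1=850).mean(dim="lat")

# Reordering the dimensions does not change the label-based result
transposed = temps_demo.transpose("lat", "isobaric1")
by_label_transposed = transposed.sel(isobaric1=850).mean(dim="lat")

# The same positional index on the transposed array now means something else
by_position_transposed = transposed.values[1, :].mean()
```

Here by_position_transposed averages over pressure levels at one latitude instead of over latitudes at one level, which is the kind of silent error that named dimensions help avoid.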

What’s next?

Additional notebooks to appear in this section will describe the following topics in greater detail:

performing arithmetic and broadcasting operations with Xarray data structures
using “group by” operations
remote data access with OPeNDAP
more advanced visualization, including map integration with Cartopy

Resources and references

This tutorial contains content adapted from the material in Unidata’s Python Training.

Most basic questions and issues with Xarray can be resolved with help from the material in the Xarray documentation.

Another resource you may find useful is this Xarray Tutorial collection, created from content hosted on GitHub.

