NumPy Basics

Overview

Welcome to your first Python library - NumPy! NumPy is the fundamental package for numerical operations with Python. It contains among other things:

a powerful N-dimensional array object
sophisticated (broadcasting) functions
useful linear algebra, Fourier transform, and random number capabilities

Let’s get you started with the basics! In this notebook we will cover:

Creating an array
Math and calculations with arrays
Inspecting an array with slicing and indexing

Prerequisites

Concepts	Importance	Notes
Python Quickstart	Necessary	Lists, indexing, slicing, math

Time to learn: 35 minutes

Imports

A common convention you might encounter is to import numpy under the shorter name np. This saves typing for the many times we will call on NumPy’s functionality.

import numpy as np

Create an array of ‘data’

The NumPy array represents a contiguous block of memory, holding entries of a given type (and hence fixed size). The entries are laid out in memory according to the shape, or list of dimension sizes. Let’s start by creating an array from a list of integers and taking a look at it,

a = np.array([1, 2, 3])
a

array([1, 2, 3])

We can inspect the number of dimensions of our array with ndim, and the length of each dimension with shape,

a.ndim

1

a.shape

(3,)

So our 1-dimensional array has a length of 3 along its only dimension! Finally, we can check the type of our underlying data with dtype,

a.dtype

dtype('int64')
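Here is a small sketch that makes the memory description above more concrete. It uses the array attributes itemsize, nbytes, flags, and strides. It assumes int64 entries and requests them explicitly with dtype=np.int64, because the default integer type can vary by platform (older NumPy versions on Windows, for instance, default to int32).

```python
import numpy as np

# Request 64-bit integers explicitly so the byte counts below do not depend on the platform
x = np.array([1, 2, 3], dtype=np.int64)

print(x.itemsize)               # bytes per entry: 8 for int64
print(x.nbytes)                 # total bytes: 3 entries * 8 bytes = 24
print(x.flags["C_CONTIGUOUS"])  # True: the entries sit in one contiguous block

# A 2x3 array also keeps its 6 entries in one block; strides say how far to jump
y = np.arange(6, dtype=np.int64).reshape(2, 3)
print(y.strides)  # (24, 8): 24 bytes to the next row, 8 bytes to the next column
```

Because every entry has the same fixed size, NumPy can locate any element with simple arithmetic on these strides instead of following pointers.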

Now let’s try a new data type, floats, and use a list of lists to grow our array to two dimensions!

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
a

array([[1., 2., 3.],
       [4., 5., 6.]])

a.ndim

2

a.shape

(2, 3)

a.dtype

dtype('float64')

As before, ndim, shape, and dtype tell us how many dimensions our array of floats has, how long each one is, and what type its entries are.

Generation

NumPy also provides helper functions that generate regularly spaced data, so you don’t have to type it out yourself. Don’t forget your Python indexing rules!

arange(start, stop, step) creates a range of values in the interval [start,stop) with step spacing.
linspace(start, stop, num) creates a range of num evenly spaced values over the range [start,stop].
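The two intervals above differ in one important way: arange leaves out its stop value, while linspace includes it. The sketch below shows this side by side. It also uses linspace's endpoint=False option, which drops the stop value so that linspace matches arange in this case.

```python
import numpy as np

# arange stops *before* stop: values 0, 2, 4, 6, 8 (10 is excluded)
r = np.arange(0, 10, 2)

# linspace includes stop by default: 6 values 0, 2, 4, 6, 8, 10
l_closed = np.linspace(0, 10, 6)

# endpoint=False drops stop, giving the same 5 values as the arange above
l_open = np.linspace(0, 10, 5, endpoint=False)

print(r)
print(l_closed)
print(np.allclose(r, l_open))  # True
```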

arange

a = np.arange(5)
a

array([0, 1, 2, 3, 4])

a = np.arange(3, 11)
a

array([ 3,  4,  5,  6,  7,  8,  9, 10])

a = np.arange(1, 10, 2)
a

array([1, 3, 5, 7, 9])

linspace

b = np.linspace(0, 4, 5)
b

array([0., 1., 2., 3., 4.])

b.shape

(5,)

b = np.linspace(3, 10, 15)
b

array([ 3. ,  3.5,  4. ,  4.5,  5. ,  5.5,  6. ,  6.5,  7. ,  7.5,  8. ,
        8.5,  9. ,  9.5, 10. ])

b = np.linspace(2.5, 10.25, 11)
b

array([ 2.5  ,  3.275,  4.05 ,  4.825,  5.6  ,  6.375,  7.15 ,  7.925,
        8.7  ,  9.475, 10.25 ])

b = np.linspace(0, 100, 30)
b

array([  0.        ,   3.44827586,   6.89655172,  10.34482759,
        13.79310345,  17.24137931,  20.68965517,  24.13793103,
        27.5862069 ,  31.03448276,  34.48275862,  37.93103448,
        41.37931034,  44.82758621,  48.27586207,  51.72413793,
        55.17241379,  58.62068966,  62.06896552,  65.51724138,
        68.96551724,  72.4137931 ,  75.86206897,  79.31034483,
        82.75862069,  86.20689655,  89.65517241,  93.10344828,
        96.55172414, 100.        ])
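The odd-looking spacing of about 3.448 comes from a simple formula. When the endpoint is included, num points split [start, stop] into num - 1 equal gaps, so the step is (stop - start) / (num - 1) = 100 / 29. A short sketch checks this with np.diff, which returns the differences between consecutive entries.

```python
import numpy as np

start, stop, num = 0, 100, 30
samples = np.linspace(start, stop, num)

# With stop included, num points split [start, stop] into num - 1 equal gaps
expected_step = (stop - start) / (num - 1)  # 100 / 29, about 3.44827586

# np.diff returns the differences between consecutive entries
gaps = np.diff(samples)
print(expected_step)
print(np.allclose(gaps, expected_step))  # True
```

linspace can also return this spacing directly: np.linspace(0, 100, 30, retstep=True) returns the samples together with the step.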

Perform calculations with NumPy

Arithmetic

In core Python, that is, without NumPy, creating sequences of values and adding them together requires writing manual loops, much as one would in C/C++:

a = list(range(5, 10))
b = [3 + i * 1.5 / 4 for i in range(5)]

a, b

([5, 6, 7, 8, 9], [3.0, 3.375, 3.75, 4.125, 4.5])

result = []
for x, y in zip(a, b):
    result.append(x + y)
print(result)

[8.0, 9.375, 10.75, 12.125, 13.5]

That is very verbose and not very intuitive. Using NumPy this becomes:

a = np.arange(5, 10)
b = np.linspace(3, 4.5, 5)

a + b

array([ 8.   ,  9.375, 10.75 , 12.125, 13.5  ])

Many other mathematical operators work the same way, performing an element-by-element calculation on the two arrays.

a - b

array([2.   , 2.625, 3.25 , 3.875, 4.5  ])

a / b

array([1.66666667, 1.77777778, 1.86666667, 1.93939394, 2.        ])

a**b

array([  125.        ,   422.92218768,  1476.10635524,  5311.85481585,
       19683.        ])
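The sketch below shows what "element by element" means. It compares a * b against the equivalent Python loop, and it shows that comparison operators such as > also work element by element and return an array of booleans.

```python
import numpy as np

a = np.arange(5, 10)
b = np.linspace(3, 4.5, 5)

# Element-by-element: entry i of a * b is a[i] * b[i]
product = a * b
loop_product = [float(x) * float(y) for x, y in zip(a, b)]
print(np.allclose(product, loop_product))  # True

# Comparison operators work element by element too, returning a boolean array
print(a > 7)  # [False False False  True  True]
```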

Warning

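These arrays must have compatible shapes! For now, that means the same shape. NumPy’s broadcasting rules relax this in specific cases, such as combining an array with a single number.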

b = np.linspace(3, 4.5, 6)
a.shape, b.shape

((5,), (6,))

a * b

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[26], line 1
----> 1 a * b

ValueError: operands could not be broadcast together with shapes (5,) (6,) 
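Here is a short sketch of the shape rule in practice. A single number (a scalar) is broadcast across every element, so it combines with an array of any shape. Two arrays of different lengths, like the ones above, cannot be combined. The if/else shape check is just one simple, illustrative way to catch a mismatch before it raises an error.

```python
import numpy as np

a = np.arange(5, 10)

# A single number is broadcast to every element, so no shape match is needed
doubled = a * 2
shifted = a + 0.5
print(doubled)  # [10 12 14 16 18]
print(shifted)  # [5.5 6.5 7.5 8.5 9.5]

# Two arrays of different lengths cannot be combined; check shapes before operating
b = np.linspace(3, 4.5, 6)
if a.shape == b.shape:
    print(a * b)
else:
    print(f"shape mismatch: {a.shape} vs {b.shape}")  # shape mismatch: (5,) vs (6,)
```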

Constants

NumPy also provides some useful constants, so there is no need to type these in by hand! Other libraries such as SciPy and MetPy have their own, more domain-specific sets of constants.

np.pi

3.141592653589793

np.e

2.718281828459045

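You can use these for classic calculations you might be familiar with! Here we create a range t = [0, 2π] in steps of π/4. Because arange excludes its stop value, we set the stop to 2π + π/4 so that 2π itself is included,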

t = np.arange(0, 2 * np.pi + np.pi / 4, np.pi / 4)
t

array([0.        , 0.78539816, 1.57079633, 2.35619449, 3.14159265,
       3.92699082, 4.71238898, 5.49778714, 6.28318531])

t / np.pi

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
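NumPy's arange documentation notes that linspace is often the better choice when the step is not an integer. This is because round-off in a float step can make it unclear whether the endpoint ends up included. The sketch below builds the same 9-point grid with linspace, which takes the number of points instead of a step.

```python
import numpy as np

# arange with a float step, as above (stop pushed past 2*pi so that 2*pi is included)
t_arange = np.arange(0, 2 * np.pi + np.pi / 4, np.pi / 4)

# linspace asks for the number of points instead: 9 points from 0 to 2*pi inclusive
t_lin = np.linspace(0, 2 * np.pi, 9)

print(t_lin.shape)                   # (9,)
print(np.allclose(t_lin, t_arange))  # True
```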

Array math functions

NumPy also has math functions that operate on entire arrays. Like the arithmetic operators, these greatly simplify and speed up our calculations. Let’s start by calculating \(\sin(t)\)!

sin_t = np.sin(t)
sin_t

array([ 0.00000000e+00,  7.07106781e-01,  1.00000000e+00,  7.07106781e-01,
        1.22464680e-16, -7.07106781e-01, -1.00000000e+00, -7.07106781e-01,
       -2.44929360e-16])

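Tiny values such as 1.22464680e-16, where we might expect exactly 0, are floating-point round-off error. We can clean up the output by rounding to three decimal places,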

np.round(sin_t, 3)

array([ 0.   ,  0.707,  1.   ,  0.707,  0.   , -0.707, -1.   , -0.707,
       -0.   ])

Similarly, we can calculate \(\cos(t)\),

cos_t = np.cos(t)
cos_t

array([ 1.00000000e+00,  7.07106781e-01,  6.12323400e-17, -7.07106781e-01,
       -1.00000000e+00, -7.07106781e-01, -1.83697020e-16,  7.07106781e-01,
        1.00000000e+00])
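As a quick check that these functions work element by element, the sketch below verifies the identity sin²(t) + cos²(t) = 1 at every angle. It uses np.allclose rather than == because round-off like the 1.22e-16 above makes exact equality unreliable.

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 9)
sin_t = np.sin(t)
cos_t = np.cos(t)

# sin^2 + cos^2 is computed element by element, one value per angle
identity = sin_t**2 + cos_t**2

# allclose compares within a small tolerance, absorbing floating-point round-off
print(np.allclose(identity, 1.0))  # True
```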

Info

Check out NumPy’s list of mathematical functions in the NumPy documentation!

We can convert from radians to degrees using only NumPy, either by hand,

t / np.pi * 180

array([  0.,  45.,  90., 135., 180., 225., 270., 315., 360.])

or with the built-in function rad2deg,

degrees = np.rad2deg(t)
degrees

array([  0.,  45.,  90., 135., 180., 225., 270., 315., 360.])
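The conversion also runs in the other direction with np.deg2rad, the inverse of rad2deg. The sketch below checks it against the by-hand formula and makes a round trip back to degrees.

```python
import numpy as np

degrees = np.array([0.0, 45.0, 90.0, 180.0, 360.0])

# deg2rad is the inverse of rad2deg; compare against the by-hand formula
radians = np.deg2rad(degrees)
print(np.allclose(radians, degrees / 180 * np.pi))  # True

# A round trip recovers the original degrees, up to floating-point round-off
print(np.allclose(np.rad2deg(radians), degrees))  # True
```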

NumPy similarly provides functions for integration, summing, and cumulative summing.

Note that np.trapezoid requires NumPy 2.0 or later. On older versions, use np.trapz, which NumPy 2.0 deprecated in favor of np.trapezoid.

sine_integral = np.trapezoid(sin_t, t)
np.round(sine_integral, 3)

np.float64(-0.0)
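To show what the integration routine computes, here is a minimal hand-written sketch of the trapezoid rule. Each interval contributes its width times the average of its two endpoint heights. The helper name trapezoid_rule is hypothetical and not part of NumPy. Integrating sin over [0, π], where the exact answer is 2, gives an easy check.

```python
import numpy as np

def trapezoid_rule(y, x):
    """Hypothetical helper: approximate the integral of y(x) with the trapezoid rule."""
    widths = np.diff(x)              # x[i+1] - x[i] for each interval
    heights = (y[1:] + y[:-1]) / 2   # average of the two samples bounding each interval
    return np.sum(widths * heights)  # total area of all the trapezoids

x = np.linspace(0, np.pi, 1001)
area = trapezoid_rule(np.sin(x), x)
print(round(float(area), 4))  # 2.0, matching the exact integral of sin over [0, pi]
```

Over the full [0, 2π] range used above, the positive and negative halves of the sine wave cancel, which is why the result rounds to zero.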

cos_sum = np.sum(cos_t)
cos_sum

np.float64(0.9999999999999996)

cos_csum = np.cumsum(cos_t)
print(cos_csum)

[ 1.00000000e+00  1.70710678e+00  1.70710678e+00  1.00000000e+00
  0.00000000e+00 -7.07106781e-01 -7.07106781e-01 -5.55111512e-16
  1.00000000e+00]
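The relationship between the two summing functions can be spelled out directly. Entry i of the cumulative sum is the sum of every element up to and including i, so its final entry equals the total from np.sum. A short sketch, rebuilding the same cos_t:

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 9)
cos_t = np.cos(t)

cos_csum = np.cumsum(cos_t)

# Entry i of the cumulative sum adds up cos_t[0] through cos_t[i]
print(np.isclose(cos_csum[3], cos_t[0] + cos_t[1] + cos_t[2] + cos_t[3]))  # True

# So the final entry equals the overall total from np.sum
print(np.isclose(cos_csum[-1], np.sum(cos_t)))  # True
```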

Indexing and subsetting arrays

Indexing

We can use integer indexing to reach into our arrays and pull out individual elements. Let’s make a toy 2-d array to explore. Here we create a 12-value arange and reshape it into a 3x4 array.

a = np.arange(12).reshape(3, 4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Recall that Python indexing starts at 0, and we can begin indexing our array with the list-style list[element] notation,

a[0]

array([0, 1, 2, 3])

to pull out just our first row of data within a. Similarly we can index in reverse with negative indices,

a[-1]

array([ 8,  9, 10, 11])

to pull out just the last row of data within a. This notation extends to however many dimensions make up our array, as array[m, n, p, ...]. The following diagram shows these indices for an example 6x6, 2-dimensional array.

For example, let’s find the entry in our array corresponding to the 2nd row (m=1 in Python) and the 3rd column (n=2 in Python)

a[1, 2]

np.int64(6)

We can again use these negative indices to index backwards,

a[-1, -1]

np.int64(11)

and even mix-and-match along dimensions,

a[1, -2]

np.int64(6)
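As a sanity check of the indexing rules above, the sketch below shows two things. First, a[m, n] gives the same value as the chained form a[m][n]. Second, a negative index -k along an axis of length L refers to position L - k.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# a[m, n] selects one element; chaining a[m][n] reaches the same value in two steps
print(a[1, 2] == a[1][2])  # True

# A negative index -k along an axis of length L means position L - k
rows, cols = a.shape
print(a[-1, -1] == a[rows - 1, cols - 1])  # True
print(a[1, -2] == a[1, cols - 2])          # True
```

The single-bracket form a[m, n] is generally preferred. It does the lookup in one indexing operation rather than first building the intermediate row a[m].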

Slices

Slicing syntax is written as array[start:stop:step]. All three numbers are optional, and any that are omitted take these defaults:

- start = 0
- stop = len(dim)
- step = 1

The second colon is also optional if no step is used.

Let’s pull out just the first row (m=0) of a and see how this works!

b = a[0]
b

array([0, 1, 2, 3])

Laying out our default slice to see the entire array explicitly looks something like this,

b[0:4:1]

array([0, 1, 2, 3])

where again, these default values are optional,

b[::]

array([0, 1, 2, 3])

and even the second : is optional

b[:]

array([0, 1, 2, 3])
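The equivalences above can be checked directly. The sketch below also uses Python's built-in slice object, which stores the same start, stop, and step values. It can be saved in a variable and reused to index NumPy arrays.

```python
import numpy as np

b = np.arange(4)

# An empty slice uses the defaults start=0, stop=len(b), step=1
print(np.array_equal(b[:], b[0:len(b):1]))  # True

# A slice object holds start, stop, and step and can be reused
first_two = slice(0, 2)
print(b[first_two])                          # [0 1]
print(np.array_equal(b[first_two], b[0:2]))  # True
```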

Now, to make our own slice, let’s select the elements starting at m=0 and stopping at m=2,

b[0:2]

array([0, 1])

Warning

Slice notation is exclusive of the final index.

This means that a slice includes every value up to, but not including, the stop index, like the half-open interval [start, stop). For example,

b[3]

np.int64(3)

reveals a different value than

b[0:3]

array([0, 1, 2])

Finally, a few more examples of this notation before reintroducing our 2-d array a.

b[2:]  # m=2 through the end, can leave off the number

array([2, 3])

b[:3]  # similarly, the same as our b[0:3]

array([0, 1, 2])
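So far the examples have left the step at its default. The step can also be used on its own or combined with a start, and a negative step walks backwards through the array. A short sketch on the same 4-element b:

```python
import numpy as np

b = np.arange(4)

print(b[::2])   # [0 2]: every second element
print(b[1::2])  # [1 3]: every second element, starting at m=1
print(b[::-1])  # [3 2 1 0]: a negative step walks backwards through the array
```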

Multidimensional slicing

This entire syntax extends to each dimension of multidimensional arrays. Recall our 2-d array a,

a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

First let’s pull out rows 0 up to (but not including) 2, along with every column (:) for each of those rows,

a[0:2, :]

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

Similarly, let’s get all rows for just column 2,

a[:, 2]

array([ 2,  6, 10])

or take every row (:) for every second column (::2),

a[:, ::2]

array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
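Row and column slices can also be combined in a single expression to pull out a rectangular sub-block. A small sketch, rebuilding the same 3x4 array a:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Rows 0 up to (not including) 2, and columns 1 up to (not including) 3
block = a[0:2, 1:3]
print(block)        # [[1 2]
                    #  [5 6]]
print(block.shape)  # (2, 2)
```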

For any shape of array, you can use ... to capture full slices of every non-specified dimension. Consider the 3-D array,

c = a.reshape(2, 2, 3)
c

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

c[0, ...]

array([[0, 1, 2],
       [3, 4, 5]])

and so this is equivalent to

c[0, :, :]

array([[0, 1, 2],
       [3, 4, 5]])

for extracting everything in the first entry along the first axis. We can also flip this around,

c[..., -1]

array([[ 2,  5],
       [ 8, 11]])

to take the last entry of the last axis across every preceding dimension, the same as c[:, :, -1].
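The ... equivalences above can be checked directly with np.array_equal on a rebuilt copy of c. The third comparison, where ... comes first and is followed by two indices, is an extra case added here for illustration. It shows that ... fills in whichever axes are left unspecified.

```python
import numpy as np

c = np.arange(12).reshape(2, 2, 3)

# ... expands to as many full slices (:) as needed to cover the unspecified axes
print(np.array_equal(c[0, ...], c[0, :, :]))     # True
print(np.array_equal(c[..., -1], c[:, :, -1]))   # True
print(np.array_equal(c[..., 1, :], c[:, 1, :]))  # True
```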

Summary

In this notebook we introduced NumPy and the ndarray, which is central to the entire scientific Python ecosystem. We created some arrays, used some of NumPy’s own mathematical functions to manipulate them, and then introduced NumPy indexing and slicing, including for multidimensional arrays.

What’s next?

This notebook is the gateway to nearly every other Pythia resource here. This information is crucial for understanding SciPy, pandas, xarray, and more. Continue into NumPy to explore some more intermediate and advanced topics!