Lists vs. Arrays

Python lists and numpy arrays

Lists and arrays are both useful data structures. When would we want to use one over the other?

Lists

Lists are built into Python; there's no need to import anything. Lists are very flexible: they can hold any type of object, and they can change their size and content on demand.

Creating lists

We can directly make a list of objects by wrapping a comma-separated sequence of objects with square brackets as shown below.

In [1]:
a = [0, "one", 2.0]
print(a)
[0, 'one', 2.0]

If we have an object that looks like a sequence, we can make a list from that object using the list() function.

In [2]:
# create a tuple, an immutable (non-rewritable) data structure that behaves in many ways like a list
a = (0, "one", 2.0)
print("a = {}, type of a is {}".format(a, type(a)))

# create a list from this tuple
b = list(a)
print("b = {}, type of b is {}".format(b, type(b)))
a = (0, 'one', 2.0), type of a is <class 'tuple'>
b = [0, 'one', 2.0], type of b is <class 'list'>

We can create a range of integers with the range() function. This returns a special object (creatively called a range object) that produces its integers on demand rather than storing them all, so it does not have list methods such as append(). If we wanted to loop over these integers, we could use this object directly, but if we wanted to use it like a list we would have to convert it with the list() function.

In [3]:
print(range.__doc__)
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
In [4]:
a = range(5)
print("a = {}, type of a is {}".format(a, type(a)))

b = list(a)
print("b = {}, type of b is {}".format(b, type(b)))
a = range(0, 5), type of a is <class 'range'>
b = [0, 1, 2, 3, 4], type of b is <class 'list'>
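
As a small sketch of both uses (the variable names total and evens are only illustrative), a range can be consumed directly by a loop, or turned into a list when we need list behavior:

```python
# Loop over a range directly; no list is ever built.
total = 0
for i in range(1, 10, 2):   # start at 1, stop before 10, step by 2
    total += i
print(total)

# Convert to a list when we need list behavior, such as append().
evens = list(range(0, 10, 2))
evens.append(10)
print(evens)
```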

Accessing and changing list elements

We can index into a list (and most other sequence-like objects in Python) with square brackets. Recall that in Python, indexing starts from 0!

In [5]:
a = [0, "one", 2.0]

print(a[0])
print(a[1])
print(a[2])
0
one
2.0

We can access a range of elements with a colon in the index.

In [6]:
# 0:2 picks out elements from indices 0 (inclusive) to 2 (exclusive)
print(a[0:2])

# leaving the right index in a range empty will take everything from the specified index to the end
print(a[1:]) 

# similarly, leaving the left index in a range empty will take everything from the start to the specified index 
print(a[:2]) 

# indexing with negative integers counts back from the end of the list
print(a[-1])
print(a[-2])
print(a[-3])
[0, 'one']
['one', 2.0]
[0, 'one']
2.0
one
0
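
A slice can also take a third integer, the step. The short sketch below uses a made-up list called letters to show a few combinations, including a negative step:

```python
letters = ["a", "b", "c", "d", "e", "f"]

every_other = letters[::2]   # start to end, taking every second element
middle = letters[1:5:2]      # from index 1 up to (not including) 5, in steps of 2
backwards = letters[::-1]    # a negative step walks from the end to the start

print(every_other)
print(middle)
print(backwards)
```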

Common list methods

Lists have useful functions built into them. These functions, accessed as attributes of a list with "dot" notation, are generically called methods of the object.

In [7]:
a = [0, "one", 2.0]
print(a)
[0, 'one', 2.0]

Append an object to the back of the list.

In [8]:
a.append("3")
print(a)
[0, 'one', 2.0, '3']

"Pop", or spit out and remove, the element of the list at some index.

In [9]:
x = a.pop(2)
print(x)
print(a)
2.0
[0, 'one', '3']

Insert an element before some index.

In [10]:
x = 2.0
a.insert(2, x)
print(a)
[0, 'one', 2.0, '3']

Remove the first instance of some value.

In [11]:
a.remove('3')
print(a)
[0, 'one', 2.0]

Extend a list with another sequence.

In [12]:
a.extend(range(4, 6))
print(a)
[0, 'one', 2.0, 4, 5]

Reverse the list.

In [13]:
a.reverse()
print(a)
[5, 4, 2.0, 'one', 0]

Sort the list.

In [14]:
a = [5, 2, 7, 32]
a.sort()
print(a)
[2, 5, 7, 32]

Sorting only works if the elements of the list are comparable. For instance, it doesn't make sense to numerically rank the string "one" and the integer 0, whereas it does make sense to compare the values of 5 and 2.

In [15]:
a = [0, "one", 2.0]

try:
    a.sort()
except TypeError as e:
    print("TypeError: {}".format(e))
TypeError: '<' not supported between instances of 'str' and 'int'
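
If we do want some ordering for a mixed list, one possible approach is to tell Python what to compare. The sketch below uses the built-in sorted() function with key=str, which orders the elements by their text form. That choice of key is only an illustration, not a generally meaningful ranking.

```python
mixed = [0, "one", 2.0]

# sorted() returns a new list and leaves the original untouched.
# key=str compares the string form of each element ("0", "one", "2.0"),
# so no str/int comparison ever takes place.
by_text = sorted(mixed, key=str)

print(by_text)
print(mixed)
```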

Binary operators on lists

We can concatenate two lists with some syntactic sugar: the addition operator.

In [16]:
a = [0, 1, 2]
b = [3, 4, 5, 6]
c = a + b
print(c)
[0, 1, 2, 3, 4, 5, 6]

If we use the multiplication operator with a list and an integer, we get a new list made of that many copies of the original, concatenated together.

In [17]:
a = [0, 1, 2]
b = 3 * a
print(b)
[0, 1, 2, 0, 1, 2, 0, 1, 2]

This multiplication will fail if we try to multiply a list by anything that isn't an integer.

In [18]:
try:
    a * 2.1
except TypeError as e:
    print("TypeError: {}".format(e))
TypeError: can't multiply sequence by non-int of type 'float'

Arrays

Arrays are sequences that in some ways behave like lists, but are more restricted in what they can do. In particular, they can only hold values of a single basic type, and they are fixed in size. While Python has one implementation of arrays built into the standard library, we're going to use a third-party numerical array library called numpy. numpy is the de facto standard library for numerical computation in Python.

In [19]:
import numpy as np

Creating arrays

For a complete list of array creation tools in numpy, see the docs.

The simplest way to create an array is to pass a sequence of numbers to the np.array() function.

In [20]:
a = np.array([1, 1, 2, 3, 5, 8])
print("a = {}, type of a is {}".format(a, type(a)))
a = [1 1 2 3 5 8], type of a is <class 'numpy.ndarray'>

We can reassign a value in an array, so long as the new value can be converted to the array's type.

In [21]:
a[0] = 42
print(a)
[42  1  2  3  5  8]
In [22]:
try:
    a[0] = 'This is not an integer!'
except ValueError as e:
    print("ValueError: {}".format(e))
ValueError: invalid literal for int() with base 10: 'This is not an integer!'
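
The type that an array holds is stored in its dtype attribute. The sketch below (with illustrative names ints and floats) shows one consequence worth knowing: a float assigned into an integer array is converted to an integer, so its fractional part is dropped without any warning.

```python
import numpy as np

ints = np.array([1, 1, 2, 3, 5, 8])
print(ints.dtype)   # an integer type; the exact width depends on the platform

# A float can be converted to an integer, so this assignment succeeds,
# but the fractional part is silently dropped.
ints[0] = 3.7
print(ints)

# To store fractional values, convert the whole array to a float type first.
floats = ints.astype(float)
floats[0] = 3.7
print(floats)
```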

We can also use the function np.arange() in a similar way to the built-in range() function.

In [23]:
a = np.arange(5)
print(a)

a = np.arange(10, 15)
print(a)

a = np.arange(0, 10, 2)
print(a)
[0 1 2 3 4]
[10 11 12 13 14]
[0 2 4 6 8]

To generate linearly spaced numbers in some range, we can use the np.linspace() function.

In [24]:
a = np.linspace(0, 1, 11)
print(a)
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]

By default, giving only a start and a stop number will make an array with 50 values.

In [25]:
a = np.linspace(0, 98)
print(a)
[ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32. 34.
 36. 38. 40. 42. 44. 46. 48. 50. 52. 54. 56. 58. 60. 62. 64. 66. 68. 70.
 72. 74. 76. 78. 80. 82. 84. 86. 88. 90. 92. 94. 96. 98.]
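
Two optional arguments of np.linspace() are handy here. This is a minimal sketch; the names values, step, and half_open are just for illustration.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring values.
values, step = np.linspace(0, 98, retstep=True)
print(len(values), step)

# endpoint=False leaves out the stop value, much like range() and np.arange().
half_open = np.linspace(0, 1, 10, endpoint=False)
print(half_open)
```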

We can create arrays of zeros or ones with the np.zeros() and np.ones() functions.

In [26]:
a = np.zeros(10)
print(a)

a = np.ones(10)
print(a)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
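
Both functions produce floats by default. As a brief sketch, the dtype argument asks for a different type, and np.full() fills an array with any constant we choose (the value 7.5 below is arbitrary):

```python
import numpy as np

int_zeros = np.zeros(5, dtype=int)   # integer zeros instead of the default floats
sevens = np.full(4, 7.5)             # an array filled with an arbitrary constant

print(int_zeros)
print(sevens)
```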

Multi-dimensional arrays

numpy naturally deals with n-dimensional arrays.

We can create a multi-dimensional array in similar ways to a 1-d array. We can pass np.array() a sequence of sequences as shown below.

In [27]:
a = np.array([[1, 2], [3, 4]])
print(a)
[[1 2]
 [3 4]]

Note that this will do weird things if we pass non-rectangular nested lists. In the numpy version used here, the result is a 1-d array of lists rather than a 2-d array; numpy 1.24 and later instead raise a ValueError unless dtype=object is passed explicitly.

In [28]:
a = np.array([[1, 2, 3], [4, 5]])
print(a)
[list([1, 2, 3]) list([4, 5])]
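
If an array of lists really is what we want, asking for it explicitly with dtype=object should behave the same way across numpy versions. A minimal sketch, using the illustrative name ragged:

```python
import numpy as np

ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)

print(ragged.shape)      # a single axis holding two Python lists
print(ragged.dtype)
print(type(ragged[0]))
```

Arrays like this lose most of the benefits of numpy, since each element is an ordinary Python object.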

We can make multi-dimensional arrays of ones and zeros by passing a sequence of integers to np.ones() and np.zeros() respectively.

In [29]:
a = np.ones([3, 5])
print(a)
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
In [30]:
a = np.zeros([5, 2])
print(a)
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]

Attributes and methods of arrays

We can get the size (total number of elements) and shape (length along each axis) of an array with the size and shape attributes respectively.

In [31]:
a = np.zeros([5, 2])
print('a.size = {}, a.shape = {}'.format(a.size, a.shape))
a.size = 10, a.shape = (5, 2)
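
A few related attributes are sketched below, using an illustrative array called grid. Note that len() only reports the length of the outermost axis.

```python
import numpy as np

grid = np.zeros([5, 2])

print(grid.ndim)    # number of axes
print(len(grid))    # length of the outermost axis only
print(grid.dtype)   # type of the stored values

# reshape() arranges the same 10 elements into a different shape.
wide = grid.reshape((2, 5))
print(wide.shape)
```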

There are a vast number of useful methods built into arrays. You can get a sense of what is available in the online documentation.

np.min() and np.max() find the minimum and maximum of an array. np.argmin() and np.argmax() find the indices of the minimum and maximum values of an array. These functions can either be accessed through the top-level numpy module with an array as an argument, or as methods of an individual array.

In [32]:
a = np.array([3,78,3,7,3,21,7.1])
print(a)
[ 3.  78.   3.   7.   3.  21.   7.1]
In [33]:
print("minimum of a is {}, occurs at index i = {}".format(a.min(), a.argmin()))
print("maximum of a is {}, occurs at index i = {}".format(a.max(), a.argmax()))
minimum of a is 3.0, occurs at index i = 0
maximum of a is 78.0, occurs at index i = 1
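
On a multi-dimensional array, these methods accept an axis argument to work along one axis at a time. The sketch below uses a small made-up 2-d array called table:

```python
import numpy as np

table = np.array([[3.0, 78.0, 3.0],
                  [7.0, 21.0, 7.1]])

col_min = table.min(axis=0)   # minimum of each column
row_max = table.max(axis=1)   # maximum of each row
print(col_min)
print(row_max)

# Without an axis, argmax() gives an index into the flattened array;
# np.unravel_index() converts that back to a (row, column) pair.
flat_index = table.argmax()
position = np.unravel_index(flat_index, table.shape)
print(flat_index, position)
```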

We can sort an array with np.sort(). Note that calling this function as np.sort(a) does not sort the array in place! Rather, it returns a sorted copy of the array. If we call it as a method, a.sort(), then it does sort the array in place.

In [34]:
b = np.sort(a)
print(a)
print(b)
[ 3.  78.   3.   7.   3.  21.   7.1]
[ 3.   3.   3.   7.   7.1 21.  78. ]
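
For comparison, here is a sketch of the in-place method, run on a copy so that the original data is left alone. It also shows np.argsort(), which returns the indices that would sort the array. The names data, in_place, and order are illustrative.

```python
import numpy as np

data = np.array([3, 78, 3, 7, 3, 21, 7.1])

in_place = data.copy()
result = in_place.sort()   # sorts in_place itself and returns None
print(result)
print(in_place)

# np.argsort() returns the indices that would put the array in sorted order.
order = np.argsort(data)
print(data[order])
```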

We can calculate some basic statistics for an array.

In [35]:
print("mean of a is {:.2f}".format(np.mean(a)))
print("median of a is {:.2f}".format(np.median(a)))
print("standard deviation of a is {:.2f}".format(np.std(a)))
mean of a is 17.44
median of a is 7.00
standard deviation of a is 25.42
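
One detail to be aware of: by default np.std() divides by the number of elements n (the population standard deviation). Passing ddof=1 divides by n - 1 instead, which gives the sample standard deviation. A short sketch on the same values:

```python
import numpy as np

data = np.array([3, 78, 3, 7, 3, 21, 7.1])

population_std = np.std(data)       # divides by n (the default, ddof=0)
sample_std = np.std(data, ddof=1)   # divides by n - 1

print("population std: {:.2f}".format(population_std))
print("sample std:     {:.2f}".format(sample_std))
```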

Indexing arrays

Arrays can be indexed like lists, but there are generally more options available to us. See the docs.

In [36]:
a = np.array([3,78,3,7,3,21,7.1])

print(a)
print("indices 2 -> 6: {}".format(a[2:6]))
[ 3.  78.   3.   7.   3.  21.   7.1]
indices 2 -> 6: [ 3.  7.  3. 21.]

The third integer in an indexing "slice" gives the step size, so a step of -1 gives us a tricky way to reverse an array.

In [37]:
b = a[::-1]

print(a)
print(b)
[ 3.  78.   3.   7.   3.  21.   7.1]
[ 7.1 21.   3.   7.   3.  78.   3. ]

Note that slicing an array like this returns a view, or a reference, into the array rather than a copy. That means that if we make a "new" array by slicing an old one, changing one will change the other. If we explicitly call the copy() method (or the np.copy() function), then the two arrays will not be connected.

In [38]:
b = a.copy() # make a copy of a
c = b[:] # take a view into all of b and store in c

print(b)
print(c)

b[0] = 100
print(b)
[ 3.  78.   3.   7.   3.  21.   7.1]
[ 3.  78.   3.   7.   3.  21.   7.1]
[100.   78.    3.    7.    3.   21.    7.1]
In [39]:
print("copied array: a = {}".format(a))
print("viewed array: c = {}".format(c))
copied array: a = [ 3.  78.   3.   7.   3.  21.   7.1]
viewed array: c = [100.   78.    3.    7.    3.   21.    7.1]
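
Not every kind of indexing gives a view. As a sketch, np.shares_memory() can tell us whether two arrays are connected; here a basic slice shares its data with the original, while indexing with a list of indices produces an independent array. The names original, sliced, and picked are illustrative.

```python
import numpy as np

original = np.arange(5)

sliced = original[1:4]         # basic slice: a view onto the same data
picked = original[[1, 2, 3]]   # indexing with a list of indices: a new array

print(np.shares_memory(original, sliced))
print(np.shares_memory(original, picked))

original[1] = 100
print(sliced)   # sees the change
print(picked)   # does not
```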

We can index into a multi-dimensional array by separating slices with commas. Left to right in indexing goes from outermost axis to innermost axis. Let's create a 3-d array with the np.reshape() function to try this out.

In [40]:
# create a 3 by 4 by 5 shaped array from 0 to 59
# outermost axis has dimension 3, middle axis has dimension 4, and innermost axis has dimension 5

a = np.arange(3 * 4 * 5).reshape((3, 4, 5))
print(a)
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]]

 [[40 41 42 43 44]
  [45 46 47 48 49]
  [50 51 52 53 54]
  [55 56 57 58 59]]]

We can pick out a single slice in this 3-d array.

In [41]:
print(a[0, :, :])
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
In [42]:
print(a[:, 0, :])
[[ 0  1  2  3  4]
 [20 21 22 23 24]
 [40 41 42 43 44]]
In [43]:
print(a[:, :, 0])
[[ 0  5 10 15]
 [20 25 30 35]
 [40 45 50 55]]

Or we could pick out multiple slices by giving a range of indices.

In [44]:
print(a[:, 1:3, 0])
[[ 5 10]
 [25 30]
 [45 50]]
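
To keep track of what comes back from these expressions, it can help to look at the shapes. In this sketch (on the same kind of 3 by 4 by 5 array, here called cube), an integer index removes its axis while a slice keeps it:

```python
import numpy as np

cube = np.arange(3 * 4 * 5).reshape((3, 4, 5))

print(cube[0, :, :].shape)     # an integer index removes that axis
print(cube[:, 1:3, 0].shape)   # a slice keeps its axis
print(cube[1, 2, 3])           # three integers pick out a single element

# An ellipsis stands in for as many full slices as are needed.
print(np.array_equal(cube[..., 0], cube[:, :, 0]))
```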

Broadcasting and mathematical operations

Many functions in numpy work on arrays of any size or shape. Many of these functions do calculations on an element-by-element basis much more efficiently than we can by looping. You can find all of the available functions in the documentation. Many of them also work through the normal binary operators. For example, to add two arrays together elementwise, we can just do a + b rather than having to type out np.add(a, b).

In [45]:
a = np.arange(3 * 4).reshape((3, 4))
b = np.ones((3, 4))

print("a = \n{}\n".format(a))
print("b = \n{}\n".format(b))
print("a + b = \n{}\n".format(a + b))
print("exp(a) = \n{}\n".format(np.exp(a)))
print("sin(a) = \n{}\n".format(np.sin(a)))
a = 
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

b = 
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

a + b = 
[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]
 [ 9. 10. 11. 12.]]

exp(a) = 
[[1.00000000e+00 2.71828183e+00 7.38905610e+00 2.00855369e+01]
 [5.45981500e+01 1.48413159e+02 4.03428793e+02 1.09663316e+03]
 [2.98095799e+03 8.10308393e+03 2.20264658e+04 5.98741417e+04]]

sin(a) = 
[[ 0.          0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155   0.6569866 ]
 [ 0.98935825  0.41211849 -0.54402111 -0.99999021]]
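
The arrays in an operation don't have to be the same shape. When the shapes are compatible, numpy "broadcasts" the smaller array across the larger one. The sketch below, with illustrative arrays grid, row, and col, adds a single row and a single column to every row and column of a 3 by 4 array:

```python
import numpy as np

grid = np.arange(3 * 4).reshape((3, 4))
row = np.array([10, 20, 30, 40])        # shape (4,)
col = np.array([[100], [200], [300]])   # shape (3, 1)

print(grid + row)   # row is added to every row of grid
print(grid + col)   # col is added to every column of grid
print(grid * 2)     # a scalar is combined with every element
```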

What's the benefit of using arrays?

Arrays are in many ways less flexible than lists, as we can't change their size or data type very easily. So why use them? Let's do a short experiment to find out. We'll try implementing one nice feature (easy elementwise addition) in lists and see how fast we can make it.

In [46]:
# create two large lists
n = 1000
a = list(range(n))
b = list(range(n))

There are a few ways of doing this, some of which may be more efficient than others.

In [47]:
def add_lists_elementwise(a, b):
    c = []
    for i in range(len(a)):
        c.append(a[i] + b[i])
    return c
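
Another way to write this, shown here only as a sketch, uses zip() and a list comprehension. The function name add_lists_with_zip is made up for this example.

```python
def add_lists_with_zip(a, b):
    # zip() pairs up corresponding elements, and the comprehension builds the result.
    # Note: zip() stops at the shorter input, whereas the indexed loop above
    # would raise an IndexError if b were shorter than a.
    return [x + y for x, y in zip(a, b)]

print(add_lists_with_zip([1, 2, 3], [10, 20, 30]))
```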

The %timeit magic command in IPython runs a set of timing tests on a line of Python code.

In [48]:
%timeit add_lists_elementwise(a, b)
130 µs ± 2.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I get about 130 microseconds. Not too bad, certainly faster than I'll ever notice. Let's try the same task with arrays.

In [49]:
a_array = np.array(a)
b_array = np.array(b)
In [50]:
%timeit a_array + b_array
986 ns ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

And here I get around 1 microsecond, a speedup of over two orders of magnitude. Now if we only had to do this once, it would be no big deal. But if we have to do calculations like this millions of times on arrays that are even larger, this speedup makes difficult tasks easy and makes impossibly slow tasks feasible.
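
Outside of IPython, a similar comparison can be sketched with the timeit module from the standard library. The number of repeats below is an arbitrary choice, and the timings will vary from machine to machine.

```python
import timeit
import numpy as np

n = 1000
a = list(range(n))
b = list(range(n))
a_array = np.array(a)
b_array = np.array(b)

def add_lists_elementwise(a, b):
    c = []
    for i in range(len(a)):
        c.append(a[i] + b[i])
    return c

# timeit.timeit() returns the total time in seconds for `number` calls.
repeats = 1000
list_time = timeit.timeit(lambda: add_lists_elementwise(a, b), number=repeats)
array_time = timeit.timeit(lambda: a_array + b_array, number=repeats)

print("lists:  {:.2f} µs per call".format(1e6 * list_time / repeats))
print("arrays: {:.2f} µs per call".format(1e6 * array_time / repeats))
```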
