Array Indexing and Slicing¶
Array Indexing¶
This also follows zero based indexing like python lists
import numpy as np
1D Array Indexing¶
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences. The (start:stop:step) notation for slicing is used.
1D array at index
i
Returns the
ith
element of an arraySyntax:
array[i]
# Create an 1D arary
A1 = np.array([11, 22, 34, 12, 15])
# Select ith element of A1
A1[1]
# Negative indexing
A1[-1]
2D Array Indexing¶
2D array at index
[i][j]
Returns the
[i][j]
element of an arraySyntax:
array[i][j]
# Create an 2D array
A2 = np.array([[0, 1, 3], [4, 6, 7]])
# Select the first row of A2
A2[0]
# Select the first element of first row
A2[0][0]
Note
First,
A2[0]
= [0, 1, 3], which is the first row of arrayA2
Second,
A2[0]
select the first element of first row.
# Select the second row of A2
A2[1]
# Select the 3rd element of second row
A2[1][2]
Consider an array students, it contains the test scores in two courses of the students against their names
students = np.array([['Alice','Beth','Cathy','Dorothy'],
[65,78,90,81],
[71,82,79,92]])
students
students[0]
students[1]
students[2]
students[0,1]
Array Slicing¶
1D Array Slicing¶
# Create a 1D Array
A = np.array([11, 12, 13, 14, 15])
# Select all elements
A[:]
# Returns n-1
A[1:2]
# Select all except last element
A[:-1]
2D Array slicing¶
This will consider the rows 0 and 1, columns 2 and 3
# Create a 2D array of students info
students = np.array([['Alice','Beth','Cathy','Dorothy'],
[65,78,90,81],
[71,82,79,92]])
# All rows and column 1
students[:,1:2]
# All rows, columns 1 and 2
students[:,1:3]
# All columns, rows 0 and 1
students[0:2,:]
# All rows and columns
students[:]
# The last row
students[-1,:]
# 3rd from last to second from last row, last two columns
students[-3:-1,-2:]
Dots or ellipsis(…)¶
Slicing can also include ellipsis (…) to make a selection tuple of the same length as the dimension of an array. The dots (…) represent as many colons as needed to produce a complete indexing tuple
Equivalent to students[0] or students[0:1,:]¶
Select row 0 and all columns
students[0,...]
# All rows and column 1
students[...,1]
students[...,1].shape
students[:,1:2].shape
Fancy Indexing - Integer Arrays¶
NumPy arrays can be indexed with slices, but also with boolean or integer arrays (masks). It means passing an array of indices to access multiple array elements at once. This method is called fancy indexing. It creates copies not views.
a = np.arange(12)**2
a
Suppose we want to access three different elements. We could do it like this:
a[2],a[6],a[8]
Alternatively, we can pass a single list or array of indices to obtain the same result:
indx_1 = [2,6,8]
a[indx_1]
When using fancy indexing, the shape of the result reflects the shape of the index arrays rather than the shape of the array being indexed
indx_2 = np.array([[2,4],[8,10]])
indx_2
a[indx_2]
We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape.
food = np.array([["blueberry","strawberry","cherry","blackberry"],
["pinenut","hazelnuts","cashewnut","coconut"],
["mustard","paprika","nutmeg","clove"]])
food
We will now select the corner elements of this array¶
row = np.array([[0,0],[2,2]])
col = np.array([[0,3],[0,3]])
food[row,col]
Notice that the first value in the result is food[0,0], next is food[0,3] , food[2,0] and lastly food[2,3]
food[2,0]
Modifying Values with Fancy Indexing¶
Just as fancy indexing can be used to access parts of an array, it can also be used to modify parts of an array.
food[row,col] = "000000"
food
We can use any assignment-type operator for this. Consider following example:
a
indx_1
a[indx_1] = 999
a
a[indx_1] -=100
a
Fancy Indexing - Boolean Arrays¶
When we index arrays with arrays of (integer) indices we are providing the list of indices to pick. With boolean indices the approach is different; we explicitly choose which items in the array we want and which ones we don’t.
Frequently this type of indexing is used to select the elements of an array that satisfy some condition
a = np.arange(16).reshape(4,4)
a
Now we find the elements that are greater than 9. This will return a numpy array of the same shape as our original array.
indx_bool = a > 9
indx_bool
We use this array to select elements in a corresponding to ‘true’ values in the boolean array.
a[indx_boo]
We can do all of the above in a single concise statement
print(a[a > 9])
Structured Arrays¶
Structured arrays or record arrays are useful when you perform computations, and at the same time you could keep closely related data together. Structured arrays provide efficient storage for compound, heterogeneous data.
NumPy also provides powerful capabilities to create arrays of records, as multiple data types live in one NumPy array. However, one principle in NumPy that still needs to be honored is that the data type in each field (think of this as a column in the records) needs to be homogeneous.
Imagine that we have several categories of data on a number of students say, name, roll number, and test scores.
name = ["Alice","Beth","Cathy","Dorothy"]
studentId = [1,2,3,4]
score = [85.4,90.4,87.66,78.9]
There’s nothing here that tells us that the three arrays are related; it would be more natural if we could use a single structure to store all of this data.
Define the np array with the names of the ‘columns’ and the data format for each¶
U10 represents a 10-character Unicode string
i4 is short for int32 (i for int, 4 for 4 bytes)
f8 is shorthand for float64
student_data = np.zeros(4, dtype={'names':('name', 'studentId', 'score'),
'formats':('U10', 'i4', 'f8')})
np.zeros() for a string sets it to an empty string¶
student_data
print(student_data.dtype)
Now that we’ve created an empty container array, we can fill the array with our lists of values
student_data['name'] = name
student_data['studentId'] = studentId
student_data['score'] = score
print(student_data)
The handy thing with structured arrays is that you can now refer to values either by index or by name
student_data['name']
student_data['studentId']
student_data['score']
If you index student_data at position 1 you get a structure:
student_data[1]
Get the name attribute from the last row¶
student_data[-1]['name']
Get names where score is above 85¶
student_data[student_data['score'] > 85]['name']
Note that if you’d like to do any operations that are any more complicated than these, you should probably consider the Pandas package with provides a powerful data structure called data frames.