Splitting Arrays
import numpy as np
1. split
Splits an array into multiple sub-arrays. The second argument is either an integer, giving the number of equal-sized sub-arrays to return, or a sorted 1-D sequence of indices along the chosen axis at which the array is divided.
np.split(ary, indices_or_sections, axis=0)
x = np.arange(9)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
print('Split the array in 3 equal-sized subarrays:')
np.split(x, 3)
Split the array in 3 equal-sized subarrays:
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
The number of sections must divide the number of elements along the axis
Otherwise NumPy raises a ValueError, because an equal division is not possible:
np.split(x, 4)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
552 try:
--> 553 len(indices_or_sections)
554 except TypeError:
TypeError: object of type 'int' has no len()
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-19-8a4340bbd11d> in <module>()
----> 1 np.split(x, 4)
~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
557 if N % sections:
558 raise ValueError(
--> 559 'array split does not result in an equal division')
560 res = array_split(ary, indices_or_sections, axis)
561 return res
ValueError: array split does not result in an equal division
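When an uneven division is acceptable, np.array_split (the function that split delegates to in the traceback above) can be used instead. The following is a minimal sketch, assuming the same x as above; the variable name parts is only illustrative.

```python
import numpy as np

x = np.arange(9)

# array_split tolerates a section count that does not divide the length:
# the leading sub-arrays receive the extra elements.
parts = np.array_split(x, 4)
for part in parts:
    print(part)
```

Here the first sub-array has three elements and the remaining three have two each.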
print('Split the array at positions indicated in 1-D array:')
np.split(x, [4, 7])
Split the array at positions indicated in 1-D array:
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8])]
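To make the meaning of the index list concrete, the sketch below compares the result of np.split(x, [4, 7]) with the equivalent slices. The names pieces, slices, matches and tail are illustrative and not part of the original notebook.

```python
import numpy as np

x = np.arange(9)

# Indices [4, 7] cut the array just before positions 4 and 7,
# which corresponds to the slices x[:4], x[4:7] and x[7:].
pieces = np.split(x, [4, 7])
slices = [x[:4], x[4:7], x[7:]]
matches = all(np.array_equal(p, s) for p, s in zip(pieces, slices))
print(matches)

# An index past the end of the array gives an empty trailing sub-array.
tail = np.split(x, [4, 12])[-1]
print(tail.size)
```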
2. hsplit
np.hsplit is a special case of split(): for arrays with two or more dimensions it splits along axis 1 (column-wise), while for 1-D arrays it splits along axis 0.
In this example, the array is split column-wise:
y = np.array([("Germany","France", "Hungary","Austria"),
("Berlin","Paris", "Budapest","Vienna" )])
y
array([['Germany', 'France', 'Hungary', 'Austria'],
['Berlin', 'Paris', 'Budapest', 'Vienna']], dtype='<U8')
p1, p2 = np.hsplit(y, 2)
print(p1)
[['Germany' 'France']
['Berlin' 'Paris']]
print(p2)
[['Hungary' 'Austria']
['Budapest' 'Vienna']]
np.hsplit(y,4)
[array([['Germany'],
['Berlin']], dtype='<U8'), array([['France'],
['Paris']], dtype='<U8'), array([['Hungary'],
['Budapest']], dtype='<U8'), array([['Austria'],
['Vienna']], dtype='<U8')]
np.hsplit(y,3)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
552 try:
--> 553 len(indices_or_sections)
554 except TypeError:
TypeError: object of type 'int' has no len()
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-18-bcdd68c43d2e> in <module>()
----> 1 np.hsplit(y,3)
~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in hsplit(ary, indices_or_sections)
619 raise ValueError('hsplit only works on arrays of 1 or more dimensions')
620 if ary.ndim > 1:
--> 621 return split(ary, indices_or_sections, 1)
622 else:
623 return split(ary, indices_or_sections, 0)
~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
557 if N % sections:
558 raise ValueError(
--> 559 'array split does not result in an equal division')
560 res = array_split(ary, indices_or_sections, axis)
561 return res
ValueError: array split does not result in an equal division
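The sketch below illustrates three points about hsplit, assuming the same y as above: it agrees with split(..., axis=1) on a 2-D array, it accepts a list of column indices when the column groups should be unequal, and it falls back to axis 0 for a 1-D array. All variable names are illustrative.

```python
import numpy as np

y = np.array([("Germany", "France", "Hungary", "Austria"),
              ("Berlin", "Paris", "Budapest", "Vienna")])

# On a 2-D array, hsplit gives the same result as split with axis=1.
a = np.hsplit(y, 2)
b = np.split(y, 2, axis=1)
same = all(np.array_equal(p, q) for p, q in zip(a, b))
print(same)

# A list of column indices produces unequal groups:
# column 0, columns 1-2, and column 3.
left, middle, right = np.hsplit(y, [1, 3])
print(left.shape, middle.shape, right.shape)

# On a 1-D array, hsplit splits along axis 0.
halves = np.hsplit(np.arange(6), 2)
print(halves)
```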
3. vsplit
np.vsplit splits along axis 0 (row-wise). It is equivalent to split() with axis=0 and requires an array with at least two dimensions.
p_1,p_2 = np.vsplit(y, 2)
print(p_1)
[['Germany' 'France' 'Hungary' 'Austria']]
print(p_2)
[['Berlin' 'Paris' 'Budapest' 'Vienna']]
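As a quick check, the following sketch (assuming the same y as above) compares vsplit with split along axis 0 and shows that vsplit rejects a 1-D array. The names top, bottom, same and raised are illustrative.

```python
import numpy as np

y = np.array([("Germany", "France", "Hungary", "Austria"),
              ("Berlin", "Paris", "Budapest", "Vienna")])

# vsplit gives the same result as split with axis=0.
top, bottom = np.vsplit(y, 2)
ref_top, ref_bottom = np.split(y, 2, axis=0)
same = np.array_equal(top, ref_top) and np.array_equal(bottom, ref_bottom)
print(same)

# Each part keeps two dimensions: one row of four columns.
print(top.shape, bottom.shape)

# vsplit needs at least two dimensions, so a 1-D input raises ValueError.
raised = False
try:
    np.vsplit(np.arange(4), 2)
except ValueError as err:
    raised = True
    print(err)
```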
Array Unpacking
An alternative approach is array unpacking. In this example, we unpack the array into two variables. Unpacking iterates over the first dimension of the array, so a 2-D array unpacks row by row, and the number of variables must match the number of rows.
countries,capitals = y
print('Countries: ')
print(countries)
print('Capitals: ')
print(capitals)
Countries:
['Germany' 'France' 'Hungary' 'Austria']
Capitals:
['Berlin' 'Paris' 'Budapest' 'Vienna']
To get the columns, just transpose the array.
b1,b2,b3,b4 = y.T
print("b1: ")
print(b1)
print("b2: ")
print(b2)
print("b3: ")
print(b3)
print("b4: ")
print(b4)
b1:
['Germany' 'Berlin']
b2:
['France' 'Paris']
b3:
['Hungary' 'Budapest']
b4:
['Austria' 'Vienna']
The following code does not work, because the first dimension of the transposed array now has 4 rows while only 2 variables are given. To split the array into 2 parts horizontally, use split or hsplit instead.
z1,z2 = y.T
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-543fac61d801> in <module>()
----> 1 z1,z2 = y.T
ValueError: too many values to unpack (expected 2)
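A minimal sketch of the working alternative, assuming the same y as above: hsplit returns exactly two parts, so they unpack into two variables. Starred unpacking is shown as a second option for when only the first column is needed separately. The names z1, z2, first and rest are illustrative.

```python
import numpy as np

y = np.array([("Germany", "France", "Hungary", "Austria"),
              ("Berlin", "Paris", "Budapest", "Vienna")])

# hsplit returns a list of two arrays, which unpacks into two names.
z1, z2 = np.hsplit(y, 2)
print(z1)
print(z2)

# Starred unpacking collects the remaining columns of y.T in a list.
first, *rest = y.T
print(first)
print(len(rest))
```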