Splitting Arrays

import numpy as np

1. split

Splits an array into multiple sub-arrays, either by specifying the number of equal-sized sub-arrays to return or by specifying the indices at which the array should be divided.

split(array, indices_or_sections, axis=0)

x = np.arange(9)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
print('Split the array in 3 equal-sized subarrays:' )
np.split(x, 3)
Split the array in 3 equal-sized subarrays:
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

The number of sections must evenly divide the number of elements along the split axis. Otherwise NumPy raises a ValueError because an equal division is not possible.

np.split(x, 4)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
    552     try:
--> 553         len(indices_or_sections)
    554     except TypeError:

TypeError: object of type 'int' has no len()

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-19-8a4340bbd11d> in <module>()
----> 1 np.split(x, 4)

~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
    557         if N % sections:
    558             raise ValueError(
--> 559                 'array split does not result in an equal division')
    560     res = array_split(ary, indices_or_sections, axis)
    561     return res

ValueError: array split does not result in an equal division
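
The traceback above shows that split delegates to array_split once the divisibility check passes. If unequal pieces are acceptable, calling np.array_split directly is one way to avoid the error. The following is a minimal sketch under that assumption; the variable name parts is only illustrative.

```python
import numpy as np

x = np.arange(9)

# np.array_split does not require the section count to divide the length.
# With 9 elements and 4 sections, the first 9 % 4 = 1 sub-array gets one
# extra element and the remaining sub-arrays get 9 // 4 = 2 elements each.
parts = np.array_split(x, 4)
print(parts)
# [array([0, 1, 2]), array([3, 4]), array([5, 6]), array([7, 8])]
```
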
print('Split the array at positions indicated in 1-D array:' )
np.split(x,[4,7])
Split the array at positions indicated in 1-D array:
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8])]
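
To make the index form concrete, the sketch below compares np.split(x, [4, 7]) with the equivalent manual slices x[:4], x[4:7] and x[7:]. The helper names bounds and manual are illustrative only.

```python
import numpy as np

x = np.arange(9)
indices = [4, 7]

pieces = np.split(x, indices)

# Build the same pieces by hand: each index marks where the next piece starts.
bounds = [0] + indices + [len(x)]
manual = [x[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

for piece, sliced in zip(pieces, manual):
    print(piece, np.array_equal(piece, sliced))
```

A list of n indices therefore always yields n + 1 sub-arrays.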

2. hsplit

numpy.hsplit is a special case of split() that divides an array horizontally (column-wise). For arrays with 2 or more dimensions it is equivalent to split with axis=1; for 1-D arrays it splits along axis 0.
In this example, the 2-D array is split by columns.

y = np.array([("Germany","France", "Hungary","Austria"),
              ("Berlin","Paris", "Budapest","Vienna" )]) 
y
array([['Germany', 'France', 'Hungary', 'Austria'],
       ['Berlin', 'Paris', 'Budapest', 'Vienna']], dtype='<U8')
p1, p2 = np.hsplit(y, 2)
print(p1)
[['Germany' 'France']
 ['Berlin' 'Paris']]
print(p2)
[['Hungary' 'Austria']
 ['Budapest' 'Vienna']]
np.hsplit(y,4)
[array([['Germany'],
        ['Berlin']], dtype='<U8'), array([['France'],
        ['Paris']], dtype='<U8'), array([['Hungary'],
        ['Budapest']], dtype='<U8'), array([['Austria'],
        ['Vienna']], dtype='<U8')]
np.hsplit(y,3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
    552     try:
--> 553         len(indices_or_sections)
    554     except TypeError:

TypeError: object of type 'int' has no len()

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-18-bcdd68c43d2e> in <module>()
----> 1 np.hsplit(y,3)

~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in hsplit(ary, indices_or_sections)
    619         raise ValueError('hsplit only works on arrays of 1 or more dimensions')
    620     if ary.ndim > 1:
--> 621         return split(ary, indices_or_sections, 1)
    622     else:
    623         return split(ary, indices_or_sections, 0)

~/anaconda3/lib/python3.6/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
    557         if N % sections:
    558             raise ValueError(
--> 559                 'array split does not result in an equal division')
    560     res = array_split(ary, indices_or_sections, axis)
    561     return res

ValueError: array split does not result in an equal division
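
The hsplit source visible in the traceback above chooses axis 1 when ary.ndim > 1 and axis 0 otherwise. The short sketch below checks both branches on small numeric arrays; the array names a2d and a1d are illustrative.

```python
import numpy as np

a2d = np.arange(8).reshape(2, 4)
a1d = np.arange(6)

# 2-D input: hsplit gives the same result as split along axis=1.
left, right = np.hsplit(a2d, 2)
left_s, right_s = np.split(a2d, 2, axis=1)
print(np.array_equal(left, left_s), np.array_equal(right, right_s))  # True True

# 1-D input: there is no axis 1, so hsplit splits along axis 0 instead.
halves = np.hsplit(a1d, 2)
print(halves)  # [array([0, 1, 2]), array([3, 4, 5])]
```
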

3. vsplit

vsplit splits an array vertically (row-wise). It is equivalent to split with axis=0 and requires an array with at least 2 dimensions.

p_1,p_2 = np.vsplit(y, 2)
print(p_1)
[['Germany' 'France' 'Hungary' 'Austria']]
print(p_2)
[['Berlin' 'Paris' 'Budapest' 'Vienna']]
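
As a quick check of how vsplit relates to split, the sketch below compares the two on a small numeric array and shows that a 1-D input is rejected. The array m is illustrative and not part of the example above.

```python
import numpy as np

m = np.arange(12).reshape(4, 3)

# vsplit divides the rows, which matches split along axis=0.
top, bottom = np.vsplit(m, 2)
top_s, bottom_s = np.split(m, 2, axis=0)
print(top.shape, bottom.shape)  # (2, 3) (2, 3)
print(np.array_equal(top, top_s), np.array_equal(bottom, bottom_s))  # True True

# A 1-D array has no rows to divide, so vsplit raises ValueError.
try:
    np.vsplit(np.arange(4), 2)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```
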

Array Unpacking

An alternative approach is array unpacking. In this example, we unpack the array into two variables. Unpacking iterates over the first dimension of the array, so a 2-D array unpacks row by row.

countries,capitals = y
print('Countries: ')
print(countries)
print('Capitals: ')
print(capitals)
Countries: 
['Germany' 'France' 'Hungary' 'Austria']
Capitals: 
['Berlin' 'Paris' 'Budapest' 'Vienna']
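
Because unpacking iterates over the first axis, it should give the same rows as indexing with y[0] and y[1]. The sketch below verifies this and also shows that the unpacked rows are views into y rather than copies.

```python
import numpy as np

y = np.array([("Germany", "France", "Hungary", "Austria"),
              ("Berlin", "Paris", "Budapest", "Vienna")])

countries, capitals = y

# Unpacking walks the first axis, so it matches indexing with y[0] and y[1].
print(np.array_equal(countries, y[0]))  # True
print(np.array_equal(capitals, y[1]))   # True

# Each unpacked row is a view into y, not a copy.
print(np.shares_memory(countries, y))   # True
```
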

To get the columns, just transpose the array.

b1,b2,b3,b4 = y.T
print("b1: ")
print(b1)
print("b2: ")
print(b2)
print("b3: ")
print(b3)
print("b4: ")
print(b4)
b1: 
['Germany' 'Berlin']
b2: 
['France' 'Paris']
b3: 
['Hungary' 'Budapest']
b4: 
['Austria' 'Vienna']

The following code fails because the first dimension of the transposed array now has 4 rows, while only 2 variables are given. To split the array into 2 parts horizontally, use split or hsplit instead.

z1,z2 = y.T
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-543fac61d801> in <module>()
----> 1 z1,z2 = y.T

ValueError: too many values to unpack (expected 2)
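
Two possible ways around this error are sketched below: np.hsplit for two column halves, or Python's starred unpacking when only the first column needs its own name. The variable names first and rest are illustrative.

```python
import numpy as np

y = np.array([("Germany", "France", "Hungary", "Austria"),
              ("Berlin", "Paris", "Budapest", "Vienna")])

# Option 1: hsplit returns two 2-D halves, each holding two columns.
z1, z2 = np.hsplit(y, 2)
print(z1.shape, z2.shape)  # (2, 2) (2, 2)

# Option 2: starred unpacking takes the first column and collects
# the remaining columns in a plain Python list.
first, *rest = y.T
print(first)      # ['Germany' 'Berlin']
print(len(rest))  # 3
```
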