{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to NumPy\n", "\n", "## What is NumPy?\n", "\n", "NumPy is a Python package which stands for ‘Numerical Python’. It is the core library for scientific computing, which contains a powerful n-dimensional array object, provide tools for integrating C, C++ etc. It is also useful in linear algebra, random number capability etc. NumPy array can also be used as an efficient multi-dimensional container for generic data. Now, let me tell you what exactly is a python numpy array.\n", "\n", "## Keypoints \n", "- Numpy stands for numerical Python\n", "- Fundamental package for numerical computations in Python\n", "- a powerful N-dimensional array object\n", "- sophisticated (broadcasting) functions\n", "- tools for integrating C/C++ and Fortran code\n", "- useful linear algebra, Fourier transform, and random number capabilities\n", "\n", "## NumPy Array\n", "Numpy array is a powerful N-dimensional array object which is in the form of rows and columns. We can initialize numpy arrays from nested Python lists and access it elements. In order to perform these numpy operations.\n", "\n", "## N-dimensional Array\n", "- 1Dimensional(1D) Array\n", "- 2Dimensional(2D) Array\n", "- 3Dimensional(3D) Array\n", "![NdArray](../img/arrays.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Started\n", "Use the following import convention\n", "```python\n", "import numpy as np\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why Numpy?\n", "- Less Memory\n", "- Fast\n", "- Convenient" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculation\n", "- Element wise sum is not possible in Python list. 
NumPy arrays, by contrast, add element-wise with `+`, which is one of their main advantages.\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 4, 5, 6]\n" ] } ], "source": [ "# `+` on two lists concatenates them \n", "L1 = [1, 2, 3]\n", "L2 = [4, 5, 6]\n", "print(L1+L2)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5 7 9]\n" ] } ], "source": [ "# element-wise sum using NumPy arrays \n", "import numpy as np \n", "A1 = np.array([1, 2, 3])\n", "A2 = np.array([4, 5, 6])\n", "print(A1+A2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Less Memory" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python List: 28000\n", "Numpy Array: 8000\n" ] } ], "source": [ "import numpy as np\n", "import sys\n", " \n", "# rough estimate for a list: size of one int object times the element count\n", "S = range(1000)\n", "print(\"Python List: \", sys.getsizeof(5)*len(S))\n", " \n", "# an array stores raw values: element count times bytes per element\n", "D = np.arange(1000)\n", "print(\"Numpy Array: \", D.size*D.itemsize)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Faster" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "196.61378860473633\n", "56.15043640136719\n" ] } ], "source": [ "import time\n", " \n", "SIZE = 1000000\n", " \n", "L1 = range(SIZE)\n", "L2 = range(SIZE)\n", "A1 = np.arange(SIZE)\n", "A2 = np.arange(SIZE)\n", " \n", "# element-wise sum with a pure-Python list comprehension\n", "start = time.time()\n", "result = [x + y for x, y in zip(L1, L2)]\n", "# time in ms \n", "print((time.time()-start)*1000)\n", " \n", "# element-wise sum with NumPy arrays\n", "start = time.time()\n", "result = A1+A2\n", "# time in ms \n", "print((time.time()-start)*1000)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "19.5 µs ± 354 ns per loop (mean ± std. dev. 
of 7 runs, 10000 loops each)\n" ] } ], "source": [ "%timeit sum(range(1000))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8.63 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" ] } ], "source": [ "%timeit np.sum(np.arange(1000))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Arrays \n", "- **Array:** An ordered, fixed-length collection of elements that share one basic data type.\n", "- **Syntax:**\n", "```python \n", "np.array(object)\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# import numpy \n", "import numpy as np " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3]\n" ] } ], "source": [ "# Creating 1D array\n", "A = np.array([1, 2, 3])\n", "print(A)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n" ] } ], "source": [ "# type \n", "print(type(A))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array with Categorical Entities \n", "- A NumPy array can hold categorical (string) values alongside numbers. \n", "- All elements are coerced to a single common data type; in the example below the integers become strings. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['12' '13' 'n']\n" ] } ], "source": [ "# create an array with categorical entities. 
\n", "X = np.array([12, 13, \"n\"])\n", "print(X)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# type \n", "print(type(X))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[3 4 5]\n", " [7 8 9]]\n" ] } ], "source": [ "# Creating 2D array\n", "A2 = np.array([[3, 4, 5], [7, 8, 9]])\n", "print(A2) " ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[ 1 2 3]\n", " [ 4 5 6]]\n", "\n", " [[ 7 8 9]\n", " [10 11 12]]]\n" ] } ], "source": [ "# Creating 3D array\n", "A3 = np.array([[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]])\n", "print(A3) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inspecting array properties" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Size \n", "- Returns number of elements in array\n", "- **Syntax:** `array.size`" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A1 = np.array([1, 2, 3,4, 5])\n", "# size \n", "A1.size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Shape\n", "- Returns dimensions of array (rows,columns)\n", "- **Syntax:** `array.shape`" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 3)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A2 = np.array([[4, 5, 6], [7, 8, 9]])\n", "# shape \n", "A2.shape " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get row \n", 
"A2.shape[0]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get column\n", "A2.shape[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Type\n", "- Returns type of elements in array\n", "- **Syntax:** `array.size`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A3 = np.linspace(0, 100, 6)\n", "# dtypes \n", "A3.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " ## Type Conversion \n", " - Convert array elements to type dtype\n", " - **Syntax:** `array.astype(dtype)`\n", " - dtype - data type " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1.],\n", " [1., 1., 1.]], dtype=float16)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A4 = np.ones((2,3))\n", "# convert \n", "A4.astype(np.float16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numpy array to Python List \n", "- Returns the Python list \n", "- **Syntax:** `array.tolist()`" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0.0,\n", " 5.2631578947368425,\n", " 10.526315789473685,\n", " 15.789473684210527,\n", " 21.05263157894737,\n", " 26.315789473684212,\n", " 31.578947368421055,\n", " 36.8421052631579,\n", " 42.10526315789474,\n", " 47.36842105263158,\n", " 52.631578947368425,\n", " 57.89473684210527,\n", " 63.15789473684211,\n", " 68.42105263157896,\n", " 73.6842105263158,\n", " 78.94736842105263,\n", " 84.21052631578948,\n", " 89.47368421052633,\n", " 94.73684210526316,\n", " 100.0]" ] }, "execution_count": 8, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "A5 = np.linspace(0, 100, 20)\n", "# array to list \n", "A5.tolist() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Help: View documentation\n", "- Returns a documentation\n", "- **Syntax:** `np.info(np.function)`\n", " - function - linspace, logspace, eye, ones, zeros etc." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " linspace(*args, **kwargs)\n", "\n", "Return evenly spaced numbers over a specified interval.\n", "\n", "Returns `num` evenly spaced samples, calculated over the\n", "interval [`start`, `stop`].\n", "\n", "The endpoint of the interval can optionally be excluded.\n", "\n", ".. versionchanged:: 1.16.0\n", " Non-scalar `start` and `stop` are now supported.\n", "\n", "Parameters\n", "----------\n", "start : array_like\n", " The starting value of the sequence.\n", "stop : array_like\n", " The end value of the sequence, unless `endpoint` is set to False.\n", " In that case, the sequence consists of all but the last of ``num + 1``\n", " evenly spaced samples, so that `stop` is excluded. Note that the step\n", " size changes when `endpoint` is False.\n", "num : int, optional\n", " Number of samples to generate. Default is 50. Must be non-negative.\n", "endpoint : bool, optional\n", " If True, `stop` is the last sample. Otherwise, it is not included.\n", " Default is True.\n", "retstep : bool, optional\n", " If True, return (`samples`, `step`), where `step` is the spacing\n", " between samples.\n", "dtype : dtype, optional\n", " The type of the output array. If `dtype` is not given, infer the data\n", " type from the other input arguments.\n", "\n", " .. versionadded:: 1.9.0\n", "\n", "axis : int, optional\n", " The axis in the result to store the samples. Relevant only if start\n", " or stop are array-like. By default (0), the samples will be along a\n", " new axis inserted at the beginning. 
Use -1 to get an axis at the end.\n", "\n", " .. versionadded:: 1.16.0\n", "\n", "Returns\n", "-------\n", "samples : ndarray\n", " There are `num` equally spaced samples in the closed interval\n", " ``[start, stop]`` or the half-open interval ``[start, stop)``\n", " (depending on whether `endpoint` is True or False).\n", "step : float, optional\n", " Only returned if `retstep` is True\n", "\n", " Size of spacing between samples.\n", "\n", "\n", "See Also\n", "--------\n", "arange : Similar to `linspace`, but uses a step size (instead of the\n", " number of samples).\n", "geomspace : Similar to `linspace`, but with numbers spaced evenly on a log\n", " scale (a geometric progression).\n", "logspace : Similar to `geomspace`, but with the end points specified as\n", " logarithms.\n", "\n", "Examples\n", "--------\n", ">>> np.linspace(2.0, 3.0, num=5)\n", "array([2. , 2.25, 2.5 , 2.75, 3. ])\n", ">>> np.linspace(2.0, 3.0, num=5, endpoint=False)\n", "array([2. , 2.2, 2.4, 2.6, 2.8])\n", ">>> np.linspace(2.0, 3.0, num=5, retstep=True)\n", "(array([2. , 2.25, 2.5 , 2.75, 3. ]), 0.25)\n", "\n", "Graphical illustration:\n", "\n", ">>> import matplotlib.pyplot as plt\n", ">>> N = 8\n", ">>> y = np.zeros(N)\n", ">>> x1 = np.linspace(0, 10, N, endpoint=True)\n", ">>> x2 = np.linspace(0, 10, N, endpoint=False)\n", ">>> plt.plot(x1, y, 'o')\n", "[]\n", ">>> plt.plot(x2, y + 0.5, 'o')\n", "[]\n", ">>> plt.ylim([-0.5, 1])\n", "(-0.5, 1)\n", ">>> plt.show()\n" ] } ], "source": [ "np.info(np.linspace)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "- https://numpy.org/\n", "- https://www.edureka.co/blog/python-numpy-tutorial/\n", "- https://github.com/enthought/Numpy-Tutorial-SciPyConf-2019\n", "- [Python Machine Learning Cookbook](https://www.amazon.com/Python-Machine-Learning-Cookbook-Prateek/dp/1786464470)\n", "
\n", "\n", "*This notebook was created by [Jubayer Hossain](https://jhossain.me/) | Copyright © 2020, [Jubayer Hossain](https://jhossain.me/)*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }