Methods and Attributes

Remember

  • Methods end with parentheses, while attributes don’t

  • df.shape: Attribute

  • df.info(): Method
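The rule above can be checked on a small DataFrame. This is a minimal sketch: `df` and its two columns are made-up toy data, not the IMDb dataset.

```python
import pandas as pd

# Hypothetical toy data (an assumption for illustration, not the IMDb file)
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "star_rating": [9.3, 9.2, 9.1],
})

# Attribute: plain access without parentheses returns the value directly
shape = df.shape            # tuple of (rows, columns)

# Method: it has to be called with parentheses to run
first_two = df.head(2)      # DataFrame holding the first 2 rows

# Without parentheses a method does not run; you get the method object itself
head_is_callable = callable(df.head)

# Adding parentheses to an attribute fails, because a tuple is not callable
try:
    df.shape()
    shape_call_failed = False
except TypeError:
    shape_call_failed = True

print(shape, len(first_two), head_is_callable, shape_call_failed)
```

A forgotten pair of parentheses (for example `df.head` instead of `df.head()`) raises no error, so it is an easy mistake to miss.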

# import pandas 
import pandas as pd 
# read a dataset of top-rated IMDb movies into a DataFrame
movies = pd.read_csv('http://bit.ly/imdbratings')
# example method: show the first 5 rows 
movies.head()
   star_rating  title                     content_rating  genre   duration  actors_list
0  9.3          The Shawshank Redemption  R               Crime   142       [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1  9.2          The Godfather             R               Crime   175       [u'Marlon Brando', u'Al Pacino', u'James Caan']
2  9.1          The Godfather: Part II    R               Crime   200       [u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3  9.0          The Dark Knight           PG-13           Action  152       [u'Christian Bale', u'Heath Ledger', u'Aaron E...
4  8.9          Pulp Fiction              R               Crime   154       [u'John Travolta', u'Uma Thurman', u'Samuel L....
# example method: calculate summary statistics
movies.describe()
       star_rating    duration
count   979.000000  979.000000
mean      7.889785  120.979571
std       0.336069   26.218010
min       7.400000   64.000000
25%       7.600000  102.000000
50%       7.800000  117.000000
75%       8.100000  134.000000
max       9.300000  242.000000
# example attribute: number of rows and columns 
movies.shape
(979, 6)
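Because `shape` is an ordinary tuple, it can be unpacked into a row count and a column count. A short sketch on made-up toy data (the `toy` frame is an assumption, not the movies dataset):

```python
import pandas as pd

# Hypothetical toy data with 3 rows and 2 columns
toy = pd.DataFrame({
    "star_rating": [9.3, 9.2, 9.1],
    "duration": [142, 175, 200],
})

# shape is a (rows, columns) tuple, so it unpacks like any other tuple
n_rows, n_cols = toy.shape

# The same counts are available through len()
row_count = len(toy)
column_count = len(toy.columns)

print(n_rows, n_cols)
```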
# example attribute: data type of each column
movies.dtypes
star_rating       float64
title              object
content_rating     object
genre              object
duration            int64
actors_list        object
dtype: object
# use the optional include parameter of the describe method to summarize only the 'object' columns
movies.describe(include='object')
        title      content_rating  genre  actors_list
count   979        976             979    979
unique  975        12              16     969
top     True Grit  R               Drama  [u'Daniel Radcliffe', u'Emma Watson', u'Rupert...
freq    2          460             278    6
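The effect of the `include` parameter is easier to see on a small frame. The sketch below uses made-up toy data and sets `dtype=object` explicitly, since newer pandas versions may infer a dedicated string dtype for text columns.

```python
import pandas as pd

# Hypothetical toy data; text columns are forced to dtype=object (an assumption
# made so the example behaves the same across pandas versions)
toy = pd.DataFrame({
    "title": pd.Series(["A", "B", "C", "D"], dtype=object),
    "genre": pd.Series(["Crime", "Crime", "Action", "Crime"], dtype=object),
    "star_rating": [9.3, 9.2, 9.0, 8.9],
    "duration": [142, 175, 152, 154],
})

# Default: only the numeric columns are summarized
numeric_summary = toy.describe()

# include=['object']: only the object columns (count, unique, top, freq)
object_summary = toy.describe(include=["object"])

# include='all': every column, with NaN where a statistic does not apply
all_summary = toy.describe(include="all")

print(object_summary)
```

For object columns, `top` is the most frequent value and `freq` is how often it occurs, which is how the `top`/`freq` rows in the output above should be read.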