Selecting Series from DataFrame

Single Series

# conventional way to import pandas
import pandas as pd
# read a dataset of UFO reports into a DataFrame
ufo = pd.read_table('http://bit.ly/uforeports', sep=',')

# read_csv is equivalent to read_table, except that it assumes a comma separator
ufo = pd.read_csv('http://bit.ly/uforeports')
# examine the first 5 rows
ufo.head()
                   City  Colors Reported  Shape Reported  State             Time
0                Ithaca              NaN        TRIANGLE     NY   6/1/1930 22:00
1           Willingboro              NaN           OTHER     NJ  6/30/1930 20:00
2               Holyoke              NaN            OVAL     CO  2/15/1931 14:00
3               Abilene              NaN            DISK     KS   6/1/1931 13:00
4  New York Worlds Fair              NaN           LIGHT     NY  4/18/1933 19:00
# select 'City' Series using bracket notation
ufo['City']
0                      Ithaca
1                 Willingboro
2                     Holyoke
3                     Abilene
4        New York Worlds Fair
                 ...         
18236              Grant Park
18237             Spirit Lake
18238             Eagle River
18239             Eagle River
18240                    Ybor
Name: City, Length: 18241, dtype: object
# confirm that the selection is a Series (pandas.core.series.Series)
type(ufo['City'])
# select 'City' Series using dot (.) notation
ufo.City
0                      Ithaca
1                 Willingboro
2                     Holyoke
3                     Abilene
4        New York Worlds Fair
                 ...         
18236              Grant Park
18237             Spirit Lake
18238             Eagle River
18239             Eagle River
18240                    Ybor
Name: City, Length: 18241, dtype: object

Note

  • Bracket notation will always work, whereas dot notation has limitations

  • Dot notation doesn’t work if there are spaces in the Series name

  • Dot notation doesn’t work if the Series has the same name as a DataFrame method or attribute (like ‘head’ or ‘shape’)

  • Dot notation can’t be used to define the name of a new Series: assigning to an attribute that does not exist yet does not add a column (see below)
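
The limitations listed above can be checked on a small frame. This is a minimal sketch, not part of the original notebook: the `demo` DataFrame and its column names are made up for illustration, and the warning filter is there only because recent pandas versions warn when a Series is assigned to a new attribute.

```python
import warnings

import pandas as pd

# Hypothetical three-row frame; the column names are chosen to trigger each limitation.
demo = pd.DataFrame({
    'City': ['Ithaca', 'Holyoke', 'Abilene'],
    'Shape Reported': ['TRIANGLE', 'OVAL', 'DISK'],  # name contains a space
    'shape': ['triangle', 'oval', 'disk'],           # same name as DataFrame.shape
})

# Bracket notation works for every column name.
spaced = demo['Shape Reported']
clash = demo['shape']

# Dot notation resolves to the DataFrame attribute, not the column.
dims = demo.shape  # a (rows, columns) tuple

# Assigning through dot notation sets a plain Python attribute; no column is created.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas may emit a UserWarning here
    demo.Location = demo.City + ', NY'
has_location = 'Location' in demo.columns

print(dims, has_location)
```

`demo.Shape Reported` is not included because it is a syntax error in Python, so bracket notation is the only option for names with spaces.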

# create a new 'Location' Series (must use bracket notation to define the Series name)
ufo['Location'] = ufo.City + ', ' + ufo.State
ufo.head()
                   City  Colors Reported  Shape Reported  State             Time                  Location
0                Ithaca              NaN        TRIANGLE     NY   6/1/1930 22:00                Ithaca, NY
1           Willingboro              NaN           OTHER     NJ  6/30/1930 20:00           Willingboro, NJ
2               Holyoke              NaN            OVAL     CO  2/15/1931 14:00               Holyoke, CO
3               Abilene              NaN            DISK     KS   6/1/1931 13:00               Abilene, KS
4  New York Worlds Fair              NaN           LIGHT     NY  4/18/1933 19:00  New York Worlds Fair, NY

Multiple Series

# select multiple Series by passing a list of column names (the result is a DataFrame)
ufo[['City', 'State', 'Time']]
                       City  State              Time
0                    Ithaca     NY    6/1/1930 22:00
1               Willingboro     NJ   6/30/1930 20:00
2                   Holyoke     CO   2/15/1931 14:00
3                   Abilene     KS    6/1/1931 13:00
4      New York Worlds Fair     NY   4/18/1933 19:00
...                     ...    ...               ...
18236            Grant Park     IL  12/31/2000 23:00
18237           Spirit Lake     IA  12/31/2000 23:00
18238           Eagle River     WI  12/31/2000 23:45
18239           Eagle River     WI  12/31/2000 23:45
18240                  Ybor     FL  12/31/2000 23:59

18241 rows × 3 columns
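
The double brackets in the selection above are a Python list inside the indexing brackets. The sketch below shows the difference between passing a single label and passing a list. It assumes a small made-up `demo` frame rather than the full UFO dataset.

```python
import pandas as pd

# Hypothetical two-row frame mirroring a few columns of the UFO dataset.
demo = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro'],
    'State': ['NY', 'NJ'],
    'Time': ['6/1/1930 22:00', '6/30/1930 20:00'],
})

one_series = demo['City']    # single label: the result is a Series
one_column = demo[['City']]  # list with one label: the result is a one-column DataFrame

cols = ['State', 'City']     # the list can be built separately and reused
subset = demo[cols]          # columns come back in the order listed

print(type(one_series).__name__, type(one_column).__name__, list(subset.columns))
```

Building the list first (`cols` here) can be easier to read when many columns are selected.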