# Create some sample data that we can work with for our analysis.
import pandas as pd
import numpy as np
# Create a dataframe from a random numpy array
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randn.html
df = pd.DataFrame(20 + np.random.randn(10, 4) * 5, columns=['a', 'b', 'c', 'd'])
print(df)
df2 = pd.DataFrame(df)  # This creates a new dataframe from df
df3 = pd.DataFrame(df)
df4 = pd.DataFrame(20 + np.random.randn(10, 4) * 5, columns=['e', 'f', 'g', 'h'])
print(df2)
print(df3)
print(df4)
#Notice how it automatically generates an index 0-9.
#Here we can see a structure similar to R's: we select the desired columns by passing a list.
print(df.iloc[:, 0:2])             # All rows, columns 0-1 (integer slices exclude the endpoint)
print(df.iloc[:, [0, 2, 3]])       # All rows, columns 0, 2, 3
print(df.loc[:, 'a':'b'])          # All rows, columns a-b (label slices include the endpoint)
print(df.loc[:, ['a', 'c', 'd']])  # All rows, columns a, c, d
columns = ['a', 'c', 'd']
icolumns = [0, 2, 3]
print(df.loc[:, columns])          # All rows, columns a, c, d
print(df.iloc[:, icolumns])        # All rows, columns a, c, d
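The endpoint behavior above is the easiest thing to get wrong, so here is a small sketch on a deterministic frame (same column names as df, but with fixed values) showing that `iloc` excludes the slice endpoint while `loc` includes it.

```python
import numpy as np
import pandas as pd

# Small deterministic frame with the same column names as df above.
toy = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['a', 'b', 'c', 'd'])

by_position = toy.iloc[:, 0:2]  # integer slice: columns 0 and 1 only
by_label = toy.loc[:, 'a':'c']  # label slice: columns a, b, AND c

print(list(by_position.columns))  # ['a', 'b']
print(list(by_label.columns))     # ['a', 'b', 'c']
```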
#Here, we can remove specific columns from a dataframe using the drop method.
#df.drop([column name or list], inplace=True, axis=1)
df2.drop(['a', 'c'], inplace=True, axis=1)
print(df2)
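Note that without `inplace=True`, drop leaves the original dataframe untouched and returns a new one instead; a quick sketch of that difference:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.ones((3, 4)), columns=['a', 'b', 'c', 'd'])

# Without inplace=True, drop returns a new dataframe and leaves the original intact.
smaller = demo.drop(['a', 'c'], axis=1)

print(list(smaller.columns))  # ['b', 'd']
print(list(demo.columns))     # unchanged: ['a', 'b', 'c', 'd']
```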
Similarly, we might also want to select rows, and we can use much the same syntax.
print(df2)
## Selecting rows
print(df.loc[0:5, :])      # Select rows with labels 0-5 (label slices include the endpoint)
print(df.loc[0:5, ])       # Same selection; the column specifier is optional
print(df.loc[0:5])         # Same selection again
print(df.loc[[1, 2, 4]])   # Select rows 1, 2, and 4
#We can now generate a boolean vector based on a criterion and then use it for selection.
select = df['a'] >= 20
print(select)
print(df.loc[select])  # Notice that by passing only a boolean vector we select rows and keep all columns.
select2 = (df['a'] > 20) & (df['c'] < 30)  # More complex criteria
print(select2)
print(df.loc[select2])
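Because df above is random, the rows selected will vary from run to run; a sketch on a fixed frame makes the mask behavior predictable, and also shows `|` (or) alongside the `&` (and) used above.

```python
import pandas as pd

# Deterministic frame so the selection result is predictable.
scores = pd.DataFrame({'a': [10, 25, 30, 15], 'c': [5, 28, 35, 40]})

mask = (scores['a'] > 20) & (scores['c'] < 30)    # rows where both conditions hold
either = (scores['a'] > 20) | (scores['c'] < 30)  # rows where at least one holds

print(scores.loc[mask])          # only the row with a=25, c=28
print(len(scores.loc[either]))   # 3
```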
These toy examples are fine, but for this exercise we will use the Titanic dataset. We want to split it out in a few different ways.
1P. Subset only the survived, sex, and age columns and create a new dataframe with the results.
2P. Drop the ticket and the cabin columns and create a new dataframe with the results.
3P. Split the dataset into a train dataframe having the first 80% of the records and a test dataframe having the last 20%.
4P. Randomly sample the dataset so that there are 2 samples.
5P. Create one dataframe with all of the males and a second with all of the females.
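The kinds of splits asked for above can be sketched on a stand-in frame (the real Titanic column names like survived and sex would replace the toy ones here):

```python
import numpy as np
import pandas as pd

# Stand-in for the Titanic dataframe; the real column names would differ.
toy = pd.DataFrame(20 + np.random.randn(10, 4) * 5, columns=['a', 'b', 'c', 'd'])

# 80/20 split by position, as in 3P.
cut = int(len(toy) * 0.8)
train, test = toy.iloc[:cut], toy.iloc[cut:]

# Random sample without replacement, as in 4P (a sample of 2 rows).
sampled = toy.sample(n=2, random_state=0)

# Boolean split into two frames, as in 5P (here on a numeric criterion).
high, low = toy[toy['a'] >= 20], toy[toy['a'] < 20]

print(len(train), len(test), len(sampled), len(high) + len(low))  # 8 2 2 10
```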
url = 'https://raw.githubusercontent.com/RPI-Analytics/MGMT6963-2015/master/data/titantic_train.csv'
cr = pd.read_csv(url)  # read_csv can fetch a URL directly
cr
In R we used rbind for data that had the same structure but new rows. Here we can do the same with the concat method. You can read more about concat here. If a key is not specified, concat will use the default index of each dataframe; as the documentation shows, it is also possible to specify a key.
dfbyrow = pd.concat([df, df3])  # This is equivalent to a rowbind in R.
print(dfbyrow)
dfbyrow = pd.concat([dfbyrow, df3])  # DataFrame.append was removed in pandas 2.0; use concat and reassign.
print(dfbyrow)
# Note how ignore_index=True resets and renumbers the index, while in the earlier examples the index was kept.
addition = pd.concat([df, df], ignore_index=True)
print(addition)
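The note above mentioned that concat can also take a key for each piece; a sketch of what the keys argument produces (a MultiIndex that records which dataframe each row came from):

```python
import pandas as pd

first = pd.DataFrame({'a': [1, 2]})
second = pd.DataFrame({'a': [3, 4]})

# keys labels each piece, producing a MultiIndex on the rows.
stacked = pd.concat([first, second], keys=['first', 'second'])

print(stacked.loc['second'])  # just the rows that came from `second`
print(stacked.index.get_level_values(0).unique().tolist())  # ['first', 'second']
```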
#Merging additional columns also uses the concat function, with axis=1.
dfbycolumns = pd.concat([df, df4], axis=1, join='inner')
print(dfbycolumns)
#Here we generate a small dataframe to use in merging so you can see the difference between specifying inner and outer.
shortdf = df[0:5]
dfbycolumns = pd.concat([df, shortdf], axis=1, join='inner')
print(dfbycolumns)
#Here, outer keeps the union of the indexes (all of df's rows), filling shortdf's missing
#values with NaN, which for this dataset is the equivalent of a left outer join.
shortdf = df[0:5]
dfbycolumns = pd.concat([df, shortdf], axis=1, join='outer')
print(dfbycolumns)
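The inner/outer difference is easier to see with fixed row counts; a sketch where the short frame has 5 of the full frame's 10 rows:

```python
import numpy as np
import pandas as pd

full = pd.DataFrame(np.arange(20).reshape(10, 2), columns=['a', 'b'])
short = full[0:5]

inner = pd.concat([full, short], axis=1, join='inner')  # intersection of the indexes
outer = pd.concat([full, short], axis=1, join='outer')  # union; missing cells become NaN

print(len(inner), len(outer))      # 5 10
print(outer.iloc[9].isna().sum())  # 2: the last row has NaN in both of short's columns
```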
url = 'https://raw.githubusercontent.com/RPI-Analytics/MGMT6963-2015/master/data/titantic_train.csv'
titantic = pd.read_csv(url)  # read_csv can fetch a URL directly
titantic
6P. You will find another dataset on github that contains additional rows of individuals here. (In reality they are duplicates, but let's say we didn't know that.) Provide the code to first import this dataset and then combine it with the original titantic dataframe into a new one called titantic_
7P. You will find another dataset with additional variables that may help in the analysis here. Note: this is a little trickier, as there currently isn't a key in the original dataset. As a result, do the following substeps: (a) Provide code to verify the number of records in the original and new datasets. (b) Generate a key for the original dataset in the pattern of the new dataset. (c) Provide code to merge the two datasets into titanticcats.
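Since the extra-variables file isn't shown here, a sketch of the substeps on two hypothetical stand-in frames (the column names and 1-based id pattern are assumptions, not the real dataset's):

```python
import pandas as pd

# Hypothetical stand-ins: `orig` lacks a key, `extra` carries an id-style key.
orig = pd.DataFrame({'survived': [0, 1, 1], 'sex': ['male', 'female', 'female']})
extra = pd.DataFrame({'id': [1, 2, 3], 'fare_bucket': ['low', 'high', 'mid']})

# (a) Verify the record counts match before merging.
print(len(orig), len(extra))  # 3 3

# (b) Generate a key in orig following the pattern of the new dataset (1-based ids).
orig['id'] = range(1, len(orig) + 1)

# (c) Merge the two datasets on the shared key.
merged = orig.merge(extra, on='id')
print(len(merged), list(merged.columns))
```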