Original source: https://chrisalbon.com/python/data_wrangling/pandas_apply_operations_to_dataframes/
Applying Operations Over pandas Dataframes
20 Dec 2017
Import Modules
import pandas as pd
import numpy as np
Create a dataframe
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3],
        'coverage': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
df
| | coverage | name | reports | year |
|---|---|---|---|---|
| Cochice | 25 | Jason | 4 | 2012 |
| Pima | 94 | Molly | 24 | 2012 |
| Santa Cruz | 57 | Tina | 31 | 2013 |
| Maricopa | 62 | Jake | 2 | 2014 |
| Yuma | 70 | Amy | 3 | 2014 |
Create a capitalization lambda function
capitalizer = lambda x: x.upper()
Apply the capitalizer function over the column ‘name’
apply() can apply a function along either axis of a dataframe. Called on a single column (a series), as here, it runs the function once per value.
df['name'].apply(capitalizer)
Cochice JASON
Pima MOLLY
Santa Cruz TINA
Maricopa JAKE
Yuma AMY
Name: name, dtype: object
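The example above only calls apply() on one column, so the "along any axis" part is not visible. A minimal sketch of both axes follows, assuming a small numeric frame named `scores` (a hypothetical name, reusing the article's columns and index labels):

```python
import pandas as pd

# Hypothetical numeric-only frame built from the article's columns
scores = pd.DataFrame(
    {'reports': [4, 24, 31], 'coverage': [25, 94, 57]},
    index=['Cochice', 'Pima', 'Santa Cruz'],
)

# axis=0 (the default): the function receives each column as a Series
column_max = scores.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives each row as a Series
row_sum = scores.apply(lambda row: row['reports'] + row['coverage'], axis=1)

print(column_max)
print(row_sum)
```

With axis=0 the result is indexed by column name, and with axis=1 it is indexed by the row labels.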
Map the capitalizer lambda function over each element in the series ‘name’
map() applies an operation over each element of a series
df['name'].map(capitalizer)
Cochice JASON
Pima MOLLY
Santa Cruz TINA
Maricopa JAKE
Yuma AMY
Name: name, dtype: object
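map() also accepts a dictionary instead of a function, which can be handy for lookups. The sketch below assumes a hypothetical `initials` lookup table that is not part of the original example. Values missing from the dictionary come back as NaN.

```python
import pandas as pd

names = pd.Series(['Jason', 'Molly', 'Tina'],
                  index=['Cochice', 'Pima', 'Santa Cruz'])

# Hypothetical lookup table; 'Tina' is deliberately left out
initials = {'Jason': 'J', 'Molly': 'M'}

# Each value is replaced by its dictionary entry; unmatched values become NaN
mapped = names.map(initials)

print(mapped)
```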
Apply a square root function to every single cell in the whole data frame
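applymap() applies a function to every single element in the entire dataframe. In pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map().
# Drop the string column, because np.sqrt cannot be applied to strings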
df = df.drop('name', axis=1)
# Return the square root of every cell in the dataframe
df.applymap(np.sqrt)
| | coverage | reports | year |
|---|---|---|---|
| Cochice | 5.000000 | 2.000000 | 44.855323 |
| Pima | 9.695360 | 4.898979 | 44.855323 |
| Santa Cruz | 7.549834 | 5.567764 | 44.866469 |
| Maricopa | 7.874008 | 1.414214 | 44.877611 |
| Yuma | 8.366600 | 1.732051 | 44.877611 |
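For NumPy functions such as np.sqrt, an element-wise call is not strictly needed, because the function can take the whole dataframe at once. The sketch below compares the two approaches on a hypothetical two-row frame named `numeric`. The `hasattr` check is only there so the example runs on both older and newer pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame using two of the article's columns
numeric = pd.DataFrame(
    {'coverage': [25, 94], 'reports': [4, 24]},
    index=['Cochice', 'Pima'],
)

# Element-wise call: DataFrame.map on pandas 2.1+, applymap on older versions
if hasattr(pd.DataFrame, 'map'):
    elementwise = numeric.map(np.sqrt)
else:
    elementwise = numeric.applymap(np.sqrt)

# Vectorized call: np.sqrt accepts the whole frame and returns a DataFrame
vectorized = np.sqrt(numeric)

print(vectorized)
```

Both results hold the same values. The vectorized form is usually the faster of the two on large frames.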
Applying A Function Over A Dataframe
Create a function that multiplies all non-strings by 100
# create a function called times100
def times100(x):
    # that, if x is a string,
    if isinstance(x, str):
        # just returns it untouched
        return x
    # and, if x is missing (None or NaN), leaves it as it is
    elif pd.isna(x):
        return x
    # but otherwise returns it multiplied by 100 (including 0)
    else:
        return 100 * x
Apply times100 to every cell in the dataframe
df.applymap(times100)
| | coverage | reports | year |
|---|---|---|---|
| Cochice | 2500 | 400 | 201200 |
| Pima | 9400 | 2400 | 201200 |
| Santa Cruz | 5700 | 3100 | 201300 |
| Maricopa | 6200 | 200 | 201400 |
| Yuma | 7000 | 300 | 201400 |
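Since the goal is to scale numbers and skip strings, the same result can also be reached without an element-wise function. This is a minimal sketch of that alternative, not the article's method. It assumes a hypothetical frame named `mixed` that still contains the string column `name`.

```python
import pandas as pd

# Hypothetical frame that keeps the string column alongside a numeric one
mixed = pd.DataFrame(
    {'name': ['Jason', 'Molly'], 'reports': [4, 0]},
    index=['Cochice', 'Pima'],
)

# Find the numeric columns, then multiply only those by 100
numeric_cols = mixed.select_dtypes(include='number').columns
scaled = mixed.copy()
scaled[numeric_cols] = scaled[numeric_cols] * 100

print(scaled)
```

The string column is left untouched and a value of 0 stays 0.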