
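Pandas Tutorial Series (4): Adding New Data Columns in Pandas

Adding New Data Columns in Pandas

When analyzing data, you often need to create new columns based on certain conditions and then carry out further analysis on them. There are four common ways to do this: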

Direct assignment
The df.apply method
The df.assign method
Selecting groups by condition and assigning values to each group

1. Read the CSV data into a DataFrame

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
print(df.head())
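
The CSV file itself is not included with this post. To follow along without it, one option is a small stand-in like the sketch below. It assumes only the three columns that the later examples use (ymd, bWendu, yWendu); the rows are made-up values, and the real file may well contain more columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for beijing_tianqi_2018.csv: only the three columns
# used later in this tutorial, filled with made-up values.
csv_text = """ymd,bWendu,yWendu
2018-01-01,3℃,-6℃
2018-01-02,2℃,-5℃
2018-07-01,35℃,22℃
2018-12-30,-2℃,-11℃
"""

# read_csv accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.dtypes)  # the temperature columns are still text at this point
```

Because of the ℃ suffix, the temperature columns are read in as text, which is why the next step cleans them before doing any arithmetic.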

2. Direct assignment

Example: clean the temperature columns and turn them into numeric columns

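# Use the date as the index so rows are easy to filter by date
df.set_index('ymd', inplace=True)
# Strip the ℃ suffix and convert to int32.
# Plain df[col] = ... replaces the whole column, so the new dtype is kept;
# in recent pandas versions df.loc[:, col] = ... writes into the existing
# column and may leave it as object dtype.
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')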

Example: compute the temperature difference

# Note: df['bWendu'] is a Series, so the subtraction also returns a Series
df.loc[:, 'wencha'] = df['bWendu'] - df['yWendu']

Full code:

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

# Strip the ℃ suffix and convert to int32 (modifies existing columns).
# df[col] = ... replaces the column, so the int32 dtype is kept.
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

print(df.head())
print('*' * 50, '\n')

# Compute the temperature difference (adds a new column)
# Note: df['bWendu'] is a Series, so the subtraction also returns a Series
df.loc[:, 'wencha'] = df['bWendu'] - df['yWendu']
print(df.head())
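
To see direct assignment work without the CSV file, here is a minimal sketch on a tiny hand-made DataFrame. The three rows are made-up values that only imitate the file's format; the column names follow the tutorial.

```python
import pandas as pd

# Made-up sample rows that imitate the format of the weather file.
df = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-07-01", "2018-12-30"],
    "bWendu": ["3℃", "35℃", "-2℃"],
    "yWendu": ["-6℃", "22℃", "-11℃"],
})

# Overwrite each text column with its cleaned, integer version.
for col in ["bWendu", "yWendu"]:
    df[col] = df[col].str.replace("℃", "").astype("int32")

# Series minus Series is computed row by row and returns a new Series,
# which is then stored as a new column.
df["wencha"] = df["bWendu"] - df["yWendu"]

print(df.dtypes)
print(df)
```

Checking df.dtypes after the cleanup is a quick way to confirm the columns really became numeric before computing with them.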

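3. The df.apply method

Example: add a temperature-type column

If the daily high (bWendu) is above 33 degrees, the day counts as high temperature
If the daily low (yWendu) is below -10 degrees, it counts as low temperature
Otherwise it counts as normal temperature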

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

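# Strip the ℃ suffix and convert to int32 (modifies existing columns).
# df[col] = ... replaces the column, so the int32 dtype is kept.
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

print(df.head())
print('*' * 50, '\n')


def get_wendu_type(x):
    # x is one row of the DataFrame
    if x['bWendu'] > 33:
        return "high temperature"
    elif x['yWendu'] < -10:
        return "low temperature"
    else:
        return "normal temperature"


# Note: axis=1 is required; each row is then passed in as a Series
# whose index is the column names
df.loc[:, 'wendu_type'] = df.apply(get_wendu_type, axis=1)
# Print the first few rows
print(df.head())
print('*' * 50, '\n')
# Count how many days fall into each temperature type
print(df['wendu_type'].value_counts())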
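
What axis=1 does is easier to see on a few rows. The sketch below uses made-up, already-cleaned temperatures; describe_row is a hypothetical helper added only to show what each call receives, and the English labels are illustrative.

```python
import pandas as pd

# Made-up, already-cleaned temperatures (daily high and daily low).
df = pd.DataFrame(
    {"bWendu": [3, 35, -2], "yWendu": [-6, 22, -11]},
    index=["2018-01-01", "2018-07-01", "2018-12-30"],
)


def describe_row(row):
    # With axis=1, `row` is one row of df as a Series,
    # and that Series is indexed by the column names.
    return ", ".join(row.index)


row_index_text = df.apply(describe_row, axis=1).iloc[0]
print(row_index_text)


def get_wendu_type(row):
    if row["bWendu"] > 33:
        return "high temperature"
    elif row["yWendu"] < -10:
        return "low temperature"
    return "normal temperature"


# One label per row, returned as a Series aligned with df's index.
df["wendu_type"] = df.apply(get_wendu_type, axis=1)
print(df)
```

With the default axis=0 the function would receive whole columns instead, and a lookup like row["bWendu"] would fail because a column Series is indexed by the row labels.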

4. The df.assign method

Example: convert the temperatures from Celsius to Fahrenheit

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

# Strip the ℃ suffix and convert to int32 (modifies existing columns).
# df[col] = ... replaces the column, so the int32 dtype is kept.
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

print(df.head())
print('*' * 50, '\n')

# assign returns a new DataFrame with the extra columns; df is not modified
df_huashi = df.assign(
    yWendu_huashi=lambda x: x['yWendu'] * 9 / 5 + 32,
    bWendu_huashi=lambda x: x['bWendu'] * 9 / 5 + 32
)

print(df_huashi.head())
print('*' * 50, '\n')
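
Unlike the other approaches, df.assign does not change the DataFrame it is called on; it returns a new one. A minimal sketch with made-up, already-cleaned temperatures shows the difference:

```python
import pandas as pd

# Made-up, already-cleaned temperatures in Celsius.
df = pd.DataFrame({"bWendu": [3, 35, -2], "yWendu": [-6, 22, -11]})

# Each keyword becomes a new column name; each lambda receives the
# DataFrame and returns the values for that column.
df_huashi = df.assign(
    bWendu_huashi=lambda x: x["bWendu"] * 9 / 5 + 32,
    yWendu_huashi=lambda x: x["yWendu"] * 9 / 5 + 32,
)

print(list(df.columns))         # the original keeps its two columns
print(list(df_huashi.columns))  # the copy has the two extra columns
print(df_huashi)
```

Because it returns a new object, assign fits well in method chains where you do not want to modify the source data.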

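5. Select groups by condition and assign values to each

First select the rows that match a condition, then assign the new column's value for just those rows

Example: if the gap between the daily high and the daily low is more than 10 degrees, treat the temperature difference as large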

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

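# Strip the ℃ suffix and convert to int32 (modifies existing columns).
# df[col] = ... replaces the column, so the int32 dtype is kept.
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

# Print the first few rows
print(df.head())
print('*' * 50, '\n')

# Create an empty column first (this is direct assignment, the first method above)
df['wencha_type'] = ""

# Fill in the rows that match each condition
df.loc[df['bWendu'] - df['yWendu'] > 10, 'wencha_type'] = "large difference"
df.loc[df['bWendu'] - df['yWendu'] <= 10, 'wencha_type'] = "normal difference"

# Print the first few rows
print(df.head())
print('*' * 50, '\n')

# Count how many days fall into each difference type
print(df['wencha_type'].value_counts())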
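
The two conditions are complements of each other, so the comparison can be computed once and reused. The sketch below does this on made-up, already-cleaned temperatures; the variable name is_large and the English labels are illustrative choices.

```python
import pandas as pd

# Made-up, already-cleaned temperatures (daily high and daily low).
df = pd.DataFrame({"bWendu": [3, 35, -2, 10], "yWendu": [-6, 22, -11, 0]})

# A boolean Series aligned with df's rows: True where the gap exceeds 10.
is_large = (df["bWendu"] - df["yWendu"]) > 10

df["wencha_type"] = ""  # start with an empty column
df.loc[is_large, "wencha_type"] = "large difference"
df.loc[~is_large, "wencha_type"] = "normal difference"  # ~ negates the mask

print(df)
print(df["wencha_type"].value_counts())
```

Note that the last row has a gap of exactly 10, which is not greater than 10, so it is labeled as a normal difference.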


Original article: https://www.cnblogs.com/xingxingnbsp/p/13851672.html