  • Python data cleaning (using pandas)

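    Given the sample data (the first table in the results below), fill the missing values, split the food names, and remove duplicate rows: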

    import pandas as pd

    # Raw string so backslashes in the Windows path are not read as escapes (e.g. "\f")
    df = pd.read_excel(r"F:\python入门\数据1\food.xlsx")
    print('Original data:\n', df)
    # Fill missing values with the column mean
    df['ounces'] = df['ounces'].fillna(df['ounces'].mean())
    print('Data after filling with the mean:\n', df)
    # Split the food column into two columns
    df[['first_name', 'last_name']] = df['food'].str.split(expand=True)
    df.drop('food', axis=1, inplace=True)
    print('Data after splitting the food name:\n', df)
    # Remove duplicate rows
    df.drop_duplicates(['first_name', 'last_name'], inplace=True)
    print('Data after removing duplicates:\n', df)
    # df.to_excel(r"F:\python入门\数据1\food_new.xlsx")
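
    For readers who do not have the Excel file, the following is a minimal sketch that rebuilds the sample data inline (values copied from the original-data table in the results) and applies the same three steps. The helper name `clean_food` is hypothetical, and `n=1` in the split is an added guard, not part of the original code, so that a name with more than two words would still yield exactly two columns.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed output in this post (not read from Excel).
df = pd.DataFrame({
    "food": ["bacon", "pulled pork", "bacon", "Pastrami", "corned beef",
             "Bacon", "pastrami", "honey ham", "nova lox"],
    "ounces": [4.0, 3.0, np.nan, 6.0, 7.5, 8.0, -3.0, 5.0, 6.0],
    "animal": ["pig", "pig", "pig", "cow", "cow", "pig", "cow", "pig", "salmon"],
})


def clean_food(frame):
    """Hypothetical helper: fill, split, and de-duplicate without mutating the input."""
    out = frame.copy()
    # Fill missing ounces with the column mean (NaN is skipped when averaging).
    out["ounces"] = out["ounces"].fillna(out["ounces"].mean())
    # Split on whitespace into at most two parts; one-word names get a missing last_name.
    out[["first_name", "last_name"]] = out["food"].str.split(n=1, expand=True)
    out = out.drop(columns="food")
    # Keep the first occurrence of each (first_name, last_name) pair.
    return out.drop_duplicates(["first_name", "last_name"])


cleaned = clean_food(df)
print(cleaned)
```

    Returning a new frame rather than using `inplace=True` leaves `df` untouched, which makes it easier to compare the data before and after cleaning.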

    Result:

    Original data:
               food  ounces  animal
    0        bacon     4.0     pig
    1  pulled pork     3.0     pig
    2        bacon     NaN     pig
    3     Pastrami     6.0     cow
    4  corned beef     7.5     cow
    5        Bacon     8.0     pig
    6     pastrami    -3.0     cow
    7    honey ham     5.0     pig
    8     nova lox     6.0  salmon
    Data after filling with the mean:
               food  ounces  animal
    0        bacon  4.0000     pig
    1  pulled pork  3.0000     pig
    2        bacon  4.5625     pig
    3     Pastrami  6.0000     cow
    4  corned beef  7.5000     cow
    5        Bacon  8.0000     pig
    6     pastrami -3.0000     cow
    7    honey ham  5.0000     pig
    8     nova lox  6.0000  salmon
    Data after splitting the food name:
        ounces  animal first_name last_name
    0  4.0000     pig      bacon      None
    1  3.0000     pig     pulled      pork
    2  4.5625     pig      bacon      None
    3  6.0000     cow   Pastrami      None
    4  7.5000     cow     corned      beef
    5  8.0000     pig      Bacon      None
    6 -3.0000     cow   pastrami      None
    7  5.0000     pig      honey       ham
    8  6.0000  salmon       nova       lox
    Data after removing duplicates:
        ounces  animal first_name last_name
    0     4.0     pig      bacon      None
    1     3.0     pig     pulled      pork
    3     6.0     cow   Pastrami      None
    4     7.5     cow     corned      beef
    5     8.0     pig      Bacon      None
    6    -3.0     cow   pastrami      None
    7     5.0     pig      honey       ham
    8     6.0  salmon       nova       lox
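
    The final table still contains both `bacon` and `Bacon`, and both `Pastrami` and `pastrami`, because `drop_duplicates` compares strings exactly. If those should count as the same food (an assumption; the original post does not say), one possible approach is to flag duplicates on a lowercased copy of the name columns. A minimal sketch, with the names re-entered by hand from the split table above:

```python
import pandas as pd

# First/last names as they appear after the split step in this post.
names = pd.DataFrame({
    "first_name": ["bacon", "pulled", "bacon", "Pastrami", "corned",
                   "Bacon", "pastrami", "honey", "nova"],
    "last_name": [None, "pork", None, None, "beef", None, None, "ham", "lox"],
})

# Exact comparison: missing last names are treated as equal to each other.
exact_dupes = names.duplicated(["first_name", "last_name"])

# Case-insensitive comparison (assumed requirement): lowercase a copy used only as the key.
key = names.apply(lambda col: col.str.lower())
loose_dupes = key.duplicated()

print(names[~exact_dupes].index.tolist())
print(names[~loose_dupes].index.tolist())
```

    With this data the exact check flags only row 2, while the case-insensitive key also flags rows 5 and 6. The original spelling is kept in `names`; only the temporary `key` frame is lowercased.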

  • Original post: https://www.cnblogs.com/xiao02fang/p/13451507.html