  • Handling NaN values in data with pandas

    Use the dropna() function to drop rows or columns that contain NaN

    import pandas as pd
    import numpy as np
    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6,4)), index=dates, columns=['A', 'B', 'C', 'D'])
    # Put a NaN in row 0, column B and in row 1, column C
    df.iloc[0,1]=np.nan
    df.iloc[1,2]=np.nan
    print(df)
    # axis=0 drops rows; how='any' drops a row if it contains at least one NaN
    print(df.dropna(axis=0,how='any'))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
                 A     B     C   D
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
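The call above uses only `axis=0` and `how='any'`. The sketch below shows a few other `dropna()` variants on the same data. It is an illustration rather than part of the original post, and it assumes the frame is built as float from the start (the original builds it from integers), so that assigning NaN does not change any column's dtype.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
# Assumption: float data from the start, unlike the integer data in the post.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# axis=1 drops columns instead of rows: B and C each hold one NaN.
cols_dropped = df.dropna(axis=1, how='any')

# how='all' drops a row only when every value in it is NaN; no row qualifies.
rows_all = df.dropna(axis=0, how='all')

# subset restricts the NaN check to the listed columns.
rows_subset = df.dropna(subset=['B'])

print(cols_dropped.columns.tolist())
print(rows_all.shape)
print(rows_subset.shape)
```

Like the original call, each variant returns a new DataFrame and leaves `df` unchanged.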

    Use the fillna() function to replace NaN values

    import pandas as pd
    import numpy as np
    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6,4)), index=dates, columns=['A', 'B', 'C', 'D'])
    df.iloc[0,1]=np.nan
    df.iloc[1,2]=np.nan
    print(df)
    # Replace every NaN value with 0
    print(df.fillna(value=0))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
                 A     B     C   D
    2018-03-10   0   0.0   2.0   3
    2018-03-11   4   5.0   0.0   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
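A single constant is not always the right replacement. As a hedged sketch that is not part of the original post, `fillna()` also accepts a dict that maps column names to fill values, and `ffill()` carries the previous row's value forward. The fill values chosen here (`-1` and the column mean) are arbitrary illustrations, and the frame is again assumed to be float from the start.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
# Assumption: float data from the start, unlike the integer data in the post.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# A dict gives each column its own fill value; mean() skips NaN by default.
filled_per_column = df.fillna(value={'B': -1, 'C': df['C'].mean()})

# ffill() copies the last valid value downward within each column.
# The NaN in the first row of B has nothing above it, so it stays NaN.
filled_forward = df.ffill()

print(filled_per_column)
print(filled_forward)
```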

    Use the isnull() function to check whether data is missing

    import pandas as pd
    import numpy as np
    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6,4)), index=dates, columns=['A', 'B', 'C', 'D'])
    df.iloc[0,1]=np.nan
    df.iloc[1,2]=np.nan
    print(df)
    # Returns a Boolean frame: True where the value is NaN, False otherwise
    print(pd.isnull(df))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
                    A      B      C      D
    2018-03-10  False   True  False  False
    2018-03-11  False  False   True  False
    2018-03-12  False  False  False  False
    2018-03-13  False  False  False  False
    2018-03-14  False  False  False  False
    2018-03-15  False  False  False  False
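The Boolean frame is mostly useful as input to further steps. The following sketch, which is not part of the original post, counts the NaN values per column and selects the rows that contain at least one NaN. The variable names are illustrative, and the frame is assumed to be float from the start.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
# Assumption: float data from the start, unlike the integer data in the post.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

mask = df.isnull()

# True counts as 1 when summed, so this gives the number of NaN per column.
nan_per_column = mask.sum()

# Summing the per-column counts gives the total for the whole frame.
nan_total = int(nan_per_column.sum())

# any(axis=1) marks rows with at least one NaN; use it as a row filter.
rows_with_nan = df[mask.any(axis=1)]

print(nan_per_column)
print(nan_total)
print(rows_with_nan)
```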

    Check whether the data contains any NaN values

    import pandas as pd
    import numpy as np
    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6,4)), index=dates, columns=['A', 'B', 'C', 'D'])
    df.iloc[0,1]=np.nan
    df.iloc[1,2]=np.nan
    print(df)
    # Check whether the data contains at least one NaN value
    print(np.any(df.isnull()))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
    True
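The same check can be written with pandas alone. This is a minimal sketch and not the original author's code: `.values` exposes the Boolean mask as a NumPy array, whose `any()` reduces over every element, while `any()` on the frame itself reports the result per column. The frame is assumed to be float from the start.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
# Assumption: float data from the start, unlike the integer data in the post.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# One Boolean for the whole frame.
has_nan = bool(df.isnull().values.any())

# One Boolean per column: which columns contain a NaN?
nan_by_column = df.isnull().any()

# After filling, the same check reports that nothing is missing.
has_nan_after_fill = bool(df.fillna(0).isnull().values.any())

print(has_nan)
print(nan_by_column)
print(has_nan_after_fill)
```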

  • Original article: https://www.cnblogs.com/sea-stream/p/10319470.html