zoukankan      html  css  js  c++  java
  • python学习笔记(四):pandas基础

    pandas 基础

    serise

    import pandas as pd
    from pandas import Series, DataFrame
    obj = Series([4, -7, 5, 3])
    obj
    
    0    4
    1   -7
    2    5
    3    3
    dtype: int64
    
    obj.values
    
    array([ 4, -7,  5,  3], dtype=int64)
    
    obj.index
    
    RangeIndex(start=0, stop=4, step=1)
    
    obj[[1,3]]
    # 跳着选取数据
    
    1   -7
    3    3
    dtype: int64
    
    obj[1:3]
    
    1   -7
    2    5
    dtype: int64
    
    pd.isnull(obj)
    
    0    False
    1    False
    2    False
    3    False
    dtype: bool
    
    • reindex可以用来插值
    obj.reindex(range(5), method = 'ffill')
    
    0    4
    1   -7
    2    5
    3    3
    4    3
    dtype: int64
    
    • 标签切片是闭区间的

    dataframe

    data = {'state': ['asd','qwe','sdf','ert'],
           'year': [2000, 2001, 2002, 2003],
           'pop': [1.5,1.7,3.6,2.4]}
    data = DataFrame(data)
    data
    
    pop state year
    0 1.5 asd 2000
    1 1.7 qwe 2001
    2 3.6 sdf 2002
    3 2.4 ert 2003
    data.year
    # 比r里提取列要方便点
    
    0    2000
    1    2001
    2    2002
    3    2003
    Name: year, dtype: int64
    
    data['debt'] = range(4)
    data
    
    pop state year debt
    0 1.5 asd 2000 0
    1 1.7 qwe 2001 1
    2 3.6 sdf 2002 2
    3 2.4 ert 2003 3
    • index是不能修改的
    a = data.index
    a[1] = 6
    
    ---------------------------------------------------------------------------
    
    TypeError                                 Traceback (most recent call last)
    
    <ipython-input-9-57677294f950> in <module>()
          1 a = data.index
    ----> 2 a[1] = 6
    
    
    F:Anacondalibsite-packagespandascoreindexesase.py in __setitem__(self, key, value)
       1668 
       1669     def __setitem__(self, key, value):
    -> 1670         raise TypeError("Index does not support mutable operations")
       1671 
       1672     def __getitem__(self, key):
    
    
    TypeError: Index does not support mutable operations
    
    data.columns
    
    Index(['pop', 'state', 'year', 'debt'], dtype='object')
    
    • .ix标签索引功能,输入行和列
    • 不加.ix只能选取其中的某列或某行,不能列与行同时选取
    data[:3]
    
    pop state year debt
    0 1.5 asd 2000 0
    1 1.7 qwe 2001 1
    2 3.6 sdf 2002 2
    data.ix[:,:3]
    
    pop state year
    0 1.5 asd 2000
    1 1.7 qwe 2001
    2 3.6 sdf 2002
    3 2.4 ert 2003
    • 删除某列用drop,axis = 0表示行,1表示列
    • 删除后原数据不变
    data.drop(0,axis=0)
    
    pop state year debt
    1 1.7 qwe 2001 1
    2 3.6 sdf 2002 2
    3 2.4 ert 2003 3
    data.drop('year', axis=1)
    
    pop state debt
    0 1.5 asd 0
    1 1.7 qwe 1
    2 3.6 sdf 2
    3 2.4 ert 3
    data
    
    pop state year debt
    0 1.5 asd 2000 0
    1 1.7 qwe 2001 1
    2 3.6 sdf 2002 2
    3 2.4 ert 2003 3
    import numpy as np
    df = DataFrame(np.arange(9).reshape(3, 3))
    df
    
    0 1 2
    0 0 1 2
    1 3 4 5
    2 6 7 8
    • applymap()可以对dataframe每一个元素运用函数
    • apply()可以对每一维数组运用函数
    df.applymap(lambda x: '%.2f' % x)
    
    0 1 2
    0 0.00 1.00 2.00
    1 3.00 4.00 5.00
    2 6.00 7.00 8.00
    data.sort_values(by='pop')
    # 对某一列排序
    
    pop state year debt
    0 1.5 asd 2000 0
    1 1.7 qwe 2001 1
    3 2.4 ert 2003 3
    2 3.6 sdf 2002 2
    data.describe()
    
    pop year debt
    count 4.000000 4.000000 4.000000
    mean 2.300000 2001.500000 1.500000
    std 0.948683 1.290994 1.290994
    min 1.500000 2000.000000 0.000000
    25% 1.650000 2000.750000 0.750000
    50% 2.050000 2001.500000 1.500000
    75% 2.700000 2002.250000 2.250000
    max 3.600000 2003.000000 3.000000
    df.isin([1])
    
    0 1 2
    0 False True False
    1 False False False
    2 False False False
    • None、NaN会被当作NA处理
    • df.shape不加括号相当于dim()
    df.shape
    
    (3, 3)
    
    • dropna删除缺失值
    df.ix[:1, :1] = None
    df
    
    0 1 2
    0 NaN NaN 2
    1 NaN NaN 5
    2 6.0 7.0 8
    • 填充缺失值可以调用字典,不同行添加不同值
    df.fillna({0:11, 1:22})
    
    0 1 2
    0 11.0 22.0 2
    1 11.0 22.0 5
    2 6.0 7.0 8
    df
    
    0 1 2
    0 NaN NaN 2
    1 NaN NaN 5
    2 6.0 7.0 8
    df.fillna({0:11, 1:22}, inplace=True)
    
    0 1 2
    0 11.0 22.0 2
    1 11.0 22.0 5
    2 6.0 7.0 8
    df
    
    0 1 2
    0 11.0 22.0 2
    1 11.0 22.0 5
    2 6.0 7.0 8
    • inplace修改对象不产生副本
  • 相关阅读:
    LeetCode449. 序列化和反序列化二叉搜索树
    LeetCode448. 找到所有数组中消失的数字
    一行代码如何隐藏 Linux 进程?
    C语言这么厉害,它自身又是用什么语言写的?
    了解C语言,是否代表了解C ++的一半?
    C语言小白那些不知道的事儿
    6 条 Git 实用技巧
    干货来袭,收藏方便找到该网站
    零基础小白如何入门Shell,快来看看(收藏)这篇大总结!!
    Java用于嵌入式系统的优点和局限
  • 原文地址:https://www.cnblogs.com/xihehe/p/9026860.html
Copyright © 2011-2022 走看看