  • Python: pandas Series and DataFrame

    1. Series: a Series is a one-dimensional labeled array.

    By default, pandas uses integers starting from 0 as the Series index.

    >>> import pandas as pd
    >>> test = pd.Series(['num0','num1','num2','num3'])
    >>> test
    0    num0
    1    num1
    2    num2
    3    num3
    dtype: object

    You can also specify the index yourself.

    >>> test = pd.Series(['num0','num1','num2','num3'],index=['A','B','C','D'])
    >>> test
    A    num0
    B    num1
    C    num2
    D    num3
    dtype: object
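
Once a Series has custom labels, its values can be read either by label or by integer position. The following is a minimal sketch of both, reusing the Series above; the variable names `by_label`, `by_position` and `subset` are illustrative and do not come from the original post.

```python
import pandas as pd

# Same data as the example above, with custom string labels.
test = pd.Series(['num0', 'num1', 'num2', 'num3'], index=['A', 'B', 'C', 'D'])

by_label = test.loc['B']       # look up by index label
by_position = test.iloc[1]     # look up by integer position
subset = test.loc[['A', 'D']]  # a list of labels returns a new Series

print(by_label, by_position)
print(subset)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity about whether a key is meant as a label or as a position.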

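    A Series can also be constructed from a dictionary: the keys become the index, and None becomes NaN. (The output below was recorded with an older pandas, which sorted the keys; pandas 0.23 and later on Python 3.6+ keep the dictionary's insertion order.)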

    >>> cities = {'beijing':55000,'shanghai':60000,'shenzhen':20000,'guangzhou':25000,'suzhou':None}
    >>> test = pd.Series(cities)
    >>> test
    beijing      55000.0
    guangzhou    25000.0
    shanghai     60000.0
    shenzhen     20000.0
    suzhou           NaN
    dtype: float64
    >>> print(type(test))
    <class 'pandas.core.series.Series'>
    >>> test['beijing']
    55000.0
    >>> test[['beijing','shanghai','shenzhen']]
    beijing     55000.0
    shanghai    60000.0
    shenzhen    20000.0
    dtype: float64
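
The `suzhou` entry above is NaN because its dictionary value was `None`. As a small sketch of how such missing values are commonly handled, the code below uses `isna`, `dropna` and `sum`; the variable names `missing`, `cleaned` and `total` are illustrative.

```python
import pandas as pd

cities = {'beijing': 55000, 'shanghai': 60000, 'shenzhen': 20000,
          'guangzhou': 25000, 'suzhou': None}
test = pd.Series(cities)

missing = test.isna()    # boolean Series, True where the value is missing
cleaned = test.dropna()  # new Series without the missing entry
total = test.sum()       # sum() skips missing values by default

print(missing)
print(cleaned)
print(total)
```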

    2. DataFrame: a DataFrame is a two-dimensional labeled table. It can be constructed from a dictionary.

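    Creating a DataFrame (the key and column order in the output below reflects an older Python and pandas, which did not preserve dictionary insertion order)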

    >>> data = {'city':['beijing','shanghai','guangzhou','shenzhen','hangzhou','chognqing'],'years':[2010,2011,2012,2013,2014,2015],'population':[2100,2300,2400,2500,2600,2600]}
    >>> print(data)
    {'city': ['beijing', 'shanghai', 'guangzhou', 'shenzhen', 'hangzhou', 'chognqing'], 'population': [2100, 2300, 2400, 2500, 2600, 2600], 'years': [2010, 2011, 2012, 2013, 2014, 2015]}
    >>> pd.DataFrame(data)
            city  population  years
    0    beijing        2100   2010
    1   shanghai        2300   2011
    2  guangzhou        2400   2012
    3   shenzhen        2500   2013
    4   hangzhou        2600   2014
    5  chognqing        2600   2015
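
After building a DataFrame, it is often useful to check its size and column types before working with it. The following is a small sketch using the same data; the name `df` is illustrative.

```python
import pandas as pd

data = {
    'city': ['beijing', 'shanghai', 'guangzhou', 'shenzhen', 'hangzhou', 'chognqing'],
    'years': [2010, 2011, 2012, 2013, 2014, 2015],
    'population': [2100, 2300, 2400, 2500, 2600, 2600],
}
df = pd.DataFrame(data)

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # one dtype per column
print(df.head(3))  # first three rows
```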

    Changing the column order and the row labels

    >>> pd.DataFrame(data,columns= ['years','city','population'])
       years       city  population
    0   2010    beijing        2100
    1   2011   shanghai        2300
    2   2012  guangzhou        2400
    3   2013   shenzhen        2500
    4   2014   hangzhou        2600
    5   2015  chognqing        2600
    >>> pd.DataFrame(data,columns= ['years','city','population'],index = ['A','B','C','D','E','F'])
       years       city  population
    A   2010    beijing        2100
    B   2011   shanghai        2300
    C   2012  guangzhou        2400
    D   2013   shenzhen        2500
    E   2014   hangzhou        2600
    F   2015  chognqing        2600
    >>> 
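
When the data is a dictionary, the `columns` argument selects as well as orders: keys that are not listed are left out, and listed names that are missing from the dictionary become all-NaN columns. The sketch below illustrates this with a shortened version of the data; `area` is a hypothetical column name chosen for the example.

```python
import pandas as pd

data = {
    'city': ['beijing', 'shanghai'],
    'years': [2010, 2011],
    'population': [2100, 2300],
}

# 'area' is a hypothetical column name that does not exist in data;
# 'population' is deliberately left out of the columns list.
df = pd.DataFrame(data, columns=['years', 'city', 'area'], index=['A', 'B'])
print(df)
```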

    Every column and every row of a DataFrame is a Series.

    >>> mmap = pd.DataFrame(data,columns= ['years','city','population'],index = ['A','B','C','D','E','F'])
    >>> print(mmap)
       years       city  population
    A   2010    beijing        2100
    B   2011   shanghai        2300
    C   2012  guangzhou        2400
    D   2013   shenzhen        2500
    E   2014   hangzhou        2600
    F   2015  chognqing        2600
    >>> type(mmap)
    <class 'pandas.core.frame.DataFrame'>
    >>> type(mmap['city'])
    <class 'pandas.core.series.Series'>
    >>> 
    >>> mmap.loc['C']
    years              2012
    city          guangzhou
    population         2400
    Name: C, dtype: object
    >>> type(mmap.loc['C'])
    <class 'pandas.core.series.Series'>
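
The different ways of pulling a column, a row, or a single cell out of a DataFrame can be sketched as follows. The frame is a shortened version of `mmap`; the names `column`, `row`, `same_row` and `cell` are illustrative.

```python
import pandas as pd

mmap = pd.DataFrame(
    {'years': [2010, 2011, 2012],
     'city': ['beijing', 'shanghai', 'guangzhou'],
     'population': [2100, 2300, 2400]},
    index=['A', 'B', 'C'],
)

column = mmap['city']         # a column: Series indexed by the row labels
row = mmap.loc['C']           # a row by label: Series indexed by the column names
same_row = mmap.iloc[2]       # the same row, selected by integer position
cell = mmap.loc['C', 'city']  # a single value

print(column)
print(row)
print(cell)
```

A row Series mixes the types of all columns, which is why its dtype is `object` in the output above.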

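    Assigning values in a DataFrame

    Chained indexing, as in the second statement below, triggers a SettingWithCopyWarning. Assigning a scalar to a column sets every row of that column.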

    >>> mmap['population']['A']
    2100
    >>> mmap['population']['A'] = 2000
    __main__:1: SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame
     
    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
    >>> mmap['population']['A']
    2000
    >>> mmap['years'] = 2017
    >>> mmap
       years       city  population
    A   2017    beijing        2000
    B   2017   shanghai        2300
    C   2017  guangzhou        2400
    D   2017   shenzhen        2500
    E   2017   hangzhou        2600
    F   2017  chognqing        2600
    >>> 
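
The SettingWithCopyWarning above comes from chained indexing: `mmap['population']` is evaluated first, and the assignment is then made on that intermediate object. The pandas documentation linked in the warning recommends a single `.loc` call instead. The following is a minimal sketch on a shortened frame; as far as I know, newer pandas versions with copy-on-write may not update the original frame at all through chained assignment, which is one more reason to prefer `.loc`.

```python
import pandas as pd

mmap = pd.DataFrame(
    {'city': ['beijing', 'shanghai'], 'population': [2100, 2300]},
    index=['A', 'B'],
)

# One indexing call on the DataFrame itself: row label first, then column label.
mmap.loc['A', 'population'] = 2000

# Assigning a scalar to a column broadcasts it to every row.
mmap['years'] = 2017

print(mmap)
```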

    Assigning an array or a Series to a column

    >>> import numpy as np
    >>> mmap.years = np.arange(6)
    >>> mmap
       years       city  population
    A      0    beijing        2000
    B      1   shanghai        2300
    C      2  guangzhou        2400
    D      3   shenzhen        2500
    E      4   hangzhou        2600
    F      5  chognqing        2600
    >>> val = pd.Series([200,300,400],index=['A','B','C'])
    >>> val
    A    200
    B    300
    C    400
    dtype: int64
    >>> mmap['year'] = val
    >>> mmap
       years       city  population   year
    A      0    beijing        2000  200.0
    B      1   shanghai        2300  300.0
    C      2  guangzhou        2400  400.0
    D      3   shenzhen        2500    NaN
    E      4   hangzhou        2600    NaN
    F      5  chognqing        2600    NaN
    >>> mmap['years'] = 2017
    >>> mmap
       years       city  population   year
    A   2017    beijing        2000  200.0
    B   2017   shanghai        2300  300.0
    C   2017  guangzhou        2400  400.0
    D   2017   shenzhen        2500    NaN
    E   2017   hangzhou        2600    NaN
    F   2017  chognqing        2600    NaN
    >>> mmap.columns
    Index([u'years', u'city', u'population', u'year'], dtype='object')
    >>> mmap.index
    Index([u'A', u'B', u'C', u'D', u'E', u'F'], dtype='object')
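
In the session above, the `year` column has NaN in rows D to F because assigning a Series to a column aligns on index labels, not on position. The sketch below makes this explicit with labels that are deliberately out of order; the label `Z` is a hypothetical one that does not exist in the frame.

```python
import pandas as pd

mmap = pd.DataFrame({'city': ['beijing', 'shanghai', 'guangzhou']},
                    index=['A', 'B', 'C'])

# Labels are out of order, 'B' is missing, and 'Z' does not exist in mmap.
val = pd.Series([300, 100, 999], index=['C', 'A', 'Z'])
mmap['year'] = val

print(mmap)
```

Rows are matched by label, so `A` receives 100 and `C` receives 300, `B` has no match and becomes NaN, and the value labeled `Z` is dropped. The NaN also explains why the column is stored as floating point rather than integer.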
  • Original post: https://www.cnblogs.com/chenyang920/p/7979702.html