1. Series
A Series is a one-dimensional labeled array.
By default, pandas uses integers starting from 0 as the index of a Series.
>>> import pandas as pd
>>> test = pd.Series(['num0','num1','num2','num3'])
>>> test
0    num0
1    num1
2    num2
3    num3
dtype: object
You can also specify the index yourself.
>>> test = pd.Series(['num0','num1','num2','num3'],index=['A','B','C','D'])
>>> test
A    num0
B    num1
C    num2
D    num3
dtype: object
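As a small sketch of how the two kinds of index behave, the snippet below builds the same values with the default index and with a custom one, then reads the first element by label and by position. The names `default_idx` and `labeled` are illustrative and do not come from the session above.

```python
import pandas as pd

values = ['num0', 'num1', 'num2', 'num3']

# Default index: integers starting from 0.
default_idx = pd.Series(values)

# Custom index: one label per value.
labeled = pd.Series(values, index=['A', 'B', 'C', 'D'])

# .loc looks an element up by label, .iloc by integer position.
first_by_label = labeled.loc['A']
first_by_position = labeled.iloc[0]

print(list(default_idx.index))
print(first_by_label, first_by_position)
```

Using `.loc` and `.iloc` explicitly keeps label lookups and position lookups from being confused when the labels themselves are integers.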
A Series can also be constructed from a dictionary. The keys become the index, and a None value is stored as NaN.
>>> cities = {'beijing':55000,'shanghai':60000,'shenzhen':20000,'guangzhou':25000,'suzhou':None}
>>> test = pd.Series(cities)
>>> test
beijing      55000.0
guangzhou    25000.0
shanghai     60000.0
shenzhen     20000.0
suzhou           NaN
dtype: float64
>>> print(type(test))
<class 'pandas.core.series.Series'>
>>> test['beijing']
55000.0
>>> test[['beijing','shanghai','shenzhen']]
beijing     55000.0
shanghai    60000.0
shenzhen    20000.0
dtype: float64
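The session above shows the None entry for 'suzhou' turning into NaN. A minimal sketch of how such missing entries might be detected and filtered out follows; the names `city_values`, `missing` and `known` are assumptions made for illustration.

```python
import pandas as pd

cities = {'beijing': 55000, 'shanghai': 60000, 'shenzhen': 20000,
          'guangzhou': 25000, 'suzhou': None}
city_values = pd.Series(cities)

# isnull() marks every missing entry with True.
missing = city_values.isnull()

# Boolean indexing keeps only the entries that have a value.
known = city_values[city_values.notnull()]

print(missing['suzhou'])
print(sorted(known.index))
```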
2. DataFrame
A DataFrame is a two-dimensional labeled table. It can be constructed from a dictionary whose keys become the column names.
Creating a DataFrame
>>> data = {'city':['beijing','shanghai','guangzhou','shenzhen','hangzhou','chognqing'],'years':[2010,2011,2012,2013,2014,2015],'population':[2100,2300,2400,2500,2600,2600]}
>>> print(data)
{'city': ['beijing', 'shanghai', 'guangzhou', 'shenzhen', 'hangzhou', 'chognqing'], 'population': [2100, 2300, 2400, 2500, 2600, 2600], 'years': [2010, 2011, 2012, 2013, 2014, 2015]}
>>> pd.DataFrame(data)
        city  population  years
0    beijing        2100   2010
1   shanghai        2300   2011
2  guangzhou        2400   2012
3   shenzhen        2500   2013
4   hangzhou        2600   2014
5  chognqing        2600   2015
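To make the dictionary-to-table mapping concrete, here is a short sketch that builds the same DataFrame and inspects its shape and columns. The variable name `frame` is an assumption; the column order printed by pandas can differ between versions, so the column names are sorted before printing.

```python
import pandas as pd

data = {
    'city': ['beijing', 'shanghai', 'guangzhou', 'shenzhen', 'hangzhou', 'chognqing'],
    'years': [2010, 2011, 2012, 2013, 2014, 2015],
    'population': [2100, 2300, 2400, 2500, 2600, 2600],
}

# Each key becomes a column; each list supplies that column's values.
frame = pd.DataFrame(data)

print(frame.shape)            # (number of rows, number of columns)
print(sorted(frame.columns))  # column names, sorted for a stable order
print(frame.head(2))          # first two rows
```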
Reordering the columns and naming the rows
>>> pd.DataFrame(data,columns= ['years','city','population'])
   years       city  population
0   2010    beijing        2100
1   2011   shanghai        2300
2   2012  guangzhou        2400
3   2013   shenzhen        2500
4   2014   hangzhou        2600
5   2015  chognqing        2600
>>> pd.DataFrame(data,columns= ['years','city','population'],index = ['A','B','C','D','E','F'])
   years       city  population
A   2010    beijing        2100
B   2011   shanghai        2300
C   2012  guangzhou        2400
D   2013   shenzhen        2500
E   2014   hangzhou        2600
F   2015  chognqing        2600
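One related behaviour worth sketching: when `columns` names a key that is not in the dictionary, pandas creates that column and fills it with NaN. The extra column name `'area'` below is hypothetical and only serves to show this.

```python
import pandas as pd

data = {
    'city': ['beijing', 'shanghai', 'guangzhou', 'shenzhen', 'hangzhou', 'chognqing'],
    'years': [2010, 2011, 2012, 2013, 2014, 2015],
    'population': [2100, 2300, 2400, 2500, 2600, 2600],
}

# 'area' is not a key of data, so its column is created empty (all NaN).
frame = pd.DataFrame(
    data,
    columns=['years', 'city', 'population', 'area'],
    index=['A', 'B', 'C', 'D', 'E', 'F'],
)

print(list(frame.columns))
print(frame['area'].isnull().all())
```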
Every column and every row of a DataFrame is a Series.
>>> mmap = pd.DataFrame(data,columns= ['years','city','population'],index = ['A','B','C','D','E','F'])
>>> print(mmap)
   years       city  population
A   2010    beijing        2100
B   2011   shanghai        2300
C   2012  guangzhou        2400
D   2013   shenzhen        2500
E   2014   hangzhou        2600
F   2015  chognqing        2600
>>> type(mmap)
<class 'pandas.core.frame.DataFrame'>
>>> type(mmap['city'])
<class 'pandas.core.series.Series'>
>>> mmap.loc['C']
years              2012
city          guangzhou
population         2400
Name: C, dtype: object
>>> type(mmap.loc['C'])
<class 'pandas.core.series.Series'>
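A brief sketch of selecting a column and a row from the same table follows, using `.loc` for label-based and `.iloc` for position-based row access. The variable names `column`, `row_by_label` and `row_by_position` are illustrative.

```python
import pandas as pd

data = {
    'city': ['beijing', 'shanghai', 'guangzhou', 'shenzhen', 'hangzhou', 'chognqing'],
    'years': [2010, 2011, 2012, 2013, 2014, 2015],
    'population': [2100, 2300, 2400, 2500, 2600, 2600],
}
mmap = pd.DataFrame(data, columns=['years', 'city', 'population'],
                    index=['A', 'B', 'C', 'D', 'E', 'F'])

column = mmap['city']           # a column, selected by name
row_by_label = mmap.loc['C']    # a row, selected by its label
row_by_position = mmap.iloc[2]  # the same row, selected by position

# A row Series is indexed by the column names and named after the row label.
print(row_by_label.name, list(row_by_label.index))
print(row_by_label.equals(row_by_position))
```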
Assigning values in a DataFrame
>>> mmap['population']['A']
2100
>>> mmap['population']['A'] = 2000
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
>>> mmap['population']['A']
2000
>>> mmap['years'] = 2017
>>> mmap
   years       city  population
A   2017    beijing        2000
B   2017   shanghai        2300
C   2017  guangzhou        2400
D   2017   shenzhen        2500
E   2017   hangzhou        2600
F   2017  chognqing        2600
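The SettingWithCopyWarning in the session above is triggered by chained indexing (`mmap['population']['A'] = ...`), where the assignment may land on a temporary copy rather than on the DataFrame itself. One way to avoid it, sketched here on a smaller three-row table built only for illustration, is to pass the row label and the column name to a single `.loc` or `.at` call.

```python
import pandas as pd

# A reduced version of the table above, used only for this example.
mmap = pd.DataFrame(
    {'years': [2010, 2011, 2012],
     'city': ['beijing', 'shanghai', 'guangzhou'],
     'population': [2100, 2300, 2400]},
    index=['A', 'B', 'C'],
)

# One indexing call with (row label, column name) writes into mmap directly.
mmap.loc['A', 'population'] = 2000

# .at is the scalar-only counterpart of .loc.
mmap.at['B', 'population'] = 2200

# Assigning a scalar to a whole column broadcasts it to every row.
mmap['years'] = 2017

print(mmap)
```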
Assigning an array or a Series to a column
>>> import numpy as np
>>> mmap.years = np.arange(6)
>>> mmap
   years       city  population
A      0    beijing        2000
B      1   shanghai        2300
C      2  guangzhou        2400
D      3   shenzhen        2500
E      4   hangzhou        2600
F      5  chognqing        2600
>>> val = pd.Series([200,300,400],index=['A','B','C'])
>>> val
A    200
B    300
C    400
dtype: int64
>>> mmap['year'] = val
>>> mmap
   years       city  population   year
A      0    beijing        2000  200.0
B      1   shanghai        2300  300.0
C      2  guangzhou        2400  400.0
D      3   shenzhen        2500    NaN
E      4   hangzhou        2600    NaN
F      5  chognqing        2600    NaN
>>> mmap['years'] = 2017
>>> mmap
   years       city  population   year
A   2017    beijing        2000  200.0
B   2017   shanghai        2300  300.0
C   2017  guangzhou        2400  400.0
D   2017   shenzhen        2500    NaN
E   2017   hangzhou        2600    NaN
F   2017  chognqing        2600    NaN
>>> mmap.columns
Index([u'years', u'city', u'population', u'year'], dtype='object')
>>> mmap.index
Index([u'A', u'B', u'C', u'D', u'E', u'F'], dtype='object')
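The NaN values in the new 'year' column come from index alignment: a Series assigned to a column is matched to the rows by label, not by position. The sketch below, on a small table invented for illustration, shows that the order of the Series does not matter, that rows without a matching label get NaN, and that labels absent from the DataFrame (here the hypothetical 'Z') are dropped.

```python
import pandas as pd

frame = pd.DataFrame(
    {'city': ['beijing', 'shanghai', 'guangzhou', 'shenzhen']},
    index=['A', 'B', 'C', 'D'],
)

# Labels are deliberately out of order; 'Z' does not exist in frame.
val = pd.Series([200, 300, 400], index=['C', 'A', 'Z'])

# Values are placed by matching labels, not by position.
frame['year'] = val

print(frame)
```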