
Using the pandas Library: DataFrame


The DataFrame Type

A DataFrame consists of a set of columns that share the same index.

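index_0     data_a    data_1 ... data_w
index_1     data_b    data_2 ... data_x
index_2     data_c    data_3 ... data_y
index_3     data_d    data_4 ... data_z
# the index          multiple columns of data
# You can think of it as a table.
# Note:
    the row labels are called the index
    the column labels are called the columns
    the index runs along axis=0, i.e. axis 0 (down the rows)
    the column labels run along axis=1, i.e. axis 1 (across the columns)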
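
To make the two axes concrete, here is a minimal sketch that sums a small frame along each axis. The frame and the variable names (`col_sums`, `row_sums`) are illustrative and not part of the original article.

```python
import numpy as np
import pandas as pd

# A 2x5 frame with default integer row and column labels.
d = pd.DataFrame(np.arange(10).reshape(2, 5))

# axis=0 works down the rows, giving one result per column.
col_sums = d.sum(axis=0)
# axis=1 works across the columns, giving one result per row.
row_sums = d.sum(axis=1)

print(col_sums.tolist())  # [5, 7, 9, 11, 13]
print(row_sums.tolist())  # [10, 35]
```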

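Overview of the DataFrame Type
A DataFrame is a tabular data type in which each column can hold a different value type.
A DataFrame has both a row index and a column index.
A DataFrame is most often used for two-dimensional data, but it can also represent higher-dimensional data (for example, through a hierarchical index).
A DataFrame can be created from the following types: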

1. A two-dimensional ndarray

2. A dict of one-dimensional ndarrays, lists, dicts, tuples, or Series

3. A Series

4. Another DataFrame

Code examples:

# Creating from a two-dimensional ndarray
In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: d = pd.DataFrame(np.arange(10).reshape(2,5))

In [4]: d
Out[4]:
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9
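# A column index (0 to 4) was generated automatically across the top
# A row index (0 and 1) was generated automatically down the side
# Auto-generated indexes start from 0 by default

# Creating from a dict of one-dimensional objects (Series here)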

In [8]: import pandas as pd

In [9]: dt = { 'one':pd.Series([1,2,3],index = ['a','b','c']),
   ...: 'two':pd.Series([9,8,7,6],index = ['a','b','c','d'])}

In [10]: d = pd.DataFrame(dt)

In [11]: d
Out[11]:
   one  two
a  1.0    9
b  2.0    8
c  3.0    7
d  NaN    6

In [12]: pd.DataFrame(dt,index = ['b','b','d'],columns = ['two','three'])
Out[12]:
   two three
b    8   NaN
b    8   NaN
d    6   NaN
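# 1. The dict keys become the column index, and the Series index becomes the row index
# 2. Data is aligned on the row and column indexes automatically; missing entries are filled with NaN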
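
The automatic fill can be checked directly. The sketch below rebuilds the same `dt` dictionary and inspects the `one` column. It assumes nothing beyond the session above, and the `missing` name is illustrative.

```python
import pandas as pd

dt = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])}
d = pd.DataFrame(dt)

# Row 'd' has no entry in 'one', so pandas fills it with NaN.
missing = d['one'].isna()
print(missing.tolist())   # [False, False, False, True]

# NaN is a float, which is why 'one' is displayed as 1.0, 2.0, 3.0.
print(d['one'].dtype)     # float64
```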

# Creating from a dict of lists

In [14]: import pandas as pd

In [15]: dl = { 'one':[1,2,3,4],'two':[9,8,7,6]}

In [16]: d = pd.DataFrame(dl,index = ['a','b','c','d'])

In [17]: d
Out[17]:
   one  two
a    1    9
b    2    8
c    3    7
d    4    6
# 1. The dict keys become the column index, and the index argument becomes the row index
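
The sessions above cover the ndarray and dict cases. The two remaining sources from the list, a single Series and another DataFrame, might look like the following sketch. The names `s`, `d_from_series`, and `d_copy` are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='one')

# From a single Series: the Series name becomes the column label.
d_from_series = pd.DataFrame(s)

# From another DataFrame: copy=True makes the new frame independent.
d_copy = pd.DataFrame(d_from_series, copy=True)
d_copy.loc['a', 'one'] = 100

print(d_from_series.columns.tolist())  # ['one']
print(d_from_series.loc['a', 'one'])   # 1
print(d_copy.loc['a', 'one'])          # 100
```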

# A tabular data example (城市 = city, 环比 = month-on-month, 同比 = year-on-year, 定基 = fixed-base)
    
In [19]: import pandas as pd

In [20]: dl = {'城市':['北京','上海','广州','深圳','沈阳'],
    ...: '环比':[101.5,101.2,101.3,102.0,100.1],
    ...: '同比':[120.7,127.3,119.4,140.9,101.4],
    ...: '定基':[121.4,127.8,120.0,145.5,101.6]}

In [21]: d = pd.DataFrame(dl,index =['c1','c2','c3','c4','c5'])

In [22]: d
Out[22]:
    城市     环比     同比     定基
c1  北京  101.5  120.7  121.4
c2  上海  101.2  127.3  127.8
c3  广州  101.3  119.4  120.0
c4  深圳  102.0  140.9  145.5
c5  沈阳  100.1  101.4  101.6

In [23]: d.index
Out[23]: Index(['c1', 'c2', 'c3', 'c4', 'c5'], dtype='object')
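# The .index attribute returns the DataFrame's row index
In [24]: d.columns
Out[24]: Index(['城市', '环比', '同比', '定基'], dtype='object')
# The .columns attribute returns the DataFrame's column index
In [25]: d.values
    # The .values attribute returns all the values of the DataFrame as an ndarray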
Out[25]:
array([['北京', 101.5, 120.7, 121.4],
       ['上海', 101.2, 127.3, 127.8],
       ['广州', 101.3, 119.4, 120.0],
       ['深圳', 102.0, 140.9, 145.5],
       ['沈阳', 100.1, 101.4, 101.6]], dtype=object)
In [26]: d['同比']
Out[26]:
c1    120.7
c2    127.3
c3    119.4
c4    140.9
c5    101.4
Name: 同比, dtype: float64

In [27]: d.loc['c2']
    # Use .loc['row label'] to get the values of one row
Out[27]:
城市       上海
环比    101.2
同比    127.3
定基    127.8
Name: c2, dtype: object

In [28]: d['同比']['c2']
    # d['column label']['row label'] gets a single element
Out[28]: 127.3
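    # Summary:
        A DataFrame is a two-dimensional "labeled" array
        Basic DataFrame operations are similar to those of Series and rely on the row and column indexes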
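
As a complement to the chained `d['同比']['c2']` lookup, the sketch below shows the same element fetched in a single step with `.loc`, which is also the form that can reliably be used for assignment. It uses a reduced two-column version of the table, and the value 130.0 is made up for illustration.

```python
import pandas as pd

dl = {'城市': ['北京', '上海', '广州', '深圳', '沈阳'],
      '同比': [120.7, 127.3, 119.4, 140.9, 101.4]}
d = pd.DataFrame(dl, index=['c1', 'c2', 'c3', 'c4', 'c5'])

# Chained indexing: select the column first, then the row.
chained = d['同比']['c2']
# Single-step lookup: row label first, then column label.
direct = d.loc['c2', '同比']
print(chained == direct)     # True

# Assigning through .loc changes d itself.
d.loc['c2', '同比'] = 130.0
print(d.loc['c2', '同比'])   # 130.0
```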

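df.dtypes  # returns the data type of each column

df.head()  # returns the first rows (5 by default)

df.tail()  # returns the last rows (5 by default)

df.as_matrix()  # returns a NumPy ndarray; deprecated since pandas 0.23 and removed in pandas 1.0

df.to_numpy()  # returns a NumPy ndarray; available from pandas 0.24 on, so older versions must use as_matrix() or .values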
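
A short sketch of these calls on a small frame follows. The `hasattr` fallback to `.values` is one possible way to handle pandas versions that lack `to_numpy()`, not something the original article prescribes.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]},
                  index=['a', 'b', 'c', 'd'])

first_two = df.head(2)   # head() with no argument returns up to 5 rows
last_one = df.tail(1)

# Prefer to_numpy() when it exists; fall back to .values otherwise.
arr = df.to_numpy() if hasattr(df, 'to_numpy') else df.values

print(first_two.index.tolist())  # ['a', 'b']
print(last_one.index.tolist())   # ['d']
print(arr.shape)                 # (4, 2)
```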

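df.describe()  # returns summary statistics for each numeric column

count  number of non-null elements in the column

mean   mean of the elements in the column

std    standard deviation of the elements in the column

min    smallest element in the column

max    largest element in the column

50%    median (the middle value) of the column

25%    25th percentile (first quartile)

75%    75th percentile (third quartile)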
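
The rows of the `describe()` result can be read back by label. The sketch below uses a made-up two-column frame, and `stats` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]})
stats = df.describe()

# The statistics are the row labels of the result.
print(stats.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

print(stats.loc['mean', 'one'])           # 2.5
print(stats.loc['50%', 'one'])            # 2.5
print(round(stats.loc['std', 'one'], 3))  # 1.291
```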

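1. loc and iloc are both used to select rows (and, optionally, columns).

The difference between iloc and loc:

iloc selects data by position, so its arguments are integer positions rather than labels.

loc selects data by index label, so the argument type depends on the type of the index.

2. at and iat can only select the single value at one position. iat selects by the integer positions of the row and column, while at selects by the row and column labels.

3. Everything at and iat can do can also be done with loc and iloc.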
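
The four accessors can be compared side by side on a small frame. This is a minimal sketch, and the frame is made up for illustration.

```python
import pandas as pd

d = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]},
                 index=['a', 'b', 'c', 'd'])

print(d.loc['b', 'two'])   # 8  (by label)
print(d.iloc[1, 1])        # 8  (by position)
print(d.at['b', 'two'])    # 8  (single value, by label)
print(d.iat[1, 1])         # 8  (single value, by position)

# loc and iloc also accept slices; a loc slice includes its end label,
# while an iloc slice excludes its end position.
print(d.loc['a':'c', 'one'].tolist())  # [1, 2, 3]
print(d.iloc[0:2, 0].tolist())         # [1, 2]
```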


Original article: https://www.cnblogs.com/pythonyeyu/p/10734399.html