zoukankan html css js c++ java

python学习笔记（四）：pandas基础

pandas 基础

serise

import pandas as pd
from pandas import Series, DataFrame
obj = Series([4, -7, 5, 3])
obj

0    4
1   -7
2    5
3    3
dtype: int64

obj.values

array([ 4, -7,  5,  3], dtype=int64)

obj.index

RangeIndex(start=0, stop=4, step=1)

obj[[1,3]]
# 跳着选取数据

1   -7
3    3
dtype: int64

obj[1:3]

1   -7
2    5
dtype: int64

pd.isnull(obj)

0    False
1    False
2    False
3    False
dtype: bool

reindex可以用来插值

obj.reindex(range(5), method = 'ffill')

0    4
1   -7
2    5
3    3
4    3
dtype: int64

标签切片是闭区间的

dataframe

data = {'state': ['asd','qwe','sdf','ert'],
       'year': [2000, 2001, 2002, 2003],
       'pop': [1.5,1.7,3.6,2.4]}
data = DataFrame(data)
data

	pop	state	year
0	1.5	asd	2000
1	1.7	qwe	2001
2	3.6	sdf	2002
3	2.4	ert	2003

data.year
# 比r里提取列要方便点

0    2000
1    2001
2    2002
3    2003
Name: year, dtype: int64

data['debt'] = range(4)
data

	pop	state	year	debt
0	1.5	asd	2000	0
1	1.7	qwe	2001	1
2	3.6	sdf	2002	2
3	2.4	ert	2003	3

index是不能修改的

a = data.index
a[1] = 6

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-9-57677294f950> in <module>()
      1 a = data.index
----> 2 a[1] = 6


F:Anacondalibsite-packagespandascoreindexesase.py in __setitem__(self, key, value)
   1668 
   1669     def __setitem__(self, key, value):
-> 1670         raise TypeError("Index does not support mutable operations")
   1671 
   1672     def __getitem__(self, key):


TypeError: Index does not support mutable operations

data.columns

Index(['pop', 'state', 'year', 'debt'], dtype='object')

.ix标签索引功能，输入行和列
不加.ix只能选取其中的某列或某行，不能列与行同时选取

data[:3]

	pop	state	year	debt
0	1.5	asd	2000	0
1	1.7	qwe	2001	1
2	3.6	sdf	2002	2

data.ix[:,:3]

	pop	state	year
0	1.5	asd	2000
1	1.7	qwe	2001
2	3.6	sdf	2002
3	2.4	ert	2003

删除某列用drop，axis = 0表示行，1表示列
删除后原数据不变

data.drop(0,axis=0)

	pop	state	year	debt
1	1.7	qwe	2001	1
2	3.6	sdf	2002	2
3	2.4	ert	2003	3

data.drop('year', axis=1)

	pop	state	debt
0	1.5	asd	0
1	1.7	qwe	1
2	3.6	sdf	2
3	2.4	ert	3

data

	pop	state	year	debt
0	1.5	asd	2000	0
1	1.7	qwe	2001	1
2	3.6	sdf	2002	2
3	2.4	ert	2003	3

import numpy as np
df = DataFrame(np.arange(9).reshape(3, 3))
df

	0	1	2
0	0	1	2
1	3	4	5
2	6	7	8

applymap()可以对dataframe每一个元素运用函数
apply()可以对每一维数组运用函数

df.applymap(lambda x: '%.2f' % x)

	0	1	2
0	0.00	1.00	2.00
1	3.00	4.00	5.00
2	6.00	7.00	8.00

data.sort_values(by='pop')
# 对某一列排序

	pop	state	year	debt
0	1.5	asd	2000	0
1	1.7	qwe	2001	1
3	2.4	ert	2003	3
2	3.6	sdf	2002	2

data.describe()

	pop	year	debt
count	4.000000	4.000000	4.000000
mean	2.300000	2001.500000	1.500000
std	0.948683	1.290994	1.290994
min	1.500000	2000.000000	0.000000
25%	1.650000	2000.750000	0.750000
50%	2.050000	2001.500000	1.500000
75%	2.700000	2002.250000	2.250000
max	3.600000	2003.000000	3.000000

df.isin([1])

	0	1	2
0	False	True	False
1	False	False	False
2	False	False	False

None、NaN会被当作NA处理
df.shape不加括号相当于dim()

df.shape

(3, 3)

dropna删除缺失值

df.ix[:1, :1] = None
df

	0	1	2
0	NaN	NaN	2
1	NaN	NaN	5
2	6.0	7.0	8

填充缺失值可以调用字典，不同行添加不同值

df.fillna({0:11, 1:22})

	0	1	2
0	11.0	22.0	2
1	11.0	22.0	5
2	6.0	7.0	8

df

	0	1	2
0	NaN	NaN	2
1	NaN	NaN	5
2	6.0	7.0	8

df.fillna({0:11, 1:22}, inplace=True)

	0	1	2
0	11.0	22.0	2
1	11.0	22.0	5
2	6.0	7.0	8

df

	0	1	2
0	11.0	22.0	2
1	11.0	22.0	5
2	6.0	7.0	8

inplace修改对象不产生副本

查看全文

相关阅读:
springboot文件上传：单个文件上传和多个文件上传
 Eclipse：很不错的插件-devStyle，将你的eclipse变成idea风格
 springboot项目搭建：结构和入门程序
 POJ 3169 Layout 差分约束系统
 POJ 3723 Conscription 最小生成树
 POJ 3255 Roadblocks 次短路
 UVA 11367 Full Tank? 最短路
 UVA 10269 Adventure of Super Mario 最短路
 UVA 10603 Fill 最短路
 POJ 2431 Expedition 优先队列

原文地址：https://www.cnblogs.com/xihehe/p/9026860.html