Reading and writing SQL databases with pandas

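How do you read data from a database into a DataFrame?

Use sql.read_sql_query(sql_str, conn) and sql.read_sql_table(table_name, conn) from the pandas.io.sql module.

The first runs a SQL statement; the second loads a whole table into a DataFrame (read_sql_table needs an SQLAlchemy connectable).

pandas also provides a single interface that covers both cases: read_sql(). The examples below illustrate it.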

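We will read data from a SQLite database. After importing the required modules:
1. read_sql takes two arguments: a SQL statement (which you may need to learn separately) and con, the database connection. It returns a DataFrame directly.
2. Printing the result shows that the data was read successfully.
3. The index_col parameter specifies which column becomes the index.
4. To get a multi-level index, set index_col to a list of column names.
5. Writing to the database is just as simple: drop the existing "weather_2012" table, then save df to a table named "weather_2012".
6. MySQL works the same way: create a MySQL connection and use it in place of the con above.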
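The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original tutorial's code: the in-memory database and the sample columns (id, date_time, temp) are hypothetical stand-ins for the weather data.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database so the sketch needs no file on disk (assumption).
con = sqlite3.connect(":memory:")

# Hypothetical sample rows standing in for the weather table.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "date_time": ["2012-01-01", "2012-01-02", "2012-01-03"],
    "temp": [-1.8, 0.4, 3.2],
})

# if_exists="replace" drops an existing table of the same name before writing.
df.to_sql("weather_2012", con, if_exists="replace", index=False)

# Read back with a SQL statement, using one column as the index.
by_id = pd.read_sql("SELECT * FROM weather_2012", con, index_col="id")

# A list of column names produces a multi-level index.
by_id_date = pd.read_sql(
    "SELECT * FROM weather_2012", con, index_col=["id", "date_time"]
)

print(by_id)
con.close()
```

The sqlite3 connection is used directly here because pandas accepts it for both read_sql and to_sql; for other databases an SQLAlchemy engine takes its place.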
Supplement:

(1) Converting the query result to a DataFrame

import pandas as pd
import pymysql

conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='123456', db='db1')
cursor = conn.cursor()
# cursor.execute("DROP TABLE IF EXISTS test")  # statements like this must go through a cursor

sql = "select * from user"
df = pd.read_sql(sql, conn)  # read_sql already returns a DataFrame

aa = pd.DataFrame(df)  # wraps the same data again
print(aa)

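(2) Saving

pd.io.sql.write_frame(df, "user_copy", conn)  # cannot be used: write_frame has been removed

pd.io.sql.to_sql(piece, "user_copy", conn, flavor='mysql', if_exists='replace')  # the pandas version used here required flavor='mysql'; recent versions have dropped the flavor parameter and expect an SQLAlchemy engine for MySQL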

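#!/usr/bin/env python
# -*- coding:utf-8 -*-
import pandas as pd
import pymysql

conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='123456', db='db1')
cursor = conn.cursor()
# cursor.execute("DROP TABLE IF EXISTS user_copy")  # statements like this must go through a cursor

sql = "select * from user"
df = pd.read_sql(sql, conn, chunksize=2)  # with chunksize, read_sql returns an iterator of DataFrames

for piece in df:
    aa = pd.DataFrame(piece)
    # pd.io.sql.write_frame(df, "user_copy", conn)  # removed from pandas
    # flavor='mysql' was required by the pandas version used here; recent versions dropped it.
    # Note that if_exists='replace' inside the loop overwrites the table on every chunk.
    pd.io.sql.to_sql(piece, "user_copy", conn, flavor='mysql', if_exists='replace')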

(3) Adding a column based on a condition

piece['xb'] = list(map(lambda x: '男' if x == '123' else '女', piece['pwd']))
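The same conditional column can also be built without a Python-level lambda. The sketch below uses numpy.where as an alternative to the map call above; the sample frame is hypothetical and only mirrors the pwd column used in the original.

```python
import numpy as np
import pandas as pd

# Hypothetical chunk with the same 'pwd' column as in the text above.
piece = pd.DataFrame({"name": ["a", "b", "c"], "pwd": ["123", "456", "123"]})

# np.where(condition, value_if_true, value_if_false) works on the whole column at once.
piece["xb"] = np.where(piece["pwd"] == "123", "男", "女")

print(piece)
```

Both forms give the same column; the vectorized one tends to be faster on large chunks.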

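(4) If the data contains Chinese characters, specify the character set when connecting: charset="utf8"

(5) Final code (read the data in chunks and add a new column derived from an existing one)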

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import pandas as pd
import pymysql

conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='123456', db='db1', charset="utf8")
cursor = conn.cursor()
# cursor.execute("DROP TABLE IF EXISTS user_copy")  # statements like this must go through a cursor

sql = "select * from user"
df = pd.read_sql(sql, conn, chunksize=2)

for piece in df:
    # pd.io.sql.write_frame(df, "user_copy", conn)  # removed from pandas
    piece['xb'] = list(map(lambda x: '男' if x == '123' else '女', piece['pwd']))
    print(piece)
    # flavor='mysql' was required by the pandas version used here; recent versions dropped it
    pd.io.sql.to_sql(piece, "user_copy", conn, flavor='mysql', if_exists='append')

(7) When connecting through SQLAlchemy, specify the character set for Chinese text in the URL: create_engine("mysql+pymysql://root:123456@127.0.0.1:3306/jd?charset=utf8", max_overflow=5)

# Connecting with SQLAlchemy
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://root:123456@127.0.0.1:3306/db1?charset=utf8")

sql = "select * from user"
df = pd.read_sql(sql, engine, chunksize=2)

for piece in df:
    print(piece)
    # with an SQLAlchemy engine, no flavor argument is needed
    pd.io.sql.to_sql(piece, "user_copy", engine, if_exists='append')

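Selecting data in pandas: iloc and loc behave differently. iloc selects by integer position (the end of a slice is excluded); loc selects by index label (the end of a slice is included).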

>>> import pandas as pd
>>> import os
>>> os.chdir("D:\\")
>>> d = pd.read_csv("GWAS_water.qassoc", delimiter=r"\s+")
>>> d.loc[1:3]
CHR SNP BP NMISS BETA SE R2 T P
1 1 . 447 44 0.1800 0.1783 0.02369 1.009 0.3185
2 1 . 449 44 0.2785 0.2473 0.02931 1.126 0.2665
3 1 . 452 44 0.1800 0.1783 0.02369 1.009 0.3185

>>> d.loc[0:3]
CHR SNP BP NMISS BETA SE R2 T P
0 1 . 410 44 0.2157 0.1772 0.03406 1.217 0.2304
1 1 . 447 44 0.1800 0.1783 0.02369 1.009 0.3185
2 1 . 449 44 0.2785 0.2473 0.02931 1.126 0.2665
3 1 . 452 44 0.1800 0.1783 0.02369 1.009 0.3185

>>> d.iloc[0:3]
CHR SNP BP NMISS BETA SE R2 T P
0 1 . 410 44 0.2157 0.1772 0.03406 1.217 0.2304
1 1 . 447 44 0.1800 0.1783 0.02369 1.009 0.3185
2 1 . 449 44 0.2785 0.2473 0.02931 1.126 0.2665

>>> d.iloc[1:3,2]
1 447
2 449
Name: BP, dtype: int64

>>> d.iloc[0:3,2]
0 410
1 447
2 449
Name: BP, dtype: int64
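The difference between the two is easier to see when the index labels are not 0, 1, 2, ... The sketch below uses a small hypothetical frame (not the GWAS file above) whose labels are 10, 20, 30, 40.

```python
import pandas as pd

# Hypothetical frame: labels differ from positions on purpose.
d = pd.DataFrame({"BP": [410, 447, 449, 452]}, index=[10, 20, 30, 40])

# loc slices by label and includes the end label.
by_label = d.loc[10:30]       # rows labelled 10, 20, 30

# iloc slices by position and excludes the end position.
by_position = d.iloc[0:2]     # rows at positions 0 and 1

# A second argument selects columns: by name for loc, by position for iloc.
bp_by_label = d.loc[20:30, "BP"]
bp_by_position = d.iloc[1:3, 0]

print(by_label)
print(by_position)
```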

>>> d.head()
CHR SNP BP NMISS BETA SE R2 T P
0 1 . 410 44 0.2157 0.1772 0.03406 1.2170 0.2304
1 1 . 447 44 0.1800 0.1783 0.02369 1.0090 0.3185
2 1 . 449 44 0.2785 0.2473 0.02931 1.1260 0.2665
3 1 . 452 44 0.1800 0.1783 0.02369 1.0090 0.3185
4 1 . 462 44 0.2548 0.2744 0.02012 0.9286 0.3584

>>> d.tail(3)
CHR SNP BP NMISS BETA SE R2 T P
418704 12 . 19345588 44 -0.2207 0.2558 0.01743 -0.8631 0.393
418705 12 . 19345598 44 -0.2207 0.2558 0.01743 -0.8631 0.393
418706 12 . 19345611 44 -0.2207 0.2558 0.01743 -0.8631 0.393

>>> d.describe()
CHR BP NMISS BETA SE
count 418707.000000 4.187070e+05 418707.0 4.186820e+05 418682.00000
mean 5.805738 1.442822e+07 44.0 -4.271777e-03 0.21433
std 3.392930 8.933882e+06 0.0 2.330019e-01 0.05190
min 1.000000 4.100000e+02 44.0 -1.610000e+00 0.10130
25% 3.000000 7.345860e+06 44.0 -1.638000e-01 0.17320
50% 5.000000 1.371612e+07 44.0 -1.826000e-16 0.20670
75% 9.000000 2.051322e+07 44.0 1.391000e-01 0.25010
max 12.000000 4.238896e+07 44.0 1.467000e+00 0.67580

R2 T P
count 418682.000000 4.186820e+05 4.186820e+05
mean 0.026268 -1.910774e-02 4.772397e-01
std 0.035903 1.095115e+00 2.944290e-01
min 0.000000 -5.582000e+00 2.034000e-08
25% 0.002969 -7.955000e-01 2.179000e-01
50% 0.012930 -8.468000e-16 4.624000e-01
75% 0.035910 6.712000e-01 7.254000e-01
max 0.531200 6.898000e+00 1.000000e+00

>>> d.sort_values(by="P").iloc[0:15]
CHR SNP BP NMISS BETA SE R2 T P
42870 1 . 32316680 44 1.1870 0.1721 0.5312 6.898 2.034000e-08
29301 1 . 22184568 44 1.1870 0.1721 0.5312 6.898 2.034000e-08
29302 1 . 22184590 44 1.1870 0.1721 0.5312 6.898 2.034000e-08
29306 1 . 22184654 44 1.1870 0.1721 0.5312 6.898 2.034000e-08
29305 1 . 22184628 44 1.1870 0.1721 0.5312 6.898 2.034000e-08
29304 1 . 22184624 44 1.1870 0.1721 0.5312 6.898 2.034000e-08
112212 3 . 14365699 44 1.4670 0.2255 0.5018 6.504 7.490000e-08
29254 1 . 22167448 44 1.0780 0.1723 0.4822 6.254 1.713000e-07
69291 2 . 9480651 44 1.1140 0.1829 0.4690 6.091 2.939000e-07
29299 1 . 22180991 44 0.8527 0.1458 0.4488 5.848 6.574000e-07
101391 3 . 6959715 44 0.6782 0.1166 0.4462 5.817 7.285000e-07
29333 1 . 22198267 44 0.9252 0.1616 0.4383 5.724 9.888000e-07
195513 5 . 20178388 44 1.0350 0.1817 0.4359 5.697 1.082000e-06
29295 1 . 22180901 44 0.7469 0.1320 0.4324 5.657 1.236000e-06
29300 1 . 22181119 44 0.7469 0.1320 0.4324 5.657 1.236000e-06
>>> sort_D = d.sort_values(by="P").iloc[0:5]
>>> m_D = d.dropna()  # remove rows containing NA

>>> sort_C = d.sort_values(["P", "CHR", "BP"])
>>> sort_C.to_csv(file_name, sep=' ', encoding='utf-8')  # file_name: an output path of your choice

>>> d.sort_values(by="CHR", ascending=True)

>>> sort_D.to_csv("result.txt", sep=" ")
>>> sort_D.to_csv("result_no_index.txt", sep=" ", index=False)

Reference

for m, i in enumerate(list(range(1,10))):
    for n, j in enumerate(list(range(m+1,10))):
        print(i * j)

Installation:

pip install pandas

Import:

import pandas as pd

from pandas import Series, DataFrame

# Series

Data types: Series, DataFrame

Series: similar to a one-dimensional array in numpy

Initialization:
Method 1:

data = [1, 2, 3, 4, 5]  # usually a sequence
series_data = Series(data)  # with no index given, the index defaults to 0, 1, 2, ...
Method 2:

indexes = ['name', 'shuxue', 'yuwen', 'huaxue', 'yingyu']
series_data = Series(['lizhen', 1, 2, 3, 4], index=indexes)  # uses the given labels as the index; the index and the values must have the same length

Method 3:

data = {'huaxue': 3, 'name': 'lizhen', 'shuxue': 1, 'yingyu': 4, 'yuwen': 2}
series_from_dict = Series(data)

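View the index: series_data.index

Modify a value by index label: series_data['shuxue'] = 3

View all values: series_data.values

Set the name of the index: series_data.index.name = 'type'

Look up the value for an index label: series_data['yuwen']

Get the values for several index labels: series_data[['yingyu', 'yuwen']]

Export the data to a given format (dict, clipboard, csv, json, string, sql):

  series_from_dict.to_dict()

Adding two Series:

Values are added where the index labels match; a label that appears in only one Series yields NaN in the result.

This is only meaningful when the values are numeric.

Check whether an index label exists:

  index_name in series_data  # returns True or False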
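The index alignment described for Series addition can be sketched as below. The labels and values are made up for illustration; add() with fill_value is shown as one way to avoid the NaN results.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Labels are aligned first; "x" and "w" exist on one side only, so they become NaN.
total = a + b

# add() with fill_value treats a missing label as 0 instead.
filled = a.add(b, fill_value=0)

print(total)
print(filled)
```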

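# DataFrame: similar to a table or spreadsheet

Initialize it with a dict of equal-length lists or numpy arrays; an index is added automatically and the columns are laid out in order.

Method 1:

data = {'state': ['Ohio','Ohio','Ohio'],
        'year': [2000,2001,2002],
        'pop': [1.5,1.7,3.6]
       }

frame = DataFrame(data)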

Method 2:

data = {'state': ['Ohio','Ohio','Ohio'],
        'year': [2000,2001,2002],
        'pop': [1.5,1.7,3.6]
       }

frame = DataFrame(data, columns=['year','state','pop','debt'], index=['one','two','three'])

# the data is displayed in the order given by columns

# a requested column that is not in the data is filled with NaN

Method 3:

data = {'Nevada': {2001: 2.4, 2002: 2.9},
        'Ohio': {2000: 1.5, 2001: 1.7, 2002: 2.4},
       }
frame = DataFrame(data)

# outer keys become column names and inner keys become index labels; where an inner key is missing, the column is filled with NaN
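Set the name of the index: frame.index.name = 'self_index_name'
Set the name of the columns: frame.columns.name = 'self_columns_name'
View all values: frame.values
View all column names: frame.columns
View the values of one column: frame[column_name] or frame.column_name
View the first n rows: frame.head(n)
View the last n rows: frame.tail(n)
View the rows for given index labels: frame.loc[[index_name1[, index_name2]]] (the older frame.ix accessor has been removed from pandas)
Modify the values of a column: frame['column_name'] = 'new_value'
Note: a single value is broadcast to every row.
When several values are given, their number must equal the number of rows in frame.
The value may be a Series; it is aligned on the index labels of frame, and labels without a match receive NaN.
Delete an unwanted column: del frame['column_name']
Note: individual index labels cannot be modified in place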
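The column assignment rules listed above can be sketched as follows. The frame reuses the Ohio data from the earlier examples; the debt and extra column values are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame(
    {"state": ["Ohio", "Ohio", "Ohio"], "year": [2000, 2001, 2002]},
    index=["one", "two", "three"],
)

# A single value is broadcast to every row.
frame["debt"] = 16.5

# A list must have exactly as many items as the frame has rows.
frame["pop"] = [1.5, 1.7, 3.6]

# A Series is aligned on index labels: "two" matches, "four" is ignored,
# and the unmatched rows "one" and "three" receive NaN.
frame["extra"] = pd.Series([-1.2, -1.5], index=["two", "four"])

# del removes a column in place.
del frame["debt"]

print(frame)
```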

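When working with pandas DataFrames you often need to handle string properties, such as checking whether a column contains certain keywords or whether its values are shorter than 3 characters. Knowing the methods built into the str accessor makes this much easier.

Below is a detailed look at the methods available through the str accessor of the Series class.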

1. cat(): concatenate strings
Example:
>>> Series(['a', 'b', 'c']).str.cat(['A', 'B', 'C'], sep=',')
0 a,A
1 b,B
2 c,C
dtype: object
>>> Series(['a', 'b', 'c']).str.cat(sep=',')
'a,b,c'
>>> Series(['a', 'b']).str.cat([['x', 'y'], ['1', '2']], sep=',')
0 a,x,1
1 b,y,2
dtype: object

2. split(): split strings
>>> import numpy,pandas;
>>> s = pandas.Series(['a_b_c', 'c_d_e', numpy.nan, 'f_g_h'])
>>> s.str.split('_')
0 [a, b, c]
1 [c, d, e]
2 NaN
3 [f, g, h]
dtype: object
>>> s.str.split('_', n=-1)
0 [a, b, c]
1 [c, d, e]
2 NaN
3 [f, g, h]
dtype: object
>>> s.str.split('_', n=0)
0 [a, b, c]
1 [c, d, e]
2 NaN
3 [f, g, h]
dtype: object
>>> s.str.split('_', n=1)
0 [a, b_c]
1 [c, d_e]
2 NaN
3 [f, g_h]
dtype: object
>>> s.str.split('_', n=2)
0 [a, b, c]
1 [c, d, e]
2 NaN
3 [f, g, h]
dtype: object
>>> s.str.split('_', n=3)
0 [a, b, c]
1 [c, d, e]
2 NaN
3 [f, g, h]
dtype: object

3. get(): get the character at the given position

>>> s.str.get(0)
0 a
1 c
2 NaN
3 f
dtype: object
>>> s.str.get(1)
0 _
1 _
2 NaN
3 _
dtype: object
>>> s.str.get(2)
0 b
1 d
2 NaN
3 g
dtype: object

4. join(): join the characters of each string with the given separator; rarely used

>>> s.str.join("!")
0 a!_!b!_!c
1 c!_!d!_!e
2 NaN
3 f!_!g!_!h
dtype: object
>>> s.str.join("?")
0 a?_?b?_?c
1 c?_?d?_?e
2 NaN
3 f?_?g?_?h
dtype: object
>>> s.str.join(".")
0 a._.b._.c
1 c._.d._.e
2 NaN
3 f._.g._.h
dtype: object

5. contains(): whether the string contains the given pattern

>>> s.str.contains('d')
0 False
1 True
2 NaN
3 False
dtype: object

6. replace(): replace

>>> s.str.replace("_", ".")
0 a.b.c
1 c.d.e
2 NaN
3 f.g.h
dtype: object

7. repeat(): repeat

>>> s.str.repeat(3)
0 a_b_ca_b_ca_b_c
1 c_d_ec_d_ec_d_e
2 NaN
3 f_g_hf_g_hf_g_h
dtype: object

8. pad(): pad on the left or right

>>> s.str.pad(10, fillchar="?")
0 ?????a_b_c
1 ?????c_d_e
2 NaN
3 ?????f_g_h
dtype: object
>>>
>>> s.str.pad(10, side="right", fillchar="?")
0 a_b_c?????
1 c_d_e?????
2 NaN
3 f_g_h?????
dtype: object

9. center(): pad on both sides so the string is centered; see the example
>>> s.str.center(10, fillchar="?")
0 ??a_b_c???
1 ??c_d_e???
2 NaN
3 ??f_g_h???
dtype: object

10. ljust(): pad on the right; see the example

>>> s.str.ljust(10, fillchar="?")
0 a_b_c?????
1 c_d_e?????
2 NaN
3 f_g_h?????
dtype: object

11. rjust(): pad on the left; see the example

>>> s.str.rjust(10, fillchar="?")
0 ?????a_b_c
1 ?????c_d_e
2 NaN
3 ?????f_g_h
dtype: object

12. zfill(): pad on the left with zeros

>>> s.str.zfill(10)
0 00000a_b_c
1 00000c_d_e
2 NaN
3 00000f_g_h
dtype: object

13. wrap(): insert a line break at the given width

>>> s.str.wrap(3)
0 a_b\n_c
1 c_d\n_e
2 NaN
3 f_g\n_h
dtype: object

14. slice(): cut each string at the given start and end positions
>>> s.str.slice(1,3)
0 _b
1 _d
2 NaN
3 _g
dtype: object

15. slice_replace(): replace the characters at the given positions with the given string
>>> s.str.slice_replace(1, 3, "?")
0 a?_c
1 c?_e
2 NaN
3 f?_h
dtype: object
>>> s.str.slice_replace(1, 3, "??")
0 a??_c
1 c??_e
2 NaN
3 f??_h
dtype: object

16. count(): count the occurrences of the given pattern
>>> s.str.count("a")
0 1
1 0
2 NaN
3 0
dtype: float64

17. startswith(): whether the string starts with the given string
>>> s.str.startswith("a");
0 True
1 False
2 NaN
3 False
dtype: object

18. endswith(): whether the string ends with the given string
>>> s.str.endswith("e");
0 False
1 True
2 NaN
3 False
dtype: object

19. findall(): find all substrings matching the regular expression, returned as a list
>>> s.str.findall("[a-z]");
0 [a, b, c]
1 [c, d, e]
2 NaN
3 [f, g, h]
dtype: object

20. match(): whether the string matches the given string or regular expression, anchored at the start of the string
>>> s
0 a_b_c
1 c_d_e
2 NaN
3 f_g_h
dtype: object
>>> s.str.match("[d-z]");
0 False
1 False
2 NaN
3 True
dtype: object

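21. extract(): extract the matched text; wrap the part you want in parentheses (recent pandas versions return a DataFrame unless expand=False is passed)
>>> s.str.extract("([d-z])", expand=False)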
0 NaN
1 d
2 NaN
3 f
dtype: object

22. len(): length of each string
>>> s.str.len()
0 5
1 5
2 NaN
3 5
dtype: float64

23. strip(): remove leading and trailing whitespace
>>> idx = pandas.Series([' jack', 'jill ', ' jesse ', 'frank'])
>>> idx.str.strip()
0 jack
1 jill
2 jesse
3 frank
dtype: object

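24. rstrip(): remove trailing whitespace

25. lstrip(): remove leading whitespace

26. partition(): split each string into a DataFrame with exactly three parts: the text before the separator, the separator itself, and the text after it

27. rpartition(): the same, splitting at the last occurrence of the separator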
>>> s.str.partition('_')
0 1 2
0 a _ b_c
1 c _ d_e
2 NaN NaN NaN
3 f _ g_h
>>> s.str.rpartition('_')
0 1 2
0 a_b _ c
1 c_d _ e
2 NaN NaN NaN
3 f_g _ h

28. lower(): convert to lowercase
29. upper(): convert to uppercase
30. find(): position of the given substring, searching from the left (-1 if it is absent)
>>> s.str.find('d')
0 -1
1 2
2 NaN
3 -1
dtype: float64

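31. rfind(): position of the given substring, searching from the right

32. index(): position of the given substring; note that it raises an error if the substring is not found!

33. rindex(): position of the given substring, searching from the right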

>>> s.str.index('_')
0 1
1 1
2 NaN
3 1
dtype: float64
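The difference between find() and index() noted above can be sketched like this, using the same sample Series s as the surrounding examples. That ValueError is the exception raised is how pandas mirrors Python's own str.index.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a_b_c", "c_d_e", np.nan, "f_g_h"])

# find() returns -1 for strings that do not contain the substring.
found = s.str.find("d")

# index() raises instead, because "d" is missing from the first string.
try:
    s.str.index("d")
    raised = False
except ValueError:
    raised = True

print(found)
print(raised)
```

find() is the safer choice when some rows may not contain the substring.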
34. capitalize(): uppercase the first character
>>> s.str.capitalize()
0 A_b_c
1 C_d_e
2 NaN
3 F_g_h
dtype: object
35. swapcase(): swap upper and lower case
>>> s.str.swapcase()
0 A_B_C
1 C_D_E
2 NaN
3 F_G_H
dtype: object
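36. normalize(): apply Unicode normalization to the strings; rarely needed in data analysis, so it is not covered here

37. isalnum(): whether the string consists only of letters and digits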

>>> s.str.isalnum()
0 False
1 False
2 NaN
3 False
dtype: object

38. isalpha(): whether the string consists only of letters

>>> s.str.isalpha()
0 False
1 False
2 NaN
3 False
dtype: object

39. isdigit(): whether the string consists only of digits

>>> s.str.isdigit()
0 False
1 False
2 NaN
3 False
dtype: object

40. isspace(): whether the string consists only of whitespace

>>> s.str.isspace()
0 False
1 False
2 NaN
3 False
dtype: object

41. islower(): whether all letters are lowercase

42. isupper(): whether all letters are uppercase

>>> s.str.islower()
0 True
1 True
2 NaN
3 True
dtype: object
>>> s.str.isupper()
0 False
1 False
2 NaN
3 False
dtype: object

43. istitle(): whether each word starts with an uppercase letter followed by lowercase letters

>>> s.str.istitle()
0 False
1 False
2 NaN
3 False
dtype: object
44. isnumeric(): whether all characters are numeric
45. isdecimal(): whether all characters are decimal digits

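Selecting column data is a common pandas operation, but a few details of the syntax deserve attention. A summary:

import pandas as pd
data1 = pd.DataFrame(...)  # any DataFrame with 3 columns
data1.columns = ['a', 'b', 'c']

1.
data1['b']
# selects the second column (column b)

2.
data1.b
# same as 1: selects the second column (column b)

# Here b is the column name; it must be one unbroken word with no spaces. If a column name contains spaces, only method 1 works.

3.
data1[data1.columns[1:]]

# selects all data in the second and third columns of data1

Aside 1.
data1[5:10]

# selects rows, not columns: the rows at positions 5 through 9 (the 6th through 10th rows)

Aside 2.
data1[2]
# invalid: raises "KeyError: 2", because 2 is not a column name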

Exporting MySQL data, generating an Excel file with pandas, and sending it by email

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

import MySQLdb
import MySQLdb.cursors
import pandas as pd


# Run a SQL statement and return the result rows
def retsql(sql):
    # DictCursor returns each row as a dict; the database name is optional
    db_user = MySQLdb.connect('IP', 'username', 'password', 'database name',
                              cursorclass=MySQLdb.cursors.DictCursor)
    cursor = db_user.cursor()
    # Set the character set to utf8; otherwise the results may come back garbled
    # even when the database itself is encoded as utf-8
    cursor.execute("SET NAMES utf8;")
    cursor.execute(sql)
    ret = cursor.fetchall()
    db_user.close()
    return ret


# Write the result to an xlsx file and return its path
def retxls(ret, dt):
    file_name = datetime.datetime.now().strftime("/path/to/store/%Y-%m-%d-%H:%M") + dt + ".sql.xlsx"
    dret = pd.DataFrame.from_records(ret)
    # Note: openpyxl may fail while generating the file. The author found that
    # pip install openpyxl==1.8.6 worked; other versions seemed to conflict with pandas.
    dret.to_excel(file_name, sheet_name="Sheet1", engine="openpyxl")
    print("Ok!!! the file in", file_name)
    return file_name


# Send the mail
# Arguments: subject, display text, recipient list, attachment path
def sendm(sub, cttstr, to_list, file):
    msg = MIMEMultipart()
    with open(file, 'rb') as fh:
        att = MIMEApplication(fh.read(), _subtype="octet-stream")
    att.add_header("Content-Disposition", "attachment", filename="sql查询结果.xlsx")

    msg['from'] = 'sender address'
    msg['subject'] = sub
    ctt = MIMEText(cttstr, 'plain', 'utf-8')

    msg.attach(att)
    msg.attach(ctt)
    try:
        server = smtplib.SMTP()
        # server.set_debuglevel(1)  # enable this when debugging a problem
        server.connect("mail.example.com", 25)
        server.starttls()  # needed if the server uses SSL or TLS encryption
        server.login("mailbox username", "password")
        server.sendmail(msg['from'], to_list, msg.as_string())
        server.quit()
        print('ok!!!')
    except Exception as e:
        print(str(e))


# The SQL statement to run
sql = """sql statement"""

# Recipients
to_list = ['test1@example.com',
           'test2@example.com']

# Run the SQL and keep the result in ret
ret = retsql(sql)

# Keep the path of the generated file in retfile
retfile = retxls(ret, "1")

# Send the mail, with the SQL statement as its content
sendm(sql, sql, to_list, retfile)

Installing and using IPython, Jupyter Notebook, and matplotlib with Python

The following installs and configures each component step by step:
python 3.5.2, ipython 5.1.0, jupyter notebook, matplotlib

1. Install Python 3.5

See the official documentation for details. In the installer, tick the option that sets up the environment variables. https://www.python.org/downloads/windows/

2. Upgrade pip

python -m pip install --upgrade pip

3. Install IPython with pip

pip.exe install ipython

4. Install notebook with pip

pip install notebook

5. Install the plotting library matplotlib

pip install matplotlib

pip install matplotlib --upgrade

6. Examples

import numpy as np

import matplotlib.pyplot as plt

N = 5
menMeans = (20, 35, 30, 35, 27)
menStd = (2, 3, 4, 1, 2)
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(ind, menMeans, width, color='r', yerr=menStd)
womenMeans = (25, 32, 34, 20, 25)
womenStd = (3, 5, 2, 3, 3)
rects2 = ax.bar(ind+width, womenMeans, width, color='y', yerr=womenStd)
# add some
ax.set_ylabel('Scores')
ax.set_title('Scores by group and gender')
ax.set_xticks(ind+width)
ax.set_xticklabels( ('G1', 'G2', 'G3', 'G4', 'G5') )
ax.legend( (rects1[0], rects2[0]), ('Men', 'Women') )
def autolabel(rects):
    # attach some text labels
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x()+rect.get_width()/2., 1.05*height, '%d'%int(height),
                ha='center', va='bottom')

autolabel(rects1)
autolabel(rects2)
plt.show()

import numpy as np
import matplotlib.pyplot as plt
x = np.arange(9)
y = np.sin(x)
plt.plot(x,y)
plt.show()

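import matplotlib.pyplot as plt

plt.bar(x=0, height=1)  # x was named left in older matplotlib versions
plt.show()

We first import matplotlib.pyplot, then call its bar function directly, and finally display the figure with show.

The two bar parameters used here are:

x (named left in older matplotlib versions): the horizontal position of the bar. In those older versions this was the bar's left edge, so a value of 1 put the left edge at x = 1.0; newer versions center the bar on this value by default.

height: the height of the bar, i.e. its value on the y axis.

Both parameters accept either a single value (one bar) or a tuple (several bars). For example: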

import matplotlib.pyplot as plt

plt.bar(x=(0, 1), height=(1, 0.5))  # x was named left in older matplotlib versions
plt.show()

Here x = (0, 1) means there are two bars, the first positioned at 0 and the second at 1. The height parameter works the same way.
You may find these two bars too "fat". The width parameter of bar sets their width.

import matplotlib.pyplot as plt

plt.bar(x=(0, 1), height=(1, 0.5), width=0.35)
plt.show()

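Next requirement: labeling the x and y axes. Say the x axis is gender and the y axis is head count. This is also simple:

import matplotlib.pyplot as plt

plt.xlabel(u'性别')  # "gender"
plt.ylabel(u'人数')  # "head count"

plt.bar(x=(0, 1), height=(1, 0.5), width=0.35)  # x was named left in older matplotlib versions

plt.show()

Note that the Chinese strings here must use the u prefix (apparently unnecessary in Python 3; I used 2.7), because matplotlib only accepts unicode. Next, let's label each bar on the x axis: the first is "男" (male) and the second is "女" (female).

import matplotlib.pyplot as plt

plt.xlabel(u'性别')
plt.ylabel(u'人数')

plt.xticks((0, 1), (u'男', u'女'))

plt.bar(x=(0, 1), height=(1, 0.5), width=0.35)

plt.show()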

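plt.xticks works much like the position and height arguments of bar: for n bars, pass tuples of n elements. The first tuple gives the tick positions and the second the label text. When bars are aligned by their left edge, as in older matplotlib versions, the labels end up offset; ideally they sit at the middle of each bar. You could change (0, 1) to (0 + 0.35/2, 1 + 0.35/2), but that is tedious. Passing align="center" to bar centers the bars on the ticks directly (newer matplotlib versions do this by default).

import matplotlib.pyplot as plt

plt.xlabel(u'性别')
plt.ylabel(u'人数')

plt.xticks((0, 1), (u'男', u'女'))

plt.bar(x=(0, 1), height=(1, 0.5), width=0.35, align="center")  # x was named left in older matplotlib versions

plt.show()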

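Next, we can give the chart a title, and a legend as well:

import matplotlib.pyplot as plt

plt.xlabel(u'性别')
plt.ylabel(u'人数')

plt.title(u"性别比例分析")  # "gender ratio analysis"

plt.xticks((0, 1), (u'男', u'女'))

rect = plt.bar(x=(0, 1), height=(1, 0.5), width=0.35, align="center")  # x was named left in older matplotlib versions

plt.legend((rect,), (u"图例",))  # "legend"

plt.show()

Note that the arguments to legend must be tuples, even when there is only one legend entry; otherwise it is not displayed correctly.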

Next, we can annotate each bar with its y value. This uses a general-purpose helper:

def autolabel(rects):
    for rect in rects:
        height = rect.get_height()
        plt.text(rect.get_x()+rect.get_width()/2., 1.03*height, '%s' % float(height))

The arguments to plt.text are the x coordinate, the y coordinate, and the text to display. The calling code is:

import matplotlib.pyplot as plt

def autolabel(rects):
    for rect in rects:
        height = rect.get_height()
        plt.text(rect.get_x()+rect.get_width()/2., 1.03*height, '%s' % float(height))

plt.xlabel(u'性别')
plt.ylabel(u'人数')

plt.title(u"性别比例分析")

plt.xticks((0, 1), (u'男', u'女'))

rect = plt.bar(x=(0, 1), height=(1, 0.5), width=0.35, align="center")  # x was named left in older matplotlib versions

plt.legend((rect,), (u"图例",))

autolabel(rect)

plt.show()

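Every component of a chart drawn by matplotlib corresponds to an object. Their properties can be set by calling the objects' set_*() methods or the pyplot function setp().

Because matplotlib is an object-oriented plotting library, the properties of these objects can also be read directly.

Configuration file

Drawing a figure involves configuring many object properties, such as colors, fonts, and line styles. We did not set each of these when plotting; most simply used matplotlib's defaults.

matplotlib keeps these defaults in a configuration file named "matplotlibrc"; editing that file changes the default style of charts. The file can be read with rc_params(), which returns a configuration dictionary. When the matplotlib module is loaded it calls rc_params() and stores the result in the rcParams variable, and matplotlib then draws using the settings in rcParams. Users can modify entries in this dictionary directly, and the changes apply to plot elements created afterwards.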
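Changing rcParams at run time can be sketched as follows. The two keys used (lines.linewidth and lines.linestyle) are standard rcParams entries; the Agg backend and the sample data are assumptions made only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Change two defaults; only elements created after this point are affected.
plt.rcParams["lines.linewidth"] = 3.0
plt.rcParams["lines.linestyle"] = "--"

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2], [0, 1, 4])  # picks up the new defaults

print(line.get_linewidth(), line.get_linestyle())

# Restore matplotlib's built-in defaults for later plots.
plt.rcdefaults()
```

Editing the matplotlibrc file has the same effect permanently, while rcParams changes last only for the current session.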

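Drawing multiple subplots (quick plotting)

The commonly used matplotlib classes nest as Figure -> Axes -> (Line2D, Text, etc.). A Figure can contain several subplots; matplotlib represents each plotting region with an Axes object, which can be thought of as a subplot.

subplot() quickly creates a chart with several subplots. It is called as:

subplot(numRows, numCols, plotNum)

subplot divides the whole plotting area into numRows rows by numCols columns of equal sub-regions and numbers them left to right, top to bottom, starting with 1 at the top left.

If numRows, numCols, and plotNum are all less than 10, they can be written as a single integer: subplot(323) is the same as subplot(3,2,3).

subplot creates an axes object in the region given by plotNum. In the matplotlib version described here, if the new axes overlaps one created earlier, the earlier one is deleted.

subplot() returns the Axes object it creates. Store these in variables, use sca() to make each one the current Axes in turn, and call plot() to draw in it.

Drawing multiple figures (quick plotting)

To work on several figures at once, pass an integer to figure() to identify the Figure object. If a Figure with that number already exists, no new object is created; it simply becomes the current Figure.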

import numpy as np
import matplotlib.pyplot as plt

plt.figure(1)  # create figure 1
plt.figure(2)  # create figure 2
ax1 = plt.subplot(211)  # create subplot 1 in figure 2
ax2 = plt.subplot(212)  # create subplot 2 in figure 2

x = np.linspace(0, 3, 100)

for i in range(5):
    plt.figure(1)  # select figure 1
    plt.plot(x, np.exp(i*x/3))
    plt.sca(ax1)  # select subplot 1 of figure 2
    plt.plot(x, np.sin(i*x))
    plt.sca(ax2)  # select subplot 2 of figure 2
    plt.plot(x, np.cos(i*x))

plt.show()

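Displaying Chinese text in charts

The fonts used by matplotlib's default configuration cannot display Chinese correctly. There are several ways to fix this:

Specify the font directly in the program.
Modify the rcParams configuration dictionary at the start of the program.
Edit the configuration file.
A fairly simple approach is to write Chinese strings as unicode, for example u'测试中文显示', save the source file as utf-8, and add the line "# coding = utf-8".

The Chinese display problem in matplotlib output images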

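Object-oriented plotting

The matplotlib API has three layers. The Artist layer handles all the high-level constructs, such as drawing and laying out figures, text, and lines. Usually we only deal with Artists and need not care about the low-level drawing details.

The standard workflow for building a chart directly from Artists is:

Create a Figure object

Use the Figure object to create one or more Axes or Subplot objects

Call methods of the Axes and similar objects to create the various simple Artists

import matplotlib.pyplot as plt

X1 = range(0, 50)
Y1 = [num**2 for num in X1]  # y = x^2
X2 = [0, 1]
Y2 = [0, 1]  # y = x

Fig = plt.figure(figsize=(8,4))  # Create a `figure' instance
Ax = Fig.add_subplot(111)  # Create a `axes' instance in the figure
Ax.plot(X1, Y1, X2, Y2)  # Create a Line2D instance in the axes

Fig.show()
Fig.savefig("test.pdf")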

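matplotlib also provides a module named pylab, which bundles many commonly used functions from NumPy and pyplot so that users can compute and plot quickly; it suits the interactive IPython environment well. Here pylab is loaded as follows: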

>>> import pylab as pl

1 Installing numpy and matplotlib

>>> import numpy

>>> numpy.__version__

>>> import matplotlib

>>> matplotlib.__version__

2 Two common plot types: line and scatter plots (the plot() command) and histograms (the hist() command)

2.1 Line and scatter plots

2.1.1 Line plots (a line connecting a set of x and y values)

import numpy as np

import pylab as pl

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

pl.plot(x, y)

pl.show()

2.1.2 Scatter plots

Change pl.plot(x, y) to pl.plot(x, y, 'o'); this gives the blue-marker version of the plot.

2.2 Making things look pretty

2.2.1 Changing the line color

Red: change pl.plot(x, y, 'o') to pl.plot(x, y, 'or')

2.2.2 Changing the line style

Dashed line: plot(x, y, '--')

2.2.3 Changing the marker style

Blue star markers: plot(x, y, 'b*')

2.2.4 Plot and axis titles and limits

import numpy as np
import pylab as pl

x = [1, 2, 3, 4, 5]  # Make an array of x values
y = [1, 4, 9, 16, 25]  # Make an array of y values for each x value

pl.plot(x, y)  # use pylab to plot x and y

pl.title('Plot of y vs. x')  # give plot a title
pl.xlabel('x axis')  # make axis labels
pl.ylabel('y axis')

pl.xlim(0.0, 7.0)  # set axis limits
pl.ylim(0.0, 30.)

pl.show()  # show the plot on the screen

2.2.5 Plotting more than one plot on the same set of axes

This is straightforward: just draw the plots one after another.

import numpy as np
import pylab as pl

x1 = [1, 2, 3, 4, 5]  # Make x, y arrays for each graph
y1 = [1, 4, 9, 16, 25]
x2 = [1, 2, 4, 6, 8]
y2 = [2, 4, 8, 12, 16]

pl.plot(x1, y1, 'r')  # use pylab to plot x and y
pl.plot(x2, y2, 'g')

pl.title('Plot of y vs. x')  # give plot a title
pl.xlabel('x axis')  # make axis labels
pl.ylabel('y axis')

pl.xlim(0.0, 9.0)  # set axis limits
pl.ylim(0.0, 30.)

pl.show()  # show the plot on the screen

2.2.6 Figure legends

pl.legend((plot1, plot2), ('label1', 'label2'), loc='best', numpoints=1)

The loc argument sets where the legend is placed: 'best', 'upper right', 'upper left', 'center', 'lower left', 'lower right'.

If a label was already given when plotting in the current figure, e.g. plt.plot(x, z, label="cos(x2)"), simply call plt.legend().

import numpy as np
import pylab as pl

x1 = [1, 2, 3, 4, 5]  # Make x, y arrays for each graph
y1 = [1, 4, 9, 16, 25]
x2 = [1, 2, 4, 6, 8]
y2 = [2, 4, 8, 12, 16]

# plot() returns a list of lines; unpack the single line so it can be passed to legend()
plot1, = pl.plot(x1, y1, 'r')  # use pylab to plot x and y : Give your plots names
plot2, = pl.plot(x2, y2, 'go')

pl.title('Plot of y vs. x')  # give plot a title
pl.xlabel('x axis')  # make axis labels
pl.ylabel('y axis')

pl.xlim(0.0, 9.0)  # set axis limits
pl.ylim(0.0, 30.)

pl.legend([plot1, plot2], ('red line', 'green circles'), loc='best', numpoints=1)  # make legend

pl.show()  # show the plot on the screen

2.3 Histograms

import numpy as np
import pylab as pl

# make an array of random numbers with a gaussian distribution with
# mean = 5.0
# rms = 3.0
# number of points = 1000
data = np.random.normal(5.0, 3.0, 1000)

# make a histogram of the data array
pl.hist(data)

# make plot labels
pl.xlabel('data')

pl.show()

If you do not want the black outlines, use pl.hist(data, histtype='stepfilled') instead.

2.3.1 Setting the width of the histogram bins manually

Add these two lines:

bins = np.arange(-5., 16., 1.)  # a floating-point version of range

pl.hist(data, bins, histtype='stepfilled')

3 Plotting more than one axis per canvas

To draw several figures at once, pass an integer to figure to identify the figure. If a plotting object with that
number already exists, no new object is created; it simply becomes the current plotting object.

fig1 = pl.figure(1)
pl.subplot(211)

subplot(211) divides the plotting area into 2 rows by 1 column, two regions in all, and creates an axes object in region 1 (the upper one). pl.subplot(212) creates an axes object in region 2 (the lower one).

import numpy as np
import pylab as pl

# Use numpy to load the data contained in the file
# 'fakedata.txt' into a 2-D array called data
data = np.loadtxt('fakedata.txt')

# plot the first column as x, and second column as y
pl.plot(data[:,0], data[:,1], 'ro')
pl.xlabel('x')
pl.ylabel('y')
pl.xlim(0.0, 10.)
pl.show()

4.2 Writing data to a text file

There are many ways to write files. Only one workable way of writing a text file is shown here; see the official documentation for more.

import numpy as np
# Let's make 2 arrays (x, y) which we will write to a file
# x is an array containing numbers 0 to 10, with intervals of 1
x = np.arange(0.0, 10., 1.)

# y is an array containing the values in x, squared
y = x*x
print('x = ', x)
print('y = ', y)

# Now open a file to write the data to
# 'w' means open for 'writing'
file = open('testdata.txt', 'w')
# loop over each line you want to write to file
for i in range(len(x)):
    # make a string for each line you want to write
    # '\t' means 'tab'
    # '\n' means 'newline'
    # 'str()' means you are converting the quantity in brackets to a string type
    txt = str(x[i]) + '\t' + str(y[i]) + '\n'
    # write the txt to the file
    file.write(txt)
# Close your file
file.close()

Example figure 1

import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import matplotlib.pyplot as plt

# Example data
people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
y_pos = np.arange(len(people))
performance = 3 + 10 * np.random.rand(len(people))
error = np.random.rand(len(people))

#barh(bottom, width, height=0.8, left=0, **kwargs)
plt.barh(y_pos, performance, xerr=error, height=0.8,align='center',alpha=0.4)
plt.yticks(y_pos, people)
plt.xlabel('Performance')
plt.title('How fast do you want to go today?')

plt.show()

Example figure 2

import numpy as np
import matplotlib.pyplot as plt
import pylab
from matplotlib.ticker import MaxNLocator

grade = 2
day = '2014-06-22' # Today in this year

numTests = 5
testNames = ['swap','memory', '/project', '/backup', '/root']
testMeta = ['', '', '', '','']
scores = [98,79, 39, 92,17]
lastweek_scores = ['97%','35%','86%','21%','70%']
#rankings = np.round(np.random.uniform(0, 1, numTests)*100, 0)
rankings = 3 + 10 * np.random.rand(numTests)

fig, ax1 = plt.subplots(figsize=(9, 7))
plt.subplots_adjust(left=0.115, right=0.88)
fig.canvas.manager.set_window_title('Usage Chart')  # recent matplotlib versions expose this on the canvas manager
pos = np.arange(numTests)+0.5 # Center bars on the Y-axis ticks
rects = ax1.barh(pos, scores, align='center', height=0.5, color='m')

ax1.axis([0, 100, 0, 5])
pylab.yticks(pos, testNames)
ax1.set_title('Server 18.32 Usage Chart')
plt.text(50, -0.5, 'date: ' + day,
horizontalalignment='center', size='small')

# Set the right-hand Y-axis ticks and labels and set X-axis tick marks at the
# deciles
ax2 = ax1.twinx()
ax2.plot([100, 100], [0, 5], 'white', alpha=0.1)
ax2.xaxis.set_major_locator(MaxNLocator(11))
xticks = pylab.setp(ax2, xticklabels=['0', '10', '20', '30', '40', '50', '60',
'70', '80', '90', '100'])
ax2.xaxis.grid(True, linestyle='--', which='major', color='grey',
alpha=0.25)
#Plot a solid vertical gridline to highlight the median position
plt.plot([50, 50], [0, 5], 'grey', alpha=0.25)

# Build up the score labels for the right Y-axis by first appending a carriage
# return to each string and then tacking on the appropriate meta information
# (i.e., 'laps' vs 'seconds'). We want the labels centered on the ticks, so if
# there is no meta info (like for pushups) then don't add the carriage return to
# the string

def withnew(i, scr):
    if testMeta[i] != '':
        return '%s\n' % scr
    else:
        return scr

scoreLabels = [withnew(i, scr) for i, scr in enumerate(lastweek_scores)]
scoreLabels = [i+j for i, j in zip(scoreLabels, testMeta)]
# set the tick locations
ax2.set_yticks(pos)
# set the tick labels
ax2.set_yticklabels(scoreLabels)
# make sure that the limits are set equally on both yaxis so the ticks line up
ax2.set_ylim(ax1.get_ylim())

ax2.set_ylabel("Last Week's data",color='sienna')
#Make list of numerical suffixes corresponding to position in a list
# 0 1 2 3 4 5 6 7 8 9
suffixes = ['%', '%', '%', '%', '%', '%', '%', '%', '%', '%']
ax2.set_xlabel('Percentile Ranking Across ' + suffixes[grade]
+ ' Grade ' + 's')

# Lastly, write in the ranking inside each bar to aid in interpretation
for rect in rects:
    # Rectangle widths are already integer-valued but are floating
    # type, so it helps to remove the trailing decimal point and 0 by
    # converting width to int type
    width = int(rect.get_width())

    # Figure out what the last digit (width modulo 10) so we can add
    # the appropriate numerical suffix (e.g., 1st, 2nd, 3rd, etc)
    lastDigit = width % 10
    # Note that 11, 12, and 13 are special cases
    if (width == 11) or (width == 12) or (width == 13):
        suffix = 'th'
    else:
        suffix = suffixes[lastDigit]

    rankStr = str(width) + suffix
    if (width < 5):  # The bars aren't wide enough to print the ranking inside
        xloc = width + 1  # Shift the text to the right side of the right edge
        clr = 'black'  # Black against white background
        align = 'left'
    else:
        xloc = 0.98*width  # Shift the text to the left side of the right edge
        clr = 'white'  # White on magenta
        align = 'right'

    # Center the text vertically in the bar
    yloc = rect.get_y()+rect.get_height()/2.0
    ax1.text(xloc, yloc, rankStr, horizontalalignment=align,
             verticalalignment='center', color=clr, weight='bold')

plt.show()

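Using Python with matplotlib to chart SVN commit counts

Install the required dependencies

yum install -y numpy matplotlib

matplotlib.pyplot is a collection of command-style functions that make matplotlib work in a way similar to MATLAB. Each pyplot function makes some change to a figure: creating a new figure, creating a plotting area in a figure, drawing lines in a plotting area, adding labels, and so on. matplotlib.pyplot is stateful: it keeps track of the current figure and plotting area, and each new plotting function acts on that current state.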

import matplotlib.pyplot as plt

plt.plot([1,2,3,4])

plt.ylabel('some numbers')

plt.show()

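In the figure above the x axis runs from 0 to 3 and the y axis from 1 to 4. If you give plot() a single list or array, matplotlib treats it as a sequence of y values and generates the x values automatically. Because Python counts from 0, the x vector has the same length as the y vector (4 here) but starts at 0, so the x values are [0,1,2,3].

You can also pass several sequences (tuples or lists) to plt.plot(). Each pair of sequences is an x, y vector pair that forms one curve, so a single figure can hold several curves.

To tell the curves in one figure apart, you can give each x, y pair a format argument that sets how the curve is drawn. The default is 'b-', a solid blue line. To draw the curve as red circles instead: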

import matplotlib.pyplot as plt

plt.plot([1,2,3,4],[1,4,9,16],'ro')

plt.axis([0,6,0,20])

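axis() takes an argument of the form [xmin, xmax, ymin, ymax], which sets the ranges of the x and y axes.

matplotlib accepts not only sequences (lists and tuples) as arguments but also numpy arrays. In fact, all sequences are converted to numpy arrays internally.

import numpy as np
import matplotlib.pyplot as plt
t = np.arange(0., 5., 0.2)
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')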

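Controlling line properties

Lines have many properties that can be set: line width, dash style, antialiasing, and so on. There are several ways to set them:

1. Use keyword arguments:

plt.plot(x, y, linewidth=2.0)

2. Use the setter methods of a Line2D instance. plot() returns a list of lines, for example line1, line2 = plot(x1, y1, x2, y2). Once we have the lines returned by plot(), we set their properties with setter methods.

line, = plt.plot(x, y, '-')

line.set_antialiased(False)  # turn off antialiasing

3. Use the setp() command:

lines = plt.plot(x1, y1, x2, y2)

plt.setp(lines, color='r', linewidth=2.0)

plt.setp(lines, 'color', 'r', 'linewidth', 2.0)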
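The three approaches can be combined into one runnable sketch. The data values and the Agg backend are assumptions chosen only so the snippet runs without a display; the calls themselves are the ones listed above.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]
fig, ax = plt.subplots()

# 1. Keyword argument at plot time.
(line1,) = ax.plot(x, y, linewidth=2.0)

# 2. Setter method on the returned Line2D object.
(line2,) = ax.plot(x, y[::-1], "-")
line2.set_antialiased(False)  # turn off antialiasing

# 3. setp() applies the same properties to several lines at once.
lines = ax.plot(x, x, x, [2 * v for v in x])
plt.setp(lines, color="r", linewidth=2.0)

print(line1.get_linewidth(), line2.get_antialiased(), [l.get_color() for l in lines])
```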

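Working with multiple figures and axes

MATLAB and pyplot both have the concept of a current figure and a current axes. All plotting commands apply to the current axes.

The function gca() returns the current axes, and gcf() returns the current figure.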
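A short sketch of how the "current" axes and figure change as subplots are created. `plt.sca` is used here to switch the current axes explicitly; the Agg backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig = plt.figure()
top = plt.subplot(211)
bottom = plt.subplot(212)  # created last, so it becomes the current axes

current_axes_is_bottom = plt.gca() is bottom
current_fig_is_fig = plt.gcf() is fig

plt.sca(top)  # make the top subplot current again
current_axes_is_top = plt.gca() is top
print(current_axes_is_bottom, current_fig_is_fig, current_axes_is_top)
```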

import numpy as np
import matplotlib.pyplot as plt

def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.1)

t2 = np.arange(0.0, 5.0, 0.02)

plt.figure(1)

plt.subplot(211)

plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k')

plt.subplot(212)

plt.plot(t2, np.cos(2*np.pi*t2), 'r--')

The figure() command is optional here because figure(1) is created by default, just as subplot(111) is created by default.

The subplot() command takes numrows, numcols, fignum, where fignum ranges from 1 to numrows*numcols. If numrows*numcols is less than 10, the commas in the subplot() call are optional, so subplot(2,1,1) is exactly the same as subplot(211).
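To illustrate that the two spellings describe the same grid cell, the sketch below creates each form in its own figure and compares the resulting axes positions. The figure and variable names are illustrative, and the Agg backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig_a = plt.figure()
ax_a = plt.subplot(2, 1, 1)  # comma form

fig_b = plt.figure()
ax_b = plt.subplot(211)      # compact three-digit form

# Both axes occupy the same rectangle (left, bottom, width, height)
bounds_a = ax_a.get_position().bounds
bounds_b = ax_b.get_position().bounds

# In a 2x2 grid the valid indices run from 1 to numrows*numcols = 4
fig_c = plt.figure()
grid = [plt.subplot(2, 2, k) for k in range(1, 5)]
print(bounds_a, bounds_b, len(fig_c.axes))
```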

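If you want to place an axes manually rather than on a rectangular grid, use the axes() command in the form axes([left,bottom,width,height]), where every value is a fraction of the figure size between 0 and 1.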
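A minimal sketch of manual placement: a large axes with a smaller inset in the upper right. The rectangle values are arbitrary examples, and the Agg backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig = plt.figure()

# (left, bottom, width, height) as fractions of the figure size
main = plt.axes((0.1, 0.1, 0.8, 0.8))
inset = plt.axes((0.55, 0.55, 0.3, 0.3))

main.plot([0, 1, 2], [0, 1, 4])
inset.plot([0, 1, 2], [4, 1, 0])

inset_bounds = inset.get_position().bounds
print(inset_bounds)
```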

You can create several figures with multiple figure() calls, and each figure can contain several axes and subplots:

import matplotlib.pyplot as plt
plt.figure(1) # the first figure
plt.subplot(211) # the first subplot in the first figure
plt.plot([1,2,3])
plt.subplot(212) # the second subplot in the first figure
plt.plot([4,5,6])

plt.figure(2)     # a second figure

plt.plot([4,5,6]) # creates a subplot(111) by default

plt.figure(1) # figure 1 current; subplot(212) still current

plt.subplot(211) # make subplot(211) in figure1 current

plt.title('Easy as 1,2,3') # subplot 211 title

You can clear the current figure with clf() and the current axes with cla().

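If you create many figures, you need to call close() explicitly to release the memory a figure uses. Merely closing the window shown on screen does not free that memory.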
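The effect of cla(), clf() and close() can be observed by counting lines, axes and open figures. This is a sketch under the assumption of a headless Agg backend; the counter names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean state

fig1 = plt.figure()
ax = fig1.add_subplot(111)
ax.plot([1, 2, 3])
lines_before = len(ax.lines)

plt.cla()                      # clear the current axes
lines_after_cla = len(ax.lines)

plt.clf()                      # clear the current figure
axes_after_clf = len(fig1.axes)

fig2 = plt.figure()
open_before = len(plt.get_fignums())

plt.close(fig1)                # release each figure explicitly
plt.close(fig2)
open_after = len(plt.get_fignums())
print(lines_before, lines_after_cla, axes_after_clf, open_before, open_after)
```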

Working with text

The text() command adds text at an arbitrary position, while xlabel(), ylabel() and title() add text to the X axis, the Y axis and the title.

import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)  # density=True replaces the removed normed=1

plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)

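Every text() command returns a matplotlib.text.Text instance. As with lines earlier, you can customize the text properties by passing keyword arguments to the text functions or by using setp().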

t=plt.xlabel('my data',fontsize=14,color='red')
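The same result can be reached after the fact with setp() or a setter on the returned Text instance. A minimal sketch, assuming a headless Agg backend; the strings are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# set_xlabel returns the Text instance of the label
label = ax.set_xlabel("my data")
plt.setp(label, fontsize=14, color="red")

# Text objects also expose individual setter methods
note = ax.text(0.5, 0.5, "note")
note.set_fontweight("bold")

print(label.get_fontsize(), label.get_color(), note.get_fontweight())
```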
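Using mathematical expressions in text

matplotlib accepts TeX expressions in any text.

A TeX expression is enclosed in a pair of dollar signs. For example, the expression σ_i=15 is written as follows (the r prefix makes it a raw string so the backslash is kept):

plt.title(r'$\sigma_i=15$')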
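To confirm that such expressions are parsed by matplotlib's built-in math renderer without a LaTeX installation, the sketch below renders a figure to an in-memory PNG. Rendering to a buffer and the Agg backend are assumptions chosen to keep the example self-contained.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Raw strings keep the backslashes that TeX commands need
title = ax.set_title(r"$\sigma_i=15$")
ax.text(0.1, 0.5, r"$\mu=100,\ \sigma=15$")

# Rendering forces the math text to be parsed; a malformed expression would raise here
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
print(len(png_bytes) > 0)
```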

Plotting the standard normal curve with Python's matplotlib

import math
import pylab as pl
import numpy as np

def gd(x, m, s):
    # Normal density with mean m and standard deviation s, evaluated at x
    left = 1/(math.sqrt(2*math.pi)*s)
    right = math.exp(-math.pow(x-m, 2)/(2*math.pow(s, 2)))
    return left*right

def showfigure():
    x = np.arange(-4, 5, 0.1)
    y = []
    for i in x:
        y.append(gd(i, 0, 1))
    pl.plot(x, y)
    pl.xlim(-4.0, 5.0)
    pl.ylim(-0.2, 0.5)
    # Hide the top and right borders and move the axes through the origin
    ax = pl.gca()
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    ax.xaxis.set_ticks_position('bottom')
    ax.spines['bottom'].set_position(('data', 0))
    ax.yaxis.set_ticks_position('left')
    ax.spines['left'].set_position(('data', 0))
    # Annotate the parameters and the formula
    label_f1 = r"$\mu=0,\ \sigma=1$"
    pl.text(2.5, 0.3, label_f1, fontsize=15, verticalalignment="top",
            horizontalalignment="left")
    label_f2 = r"$f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{(x-\mu)^2}{2\sigma^2})$"
    pl.text(1.5, 0.4, label_f2, fontsize=15, verticalalignment="top",
            horizontalalignment="left")
    pl.show()

showfigure()
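The element-by-element loop can also be written as a vectorised numpy expression. The function name `normal_pdf` is hypothetical, and the numeric check (a simple Riemann sum over [-6, 6]) is an illustrative sanity test rather than part of the original script.

```python
import numpy as np

def normal_pdf(x, m=0.0, s=1.0):
    """Normal density with mean m and standard deviation s, evaluated on an array."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - m) ** 2) / (2.0 * s**2)) / (np.sqrt(2.0 * np.pi) * s)

x = np.linspace(-6.0, 6.0, 1201)
y = normal_pdf(x)

# Sanity checks: the density should integrate to about 1 and peak at 1/sqrt(2*pi)
dx = x[1] - x[0]
area = float(np.sum(y) * dx)
peak = float(y.max())
print(area, peak)
```

The resulting `x` and `y` arrays can be passed directly to pl.plot(x, y) in place of the list built in the loop.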

Data visualization in Python with matplotlib

# -*- coding:UTF-8 -*-

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
from pylab import mpl

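# The Python 2 idiom reload(sys); sys.setdefaultencoding('utf8') is omitted:
# it is unnecessary on Python 3, where str is already Unicode and reload is not a builtin.

xmajorLocator = MultipleLocator(10* 1) # place major x ticks at multiples of 10
ymajorLocator = MultipleLocator(0.1* 1) # place major y ticks at multiples of 0.1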

# Use a font that contains Chinese glyphs (SimHei must be installed)
mpl.rcParams['font.sans-serif'] = ['SimHei']

# Load the data from a file
#data = np.loadtxt('test44.txt', delimiter=None, dtype=float )
#data = [[1,2],[3,4],[5,6]]
data = [[1,5,10,20,30,40,50,60,70,80,90,100],[0.0201,0.0262,0.0324,0.0295,0.0221,0.0258,0.0254,0.0299,0.0275,0.0299,0.0291,0.0328],
[0.0193,0.0254,0.0234,0.0684,0.0693,0.0803,0.1008,0.098,0.0947,0.0934,0.1971,0.2123],[0.0209,0.1176,0.2143,0.2295,0.4176,0.5258,0.6471,0.6484,0.8193,0.829,0.832,0.943]]

data = np.array(data)

# Slice the array into series

x = data[0] # time
y = data[1] # Y values of category 1
y2 = data[2] # Y values of category 2
y3 = data[3] # Y values of category 3

plt.figure(num=1, figsize=(8, 6))

ax = plt.subplot(111)
ax.xaxis.set_major_locator(xmajorLocator)
ax.yaxis.set_major_locator(ymajorLocator)
ax.xaxis.grid(True, which='major') # draw the x-axis grid at the major ticks
ax.yaxis.grid(True, which='major') # draw the y-axis grid at the major ticks

plt.xlabel('时间/t',fontsize='xx-large') # valid font sizes: large, None, medium, smaller, small, x-large, xx-small, larger, x-small, xx-large
plt.ylabel('y-label',fontsize='xx-large')
plt.title('Title',fontsize='xx-large')
plt.xlim(0, 110)
plt.ylim(0, 1)

line1, = ax.plot(x, y, 'g.-',label="类别一",)

line2, = ax.plot(x,y2,'b*-',label="类别二",)

line3, = ax.plot(x,y3,'rD-',label="类别三",)

ax.legend((line1, line2,line3),('类别一','类别二','类别三'),loc=5) # loc also accepts integer codes such as 1 to 6, each a different position; 5 is 'right'
plt.show()
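The effect of MultipleLocator can be checked by reading back the tick positions. The sketch below uses three made-up data points and the same axis limits as the script; the Agg backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots()
ax.plot([1, 50, 100], [0.02, 0.5, 0.94])  # placeholder data
ax.set_xlim(0, 110)
ax.set_ylim(0, 1)

# Major ticks at multiples of 10 on x and of 0.1 on y
ax.xaxis.set_major_locator(MultipleLocator(10))
ax.yaxis.set_major_locator(MultipleLocator(0.1))

x_ticks = ax.xaxis.get_majorticklocs()
y_ticks = ax.yaxis.get_majorticklocs()
x_steps = np.diff(x_ticks)
y_steps = np.diff(y_ticks)
print(x_ticks, y_ticks)
```

The returned arrays may include a tick just outside the visible range on either side; only the spacing is guaranteed by the locator.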

Plotting the curve of x cubed with Python and matplotlib

import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-100,100,100)
y = x**3
plt.figure(num=3,figsize=(8,5)) # num: figure number; figsize: (width, height) in inches
l1=plt.plot(x,y,'p') # keep the returned list of lines to pass to plt.legend(handles=...)
plt.xlim((-100,100))
plt.ylim((-100000,100000))
plt.xlabel('X') # x-axis label
plt.ylabel('Y')
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none') # hide the right and top borders
ax.xaxis.set_ticks_position('bottom') # draw the x ticks on the bottom spine
ax.yaxis.set_ticks_position('left')
ax.spines['bottom'].set_position(('data',0)) # place the x axis at y = 0
ax.spines['left'].set_position(('data',0))
# legend
# labels must be a sequence with one entry per handle
plt.legend(handles=l1,labels=['y=x**3'],loc='best') # loc = location
plt.show()
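Two common ways of labelling a legend entry are sketched below: attaching `label=` when plotting, or passing explicit `handles` and `labels` lists. The second label text is an arbitrary example, and the Agg backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-100, 100, 100)
fig, ax = plt.subplots()

# Option 1: attach the label when plotting, then call legend() without arguments
(curve,) = ax.plot(x, x**3, "p", label="y=x**3")
legend = ax.legend(loc="best")
legend_texts = [t.get_text() for t in legend.get_texts()]

# Option 2: pass handles and labels explicitly, both as lists of equal length
legend2 = ax.legend(handles=[curve], labels=["cubic"], loc="upper left")
legend2_texts = [t.get_text() for t in legend2.get_texts()]
print(legend_texts, legend2_texts)
```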

Plotting a cubic function with Python and matplotlib

>>> from matplotlib import pyplot as pl
>>> import numpy as np
>>> from scipy import interpolate

>>> x = np.linspace(-10, 5, 100)
>>> y = -2*x**3 + 5*x**2 + 9
>>> pl.figure(figsize = (8, 4))

>>> pl.plot(x, y, color="blue", linewidth = 1.5)
[<matplotlib.lines.Line2D object at 0x...>]
>>> pl.show()

pl.figure sets the size of the plotting area

pl.plot draws the curve and sets the line color and line width

pl.show displays the figure

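Generating 20 random DNA files in FASTA format with Python

The script generates 20 random files. Because the file names are not hash-based, they may collide with files from an earlier run.

Each file contains 30-50 sequences, and each sequence is 70-120 bases long.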

import os
import random
import string

print (dir(string))

letter = string.ascii_letters

os.chdir("D:\\")  # the backslash must be escaped inside a normal string literal

bases = {1:"A", 2:"T", 3:"C", 4:"G"}

## Test random module , get random DNA base

Nth = random.randint(1,4)

print (bases[Nth])

## Create random DNA sequences

for i in range(20):
    Number_of_Seq = random.randint(30,50)
    filename = letter[i]
    with open("Sequences"+filename +
              str(Number_of_Seq)+ ".fasta", "w") as file_output:
        for j in range(Number_of_Seq):
            each_Seq=""
            Rand_len = random.randint(70,120)
            for k in range(Rand_len):
                Nth = random.randint(1,4)
                each_Seq += bases[Nth]

            # FASTA record: a header line starting with ">" followed by the sequence line
            file_output.write(">seq_"+str(Number_of_Seq)+
                              "_"+str(Rand_len)+"\n")
            file_output.write(each_Seq+"\n")
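A more compact variant of the same idea is sketched below. The helper names `random_dna` and `write_random_fasta`, the seeded `random.Random(0)` generator, the temporary output directory and the `Sequences_NN.fasta` naming scheme are all assumptions for illustration; they are not part of the original script.

```python
import os
import random
import tempfile

def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice("ATCG") for _ in range(length))

def write_random_fasta(path, rng, min_seqs=30, max_seqs=50, min_len=70, max_len=120):
    """Write a FASTA file with a random number of random sequences; return the count."""
    n_seqs = rng.randint(min_seqs, max_seqs)
    with open(path, "w") as handle:
        for index in range(n_seqs):
            seq = random_dna(rng.randint(min_len, max_len), rng)
            handle.write(">seq_%d_%d\n" % (index, len(seq)))  # header line
            handle.write(seq + "\n")                          # sequence line
    return n_seqs

rng = random.Random(0)        # seeded so that runs are repeatable
out_dir = tempfile.mkdtemp()  # hypothetical output location
paths = []
for i in range(20):
    path = os.path.join(out_dir, "Sequences_%02d.fasta" % i)
    write_random_fasta(path, rng)
    paths.append(path)
print(len(paths), "files written to", out_dir)
```

Numbering the files makes every name unique within a run, which avoids the collision issue mentioned for the letter-based names.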
Original article: https://www.cnblogs.com/c-x-a/p/8600202.html

pandas: reading and writing SQL databases

Addendum: