I. Reading data from files
pandas can read many file types, including CSV, general delimited text files, Excel files, JSON, HTML tables, HDF5, and STATA.
1. Comma-separated value (CSV) files can be read using read_csv:
>>> from pandas import read_csv
>>> csv_data = read_csv('FTSE_1984_2012.csv')
>>> csv_data = csv_data.values
>>> csv_data[:4]
array([['2012-02-15', 5899.9, 5923.8, 5880.6, 5892.2, 801550000, 5892.2],
       ['2012-02-14', 5905.7, 5920.6, 5877.2, 5899.9, 832567200, 5899.9],
       ['2012-02-13', 5852.4, 5920.1, 5852.4, 5905.7, 643543000, 5905.7],
       ['2012-02-10', 5895.5, 5895.5, 5839.9, 5852.4, 948790200, 5852.4]], dtype=object)
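As a minimal sketch of what read_csv does with the header row and column types, the example below reads a small in-memory CSV instead of a file on disk. The column names (Date, Open, Close, Volume) and the two-row sample are assumptions made for illustration; they are not taken from FTSE_1984_2012.csv.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = """Date,Open,Close,Volume
2012-02-15,5899.9,5892.2,801550000
2012-02-14,5905.7,5899.9,832567200
"""

# The first line is used as the column names, not as data.
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())

# Columns of different types collapse to a single object array.
data = df.values
print(data.dtype, data.shape)

# parse_dates asks pandas to convert the named column to datetimes.
df2 = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(df2["Date"].dtype)
```

Because the columns mix strings, floats, and integers, .values has to fall back to dtype=object, which matches the array shown above.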
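2. Excel files
Use the read_excel function, which takes two arguments here: the file name and the sheet name. By default the first row is treated as the header, so it does not appear in the data.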
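from pandas import read_excel
# Read the sheet named 'Sheet1'; the first row becomes the column names
exceldate = read_excel('score.xlsx', sheet_name='Sheet1')
# Convert the DataFrame to a NumPy array
exceldate = exceldate.values
print(type(exceldate))
print(exceldate.shape)
exceldate[0, :]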
3. STATA files
>>> from pandas import read_stata
>>> stata_data = read_stata('FTSE_1984_2012.dta')
>>> stata_data = stata_data.values
>>> stata_data[:4,:2]
array([[ 0.00000000e+00, 4.09540000e+04],
[ 1.00000000e+00, 4.09530000e+04],
[ 2.00000000e+00, 4.09520000e+04],
[ 3.00000000e+00, 4.09490000e+04]])
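4. Reading file contents without pandas
Excel files can be read with xlrd: the xlrd module reads Excel files, and the xlwt module writes them. Note that xlrd 2.0 and later only read the older .xls format, so reading an .xlsx file this way requires an xlrd version below 2.0.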
import numpy as np
import xlrd

wb = xlrd.open_workbook('score.xlsx')
sheetnames = wb.sheet_names()
sheet = wb.sheet_by_name(sheetnames[0])
# Collect every row of the first sheet as a list of cell values
exceldate = []
for i in range(sheet.nrows):
    exceldate.append(sheet.row_values(i))
print('%d rows,' % len(exceldate), '%d columns' % len(exceldate[0]))
# Copy the first column into a NumPy array
adate = np.empty(len(exceldate))
for i in range(len(exceldate)):
    adate[i] = exceldate[i][0]
print(adate.shape)
print(adate)

Output:
5 rows, 7 columns
(5,)
[ 12. 15. 51. 65. 45.]
II. Saving data
1. Saving data in NumPy's own npz format
savez_compressed compresses the data as it saves it.
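import numpy as np

x = np.arange(10)
y = np.zeros((100, 100))
# Arrays passed positionally are stored under the names arr_0, arr_1, ...
np.savez_compressed('date1', x, y)
date = np.load('date1.npz')
print(date['arr_0'])
# Arrays passed as keyword arguments are stored under the keyword names
np.savez_compressed('date2', x=x, ontherDate=y)
date2 = np.load('date2.npz')
print(date2['x'])

Output:
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]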
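The following sketch shows how to list the names stored in an npz archive and how much compression can save compared with plain np.savez. The file names and the temporary directory are assumptions chosen for illustration, and the size difference depends heavily on the data (an all-zeros array compresses unusually well).

```python
import os
import tempfile

import numpy as np

x = np.arange(10)
y = np.zeros((100, 100))

# Hypothetical file names inside a throwaway directory.
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "plain.npz")
packed_path = os.path.join(tmpdir, "packed.npz")

# Positional arrays get automatic names; keyword arrays keep their keyword.
np.savez(plain_path, x, named=y)
np.savez_compressed(packed_path, x, named=y)

# The .files attribute lists the names available in the archive.
with np.load(packed_path) as archive:
    names = sorted(archive.files)
    x_back = archive["arr_0"]
    y_back = archive["named"]
print(names)

size_plain = os.path.getsize(plain_path)
size_packed = os.path.getsize(packed_path)
print(size_plain, size_packed)
```

Using keyword names is usually easier to maintain than relying on arr_0, arr_1, and so on, because the names survive if the order of the arguments changes.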
2. Saving to a CSV file with np.savetxt
Note: by default, both read_csv and read_excel in pandas treat the first row as the header, so that row is left out of the data.
import numpy as np
from pandas import read_csv

x = np.random.randn(10, 10)
np.savetxt('date1.csv', x, delimiter=',')
# read_csv takes the first row as the header, so one row is lost
date = read_csv('date1.csv')
date = date.values
print(x.shape)
print(date.shape)
print(x)
print(date[0])

Output:
(10, 10)
(9, 10)
[[ 1.77015084 -1.80554159 1.28403537 0.2009891 0.26291606 0.08448012 1.66140115 0.17728159 0.88959083 0.56291309]
 [ 0.58518743 1.44373927 0.54993558 0.01054313 0.59017053 -0.35133822 -0.42014888 -0.3079049 0.94373013 1.35954942]
 [-0.54426668 0.04622141 -0.66634713 0.45793767 -0.63685413 0.99976971 -0.39326027 -0.93163258 -0.79656236 0.72966639]
 [-0.39963295 -1.79753906 0.32433359 0.82947734 1.54987769 2.77115954 0.22080235 -0.60776182 2.57004264 0.59011931]
 [-0.19130441 -0.12465107 1.40619987 -0.61049826 -0.39827838 -1.25752483 -0.91058091 0.36020845 -0.10908816 1.45316786]
 [ 0.47408008 -0.28463786 -1.92910625 -0.50288128 -0.06007105 -0.12408027 -0.84164768 -0.42411635 0.69954835 -0.41664136]
 [ 0.42336169 0.23625584 1.11511232 -1.08894244 -0.79186067 -1.71206423 -0.02372556 -0.71933255 -1.33979181 -0.41698675]
 [-0.06578197 1.04509307 0.1279905 1.03185255 1.15403322 -0.18110707 -0.60340346 -0.33581049 0.02637558 -1.06997906]
 [-1.84514777 1.19496964 -1.70550266 1.30863094 -1.48711603 1.55044598 0.64066525 0.39086305 0.15076543 1.42276444]
 [-1.23244051 -0.03354092 0.84729912 0.15254869 -0.33402971 -0.59486921 -0.28056973 -1.72189462 -0.0156615 -1.22688771]]
[ 0.58518743 1.44373927 0.54993558 0.01054313 0.59017053 -0.35133822 -0.42014888 -0.3079049 0.94373013 1.35954942]
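The missing row can be avoided in two ways, sketched here under the assumption that the file holds only numbers: tell read_csv there is no header with header=None, or write a real header line with np.savetxt. The file names, the temporary directory, and the column names c0 to c9 are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

x = np.random.randn(10, 10)
tmpdir = tempfile.mkdtemp()

# Option 1: no header in the file, so say so when reading.
path = os.path.join(tmpdir, "no_header.csv")
np.savetxt(path, x, delimiter=",")
with_default = pd.read_csv(path).values             # first row used as header
with_none = pd.read_csv(path, header=None).values   # every row kept as data
print(with_default.shape, with_none.shape)

# Option 2: write a header line; comments="" drops the default "# " prefix.
path2 = os.path.join(tmpdir, "with_header.csv")
names = ",".join("c%d" % i for i in range(x.shape[1]))
np.savetxt(path2, x, delimiter=",", header=names, comments="")
with_header = pd.read_csv(path2)
print(with_header.shape, list(with_header.columns[:3]))
```

The round trip is compared with np.allclose rather than exact equality in any check, since text formatting and parsing may differ in the last digit.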
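III. Numerical precision
Every system has finite numerical precision. Python floats are double precision, with a machine epsilon of 2.2204 × 10^−16. This is the gap between 1 and the next larger representable number, so it is a relative precision: two numbers whose relative difference is well below this value are treated as the same number. The most negative and the largest finite values that can be represented are −1.7976 × 10^308 and 1.7976 × 10^308.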
import numpy as np

x1 = 1
eps = np.finfo(float).eps
# Adding a tenth of eps to 1 does not change it
x2 = x1 + eps/10
x1 == x2
Out[4]: True