  • Reading Notes 4: Reading and Saving Data

    I. Reading from files

    pandas can read CSV, general delimited text files, Excel files, JSON, HTML tables, HDF5, and Stata files.

    1. Comma-separated value (CSV) files can be read using read_csv:

    >>> from pandas import read_csv
    >>> csv_data = read_csv('FTSE_1984_2012.csv')
    >>> csv_data = csv_data.values
    >>> csv_data[:4]
    array([['2012-02-15', 5899.9, 5923.8, 5880.6, 5892.2, 801550000, 5892.2],
           ['2012-02-14', 5905.7, 5920.6, 5877.2, 5899.9, 832567200, 5899.9],
           ['2012-02-13', 5852.4, 5920.1, 5852.4, 5905.7, 643543000, 5905.7],
           ['2012-02-10', 5895.5, 5895.5, 5839.9, 5852.4, 948790200, 5852.4]], dtype=object)
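
    Calling .values on the whole frame, as above, yields dtype=object because the date column holds strings. The following is a minimal sketch of one way around that: parse the dates and convert only the numeric columns. The column names and the inline sample text are assumptions made for illustration, since the header of the original file is not shown.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as the FTSE file: a header row
# followed by one row per trading day. The column names are assumed.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2012-02-15,5899.9,5923.8,5880.6,5892.2,801550000,5892.2
2012-02-14,5905.7,5920.6,5877.2,5899.9,832567200,5899.9
"""

# read_csv accepts a path or any file-like object; parse_dates converts
# the named column from strings to datetime64 values.
frame = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Select only the numeric columns before converting to a NumPy array,
# so the result is float64 instead of dtype=object.
prices = frame[["Open", "High", "Low", "Close"]].to_numpy()

print(frame.dtypes)
print(prices.dtype, prices.shape)
```

    Keeping the dates in the DataFrame and converting only the price columns avoids the mixed-type object array, which is slow and awkward for numerical work.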

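    2. Excel files

    Use the read_excel function, passing a file name and, optionally, a sheet name. By default the first row is treated as the header (column names), so it does not appear in the data.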

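    from pandas import read_excel

    # The first row of the sheet is used as the column names by default.
    exceldate = read_excel('score.xlsx', sheet_name='Sheet1')
    exceldate = exceldate.values  # convert the DataFrame to a NumPy array
    print(type(exceldate))
    print(exceldate.shape)
    exceldate[0, :]

    <class 'numpy.ndarray'>
    (4, 7)

    Out[6]:
    array([15, 65, 45, 48, 43, 26, 35], dtype=int64)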

    3. Stata files

    >>> from pandas import read_stata
    >>> stata_data = read_stata('FTSE_1984_2012.dta')
    >>> stata_data = stata_data.values
    >>> stata_data[:4,:2]
    array([[ 0.00000000e+00, 4.09540000e+04],
           [ 1.00000000e+00, 4.09530000e+04],
           [ 2.00000000e+00, 4.09520000e+04],
           [ 3.00000000e+00, 4.09490000e+04]])
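
    Since the FTSE .dta file is not included here, a round trip is a simple way to try read_stata. This is only a sketch: the column names, values, and file name are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the FTSE file, which is not available here.
frame = pd.DataFrame({"day": [0, 1, 2, 3],
                      "close": [5892.2, 5899.9, 5905.7, 5852.4]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.dta")
    # write_index=False keeps the DataFrame index out of the file.
    frame.to_stata(path, write_index=False)
    # read_stata loads the whole file into a DataFrame.
    stata_data = pd.read_stata(path)

print(stata_data.columns.tolist())
print(stata_data.to_numpy()[:2])
```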

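    4. Reading file contents without pandas

    Excel files can be read with xlrd: the xlrd module reads Excel files and the xlwt module writes them. Note that xlrd 2.0 and later read only legacy .xls files, so the .xlsx example here needs an older xlrd release.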

    import numpy as np
    import xlrd

    wb = xlrd.open_workbook('score.xlsx')
    sheetnames = wb.sheet_names()
    sheet = wb.sheet_by_name(sheetnames[0])

    # Collect every row, including the first one, which pandas treats as the header.
    exceldate = []
    for i in range(sheet.nrows):
        exceldate.append(sheet.row_values(i))
    print('%d rows,' % len(exceldate), '%d columns' % len(exceldate[0]))

    # Copy the first column into a NumPy array.
    adate = np.empty(len(exceldate))
    for i in range(len(exceldate)):
        adate[i] = exceldate[i][0]
    print(adate.shape)
    print(adate)

    5 rows, 7 columns
    (5,)
    [ 12.  15.  51.  65.  45.]
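
    The element-by-element copy above can usually be replaced by converting the list of rows to an array and slicing it. A minimal sketch, in which the rows are hypothetical stand-ins for what sheet.row_values(i) returns (only the first column matches the output printed above):

```python
import numpy as np

# Hypothetical rows standing in for the values returned by sheet.row_values(i).
rows = [
    [12.0, 1.0, 2.0],
    [15.0, 3.0, 4.0],
    [51.0, 5.0, 6.0],
    [65.0, 7.0, 8.0],
    [45.0, 9.0, 10.0],
]

# A list of equal-length numeric rows converts directly to a 2-D array.
table = np.array(rows, dtype=float)

# Slicing replaces the explicit loop that copies one element at a time.
first_column = table[:, 0]

print(table.shape)
print(first_column)
```

    This only works when every cell is numeric; a sheet with text cells would need those columns handled separately.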

    II. Saving data

    1. Saving data in NumPy's own npz format

    savez_compressed compresses the data as it saves.
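    import numpy as np

    x = np.arange(10)
    y = np.zeros((100, 100))

    # Positional arrays are stored under the keys arr_0, arr_1, ...
    np.savez_compressed('date1', x, y)
    date = np.load('date1.npz')
    print(date['arr_0'])

    # Keyword arguments set the key names explicitly.
    np.savez_compressed('date2', x=x, ontherDate=y)
    date2 = np.load('date2.npz')
    print(date2['x'])

    [0 1 2 3 4 5 6 7 8 9]
    [0 1 2 3 4 5 6 7 8 9]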
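
    An .npz archive can also be inspected before any array is read. The sketch below lists the stored keys through the files attribute and uses a with block so that the archive is closed afterwards. The file name and temporary directory are illustrative choices.

```python
import os
import tempfile

import numpy as np

x = np.arange(10)
y = np.zeros((100, 100))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npz")  # hypothetical file name
    # Keyword arguments become the keys stored in the archive.
    np.savez_compressed(path, x=x, y=y)

    # np.load on an .npz file returns a lazy archive object; using it as a
    # context manager closes the underlying file when the block ends.
    with np.load(path) as archive:
        names = sorted(archive.files)
        x_loaded = archive["x"]
        y_loaded = archive["y"]

print(names)
print(np.array_equal(x, x_loaded), y_loaded.shape)
```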

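    2. Saving as a CSV file with the np.savetxt function

    Note: the pandas functions read_csv and read_excel both treat the first row as the header by default, so it is left out of the data.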

    import numpy as np
    from pandas import read_csv

    x = np.random.randn(10, 10)
    np.savetxt('date1.csv', x, delimiter=',')

    # read_csv consumes the first line as the header, so only 9 rows come back.
    date = read_csv('date1.csv')
    date = date.values

    print(x.shape)
    print(date.shape)
    print(x)
    print(date[0])

    (10, 10)
    (9, 10)
    [[ 1.77015084 -1.80554159  1.28403537  0.2009891   0.26291606  0.08448012
       1.66140115  0.17728159  0.88959083  0.56291309]
     [ 0.58518743  1.44373927  0.54993558  0.01054313  0.59017053 -0.35133822
      -0.42014888 -0.3079049   0.94373013  1.35954942]
     [-0.54426668  0.04622141 -0.66634713  0.45793767 -0.63685413  0.99976971
      -0.39326027 -0.93163258 -0.79656236  0.72966639]
     [-0.39963295 -1.79753906  0.32433359  0.82947734  1.54987769  2.77115954
       0.22080235 -0.60776182  2.57004264  0.59011931]
     [-0.19130441 -0.12465107  1.40619987 -0.61049826 -0.39827838 -1.25752483
      -0.91058091  0.36020845 -0.10908816  1.45316786]
     [ 0.47408008 -0.28463786 -1.92910625 -0.50288128 -0.06007105 -0.12408027
      -0.84164768 -0.42411635  0.69954835 -0.41664136]
     [ 0.42336169  0.23625584  1.11511232 -1.08894244 -0.79186067 -1.71206423
      -0.02372556 -0.71933255 -1.33979181 -0.41698675]
     [-0.06578197  1.04509307  0.1279905   1.03185255  1.15403322 -0.18110707
      -0.60340346 -0.33581049  0.02637558 -1.06997906]
     [-1.84514777  1.19496964 -1.70550266  1.30863094 -1.48711603  1.55044598
       0.64066525  0.39086305  0.15076543  1.42276444]
     [-1.23244051 -0.03354092  0.84729912  0.15254869 -0.33402971 -0.59486921
      -0.28056973 -1.72189462 -0.0156615  -1.22688771]]
    [ 0.58518743  1.44373927  0.54993558  0.01054313  0.59017053 -0.35133822
     -0.42014888 -0.3079049   0.94373013  1.35954942]
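
    The missing row above (10 rows written, 9 read back) comes from read_csv taking the first line as column names. A minimal sketch of two ways to get every row back, using a small array and a hypothetical temporary file for illustration:

```python
import os
import tempfile

import numpy as np
import pandas as pd

x = np.arange(12, dtype=float).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")  # hypothetical file name
    np.savetxt(path, x, delimiter=",")

    # Default: the first line is consumed as column names, so one row is lost.
    with_header = pd.read_csv(path).to_numpy()

    # header=None tells pandas the file has no header row.
    without_header = pd.read_csv(path, header=None).to_numpy()

    # np.loadtxt does not assume a header, so it also returns every row.
    via_numpy = np.loadtxt(path, delimiter=",")

print(with_header.shape, without_header.shape, via_numpy.shape)
```

    When the file is written by np.savetxt and read back as a plain numeric array, np.loadtxt is the more direct counterpart; header=None is the option to reach for when the data should still end up in a DataFrame.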

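    III. Numerical precision

    Every system has finite numerical precision. Python floats are 64-bit doubles with a machine epsilon of 2.2204 × 10^-16, the gap between 1.0 and the next representable number. When two numbers near 1.0 differ by much less than this, the difference is lost to rounding and they compare as equal. The smallest and largest representable values are -1.7976 × 10^308 and 1.7976 × 10^308.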

    import numpy as np

    x1 = 1
    eps = np.finfo(float).eps
    x2 = x1 + eps / 10  # the increment is far below eps, so it is rounded away
    x1 == x2

    Out[4]:
    True
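
    The same np.finfo object exposes the other limits mentioned above. The sketch below also shows that eps is relative to the size of the number, not a fixed absolute threshold. The comparison at 1e16 is an added illustration, not part of the original notes.

```python
import numpy as np

info = np.finfo(float)
eps = info.eps

# eps is the gap between 1.0 and the next representable float64.
print(eps)                    # 2.220446049250313e-16
print(1.0 + eps == 1.0)       # False: eps is large enough to register
print(1.0 + eps / 10 == 1.0)  # True: the increment is rounded away

# The spacing between floats scales with magnitude, so eps acts as a
# relative tolerance rather than an absolute one.
print(1e16 + 1.0 == 1e16)     # True: the gap between floats near 1e16 is 2.0

# Largest and most negative finite float64 values.
print(info.max, info.min)
```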
  • Original post: https://www.cnblogs.com/zhaopengcheng/p/5403597.html