  • Reading data from Excel in Python with the xlrd and pandas packages

    # coding=utf-8
    # pip install xlrd numpy
    # Note: xlrd 2.0+ reads only .xls; opening an .xlsx file like the example path below needs xlrd < 2.0.

    import numpy as np
    import xlrd


    def read_from_xls(filepath, index_col_list):
        # filepath: path of the file to read, e.g. filepath = r'D:/Python_workspace/test.xlsx'
        # index_col_list: 0-based indices of the columns to read; only columns 2, 3 and 4
        #   are stored (as x, y, z), e.g. [2, 3, 4]
        # Set GBK encoding
        xlrd.Book.encoding = "gbk"
        rb = xlrd.open_workbook(filepath)

        sheet = rb.sheet_by_index(0)  # first sheet of the workbook
        nrows = sheet.nrows
        data_tmp_x = []  # e.g. the data are x, y, z coordinates
        data_tmp_y = []
        data_tmp_z = []
        for index_col in index_col_list:  # visit each requested column in turn
            for row in range(1, nrows):  # start at row 1 to skip the header row
                tmp = float(sheet.cell_value(row, index_col))  # cell at (row, column)
                if index_col == 2:
                    data_tmp_x.append(tmp)
                elif index_col == 3:
                    data_tmp_y.append(tmp)
                elif index_col == 4:
                    data_tmp_z.append(tmp)
        # np.mat was removed in NumPy 2.0; np.asmatrix returns the same matrix type
        data_tmp = np.asmatrix([data_tmp_x, data_tmp_y, data_tmp_z])
        return data_tmp
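
    The function above keeps only column indices 2, 3 and 4. A minimal sketch of a more general variant follows. `read_columns_as_floats` and `FakeSheet` are hypothetical names introduced here; the stub stands in for an xlrd sheet so the snippet runs without a workbook on disk, and it relies only on the `nrows` attribute and the `cell_value(rowx, colx)` method that the original code already uses.

```python
def read_columns_as_floats(sheet, col_indices, first_data_row=1):
    """Return {column index: [float, ...]} for the requested 0-based columns."""
    columns = {}
    for col in col_indices:
        columns[col] = [
            float(sheet.cell_value(row, col))
            for row in range(first_data_row, sheet.nrows)  # skip the header row
        ]
    return columns


class FakeSheet:
    """Hypothetical stand-in exposing the two sheet members used above."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]


demo = FakeSheet([
    ["id", "name", "x", "y", "z"],
    [1, "a", 1.0, 2.0, 3.0],
    [2, "b", 4.0, 5.0, 6.0],
])
xyz = read_columns_as_floats(demo, [2, 3, 4])
print(xyz)
```

    With a real workbook, the object returned by `rb.sheet_by_index(0)` could be passed in place of `demo`.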
    
    # Read Excel with pandas
    # filepath: path of the .xlsx file
    import pandas as pd
    data = pd.read_excel(filepath)
    province_name = data['province'].values.tolist()  # 'province' is a column name; the result is a list
    province_people = data['count'].values.tolist()
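
    To try the pandas snippet without an existing file, the sketch below writes a tiny workbook to memory and reads it back. The sample rows are made up for illustration, and the code assumes openpyxl is installed as the .xlsx engine.

```python
import io

import pandas as pd

# Write a small sample workbook to an in-memory buffer (hypothetical data).
buffer = io.BytesIO()
sample = pd.DataFrame({"province": ["Hubei", "Hunan"], "count": [10, 20]})
sample.to_excel(buffer, index=False, engine="openpyxl")
workbook_bytes = buffer.getvalue()

# Read it back and pull two columns out as plain Python lists.
data = pd.read_excel(io.BytesIO(workbook_bytes), engine="openpyxl")
province_name = data["province"].tolist()
province_people = data["count"].tolist()
print(province_name, province_people)
```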

     -------- Reading Excel with pandas: pd.read_excel --------

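    Selected parameters (signature from an older pandas release; squeeze and convert_float were removed in pandas 2.0):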

    def read_excel(io,
                   sheet_name=0,
                   header=0,
                   names=None,
                   index_col=None,
                   usecols=None,
                   squeeze=False,
                   dtype=None,
                   engine=None,
                   converters=None,
                   true_values=None,
                   false_values=None,
                   skiprows=None,
                   nrows=None,
                   na_values=None,
                   parse_dates=False,
                   date_parser=None,
                   thousands=None,
                   comment=None,
                   skipfooter=0,
                   convert_float=True,
                   **kwds)

    io: path to the Excel file

    sheet_name: string, int, mixed list of strings/ints, or None, default 0; the name or position of the sheet(s) to read

        * Defaults to 0 -> 1st sheet as a DataFrame
        * 1 -> 2nd sheet as a DataFrame
        * "Sheet1" -> 1st sheet as a DataFrame
        * [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames
        * None -> All sheets as a dictionary of DataFrames
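
    The sheet_name variants listed above can be sketched as follows. The two-sheet workbook is built in memory with made-up data, `read` is a hypothetical helper, and openpyxl is assumed as the engine.

```python
import io

import pandas as pd

# Build a two-sheet workbook in memory (sample data, not from the post).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
workbook_bytes = buffer.getvalue()


def read(sheet_name):
    """Read the in-memory workbook with the given sheet_name argument."""
    return pd.read_excel(
        io.BytesIO(workbook_bytes), sheet_name=sheet_name, engine="openpyxl"
    )


first = read(0)               # DataFrame for the first sheet
second = read("Sheet2")       # DataFrame selected by sheet name
picked = read([0, "Sheet2"])  # dict keyed by the requested values
everything = read(None)       # dict keyed by sheet name, one entry per sheet

print(list(first.columns), list(second.columns))
print(list(picked.keys()), list(everything.keys()))
```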

    header: the row to use for the column names; the default 0 takes the first row. If the data has no header row, set header=None

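    names: a list-like of column names to use. If the file has no header row, set header=None first. Even for a single column the value must be a list, e.g. ['first column']; otherwise pandas raises TypeError: Index(...) must be called with a collection of some kind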

    names : array-like, default None
        List of column names to use. If file contains no header row,
        then you should explicitly pass header=None
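
    A minimal sketch of header=None combined with names, using a headerless workbook written to memory. The data and the column names "id" and "label" are made up, and openpyxl is assumed as the engine.

```python
import io

import pandas as pd

# Write two data rows with no header row (sample data).
buffer = io.BytesIO()
pd.DataFrame([[1, "a"], [2, "b"]]).to_excel(
    buffer, header=False, index=False, engine="openpyxl"
)
workbook_bytes = buffer.getvalue()

# header=None keeps the first row as data; columns are numbered 0, 1, ...
no_header = pd.read_excel(io.BytesIO(workbook_bytes), header=None, engine="openpyxl")

# names supplies the column labels; it is passed as a list.
named = pd.read_excel(
    io.BytesIO(workbook_bytes),
    header=None,
    names=["id", "label"],
    engine="openpyxl",
)
print(list(no_header.columns), list(named.columns))
```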

    index_col: the column to use as the row labels, i.e. the row index

    skiprows: number of rows to skip from the top; a list of row indices may also be passed

    skipfooter: number of rows to omit from the end
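
    index_col, skiprows and skipfooter are often used together on exported reports. The sketch below assumes a made-up sheet with one title row above the header and one totals row at the bottom; openpyxl is assumed as the engine.

```python
import io

import pandas as pd

# Sample layout: a title row, the real header, two data rows, a totals row.
rows = [
    ["generated by export tool", "v1"],
    ["city", "count"],
    ["A", 1],
    ["B", 2],
    ["total", 3],
]
buffer = io.BytesIO()
pd.DataFrame(rows).to_excel(buffer, header=False, index=False, engine="openpyxl")
workbook_bytes = buffer.getvalue()

frame = pd.read_excel(
    io.BytesIO(workbook_bytes),
    skiprows=1,     # drop the title row; the next row becomes the header
    skipfooter=1,   # drop the totals row at the end
    index_col=0,    # use the "city" column as the row index
    engine="openpyxl",
)
print(frame)
```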

    na_values: additional values to recognize as NA/NaN; matching cells are read as NaN

    na_values : scalar, str, list-like, or dict, default None
        Additional strings to recognize as NA/NaN. If dict passed, specific
        per-column NA values. By default the following values are interpreted
        as NaN: '', '#N/A', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null', and several other spellings.
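
    A short sketch of the list and dict forms of na_values. The marker string "missing" and the sample columns are assumptions chosen for illustration; openpyxl is assumed as the engine.

```python
import io

import pandas as pd

# Sample sheet where "missing" marks an absent score but is valid text in "note".
buffer = io.BytesIO()
pd.DataFrame(
    {"score": [90, "missing", 75], "note": ["ok", "missing", "ok"]}
).to_excel(buffer, index=False, engine="openpyxl")
workbook_bytes = buffer.getvalue()

# List form: "missing" becomes NaN in every column.
everywhere = pd.read_excel(
    io.BytesIO(workbook_bytes), na_values=["missing"], engine="openpyxl"
)

# Dict form: "missing" becomes NaN only in the "score" column.
per_column = pd.read_excel(
    io.BytesIO(workbook_bytes), na_values={"score": ["missing"]}, engine="openpyxl"
)
print(everywhere.isna().sum().to_dict(), per_column.isna().sum().to_dict())
```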

    squeeze: when the parsed data has only one column, return a Series instead of a DataFrame

    squeeze : boolean, default False
        If the parsed data only contains one column then return a Series
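
    The squeeze argument is no longer accepted by read_excel in pandas 2.0 and later. A sketch of an equivalent on current versions is to select one column with usecols and then call DataFrame.squeeze("columns"). The sample data is made up, and openpyxl is assumed as the engine.

```python
import io

import pandas as pd

buffer = io.BytesIO()
pd.DataFrame({"province": ["Hubei", "Hunan"], "count": [10, 20]}).to_excel(
    buffer, index=False, engine="openpyxl"
)
workbook_bytes = buffer.getvalue()

# Read a single column, then collapse the one-column DataFrame into a Series.
one_column = pd.read_excel(
    io.BytesIO(workbook_bytes), usecols=["count"], engine="openpyxl"
)
series = one_column.squeeze("columns")
print(type(series).__name__, series.tolist())
```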

    nrows: number of rows to parse

    nrows : int, default None
        Number of rows to parse
    
        .. versionadded:: 0.23.0
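
    A minimal sketch of nrows on a made-up five-row sheet; the header row is not counted toward the limit. openpyxl is assumed as the engine.

```python
import io

import pandas as pd

buffer = io.BytesIO()
pd.DataFrame({"n": [0, 1, 2, 3, 4]}).to_excel(buffer, index=False, engine="openpyxl")
workbook_bytes = buffer.getvalue()

# Parse only the first two data rows after the header.
head = pd.read_excel(io.BytesIO(workbook_bytes), nrows=2, engine="openpyxl")
print(head)
```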
    ## Corrections are welcome, as are suggestions for improvement
  • Original article: https://www.cnblogs.com/qi-yuan-008/p/11672761.html