zoukankan      html  css  js  c++  java
  • Pandas之read_excel()和to_excel()函数解析

    read_excel()

    加载函数为read_excel(),其具体参数如下。

    read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None,names=None, parse_cols=None, parse_dates=False,date_parser=None,na_values=None,thousands=None, convert_float=True, has_index_names=None, converters=None,dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)
    

    常用参数解析:

    • io : string, path object ; excel 路径。

    • sheetname : string, int, mixed list of strings/ints, or None, default 0 返回多表使用sheetname=[0,1],若sheetname=None是返回全表 注意:int/string 返回的是dataframe,而none和list返回的是dict of dataframe

    • header : int, list of ints, default 0 指定列名行,默认0,即取第一行,数据为列名行以下的数据 若数据不含列名,则设定 header = None

    • skiprows : list-like,Rows to skip at the beginning,省略指定行数的数据

    • skip_footer : int,default 0, 省略从尾部数的int行数据

    • index_col : int, list of ints, default None指定列为索引列,也可以使用u”strings”

    • names : array-like, default None, 指定列的名字。

    数据源:

    sheet1:
    ID  NUM-1   NUM-2   NUM-3
    36901   142 168 661
    36902   78  521 602
    36903   144 600 521
    36904   95  457 468
    36905   69  596 695
    
    sheet2:
    ID  NUM-1   NUM-2   NUM-3
    36906   190 527 691
    36907   101 403 470
    

    (1)函数原型

    basestation ="F://pythonBook_PyPDAM/data/test.xls"
    data = pd.read_excel(basestation)
    print data
    

    输出:是一个dataframe

          ID  NUM-1  NUM-2  NUM-3
    0  36901    142    168    661
    1  36902     78    521    602
    2  36903    144    600    521
    3  36904     95    457    468
    4  36905     69    596    695
    

    (2) sheetname参数:返回多表使用sheetname=[0,1],若sheetname=None是返回全表 注意:int/string 返回的是dataframe,而none和list返回的是dict of dataframe

    data_1 = pd.read_excel(basestation,sheetname=[0,1])
    print data_1
    print type(data_1)
    

    输出:dict of dataframe

    OrderedDict([(0,       ID  NUM-1  NUM-2  NUM-3
    0  36901    142    168    661
    1  36902     78    521    602
    2  36903    144    600    521
    3  36904     95    457    468
    4  36905     69    596    695), 
    (1,       ID  NUM-1  NUM-2  NUM-3
    0  36906    190    527    691
    1  36907    101    403    470)])
    

    (3)header参数:指定列名行,默认0,即取第一行,数据为列名行以下的数据 若数据不含列名,则设定 header = None ,注意这里还有列名的一行。

    
    data = pd.read_excel(basestation,header=None)
    print data
    输出:
           0      1      2      3
    0     ID  NUM-1  NUM-2  NUM-3
    1  36901    142    168    661
    2  36902     78    521    602
    3  36903    144    600    521
    4  36904     95    457    468
    5  36905     69    596    695
    
    data = pd.read_excel(basestation,header=[3])
    print data
    输出:
       36903  144    600    521  
    0  36904     95    457    468
    1  36905     69    596    695
    

    (4)skiprows 参数:省略指定行数的数据

    data = pd.read_excel(basestation,skiprows = [1])
    print data
    输出:
          ID  NUM-1  NUM-2  NUM-3
    0  36902     78    521    602
    1  36903    144    600    521
    2  36904     95    457    468
    3  36905     69    596    695
    

    (5)skip_footer参数:省略从尾部数的int行的数据

    data = pd.read_excel(basestation, skip_footer=3)
    print data
    输出:
          ID  NUM-1  NUM-2  NUM-3
    0  36901    142    168    661
    1  36902     78    521    602
    

    (6)index_col参数:指定列为索引列,也可以使用u”strings”

    data = pd.read_excel(basestation, index_col="NUM-3")
    print data
    输出:
              ID  NUM-1  NUM-2
    NUM-3                     
    661    36901    142    168
    602    36902     78    521
    521    36903    144    600
    468    36904     95    457
    695    36905     69    596
    

    (7)names参数: 指定列的名字。

    data = pd.read_excel(basestation,names=["a","b","c","e"])
    print data
           a    b    c    e
    0  36901  142  168  661
    1  36902   78  521  602
    2  36903  144  600  521
    3  36904   95  457  468
    4  36905   69  596  695
    

    具体参数如下

    >>> print help(pandas.read_excel)
    Help on function read_excel in module pandas.io.excel:
    
    read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)
        Read an Excel table into a pandas DataFrame
    
        Parameters
        ----------
        io : string, path object (pathlib.Path or py._path.local.LocalPath),
            file-like object, pandas ExcelFile, or xlrd workbook.
            The string could be a URL. Valid URL schemes include http, ftp, s3,
            and file. For file URLs, a host is expected. For instance, a local
            file could be file://localhost/path/to/workbook.xlsx
        sheetname : string, int, mixed list of strings/ints, or None, default 0
    
            Strings are used for sheet names, Integers are used in zero-indexed
            sheet positions.
    
            Lists of strings/integers are used to request multiple sheets.
    
            Specify None to get all sheets.
    
            str|int -> DataFrame is returned.
            list|None -> Dict of DataFrames is returned, with keys representing
            sheets.
    
            Available Cases
    
            * Defaults to 0 -> 1st sheet as a DataFrame
            * 1 -> 2nd sheet as a DataFrame
            * "Sheet1" -> 1st sheet as a DataFrame
            * [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames
            * None -> All sheets as a dictionary of DataFrames
    
        header : int, list of ints, default 0
            Row (0-indexed) to use for the column labels of the parsed
            DataFrame. If a list of integers is passed those row positions will
            be combined into a ``MultiIndex``
        skiprows : list-like
            Rows to skip at the beginning (0-indexed)
        skip_footer : int, default 0
            Rows at the end to skip (0-indexed)
        index_col : int, list of ints, default None
            Column (0-indexed) to use as the row labels of the DataFrame.
            Pass None if there is no such column.  If a list is passed,
            those columns will be combined into a ``MultiIndex``.  If a
            subset of data is selected with ``parse_cols``, index_col
            is based on the subset.
        names : array-like, default None
            List of column names to use. If file contains no header row,
            then you should explicitly pass header=None
        converters : dict, default None
            Dict of functions for converting values in certain columns. Keys can
            either be integers or column labels, values are functions that take one
            input argument, the Excel cell content, and return the transformed
            content.
        dtype : Type name or dict of column -> type, default None
            Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
            Use `object` to preserve data as stored in Excel and not interpret dtype.
            If converters are specified, they will be applied INSTEAD
            of dtype conversion.
    
            .. versionadded:: 0.20.0
    
        true_values : list, default None
            Values to consider as True
    
            .. versionadded:: 0.19.0
    
        false_values : list, default None
            Values to consider as False
    
            .. versionadded:: 0.19.0
    
        parse_cols : int or list, default None
            * If None then parse all columns,
            * If int then indicates last column to be parsed
            * If list of ints then indicates list of column numbers to be parsed
            * If string then indicates comma separated list of Excel column letters and
              column ranges (e.g. "A:E" or "A,C,E:F").  Ranges are inclusive of
              both sides.
        squeeze : boolean, default False
            If the parsed data only contains one column then return a Series
        na_values : scalar, str, list-like, or dict, default None
            Additional strings to recognize as NA/NaN. If dict passed, specific
            per-column NA values. By default the following values are interpreted
            as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',
        '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.
        thousands : str, default None
            Thousands separator for parsing string columns to numeric.  Note that
            this parameter is only necessary for columns stored as TEXT in Excel,
            any numeric columns will automatically be parsed, regardless of display
            format.
        keep_default_na : bool, default True
            If na_values are specified and keep_default_na is False the default NaN
            values are overridden, otherwise they're appended to.
        verbose : boolean, default False
            Indicate number of NA values placed in non-numeric columns
        engine: string, default None
            If io is not a buffer or path, this must be set to identify io.
            Acceptable values are None or xlrd
        convert_float : boolean, default True
            convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric
            data will be read in as floats: Excel stores all numbers as floats
            internally
        has_index_names : boolean, default None
            DEPRECATED: for version 0.17+ index names will be automatically
            inferred based on index_col.  To read Excel output from 0.16.2 and
            prior that had saved index names, use True.
    
        Returns
    

    to_excel()

    存储函数为pd.DataFrame.to_excel(),注意,必须是DataFrame写入excel, 即Write DataFrame to an excel sheet。其具体参数如下:

    to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None,columns=None, header=True, index=True, index_label=None,startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None,
    inf_rep='inf', verbose=True, freeze_panes=None)
    

    常用参数解析

    • excel_writer : string or ExcelWriter object File path or existing ExcelWriter目标路径

    • sheet_name : string, default ‘Sheet1’ Name of sheet which will contain DataFrame,填充excel的第几页

    • na_rep : string, default ”,Missing data representation 缺失值填充

    • float_format : string, default None Format string for floating point numbers

    • columns : sequence, optional,Columns to write 选择输出的的列。

    • header : boolean or list of string, default True Write out column names. If a list of string is given it is assumed to be aliases for the column names

    • index : boolean, default True,Write row names (index)

    • index_label : string or sequence, default None, Column label for index column(s) if desired. If None is given, andheader and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

    • startrow :upper left cell row to dump data frame

    • startcol :upper left cell column to dump data frame

    • engine : string, default None ,write engine to use - you can also set this via the options,io.excel.xlsx.writer, io.excel.xls.writer, andio.excel.xlsm.writer.

    • merge_cells : boolean, default True Write MultiIndex and Hierarchical Rows as merged cells.

    • encoding: string, default None encoding of the resulting excel file. Only necessary for xlwt,other writers support unicode natively.

    • inf_rep : string, default ‘inf’ Representation for infinity (there is no native representation for infinity in Excel)

    • freeze_panes : tuple of integer (length 2), default None Specifies the one-based bottommost row and rightmost column that is to be frozen

    数据源:

        ID  NUM-1   NUM-2   NUM-3
    0   36901   142 168 661
    1   36902   78  521 602
    2   36903   144 600 521
    3   36904   95  457 468
    4   36905   69  596 695
    5   36906   165 453 
    
    加载数据:
    basestation ="F://python/data/test.xls"
    basestation_end ="F://python/data/test_end.xls"
    data = pd.read_excel(basestation)
    

    (1)参数excel_writer,输出路径。

    data.to_excel(basestation_end)
    输出:
        ID  NUM-1   NUM-2   NUM-3
    0   36901   142 168 661
    1   36902   78  521 602
    2   36903   144 600 521
    3   36904   95  457 468
    4   36905   69  596 695
    5   36906   165 453 
    

    (2)sheet_name,将数据存储在excel的那个sheet页面。

    data.to_excel(basestation_end,sheet_name="sheet2")
    

    (3)na_rep,缺失值填充

    data.to_excel(basestation_end,na_rep="NULL")
    输出:
        ID  NUM-1   NUM-2   NUM-3
    0   36901   142 168 661
    1   36902   78  521 602
    2   36903   144 600 521
    3   36904   95  457 468
    4   36905   69  596 695
    5   36906   165 453 NULL
    

    (4)colums参数: sequence, optional,Columns to write 选择输出的的列。

    data.to_excel(basestation_end,columns=["ID"])
    输出
        ID
    0   36901
    1   36902
    2   36903
    3   36904
    4   36905
    5   36906
    

    (5)header 参数: boolean or list of string,默认为True,可以用list命名列的名字。header = False 则不输出题头

    data.to_excel(basestation_end,header=["a","b","c","d"])
    输出:
        a   b   c   d
    0   36901   142 168 661
    1   36902   78  521 602
    2   36903   144 600 521
    3   36904   95  457 468
    4   36905   69  596 695
    5   36906   165 453 
    
    
    data.to_excel(basestation_end,header=False,columns=["ID"])
    header = False 则不输出题头
    输出:
    0   36901
    1   36902
    2   36903
    3   36904
    4   36905
    5   36906
    

    (6)index : boolean, default True Write row names (index)
    默认为True,显示index,当index=False 则不显示行索引(名字)。
    index_label : string or sequence, default None
    设置索引列的列名。

    data.to_excel(basestation_end,index=False)
    输出:
    ID  NUM-1   NUM-2   NUM-3
    36901   142 168 661
    36902   78  521 602
    36903   144 600 521
    36904   95  457 468
    36905   69  596 695
    36906   165 453 
    
    data.to_excel(basestation_end,index_label=["f"])
    输出:
    f   ID  NUM-1   NUM-2   NUM-3
    0   36901   142 168 661
    1   36902   78  521 602
    2   36903   144 600 521
    3   36904   95  457 468
    4   36905   69  596 695
    5   36906   165 453 
    

    来源于https://blog.csdn.net/tongxinzhazha/article/details/78796952

  • 相关阅读:
    ECMAScript 6 基础入门
    软件历史版本存档及下载
    arduino 编程基础
    生活中的实验 —— 家庭电路
    电子元件 —— 继电器
    电与磁 —— 电磁铁
    windows cmd 命令行 —— 进程与服务
    计算机硬件、摄影设备、物质、材料英语
    DHCP服务器备份、还原、迁移
    SVN同步
  • 原文地址:https://www.cnblogs.com/hankleo/p/11426784.html
Copyright © 2011-2022 走看看