  • HDF5 files, the tqdm module, nunique, read_csv, sort_values, astype, fillna

    pandas.DataFrame.to_hdf(self, path_or_buf, key, **kwargs):

    Hierarchical Data Format (HDF). To add another DataFrame or Series to an existing HDF file, use append mode (mode='a') and a different key.

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},  index=['a', 'b', 'c'])
    df.to_hdf('data.h5', key='df', mode='w', format='table')
    # format : {'fixed', 'table'}, default 'fixed'
    # 'fixed': Fixed format. Fast writing/reading. Neither appendable nor searchable.
    # 'table': Table format. Written as a PyTables Table structure; may perform worse but allows more flexible operations such as searching / selecting subsets of the data.
    s = pd.Series([1, 2, 3, 4])
    s.to_hdf('data.h5', key='s')
    pd.read_hdf('data.h5', 'df')
    pd.read_hdf('data.h5', 's')
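Here is a minimal sketch of what the 'table' format offers over 'fixed'. It assumes PyTables (the `tables` package that pandas uses for HDF5 I/O) is installed. The file name `demo.h5`, the key `'log'` and the data are made up for illustration. `append=True` adds rows to an existing table key. `where` reads back only the matching rows; the queried column must be listed in `data_columns`.

```python
import os
import tempfile

import pandas as pd

first = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
second = pd.DataFrame({"A": [7, 8], "B": [9, 10]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")  # illustrative file name

    # 'table' format is needed for appending and querying;
    # data_columns=["A"] makes column A usable in `where` expressions
    first.to_hdf(path, key="log", mode="w", format="table", data_columns=["A"])
    second.to_hdf(path, key="log", append=True, format="table", data_columns=["A"])

    # A second key in the same file (default mode='a', default 'fixed' format)
    pd.Series([1, 2, 3, 4]).to_hdf(path, key="s")

    all_rows = pd.read_hdf(path, "log")                # all 5 rows
    subset = pd.read_hdf(path, "log", where="A > 2")   # only rows with A > 2
    s = pd.read_hdf(path, "s")

print(len(all_rows), subset["A"].tolist(), s.tolist())
```

Keys stored in the default 'fixed' format support neither appending nor `where` queries.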

    Displaying a progress bar with the tqdm module:

    tqdm(self, iterable=None, desc=None, total=None, leave=True, file=None, ncols=None, mininterval=0.1, maxinterval=10.0, miniters=None, ascii=None, disable=False, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, postfix=None, unit_divisor=1000, write_bytes=None, gui=False, **kwargs)

    iterable : iterable, optional. Iterable to decorate with a progress bar. Leave blank to manage the updates manually.

    total : int, optional. The number of expected iterations. If unspecified, len(iterable) is used if possible. 

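    import time
    from tqdm import tqdm

    wday, hour = [], []
    for x in tqdm(train_df['request_timestamp'].values, total=len(train_df)):
        localtime = time.localtime(x)   # Unix timestamp -> local struct_time
        wday.append(localtime[6])       # tm_wday, 0 = Monday
        hour.append(localtime[3])       # tm_hour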

     https://lorexxar.cn/2016/07/21/python-tqdm/

    https://tqdm.github.io/docs/tqdm/
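The `total` argument matters most when the iterable has no `len()`, such as a generator. In that case tqdm cannot show a percentage or ETA unless `total` is given. The sketch below is minimal. `read_timestamps` is a hypothetical generator standing in for the real data. The sketch also shows tqdm's manual mode, where the bar is created without an iterable and advanced with `update()`.

```python
import time

from tqdm import tqdm


def read_timestamps(n):
    """Hypothetical generator yielding hourly Unix timestamps (no len())."""
    for i in range(n):
        yield 1_550_000_000 + i * 3600


n = 48
wday, hour = [], []
# total=n lets tqdm draw a full bar with percentage and ETA for a generator
for ts in tqdm(read_timestamps(n), total=n, desc="parsing", unit="ts"):
    t = time.localtime(ts)
    wday.append(t.tm_wday)  # same as t[6]; 0 = Monday
    hour.append(t.tm_hour)  # same as t[3]

# Manual mode: no iterable, a known total, and explicit update() calls
with tqdm(total=100, desc="chunks") as bar:
    for _ in range(10):
        bar.update(10)  # advance the bar by 10 units
```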

    pandas.DataFrame.nunique: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nunique.html

    DataFrame.nunique(self, axis=0, dropna=True)

    Count distinct observations over requested axis. Return Series with number of distinct observations. Can ignore NaN values.

    >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 1, 1]})
    >>> df
       A  B
    0  1  1
    1  2  1
    2  3  1
    >>> df.nunique()
    A    3
    B    1
    dtype: int64
    >>> df.nunique(axis=1)
    0    1
    1    2
    2    2
    dtype: int64
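`dropna` controls whether NaN counts as a distinct value. `nunique` can also be combined with `groupby`, for example to count distinct users per ad. The small sketch below uses made-up data. The column names `uid` and `aid` are borrowed from the read_csv example in this post purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up exposure records
df = pd.DataFrame({"uid": [1, 1, 2, np.nan], "aid": [10, 11, 10, 10]})

counts = df.nunique()                        # NaN ignored
counts_with_nan = df.nunique(dropna=False)   # NaN counted as one extra value
users_per_ad = df.groupby("aid")["uid"].nunique()  # distinct users for each ad

print(counts, counts_with_nan, users_per_ad, sep="\n\n")
```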

    pandas.read_csv:

    Common parameters of pandas.read_csv(...):

    sep : str, default ','. Delimiter to use.

    header : int, list of int, default 'infer'. Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly, the behavior is identical to header=None.

    names : array-like, optional. List of column names to use. Duplicates in this list are not allowed.

    df = pd.read_csv('data/testA/totalExposureLog.out', sep='\t',  # tab-separated file
                     names=['id', 'request_timestamp', 'position', 'uid', 'aid', 'imp_ad_size', 'bid', 'pctr', 'quality_ecpm', 'totalEcpm'])
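How `header` and `names` interact is easy to get wrong. This minimal sketch reads from an in-memory string via `io.StringIO` instead of a file, with made-up values. When `names` is passed, the first line is kept as data. When it is omitted, the first line is consumed as the header.

```python
import io

import pandas as pd

# Two tab-separated records with no header row (made-up values)
raw = "1\t1550000000\t12\n2\t1550003600\t7\n"

# names given -> header behaves like None, so both lines are data rows
df = pd.read_csv(io.StringIO(raw), sep="\t",
                 names=["id", "request_timestamp", "position"])

# names omitted -> header='infer' behaves like header=0: line 1 becomes the header
df_bad = pd.read_csv(io.StringIO(raw), sep="\t")

print(df.shape, df_bad.shape)  # (2, 3) (1, 3)
```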

    pandas.DataFrame.sort_values:

    DataFrame.sort_values(self, by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
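    # axis: default 0 ('index') reorders the rows, and `by` names column(s); with axis=1 ('columns') the columns are reordered, and `by` names row (index) label(s)
    # by: a string or list of strings naming which element(s) along the chosen axis to sort by
    # ascending: default True sorts in ascending order; pass False for descending order (or a list of booleans, one per key in `by`)
    # kind: the sorting algorithm, default 'quicksort'; 'mergesort' or 'heapsort' can also be passed (only applied when sorting on a single column or label)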
    
    df.sort_values(by='col1')
    df.sort_values(by=['col1', 'col2'])
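The following runnable sketch exercises the parameters described above on a small made-up DataFrame. It uses `ascending` as a list (one flag per sort key), `na_position`, and the stable `kind='mergesort'`. It also uses `axis=1`, which reorders columns by the values in one row.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["b", "a", "b", "a", None],
    "col2": [3, 2, 1, 4, 5],
})

# col1 ascending, then col2 descending among rows with equal col1
by_two = df.sort_values(by=["col1", "col2"], ascending=[True, False])

# Missing values go last by default; na_position='first' moves them to the top
na_first = df.sort_values(by="col1", na_position="first")

# mergesort is stable: rows with equal keys keep their original relative order
stable = df.sort_values(by="col1", kind="mergesort")

# axis=1 reorders the columns; `by` is a row label (here the row labelled 0)
wide = pd.DataFrame({"x": [3, 1], "y": [1, 2], "z": [2, 0]})
cols_sorted = wide.sort_values(by=0, axis=1)

print(by_two, na_first, stable, cols_sorted, sep="\n\n")
```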

    pandas.DataFrame.astype:

    DataFrame.astype(self, dtype, copy=True, errors='raise', **kwargs)
    # dtype : data type, or dict of column name -> data type
    # Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.
    
    d = {'col1': [1, 2], 'col2': [3, 4]}
    df = pd.DataFrame(data=d)
    df.dtypes                            # col1: int64, col2: int64

    df.astype('int32').dtypes            # col1: int32, col2: int32
    df.astype({'col1': 'int32'}).dtypes  # col1: int32, col2 unchanged (int64)
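The small sketch below uses made-up data. It shows a per-column dict cast and a cast to `'category'`. It also shows what the default `errors='raise'` does when a value cannot be converted.

```python
import pandas as pd

# Made-up sample: ad ids read as strings, predicted CTR as float64
df = pd.DataFrame({"aid": ["10", "11", "10"], "pctr": [0.5, 0.25, 0.125]})

# A dict casts only the listed columns, each to its own dtype
converted = df.astype({"aid": "int64", "pctr": "float32"})
print(converted.dtypes)

# 'category' stores each distinct value once plus small integer codes
aid_cat = df["aid"].astype("category")
print(aid_cat.cat.categories.tolist())

# With the default errors='raise', an unconvertible value raises ValueError
failed = False
try:
    pd.Series(["1", "x"]).astype("int64")
except ValueError as exc:
    failed = True
    print("cast failed:", exc)
```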

    pandas.DataFrame.fillna: 

    DataFrame.fillna(self, value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
    # fillna() fills NaN values and returns the filled result. To modify the original DataFrame instead, set inplace=True (the call then returns None).
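This sketch shows common `fillna` patterns on made-up data: a scalar, a per-column dict (here filling one column with its mean), forward filling, and `inplace=True`. Passing `method='ffill'` to `fillna` is deprecated in recent pandas versions in favor of `DataFrame.ffill()`, which is what the sketch uses.

```python
import numpy as np
import pandas as pd

# Made-up columns with gaps
df = pd.DataFrame({"pctr": [0.1, np.nan, 0.3],
                   "quality_ecpm": [np.nan, 2.0, np.nan]})

filled_zero = df.fillna(0)  # every NaN -> 0

# Per-column fill values: mean for pctr, a sentinel -1 for quality_ecpm
filled_cols = df.fillna({"pctr": df["pctr"].mean(), "quality_ecpm": -1})

# Propagate the last valid value downward; a leading NaN stays NaN
forward = df.ffill()

# inplace=True modifies df itself and returns None
returned = df.fillna(0, inplace=True)
```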
  • Original post: https://www.cnblogs.com/ljygoodgoodstudydaydayup/p/11420596.html