  • Python pandas DataFrame operations

    1. Create a DataFrame from a dictionary

    >>> import pandas as pd
    >>> dict1 = {'col1':[1,2,5,7],'col2':['a','b','c','d']}
    >>> df = pd.DataFrame(dict1)
    >>> df
       col1 col2
    0     1    a
    1     2    b
    2     5    c
    3     7    d

    2. Create a DataFrame from lists (first combine the lists into a dictionary, then convert the dictionary into a DataFrame)

    >>> lista = [1,2,5,7]
    >>> listb = ['a','b','c','d']
    >>> df = pd.DataFrame({'col1':lista,'col2':listb})
    >>> df
       col1 col2
    0     1    a
    1     2    b
    2     5    c
    3     7    d
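
    As an alternative to the intermediate dictionary, the two lists can be paired row by row. This is a minimal sketch that reuses lista and listb from the session above; passing the zipped rows together with columns is one common approach, not the only one.

```python
import pandas as pd

lista = [1, 2, 5, 7]
listb = ['a', 'b', 'c', 'd']

# zip pairs the i-th elements of both lists, so each tuple becomes one row.
rows = list(zip(lista, listb))

# Column names are supplied separately because the tuples carry no labels.
df = pd.DataFrame(rows, columns=['col1', 'col2'])
print(df)
```

    The dictionary form is usually easier to read when the data is already organized by column; the zipped form fits data that arrives row by row.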

    3. Create a DataFrame from lists, specifying data and columns

    >>> a = ['001','zhangsan','M']
    >>> b = ['002','lisi','F']
    >>> c = ['003','wangwu','M']
    >>> df = pd.DataFrame(data=[a,b,c],columns=['id','name','sex'])
    >>> df
        id      name sex
    0  001  zhangsan   M
    1  002      lisi   F
    2  003    wangwu   M

    4. Rename the columns from ['id','name','sex'] to ['Id','Name','Sex']

    >>> df.columns = ['Id','Name','Sex']
    >>> df
        Id      Name Sex
    0  001  zhangsan   M
    1  002      lisi   F
    2  003    wangwu   M
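
    Assigning to df.columns replaces every label at once and modifies df in place. When only some columns need new names, DataFrame.rename may be more convenient. The sketch below rebuilds the example frame from section 3; the variable names renamed and capitalized are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['001', 'zhangsan', 'M'], ['002', 'lisi', 'F'], ['003', 'wangwu', 'M']],
    columns=['id', 'name', 'sex'],
)

# rename returns a new DataFrame; only the keys listed in the mapping change.
renamed = df.rename(columns={'id': 'Id', 'name': 'Name'})

# A callable is applied to every column label.
capitalized = df.rename(columns=str.capitalize)

print(list(renamed.columns))
print(list(capitalized.columns))
```

    Because rename returns a copy by default, the original df keeps its lowercase labels.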

    5. Reorder DataFrame columns and renumber the index to start from 1

    http://www.cnblogs.com/huahuayu/p/8324755.html 
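
    The linked post is not reproduced here, so the following is only a sketch of one common way to do both steps and may differ from what that post shows. It assumes a small frame with the column names from section 4.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['001', 'zhangsan', 'M'], ['002', 'lisi', 'F']],
    columns=['Id', 'Name', 'Sex'],
)

# Selecting with a list of labels returns the columns in that order.
# .copy() makes the result independent of the original frame.
reordered = df[['Name', 'Sex', 'Id']].copy()

# Replace the default 0-based index with one that starts at 1.
reordered.index = range(1, len(reordered) + 1)

print(reordered)
```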

    6. Generate a DataFrame of random int data with 10 rows and 4 columns

    >>> import pandas
    >>> import numpy
    >>> df = pandas.DataFrame(numpy.random.randint(0,100,size=(10, 4)), columns=list('ABCD')) # 0,100 draws integers from 0 to 100 (0 included, 100 excluded); size=(10,4) gives 10 rows and 4 columns; columns sets the column names
    >>> df
        A   B   C   D
    0  67  28  37  66
    1  21  27  43  37
    2  73  54  98  85
    3  40  78   4  93
    4  99  60  63  16
    5  48  46  24  61
    6  59  52  62  28
    7  20  74  36  64
    8  14  13  46  60
    9  18  44  70  36
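
    The values above change on every run. If repeatable output is needed, a seeded generator can be used instead. This is a sketch using NumPy's Generator interface; the seed value 0 is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same sequence on every run.
rng = np.random.default_rng(seed=0)  # seed 0 is an arbitrary, assumed value

# integers(low, high) draws from [low, high): 0 included, 100 excluded.
df = pd.DataFrame(rng.integers(0, 100, size=(10, 4)), columns=list('ABCD'))

# Re-creating the generator with the same seed reproduces the same frame.
rng_again = np.random.default_rng(seed=0)
df_again = pd.DataFrame(rng_again.integers(0, 100, size=(10, 4)), columns=list('ABCD'))

print(df.equals(df_again))
```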

    7. Use a time series as the index

    >>> df # the index is initially the auto-generated 0~9
        A   B   C   D
    0  31  25  45  67
    1  62  12  61  88
    2  79  36  20  97
    3  26  57  50  44
    4  24  12  50   1
    5   4  61  99  62
    6  40  47  52  27
    7  83  66  71   4
    8  58  59  25  62
    9  38  81  60   8
    >>> import pandas
    >>> dates = pandas.date_range('20180121',periods=10)
    >>> dates # 10 consecutive days starting from 20180121
    DatetimeIndex(['2018-01-21', '2018-01-22', '2018-01-23', '2018-01-24',
                   '2018-01-25', '2018-01-26', '2018-01-27', '2018-01-28',
                   '2018-01-29', '2018-01-30'],
                  dtype='datetime64[ns]', freq='D')
    >>> df.index = dates # assign dates to the index
    >>> df
                 A   B   C   D
    2018-01-21  31  25  45  67
    2018-01-22  62  12  61  88
    2018-01-23  79  36  20  97
    2018-01-24  26  57  50  44
    2018-01-25  24  12  50   1
    2018-01-26   4  61  99  62
    2018-01-27  40  47  52  27
    2018-01-28  83  66  71   4
    2018-01-29  58  59  25  62
    2018-01-30  38  81  60   8
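
    Once the index is a DatetimeIndex, rows can be selected by date. The sketch below uses deterministic values from numpy.arange instead of random data so the results are easy to check; the names one_day and window are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180121', periods=10)
df = pd.DataFrame(np.arange(40).reshape(10, 4), index=dates, columns=list('ABCD'))

# A single timestamp label returns that row as a Series.
one_day = df.loc[pd.Timestamp('2018-01-25')]

# Label slices on a DatetimeIndex include both endpoints.
window = df.loc['2018-01-22':'2018-01-24']

print(one_day)
print(window)
```

    Unlike positional slicing with iloc, the label slice keeps the end date, so window holds three rows.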

    8. SQL-like operations on a DataFrame

    Official pandas documentation: Comparison with SQL

    https://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html

    [Python in Practice] Pandas: do data analysis the way you write SQL (Part 1)

    https://www.cnblogs.com/en-heng/category/778194.html 
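
    As a rough illustration of what these references cover, the sketch below pairs a few SQL statements with pandas equivalents, using the example frame from section 3. The SQL in the comments treats the frame as a table named df, which is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['001', 'zhangsan', 'M'], ['002', 'lisi', 'F'], ['003', 'wangwu', 'M']],
    columns=['id', 'name', 'sex'],
)

# SELECT id, name FROM df WHERE sex = 'M';
males = df.loc[df['sex'] == 'M', ['id', 'name']]

# SELECT sex, COUNT(*) FROM df GROUP BY sex;
counts = df.groupby('sex').size()

# SELECT * FROM df ORDER BY name DESC LIMIT 2;
top2 = df.sort_values('name', ascending=False).head(2)

print(males)
print(counts)
print(top2)
```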

  • Original article: https://www.cnblogs.com/huahuayu/p/8227494.html