  • pandas (2)

    pandas data types

    Assignment

    import pandas as pd

    # Create a Series
    s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
    # Create a DataFrame
    data = {'Country': ['Belgium', 'India', 'Brazil'],
            'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
            'Population': [11190846, 1303171035, 207847528]}
    df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
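
Since this section is about data types, the sketch below builds the same two objects and then inspects their dtypes. The variable names match the snippet above; the printed dtype names can vary by platform and pandas version, so no output is shown.

```python
import pandas as pd

# Same Series and DataFrame as in the post
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

print(s.dtype)    # dtype of the Series values
print(df.dtypes)  # one dtype per column

# Check programmatically whether a column holds integers
population_is_int = pd.api.types.is_integer_dtype(df['Population'])
print(population_is_int)
```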
    

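    Selecting data

    # Select a single element
    s['b']
    #    -5
    # Select multiple rows (every row from position 1 onward)
    df[1:]
    
    # Select the row labeled n; if the columns are already defined, you can also append = ['xx', 'xx'] to assign a new row
    
    df.loc[n]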
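
As a minimal sketch of the row selection and row assignment described above: the new row's values ('Exampleland', 'Sample City', 12345) are made-up placeholders, not data from the post.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': [11190846, 1303171035, 207847528]})

first_row = df.loc[0]   # row with index label 0, returned as a Series
tail = df[1:]           # slice of rows from position 1 onward

# Assigning to a label that does not exist yet appends a new row.
# The values here are hypothetical placeholder data.
df.loc[3] = ['Exampleland', 'Sample City', 12345]
print(df)
```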
    

    Selection and Boolean indexing

    By Position:

    df.iloc[[0], [0]]
    #    1x1 DataFrame containing 'Belgium'
    df.iat[0, 0]
    #    'Belgium'
    

    By Label:

    df.loc[[0], ['Country']]
    df.at[0, 'Country']
    

    By Label/Position:

    df.ix[2]
    #    Country       Brazil
    #    Capital       Brasilia
    #    Population    207847528
    df.ix[:, 'Capital']
    #    0    Brussels
    #    1    New Delhi
    #    2    Brasilia
    df.ix[1, 'Capital']
    #    'New Delhi'
    

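    Using the ix method produces the following warning: ix is deprecated. It was removed entirely in pandas 1.0, so use .loc (labels) or .iloc (positions) instead.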
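
The ix lookups can be rewritten with .loc, .iloc, .at and .iat. This is a sketch that assumes the default integer index, where the label and the position of a row happen to coincide.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': [11190846, 1303171035, 207847528]})

row = df.iloc[2]                  # stands in for df.ix[2] (by position)
capitals = df.loc[:, 'Capital']   # stands in for df.ix[:, 'Capital']
capital = df.loc[1, 'Capital']    # stands in for df.ix[1, 'Capital']

# Scalar accessors: iat takes positions, at takes labels
by_position = df.iat[0, 0]
by_label = df.at[0, 'Country']
print(row, capitals, capital, by_position, by_label, sep='\n')
```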

    Boolean Indexing (filtering)

    s[~(s > 1)]
    s[(s < -1) | (s > 2)]
    df[df['Population'] > 1200000000]
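
A runnable version of the three filters, using the Series and DataFrame defined earlier in the post. The result variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': [11190846, 1303171035, 207847528]})

# ~ inverts a boolean mask: keep values that are NOT greater than 1
not_above_one = s[~(s > 1)]

# | combines masks; each condition needs its own parentheses
outside_range = s[(s < -1) | (s > 2)]

# Filter DataFrame rows with a condition on one column
large = df[df['Population'] > 1200000000]
print(not_above_one, outside_range, large, sep='\n')
```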
    

    Dropping

    s.drop(['a','c'])
    df.drop('Country',axis=1)
    

    Sort & Rank

    df.sort_index()
    df.sort_values(by='Country')
    df.rank()
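
A small sketch of what the sorting and ranking calls return for the example DataFrame. Ranking is shown on the Population column only, which is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': [11190846, 1303171035, 207847528]})

by_index_desc = df.sort_index(ascending=False)  # rows ordered by index label, descending
by_country = df.sort_values(by='Country')       # rows ordered alphabetically by Country
population_rank = df['Population'].rank()       # 1.0 is the smallest value
print(by_index_desc, by_country, population_rank, sep='\n')
```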
    

    Retrieving Series/DataFrame Information

    Basic Information

    df.shape      # (rows, columns)
    df.index      # Describe index
    df.columns    # Describe DataFrame columns
    df.info()     # Info on DataFrame
    df.count()    # Number of non-NA values; by default reports the count for each column
    

    Summary

    df.sum()                   # Sum of values
    df.cumsum()                # Cumulative sum of values, accumulated top to bottom; returns a new DataFrame
    df.min()/df.max()          # Minimum/maximum values
    df.idxmin()/df.idxmax()    # Index labels of the minimum/maximum values
    df.describe()              # Summary statistics computed for every numeric column
    df.mean()                  # Mean of values (of all the int64 data)
    df.median()                # Median of values
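
A sketch of the summary methods on the numeric part of the example DataFrame. Selecting the numeric columns first with select_dtypes is an assumption made here so the calls do not depend on how a given pandas version treats text columns.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': [11190846, 1303171035, 207847528]})

numeric = df.select_dtypes('number')   # keeps only the Population column

total = numeric.sum()        # Series with one sum per column
running = numeric.cumsum()   # new DataFrame; the original is unchanged
largest = numeric.idxmax()   # index label of the maximum in each column
middle = numeric.median()
average = df.mean(numeric_only=True)
print(total, running, largest, middle, average, sep='\n')
```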
    

    Applying Functions

    f = lambda x : x*2
    df.apply(f)
    df.applymap(f)
    

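    df.apply() only returns df*2 and leaves df itself unchanged. In this example the author could not see any difference between df.applymap() and df.apply(): apply passes each whole column to f while applymap passes each individual element, and x*2 gives the same result either way.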
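
The difference shows up once the function cares about what it receives. This sketch uses a small made-up numeric DataFrame (`nums`). Because newer pandas versions rename applymap to DataFrame.map, the code picks whichever element-wise method is available.

```python
import pandas as pd

# Hypothetical numeric data, used only for illustration
nums = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# apply calls the function once per column and passes a whole Series
seen_by_apply = nums.apply(lambda col: type(col).__name__)
column_max = nums.apply(lambda col: col.max())

# The element-wise method calls the function once per cell
elementwise = nums.map if hasattr(nums, 'map') else nums.applymap
seen_by_elementwise = elementwise(lambda v: type(v).__name__)
doubled = elementwise(lambda v: v * 2)

print(seen_by_apply, column_max, seen_by_elementwise, doubled, sep='\n')
```

With a function such as `lambda x: x * 2`, both routes give the same values, which is why the two calls looked identical in the post.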

    Data Alignment

    Internal Data Alignment

    pandas DataFrame study notes
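
The post leaves this section empty. As a rough sketch of internal alignment: arithmetic between two Series matches values by index label, and labels missing from one side produce NaN. The second Series `s3` is an assumption introduced for illustration.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])   # hypothetical, has no 'b'

# Values are paired by label; 'b' has no partner, so the result there is NaN
aligned_sum = s + s3

# The method form can substitute a fill value for the missing side
filled_sum = s.add(s3, fill_value=0)
print(aligned_sum, filled_sum, sep='\n')
```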

    I/O: Reading and Writing Files

    CSV files

    pd.read_csv('path')
    df.to_csv('path')
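
A CSV round trip can be tried without touching the disk. This sketch writes to an in-memory io.StringIO buffer instead of a file path, a choice made here only to keep the example self-contained.

```python
import io
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': [11190846, 1303171035, 207847528]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # to_csv is a DataFrame method; index=False skips the row labels
csv_text = buffer.getvalue()

buffer.seek(0)                   # rewind before reading back
restored = pd.read_csv(buffer)
print(csv_text)
print(restored)
```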
    

    Excel files

    pd.read_excel('path')
    df.to_excel('path', sheet_name='name')
    # Read different sheets from a single file
    xlsx = pd.ExcelFile('path')
    df = pd.read_excel(xlsx, 'sheetname')
    

    SQL Query or Database Table

    from sqlalchemy import create_engine
    engine = create_engine('sqlite:///:memory:')
    pd.read_sql("SELECT * FROM my_table;",engine)
    pd.read_sql_table('my_table',engine)
    pd.read_sql_query("SELECT * FROM my_table;",engine)
    # Write a DataFrame to a SQL table
    df.to_sql('myDf', engine)
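
A self-contained round trip through an in-memory SQLite database. To avoid the SQLAlchemy dependency, this sketch swaps the engine for a standard-library sqlite3 connection, which pandas also accepts for to_sql and read_sql_query. read_sql_table does need SQLAlchemy, so it is left out here.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': [11190846, 1303171035, 207847528]})

conn = sqlite3.connect(':memory:')

# Write the DataFrame into a table named my_table
df.to_sql('my_table', conn, index=False)

# Read back a filtered, ordered subset
query = ("SELECT Country, Population FROM my_table "
         "WHERE Population > 100000000 ORDER BY Population;")
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```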
    

  • Original article: https://www.cnblogs.com/aubucuo/p/pandas2.html