  • pandas Exercises (2): Data Filtering and Sorting

    Data filtering and sorting: exploring the Euro 2012 data

    The data set is available on GitHub.

    Step 1 - Import the pandas library

    import pandas as pd

    Step 2 - The data set

    path2 = "./data/Euro2012.csv"      # Euro2012.csv

    Step 3 - Load the data set into a DataFrame named euro12

    euro12 = pd.read_csv(path2)
    euro12.tail()
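
    If the CSV file is not at hand, the same two calls can be tried on an in-memory sample. This is only a minimal sketch: the three rows, the team names and the variable names sample_csv and sample are made up for illustration and are not the real tournament figures.

```python
import io
import pandas as pd

# Hypothetical stand-in for ./data/Euro2012.csv; all values are made up.
sample_csv = io.StringIO(
    "Team,Goals,Yellow Cards,Red Cards\n"
    "Alpha,4,9,0\n"
    "Beta,2,5,1\n"
    "Gamma,7,3,0\n"
)

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(sample_csv)

# tail() returns the last 5 rows by default; pass n to change that.
last_two = sample.tail(2)
print(last_two)
```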

    Step 4 - Select the Goals column

    euro12.Goals  # equivalent to euro12['Goals']
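
    Attribute access only works when the column name is a valid Python identifier, so columns such as Yellow Cards need the bracket form. A small sketch on a made-up frame (the name df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical mini frame; values are made up.
df = pd.DataFrame({"Team": ["Alpha", "Beta"], "Goals": [4, 2], "Yellow Cards": [9, 5]})

goals_attr = df.Goals        # attribute access: only for identifier-like names
goals_item = df["Goals"]     # bracket access: works for any column name
cards = df["Yellow Cards"]   # a name containing a space needs brackets

print(goals_attr.equals(goals_item))
```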

    Step 5 - How many teams took part in Euro 2012?

    euro12.shape[0]

    Output:

    16
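
    Note that shape[0] counts rows, so it equals the number of teams only if every team appears exactly once. A quick way to check that assumption, sketched on a made-up frame (df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical mini frame; values are made up.
df = pd.DataFrame({"Team": ["Alpha", "Beta", "Gamma"], "Goals": [4, 2, 7]})

n_rows = df.shape[0]             # number of rows
n_teams = df["Team"].nunique()   # number of distinct team names

# If the two numbers match, there is one row per team.
print(n_rows, n_teams)
```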

    Step 6 - How many columns does the data set have?

    euro12.info()

    Output:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 16 entries, 0 to 15
    Data columns (total 35 columns):
    Team                          16 non-null object
    Goals                         16 non-null int64
    Shots on target               16 non-null int64
    Shots off target              16 non-null int64
    Shooting Accuracy             16 non-null object
    % Goals-to-shots              16 non-null object
    Total shots (inc. Blocked)    16 non-null int64
    Hit Woodwork                  16 non-null int64
    Penalty goals                 16 non-null int64
    Penalties not scored          16 non-null int64
    Headed goals                  16 non-null int64
    Passes                        16 non-null int64
    Passes completed              16 non-null int64
    Passing Accuracy              16 non-null object
    Touches                       16 non-null int64
    Crosses                       16 non-null int64
    Dribbles                      16 non-null int64
    Corners Taken                 16 non-null int64
    Tackles                       16 non-null int64
    Clearances                    16 non-null int64
    Interceptions                 16 non-null int64
    Clearances off line           15 non-null float64
    Clean Sheets                  16 non-null int64
    Blocks                        16 non-null int64
    Goals conceded                16 non-null int64
    Saves made                    16 non-null int64
    Saves-to-shots ratio          16 non-null object
    Fouls Won                     16 non-null int64
    Fouls Conceded                16 non-null int64
    Offsides                      16 non-null int64
    Yellow Cards                  16 non-null int64
    Red Cards                     16 non-null int64
    Subs on                       16 non-null int64
    Subs off                      16 non-null int64
    Players Used                  16 non-null int64
    dtypes: float64(1), int64(29), object(5)
    memory usage: 4.5+ KB
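
    info() prints a summary, which is handy for reading the column count off the screen. When the count is needed as a number, shape[1] or len(columns) gives it directly. A sketch on a made-up frame (df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical mini frame; values are made up.
df = pd.DataFrame({"Team": ["Alpha", "Beta"], "Goals": [4, 2], "Red Cards": [0, 1]})

n_cols = df.shape[1]          # second element of shape is the column count
n_cols_alt = len(df.columns)  # same number, read from the column index

# info() prints its summary and returns None, so it cannot be used as a value.
info_result = df.info()
```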

    Step 7 - Store the columns Team, Yellow Cards and Red Cards in a separate DataFrame named discipline

    discipline = euro12[['Team', 'Yellow Cards', 'Red Cards']]
    discipline
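
    If the subset will be modified later, taking an explicit copy makes the intent clear and avoids the SettingWithCopyWarning that some pandas versions may raise. A sketch on a made-up frame (df, subset and the Total Cards column are hypothetical):

```python
import pandas as pd

# Hypothetical mini frame; values are made up.
df = pd.DataFrame({
    "Team": ["Alpha", "Beta"],
    "Goals": [4, 2],
    "Yellow Cards": [9, 5],
    "Red Cards": [0, 1],
})

# A list of column names selects several columns; copy() detaches the result.
subset = df[["Team", "Yellow Cards", "Red Cards"]].copy()

# Adding a column to the copy leaves the original frame untouched.
subset["Total Cards"] = subset["Yellow Cards"] + subset["Red Cards"]
```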

    Step 8 - Sort discipline by Red Cards first, then by Yellow Cards

    discipline.sort_values(['Red Cards', 'Yellow Cards'], ascending=False)
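
    sort_values returns a new DataFrame and leaves the original order alone. The ascending argument also accepts one flag per sort key. A sketch on a made-up frame (df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical mini frame; values are made up.
df = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma", "Delta"],
    "Yellow Cards": [9, 5, 3, 7],
    "Red Cards": [0, 1, 0, 1],
})

# Both keys descending: most red cards first, ties broken by most yellow cards.
both_desc = df.sort_values(["Red Cards", "Yellow Cards"], ascending=False)

# One flag per key: red cards descending, yellow cards ascending.
mixed = df.sort_values(["Red Cards", "Yellow Cards"], ascending=[False, True])

print(both_desc["Team"].tolist())
print(mixed["Team"].tolist())
```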

    Step 9 - Compute the mean number of yellow cards per team

    round(discipline['Yellow Cards'].mean())

    Output:

    7.0
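
    Rounding to a whole number hides the fractional part of the mean. To keep some precision, pass the number of decimals to round. A sketch on a made-up frame (df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical mini frame; values are made up.
df = pd.DataFrame({"Team": ["Alpha", "Beta", "Gamma", "Delta"], "Yellow Cards": [9, 5, 3, 6]})

mean_cards = df["Yellow Cards"].mean()   # arithmetic mean of the column
mean_2dp = round(mean_cards, 2)          # keep two decimals instead of none

print(mean_cards, mean_2dp)
```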

    Step 10 - Find the teams that scored more than 6 goals

    euro12[euro12.Goals > 6]
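
    The expression inside the brackets is a boolean Series, and indexing with it keeps the rows where it is True. The same filter can also be written with query. A sketch on a made-up frame (df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical mini frame; values are made up.
df = pd.DataFrame({"Team": ["Alpha", "Beta", "Gamma"], "Goals": [4, 2, 7]})

mask = df["Goals"] > 6        # boolean Series, one entry per row
high = df[mask]               # rows where the mask is True
same = df.query("Goals > 6")  # equivalent filter written as a string expression

print(high)
```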

    Step 11 - Select the teams whose name starts with G or ends with e

    # euro12[euro12.Team.str.startswith('G')]  # teams whose name starts with G
    euro12[euro12.Team.str.endswith('e')]  # teams whose name ends with e
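
    To get both conditions in a single result, the two masks can be combined with the element-wise | operator (Python's or does not work on Series and raises a ValueError). A sketch on a made-up frame (df is hypothetical):

```python
import pandas as pd

# Hypothetical mini frame with a few team names.
df = pd.DataFrame({"Team": ["Germany", "Greece", "France", "Spain"]})

starts_g = df["Team"].str.startswith("G")
ends_e = df["Team"].str.endswith("e")

# | is the element-wise OR; use & for AND and ~ for NOT.
either = df[starts_g | ends_e]

print(either["Team"].tolist())
```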

    Step 12 - Select the first 7 columns

    euro12.iloc[: , 0:7]

    Step 13 - Select all columns except the last 3

    euro12.iloc[: , :-3]
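
    iloc slices by position and follows Python slicing rules: the stop index is excluded and negative indices count from the end. A sketch on a made-up six-column frame (df and its column names are hypothetical):

```python
import pandas as pd

# Hypothetical one-row frame with six columns named a to f.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=list("abcdef"))

first_two = df.iloc[:, 0:2]           # positions 0 and 1; stop index 2 is excluded
all_but_last_three = df.iloc[:, :-3]  # everything before the last three columns

print(list(first_two.columns))
print(list(all_but_last_three.columns))
```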

    Step 14 - Find the Shooting Accuracy of England, Italy and Russia

    euro12.loc[euro12.Team.isin(['England', 'Italy', 'Russia']), ['Team','Shooting Accuracy']]
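
    loc takes a row selector and a column selector in one call. The info() output above lists Shooting Accuracy as an object column, so the values are not numeric. Assuming they are percent strings such as "50.0%", they can be converted before any arithmetic. A sketch on a made-up frame (df and the percentages are hypothetical):

```python
import pandas as pd

# Hypothetical mini frame; the percentages are made up.
df = pd.DataFrame({
    "Team": ["England", "Italy", "Russia", "Spain"],
    "Shooting Accuracy": ["50.0%", "43.0%", "22.5%", "55.9%"],
})

# Rows: teams in the list. Columns: the two named columns.
wanted = df.loc[df["Team"].isin(["England", "Italy", "Russia"]), ["Team", "Shooting Accuracy"]]

# Strip the percent sign and convert to float so the values can be sorted or averaged.
accuracy = wanted["Shooting Accuracy"].str.rstrip("%").astype(float)

print(accuracy.tolist())
```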

    References:

    1. http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook

    2. https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/

    3. https://github.com/guipsamora/pandas_exercises

  • Original article: https://www.cnblogs.com/xiaxuexiaoab/p/9176699.html