  • Pandas Exercise 1

    From https://github.com/guipsamora/pandas_exercises

    Ex2 - Getting and Knowing your Data

    This time we are going to pull data directly from the internet.
    Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

    Step 1. Import the necessary libraries

    import pandas as pd
    import numpy as np
    

    Step 2. Import the dataset from this address.

    Step 3. Assign it to a variable called chipo.

    url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
    chipo = pd.read_csv(url, sep='\t')  # the file is tab-separated
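
    To try the same call without network access, here is a minimal sketch that parses a tab-separated string held in memory. The three rows and the names sample_tsv and sample are made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical three-row sample using the same column layout as the dataset.
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNULL\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "2\t2\tChicken Bowl\t[Tomatillo-Red Chili Salsa (Hot)]\t$16.98\n"
)

# sep="\t" tells read_csv to split fields on tabs instead of commas.
# By default, read_csv treats the literal string "NULL" as a missing value.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(sample.shape)
print(sample["choice_description"].isna().sum())
```

    Writing the separator as the escape sequence "\t" rather than a literal tab keeps it visible and survives copy and paste.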
    

    Step 4. See the first 10 entries

    # Solution 1
    
    chipo[:10]
    
       order_id  quantity  item_name                              choice_description                                 item_price
    0  1         1         Chips and Fresh Tomato Salsa           NaN                                                $2.39
    1  1         1         Izze                                   [Clementine]                                       $3.39
    2  1         1         Nantucket Nectar                       [Apple]                                            $3.39
    3  1         1         Chips and Tomatillo-Green Chili Salsa  NaN                                                $2.39
    4  2         2         Chicken Bowl                           [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98
    5  3         1         Chicken Bowl                           [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...  $10.98
    6  3         1         Side of Chips                          NaN                                                $1.69
    7  4         1         Steak Burrito                          [Tomatillo Red Chili Salsa, [Fajita Vegetables...  $11.75
    8  4         1         Steak Soft Tacos                       [Tomatillo Green Chili Salsa, [Pinto Beans, Ch...  $9.25
    9  5         1         Steak Burrito                          [Fresh Tomato Salsa, [Rice, Black Beans, Pinto...  $9.25
    # Solution 2
    
    chipo.head(10)
    
       order_id  quantity  item_name                              choice_description                                 item_price
    0  1         1         Chips and Fresh Tomato Salsa           NaN                                                $2.39
    1  1         1         Izze                                   [Clementine]                                       $3.39
    2  1         1         Nantucket Nectar                       [Apple]                                            $3.39
    3  1         1         Chips and Tomatillo-Green Chili Salsa  NaN                                                $2.39
    4  2         2         Chicken Bowl                           [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98
    5  3         1         Chicken Bowl                           [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...  $10.98
    6  3         1         Side of Chips                          NaN                                                $1.69
    7  4         1         Steak Burrito                          [Tomatillo Red Chili Salsa, [Fajita Vegetables...  $11.75
    8  4         1         Steak Soft Tacos                       [Tomatillo Green Chili Salsa, [Pinto Beans, Ch...  $9.25
    9  5         1         Steak Burrito                          [Fresh Tomato Salsa, [Rice, Black Beans, Pinto...  $9.25

    Step 5. What is the number of observations in the dataset?

    type(chipo)
    
    pandas.core.frame.DataFrame
    
    # Solution 1
    
    len(chipo.index)
    
    4622
    
    # Solution 2
    
    chipo.shape[0]
    
    4622
    
    # Solution 3
    
    chipo.info()
    
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4622 entries, 0 to 4621
    Data columns (total 5 columns):
    order_id              4622 non-null int64
    quantity              4622 non-null int64
    item_name             4622 non-null object
    choice_description    3376 non-null object
    item_price            4622 non-null object
    dtypes: int64(2), object(3)
    memory usage: 180.7+ KB
    

    Step 6. What is the number of columns in the dataset?

    # Solution 1
    
    len(chipo.columns)
    
    5
    
    # Solution 2
    
    chipo.shape[1]
    
    5
    

    Step 7. Print the name of all the columns.

    list(chipo.columns)
    
    ['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']
    

    Step 8. How is the dataset indexed?

    chipo.index
    
    RangeIndex(start=0, stop=4622, step=1)
    

    Step 9. Which was the most-ordered item?

    c = chipo.groupby('item_name')
    c = c.sum(numeric_only=True)  # sum only the numeric columns; skip the text columns
    c = c.sort_values(['quantity'], ascending=False)
    c['quantity'].head(1)
    
    item_name
    Chicken Bowl    761
    Name: quantity, dtype: int64
    

    Step 10. For the most-ordered item, how many items were ordered?

    c = chipo.groupby('item_name')
    c = c.sum(numeric_only=True)  # sum only the numeric columns; skip the text columns
    c = c.sort_values(['quantity'], ascending=False)
    c['quantity'].head(1)
    
    item_name
    Chicken Bowl    761
    Name: quantity, dtype: int64
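
    Steps 9 and 10 ask for the item name and its total quantity, and both can be read from one grouped Series. The sketch below uses idxmax and max instead of sorting, which is a swapped-in alternative to the solution above. The four-row frame and the names sample, totals, top_item and top_quantity are hypothetical.

```python
import pandas as pd

# Hypothetical miniature order table standing in for chipo.
sample = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Side of Chips"],
    "quantity": [2, 1, 1, 1],
})

# Total quantity per item, as a Series indexed by item_name.
totals = sample.groupby("item_name")["quantity"].sum()

top_item = totals.idxmax()   # index label of the largest total (step 9)
top_quantity = totals.max()  # the largest total itself (step 10)

print(top_item, top_quantity)
```

    Selecting the quantity column before calling sum avoids aggregating the text columns at all.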
    

    Step 11. What was the most ordered item in the choice_description column?

    c = chipo.groupby('choice_description')
    c = c.sum(numeric_only=True)  # sum only the numeric columns (order_id, quantity)
    c = c.sort_values(['quantity'], ascending=False)
    c.head(1)
    
                        order_id  quantity
    choice_description
    [Diet Coke]           123455       159

    Step 12. How many items were ordered in total?

    chipo['quantity'].sum()
    
    4972
    

    Step 13. Turn the item price into a float

    Step 13.a. Check the item price type

    chipo['item_price'].dtypes
    
    dtype('O')
    

    Step 13.b. Create a lambda function and change the type of item price

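    # Strip the dollar sign from each price string, then convert the column to float.
    chipo['item_price'] = chipo['item_price'].apply(lambda x: x.replace('$', '')).astype(np.float64)
    # Alternative (commented out). Note that x[1:-1] also drops the last character,
    # so it only works if every price string ends with one extra character, such as a trailing space.
    # dollarizer = lambda x: float(x[1:-1])
    # chipo.item_price = chipo.item_price.apply(dollarizer)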
    

    Step 13.c. Check the item price type

    chipo['item_price'].dtypes
    
    dtype('float64')
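
    The same conversion can also be done without a lambda, using the vectorized string methods on the column. This is a sketch of an alternative, not the exercise's own solution. The prices Series is hypothetical, and one value carries a trailing space to show that stripping whitespace is harmless.

```python
import pandas as pd

# Hypothetical price strings; the second one has a trailing space on purpose.
prices = pd.Series(["$2.39", "$16.98 ", "$10.98"])

as_float = (
    prices
    .str.replace("$", "", regex=False)  # regex=False: treat "$" as a literal character
    .str.strip()                        # drop surrounding whitespace, if any
    .astype(float)
)

print(as_float.dtype)
```

    Passing regex=False matters here because "$" is an end-of-string anchor in regular expressions, so a regex replace would leave the dollar sign in place.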
    

    Step 14. How much was the revenue for the period in the dataset?

    (chipo['quantity']*chipo['item_price']).sum()
    
    39237.02
    

    Step 15. How many orders were made in the period?

    # Solution 1
    
    g = chipo.groupby(['order_id'])
    g.ngroups
    
    1834
    
    # Solution 2
    
    orders = chipo.order_id.value_counts().count()
    orders
    
    1834
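
    A third option, sketched here on made-up data, is Series.nunique, which counts distinct values directly. The frame sample and the variable n_orders are hypothetical names.

```python
import pandas as pd

# Hypothetical order lines: three distinct orders spread over six rows.
sample = pd.DataFrame({"order_id": [1, 1, 2, 3, 3, 3]})

# nunique counts the distinct order_id values, ignoring repeats.
n_orders = sample["order_id"].nunique()

print(n_orders)
```

    On the real data this would be chipo['order_id'].nunique(), which should agree with both solutions above.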
    

    Step 16. What is the average revenue amount per order?

    # Solution 1
    
    chipo['revenue'] = chipo['quantity'] * chipo['item_price']
    # Sum only the numeric columns per order; the text columns are not needed here.
    order_grouped = chipo.groupby(by=['order_id']).sum(numeric_only=True)
    order_grouped['revenue'].mean()
    
    21.394231188658654
    
    # Solution 2
    
    chipo.groupby(by=['order_id'])['revenue'].sum().mean()
    
    21.394231188658654
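
    To make the two-stage aggregation easier to follow, here is a small worked example on made-up numbers: revenue is first summed within each order, then averaged across orders. The frame sample and the names per_order and average are hypothetical.

```python
import pandas as pd

# Hypothetical order lines: order 1 has two lines, order 2 has one.
sample = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 3.0, 10.0],
})

# Line revenue, computed the same way as in the exercise.
sample["revenue"] = sample["quantity"] * sample["item_price"]

# Stage 1: total revenue per order -> order 1: 2.0 + 6.0 = 8.0, order 2: 10.0
per_order = sample.groupby("order_id")["revenue"].sum()

# Stage 2: mean of the per-order totals -> (8.0 + 10.0) / 2 = 9.0
average = per_order.mean()

print(average)
```

    Averaging the revenue column directly, without grouping first, would give the mean per order line rather than per order.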
    

    Step 17. How many different items are sold?

    chipo.item_name.value_counts().count()
    
    50
  • Original post: https://www.cnblogs.com/pkuimyy/p/11505970.html