  • pandas-08: What pd.cut() does and how to use it

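    pd.cut() works much like assigning grade bands to exam scores: for example, 0-59 is poor, 60-70 is average, 71-80 is good, and so on. pandas provides this method for exactly that kind of binning. Here is the code: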

    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame
    
    np.random.seed(666)
    
    score_list = np.random.randint(25, 100, size=20)
    print(score_list)
    # [27 70 55 87 95 98 55 61 86 76 85 53 39 88 41 71 64 94 38 94]
    
    # Define the bin edges; each adjacent pair of edges forms one interval
    bins = [0, 59, 70, 80, 100]
    
    score_cut = pd.cut(score_list, bins)
    print(type(score_cut)) # <class 'pandas.core.arrays.categorical.Categorical'>
    print(score_cut)
    '''
    [(0, 59], (59, 70], (0, 59], (80, 100], (80, 100], ..., (70, 80], (59, 70], (80, 100], (0, 59], (80, 100]]
    Length: 20
    Categories (4, interval[int64]): [(0, 59] < (59, 70] < (70, 80] < (80, 100]]
    '''
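    # Count how many scores fall into each interval
    # (the top-level pd.value_counts is deprecated in recent pandas, so go through a Series)
    print(pd.Series(score_cut).value_counts())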
    '''
    (80, 100]    8
    (0, 59]      7
    (59, 70]     3
    (70, 80]     2
    dtype: int64
    '''
    
    df = DataFrame()
    df['score'] = score_list
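    # pd.util.testing.rands is no longer available in pandas 2.x,
    # so build random 3-character student names with NumPy instead
    chars = list('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
    df['student'] = [''.join(np.random.choice(chars, 3)) for _ in range(len(score_list))]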
    print(df)
    '''
        score student
    0      27     1ul
    1      70     yuK
    2      55     WWK
    3      87     EU6
    4      95     Vqn
    5      98     KAf
    6      55     QNT
    7      61     HaE
    8      86     aBo
    9      76     MMa
    10     85     Ctc
    11     53     5BI
    12     39     wBp
    13     88     WMB
    14     41     q5t
    15     71     MjZ
    16     64     nTc
    17     94     Kyx
    18     38     Rlh
    19     94     2uV
    '''
    
    # Bin the score column with cut
    print(pd.cut(df['score'], bins))
    '''
    0       (0, 59]
    1      (59, 70]
    2       (0, 59]
    3     (80, 100]
    4     (80, 100]
    5     (80, 100]
    6       (0, 59]
    7      (59, 70]
    8     (80, 100]
    9      (70, 80]
    10    (80, 100]
    11      (0, 59]
    12      (0, 59]
    13    (80, 100]
    14      (0, 59]
    15     (70, 80]
    16     (59, 70]
    17    (80, 100]
    18      (0, 59]
    19    (80, 100]
    Name: score, dtype: category
    Categories (4, interval[int64]): [(0, 59] < (59, 70] < (70, 80] < (80, 100]]
    '''
    
    df['Categories'] = pd.cut(df['score'], bins)
    print(df)
    '''
        score student Categories
    0      27     1ul    (0, 59]
    1      70     yuK   (59, 70]
    2      55     WWK    (0, 59]
    3      87     EU6  (80, 100]
    4      95     Vqn  (80, 100]
    5      98     KAf  (80, 100]
    6      55     QNT    (0, 59]
    7      61     HaE   (59, 70]
    8      86     aBo  (80, 100]
    9      76     MMa   (70, 80]
    10     85     Ctc  (80, 100]
    11     53     5BI    (0, 59]
    12     39     wBp    (0, 59]
    13     88     WMB  (80, 100]
    14     41     q5t    (0, 59]
    15     71     MjZ   (70, 80]
    16     64     nTc   (59, 70]
    17     94     Kyx  (80, 100]
    18     38     Rlh    (0, 59]
    19     94     2uV  (80, 100]
    '''
    
    # Raw intervals are not very readable, so use the labels parameter of cut
    # to give each interval a name (one label per interval)
    df['Categories'] = pd.cut(df['score'], bins, labels=['low', 'middle', 'good', 'perfect'])
    print(df)
    '''
        score student Categories
    0      27     1ul        low
    1      70     yuK     middle
    2      55     WWK        low
    3      87     EU6    perfect
    4      95     Vqn    perfect
    5      98     KAf    perfect
    6      55     QNT        low
    7      61     HaE     middle
    8      86     aBo    perfect
    9      76     MMa       good
    10     85     Ctc    perfect
    11     53     5BI        low
    12     39     wBp        low
    13     88     WMB    perfect
    14     41     q5t        low
    15     71     MjZ       good
    16     64     nTc     middle
    17     94     Kyx    perfect
    18     38     Rlh        low
    19     94     2uV    perfect
    '''
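
One detail of the bins used above is easy to miss: by default pd.cut builds right-closed intervals, which is why a score of 70 lands in (59, 70] rather than (70, 80]. It also means a score of exactly 0 would fall outside (0, 59] and come back as NaN. The sketch below is an illustration only: the edge_scores values are made up to sit on the bin edges and are not part of the data above. It shows how the include_lowest and right parameters of pd.cut change that behaviour.

```python
import pandas as pd

bins = [0, 59, 70, 80, 100]
labels = ['low', 'middle', 'good', 'perfect']

# Hypothetical scores chosen to sit exactly on the bin edges
edge_scores = pd.Series([0, 59, 60, 70, 80, 100])

# Default: right-closed intervals (0, 59], (59, 70], ...
# 0 is outside the first interval, so it becomes NaN
default_cut = pd.cut(edge_scores, bins, labels=labels)

# include_lowest=True also closes the first interval on the left,
# so 0 is assigned to 'low'
inclusive_cut = pd.cut(edge_scores, bins, labels=labels, include_lowest=True)

# right=False switches to left-closed intervals [0, 59), [59, 70), ...
# Now 100 is outside the last interval and becomes NaN
left_closed_cut = pd.cut(edge_scores, bins, labels=labels, right=False)

print(pd.DataFrame({
    'score': edge_scores,
    'default': default_cut,
    'include_lowest': inclusive_cut,
    'right=False': left_closed_cut,
}))
```

Which variant fits depends on how the grade bands are defined; if a score of 0 is possible, the default call silently produces a missing value for it.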
    
  • Original article: https://www.cnblogs.com/wenqiangit/p/11252758.html