zoukankan      html  css  js  c++  java
  • pandas-08 pd.cut()的功能和作用

    pandas-08 pd.cut()的功能和作用

    pd.cut()的作用,有点类似给成绩设定优良中差,比如:0-59分为差,60-70分为中,71-80分为优秀等等,在pandas中,也提供了这样一个方法来处理这些事儿。直接上代码:

    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame
    
    np.random.seed(666)
    
    score_list = np.random.randint(25, 100, size=20)
    print(score_list)
    # [27 70 55 87 95 98 55 61 86 76 85 53 39 88 41 71 64 94 38 94]
    
    # 指定多个区间
    bins = [0, 59, 70, 80, 100]
    
    score_cut = pd.cut(score_list, bins)
    print(type(score_cut)) # <class 'pandas.core.arrays.categorical.Categorical'>
    print(score_cut)
    '''
    [(0, 59], (59, 70], (0, 59], (80, 100], (80, 100], ..., (70, 80], (59, 70], (80, 100], (0, 59], (80, 100]]
    Length: 20
    Categories (4, interval[int64]): [(0, 59] < (59, 70] < (70, 80] < (80, 100]]
    '''
    print(pd.value_counts(score_cut)) # 统计每个区间人数
    '''
    (80, 100]    8
    (0, 59]      7
    (59, 70]     3
    (70, 80]     2
    dtype: int64
    '''
    
    df = DataFrame()
    df['score'] = score_list
    df['student'] = [pd.util.testing.rands(3) for i in range(len(score_list))]
    print(df)
    '''
        score student
    0      27     1ul
    1      70     yuK
    2      55     WWK
    3      87     EU6
    4      95     Vqn
    5      98     KAf
    6      55     QNT
    7      61     HaE
    8      86     aBo
    9      76     MMa
    10     85     Ctc
    11     53     5BI
    12     39     wBp
    13     88     WMB
    14     41     q5t
    15     71     MjZ
    16     64     nTc
    17     94     Kyx
    18     38     Rlh
    19     94     2uV
    '''
    
    # 使用cut方法进行分箱
    print(pd.cut(df['score'], bins))
    '''
    0       (0, 59]
    1      (59, 70]
    2       (0, 59]
    3     (80, 100]
    4     (80, 100]
    5     (80, 100]
    6       (0, 59]
    7      (59, 70]
    8     (80, 100]
    9      (70, 80]
    10    (80, 100]
    11      (0, 59]
    12      (0, 59]
    13    (80, 100]
    14      (0, 59]
    15     (70, 80]
    16     (59, 70]
    17    (80, 100]
    18      (0, 59]
    19    (80, 100]
    Name: score, dtype: category
    Categories (4, interval[int64]): [(0, 59] < (59, 70] < (70, 80] < (80, 100]]
    '''
    
    df['Categories'] = pd.cut(df['score'], bins)
    print(df)
    '''
        score student Categories
    0      27     1ul    (0, 59]
    1      70     yuK   (59, 70]
    2      55     WWK    (0, 59]
    3      87     EU6  (80, 100]
    4      95     Vqn  (80, 100]
    5      98     KAf  (80, 100]
    6      55     QNT    (0, 59]
    7      61     HaE   (59, 70]
    8      86     aBo  (80, 100]
    9      76     MMa   (70, 80]
    10     85     Ctc  (80, 100]
    11     53     5BI    (0, 59]
    12     39     wBp    (0, 59]
    13     88     WMB  (80, 100]
    14     41     q5t    (0, 59]
    15     71     MjZ   (70, 80]
    16     64     nTc   (59, 70]
    17     94     Kyx  (80, 100]
    18     38     Rlh    (0, 59]
    19     94     2uV  (80, 100]
    '''
    
    # 但是这样的方法不是很适合阅读,可以使用cut方法中的label参数
    # 为每个区间指定一个label
    df['Categories'] = pd.cut(df['score'], bins, labels=['low', 'middle', 'good', 'perfect'])
    print(df)
    '''
        score student Categories
    0      27     1ul        low
    1      70     yuK     middle
    2      55     WWK        low
    3      87     EU6    perfect
    4      95     Vqn    perfect
    5      98     KAf    perfect
    6      55     QNT        low
    7      61     HaE     middle
    8      86     aBo    perfect
    9      76     MMa       good
    10     85     Ctc    perfect
    11     53     5BI        low
    12     39     wBp        low
    13     88     WMB    perfect
    14     41     q5t        low
    15     71     MjZ       good
    16     64     nTc     middle
    17     94     Kyx    perfect
    18     38     Rlh        low
    19     94     2uV    perfect
    '''
    
  • 相关阅读:
    机试笔记1
    ZOJ 3846 GCD Reduce//水啊水啊水啊水
    最短路练习
    CodeForces 632C The Smallest String Concatenation//用string和sort就好了&&string的基础用法
    HDOJ 5667 Sequence//费马小定理 矩阵快速幂
    HDOJ 5666//快速积,推公式
    HDOJ 5672//模拟
    网络流相关知识点以及题目//POJ1273 POJ 3436 POJ2112 POJ 1149
    南理第八届校赛同步赛-C count_prime//容斥原理
    python之shutil模块使用方法
  • 原文地址:https://www.cnblogs.com/wenqiangit/p/11252758.html
Copyright © 2011-2022 走看看