zoukankan      html  css  js  c++  java
  • Python学习笔记:实现SQL中case when构建新列功能

    要实现一个类似于 SQL 中的 case when 功能,为用户打上标签。

    例如:

    select tj_month,
           name,      
           online_time,
           case when online_time < 12 then '(3,12]'
                     when online_time >= 12 and online_time < 24 then '[12,24)'
                     when online_time >= 24 and online_time < 36 then '[24,36)'
                     when online_time >= 36 and online_time < 48 then '[36,48)'
                     when online_time >= 48 and online_time < 60 then '[48,60)'
                else '>60' end as online_time_cut
    from table_name
    where tj_month = '202106';
    

    一、利用pandas.DataFrame.loc直接筛选

    构造测试数据框。

    import numpy as np
    import pandas as pd
    data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
    df = pd.DataFrame(data=data, columns=['a', 'b'])
    '''
    	a	b
    0	NaN	0.0
    1	2.0	0.0
    2	NaN	1.0
    '''
    

    直接筛选符合条件数据进行打标。

    # 此方法已不推荐 不支持 建议使用loc/iloc定位
    df[(df['a'].isnull()) & (df['b'] == 0)]['c'] = 1
    
    # loc定位
    df['c'] = 0
    df.loc[(df['a'].isnull()) & (df['b'] == 0), 'c'] = 1
    '''
    
        a	b	c
    0	NaN	0.0	1.0
    1	2.0	0.0	NaN
    2	NaN	1.0	NaN
    '''
    

    二、利用np.where筛选

    # 满足条件 输出x 否则输出y
    np.where(condition, x, y)
    
    np.where(df.isnull(), 100, 5)
    '''
    array([[100,   5],
           [  5,   5],
           [100,   5]])
    '''
    
    # 打标签
    df['c'] = np.where((df['a'].isnull()) & (df['b'] == 0), 1, 0)
    

    One more嵌套判断的例子:

    df['class'] = np.where(df['score'].between(0, 60, inclusive=False), '不及格',
                          np.where(df['score'].between(60, 80, inclusive=True), '良好', '优秀'))
    

    三、利用np.select筛选

    np.select 函数可以根据某些条件筛选某些元素,使用语法为:

    np.select(condition_list, choice_list, default=0)
    # 条件列表、执行操作列表、缺失值
    # 返回列表
    

    实操:

    df['c'] = np.select([(df['a'].isnull()) & (df['b'] == 0),
                        (df['a'].isnull()) & (df['b'] == 1),
                        (df['a'] == 2) & (df['b'] == 0)],
                        ['one', 'two', 'three'],
                        default = 'XXX')
    '''
    	a	b	c
    0	NaN	0.0	one
    1	2.0	0.0	three
    2	NaN	1.0	two
    '''
    

    四、利用apply函数与if语句

    apply 应用在 dataframe 上,用于对行或者列进行计算。

    • axis=1 指定按行计算
    • lambda匿名函数判断满足条件为1,不满足为0
    df['c'] = df.apply(lambda x: 1 if np.isnan(x[0]) and x[1] == 0 else 0, axis=1)
    df
    '''
    	a	b	c
    0	NaN	0.0	1
    1	2.0	0.0	0
    2	NaN	1.0	0
    '''
    

    另外一个简单的例子:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(np.random.randint(0,11,size=(1000000,5)), columns=('a','b','c','d','e'))
    
    def func(a,b,c,d,e):
        if e == 10:
            return c*d
        elif (e < 10) and (e >= 5):
            return c+d
        elif e < 5:
            return a+b
    
    df['new'] = df.apply(lambda x: func(x['a'], x['b'], x['c'], x['d'], x['e']), axis=1)
    df
    '''
    	a	b	c	d	e	new
    0	2	0	5	7	5	12
    1	9	3	3	0	2	12
    2	2	0	9	10	3	2
    3	5	8	3	8	9	11
    4	1	10	0	2	0	11
    '''
    
    # 例子
    def function(x):
        if x['数学'] != 0:
            s = x['语文']/x['数学']
        else:
            s = 0
        return s
    
    data['result'] = data.apply(lambda x: function(x), axis=1)
    data
    

    参考链接:Pandas等价于创建新变量的SQL case when语句

    参考链接:在 pandas 中如何 实现 sql 查询中 case when then end 的功能?

    参考链接:pandas apply处理两个参参数 如何写case when then else end

  • 相关阅读:
    connect: network is unreachable问题的解决
    Linux图形界面与字符界面切换
    Xshell远程连接Linux服务器出错
    demo-placeholder兼容ie8
    Python设计TFTP客户端
    python hashlib、hmac模块
    python time、datetime、random、os、sys模块
    python 字符串和字典
    ssh远程登录时提示access denied
    指针的指针与指针的引用
  • 原文地址:https://www.cnblogs.com/hider/p/15353692.html
Copyright © 2011-2022 走看看