zoukankan      html  css  js  c++  java
  • pandas操作

    python中使用了pandas的一些操作,特此记录下来:

    生成DataFrame

    import pandas as pd
    
    data = pd.DataFrame({
        'v_id': ["v_1", 'v_2'],
        'label': ["a,b", 'e,f,g'],
    })
    print(data)
    

    得到结果为:

       label v_id
    0    a,b  v_1
    1  e,f,g  v_2
    

    按照逗号分隔并拼接

    import pandas as pd
    
    data = pd.DataFrame({
        'v_id': ["v_1", 'v_2'],
        'label': ["a,b", 'e,f,g'],
    })
    df = data.drop('label', axis=1).join(data['label'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('label'))
    print(df)
    

    得到结果为:

      v_id label
    0  v_1     a
    0  v_1     b
    1  v_2     e
    1  v_2     f
    1  v_2     g
    

    筛选符合条件的行

    import pandas as pd
    
    data = pd.DataFrame({
        'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
        'label': ["a", 'b', "e", "f", "g"],
    })
    target_label = data.loc[data['label'].isin(["e", "f"])]
    print(target_label)
    

    得到结果为:

      v_id label
    1  v_2     e
    1  v_2     f
    

    筛选不符合条件的行

    import pandas as pd
    
    data = pd.DataFrame({
        'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
        'label': ["a", 'b', "e", "f", "g"],
        'num': [1, 2, 3, 4, 5],
    })
    
    other_label1 = data[~data['label'].isin(["f", "g"])]
    print(other_label1)
    
    other_label2 = data.query("num<=3 & label!='b'")
    print(other_label2)
    

    得到结果为:

      v_id label
    0  v_1     a
    0  v_1     b
    1  v_2     e
    
      label  num v_id
    0     a    1  v_1
    2     e    3  v_2
    

    替换某一列的值

    import pandas as pd
    
    data = pd.DataFrame({
        'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
        'label': ["a", 'b', "e", "f", "g"],
    })
    
    df = data.copy()
    df.loc[df["label"] != "", 'label'] = "1"
    print(df)
    

    得到结果为:

      v_id label
    0  v_1     1
    0  v_1     1
    1  v_2     1
    1  v_2     1
    1  v_2     1
    

    取某一列转换成list

    import pandas as pd
    
    data = pd.DataFrame({
        'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
        'label': ["a", 'b', "e", "f", "g"],
    })
    print(data["label"].values.tolist())
    

    得到结果为:

    ['a', 'b', 'e', 'f', 'g']
    

    按照某一列去重

    import pandas as pd
    
    data = pd.DataFrame({
        'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
        'label': ["a", 'b', "e", "f", "g"],
    })
    print(data.drop_duplicates(subset=['v_id']))
    

    得到结果为:

      v_id label
    0  v_1     a
    1  v_2     e
    

    复制dataframe并拼接

    data = pd.DataFrame({
        'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
        'label': ["a", 'b', "e", "f", "g"],
    })
    
    data_copy = data.copy()
    times = 2
    for i in range(times):
        data_copy = pd.concat([data_copy,data])
    print(data_copy)
    

    得到结果为:

      v_id label
    0  v_1     a
    0  v_1     b
    1  v_2     e
    1  v_2     f
    1  v_2     g
    0  v_1     a
    0  v_1     b
    1  v_2     e
    1  v_2     f
    1  v_2     g
    0  v_1     a
    0  v_1     b
    1  v_2     e
    1  v_2     f
    1  v_2     g
    

    更改某一列类型

    data = pd.DataFrame({
        'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
        'label': ["a", 'b', "e", "f", "g"],
        'num': [1.0, 2.0, 3.0, 4.0, 5.0],
    })
    data["num"] = data[["num"]].astype(int)
    print(data)
    

    得到结果为:

      label  num v_id
    0     a    1  v_1
    1     b    2  v_1
    2     e    3  v_2
    3     f    4  v_2
    4     g    5  v_2
    
  • 相关阅读:
    C#常用笔记.cs
    OpenFileDialog 和 FolderBrowserDialog
    C#双缓存.cs
    数据库设计范式
    js 给你一个 32 位的有符号整数 x ,返回将 x 中的数字部分反转后的结果
    Study Plan The Sixth Day
    Study Plan The First Day
    Study Plan The Seventh Day
    Study Plan The Second Day
    C#CustomAttribute和泛型约束 应用于经典数据处理适配
  • 原文地址:https://www.cnblogs.com/TTyb/p/9717537.html
Copyright © 2011-2022 走看看