  • Converting non-numeric features to numeric ones with pandas (one-hot encoding)

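    import numpy as np
    import pandas as pd
    
    # Each feature is stored as a 1 x 9 row vector
    name = np.array([['jack', 'ross', 'john', 'blues', 'frank', 'bitch', 'haha', 'asd', 'loubin']])
    age = np.array([[12, 32, 23, 4, 32, 45, 65, 23, 65]])
    married = np.array([[1, 0, 1, 1, 0, 1, 0, 0, 0]])
    gender = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 1]])
    
    # Stack the four rows into a 4 x 9 array, then transpose it to 9 x 4 (one row per person).
    # Note: mixing strings and integers makes NumPy cast every value to a string.
    matrix = np.concatenate((name, age, married, gender), axis=0)
    matrix = matrix.T
    
    data = pd.DataFrame(data=matrix, columns=['name', 'age', 'married', 'gender'])
    print(data)
    
    # One-hot encode the 'name' column: each distinct value becomes a column 'name_<value>'.
    # dtype=int keeps the 0/1 output; pandas 2.0+ returns booleans by default.
    print(pd.get_dummies(data=data['name'], prefix='name', dtype=int))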

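    The output is shown below. The columns of the new table are named after the values of the encoded column, and a prefix can be specified for them.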

         name age married gender
    0    jack  12       1      0
    1    ross  32       0      0
    2    john  23       1      0
    3   blues   4       1      0
    4   frank  32       0      1
    5   bitch  45       1      1
    6    haha  65       0      1
    7     asd  23       0      1
    8  loubin  65       0      1
       name_asd  name_bitch  name_blues  ...  name_john  name_loubin  name_ross
    0         0           0           0  ...          0            0          0
    1         0           0           0  ...          0            0          1
    2         0           0           0  ...          1            0          0
    3         0           0           1  ...          0            0          0
    4         0           0           0  ...          0            0          0
    5         0           1           0  ...          0            0          0
    6         0           0           0  ...          0            0          0
    7         1           0           0  ...          0            0          0
    8         0           0           0  ...          0            1          0
    
    [9 rows x 9 columns]
    
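    The example above encodes a single column and prints the result separately. As a minimal sketch of a common variation, the snippet below encodes the column inside the DataFrame itself via the `columns` argument of `pd.get_dummies`, so the remaining features stay next to the new indicator columns. The shortened four-row table, the prefix `n`, and the separator `=` are assumptions chosen for illustration; they are not taken from the original post. The table is built from a dict here so that `age`, `married` and `gender` keep their integer dtype.

```python
import pandas as pd

# Illustrative four-row subset of the table above (an assumption for brevity).
# Building from a dict keeps the numeric columns as integers.
data = pd.DataFrame({
    'name': ['jack', 'ross', 'john', 'blues'],
    'age': [12, 32, 23, 4],
    'married': [1, 0, 1, 1],
    'gender': [0, 0, 0, 0],
})

# Encode only the 'name' column; the other columns pass through unchanged.
# prefix and prefix_sep control how the new column names are built.
encoded = pd.get_dummies(
    data, columns=['name'], prefix='n', prefix_sep='=', dtype=int
)

print(encoded.columns.tolist())
print(encoded)
```

    The untouched columns come first, followed by one indicator column per distinct name, and the original `name` column is dropped from the result.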
  • Original article: https://www.cnblogs.com/loubin/p/11919777.html