Introduction to Machine Learning: Data Undersampling with np.random.choice

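1. np.random.choice(array, size) draws random samples: array is the population to draw from, and size is the number of samples to draw. By default it samples with replacement, so the same element can be drawn more than once; pass replace=False to draw distinct elements.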
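As a quick illustration of the call above, the following minimal sketch compares the default behaviour with replace=False. The array `population` and the fixed seed are assumptions made only for this example.

```python
import numpy as np

np.random.seed(0)  # fixed seed, chosen only so the sketch is repeatable

population = np.arange(10)  # hypothetical array to draw from

# Default behaviour: sampling WITH replacement, so values may repeat
with_repl = np.random.choice(population, 5)

# replace=False: sampling WITHOUT replacement, so every value is distinct
without_repl = np.random.choice(population, 5, replace=False)

print(with_repl)
print(without_repl)
```

For undersampling, replace=False is usually the safer choice, since it keeps the same majority row from being selected twice.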

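Undersampling uses np.random.choice to randomly draw, from the majority class, as many indices as there are minority-class samples. The two sets of indices are then merged and used to select the corresponding rows from the original data.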

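# 2. Undersample the data
import numpy as np

# Index labels and count of the minority samples (Class == 1)
negative_index = data[data.Class == 1].index
negative_len = len(negative_index)

# Index labels and count of the normal samples (Class == 0)
normal_index = data[data.Class == 0].index
normal_len = len(normal_index)
# Randomly draw as many normal indices as there are minority samples,
# without replacement so that no row is selected twice
under_normal_index = np.random.choice(normal_index, negative_len, replace=False)
# Merge the two sets of indices
under_index = np.concatenate([negative_index, under_normal_index])

# The index values are labels, so select with .loc rather than .iloc
under_data = data.loc[under_index, :]
under_x = under_data.loc[:, under_data.columns != 'Class']
under_y = under_data.loc[:, under_data.columns == 'Class']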
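
The steps above can also be tried end to end on toy data. The following is a minimal sketch, not the original author's code: the helper name `undersample`, its parameters, and the `toy` DataFrame with its `amount` column are all hypothetical, and it assumes the majority class has at least as many rows as the minority class (otherwise drawing without replacement raises a ValueError).

```python
import numpy as np
import pandas as pd


def undersample(data, label_col="Class", minority=1, seed=None):
    """Return a balanced subset of `data` by undersampling the majority class.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    rng = np.random.RandomState(seed)
    minority_index = data[data[label_col] == minority].index.to_numpy()
    majority_index = data[data[label_col] != minority].index.to_numpy()
    # Draw without replacement so no majority row is picked twice
    picked = rng.choice(majority_index, len(minority_index), replace=False)
    under_index = np.concatenate([minority_index, picked])
    # Index values are labels, so select rows with .loc
    return data.loc[under_index]


# Hypothetical toy data: 95 normal rows (Class 0) and 5 rare rows (Class 1)
toy = pd.DataFrame({
    "amount": np.arange(100, dtype=float),
    "Class": [0] * 95 + [1] * 5,
})

balanced = undersample(toy, seed=42)
under_x = balanced.loc[:, balanced.columns != "Class"]
under_y = balanced.loc[:, balanced.columns == "Class"]
print(balanced["Class"].value_counts())
```

After the call, `balanced` holds the same number of rows from each class, and `under_x` and `under_y` split it into features and label in the same way as the code above.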

Original post: https://www.cnblogs.com/my-love-is-python/p/10271330.html