Machine Learning with sklearn (17): Feature Engineering (8), Feature Selection (3), Chi-Square Selection (2): The Chi-Square Test

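Python has packages that implement feature selection directly, that is, they measure how strongly each independent variable (feature) is associated with the dependent variable (label). Today we look at how to do feature selection with the chi-square test.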

1. First, import the packages and the sample data:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.datasets import load_iris

# Load the iris dataset
iris = load_iris()
iris.data  # inspect the feature matrix
Output:
array([[ 5.1, 3.5, 1.4, 0.2],
       [ 4.9, 3. , 1.4, 0.2],
       [ 4.7, 3.2, 1.3, 0.2],
       [ 4.6, 3.1, 1.5, 0.2],
       [ 5. , 3.6, 1.4, 0.2],
       [ 5.4, 3.9, 1.7, 0.4],
       [ 4.6, 3.4, 1.4, 0.3],
       ...  (remaining rows omitted)
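2. Use the chi-square test to select features
model1 = SelectKBest(chi2, k=2)  # keep the k best features
# iris.data holds the features and iris.target holds the labels;
# fit_transform returns only the k selected columns
model1.fit_transform(iris.data, iris.target)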
Output:
array([[ 1.4, 0.2],
       [ 1.4, 0.2],
       [ 1.3, 0.2],
       [ 1.5, 0.2],
       [ 1.4, 0.2],
       [ 1.7, 0.4],
       [ 1.4, 0.3],
       ...  (remaining rows omitted)

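As the output shows, the chi-square test selected the last two features (petal length and petal width). To also inspect the chi-square scores and p-values, continue with step 3.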
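Comparing output rows by eye works for four columns but not for wide datasets. A minimal sketch of reading the selection from the fitted selector via `get_support` follows; the variable names `selector`, `kept_idx`, and `kept_names` are ours, chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
selector = SelectKBest(chi2, k=2).fit(iris.data, iris.target)

# Boolean mask over the original columns: True means the feature was kept.
mask = selector.get_support()

# Integer positions of the kept columns, mapped back to feature names.
kept_idx = selector.get_support(indices=True)
kept_names = [iris.feature_names[i] for i in kept_idx]

print(mask)
print(kept_names)
```

The mask has one entry per original column, so it can also be used to index a pandas DataFrame's columns if the data lives in one.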

3. Inspect the p-values and scores

model1.scores_  # chi-square scores

The scores are:
array([ 10.81782088, 3.59449902, 116.16984746, 67.24482759])
The last two features have the highest scores, which matches the result of step 2.
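With more than a handful of features, it may be easier to sort the scores than to read the raw array. The sketch below ranks the iris features by `scores_`; `order` and `ranked_names` are illustrative names, not part of sklearn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
selector = SelectKBest(chi2, k=2).fit(iris.data, iris.target)

# Feature indices sorted from the highest to the lowest chi-square score.
order = np.argsort(selector.scores_)[::-1]
ranked_names = [iris.feature_names[i] for i in order]

for i in order:
    print(f"{iris.feature_names[i]:<20} {selector.scores_[i]:.2f}")
```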
model1.pvalues_  # p-values
The p-values are:
array([ 4.47651499e-03, 1.65754167e-01, 5.94344354e-26, 2.50017968e-15])
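The last two features also have the smallest p-values, so the evidence that they are not independent of the label is strongest for them, which again agrees with the earlier result. Note that the second feature's p-value (about 0.17) would not be significant at the usual 0.05 level.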
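To see where these numbers come from, the scores and p-values can be reproduced by hand. The sketch below reflects our understanding of how sklearn's `chi2` works: it sums each feature per class to get an "observed" table, compares it with the table expected under independence, and uses n_classes - 1 degrees of freedom. Treat it as an illustration rather than the library's exact code. It assumes more than two classes (for a binary label, `LabelBinarizer` returns a single column and would need extra handling) and non-negative features, which is what `chi2` is designed for.

```python
import numpy as np
from scipy.special import chdtrc
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2
from sklearn.preprocessing import LabelBinarizer

X, y = load_iris(return_X_y=True)

# One-hot encode the labels: shape (n_samples, n_classes).
Y = LabelBinarizer().fit_transform(y)

# Observed table: per-class sum of each feature, shape (n_classes, n_features).
observed = Y.T @ X

# Expected table under independence: class frequency times feature total.
class_prob = Y.mean(axis=0).reshape(-1, 1)
feature_total = X.sum(axis=0).reshape(1, -1)
expected = class_prob @ feature_total

# Chi-square statistic per feature, summed over classes.
manual_scores = ((observed - expected) ** 2 / expected).sum(axis=0)

# Upper-tail probability of a chi-square distribution with n_classes - 1 degrees of freedom.
manual_pvalues = chdtrc(Y.shape[1] - 1, manual_scores)

sk_scores, sk_pvalues = chi2(X, y)
print(np.allclose(manual_scores, sk_scores))
print(np.allclose(manual_pvalues, sk_pvalues))
```

Because the "observed" counts are sums of feature values, the statistic is most meaningful for count-like or frequency-like features; for continuous measurements such as iris it still produces a usable ranking, but the p-values should be read with some caution.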

API

class sklearn.feature_selection.SelectKBest(score_func=<function f_classif>, *, k=10)

Select features according to the k highest scores.

Read more in the User Guide.

Parameters
score_func : callable, default=f_classif

Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below “See Also”). The default function only works with classification tasks.

New in version 0.18.

k : int or "all", default=10

Number of top features to select. The “all” option bypasses selection, for use in a parameter search.

Attributes
scores_ : array-like of shape (n_features,)

Scores of features.

pvalues_ : array-like of shape (n_features,)

p-values of feature scores, None if score_func returned only scores.
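The `None` case for `pvalues_` is easy to hit in practice. As a small sketch, `mutual_info_classif` returns only a score array, so a selector built on it has scores but no p-values. The `partial` wrapper and the name `mi_selector` are our own additions, used only to fix the random seed for reproducibility.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# mutual_info_classif returns a single array of scores, with no p-values.
score_func = partial(mutual_info_classif, random_state=0)
mi_selector = SelectKBest(score_func, k=2).fit(X, y)

print(mi_selector.scores_.shape)
print(mi_selector.pvalues_)
```

Code that reads `pvalues_` after fitting should therefore check for `None` before indexing into it.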
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.feature_selection import SelectKBest, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> X_new = SelectKBest(chi2, k=20).fit_transform(X, y)
>>> X_new.shape
(1797, 20)
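The documentation mentions that k="all" exists for use in a parameter search. One way this might look is sketched below: `SelectKBest` sits in a `Pipeline`, and `GridSearchCV` treats `k` as a hyperparameter, with "all" serving as the no-selection baseline. The step names `select` and `clf`, the classifier, and the candidate values of `k` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)

# Feature selection runs inside the pipeline, so it is refit on each
# training fold and never sees the held-out fold.
pipe = Pipeline([
    ("select", SelectKBest(chi2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "all" disables selection, giving a baseline to compare the other values against.
param_grid = {"select__k": [10, 20, 40, "all"]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Keeping the selector inside the pipeline matters here: selecting features on the full dataset before cross-validation would leak label information into the validation folds.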

Original article: https://www.cnblogs.com/qiu-hua/p/14904436.html
