  • Feature selection

    # -*- coding: utf-8 -*-
    """
    Created on Wed Aug 10 20:26:15 2016
    
    @author: qqhfeng
    """
    
    # Module 1: selecting features with VarianceThreshold
    '''
    Feature selector that removes all low-variance features. 
    This feature selection algorithm looks only at the features (X), 
    not the desired outputs (y), and can thus be used for unsupervised learning.
    
    VarianceThreshold is a simple baseline approach to feature selection. 
    It removes all features whose variance doesn’t meet some threshold.
    By default, it removes all zero-variance features, i.e. 
    features that have the same value in all samples. 
    As an example, suppose that we have a dataset with boolean features, 
    and we want to remove all features that are either one or zero (on or off) 
    in more than 80% of the samples. Boolean features are Bernoulli random variables,
    and the variance of such variables is given by Var[X] = p(1 - p),
    so we can select using the threshold .8 * (1 - .8).
    '''
    
    from sklearn.feature_selection import VarianceThreshold
    X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
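    # To drop features that are constant in more than 80% of the samples, use:
    # sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
    sel = VarianceThreshold()  # default threshold=0.0 removes only zero-variance features
    print(sel.fit_transform(X))  # no column of X is constant, so X comes back unchanged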
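The 80% rule described in the docstring can be checked by hand. The sketch below is only an illustration on the same toy matrix: it computes the Bernoulli variance p(1 - p) for each column with NumPy (the names `p`, `bernoulli_var` and `X_reduced` are introduced here and are not part of the original script) and compares it with what `VarianceThreshold` reports.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0],
              [0, 1, 1], [0, 1, 0], [0, 1, 1]])

# Fraction of ones in each column, and the Bernoulli variance p * (1 - p).
p = X.mean(axis=0)
bernoulli_var = p * (1 - p)

# Keep only features whose variance exceeds .8 * (1 - .8) = 0.16.
sel = VarianceThreshold(threshold=0.8 * (1 - 0.8))
X_reduced = sel.fit_transform(X)

print(bernoulli_var)      # per-column variance computed by hand
print(sel.variances_)     # variance computed by the selector
print(sel.get_support())  # boolean mask of the columns that were kept
print(X_reduced.shape)
```

The first column is zero in five of the six samples, so its variance (5/36, about 0.139) falls below the 0.16 threshold and it is the only column dropped.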
    
    
    
    
    # Module 2: keep the most important features. SelectKBest removes all but the k highest scoring features
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest
    from sklearn.feature_selection import chi2
    iris = load_iris()
    X, y = iris.data, iris.target
    print(X.shape)
    X_new = SelectKBest(chi2, k=2).fit_transform(X, y)  # chi2 is one scoring function for rating feature importance
    print(X_new.shape)
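To see why two particular columns survive, it can help to look at the scores themselves. The following is a small sketch rather than part of the original script: it fits the selector separately (the variable names `selector` and `feature_names` are introduced here) and prints each feature's chi2 score next to whether it was kept.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names

# Fit once, then inspect the per-feature scores and the selection mask.
selector = SelectKBest(chi2, k=2).fit(X, y)

for name, score, kept in zip(feature_names, selector.scores_, selector.get_support()):
    print(f"{name}: chi2={score:.2f} kept={kept}")

X_new = selector.transform(X)
print(X_new.shape)
```

Note that `chi2` requires non-negative feature values, which holds for the iris measurements.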
    
    
    
    # Module 3: recursive feature elimination
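The original post stops at this heading without any code. As a minimal sketch of what recursive feature elimination could look like on the same iris data, the example below uses scikit-learn's `RFE`. The choice of `LogisticRegression` as the base estimator, `max_iter=1000`, and keeping two features are all assumptions made for illustration, not the author's settings.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Assumed base estimator: RFE needs a model that exposes coef_ or
# feature_importances_ so it can rank features after each fit.
estimator = LogisticRegression(max_iter=1000)

# Repeatedly fit the model and drop the weakest feature until two remain.
rfe = RFE(estimator=estimator, n_features_to_select=2)
X_rfe = rfe.fit_transform(X, y)

print(X_rfe.shape)
print(rfe.support_)   # boolean mask of the selected features
print(rfe.ranking_)   # 1 marks a selected feature; larger values were dropped earlier
```

Unlike `VarianceThreshold` and `SelectKBest`, this approach refits a model at every step, so the result depends on the estimator chosen.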
  • Original article: https://www.cnblogs.com/qqhfeng/p/5758354.html