  • Feature selection

    # -*- coding: utf-8 -*-
    """
    Created on Wed Aug 10 20:26:15 2016
    
    @author: qqhfeng
    """
    
    #Module 1: feature selection with VarianceThreshold
    '''
    Feature selector that removes all low-variance features. 
    This feature selection algorithm looks only at the features (X), 
    not the desired outputs (y), and can thus be used for unsupervised learning.
    
    VarianceThreshold is a simple baseline approach to feature selection. 
    It removes all features whose variance doesn’t meet some threshold.
    By default, it removes all zero-variance features, i.e. 
    features that have the same value in all samples. 
    As an example, suppose that we have a dataset with boolean features, 
    and we want to remove all features that are either one or zero (on or off) 
    in more than 80% of the samples. Boolean features are Bernoulli random variables,
    and the variance of such variables is given by Var[X] = p(1 - p),
    so the matching threshold is .8 * (1 - .8) = 0.16.
    '''
    
    from sklearn.feature_selection import VarianceThreshold
    X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
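    # Uncomment to drop features that are 0 or 1 in more than 80% of the samples:
    #sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
    sel = VarianceThreshold()  # default threshold=0.0 removes only constant features
    print(sel.fit_transform(X))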
    
    
    
    
    #Module 2: keep the most important features; SelectKBest removes all but the k highest scoring features
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest
    from sklearn.feature_selection import chi2
    iris = load_iris()
    X, y = iris.data, iris.target
    print(X.shape)
    X_new = SelectKBest(chi2, k=2).fit_transform(X, y)  # chi2 scores each feature with the chi-squared statistic
    print(X_new.shape)
    
    
    
    #Module 3: recursive feature elimination
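
The script stops at its third heading (recursive feature elimination) without any code. A minimal sketch of what that step could look like follows, using scikit-learn's `RFE` on the same iris data. The choice of `LogisticRegression` as the base estimator, `max_iter=1000`, and keeping 2 features are assumptions made for illustration; they are not from the original post.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Assumption: logistic regression as the base estimator. RFE needs an
# estimator that exposes coef_ or feature_importances_ after fitting.
estimator = LogisticRegression(max_iter=1000)

# Repeatedly fit the estimator and drop the weakest feature (step=1)
# until only n_features_to_select features remain.
selector = RFE(estimator, n_features_to_select=2, step=1)
X_rfe = selector.fit_transform(X, y)

print(X_rfe.shape)        # (150, 2)
print(selector.support_)  # boolean mask of the features that were kept
print(selector.ranking_)  # 1 marks a kept feature; larger means dropped earlier
```

Unlike `SelectKBest`, which scores each feature once and independently, `RFE` refits the model after every elimination, so which features survive depends on the estimator chosen.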
  • Original article: https://www.cnblogs.com/qqhfeng/p/5758354.html