  • Handling imbalanced data with sklearn random forests

    Handle Imbalanced Classes In Random Forest


    Preliminaries

    # Load libraries
    from sklearn.ensemble import RandomForestClassifier
    import numpy as np
    from sklearn import datasets

    Load Iris Flower Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    Adjust Iris Dataset To Make Classes Imbalanced

    # Make class 0 highly imbalanced by removing the first 40 observations
    # (iris is sorted by class, so this drops 40 of the 50 class-0 samples)
    X = X[40:,:]
    y = y[40:]
    
    # Create a binary target vector: 0 if the sample is class 0, otherwise 1
    y = np.where((y == 0), 0, 1)
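    To confirm that the adjustment has the intended effect, it can help to count the classes before and after slicing. The following is a minimal sketch that repeats the steps above so it runs on its own. `np.bincount` simply counts how many samples carry each integer label.

```python
import numpy as np
from sklearn import datasets

# Reload the data so this snippet is self-contained
iris = datasets.load_iris()
X, y = iris.data, iris.target

# The original dataset is balanced: 50 samples for each of the 3 species
print(np.bincount(y))  # [50 50 50]

# Repeat the adjustment: drop the first 40 rows (all of them class 0)...
X, y = X[40:, :], y[40:]
# ...and merge classes 1 and 2 into a single label 1
y = np.where(y == 0, 0, 1)

# Only 10 class-0 samples remain against 100 class-1 samples
print(np.bincount(y))  # [ 10 100]
```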

    Train Random Forest While Balancing Classes

    When using RandomForestClassifier, a useful setting is class_weight="balanced", under which each class is automatically weighted in inverse proportion to how frequently it appears in the data. Specifically:

    $w_j = \frac{n}{k \, n_j}$

    where $w_j$ is the weight of class $j$, $n$ is the number of observations, $n_j$ is the number of observations in class $j$, and $k$ is the total number of classes.
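    For the adjusted iris data, $n = 110$, $k = 2$, $n_0 = 10$ and $n_1 = 100$, so the formula gives $w_0 = 110 / (2 \cdot 10) = 5.5$ and $w_1 = 110 / (2 \cdot 100) = 0.55$. The sketch below computes these weights by hand and cross-checks them against scikit-learn's `compute_class_weight` helper (from `sklearn.utils.class_weight`), which implements the same "balanced" heuristic. The variable names are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.utils.class_weight import compute_class_weight

# Rebuild the imbalanced binary target from the steps above
y = datasets.load_iris().target[40:]
y = np.where(y == 0, 0, 1)

classes = np.unique(y)   # the distinct labels: 0 and 1
n = len(y)               # total observations: 110
k = len(classes)         # number of classes: 2
n_j = np.bincount(y)     # observations per class: 10 and 100

# w_j = n / (k * n_j), computed directly from the formula
manual = n / (k * n_j)

# The same weights from scikit-learn's "balanced" heuristic
sk = compute_class_weight(class_weight="balanced", classes=classes, y=y)

print(manual)                   # weights 5.5 for class 0 and 0.55 for class 1
print(np.allclose(manual, sk))  # True
```

    The minority class ends up with ten times the weight of the majority class, which mirrors the 1:10 ratio of their sample counts.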

    # Create a random forest classifier object with balanced class weights
    clf = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")
    
    # Train model
    model = clf.fit(X, y)
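    The tutorial stops after fitting. One way to check whether the weighting helps the minority class is to hold out a stratified test set and compare per-class recall. The sketch below compares an unweighted forest (`class_weight=None`) with the balanced one. The 30% test split, the `stratify=y` choice and the use of `recall_score` are illustrative assumptions and are not part of the original tutorial.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Rebuild the imbalanced data from the steps above
iris = datasets.load_iris()
X, y = iris.data[40:, :], iris.target[40:]
y = np.where(y == 0, 0, 1)

# Stratify so the test set keeps a share of the scarce class-0 samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

recalls = {}
for weighting in (None, "balanced"):
    clf = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight=weighting)
    model = clf.fit(X_train, y_train)
    # average=None returns one recall per class, ordered [class 0, class 1]
    recalls[weighting] = recall_score(y_test, model.predict(X_test), average=None)
    print(weighting, recalls[weighting])
```

    Setosa (class 0) is linearly separable from the other two species, so both forests may well score perfectly on this toy example. With only about 3 class-0 samples in the test set, any difference should be read with caution. The comparison becomes more informative on harder imbalanced datasets.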

    https://chrisalbon.com/machine_learning/trees_and_forests/handle_imbalanced_classes_in_random_forests/



    Methods for handling class imbalance:
    https://segmentfault.com/a/1190000015248984
  • Original article: https://www.cnblogs.com/Allen-rg/p/10441792.html