  • stratified k-fold

    If you have a skewed dataset for binary classification with 90% positive samples and only 10% negative samples, you don't want to use random k-fold cross-validation. Using simple k-fold cross-validation on a dataset like this can produce folds with very few (or even no) negative samples. In these cases, we prefer stratified k-fold cross-validation. Stratified k-fold cross-validation keeps the ratio of labels in each fold constant, so each fold will have the same 90% positive and 10% negative samples. Thus, whatever metric you choose for evaluation, it will give similar results across all folds.
    Stratified k-fold avoids the problem of a fold containing only positive or only negative samples by keeping the ratio of positive to negative samples the same in every fold.
    import pandas as pd
    from sklearn import model_selection
     
     
    if __name__ == "__main__":
      # Training data is in a csv file called train.csv
      df = pd.read_csv("train.csv")
      # we create a new column called kfold and fill it with -1
      df["kfold"] = -1
      # the next step is to randomize the rows of the data
      df = df.sample(frac=1).reset_index(drop=True)
      # fetch targets
      y = df.target.values
      # initiate the kfold class from model_selection module
      kf = model_selection.StratifiedKFold(n_splits=5)
      # fill the new kfold column
      for f, (t_, v_) in enumerate(kf.split(X=df, y=y)):
        df.loc[v_, 'kfold'] = f
      # save the new csv with kfold column
      df.to_csv("train_folds.csv", index=False)
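    As a quick sanity check, here is a minimal sketch (assuming the train_folds.csv file written above and a binary 0/1 target column) that prints the positive-class ratio of each fold; with stratification these ratios should be nearly identical:

    import pandas as pd


    if __name__ == "__main__":
        # load the file produced by the snippet above
        df = pd.read_csv("train_folds.csv")
        # for every fold, print its size and the fraction of positive samples;
        # stratified k-fold keeps this fraction roughly constant across folds
        for fold in sorted(df["kfold"].unique()):
            fold_df = df[df.kfold == fold]
            print(f"fold={fold}, size={len(fold_df)}, "
                  f"positive ratio={fold_df.target.mean():.3f}")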
     
     
  • Original article: https://www.cnblogs.com/songyuejie/p/14781202.html