  • UDA Machine Learning Basics: Cross-Validation

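    The goal of cross-validation is to keep as many data points as possible in the training set, for the best learning result, while also keeping as many as possible in the test set, for the most reliable validation. The key idea is to partition the training data into k equally sized bins. In k-fold cross-validation you run k separate experiments. In each one you pick one of the k bins as the validation set, use the remaining k-1 bins as the training set, train your model on them, and evaluate it on the held-out bin. After k runs you have k different validation results (ten, when k = 10), and you average the performance over them. This way you use nearly all of the data for training and all of the data for testing.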
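
    The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (it is not the dataset loaded in the starter code), k = 5 is an arbitrary choice, and the `validated` counter exists only to show that every point is held out exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data (an assumption; not the project dataset).
rng = np.random.RandomState(42)
X = rng.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

k = 5  # number of bins; chosen arbitrarily for this sketch
kf = KFold(n_splits=k, shuffle=True, random_state=42)

fold_scores = []
validated = np.zeros(len(X), dtype=int)  # how many times each point is held out

for train_idx, val_idx in kf.split(X):
    # Train on k-1 bins, evaluate on the one held-out bin.
    clf = DecisionTreeClassifier(random_state=42)
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[val_idx])
    fold_scores.append(accuracy_score(y[val_idx], pred))
    validated[val_idx] += 1

# The cross-validation estimate is the average over the k experiments.
mean_score = np.mean(fold_scores)
print("per-fold accuracy:", fold_scores)
print("mean accuracy:", mean_score)
```

    Each point lands in the validation bin exactly once and in the training bins k-1 times, which is what "nearly all the data for training, all the data for testing" means in practice.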

    #!/usr/bin/python
    
    
    """
        Starter code for the validation mini-project.
        The first step toward building your POI identifier!
    
        Start by loading/formatting the data
    
        After that, it's not our code anymore--it's yours!
    """
    
    import pickle
    import sys
    sys.path.append("../tools/")
    from feature_format import featureFormat, targetFeatureSplit
    from sklearn.metrics import accuracy_score
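    # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection.
    from sklearn.model_selection import train_test_split

    # Open the pickle in binary mode (required on Python 3).
    with open("../final_project/final_project_dataset.pkl", "rb") as data_file:
        data_dict = pickle.load(data_file)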
    
    ### first element is our labels, any added elements are predictor
    ### features. Keep this the same for the mini-project, but you'll
    ### have a different feature list when you do the final project.
    features_list = ["poi", "salary"]
    
    data = featureFormat(data_dict, features_list)
    labels, features = targetFeatureSplit(data)
    # Hold out 30% of the data for testing; fix the seed for reproducibility.
    features_train, features_test, labels_train, labels_test = train_test_split(
        features, labels, test_size=0.3, random_state=42)

    from sklearn.tree import DecisionTreeClassifier
    clf = DecisionTreeClassifier()
    clf.fit(features_train, labels_train)
    pred = clf.predict(features_test)
    # accuracy_score expects (y_true, y_pred).
    print(accuracy_score(labels_test, pred))
    
    
    
    ### it's all yours from here forward! 
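
    The starter code above evaluates on a single 70/30 split. As a hedged sketch of how the same decision tree could instead be scored with k-fold cross-validation, scikit-learn's `cross_val_score` can run the k experiments in one call. The `features` and `labels` below are hypothetical synthetic stand-ins shaped like the lists built above, not the real project data, and `cv=5` is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-ins for the `features` and `labels` lists (assumption).
rng = np.random.RandomState(0)
features = rng.rand(90, 1).tolist()
labels = [1.0 if row[0] > 0.8 else 0.0 for row in features]

clf = DecisionTreeClassifier(random_state=42)

# One accuracy value per fold. For a classifier, an integer cv
# uses stratified folds, so class proportions are kept in each bin.
scores = cross_val_score(clf, features, labels, cv=5, scoring="accuracy")
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```

    Reporting the mean over folds is less sensitive to one lucky or unlucky split than the single accuracy printed by the starter code.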
    

      

  • Original article: https://www.cnblogs.com/fuhang/p/8512977.html