  • UDA Machine Learning Basics: Cross-Validation

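    The goal of cross-validation is to resolve a trade-off: we want as many data points as possible in the training set to get the best learning result, and we also want as many data points as possible in the test set to get the most reliable validation. The key idea is to split the data evenly into k bins (folds). In k-fold cross-validation you run k separate experiments. In each experiment you pick one of the k folds as the validation set, use the remaining k-1 folds as the training set, train the model, and then evaluate it on the held-out validation fold. After k runs you have k different validation results, and you average them to get the final score. This way nearly all of the data is used for training, and all of it is used for testing at some point.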
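    The k-fold procedure described above can be sketched as an explicit loop. This is only an illustrative sketch: the synthetic data from `make_classification`, the helper name `kfold_accuracy`, and the choice of k=5 are assumptions made for the example and are not part of the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier


def kfold_accuracy(features, labels, k=5, seed=42):
    """Run k experiments and return (per-fold accuracies, mean accuracy).

    Hypothetical helper written for this illustration.
    """
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in kf.split(features):
        # Train on the k-1 remaining folds, validate on the held-out fold.
        clf = DecisionTreeClassifier(random_state=seed)
        clf.fit(features[train_idx], labels[train_idx])
        pred = clf.predict(features[val_idx])
        scores.append(accuracy_score(labels[val_idx], pred))
    return scores, float(np.mean(scores))


# Synthetic stand-in data (assumption; not the dataset used in the post).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
fold_scores, mean_score = kfold_accuracy(X, y, k=5)
print(fold_scores)
print(mean_score)
```

    Each data point lands in exactly one validation fold, so every point is used for validation once and for training k-1 times.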

    #!/usr/bin/python
    
    
    """
        Starter code for the validation mini-project.
        The first step toward building your POI identifier!
    
        Start by loading/formatting the data
    
        After that, it's not our code anymore--it's yours!
    """
    
    import pickle
    import sys
    sys.path.append("../tools/")
    from feature_format import featureFormat, targetFeatureSplit
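    from sklearn.metrics import accuracy_score
    # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection.
    from sklearn.model_selection import train_test_split
    # Pickle files should be opened in binary mode.
    with open("../final_project/final_project_dataset.pkl", "rb") as infile:
        data_dict = pickle.load(infile)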
    
    ### first element is our labels, any added elements are predictor
    ### features. Keep this the same for the mini-project, but you'll
    ### have a different feature list when you do the final project.
    features_list = ["poi", "salary"]
    
    data = featureFormat(data_dict, features_list)
    labels, features = targetFeatureSplit(data)
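    # Hold out 30% of the data for testing; random_state makes the split reproducible.
    features_train, features_test, labels_train, labels_test = train_test_split(
        features, labels, test_size=0.3, random_state=42)

    from sklearn.tree import DecisionTreeClassifier
    clf = DecisionTreeClassifier()
    clf.fit(features_train, labels_train)
    pred = clf.predict(features_test)
    # accuracy_score expects the true labels first, then the predictions.
    print(accuracy_score(labels_test, pred))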
    
    
    
    ### it's all yours from here forward! 
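
    The script above evaluates the decision tree on a single 70/30 split, which gives one accuracy estimate. As a hedged sketch, the same classifier could be scored with k-fold cross-validation through scikit-learn's `cross_val_score`. The synthetic data below is a stand-in for the `features` and `labels` built by the script, and `cv=10` is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the script's features/labels (assumption).
features, labels = make_classification(
    n_samples=150, n_features=4, random_state=42)

clf = DecisionTreeClassifier(random_state=42)

# For a classifier, an integer cv uses stratified folds, so each fold
# keeps roughly the same class balance as the full data set.
scores = cross_val_score(clf, features, labels, cv=10, scoring="accuracy")

print(scores)         # one accuracy value per fold
print(scores.mean())  # the averaged cross-validation score
```

    Stratified folds matter for a label like "poi", where one class is typically much rarer than the other, because an unstratified fold could end up with no positive examples at all.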
    

      

  • Original article: https://www.cnblogs.com/fuhang/p/8512977.html