  • UDA Machine Learning Basics: Cross-Validation

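    Cross-validation addresses a trade-off: you want as many data points as possible in the training set to get the best learning result, and as many as possible in the test set to get the most reliable validation. The key idea is to split the data into k bins of equal size. In k-fold cross-validation you run k separate experiments. In each one you pick one of the k bins as the test set, use the remaining k-1 bins as the training set, train your model, and then test it on the held-out bin. After k runs you have results on k different test sets (ten of them when k = 10), and you average those k results. This way almost all of the data is used for training, and all of it is used for testing.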
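
    The procedure described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the course's code: the helper names (k_fold_splits, majority_label, cross_validate), the made-up label list, and the toy "always predict the majority label" model are all hypothetical and chosen to keep the example self-contained. It assumes Python 3.

```python
import random


def k_fold_splits(n_samples, k, seed=42):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # A stride of k gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def majority_label(labels):
    """Toy 'model' (assumption): always predict the most common training label."""
    return max(set(labels), key=labels.count)


def cross_validate(labels, k):
    """Return the mean accuracy of the toy model over the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(labels), k):
        guess = majority_label([labels[j] for j in train_idx])
        correct = sum(1 for j in test_idx if labels[j] == guess)
        scores.append(correct / len(test_idx))
    # Averaging the k per-fold scores is the final cross-validation result.
    return sum(scores) / len(scores)


labels = [0] * 8 + [1] * 2  # made-up labels, mostly the negative class
mean_accuracy = cross_validate(labels, k=5)
print(mean_accuracy)
```

    Every index lands in exactly one test fold, so each data point is used for testing once and for training k-1 times. In practice a library splitter such as scikit-learn's KFold would probably replace the hand-written k_fold_splits.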

    #!/usr/bin/python
    
    
    """
        Starter code for the validation mini-project.
        The first step toward building your POI identifier!
    
        Start by loading/formatting the data
    
        After that, it's not our code anymore--it's yours!
    """
    
    import pickle
    import sys
    sys.path.append("../tools/")
    from feature_format import featureFormat, targetFeatureSplit
    from sklearn.metrics import accuracy_score
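    # sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
    from sklearn.model_selection import train_test_split

    # pickle files should be opened in binary mode
    with open("../final_project/final_project_dataset.pkl", "rb") as data_file:
        data_dict = pickle.load(data_file)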
    
    ### first element is our labels, any added elements are predictor
    ### features. Keep this the same for the mini-project, but you'll
    ### have a different feature list when you do the final project.
    features_list = ["poi", "salary"]
    
    data = featureFormat(data_dict, features_list)
    labels, features = targetFeatureSplit(data)
    # hold out 30% of the data for testing; fix the seed so the split is reproducible
    features_train, features_test, labels_train, labels_test = train_test_split(
        features, labels, test_size=0.3, random_state=42)

    from sklearn.tree import DecisionTreeClassifier
    clf = DecisionTreeClassifier()
    clf.fit(features_train, labels_train)
    pred = clf.predict(features_test)
    # accuracy_score expects (y_true, y_pred)
    print(accuracy_score(labels_test, pred))
    
    
    
    ### it's all yours from here forward! 
    

      

  • Original article: https://www.cnblogs.com/fuhang/p/8512977.html