
    sk-learning study notes (2)

    sklearn: training and evaluation

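    I trained a logistic regression classifier on the KDD99 dataset and evaluated it, and the score is suspiciously high. I took the 10% subset (494,021 records) and held out a quarter of it as the test set. Does a score this high mean the model is overfitting?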

    import numpy as np
    import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
    from sklearn import linear_model
    from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in 0.20

    print("data loading ....")
    # every column must already be numeric; the last column is the class label
    data = np.loadtxt("newfile.csv", delimiter=",", dtype=np.int32)
    print("load done....")

    X = data[:, :-1]
    target = data[:, -1]

    # hold out 25% of the rows as the test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, target, test_size=0.25, random_state=1)

    print("begin fit the model....")
    clf = linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0,
                                          fit_intercept=True, intercept_scaling=1,
                                          class_weight=None, random_state=None)
    # fit on the training split, then compute mean accuracy on the test split
    score = clf.fit(X_train, y_train).score(X_test, y_test)

    print("the model have train success, we will save the model to file...")
    joblib.dump(clf, 'model.pkl')  # reload later with joblib.load('model.pkl')
    print(score)

    # output of the original run (older scikit-learn, whose default solver was
    # liblinear; current versions default to lbfgs, so the score may differ)
    data loading ....
    load done....
    begin fit the model....
    the model have train success, we will save the model to file...
    0.997449516623
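    A high test score alone does not show overfitting: the test rows were held out, so overfitting would appear as training accuracy clearly above test accuracy. On KDD99 two other explanations are worth ruling out. The class distribution is heavily skewed (a few attack types account for most records), so even a trivial majority-class predictor scores high; and the dataset is widely reported to contain many duplicate records, so a random split can put identical rows into both train and test. The sketch below checks all three, assuming a current scikit-learn; it uses synthetic make_classification data as a stand-in for newfile.csv, and the variable names are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, deliberately imbalanced stand-in for newfile.csv (assumption:
# with the real data, use X and target from the loading code instead).
X, y = make_classification(n_samples=5000, n_features=20, n_informative=5,
                           weights=[0.8, 0.2], random_state=1)

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

# Always predicting the most frequent class gives the accuracy floor that a
# real classifier has to beat on imbalanced data.
baseline_acc = DummyClassifier(strategy="most_frequent").fit(
    X_train, y_train).score(X_test, y_test)

# Count test rows that also appear verbatim in the training split. With many
# duplicate records these rows were effectively seen during training.
train_rows = {row.tobytes() for row in X_train}
n_overlap = sum(row.tobytes() in train_rows for row in X_test)

print(f"train accuracy:          {train_acc:.4f}")
print(f"test accuracy:           {test_acc:.4f}")
print(f"majority-class baseline: {baseline_acc:.4f}")
print(f"test rows also in train: {n_overlap} / {len(X_test)}")
```

    If train and test accuracy are close, the model is not overfitting in the usual sense. If the baseline is already near the model's score, or a large share of test rows duplicate training rows, the high number says more about the data than about the model.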
    
    

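    10-fold cross-validation

    K-fold cross-validation splits the data into K parts, uses each part once as the test set while training on the other K-1 parts, and averages the K scores. The example below uses K=2 on four samples to keep the output short.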

    >>> import numpy as np
    >>> from sklearn.model_selection import KFold
    >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    >>> y = np.array([1, 2, 3, 4])
    >>> kf = KFold(n_splits=2)
    >>> kf.get_n_splits(X)
    2
    >>> print(kf)
    KFold(n_splits=2, random_state=None, shuffle=False)
    >>> for train_index, test_index in kf.split(X):
    ...     print("TRAIN:", train_index, "TEST:", test_index)
    ...     X_train, X_test = X[train_index], X[test_index]
    ...     y_train, y_test = y[train_index], y[test_index]
    TRAIN: [2 3] TEST: [0 1]
    TRAIN: [0 1] TEST: [2 3]
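    The unshuffled split printed above follows a simple rule: the indices are cut into K consecutive blocks, the first n_samples % K blocks get one extra sample, and each block serves once as the test fold. A plain-Python sketch of that rule (the function name kfold_indices is hypothetical, not part of scikit-learn):

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for unshuffled K-fold splitting."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    # The first n_samples % n_splits folds get one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        # Everything outside the current block is used for training.
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in kfold_indices(4, 2):
    print("TRAIN:", train, "TEST:", test)
# TRAIN: [2, 3] TEST: [0, 1]
# TRAIN: [0, 1] TEST: [2, 3]
```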
    
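    In practice the folds are rarely looped over by hand. A minimal sketch of 10-fold cross-validation for a logistic regression model, assuming a current scikit-learn and again using synthetic make_classification data in place of newfile.csv: cross_val_score fits and scores the model once per fold, and StratifiedKFold keeps the class ratio the same in every fold, which matters on a skewed dataset such as KDD99.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the features/labels loaded from newfile.csv.
X, y = make_classification(n_samples=2000, n_features=20, random_state=1)

# The scaler inside the pipeline is refit on each training fold only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 10 stratified folds, shuffled once with a fixed seed.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=cv)

print("per-fold accuracy:", np.round(scores, 4))
print(f"mean {scores.mean():.4f} +/- {scores.std():.4f}")
```

    Putting the scaler in the pipeline is a design choice: scaling the whole dataset before splitting would let statistics from each held-out fold leak into training. A small spread across the ten fold scores also suggests that the single 75/25 split was not just a lucky draw.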
    Original article: https://www.cnblogs.com/thinkml/p/4170389.html