  • scikit-learn study notes (2)

    scikit-learn study notes (2)

    sklearn: training and evaluation

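    I trained a logistic regression classifier on the KDD99 dataset and then evaluated it, and the score came out suspiciously high. I used the 10% subset (494,021 records) and held out one quarter of it as the test set. With a result this high, is the model overfitting?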

    import numpy as np
    import joblib  # sklearn.externals.joblib was removed from scikit-learn; use the joblib package directly
    from sklearn import linear_model
    from sklearn.model_selection import train_test_split  # replaces the removed sklearn.cross_validation module

    print("data loading ....")
    data = np.loadtxt("newfile.csv", delimiter=",", dtype=np.int32)
    print("load done....")

    # The last column is the label; all other columns are features.
    X = data[:, :-1]
    target = data[:, -1]

    # Hold out 25% of the rows as the test set.
    X_train, X_test, y_train, y_test = train_test_split(X, target, test_size=0.25, random_state=1)

    print("begin fit the model....")
    clf = linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None)
    # Fit on the training split, then report mean accuracy on the test split.
    score = clf.fit(X_train, y_train).score(X_test, y_test)

    print("the model have train success, we will save the model to file...")
    # Persist the fitted model to disk.
    joblib.dump(clf, 'model.pkl')
    print(score)
    
    # result output....
    data loading ....
    load done....
    begin fit the model....
    dd
    the model have train success, we will save the model to file...
    0.997449516623
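One way to probe whether a score like this reflects overfitting is to compare training accuracy against test accuracy, and to compare both against a baseline that always predicts the most frequent class. The sketch below is only an illustration under stated assumptions: `newfile.csv` is not reproduced here, so it uses a synthetic, imbalanced dataset from `make_classification` as a stand-in, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic, imbalanced stand-in for the KDD99 features and labels.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.8, 0.2], random_state=1)

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
train_score = clf.score(X_train, y_train)
test_score = clf.score(X_test, y_test)

# Baseline that ignores the features and always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_score = baseline.score(X_test, y_test)

print("train accuracy:   ", train_score)
print("test accuracy:    ", test_score)
print("baseline accuracy:", baseline_score)
```

A training score far above the test score would point toward overfitting. A test score only slightly above the baseline would instead suggest that the high accuracy mostly reflects class imbalance, in which case per-class metrics such as `sklearn.metrics.classification_report` are more informative than plain accuracy.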
    
    

    10-fold cross-validation

    >>> import numpy as np
    >>> from sklearn.model_selection import KFold
    >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    >>> y = np.array([1, 2, 3, 4])
    >>> kf = KFold(n_splits=2)
    >>> kf.get_n_splits(X)
    2
    >>> print(kf)
    KFold(n_splits=2, random_state=None, shuffle=False)
    >>> for train_index, test_index in kf.split(X):
    ...    print("TRAIN:", train_index, "TEST:", test_index)
    ...    X_train, X_test = X[train_index], X[test_index]
    ...    y_train, y_test = y[train_index], y[test_index]
    TRAIN: [2 3] TEST: [0 1]
    TRAIN: [0 1] TEST: [2 3]
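The KFold example above only splits four samples into two folds. A minimal sketch of an actual 10-fold evaluation of a logistic regression model might look like the following. It assumes synthetic data from `make_classification` in place of the KDD99 file, and the choice of `StratifiedKFold` with shuffling is an illustrative one, not something taken from the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumption: synthetic stand-in for the real feature matrix and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Ten folds; stratification keeps class proportions similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Fits a fresh model on nine folds and scores it on the held-out fold, ten times.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print("fold accuracies:", np.round(scores, 3))
print("mean: %.3f  std: %.3f" % (scores.mean(), scores.std()))
```

Looking at the spread of the ten fold scores, and not only at their mean, gives a rough sense of how much a single train/test split could be flattering the model.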
    
  • Original article: https://www.cnblogs.com/thinkml/p/4170389.html