
day06 - Decision Trees and Random Forests



# coding=utf-8
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def dectree():
    """
    Compare a single decision tree with a grid-searched random forest on the iris dataset.
    :return: None
    """

    # Load the iris dataset
    iris = load_iris()

    # Split into training and test sets (25% held out for testing)
    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25)

    # Decision tree
    dec = DecisionTreeClassifier()

    dec.fit(x_train, y_train)

    # Print the training features for inspection
    print(x_train)

    print("Decision tree accuracy:", dec.score(x_test, y_test))

    # Random forest
    rf = RandomForestClassifier()

    # Grid search with 2-fold cross-validation to find the best model
    gc = GridSearchCV(rf, param_grid={"n_estimators": [1, 3, 5, 7, 9], "max_depth": [3, 5, 7, 9, 15]}, cv=2)

    # Train
    gc.fit(x_train, y_train)

    # Results
    print("Random forest accuracy after grid search:", gc.score(x_test, y_test))
    print("Best random forest parameters:", gc.best_params_)

    return None

if __name__ == '__main__':
    dectree()

Output (the printed training array is omitted):


Decision tree accuracy: 0.9473684210526315
Random forest accuracy after grid search: 0.9736842105263158
Best random forest parameters: {'max_depth': 3, 'n_estimators': 3}

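A decision tree uses an algorithm to compute the information entropy of each feature and determine which feature has the greatest influence on the result. That feature is placed nearest the root, and the remaining features are then used in turn as nodes to build the tree. Note that scikit-learn's DecisionTreeClassifier splits on Gini impurity by default; passing criterion="entropy" selects the entropy-based measure described here.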
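To make the entropy idea concrete, here is a minimal pure-Python sketch of information gain for categorical features. The helper names (`entropy`, `information_gain`) and the toy data are assumptions made for illustration; this is not how scikit-learn implements splitting internally.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups produced by the split
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: two candidate features and one target
outlook = ["sunny", "sunny", "rainy", "rainy"]
windy = ["yes", "no", "yes", "no"]
play = ["yes", "yes", "no", "no"]

print(information_gain(outlook, play))  # 1.0 (perfectly separates the classes)
print(information_gain(windy, play))    # 0.0 (tells us nothing about the target)
```

In this toy case `outlook` has the larger gain, so it would be chosen for the first split.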
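A random forest is a collection of decision trees. Each tree produces its own prediction, and the class predicted by the most trees becomes the final prediction.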
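The voting step can be sketched in a few lines. The function `majority_vote` and the per-tree predictions below are hypothetical, chosen only to illustrate the idea; scikit-learn's RandomForestClassifier combines trees by averaging their predicted class probabilities rather than counting hard votes.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most trees (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for a single sample
tree_predictions = ["setosa", "versicolor", "setosa", "setosa", "virginica"]
print(majority_vote(tree_predictions))  # setosa
```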
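The advantages of random forests are clear: they can handle large datasets, predict with high accuracy, and do not require dimensionality reduction beforehand. Their main drawback is that the parameters, namely the number of trees and the depth of each tree, are hard to tune.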
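One reason tuning is costly is that a grid search tries every combination of the candidate values. The sketch below simply enumerates the grid used in the script above to show how quickly the number of model fits grows; the variable names are assumptions, and no model is actually trained here.

```python
from itertools import product

# Same grid and fold count as the GridSearchCV call in the script
param_grid = {"n_estimators": [1, 3, 5, 7, 9], "max_depth": [3, 5, 7, 9, 15]}
cv_folds = 2

# Build every (n_estimators, max_depth) combination
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Each candidate is trained once per cross-validation fold
total_fits = len(candidates) * cv_folds

print(len(candidates))  # 25
print(total_fits)       # 50
print(candidates[0])    # {'n_estimators': 1, 'max_depth': 3}
```

Adding a third parameter with five candidate values would multiply the work by five again, which is why the grid is usually kept small.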

Original article: https://www.cnblogs.com/wuren-best/p/14278845.html