Machine Learning Classification Algorithms: Decision Trees and Random Forests (2.3)

Three decision tree algorithms

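Information gain is not the only splitting principle for decision trees; there are other criteria as well. They work in a similar way, so we will not go through worked calculations for each of them.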
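- ID3
  
  Criterion: maximize information gain
- C4.5
  
  Criterion: maximize the information gain ratio
- CART
  
  Classification tree: criterion of minimum Gini impurity (Gini index). In sklearn this is the default splitting criterion, and a different one can be selected
  
  Advantage: finer-grained splits (easier to see from the tree displayed in a later example)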
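To make the three criteria concrete, here is a small pure-Python sketch that computes them with the standard textbook formulas on a hypothetical toy dataset (the feature `has_job` and the labels are made up for illustration). Real CART always makes binary splits; the weighted Gini below matches CART here only because the toy feature is binary.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(D) = -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_groups(feature, labels):
    """Group the labels by the value of one discrete feature."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def information_gain(feature, labels):
    """ID3: g(D, A) = H(D) - sum_v |D_v|/|D| * H(D_v)."""
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in split_groups(feature, labels))
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """C4.5: g_R(D, A) = g(D, A) / H_A(D), H_A(D) = entropy of the feature's own values."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def weighted_gini(feature, labels):
    """CART-style: sum_v |D_v|/|D| * Gini(D_v); smaller is better."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in split_groups(feature, labels))


# Hypothetical toy data: 6 samples, one binary feature.
labels = ["yes", "yes", "yes", "no", "no", "no"]
has_job = [1, 1, 0, 0, 0, 0]

print(round(entropy(labels), 3))                    # → 1.0
print(round(information_gain(has_job, labels), 3))  # → 0.459
print(round(gain_ratio(has_job, labels), 3))        # → 0.5
print(round(weighted_gini(has_job, labels), 3))     # → 0.25 (down from Gini(D) = 0.5)
```

ID3 and C4.5 would prefer the feature with the largest gain or gain ratio, while CART prefers the split with the smallest weighted Gini impurity.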
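Decision tree API
- class sklearn.tree.DecisionTreeClassifier(criterion='gini', max_depth=None, random_state=None)
  
  Decision tree classifier
  
  criterion: defaults to 'gini' (Gini impurity); 'entropy' (information gain) can be chosen instead
  
  max_depth: maximum depth of the tree
  
  random_state: random seed
- Of these, max_depth (the maximum depth of the tree) is a hyperparameter worth tuning
  
  Other hyperparameters will be analyzed together with random forests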
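A minimal usage sketch of the API above on the iris dataset that ships with scikit-learn. The dataset, `max_depth=3` and `random_state=22` are illustrative choices, not values from the original post, and `export_text` / `get_depth` assume scikit-learn 0.21 or later.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=22
)

# criterion="entropy" switches from the default Gini impurity to information gain;
# max_depth caps how deep the tree may grow (a simple form of pre-pruning).
estimator = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=22)
estimator.fit(x_train, y_train)

print("test accuracy:", estimator.score(x_test, y_test))
print("actual depth:", estimator.get_depth())
# Plain-text view of the learned splits; export_graphviz writes a .dot file instead.
print(export_text(estimator, feature_names=list(iris.feature_names)))
```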
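Workflow:
1) Obtain the data
2) Process the data: handle missing values and convert the feature values into dicts
3) Prepare the feature values and target values
4) Split the dataset
5) Feature engineering: dictionary feature extraction
6) Decision tree estimator workflow
7) Model evaluation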
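- Pros:
  
  Simple to understand and interpret; the tree can be visualized.
- Cons:
  
  Decision tree learners can create overly complex trees that do not generalize well to new data. This is called overfitting.
- Improvements:
  
  Pruning, as in the CART algorithm (already implemented in the decision tree API; the random forest hyperparameter tuning section covers it)
  
  Random forests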
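One way scikit-learn exposes pruning is minimal cost-complexity post-pruning through the `ccp_alpha` parameter (scikit-learn 0.22 or later), alongside pre-pruning parameters such as `max_depth`. The sketch below, using the bundled breast cancer dataset as an assumed example, picks `ccp_alpha` by cross-validation on the training set and compares the pruned tree with the unpruned one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)

# Unpruned tree: keeps splitting until the leaves are pure, which tends to overfit.
full_tree = DecisionTreeClassifier(random_state=22).fit(x_train, y_train)

# Effective alphas at which cost-complexity pruning removes subtrees.
path = full_tree.cost_complexity_pruning_path(x_train, y_train)
# Drop the last alpha (it prunes down to a single node); clip any tiny negative float noise.
alphas = [max(float(a), 0.0) for a in path.ccp_alphas[:-1]]

# Choose alpha by 5-fold cross-validation on the training set, not on the test set.
cv_means = [
    cross_val_score(
        DecisionTreeClassifier(random_state=22, ccp_alpha=a), x_train, y_train, cv=5
    ).mean()
    for a in alphas
]
best_alpha = alphas[cv_means.index(max(cv_means))]

pruned_tree = DecisionTreeClassifier(random_state=22, ccp_alpha=best_alpha).fit(x_train, y_train)
print("unpruned: leaves =", full_tree.get_n_leaves(), "test acc =", full_tree.score(x_test, y_test))
print("pruned:   leaves =", pruned_tree.get_n_leaves(), "test acc =", pruned_tree.score(x_test, y_test))
```

A larger `ccp_alpha` removes more subtrees, so the pruned tree never has more leaves than the unpruned one.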
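Random forest
Forest: a classifier that contains multiple decision trees
How it works
Training set:
N samples, each with feature values and a target value
M features
Two kinds of randomness:
Training-set randomness: draw N samples at random, with replacement, from the N samples
bootstrap: random sampling with replacement
Original training set: [1, 2, 3, 4, 5]
Training set of one new tree:
[2, 2, 3, 1, 5]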
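The bootstrap draw above can be reproduced with NumPy. The seed is arbitrary, so the sample will differ from [2, 2, 3, 1, 5]; the second part checks the well-known fact that, for large N, a bootstrap sample contains about 1 - 1/e ≈ 63.2% of the distinct original samples.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
samples = np.array([1, 2, 3, 4, 5])
n = len(samples)

# Bootstrap: draw N indices uniformly at random, with replacement.
bootstrap_set = samples[rng.integers(0, n, size=n)]
# Samples that were never drawn are "out-of-bag" for this tree.
out_of_bag = np.setdiff1d(samples, bootstrap_set)
print("bootstrap training set:", bootstrap_set)
print("out-of-bag samples:", out_of_bag)

# Fraction of distinct samples in one bootstrap set: 1 - (1 - 1/N)^N → 1 - 1/e for large N.
big_n = 10_000
unique_fraction = len(np.unique(rng.integers(0, big_n, size=big_n))) / big_n
print("distinct fraction for N = 10000:", round(unique_fraction, 3))
```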
Feature randomness: randomly select m of the M features
M >> m
This acts as a form of dimensionality reduction
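Putting the two kinds of randomness together, here is a hypothetical minimal forest (`TinyForest` is an illustrative name, not a library class): each tree is trained on a bootstrap sample and a random subset of m features, and predictions are combined by majority vote. It follows the per-tree feature sampling described above; scikit-learn's `RandomForestClassifier` instead draws a new random feature subset at every split, controlled by `max_features`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


class TinyForest:
    """Hypothetical minimal random forest: bootstrap rows + per-tree feature subset + vote."""

    def __init__(self, n_trees=25, n_features=2, seed=0):
        self.n_trees = n_trees
        self.n_features = n_features  # m: features each tree sees (m < M)
        self.rng = np.random.default_rng(seed)
        self.members = []  # list of (tree, feature_indices)

    def fit(self, x, y):
        n_samples, n_total = x.shape
        self.members = []
        for _ in range(self.n_trees):
            rows = self.rng.integers(0, n_samples, size=n_samples)  # N draws with replacement
            cols = self.rng.choice(n_total, size=self.n_features, replace=False)  # m of M features
            tree = DecisionTreeClassifier(random_state=int(self.rng.integers(1 << 31)))
            tree.fit(x[rows][:, cols], y[rows])
            self.members.append((tree, cols))
        return self

    def predict(self, x):
        # Each tree votes using only its own feature subset; the most frequent label wins.
        votes = np.stack([tree.predict(x[:, cols]) for tree, cols in self.members])
        return np.array([np.bincount(column).argmax() for column in votes.T])


iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=22)
forest = TinyForest().fit(x_train, y_train)
predictions = forest.predict(x_test)
accuracy = np.mean(predictions == y_test)
print("accuracy:", accuracy)
```

`np.bincount` is used for voting because the iris labels are small non-negative integers; other label types would need a different vote counter.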
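Summary
Runs efficiently on large datasets
Handles input samples with high-dimensional features, without requiring dimensionality reduction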

Example:
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


def randomforest():
    estimator = RandomForestClassifier()
    # 1) Load the dataset
    iris = load_iris()
    # 2) Split the dataset
    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=22)
    # 3) Add grid search with cross-validation
    # Parameter grid to search over
    param_dict = {"n_estimators": [120, 200, 300, 500, 800, 1200],
                  "max_depth": [5, 8, 15, 25, 30]}
    estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)
    estimator.fit(x_train, y_train)
    # 4) Model evaluation
    # Method 1: compare true values and predictions directly
    y_predict = estimator.predict(x_test)
    print("y_predict: ", y_predict)
    print("True vs. predicted: ", y_test == y_predict)
    # Method 2: compute the accuracy
    score = estimator.score(x_test, y_test)
    print("Accuracy: ", score)
    # Best parameters: best_params_
    print("Best parameters: ", estimator.best_params_)
    # Best cross-validation score: best_score_
    print("Best score: ", estimator.best_score_)
    # Best estimator: best_estimator_
    print("Best estimator: ", estimator.best_estimator_)
    # Cross-validation results: cv_results_
    print("Cross-validation results: ", estimator.cv_results_)


if __name__ == "__main__":
    randomforest()
```

Original article: https://www.cnblogs.com/sima-3/p/14813044.html
