zoukankan      html  css  js  c++  java
  • Comparing randomized search and grid search for hyperparameter estimation

    Comparing randomized search and grid search for hyperparameter estimation

    Compare randomized search and grid search for optimizing hyperparameters of a random forest. All parameters that influence the learning are searched simultaneously (except for the number of estimators, which poses a time / quality tradeoff).

    The randomized search and the grid search explore exactly the same space of parameters. The result in parameter settings is quite similar, while the run time for randomized search is drastically lower.

    The performance is slightly worse for the randomized search, though this is most likely a noise effect and would not carry over to a held-out test set.

    Note that in practice, one would not search over this many different parameters simultaneously using grid search, but pick only the ones deemed most important.

    Python source code: randomized_search.py

    print(__doc__)
    
    import numpy as np
    
    from time import time
    from operator import itemgetter
    from scipy.stats import randint as sp_randint
    
    from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    
    # get some data
    iris = load_digits()
    X, y = iris.data, iris.target
    
    # build a classifier
    clf = RandomForestClassifier(n_estimators=20)
    
    
    # Utility function to report best scores
    def report(grid_scores, n_top=3):
        top_scores = sorted(grid_scores, key=itemgetter(1), reverse=True)[:n_top]
        for i, score in enumerate(top_scores):
            print("Model with rank: {0}".format(i + 1))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                  score.mean_validation_score,
                  np.std(score.cv_validation_scores)))
            print("Parameters: {0}".format(score.parameters))
            print("")
    
    
    # specify parameters and distributions to sample from
    param_dist = {"max_depth": [3, None],
                  "max_features": sp_randint(1, 11),
                  "min_samples_split": sp_randint(1, 11),
                  "min_samples_leaf": sp_randint(1, 11),
                  "bootstrap": [True, False],
                  "criterion": ["gini", "entropy"]}
    
    # run randomized search
    n_iter_search = 20
    random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                       n_iter=n_iter_search)
    
    start = time()
    random_search.fit(X, y)
    print("RandomizedSearchCV took %.2f seconds for %d candidates"
          " parameter settings." % ((time() - start), n_iter_search))
    report(random_search.grid_scores_)
    
    # use a full grid over all parameters
    param_grid = {"max_depth": [3, None],
                  "max_features": [1, 3, 10],
                  "min_samples_split": [1, 3, 10],
                  "min_samples_leaf": [1, 3, 10],
                  "bootstrap": [True, False],
                  "criterion": ["gini", "entropy"]}
    
    # run grid search
    grid_search = GridSearchCV(clf, param_grid=param_grid)
    start = time()
    grid_search.fit(X, y)
    
    print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
          % (time() - start, len(grid_search.grid_scores_)))
    report(grid_search.grid_scores_)
  • 相关阅读:
    抛开BlazeDS,自定义flex RPC
    设计模式学习03装饰器模式
    通过ANT生成MANIFEST.MF中的ClassPath属性
    Spring JDBCTemplate与Hiberante混用
    关于 两个 datetime 列的差别导致了运行时溢出
    在Wcf中使用Nhibernate (续)
    sql2005/sql2008 分页
    工行支付api查询asp.net C# 实现
    生成静态页面的vbscript
    Asp.net Mvc Post ID Bug
  • 原文地址:https://www.cnblogs.com/yymn/p/4598416.html
Copyright © 2011-2022 走看看