
    3.2. Grid Search: Searching for estimator parameters

    Parameters that are not directly learnt within estimators can be set by searching a parameter space for the best cross-validation score (see Cross-validation: evaluating estimator performance). Typical examples include C, kernel and gamma for the Support Vector Classifier, alpha for Lasso, etc.

    Any parameter provided when constructing an estimator may be optimized in this manner. Specifically, to find the names and current values for all parameters for a given estimator, use:

    estimator.get_params()
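As a quick illustration (a minimal sketch, assuming scikit-learn is installed), get_params returns a dictionary mapping constructor-parameter names to their current values:

```python
from sklearn.svm import SVC

# get_params() lists every constructor parameter and its current value.
svc = SVC(C=10)
params = svc.get_params()

assert params['C'] == 10   # the value we set at construction
assert 'kernel' in params  # other tunable parameters are listed too
```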
    

    Such parameters are often referred to as hyperparameters (particularly in Bayesian learning), distinguishing them from the parameters optimized in a machine learning procedure.

    A search consists of:

    • an estimator (regressor or classifier such as sklearn.svm.SVC());
    • a parameter space;
    • a method for searching or sampling candidates;
    • a cross-validation scheme; and
    • a score function.

    Some models allow for specialized, efficient parameter search strategies, outlined below. Two generic approaches to sampling search candidates are provided in scikit-learn: for given values, GridSearchCV exhaustively considers all parameter combinations, while RandomizedSearchCV can sample a given number of candidates from a parameter space with a specified distribution. After describing these tools we detail best practice applicable to both approaches.
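The exhaustive approach can be sketched as follows (a minimal example using the current sklearn.model_selection import path and the bundled iris dataset; the parameter values are illustrative, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every combination in the grid (3 x 2 = 6 candidates) is evaluated by
# cross-validation; the best one is then refit on the full dataset.
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
```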

    3.2.2. Randomized Parameter Optimization

    While using a grid of parameter settings is currently the most widely used method for parameter optimization, other search methods have more favourable properties. RandomizedSearchCV implements a randomized search over parameters, where each setting is sampled from a distribution over possible parameter values. This has two main benefits over an exhaustive search:

    • A budget can be chosen independent of the number of parameters and possible values.
    • Adding parameters that do not influence the performance does not decrease efficiency.

    Specifying how parameters should be sampled is done using a dictionary, very similar to specifying parameters for GridSearchCV. Additionally, a computation budget, that is, the number of sampled candidates or sampling iterations, is specified using the n_iter parameter. For each parameter, either a distribution over possible values or a list of discrete choices (which will be sampled uniformly) can be specified:

    [{'C': scipy.stats.expon(scale=100), 'gamma': scipy.stats.expon(scale=.1),
      'kernel': ['rbf'], 'class_weight':['auto', None]}]
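A specification like this plugs straight into RandomizedSearchCV. A minimal sketch (current import path; the class_weight entry is omitted here because the 'auto' option has been removed from recent scikit-learn versions):

```python
import scipy.stats
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_dist = {'C': scipy.stats.expon(scale=100),
              'gamma': scipy.stats.expon(scale=.1),
              'kernel': ['rbf']}

# n_iter bounds the budget: only 10 candidate settings are drawn from the
# space, regardless of how many parameters it contains.
search = RandomizedSearchCV(SVC(), param_dist, n_iter=10, cv=3,
                            random_state=0)
search.fit(X, y)
```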
    

    This example uses the scipy.stats module, which contains many useful distributions for sampling parameters, such as expon, gamma, uniform or randint. In principle, any object can be passed that provides an rvs (random variate sample) method to sample a value. A call to the rvs function should provide independent random samples from possible parameter values on consecutive calls.
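To illustrate the rvs contract, here is a hypothetical log-uniform distribution (the class name and implementation are our own, not part of scikit-learn or SciPy):

```python
import numpy as np
from sklearn.utils import check_random_state


class LogUniform:
    """Hypothetical distribution: all RandomizedSearchCV needs is ``rvs``."""

    def __init__(self, low, high):
        self.low, self.high = np.log(low), np.log(high)

    def rvs(self, random_state=None):
        # check_random_state accepts None, an int seed, or a RandomState.
        rng = check_random_state(random_state)
        return float(np.exp(rng.uniform(self.low, self.high)))


sample = LogUniform(1e-3, 1e3).rvs(random_state=0)
assert 1e-3 <= sample <= 1e3
```

Sampling in log space like this is a common choice for scale-type parameters such as C and gamma, whose useful values span several orders of magnitude.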

    Warning

    The distributions in scipy.stats do not allow specifying a random state. Instead, they use the global numpy random state, which can be seeded via np.random.seed or set using np.random.set_state.
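To make draws reproducible under this constraint, seed the global state before each run; a minimal sketch:

```python
import numpy as np
import scipy.stats

dist = scipy.stats.expon(scale=100)

np.random.seed(42)        # seed the global numpy random state
first = dist.rvs(size=3)

np.random.seed(42)        # re-seeding reproduces the same draws
second = dist.rvs(size=3)

assert np.allclose(first, second)
```

(Note that newer SciPy versions also accept a random_state argument directly in the rvs call.)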

    For continuous parameters, such as C above, it is important to specify a continuous distribution to take full advantage of the randomization. This way, increasing n_iter will always lead to a finer search.


  • Original source: https://www.cnblogs.com/yymn/p/4598419.html