    3.2. Grid Search: Searching for estimator parameters

    Parameters that are not directly learnt within estimators can be set by searching a parameter space for the best cross-validation score (see Cross-validation: evaluating estimator performance). Typical examples include C, kernel and gamma for Support Vector Classifier, alpha for Lasso, etc.

    Any parameter provided when constructing an estimator may be optimized in this manner. Specifically, to find the names and current values of all parameters of a given estimator, use:

    estimator.get_params()
    

    Such parameters are often referred to as hyperparameters (particularly in Bayesian learning), distinguishing them from the parameters optimised in a machine learning procedure.
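    As a minimal sketch, assuming scikit-learn is installed and using SVC purely as an example estimator, the values passed to the constructor are exactly what get_params() reports, and set_params() changes them without fitting anything:

```python
from sklearn.svm import SVC

# Hyperparameters are fixed at construction time; fit() does not learn them.
estimator = SVC(C=10.0, kernel="rbf", gamma=0.1)

params = estimator.get_params()
print(sorted(params))                                  # all constructor parameter names
print(params["C"], params["kernel"], params["gamma"])

# set_params() updates a hyperparameter in place and returns the estimator itself.
estimator.set_params(C=1.0)
print(estimator.get_params()["C"])
```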

    A search consists of:

    • an estimator (regressor or classifier such as sklearn.svm.SVC());
    • a parameter space;
    • a method for searching or sampling candidates;
    • a cross-validation scheme; and
    • a score function.
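    The five ingredients can be made concrete with a hand-rolled search loop. This is a minimal sketch for illustration only, assuming the iris data set, an SVC and an arbitrarily chosen small grid; GridSearchCV packages the same kind of loop (plus refitting the best model) behind a single object.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, ParameterGrid, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

estimator = SVC()                                        # 1. the estimator
param_space = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}  # 2. the parameter space
candidates = ParameterGrid(param_space)                  # 3. search method: exhaustive grid
cv = KFold(n_splits=5, shuffle=True, random_state=0)     # 4. the cross-validation scheme
scoring = "accuracy"                                     # 5. the score function

results = []
for params in candidates:
    # Fresh, unfitted copy of the estimator with this candidate's settings.
    model = clone(estimator).set_params(**params)
    scores = cross_val_score(model, X, y, cv=cv, scoring=scoring)
    results.append((scores.mean(), params))

# Keep the candidate with the highest mean cross-validated score.
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params, round(best_score, 3))
```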

    Some models allow for specialized, efficient parameter search strategies, outlined below. Two generic approaches to sampling search candidates are provided in scikit-learn: for given values, GridSearchCV exhaustively considers all parameter combinations, while RandomizedSearchCV can sample a given number of candidates from a parameter space with a specified distribution. After describing these tools we detail best practice applicable to both approaches.
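    To see the difference in cost, the sketch below (assuming the iris data set and an SVC with an arbitrary 4 x 4 grid) runs both searches over the same parameter space and counts how many candidates each one evaluated:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

# Exhaustive: every one of the 4 * 4 = 16 combinations is cross-validated.
grid = GridSearchCV(SVC(), param_grid, cv=3).fit(X, y)

# Randomized: only n_iter=5 combinations are sampled from the same space.
rand = RandomizedSearchCV(SVC(), param_grid, n_iter=5, cv=3, random_state=0).fit(X, y)

print(len(grid.cv_results_["params"]), grid.best_params_)
print(len(rand.cv_results_["params"]), rand.best_params_)
```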

    3.2.2. Randomized Parameter Optimization

    While using a grid of parameter settings is currently the most widely used method for parameter optimization, other search methods have more favourable properties. RandomizedSearchCV implements a randomized search over parameters, where each setting is sampled from a distribution over possible parameter values. This has two main benefits over an exhaustive search:

    • A budget can be chosen independently of the number of parameters and possible values.
    • Adding parameters that do not influence the performance does not decrease efficiency.
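    The first benefit is easy to quantify. In the sketch below (parameter names and values are arbitrary, and tol merely plays the role of a parameter assumed not to matter), adding one three-valued parameter triples the number of grid candidates, while the randomized budget stays at n_iter:

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

base = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
# "tol" stands in for a parameter assumed to barely affect the score.
extended = {**base, "tol": [1e-4, 1e-3, 1e-2]}

n_iter = 10
costs = []
for space in (base, extended):
    grid_cost = len(ParameterGrid(space))  # grows multiplicatively with each parameter
    random_cost = len(list(ParameterSampler(space, n_iter=n_iter, random_state=0)))
    costs.append((grid_cost, random_cost))

print(costs)
```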

    Specifying how parameters should be sampled is done using a dictionary, very similar to specifying parameters for GridSearchCV. Additionally, a computation budget, being the number of sampled candidates or sampling iterations, is specified using the n_iter parameter. For each parameter, either a distribution over possible values or a list of discrete choices (which will be sampled uniformly) can be specified:

    import scipy.stats
    param_distributions = [{'C': scipy.stats.expon(scale=100), 'gamma': scipy.stats.expon(scale=.1),
                            'kernel': ['rbf'], 'class_weight': ['balanced', None]}]
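    A minimal end-to-end sketch of such a dictionary in use, assuming the iris data set and an SVC (neither comes from the original text) and using the class_weight value 'balanced', which current scikit-learn releases accept, could look like this:

```python
import scipy.stats
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_distributions = {
    "C": scipy.stats.expon(scale=100),      # continuous: sampled via .rvs()
    "gamma": scipy.stats.expon(scale=0.1),  # continuous: sampled via .rvs()
    "kernel": ["rbf"],                      # list: sampled uniformly
    "class_weight": ["balanced", None],     # list: sampled uniformly
}

search = RandomizedSearchCV(
    SVC(),
    param_distributions,
    n_iter=20,       # computation budget: number of sampled candidates
    cv=5,
    random_state=0,  # makes the sampled candidates reproducible
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```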
    

    This example uses the scipy.stats module, which contains many useful distributions for sampling parameters, such as expon, gamma, uniform or randint. In principle, any object that provides an rvs (random variate sample) method can be passed to sample a value. Consecutive calls to rvs should return independent random samples from the possible parameter values.
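    The rvs protocol is small enough to implement by hand. The sketch below defines a hypothetical LogUniform class (recent SciPy releases ship scipy.stats.loguniform for the same purpose) and hands it to ParameterSampler, the scikit-learn utility that RandomizedSearchCV uses to draw candidates. It assumes, as current scikit-learn does, that rvs is called with a random_state keyword argument:

```python
from sklearn.model_selection import ParameterSampler
from sklearn.utils import check_random_state


class LogUniform:
    """Hypothetical helper: draws 10**u with u ~ Uniform(low, high)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def rvs(self, size=None, random_state=None):
        # Accept None, an int seed or a RandomState, like scipy.stats does.
        rng = check_random_state(random_state)
        return 10.0 ** rng.uniform(self.low, self.high, size=size)


# C is sampled log-uniformly between 10**-2 and 10**2.
samples = list(ParameterSampler({"C": LogUniform(-2, 2)}, n_iter=5, random_state=0))
print([round(float(s["C"]), 4) for s in samples])
```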

    Warning

     

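    In SciPy versions before 0.16, the distributions in scipy.stats do not allow specifying a random state. Instead, they use the global numpy random state, which can be seeded via np.random.seed or set using np.random.set_state. With SciPy 0.16 or later, recent scikit-learn releases instead pass the random_state given to RandomizedSearchCV on to the distributions.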
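    Both ways of getting reproducible candidates can be sketched as follows, assuming current NumPy, SciPy and scikit-learn. Re-seeding the global state with np.random.seed is the workaround the warning describes; passing random_state to ParameterSampler (or to RandomizedSearchCV) is the alternative when the installed SciPy lets the sampler forward its own random state to rvs:

```python
import numpy as np
import scipy.stats
from sklearn.model_selection import ParameterSampler

dist = {"C": scipy.stats.expon(scale=100), "gamma": scipy.stats.expon(scale=0.1)}

# Old approach: seed numpy's global RandomState before each draw.
np.random.seed(0)
first = scipy.stats.expon(scale=100).rvs()
np.random.seed(0)
second = scipy.stats.expon(scale=100).rvs()

# Newer approach: pass an explicit random_state; the global state is not used.
run_a = list(ParameterSampler(dist, n_iter=3, random_state=42))
run_b = list(ParameterSampler(dist, n_iter=3, random_state=42))
print(first == second, run_a == run_b)
```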

    For continuous parameters, such as C above, it is important to specify a continuous distribution to take full advantage of the randomization. This way, increasing n_iter will always lead to a finer search.
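    The effect of that choice shows up directly in the sampled values. In this sketch (value ranges are arbitrary), C is first given as a three-element list and then as a continuous scipy.stats.expon distribution, with the same budget of n_iter = 20:

```python
import scipy.stats
from sklearn.model_selection import ParameterSampler

n_iter = 20
gamma = scipy.stats.expon(scale=0.1)

# C given as a short list: at most 3 distinct values, however large n_iter is.
discrete = ParameterSampler({"C": [1, 10, 100], "gamma": gamma}, n_iter=n_iter, random_state=0)
# C given as a continuous distribution: (almost surely) a new value on every draw.
continuous = ParameterSampler(
    {"C": scipy.stats.expon(scale=100), "gamma": gamma}, n_iter=n_iter, random_state=0
)

distinct_discrete = {p["C"] for p in discrete}
distinct_continuous = {p["C"] for p in continuous}
print(len(distinct_discrete), len(distinct_continuous))
```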

    Examples:

    References:

    • Bergstra, J. and Bengio, Y., Random search for hyper-parameter optimization, The Journal of Machine Learning Research (2012)
  • Original article: https://www.cnblogs.com/yymn/p/4598419.html