  • Stochastic Gradient Descent (SGD)

    1. Initialize the weights.
    2. For each image, use these weights to predict whether it appears to be a 3 or a 7.
    3. Based on these predictions, calculate how good the model is (its loss).
    4. Calculate the gradient, which measures, for each weight, how changing that weight would change the loss.
    5. Step (that is, change) all the weights based on that calculation.
    6. Go back to step 2 and repeat the process.
    7. Iterate until you decide to stop the training process (for instance, because the model is good enough or you don't want to wait any longer).

    init → predict → loss → gradient → step → stop
    (after the step, loop back to predict and repeat; a minimal code sketch of this loop follows)
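
    The cycle above can be written as a short training loop. The following is a minimal sketch in plain PyTorch (not the book's exact code): it assumes the images have already been flattened into a float tensor `train_x` of shape (N, 784) and the labels into `train_y` of shape (N, 1), with 1 meaning "3" and 0 meaning "7"; the names `weights`, `bias`, `lr`, and `train_epoch` are illustrative.

    ```python
    import torch

    # Assumed inputs (not defined in the original text):
    #   train_x: (N, 784) float tensor of flattened images
    #   train_y: (N, 1) float tensor of labels, 1 = "3", 0 = "7"

    def train_epoch(train_x, train_y, weights, bias, lr=1.0):
        # predict: a linear model, squashed into (0, 1) with a sigmoid
        preds = torch.sigmoid(train_x @ weights + bias)
        # loss: small when predictions are close to their targets
        loss = torch.where(train_y == 1, 1 - preds, preds).mean()
        # gradient: how the loss changes as each parameter changes
        loss.backward()
        # step: nudge every parameter in the direction that lowers the loss
        with torch.no_grad():
            weights -= lr * weights.grad
            bias -= lr * bias.grad
            weights.grad.zero_()
            bias.grad.zero_()
        return loss.item()

    # initialize: random parameters, tracked for gradients
    weights = torch.randn(28 * 28, 1, requires_grad=True)
    bias = torch.zeros(1, requires_grad=True)

    # repeat until we decide to stop:
    # for epoch in range(20):
    #     train_epoch(train_x, train_y, weights, bias)
    ```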

    Initialize:: We initialize the parameters to random values. This may sound surprising. There are certainly other choices we could make, such as initializing them to the percentage of times that pixel is activated for that category—but since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well.
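
    As a concrete illustration of random initialization (a sketch with illustrative names, not the book's exact code), the parameters can be drawn from a normal distribution and marked so that PyTorch tracks their gradients:

    ```python
    import torch

    def init_params(size, std=1.0):
        # A random starting point is fine: the training loop improves it anyway.
        return (torch.randn(size) * std).requires_grad_()

    weights = init_params((28 * 28, 1))  # one weight per pixel
    bias = init_params(1)
    ```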
    Loss:: This is what Samuel referred to when he spoke of testing the effectiveness of any current weight assignment in terms of actual performance. We need some function that will return a number that is small if the performance of the model is good (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention).
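
    For the 3-versus-7 classifier, one such function (a sketch in the spirit of the book, with illustrative names) squashes predictions into (0, 1) with a sigmoid and measures how far each one is from its 0/1 target:

    ```python
    import torch

    def mnist_loss(predictions, targets):
        predictions = predictions.sigmoid()  # squash into (0, 1)
        # Small when confident and correct, large when confident and wrong.
        return torch.where(targets == 1, 1 - predictions, predictions).mean()

    # First prediction is confidently right (small loss),
    # second is confidently wrong (large loss):
    print(mnist_loss(torch.tensor([2.2, 2.2]), torch.tensor([1.0, 0.0])))
    ```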
    Step:: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it: increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change that amount by a bit more, and a bit less, until you find an amount that works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out in which direction, and by roughly how much, to change each weight, without having to try all these small changes. The way to do this is by calculating gradients. This is just a performance optimization; we would get exactly the same results by using the slower manual process as well.
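
    The contrast can be seen in a few lines. In this sketch (illustrative only; `f` stands in for "loss as a function of one weight" and `lr` is a learning rate), the slow finite-difference estimate and the gradient computed by backward() agree, but the gradient gives the direction and magnitude in one pass:

    ```python
    import torch

    def f(w):                    # stand-in for "loss as a function of one weight"
        return (w - 3.0) ** 2

    # Slow manual approach: nudge the weight a little and watch the loss.
    w = torch.tensor(0.0)
    eps = 1e-4
    slope_estimate = (f(w + eps) - f(w - eps)) / (2 * eps)   # about -6.0

    # Calculus approach: one backward pass gives direction and magnitude at once.
    w = torch.tensor(0.0, requires_grad=True)
    f(w).backward()
    print(w.grad)                # tensor(-6.): increasing w decreases the loss

    # One gradient-descent step: move against the gradient, scaled by lr.
    lr = 0.1
    with torch.no_grad():
        w -= lr * w.grad         # w moves from 0.0 toward the minimum at 3.0
    ```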
    Stop:: Once we've decided how many epochs to train the model for (a few suggestions for this were given in the earlier list), we apply that decision here. For our digit classifier, we would keep training until the accuracy of the model started getting worse, or we ran out of time.
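
    That stopping rule can be expressed as a small loop around the training step. In this sketch the helpers `train_epoch` and `validate_accuracy` are assumed to exist (they are not defined in the original text); the patience threshold and epoch cap are illustrative:

    ```python
    # Hypothetical helpers, assumed to exist:
    #   train_epoch()        -> one pass of predict / loss / gradient / step
    #   validate_accuracy()  -> accuracy on held-out data after that pass
    best_acc = 0.0
    epochs_without_improvement = 0

    for epoch in range(100):                 # hard cap: "we ran out of time"
        train_epoch()
        acc = validate_accuracy()
        if acc > best_acc:
            best_acc = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= 3:  # accuracy stopped improving
            break
    ```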
