  • Stochastic Gradient Descent (SGD)

    Initialize the weights.
    For each image, use these weights to predict whether it appears to be a 3 or a 7.
    Based on these predictions, calculate how good the model is (its loss).
    Calculate the gradient, which measures, for each weight, how changing that weight would change the loss.
    Step (that is, change) all the weights based on that calculation.
    Go back to step 2 and repeat the process.
    Iterate until you decide to stop the training process (for instance, because the model is good enough or you don't want to wait any longer). A minimal code sketch of this loop appears after the diagram below.

    init → predict → loss → gradient → step → stop (with step looping back to predict until we stop)
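
    The seven steps above map directly onto a few lines of PyTorch. The sketch below is a minimal illustration, not the chapter's actual code: the model is an assumed linear layer with a sigmoid, the pixel data and labels are random stand-ins, and the learning rate and epoch count are arbitrary choices.

        import torch

        # Stand-in data: 256 flattened 28x28 "images" and binary labels
        # (1 for a 3, 0 for a 7). Real pixel data would go here instead.
        train_x = torch.randn(256, 28 * 28)
        train_y = (torch.rand(256) > 0.5).float()

        # 1. Initialize the weights to random values.
        weights = (torch.randn(28 * 28) * 0.01).requires_grad_()
        bias = torch.zeros(1, requires_grad=True)
        lr = 0.1  # arbitrary step size

        for epoch in range(20):                           # 7. iterate until we decide to stop
            preds = (train_x @ weights + bias).sigmoid()  # 2. predict
            loss = ((preds - train_y) ** 2).mean()        # 3. measure how good the model is
            loss.backward()                               # 4. gradient
            with torch.no_grad():                         # 5. step the weights
                weights -= lr * weights.grad
                bias -= lr * bias.grad
                weights.grad.zero_()
                bias.grad.zero_()
            # 6. loop back and repeat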

    Initialize:: We initialize the parameters to random values. This may sound surprising. There are certainly other choices we could make, such as initializing them to the percentage of times that pixel is activated for that category—but since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well.
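
    For instance, here is a sketch (not from the original text's code) of random initialization next to the pixel-frequency alternative mentioned above, using made-up image data:

        import torch

        # Random initialization, as the text recommends.
        weights = (torch.randn(28 * 28) * 0.01).requires_grad_()

        # The alternative mentioned above: set each weight to the fraction of
        # training 3s in which that pixel is "on". `threes` is a hypothetical
        # stack of flattened 3-images with values scaled to 0..1; the 0.5
        # threshold is an illustrative choice.
        threes = torch.rand(100, 28 * 28)
        weights_alt = (threes > 0.5).float().mean(dim=0)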
    Loss:: This is what Samuel referred to when he spoke of testing the effectiveness of any current weight assignment in terms of actual performance. We need some function that will return a number that is small if the performance of the model is good (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention).
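
    As a concrete illustration (an assumed formulation, not quoted from the chapter), a loss for the 3-versus-7 task could measure how far each sigmoid prediction lands from its label:

        import torch

        def loss_3_or_7(preds, targets):
            # preds: sigmoid outputs in 0..1; targets: 1 for a 3, 0 for a 7.
            # Small when predictions match the labels, large when they don't.
            return torch.where(targets == 1, 1 - preds, preds).mean()

        # Confident, correct predictions give a low loss.
        print(loss_3_or_7(torch.tensor([0.9, 0.1]), torch.tensor([1.0, 0.0])))  # tensor(0.1000)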
    Step:: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it: increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change the weight by a bit more, or a bit less, until you find an amount that works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out in which direction, and by roughly how much, to change each weight, without having to try all these small changes. The way to do this is by calculating gradients. This is just a performance optimization; we would get exactly the same results by using the slower manual process.
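
    A small self-contained comparison (toy two-weight model, made-up numbers) shows the slow probing approach and the gradient agreeing:

        import torch

        x = torch.tensor([1.0, 2.0])
        w = torch.tensor([0.5, -0.3], requires_grad=True)
        loss_fn = lambda w: ((w * x).sum() - 1.0) ** 2

        # The slow way: nudge the first weight a tiny bit and watch the loss.
        eps = 1e-4
        with torch.no_grad():
            nudged = w.clone()
            nudged[0] += eps
            slope = (loss_fn(nudged) - loss_fn(w)) / eps

        # The calculus way: one backward pass gives every weight's slope at once.
        loss_fn(w).backward()
        print(slope, w.grad[0])  # nearly identical numbers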
    Stop:: Once we've decided how many epochs to train the model for (a few suggestions for this were given in the earlier list), we apply that decision. For our digit classifier, we would keep training until the accuracy of the model started getting worse, or we ran out of time.
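
    That stopping rule can be sketched as follows; the accuracy sequence here is simulated, since a real run would compute it from a validation set each epoch:

        # Simulated per-epoch validation accuracies (a real run would measure these).
        accuracies = [0.80, 0.88, 0.93, 0.95, 0.94, 0.91]

        best_acc = 0.0
        for epoch, acc in enumerate(accuracies, start=1):
            if acc < best_acc:
                break  # accuracy started getting worse: stop training
            best_acc = acc
        print(f"stopped after epoch {epoch}, best accuracy {best_acc:.2f}")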

  • Original article: https://www.cnblogs.com/songyuejie/p/14859390.html