  • Numerical Computation

    Overflow and Underflow

    Underflow occurs when numbers near zero are rounded to zero.

    Overflow occurs when numbers with large magnitude are approximated as \(\infty\) or \(-\infty\).
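
    To see both failure modes concretely, here is a minimal NumPy sketch (the specific values are illustrative assumptions, not from the text): the square of a tiny 32-bit float rounds to zero, while the square of a huge one becomes infinite.

    ```python
    import numpy as np

    # Minimal sketch of both failure modes in 32-bit floats
    # (the specific values are illustrative).
    tiny = np.float32(1e-30)
    print(tiny * tiny)   # 0.0 -- the true value 1e-60 underflows to zero

    huge = np.float32(1e30)
    print(huge * huge)   # inf -- the true value 1e60 overflows to infinity
    ```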

    Poor Conditioning

    Conditioning refers to how rapidly a function changes with respect to small changes in its inputs. When \(\boldsymbol{A}\in\mathbb{R}^{n\times n}\) has an eigenvalue decomposition, its condition number is

    \[\max_{i,j}\Big|\frac{\lambda_i}{\lambda_j}\Big|\]
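
    As an illustration, this ratio can be computed directly from the eigenvalues; a minimal NumPy sketch with an assumed nearly singular symmetric matrix:

    ```python
    import numpy as np

    # Sketch: condition number as the largest eigenvalue-magnitude ratio,
    # max_{i,j} |lambda_i / lambda_j|, for an illustrative symmetric matrix.
    A = np.array([[1.0, 0.0],
                  [0.0, 1e-6]])

    lams = np.abs(np.linalg.eigvals(A))
    print(lams.max() / lams.min())   # ~1e6: the matrix is poorly conditioned
    print(np.linalg.cond(A))         # NumPy's 2-norm condition number agrees here
    ```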

    Gradient-Based Optimization

    Most deep learning algorithms involve optimization of some sort. Optimization refers to the task of either minimizing or maximizing some function \(f(\boldsymbol{x})\) by altering \(\boldsymbol{x}\). The function we want to minimize or maximize is called the objective function or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function. We often denote the value that minimizes or maximizes a function with a superscript \(*\). For example, we might say \(\boldsymbol{x}^* = \arg\min f(\boldsymbol{x})\).

    Suppose we have a function \(y = f(x)\), where both \(x\) and \(y\) are real numbers. The derivative of this function is denoted as \(f'(x)\) or as \(\frac{dy}{dx}\). The derivative gives the slope of \(f\) at \(x\), so we can reduce \(f(x)\) by moving \(x\) in small steps with the opposite sign of the derivative. When \(f'(x)=0\), the derivative provides no information about which direction to move. Points where \(f'(x)=0\) are known as critical points or stationary points: local minima, local maxima, and saddle points.
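
    A small sketch makes this concrete, using the hypothetical function \(f(x)=x^2\) and a central finite difference to approximate \(f'(x)\):

    ```python
    # Sketch: for the hypothetical f(x) = x**2, f'(x) = 2x vanishes at x = 0,
    # a critical point (here a minimum). A central finite difference
    # approximates the derivative numerically.
    def f(x):
        return x ** 2

    def derivative(f, x, h=1e-6):
        return (f(x + h) - f(x - h)) / (2 * h)

    print(derivative(f, 1.0))   # ~2.0: positive slope, so decrease x to reduce f
    print(derivative(f, 0.0))   # ~0.0: a critical point; no direction information
    ```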

    For functions with multiple inputs, we must make use of the concept of partial derivatives. The gradient of \(f\) is the vector containing all of the partial derivatives, denoted \(\nabla_{\boldsymbol{x}}f(\boldsymbol{x})\).

    The directional derivative in direction \(\boldsymbol{u}\) (a unit vector) is the slope of the function \(f\) in direction \(\boldsymbol{u}\). In other words, the directional derivative is the derivative of the function \(f(\boldsymbol{x}+\alpha\boldsymbol{u})\) with respect to \(\alpha\), evaluated at \(\alpha=0\). Using the chain rule, we can see that \(\frac{\partial}{\partial\alpha}f(\boldsymbol{x}+\alpha\boldsymbol{u})\) evaluates to \(\boldsymbol{u}^T\nabla_{\boldsymbol{x}}f(\boldsymbol{x})\) when \(\alpha=0\). To minimize \(f\), we would like to find the direction in which \(f\) decreases the fastest. We can do this using the directional derivative:

    \[\min_{\boldsymbol{u},\,\boldsymbol{u}^T\boldsymbol{u}=1}\boldsymbol{u}^T\nabla_{\boldsymbol{x}}f(\boldsymbol{x}) = \min_{\boldsymbol{u},\,\boldsymbol{u}^T\boldsymbol{u}=1}\|\boldsymbol{u}\|_2\,\|\nabla_{\boldsymbol{x}}f(\boldsymbol{x})\|_2\cos\theta\]

    Substituting \(\|\boldsymbol{u}\|_2=1\) and ignoring factors that do not depend on \(\boldsymbol{u}\), this simplifies to \(\min_{\boldsymbol{u}}\cos\theta\), where \(\theta\) is the angle between \(\boldsymbol{u}\) and the gradient. This is minimized when \(\boldsymbol{u}\) points in the opposite direction as the gradient. The method of steepest descent (or gradient descent) then proposes a new point

    \[\boldsymbol{x}' = \boldsymbol{x} - \epsilon\nabla_{\boldsymbol{x}}f(\boldsymbol{x})\]

    where \(\epsilon\) is the learning rate, a positive scalar determining the size of the step.
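
    Putting the last two ideas together, here is a minimal NumPy sketch, assuming the hypothetical function \(f(\boldsymbol{x})=x_1^2+x_2^2\) (so \(\nabla_{\boldsymbol{x}}f(\boldsymbol{x})=2\boldsymbol{x}\)): it evaluates the directional derivative along a unit vector, then runs the steepest-descent update.

    ```python
    import numpy as np

    # Minimal sketch of the directional derivative and steepest descent,
    # assuming the hypothetical f(x) = x[0]**2 + x[1]**2 with gradient 2x
    # (not a function from the text).
    def grad_f(x):
        return 2 * x

    x = np.array([1.0, 2.0])

    # Directional derivative u^T grad f(x) along a unit vector u.
    u = np.array([0.6, 0.8])           # ||u||_2 = 1
    print(u @ grad_f(x))               # slope in direction u: 0.6*2.0 + 0.8*4.0 = 4.4

    # Steepest descent: repeatedly apply x' = x - epsilon * grad f(x).
    epsilon = 0.1                      # learning rate
    for _ in range(100):
        x = x - epsilon * grad_f(x)
    print(x)                           # approaches the minimizer [0, 0]
    ```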

  • Original post: https://www.cnblogs.com/wang-haoran/p/13275890.html