Training Very Deep Networks

Rupesh Kumar Srivastava
Klaus Greff
Jürgen Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

    Abstract
    Theoretical and empirical evidence indicates that the depth of neural networks
    is crucial for their success. However, training becomes more difficult as depth
    increases, and training of very deep networks remains an open problem. Here we
    introduce a new architecture designed to overcome this. Our so-called highway
    networks allow unimpeded information flow across many layers on information
    highways. They are inspired by Long Short-Term Memory recurrent networks and
    use adaptive gating units to regulate the information flow. Even with hundreds of
    layers, highway networks can be trained directly through simple gradient descent.
    This enables the study of extremely deep and efficient architectures.
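To make the gating mechanism described above concrete, the following is a minimal NumPy sketch of a single highway layer computing y = H(x)·T(x) + x·(1 − T(x)), where T is the sigmoid transform gate and the carry gate is taken as 1 − T. The choice of tanh for H, the layer width, and the toy stacking loop are illustrative assumptions, not the paper's exact experimental setup; the negative transform-gate bias reflects the initialization suggested for training deep stacks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x))."""
    H = np.tanh(x @ W_H + b_H)      # candidate (plain) transformation
    T = sigmoid(x @ W_T + b_T)      # transform gate in (0, 1)
    return H * T + x * (1.0 - T)    # gated mix of transform and identity carry

# Toy usage: stack many layers; a negative transform-gate bias initially
# biases each layer toward carrying its input through unchanged.
rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))
for _ in range(50):
    W_H = rng.standard_normal((d, d)) / np.sqrt(d)
    W_T = rng.standard_normal((d, d)) / np.sqrt(d)
    x = highway_layer(x, W_H, np.zeros(d), W_T, -2.0 * np.ones(d))
```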

1 Introduction & Previous Work
    Many recent empirical breakthroughs in supervised machine learning have been achieved through
    large and deep neural networks. Network depth (the number of successive computational layers) has
    played perhaps the most important role in these successes. For instance, within just a few years, the
    top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
    [1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
    practical machine learning problems have also underscored the superiority of deeper networks [6]
    in terms of accuracy and/or performance.
    In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
    This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n bit
    parity problem can in principle be learned by a large feedforward net with n binary input units, 1
    output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
    net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
    recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
    Boolean circuits [8, 9] and modern neural networks [10, 11, 12].
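As a toy illustration of the recurrent parity solution sketched above, the snippet below keeps a single hidden state that flips whenever a 1 is read, so the "depth" of the computation grows with the length of the input string. This abstracts the mechanism only; it is not the exact 3-unit, 5-weight construction of [7].

```python
def parity_recurrent(bits):
    """Return 1 iff the bit string has odd parity, by flipping one state."""
    h = 0
    for b in bits:
        if b == 1:
            h = 1 - h   # the recurrent hidden state flips on each observed 1
    return h

assert parity_recurrent([1, 0, 1, 1]) == 1        # three 1s: odd parity
assert parity_recurrent([1, 0, 1, 1, 1, 0]) == 0  # four 1s: even parity
```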
