  • CS224n, lec 10, NMT & Seq2Seq Attn

    1990s-2010s: Statistical Machine Translation

    • Core idea: Learn a probabilistic model from data

    • Find the best English sentence y, given the French sentence x, by Bayes' rule: argmax_y P(y|x) = argmax_y P(x|y) P(y).

    • P(y) is the language model; P(x|y) is the translation model.

    • Estimate the translation model's statistics from an aligned parallel corpus.

    • Plus lots of hand-crafted rules!
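The noisy-channel scoring above can be sketched with a toy example (the candidate translations and all probabilities below are made-up illustrative numbers, not real model outputs):

```python
import math

# Noisy-channel scoring: argmax_y P(y|x) = argmax_y P(x|y) * P(y).
# "lm" stands in for the language model P(y), "tm" for the translation model P(x|y).
candidates = {
    "he has a cat":  {"lm": 0.020,   "tm": 0.30},
    "he have a cat": {"lm": 0.001,   "tm": 0.35},
    "cat has a he":  {"lm": 0.00001, "tm": 0.40},
}

def score(c):
    lm, tm = candidates[c]["lm"], candidates[c]["tm"]
    return math.log(lm) + math.log(tm)  # work in log space for numerical stability

best = max(candidates, key=score)
print(best)  # the fluent candidate wins even though its translation score is lower
```

Note how the language model vetoes disfluent candidates even when the translation model prefers them.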

    NMT

    The first NMT model is an end-to-end encoder-decoder language model that uses beam search to find the most suitable translation. It is more fluent and requires less handcrafting than SMT, but it is less interpretable and harder to control (e.g., by adding rules).
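Beam search as used above can be sketched as follows (a minimal toy, assuming a hand-written per-step token distribution rather than a real decoder softmax):

```python
import math

def beam_search(step_probs, k=2, eos="</s>"):
    """Keep the k highest-scoring prefixes at each step.

    step_probs: list of dicts token -> P(token | step), standing in for
    the decoder's softmax at each time step.
    """
    beams = [([], 0.0)]  # (prefix, cumulative log-prob)
    for probs in step_probs:
        expanded = []
        for prefix, lp in beams:
            if prefix and prefix[-1] == eos:
                expanded.append((prefix, lp))  # finished hypothesis, keep as-is
                continue
            for tok, p in probs.items():
                expanded.append((prefix + [tok], lp + math.log(p)))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:k]
    return beams[0][0]

# Toy per-step distributions (made up for illustration).
steps = [
    {"he": 0.6, "it": 0.4},
    {"hit": 0.7, "struck": 0.3},
    {"me": 0.8, "</s>": 0.2},
]
print(beam_search(steps, k=2))
```

Unlike greedy decoding, the beam keeps several hypotheses alive, so an early locally-suboptimal token can still win overall.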

    Evaluation

    BLEU compares the machine-written translation to one or several human-written translation(s), and computes a similarity score based on n-gram precision. Useful but imperfect.
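The core of BLEU, modified (clipped) n-gram precision, can be sketched like this (a simplified single-reference illustration; full BLEU combines several n-gram orders and a brevity penalty):

```python
from collections import Counter

def ngram_precision(cand, ref, n):
    """Clipped n-gram precision of a candidate against one reference."""
    c = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Clip each candidate n-gram count by its count in the reference.
    overlap = sum(min(cnt, r[g]) for g, cnt in c.items())
    total = max(sum(c.values()), 1)
    return overlap / total

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
p1 = ngram_precision(cand, ref, 1)  # unigram precision: 5/6
p2 = ngram_precision(cand, ref, 2)  # bigram precision: 3/5
print(p1, p2)
```

Clipping prevents a degenerate candidate like "the the the the" from scoring well just by repeating common reference words.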

    ATTENTION!

    Seq2Seq models are good, but they squeeze all the source information into a single state, which becomes an information bottleneck. We need some structural improvements to tackle this problem. Here we have attention.

    Concretely, attention is an extra layer that assigns a weight to each encoder hidden state by taking its dot product with the decoder's current state, computes the weighted sum of the encoder states, concatenates that sum with the decoder state, and feeds the result to the decoder.
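The steps above can be sketched in NumPy (toy shapes and random values for illustration; in a real model this runs at every decoder step with that step's hidden state):

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))  # encoder hidden states, one row per source token
s = rng.normal(size=(8,))    # current decoder hidden state

scores = H @ s                                   # dot-product attention scores, shape (5,)
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over source positions
context = weights @ H                            # weighted sum of encoder states, shape (8,)
decoder_input = np.concatenate([context, s])     # concatenated vector fed onward, shape (16,)

print(decoder_input.shape)
```

The attention weights form a probability distribution over source positions, which is also what makes them readable as soft alignments.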

    Formally we have
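The equations did not survive extraction; a standard statement of dot-product attention, in the usual CS224n notation with decoder state $s_t$ and encoder hidden states $h_1, \dots, h_N$, is:

```latex
\begin{aligned}
e^t_i &= s_t^\top h_i, \quad i = 1, \dots, N && \text{attention scores} \\
\alpha^t &= \mathrm{softmax}(e^t) \in \mathbb{R}^N && \text{attention distribution} \\
a_t &= \sum_{i=1}^{N} \alpha^t_i \, h_i && \text{weighted sum (context vector)} \\
\tilde{s}_t &= [\, a_t \, ; \, s_t \,] && \text{concatenate and feed to the decoder}
\end{aligned}
```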

    There are a variety of advantages to the attention mechanism:

    1. solves the information bottleneck problem

    2. helps with vanishing gradient problem: add highway-like gradient path

    3. provides some interpretability. Remember image captioning? Attention weights give us a sense of what the network is looking at. Another example: we get word alignments over the parallel corpus for free! Very cool!

      Next time: more attention! I am super excited!

  • Original source: https://www.cnblogs.com/ichn/p/8909282.html