
    CH08 Advanced Sequence Modeling for Natural Language Processing

    Abstract: This section mainly covers the attention mechanism in seq2seq models; before reading it, I only had a vague understanding of attention.

    The mapping between the output and the input is called an alignment.

    Capturing More from a Sequence: Bidirectional Recurrent Models

    When modeling a sequence, it is useful to observe not just the words in the past but also the words that appear in the future. Consider the following sentence:

    The man who hunts ducks out on the weekends

    If the model were to observe only from left to right, its representation for “ducks” would be different from that of a model that had also observed the words from right to left: reading left to right, “ducks” could be the noun object of “hunts,” but the future context “out on the weekends” reveals it is actually part of the verb phrase “ducks out.”

    Taken together, information from the past and the future will be able to robustly represent the meaning of a word in a sequence.
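    As a minimal sketch of this idea (plain Python, with a hypothetical one-dimensional Elman RNN and toy embeddings — none of these names or values come from the book), we can run the same recurrence left-to-right and right-to-left and concatenate the two states for each word, so the representation of “ducks” also reflects the words that follow it:

```python
import math

def rnn_states(tokens, embed, w_in=0.5, w_rec=0.3):
    # minimal 1-dimensional Elman RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1})
    h, states = 0.0, []
    for tok in tokens:
        h = math.tanh(w_in * embed[tok] + w_rec * h)
        states.append(h)
    return states

sentence = "the man who hunts ducks out on the weekends".split()
# toy scalar "embeddings", one per distinct word
embed = {tok: (i % 5) / 5.0 for i, tok in enumerate(sorted(set(sentence)))}

fwd = rnn_states(sentence, embed)              # left-to-right pass: sees only the past
bwd = rnn_states(sentence[::-1], embed)[::-1]  # right-to-left pass, re-aligned to positions
# bidirectional representation of each word: both directions taken together
bi = [(f, b) for f, b in zip(fwd, bwd)]
i = sentence.index("ducks")
```

    Note that `fwd[i]` depends only on the prefix up to “ducks,” while `bi[i]` also carries information flowing back from “out on the weekends.”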

    Capturing More from a Sequence: Attention

    • One problem is that the S2S model encodes the input into one single vector and uses it to generate the output. Although this might work with very short sentences, for long sentences such models fail to capture the information in the entire input.
    • Another problem with long inputs is that the gradients vanish when back-propagating through time, making training difficult.

    Attention in Deep Neural Networks

    Recall that in a typical S2S model, each time step produces a hidden state representation, denoted as ϕ , specific to that time step in the encoder. To incorporate attention, we consider not only the final hidden state of the encoder, but also the hidden states for each of the intermediate steps.

    These encoder hidden states are, somewhat uninformatively, called values (or in some situations, keys). Attention also depends on the previous hidden state of the decoder, called the query.

    Attention is represented by a vector with the same dimension as the number of values it is attending to. This is called the attention vector, or attention weights, or sometimes alignment.

    The attention weights are combined with the encoder states (“values”) to generate a context vector that’s sometimes also known as a glimpse.
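    The mechanics described above can be sketched in plain Python (toy dot-product scoring with hypothetical numbers; the book's actual implementation uses PyTorch tensors): the query is scored against each value, the scores are pushed through a softmax to give the attention weights, and the context vector is the resulting weighted sum of the values:

```python
import math

def softmax(scores):
    # numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, values):
    # score each encoder hidden state ("value") against the decoder query
    scores = [sum(q * v for q, v in zip(query, value)) for value in values]
    # attention vector / attention weights: one weight per value, summing to 1
    weights = softmax(scores)
    # context vector ("glimpse"): weighted sum of the encoder states
    dim = len(values[0])
    context = [sum(w * value[d] for w, value in zip(weights, values))
               for d in range(dim)]
    return weights, context

# toy example: three encoder states of dimension 2
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = attend(query, values)
```

    The attention vector here has one entry per value (three weights for three encoder states), matching the description above that its dimension equals the number of values attended to.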

    [Figure: computational graph of the attention mechanism]

    soft attention : The attention weights are typically floating-point values between 0 and 1.

    hard attention : It is possible to learn a binary 0/1 vector for attention. This is called hard attention.
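    A hard-attention vector can be sketched as a one-hot selection of the best-scoring value (an illustrative argmax sketch, not the book's formulation — since argmax is not differentiable, hard attention is in practice usually trained with sampling-based methods rather than plain backpropagation):

```python
def hard_attention(scores):
    # hard attention: a binary 0/1 vector selecting the single best-scoring value
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1 if i == best else 0 for i in range(len(scores))]

one_hot = hard_attention([0.2, 1.5, 0.7])  # -> [0, 1, 0]
```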

    global attention : An attention mechanism that depends on the encoder states for all the time steps in the input.

    local attention : You could devise an attention mechanism that depended only on a window of the input around the current time step.
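    One simple way to sketch local attention (an illustrative masking scheme with hypothetical numbers, assuming the window is centered on the current decoder position) is to mask out scores that fall outside the window before the softmax:

```python
import math

def local_attention_weights(scores, t, window=1):
    # keep only scores within `window` positions of t; mask the rest to -inf
    masked = [s if abs(i - t) <= window else float("-inf")
              for i, s in enumerate(scores)]
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]  # exp(-inf) == 0.0, so masked
    total = sum(exps)                         # positions get zero weight
    return [e / total for e in exps]

# five encoder positions, attending locally around decoder step t=1
weights = local_attention_weights([0.5, 1.0, 0.2, 0.8, 0.1], t=1, window=1)
```

    Positions outside the window receive exactly zero weight, so the context vector is built only from the nearby encoder states.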

  • Original article: https://www.cnblogs.com/curtisxiao/p/10711545.html