Beam Search
Greedy search
Generate (or “decode”) the target sentence by taking the argmax at each step of the decoder (a minimal sketch follows the list below).
Problem with greedy search:
- Greedy decoding has no way to undo decisions!
- Input: il a m’entarté (he hit me with a pie)
- → he ____
- → he hit ____
- → he hit a ____ (whoops! no going back now…)
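As a concrete illustration, here is a minimal greedy-decoding sketch in Python. The function `decoder_step(prefix, x)` is a hypothetical stand-in for the decoder, assumed to return a dict mapping each vocabulary token to \(P(y_t | y_1, \ldots, y_{t-1}, x)\); it is not part of any particular library.

```python
def greedy_decode(decoder_step, x, end_token="<END>", max_len=100):
    """Greedy decoding: take the argmax token at every step.

    `decoder_step(prefix, x)` is assumed (hypothetically) to return a
    dict mapping each vocabulary token to P(token | prefix, x).
    """
    prefix = []
    for _ in range(max_len):
        probs = decoder_step(prefix, x)    # P(y_t | y_<t, x)
        best = max(probs, key=probs.get)   # argmax -- no way to undo this choice
        if best == end_token:
            break
        prefix.append(best)
    return prefix
```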
Exhaustive search decoding
Ideally, we want to find a (length T) translation y that maximizes:
\[
\begin{aligned}
P(y | x) &= P(y_{1} | x)\, P(y_{2} | y_{1}, x)\, P(y_{3} | y_{1}, y_{2}, x) \ldots P(y_{T} | y_{1}, \ldots, y_{T-1}, x) \\
&= \prod_{t=1}^{T} P(y_{t} | y_{1}, \ldots, y_{t-1}, x)
\end{aligned}
\]
We could try computing all possible sequences y:
- This means that on each step t of the decoder, we’re tracking \(V^t\) possible partial translations, where \(V\) is vocab size
- This \(O(V^T)\) complexity is far too expensive! (With \(V \approx 50{,}000\) and \(T = 10\), that is already on the order of \(10^{47}\) sequences.)
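To make that cost concrete, here is a purely illustrative sketch of exhaustive decoding, using the same hypothetical `decoder_step(prefix, x)` interface as above; it enumerates all \(V^T\) sequences and is intractable for any realistic vocabulary.

```python
import math

def exhaustive_decode(decoder_step, x, vocab, T):
    """Score every length-T sequence and return the highest-scoring one.

    Enumerates all V**T sequences -- intractable in practice; shown only
    to make the O(V^T) cost concrete.
    """
    best, best_score = None, -math.inf

    def expand(prefix, score):
        nonlocal best, best_score
        if len(prefix) == T:
            if score > best_score:
                best, best_score = prefix, score
            return
        probs = decoder_step(prefix, x)  # P(y_t | prefix, x)
        for token in vocab:
            expand(prefix + [token], score + math.log(probs[token]))

    expand([], 0.0)
    return best
```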
Beam search
- Core idea: On each step of the decoder, keep track of the k most probable partial translations (which we call hypotheses), where \(k\) is the beam size (in practice around 5 to 10). A sketch of one such step follows this list.
- A hypothesis \(y_1, \ldots, y_t\) has a score, which is its log probability:
\[\operatorname{score}(y_{1}, \ldots, y_{t}) = \log P_{\mathrm{LM}}(y_{1}, \ldots, y_{t} | x) = \sum_{i=1}^{t} \log P_{\mathrm{LM}}(y_{i} | y_{1}, \ldots, y_{i-1}, x)\]
- Scores are all negative, and higher score is better
- We search for high-scoring hypotheses, tracking the top \(k\) on each step
- Beam search is not guaranteed to find the optimal solution
- But it is much more efficient than exhaustive search!
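A minimal sketch of one beam-search step, again assuming the hypothetical `decoder_step` interface from the greedy example; hypotheses are (prefix, score) pairs scored by summed log-probability.

```python
import math

def beam_step(decoder_step, x, hypotheses, k):
    """Expand every hypothesis by one token and keep the k best.

    `hypotheses` is a list of (prefix, score) pairs, where score is the
    running sum of log P_LM(y_i | y_1, ..., y_{i-1}, x).
    """
    candidates = []
    for prefix, score in hypotheses:
        probs = decoder_step(prefix, x)  # P(y_t | prefix, x)
        for token, p in probs.items():
            candidates.append((prefix + [token], score + math.log(p)))
    # Keep only the k highest-scoring partial translations (the beam).
    candidates.sort(key=lambda h: h[1], reverse=True)
    return candidates[:k]
```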
Beam search decoding: stopping criterion
- In greedy decoding, usually we decode until the model produces an <END> token
- In beam search decoding, different hypotheses may produce <END> tokens on different timesteps
- When a hypothesis produces <END>, that hypothesis is complete
  - Place it aside and continue exploring other hypotheses via beam search
- Usually we continue beam search until (both cutoffs appear in the loop sketch after this list):
  - We reach timestep T (where T is some pre-defined cutoff), or
  - We have at least n completed hypotheses (where n is a pre-defined cutoff)
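Putting the pieces together, here is a sketch of the full loop with both stopping criteria, built on the hypothetical `beam_step` and `decoder_step` sketches above.

```python
def beam_search(decoder_step, x, end_token="<END>", k=5, max_T=50, n_completed=5):
    """Run beam search until timestep max_T or until n_completed hypotheses finish."""
    hypotheses = [([], 0.0)]  # start from the empty prefix with score 0
    completed = []
    for _ in range(max_T):
        hypotheses = beam_step(decoder_step, x, hypotheses, k)
        still_open = []
        for prefix, score in hypotheses:
            if prefix[-1] == end_token:
                completed.append((prefix, score))  # set the finished hypothesis aside
            else:
                still_open.append((prefix, score))
        hypotheses = still_open
        if len(completed) >= n_completed or not hypotheses:
            break
    return completed
```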
Beam search decoding: finishing up
- We have our list of completed hypotheses.
- How do we select the top one, with the highest score?
- Each hypothesis \(y_1, \ldots, y_t\) on our list has a score:
\[\operatorname{score}(y_{1}, \ldots, y_{t}) = \log P_{\mathrm{LM}}(y_{1}, \ldots, y_{t} | x) = \sum_{i=1}^{t} \log P_{\mathrm{LM}}(y_{i} | y_{1}, \ldots, y_{i-1}, x)\]
- Problem with this: longer hypotheses have lower scores, since each added term is a negative log-probability
- Fix: Normalize by length. Use this to select the top one instead:
\[\frac{1}{t} \sum_{i=1}^{t} \log P_{\mathrm{LM}}(y_{i} | y_{1}, \ldots, y_{i-1}, x)\]
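Finally, selecting the output under the length-normalized score is a one-liner over the `completed` list returned by the loop sketch above.

```python
def select_best(completed):
    """Pick the hypothesis with the highest length-normalized log-probability."""
    return max(completed, key=lambda h: h[1] / len(h[0]))
```

Because every score is negative, dividing by the hypothesis length t removes the built-in bias toward shorter outputs.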