  • TensorFlow notes 3: the CRF function tf.contrib.crf.crf_log_likelihood()

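    While reading through some training code I came across tf.contrib.crf.crf_log_likelihood, so here are some brief notes on it.

    Purpose: compute the loss for a CRF layer. The function returns the log-likelihood of the gold tag sequences; training maximizes it (maximum likelihood estimation), which is the same as minimizing its negative.

    Usage: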

    tf.contrib.crf.crf_log_likelihood(inputs, tag_indices, sequence_lengths, transition_params=None)
    See the guide: CRF (contrib)
    
    Computes the log-likelihood of tag sequences in a CRF.
    
    Args:
    inputs: A [batch_size, max_seq_len, num_tags] tensor of unary potentials to use as input to the CRF layer.
    tag_indices: A [batch_size, max_seq_len] matrix of tag indices for which we compute the log-likelihood.
    sequence_lengths: A [batch_size] vector of true sequence lengths.
    transition_params: A [num_tags, num_tags] transition matrix, if available.
    
    Returns:
    log_likelihood: A [batch_size] tensor containing the log-likelihood of each example, given its sequence of tag indices (some versions of the documentation describe this as a scalar).
    transition_params: A [num_tags, num_tags] transition matrix. This is either provided by the caller or created in this function.

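    Function walkthrough:

    1. tf.contrib.crf.crf_log_likelihood

    crf_log_likelihood(inputs,tag_indices,sequence_lengths,transition_params=None)

    Computes the log-likelihood of tag sequences in a conditional random field.

    Arguments:

    inputs: a tensor of shape [batch_size, max_seq_len, num_tags]. Typically the BiLSTM output, projected and reshaped to this shape, serves as the input to the CRF layer.
    tag_indices: a matrix of shape [batch_size, max_seq_len]; these are the gold labels.
    sequence_lengths: a vector of shape [batch_size] giving the true length of each sequence.
    transition_params: a transition matrix of shape [num_tags, num_tags].

    Returns:

    log_likelihood: the log-likelihood of each sequence in the batch, a tensor of shape [batch_size].
    transition_params: a transition matrix of shape [num_tags, num_tags].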
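
    To make concrete what the log-likelihood measures, here is a minimal NumPy sketch for a single sequence. It assumes the standard linear-chain CRF formulation: the score of the gold path minus the log of the summed exponentiated scores of all possible paths, the latter computed with the forward algorithm. The names crf_log_likelihood_np and logsumexp are made up for illustration, and the sketch assumes transitions[i, j] scores a move from tag i to tag j. It is not the TensorFlow implementation.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def crf_log_likelihood_np(unary, tags, transitions):
    """Log-likelihood of one tag sequence under a linear-chain CRF (illustrative).

    unary:       [seq_len, num_tags] unary scores, already cut to the true length
    tags:        [seq_len] gold tag indices
    transitions: [num_tags, num_tags]; assumed layout: [previous tag, current tag]
    """
    seq_len = unary.shape[0]
    # Score of the gold path: unary terms plus transition terms.
    path_score = unary[np.arange(seq_len), tags].sum()
    path_score += transitions[tags[:-1], tags[1:]].sum()
    # Forward algorithm: alpha[j] is the log-sum of the scores of all
    # partial paths that end in tag j at the current position.
    alpha = unary[0]
    for t in range(1, seq_len):
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + unary[t]
    log_norm = logsumexp(alpha, axis=0)
    return path_score - log_norm
```

    Because the result is a normalized log-probability, exponentiating it and summing over every possible tag sequence should give 1, which is a handy sanity check on small inputs.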

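    2. tf.contrib.crf.viterbi_decode

    viterbi_decode(score,transition_params)
    Put simply, it returns the best-scoring tag sequence. This function should only be used at test time; it decodes outside TensorFlow, on NumPy arrays, one sequence at a time.

    Arguments:

    score: a matrix of unary potentials of shape [seq_len, num_tags].
    transition_params: a transition matrix of shape [num_tags, num_tags].

    Returns:

    viterbi: a list of length seq_len holding the indices of the highest-scoring tags.
    viterbi_score: a float containing the score for the Viterbi sequence.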
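
    The decoding step can be sketched in plain NumPy as follows. This is an illustrative version of the textbook Viterbi algorithm, not a copy of the library code; the name viterbi_decode_np is made up here, and transitions[i, j] is assumed to score a move from tag i to tag j.

```python
import numpy as np

def viterbi_decode_np(score, transitions):
    """Return the highest-scoring tag sequence and its score (illustrative).

    score:       [seq_len, num_tags] unary scores
    transitions: [num_tags, num_tags]; assumed layout: [previous tag, current tag]
    """
    seq_len, num_tags = score.shape
    trellis = np.zeros((seq_len, num_tags), dtype=np.float64)
    backpointers = np.zeros((seq_len, num_tags), dtype=np.int32)
    trellis[0] = score[0]
    for t in range(1, seq_len):
        # candidates[i, j]: best score ending in tag i at t-1, then moving to tag j.
        candidates = trellis[t - 1][:, None] + transitions
        trellis[t] = score[t] + candidates.max(axis=0)
        backpointers[t] = candidates.argmax(axis=0)
    # Follow the backpointers from the best final tag to recover the path.
    best = [int(trellis[-1].argmax())]
    for bp in backpointers[:0:-1]:
        best.append(int(bp[best[-1]]))
    best.reverse()
    return best, float(trellis[-1].max())
```

    The dynamic program costs O(seq_len * num_tags^2), compared with num_tags^seq_len for scoring every path by brute force.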

    3. tf.contrib.crf.crf_decode

    crf_decode(potentials,transition_params,sequence_length)
    Decodes inside the TensorFlow graph.

    Arguments:

    potentials: a tensor of shape [batch_size, max_seq_len, num_tags].
    transition_params: a transition matrix of shape [num_tags, num_tags].
    sequence_length: a vector of shape [batch_size] giving the length of each sequence in the batch.

    Returns:

    decode_tags: a tensor of shape [batch_size, max_seq_len] and type tf.int32, holding the best tag sequence for each example.
    best_score: a tensor of shape [batch_size] containing the score of each decoded tag sequence.

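    Reposted from Zhihu:

    If what you need to predict is a sequence, you can use crf_log_likelihood as the loss function.

    crf_log_likelihood(
    inputs,
    tag_indices,
    sequence_lengths,
    transition_params=None
    )

    Inputs:

    inputs: the unary potentials, i.e. the score predicted for each tag at each position (unnormalized scores rather than probabilities). How they are computed depends on the task; a CNN, an RNN, and so on all work.

    tag_indices: the gold tag sequences.

    sequence_lengths: the true length of each example. Sequences are padded to a common length for alignment, and the true lengths go in this argument.

    transition_params: the transition scores. Optional; if omitted, the function creates the matrix itself as a trainable variable.

    Outputs:

    log_likelihood: the log-likelihood of the gold tag sequences.

    transition_params: the transition scores; if none were passed in, the function returns the matrix it created.

    Author: Zhihu user
    Link: https://www.zhihu.com/question/57666556/answer/326803900
    Source: Zhihu
    Copyright belongs to the author. For commercial reposting, contact the author for permission; for non-commercial reposting, credit the source.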
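
    The padding mentioned under sequence_lengths can be sketched like this: variable-length tag sequences are padded to a common length, and the true lengths are recorded separately so they can be passed along. The helper name pad_tag_sequences and the choice of 0 as the padding value are assumptions made for illustration.

```python
import numpy as np

def pad_tag_sequences(sequences, pad_value=0):
    """Pad lists of tag indices to a common length (illustrative helper).

    Returns a [batch_size, max_seq_len] int32 array and a [batch_size]
    int32 vector holding the true length of each sequence.
    """
    lengths = np.array([len(s) for s in sequences], dtype=np.int32)
    max_len = int(lengths.max())
    padded = np.full((len(sequences), max_len), pad_value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        padded[i, :len(seq)] = seq
    return padded, lengths

padded, lengths = pad_tag_sequences([[1, 2, 3], [4], [2, 2]])
print(padded)   # padded tag matrix, one row per sequence
print(lengths)  # true lengths, to be passed as sequence_lengths
```

    The padding value itself does not matter for the loss as long as the true lengths are supplied, since positions beyond each length are ignored.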

    The official example code, showing how to train and decode with the CRF functions:

    #!/usr/bin/env python
    # coding=utf8
    import numpy as np
    import tensorflow as tf
    
    
    # Data settings
    num_examples = 10
    num_words = 20
    num_features = 100
    num_tags = 5  # 5 tags

    # Random features; x has shape [10, 20, 100].
    x = np.random.rand(num_examples,num_words,num_features).astype(np.float32)

    # Random tag indices representing the gold sequence; y has shape [10, 20].
    y = np.random.randint(num_tags,size = [num_examples,num_words]).astype(np.int32)

    # True sequence lengths: every one of the 10 sequences has length 19,
    # so the last position of each row counts as padding.
    sequence_lengths = np.full(num_examples,num_words - 1,dtype=np.int32)
    
    
    # Train and evaluate the model.
    with tf.Graph().as_default():
        with tf.Session() as session:
             # Add the data to the TensorFlow graph.
             x_t = tf.constant(x)  # observation sequences
             y_t = tf.constant(y)  # tag sequences
             sequence_lengths_t = tf.constant(sequence_lengths)

             # Compute unary scores from a linear layer.
             # weights shape = [100, 5]
             weights = tf.get_variable("weights", [num_features, num_tags])

             # matricized_x_t shape = [200, 100]
             matricized_x_t = tf.reshape(x_t, [-1, num_features])

             # [200, 100] x [100, 5] gives [200, 5]
             matricized_unary_scores = tf.matmul(matricized_x_t, weights)

             # unary_scores shape = [10, 20, 5]
             unary_scores = tf.reshape(matricized_unary_scores, [num_examples, num_words, num_tags])
             # Compute the log-likelihood of the gold sequences and keep the transition
             # params for inference at test time.
             # Argument shapes: unary_scores [10, 20, 5], y_t [10, 20], sequence_lengths_t [10]
             log_likelihood,transition_params = tf.contrib.crf.crf_log_likelihood(unary_scores,y_t,sequence_lengths_t)

             # Decode the best tag sequence inside the graph.
             viterbi_sequence, viterbi_score = tf.contrib.crf.crf_decode(unary_scores, transition_params, sequence_lengths_t)
             # Add a training op to tune the parameters.
             loss = tf.reduce_mean(-log_likelihood)

             # Gradient descent optimizer with learning rate 0.01.
             train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

             # Train for a fixed number of iterations.
             session.run(tf.global_variables_initializer())
         
             # Illustration of the broadcasted comparison used to build the mask:
             #   In [61]: m_20
             #   Out[61]: array([[ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])
             #
             #   In [62]: n_20
             #   Out[62]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
             #
             #   In [59]: n_20<m_20
             #   Out[59]: array([[ True,  True,  True,  True,  True,  True,  True,  True,  True, True]], dtype=bool)

             # Use a mask to filter out the padded positions beyond each sequence's true length.
             mask = (np.expand_dims(np.arange(num_words), axis=0) < np.expand_dims(sequence_lengths, axis=1))

             # mask has shape [10, 20]; each row is True for the first 19 positions
             # and False for the last (padded) one.
             # Total number of real (unpadded) labels.
             total_labels = np.sum(sequence_lengths)

             print ("mask:",mask)

             print ("total_labels:",total_labels)
             for i in range(1000):
                 #tf_unary_scores,tf_transition_params,_ = session.run([unary_scores,transition_params,train_op])
                 tf_viterbi_sequence,_=session.run([viterbi_sequence,train_op])
                 if i%100 == 0:
                    # Multiplying booleans acts as a logical AND:
                    # False*False = False, False*True = False, True*True = True.
                    # Number of correctly predicted labels within the true sequence lengths.
                    correct_labels = np.sum((y==tf_viterbi_sequence) *  mask)
                    accuracy = 100.0*correct_labels/float(total_labels)
                    print ("Accuracy: %.2f%%" %accuracy)
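
    The masked accuracy computed in the training loop above is easier to follow on a tiny hand-made batch. The sketch below uses made-up gold and predicted tags (two sequences padded to length 4, with true lengths 4 and 2); only NumPy is needed, and the numbers are purely illustrative.

```python
import numpy as np

num_words = 4
sequence_lengths = np.array([4, 2])

# Made-up gold and predicted tags; positions beyond the true length are padding.
gold = np.array([[1, 0, 2, 1],
                 [0, 1, 0, 0]])
pred = np.array([[1, 0, 0, 1],
                 [0, 2, 0, 0]])

# Broadcasting [1, num_words] against [batch, 1] marks the real positions.
mask = np.arange(num_words)[None, :] < sequence_lengths[:, None]

# Count matches only where the mask is True, so padding cannot inflate the score.
correct = np.sum((gold == pred) & mask)
total = np.sum(sequence_lengths)
accuracy = 100.0 * correct / total
print("mask:", mask)
print("Accuracy: %.2f%%" % accuracy)
```

    In the second row the two padded positions happen to match, but the mask keeps them out of the count: 4 of the 6 real labels are correct.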

  • Original article: https://www.cnblogs.com/lovychen/p/8490397.html