  • Notes on bugs hit while using tensorflow-ranking

    tensorflow-ranking bugs

    1 Assigning to a global variable inside a metric function

    Error

    TypeError: An op outside of the function building code is being passed
    a "Graph" tensor. It is possible to have Graph tensors
    leak out of the function building context by including a
    tf.init_scope in your function building code.
    For example, the following function will fail:
      @tf.function
      def has_init_scope():
        my_constant = tf.constant(1.)
        with tf.init_scope():
          added = my_constant * 2
    The graph tensor has name: add:0
    

    Code that triggers the error

    top_one_time = 0
    
    def top_one_accuracy(y_true, y_pred):
        max_idx_gt = tf.argsort(y_true)[:, -1]
        max_idx_pred = tf.argsort(y_pred)[:, -1]
    
        judge = tf.equal(max_idx_gt, max_idx_pred)
        num_true = tf.reduce_sum(tf.cast(judge, tf.int32))
    
        global top_one_time
        top_one_time += num_true
    
        return top_one_time
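The mechanism behind this error can be reproduced in isolation. This is a minimal sketch that assumes TensorFlow 2.x; double and python_calls are hypothetical names. Keras runs metric functions inside a tf.function, and plain Python statements in such a function run only while the function is being traced, not once per batch. So top_one_time += num_true most likely runs once and leaves a symbolic graph tensor in the Python global. That tensor then leaks into code outside its graph, which matches what the error message describes.

```python
import tensorflow as tf

# Python-side record of how often the function body actually executes as Python.
python_calls = []

@tf.function
def double(x):
    # Plain Python side effect: it happens only while TensorFlow traces the function,
    # just like "top_one_time += num_true" in the metric above.
    python_calls.append(1)
    return x * 2.0

double(tf.constant(1.0))
double(tf.constant(2.0))  # same dtype and shape, so the traced graph is reused
print(len(python_calls))  # prints 1
```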
    

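    Scenario

    Assigning to a global variable inside a metric function

    Troubleshooting steps

    1. Isolated the failing statement by changing one thing at a time (controlled-variable method).

    2. Initially suspected a bug in the TensorFlow framework. Googling suggested the cause might be that some variable was initialized outside init_scope and then used inside it.

    3. One suggested fix was to disable TF's eager mode by adding the following statement: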

      tf.compat.v1.disable_eager_execution()
      

    After trying it, a new error appeared.

    Error:

    tensorflow.python.framework.errors_impl.FailedPreconditionError: 3 root error(s) found.
      (0) Failed precondition: Error while reading resource variable metrics/gt_mean_reciprocal_rank/mean_reciprocal_rank/mean/total from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/metrics/gt_mean_reciprocal_rank/mean_reciprocal_rank/mean/total/N10tensorflow3VarE does not exist.
    	 [[{{node metrics/gt_mean_reciprocal_rank/mean_reciprocal_rank/mean/value/ReadVariableOp}}]]
    	 [[gt/Squeeze/_283]]
      (1) Failed precondition: Error while reading resource variable metrics/gt_mean_reciprocal_rank/mean_reciprocal_rank/mean/total from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/metrics/gt_mean_reciprocal_rank/mean_reciprocal_rank/mean/total/N10tensorflow3VarE does not exist.
    	 [[{{node metrics/gt_mean_reciprocal_rank/mean_reciprocal_rank/mean/value/ReadVariableOp}}]]
    	 [[loss/gt_loss/pairwise_logistic_loss/weighted_loss/num_present/broadcast_weights/assert_broadcastable/is_valid_shape/else/_291/has_valid_nonscalar_shape/then/_1005/has_invalid_dims/ExpandDims_1/_371]]
      (2) Failed precondition: Error while reading resource variable metrics/gt_mean_reciprocal_rank/mean_reciprocal_rank/mean/total from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/metrics/gt_mean_reciprocal_rank/mean_reciprocal_rank/mean/total/N10tensorflow3VarE does not exist.
    	 [[{{node metrics/gt_mean_reciprocal_rank/mean_reciprocal_rank/mean/value/ReadVariableOp}}]]
    0 successful operations.
    0 derived errors ignored.
    

    A fix found by searching:

    import tensorflow as tf
    from tensorflow.python.keras.backend import set_session
    from tensorflow.python.keras.models import load_model
    
    tf_config = some_custom_config  # placeholder: your own tf.compat.v1.ConfigProto
    sess = tf.compat.v1.Session(config=tf_config)
    graph = tf.compat.v1.get_default_graph()
    
    # IMPORTANT: models have to be loaded AFTER SETTING THE SESSION for keras! 
    # Otherwise, their weights will be unavailable in the threads after the session there has been set
    set_session(sess)
    model = load_model(...)
    
    # and then in each request (i.e. in each thread):
    def handle_request():  # hypothetical per-request handler
        global sess
        global graph
        with graph.as_default():
            set_session(sess)
            return model.predict(...)
    

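    This turned out to have no effect.

    1. So I went back to the original version to find a starting point, reasoned about likely causes, and changed the code to: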
    def top_one_accuracy(y_true, y_pred):
        max_idx_gt = tf.argsort(y_true)[:, -1]
        max_idx_pred = tf.argsort(y_pred)[:, -1]
    
        judge = tf.equal(max_idx_gt, max_idx_pred)
        num_true = tf.reduce_sum(tf.cast(judge, tf.int32))
    
        return num_true
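This version avoids the error. One thing to be aware of: Keras wraps a plain metric function in a running mean. As a result, returning num_true reports the average number of hits per batch, and that number depends on the batch size. A small variant, sketched under the assumption of TensorFlow 2.x, returns one 0/1 value per list so that the reported value is top-1 accuracy. It uses tf.argmax instead of tf.argsort(...)[:, -1]. The two should differ only when scores are tied.

```python
import tensorflow as tf

def top_one_accuracy(y_true, y_pred):
    # Index of the highest label and of the highest predicted score in each list.
    max_idx_gt = tf.argmax(y_true, axis=-1)
    max_idx_pred = tf.argmax(y_pred, axis=-1)
    # One 0/1 value per list; Keras averages these over all lists seen in the epoch.
    return tf.cast(tf.equal(max_idx_gt, max_idx_pred), tf.float32)
```

It is used the same way as before, e.g. model.compile(..., metrics=[top_one_accuracy]).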
    

    Error resolved

    1. Further exploration: I changed the code to
    top_one_time = 0
    
    def top_one_accuracy(y_true, y_pred):
        max_idx_gt = tf.argsort(y_true)[:, -1]
        max_idx_pred = tf.argsort(y_pred)[:, -1]
    
        judge = tf.equal(max_idx_gt, max_idx_pred)
        num_true = tf.reduce_sum(tf.cast(judge, tf.int32))
    
        global top_one_time
        top_one_time += num_true
    
        return num_true
    

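    This still fails with the same error, so the assignment to the global itself, not the returned value, is what triggers it.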
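If top_one_time was meant to hold a running count across batches, the Keras-native way to keep that count is a stateful metric. Its state lives in variables that are created once with add_weight and updated with assign_add. This is a sketch assuming TensorFlow 2.x; TopOneAccuracy, total_hits and total_lists are hypothetical names, and sample_weight is ignored for brevity.

```python
import tensorflow as tf


class TopOneAccuracy(tf.keras.metrics.Metric):
    """Fraction of lists whose top-scored item is also the top-labelled item."""

    def __init__(self, name="top_one_accuracy", **kwargs):
        super().__init__(name=name, **kwargs)
        # State is created once here, instead of living in a Python global.
        self.total_hits = self.add_weight(name="total_hits", initializer="zeros", dtype="float32")
        self.total_lists = self.add_weight(name="total_lists", initializer="zeros", dtype="float32")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # sample_weight is ignored in this sketch.
        hit = tf.equal(tf.argmax(y_true, axis=-1), tf.argmax(y_pred, axis=-1))
        hit = tf.cast(hit, tf.float32)
        self.total_hits.assign_add(tf.reduce_sum(hit))
        self.total_lists.assign_add(tf.cast(tf.size(hit), tf.float32))

    def result(self):
        # Accumulated hits divided by accumulated lists (0 before any update).
        return tf.math.divide_no_nan(self.total_hits, self.total_lists)
```

Pass an instance to compile, e.g. model.compile(..., metrics=[TopOneAccuracy()]). Keras resets the metric's variables at the start of each epoch.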

    2 Using functions from tensorflow-ranking.metrics directly as metric functions

    Error

    ValueError: tf.function-decorated function tried to create variables on non-first call.
    

    Code that triggers the error

    model.compile(metrics=[tfr.metrics.normalized_discounted_cumulative_gain, tfr.metrics.mean_reciprocal_rank])
    

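    Scenario

    Using functions from tensorflow-ranking.metrics directly as metrics

    Troubleshooting steps

    1. Isolated the failing statement by changing one thing at a time (controlled-variable method).

    2. Initially suspected a bug in the TensorFlow framework. Googling suggested the cause might be incorrect use of the @tf.function decorator, but I had not used it anywhere. (Keras wraps its training and evaluation steps in tf.function unless run_eagerly=True, so metric functions are traced even without an explicit decorator.)

    3. So I started reading the tf-ranking source code.

    Source: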

    def normalized_discounted_cumulative_gain(
        labels,
        predictions,
        weights=None,
        topn=None,
        name=None,
        gain_fn=_DEFAULT_GAIN_FN,
        rank_discount_fn=_DEFAULT_RANK_DISCOUNT_FN):
      """Computes normalized discounted cumulative gain (NDCG).
    
      Args:
        labels: A `Tensor` of the same shape as `predictions`.
        predictions: A `Tensor` with shape [batch_size, list_size]. Each value is
          the ranking score of the corresponding example.
        weights: A `Tensor` of the same shape of predictions or [batch_size, 1]. The
          former case is per-example and the latter case is per-list.
        topn: A cutoff for how many examples to consider for this metric.
        name: A string used as the name for this metric.
        gain_fn: (function) Transforms labels. Note that this implementation of
          NDCG assumes that this function is *increasing* as a function of its
          input.
        rank_discount_fn: (function) The rank discount function. Note that this
          implementation of NDCG assumes that this function is *decreasing* as a
          function of its input.
    
      Returns:
        A metric for the weighted normalized discounted cumulative gain of the
        batch.
      """
      metric = metrics_impl.NDCGMetric(name, topn, gain_fn, rank_discount_fn)
      with tf.compat.v1.name_scope(metric.name,
                                   'normalized_discounted_cumulative_gain',
                                   (labels, predictions, weights)):
        per_list_ndcg, per_list_weights = metric.compute(labels, predictions,
                                                         weights)
      return tf.compat.v1.metrics.mean(per_list_ndcg, per_list_weights)
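For reference, this is the per-list quantity that metric.compute produces. The formula assumes the commonly used defaults of gain 2^l - 1 and rank discount 1/log2(1 + r); check _DEFAULT_GAIN_FN and _DEFAULT_RANK_DISCOUNT_FN in your tf-ranking version. Here l_{π(r)} is the label of the item the model places at rank r, and IDCG@k is DCG@k computed on the labels sorted in descending order. tf.compat.v1.metrics.mean then averages the per-list values, weighted by per_list_weights.

```latex
\mathrm{DCG@}k = \sum_{r=1}^{k} \frac{2^{\,l_{\pi(r)}} - 1}{\log_2(1 + r)},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
```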
    

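    I noticed that every call to this function constructs a new metrics_impl.NDCGMetric object. My guess was that this makes some code run on a call other than the first (initializing) one, which causes the error (suspected cause).

    1. So I wrote my own replacement: create the metrics_impl.NDCGMetric object once up front, then call its compute method every time the metric function is invoked.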
    ndcg_topn = tfr.metrics.metrics_impl.NDCGMetric('ndcg_topn', app.transform_param_config.n)
    
    def metric_ndcg_topn(y_true, y_pred):
        return ndcg_topn.compute(y_true, y_pred, None)
    

    Calling code:

    model.compile(metrics=[metric_ndcg_topn])
    

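    Error resolved

    1. Further exploration. The code below still fails, so I concluded that tf.compat.v1.metrics.mean is the problem: it creates new total and count variables every time it is called.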
    ndcg_topn = tfr.metrics.metrics_impl.NDCGMetric('ndcg_topn', app.transform_param_config.n)
    
    def metric_ndcg_topn(y_true, y_pred):
        per_list_ndcg, per_list_weights = ndcg_topn.compute(y_true, y_pred, None)
        return tf.compat.v1.metrics.mean(per_list_ndcg, per_list_weights)
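A small graph-mode sketch, assuming TensorFlow 2.x and run outside Keras, shows why tf.compat.v1.metrics.mean clashes with Keras. Every call creates a fresh set of accumulator variables, and the .../mean/total name in the earlier FailedPreconditionError log points the same way. Inside the tf.function that Keras builds, any call after the first trace would therefore try to create variables again. That is consistent with the ValueError above.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    values = tf.constant([1.0, 2.0, 3.0])
    # Each call builds its own accumulator variables plus an update op.
    mean_1, update_1 = tf.compat.v1.metrics.mean(values)
    mean_2, update_2 = tf.compat.v1.metrics.mean(values)
    # The accumulators are registered as local variables of the graph.
    local_vars = tf.compat.v1.local_variables()
    print([v.name for v in local_vars])
```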
    
    1. Further exploration. The code below runs without an error, but its output is wrong:
    ndcg_topn = tfr.metrics.metrics_impl.NDCGMetric('ndcg_topn', app.transform_param_config.n)
    mean = tf.keras.metrics.Mean()
    
    def metric_ndcg_topn(y_true, y_pred):
        per_list_ndcg, per_list_weights = ndcg_topn.compute(y_true, y_pred, None)
        return mean(per_list_ndcg, per_list_weights)
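The wrong output in this last variant likely comes from the module-level mean object. Keras neither tracks it nor resets it between epochs, and Keras also wraps metric_ndcg_topn in its own running mean, so the reported value ends up as an average of running averages. One alternative, sketched here under the assumption of TensorFlow 2.x, makes the accumulator part of a metric object that Keras does track, by subclassing tf.keras.metrics.Mean.

To stay self-contained, the sketch computes NDCG@topn in plain TensorFlow with gain 2^label - 1 and discount 1/log2(rank + 1), rather than calling tf-ranking internals. per_list_ndcg and NDCGMean are hypothetical names, and padded or negative labels are not handled. Newer tensorflow-ranking releases also ship Keras metric classes such as tfr.keras.metrics.NDCGMetric, which may be worth checking first.

```python
import tensorflow as tf


def per_list_ndcg(labels, predictions, topn):
    """Hypothetical helper: NDCG@topn for each list in a [batch, list_size] batch."""
    labels = tf.cast(labels, tf.float32)
    k = tf.minimum(topn, tf.shape(labels)[1])
    # Labels reordered by predicted score (model ranking) and by label (ideal ranking).
    order = tf.argsort(predictions, axis=-1, direction="DESCENDING")
    ranked_labels = tf.gather(labels, order, batch_dims=1)[:, :k]
    ideal_labels = tf.sort(labels, axis=-1, direction="DESCENDING")[:, :k]
    # Discount 1 / log2(rank + 1) for ranks 1..k.
    ranks = tf.cast(tf.range(1, k + 1), tf.float32)
    discounts = tf.math.log(2.0) / tf.math.log(ranks + 1.0)
    dcg = tf.reduce_sum((tf.pow(2.0, ranked_labels) - 1.0) * discounts, axis=-1)
    idcg = tf.reduce_sum((tf.pow(2.0, ideal_labels) - 1.0) * discounts, axis=-1)
    return tf.math.divide_no_nan(dcg, idcg)


class NDCGMean(tf.keras.metrics.Mean):
    """Running mean of per-list NDCG@topn; state is owned by a metric Keras resets."""

    def __init__(self, topn, name="ndcg_topn", **kwargs):
        super().__init__(name=name, **kwargs)
        self._topn = topn

    def update_state(self, y_true, y_pred, sample_weight=None):
        values = per_list_ndcg(y_true, y_pred, self._topn)
        return super().update_state(values, sample_weight=sample_weight)
```

Usage would look like model.compile(..., metrics=[NDCGMean(topn=5)]), where topn=5 is an arbitrary example value.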
    
  • Original post: https://www.cnblogs.com/GY8023/p/14110786.html