  • tensorflow nan

    https://github.com/tensorflow/tensorflow/issues/3212

    NaNs usually indicate that something is wrong with your training. Perhaps your learning rate is too high, perhaps you have invalid data, or maybe you have an invalid operation such as a divide by zero. TensorFlow's refusal to write any NaNs is a warning that something has gone wrong with your training.
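The learning-rate case can be reproduced without TensorFlow. The following is a minimal plain-Python sketch, assuming a toy objective f(x) = x**2; the function name gradient_descent and the chosen rates are illustrative and not taken from the issue thread. With a small rate the iterate shrinks toward 0; with a rate of 1.5 each step doubles its magnitude until it overflows to inf, and the next update computes inf - inf, which is NaN.

```python
import math

def gradient_descent(learning_rate, steps, x0=1.0):
    """Minimise f(x) = x**2 with plain gradient descent and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x                  # derivative of x**2
        x = x - learning_rate * grad    # standard update rule
    return x

# Small step size: each update multiplies x by 0.8, so it converges toward 0.
stable = gradient_descent(learning_rate=0.1, steps=2000)

# Step size too large: each update multiplies x by -2, so it overflows to inf,
# and the following update evaluates inf - inf, which is NaN.
diverged = gradient_descent(learning_rate=1.5, steps=2000)

print(stable, diverged, math.isnan(diverged))
```

Once a NaN appears it propagates through every later update, which is why the loss stays NaN for the rest of training.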

    If you still suspect there is an underlying bug, you need to provide us with a reproducible test case (as small as possible), plus information about your environment (please see the issue submission template).

     

    https://stackoverflow.com/questions/33712178/tensorflow-nan-bug?newreg=c7e31a867765444280ba3ca50b657a07

    Actually, it turned out to be something stupid. I'm posting this in case anyone else would run into a similar error.

    cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
    

    is actually a horrible way of computing the cross-entropy. In some samples, certain classes could be excluded with certainty after a while, resulting in y_conv=0 for that sample. That's normally not a problem since you're not interested in those, but the way cross_entropy is written there, it yields 0*log(0) for that particular sample/class. Since log(0) is -inf and 0 times -inf is NaN, the whole sum becomes NaN.

    Replacing it with the following, which clips y_conv away from zero before taking the log, avoids the NaN:

    cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))
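The effect of the clipping can be checked in plain Python. This is only a sketch of the arithmetic, not the answer's TensorFlow code: the helper names ieee_log, naive_cross_entropy and clipped_cross_entropy and the sample vectors are hypothetical. Note that Python's math.log(0.0) raises ValueError, so ieee_log emulates the -inf result that tf.log returns at 0.

```python
import math

def ieee_log(p):
    """Log that returns -inf at 0, as IEEE float math (and tf.log) does."""
    return -math.inf if p == 0.0 else math.log(p)

def naive_cross_entropy(y_true, y_pred):
    # Same formula as -tf.reduce_sum(y_ * tf.log(y_conv)).
    return -sum(t * ieee_log(p) for t, p in zip(y_true, y_pred))

def clipped_cross_entropy(y_true, y_pred, eps=1e-10):
    # Mirrors tf.clip_by_value(y_conv, 1e-10, 1.0) before taking the log.
    return -sum(t * math.log(min(max(p, eps), 1.0))
                for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]   # one-hot label: the true class is index 1
y_pred = [0.0, 0.9, 0.1]   # class 0 has been ruled out with probability exactly 0

naive = naive_cross_entropy(y_true, y_pred)      # contains 0.0 * -inf, so NaN
clipped = clipped_cross_entropy(y_true, y_pred)  # finite, equals -log(0.9)

print(naive, clipped)
```

The clipped value is finite because the smallest probability fed to the log is 1e-10 rather than 0, and terms with a label of 0 then contribute exactly zero.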

    https://stackoverflow.com/questions/33922937/why-does-tensorflow-return-nan-nan-instead-of-probabilities-from-a-csv-file

    Try throwing in a few tf.Print ops.  Instead of this line:

    tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)
    

    Try:

    # tf.Print (TF 1.x) is an identity op that logs its inputs when the graph runs;
    # each result is reassigned so that later ops consume the printing version.
    tf_bias = tf.Print(tf_bias, [tf_bias], "Bias: ")
    tf_weight = tf.Print(tf_weight, [tf_weight], "Weight: ")
    tf_in = tf.Print(tf_in, [tf_in], "TF_in: ")
    matmul_result = tf.matmul(tf_in, tf_weight)
    matmul_result = tf.Print(matmul_result, [matmul_result], "Matmul: ")
    tf_softmax = tf.nn.softmax(matmul_result + tf_bias)
    

    to see what TensorFlow thinks the intermediate values are.  If the NaNs show up earlier in the pipeline, that should give you a better idea of where the problem lies.  Good luck!  If you get some data out of this, feel free to follow up and we'll see if we can get you further.
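Once the intermediate values have been printed, the check itself is mechanical: walk the stages in pipeline order and stop at the first one that contains a NaN or inf. A minimal plain-Python sketch of that check follows; the helper name first_bad_stage and the sample values are hypothetical, standing in for numbers copied out of the tf.Print output.

```python
import math

def first_bad_stage(stages):
    """Return the name of the first stage holding a NaN or inf, or None if all are finite.

    `stages` is a list of (name, values) pairs in pipeline order.
    """
    for name, values in stages:
        if any(not math.isfinite(v) for v in values):
            return name
    return None

nan = float("nan")

# Made-up values: the inputs, weights and bias are clean, the matmul result is not.
stages = [
    ("TF_in",   [0.5, 1.0, 2.0]),
    ("Weight",  [0.1, -0.2, 0.3]),
    ("Bias",    [0.0, 0.0, 0.0]),
    ("Matmul",  [0.1, nan, 0.3]),
    ("Softmax", [nan, nan, nan]),
]

print(first_bad_stage(stages))
```

The earliest non-finite stage is the one worth investigating; everything downstream of it is NaN only because NaN propagates.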

    Updated to add:  Here's a stripped-down debugging version to try, where I got rid of the input functions and just generated some random data:

    https://stackoverflow.com/questions/38810424/how-does-one-debug-nan-values-in-tensorflow

    There are several reasons why you can get a NaN result. Often the learning rate is too high, but plenty of other causes are possible, for example corrupt data in your input queue or taking the log of 0.

    Anyhow, a plain Python print will not help with debugging here: it prints only the tensor's description in the graph, not any actual values.

    However, if you add tf.Print as an op while building the graph, the actual values are printed when the graph is executed (and it IS a good exercise to watch these values to debug and understand the behavior of your net).

    However, you are not using the print op entirely correctly. Because it is an op, you need to pass it a tensor and keep the result tensor it returns, then use that result later in the graph. Otherwise the op is never executed and nothing is printed. Try this:

    Z = tf.sqrt(Delta_tilde)
    Z = tf.Print(Z, [Z], message="my Z-values:")  # <-------- TF PRINT STATEMENT
    Z = Transform(Z)  # potentially some transform; currently it returns Z unchanged (the identity) for debugging
    Z = tf.pow(Z, 2.0)
  • Original source: https://www.cnblogs.com/qianblue/p/6993328.html