  • TensorFlow NaN

    https://github.com/tensorflow/tensorflow/issues/3212

    NaNs usually indicate something wrong with your training. Perhaps your learning rate is too high, or perhaps you have invalid data. Maybe you have an invalid operation like a divide by zero. TensorFlow refusing to write any NaNs is a warning that something has gone wrong with your training.

    If you still suspect there is an underlying bug, you need to provide us with a reproducible test case (as small as possible), plus information about your environment (please see the issue submission template).
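
    One quick sanity check for the "invalid data" case (a common practice, not something from the thread itself) is to scan each input batch for NaN or Inf before it ever reaches the graph:

    import numpy as np

    def assert_finite(batch, name="batch"):
        # Fail fast if the feed data itself is already corrupt.
        if np.isnan(batch).any() or np.isinf(batch).any():
            raise ValueError("%s contains NaN or Inf values" % name)

    # Hypothetical usage before sess.run(..., feed_dict={x: x_train}):
    # assert_finite(x_train, "x_train")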

     

    https://stackoverflow.com/questions/33712178/tensorflow-nan-bug?newreg=c7e31a867765444280ba3ca50b657a07

    Actually, it turned out to be something stupid. I'm posting this in case anyone else runs into a similar error.

    cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
    

    is actually a horrible way of computing the cross-entropy. In some samples, certain classes can be excluded with certainty after a while, resulting in y_conv=0 for that sample. That's normally not a problem, since you're not interested in those classes, but with cross_entropy written this way it yields 0*log(0) for that particular sample/class. Hence the NaN.

    Replacing it with

    cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))

    solves the problem, because clipping keeps the argument of the log strictly positive.
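
    A more direct fix (a sketch, not part of the original answer) is to compute the loss from the raw logits and let TensorFlow's fused op handle the numerics; here logits is assumed to be the pre-softmax output from which y_conv was computed:

    # Numerically stable: the fused kernel never materializes log(0).
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))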

    https://stackoverflow.com/questions/33922937/why-does-tensorflow-return-nan-nan-instead-of-probabilities-from-a-csv-file

    Try throwing in a few of these.  Instead of this line:

    tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)
    

    Try:

    tf_bias = tf.Print(tf_bias, [tf_bias], "Bias: ")
    tf_weight = tf.Print(tf_weight, [tf_weight], "Weight: ")
    tf_in = tf.Print(tf_in, [tf_in], "TF_in: ")
    matmul_result = tf.matmul(tf_in, tf_weight)
    matmul_result = tf.Print(matmul_result, [matmul_result], "Matmul: ")
    tf_softmax = tf.nn.softmax(matmul_result + tf_bias)
    

    to see what TensorFlow thinks the intermediate values are. If the NaNs are showing up earlier in the pipeline, it should give you a better idea of where the problem lies. Good luck! If you get some data out of this, feel free to follow up and we'll see if we can get you further.

    Updated to add:  Here's a stripped-down debugging version to try, where I got rid of the input functions and just generated some random data:
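
    A minimal sketch in that spirit (a reconstruction, not the original code; random data, hypothetical shapes, TF1-style graph mode) could look like:

    import numpy as np
    import tensorflow as tf

    # Hypothetical dimensions: 100 samples, 4 features, 2 classes.
    tf_in = tf.constant(np.random.rand(100, 4).astype(np.float32))
    tf_weight = tf.Variable(tf.truncated_normal([4, 2]))
    tf_bias = tf.Variable(tf.zeros([2]))

    matmul_result = tf.matmul(tf_in, tf_weight)
    matmul_result = tf.Print(matmul_result, [matmul_result], "Matmul: ")
    tf_softmax = tf.nn.softmax(matmul_result + tf_bias)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(tf_softmax))  # tf.Print output goes to stderr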

    https://stackoverflow.com/questions/38810424/how-does-one-debug-nan-values-in-tensorflow

    There are several reasons WHY you can get a NaN result. Often it is because of too high a learning rate, but plenty of other causes are possible, for example corrupt data in your input queue or a log(0) calculation.

    Anyhow, debugging with a print as you describe cannot be done with a plain Python print, as that would only print the tensor's static graph information, not any actual values.

    However, if you use tf.Print as an op when building the graph, then when the graph gets executed you will get the actual values printed (and it IS a good exercise to watch these values to debug and understand the behavior of your net).

    However, you are not using the print op in quite the correct manner. It is an op, so you need to pass it a tensor and then use the tensor it returns later on in the graph; otherwise the op is never executed and no printing occurs. Try this:

    Z = tf.sqrt(Delta_tilde)
    Z = tf.Print(Z,[Z], message="my Z-values:") # <-------- TF PRINT STATEMENT
    Z = Transform(Z) # potentially some transform; currently it just returns Z for debugging (the identity)
    Z = tf.pow(Z, 2.0)
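
    Beyond tf.Print, an assertion-style alternative is tf.check_numerics (standard TF1 API, though not part of the original answer); a minimal sketch:

    # Raises an InvalidArgumentError as soon as Z contains NaN or Inf,
    # e.g. when Delta_tilde has negative entries going into tf.sqrt.
    Z = tf.sqrt(Delta_tilde)
    Z = tf.check_numerics(Z, message="Z has NaN/Inf right after sqrt")
    Z = tf.pow(Z, 2.0)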
  • Original post: https://www.cnblogs.com/qianblue/p/6993328.html