Notes on a bug encountered during PyTorch multi-GPU training
The error was:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 30; expected version 29 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
This error only occurs during multi-GPU training; everything runs fine on a single GPU.
Following hints found online, I first wrapped the failing code in a with torch.autograd.set_detect_anomaly(True): block. With anomaly detection enabled, PyTorch prints the forward-pass stack trace of the operation whose gradient computation failed; in my case it pointed at a BatchNorm layer.
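For reference, a minimal sketch of enabling anomaly detection (the toy model below is a stand-in for the real training code, chosen so that backward() runs through a BatchNorm layer):

```python
import torch
import torch.nn as nn

# Stand-in model with a BatchNorm layer; not the original training code.
model = nn.Sequential(nn.Linear(8, 512), nn.BatchNorm1d(512), nn.Linear(512, 2))
inputs = torch.randn(4, 8)
targets = torch.randint(0, 2, (4,))

# With anomaly detection on, every op records its forward traceback, so a
# failing backward() additionally prints where the bad op ran in the forward.
with torch.autograd.set_detect_anomaly(True):
    outputs = model(inputs)
    loss = nn.functional.cross_entropy(outputs, targets)
    loss.backward()  # if this raises, the extra traceback points at the culprit
```

Note that anomaly detection slows training down considerably, so it should only be enabled while debugging.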
Searching turned up a solution: https://discuss.pytorch.org/t/ddp-sync-batch-norm-gradient-computation-modified/82847/5
The fix is to pass broadcast_buffers=False when constructing the DistributedDataParallel wrapper. By default, DDP broadcasts module buffers (including BatchNorm's running_mean and running_var) from rank 0 to all other ranks at the start of every forward pass; that broadcast rewrites the buffers in place, which bumps their version counters and invalidates tensors saved for the backward pass, triggering exactly the version-mismatch error above.
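A minimal sketch of where the flag goes, assuming a standard torchrun launch (the model and process-group setup are placeholders, not the original code):

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes launch via torchrun, which sets LOCAL_RANK for each process.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Stand-in model containing a BatchNorm layer.
model = nn.Sequential(
    nn.Linear(8, 512), nn.BatchNorm1d(512), nn.Linear(512, 2)
).cuda()

model = DDP(
    model,
    device_ids=[local_rank],
    broadcast_buffers=False,  # stop DDP from rewriting BN buffers in place each forward
)
```

One trade-off to be aware of: with broadcast_buffers=False, each rank keeps its own BatchNorm running statistics. If the statistics need to stay synchronized across ranks, converting the model with nn.SyncBatchNorm.convert_sync_batchnorm before wrapping it in DDP is the usual alternative.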