# zero the parameter gradients
optimizer.zero_grad()

# forward + backward + optimize
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

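optimizer.zero_grad() resets the gradients to zero, that is, it sets the stored derivative of the loss with respect to each weight back to 0. This is needed because loss.backward() adds to the existing gradients rather than overwriting them, so without the reset the gradients from earlier batches would leak into the current update.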
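
To make the effect of resetting concrete, here is a minimal pure-Python sketch that mimics the accumulation behaviour without needing PyTorch. `ToyParam`, `backward`, and `zero_grad` are hypothetical stand-ins written for this illustration, not PyTorch APIs, and the loss `(w*x - y)**2` is an assumed toy example.

```python
class ToyParam:
    """Hypothetical stand-in for a trainable weight with an accumulated gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


def backward(param, x, y):
    """Accumulate d(loss)/d(w) for the toy loss (w*x - y)**2 into param.grad.

    The gradient is added to whatever is already stored, which is the
    behaviour this sketch assumes in order to imitate loss.backward().
    """
    param.grad += 2.0 * (param.value * x - y) * x


def zero_grad(param):
    """Reset the stored gradient, imitating optimizer.zero_grad()."""
    param.grad = 0.0


w = ToyParam(1.0)

backward(w, x=2.0, y=0.0)
first = w.grad          # gradient of a single backward pass

backward(w, x=2.0, y=0.0)
accumulated = w.grad    # no reset in between: the two gradients are summed

zero_grad(w)
backward(w, x=2.0, y=0.0)
fresh = w.grad          # after the reset: a single-pass gradient again

print(first, accumulated, fresh)
```

In this sketch the second value is twice the first because nothing cleared the stored gradient between the two passes; calling the reset first restores the single-pass value. That is why the training loop calls optimizer.zero_grad() once per iteration before loss.backward().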