
PyTorch [3] - Autograd

This is an introductory walkthrough for now; more will be added over time.

This tutorial assumes PyTorch 1.3 or later.

Variable

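A Variable is a wrapper around a Tensor, so it exposes most of a Tensor's attributes and methods.

Variables are used to build the computational graph.

The main attributes of a Variable are:

  // data: the underlying Tensor value

  // grad: where the gradient is stored

  // grad_fn: the function that produced this Variable; backpropagation differentiates through it

  // requires_grad: whether a gradient is needed, see below

  // is_leaf: whether it is a leaf node, see below

Note that in 1.0+ versions Variable is deprecated but still usable; its functionality has been merged into Tensor.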
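As a quick check of that last point, the following minimal sketch wraps a tensor in Variable and then reads the same attributes from a plain Tensor. The names t, v, x and y are made up for illustration.

```python
import torch
from torch.autograd import Variable

t = torch.ones(2, 2)
v = Variable(t, requires_grad=True)    # deprecated wrapper, still accepted

# Variable(...) hands back an ordinary Tensor
print(type(v))
print(isinstance(v, torch.Tensor))     # True

# The same attributes are available directly on a Tensor
x = torch.ones(2, 2, requires_grad=True)
y = (x * 3).sum()
print(x.is_leaf, x.grad_fn)            # leaf: created by the user, no grad_fn
print(y.is_leaf, y.grad_fn)            # non-leaf: produced by sum()
y.backward()
print(x.grad)                          # every entry is 3
```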

requires_grad

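In PyTorch every Tensor has a requires_grad attribute. It defaults to False, which means the subgraph built from that Tensor takes no part in gradient computation; this saves work. [Subgraphs and computational graphs are similar to the TensorFlow concepts; more on them will be added later.]

Example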

import torch as t

x = t.randn(3, 3)
y = t.randn(3, 3)
z = t.randn((3, 3), requires_grad=True)

### An output has requires_grad=False only when every input has requires_grad=False;
### such a node takes no part in gradient computation
print(x.requires_grad)      # False
print(z.requires_grad)      # True
a = x + z
print(a.requires_grad)      # True      ### one input requires grad, so the output does too

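If any input of a node has requires_grad set to True, the output of that node also has requires_grad set to True.

When no Tensor in a subgraph requires a gradient, backpropagation skips the backward computation for that subgraph entirely. The attribute is useful when you know in advance that certain Tensors do not need gradients.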
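A common use is keeping part of a model fixed. The sketch below is only an illustration under that assumption: w_frozen and w_train are hypothetical weights, not something from a real model.

```python
import torch

x = torch.randn(4, 3)                            # input data, no gradient needed
w_frozen = torch.randn(3, 2)                     # hypothetical frozen weight (requires_grad=False)
w_train = torch.randn(2, 1, requires_grad=True)  # hypothetical trainable weight

h = x @ w_frozen            # all inputs have requires_grad=False
out = (h @ w_train).sum()   # one input requires grad

print(h.requires_grad)      # False
print(out.requires_grad)    # True

out.backward()
print(w_frozen.grad)        # None: nothing was computed for the frozen part
print(w_train.grad.shape)   # torch.Size([2, 1])
```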

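Note: only floating-point Tensors can require gradients.

Creating a Tensor like this

b = t.tensor(2, requires_grad=True)

raises the following error as soon as the tensor is created (not later, during backpropagation)

RuntimeError: Only Tensors of floating point dtype can require gradients

The correct way is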

b = t.tensor(2., requires_grad=True)

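Computational graph

A computational graph is a directed acyclic graph that records operations. It consists of nodes and edges: nodes represent data, i.e. Tensors, and edges represent operations.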
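To make the graph concrete, here is a minimal sketch that walks the backward graph PyTorch records through grad_fn and next_functions. The tensors w, x, a, y and the helper walk are made up for illustration. Note that in PyTorch's own representation the recorded objects are the operations, with an AccumulateGrad entry standing in for each leaf tensor.

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = w + x
y = a * w

def walk(fn, depth=0):
    """Recursively print the backward graph starting at a grad_fn object."""
    if fn is None:          # inputs that do not require grad have no entry
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

walk(y.grad_fn)             # the multiplication first, then the addition and the leaves
```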

is_leaf

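Leaf node: a tensor created directly by the user; think of it as a bottom-level node of the computational graph.

Leaf nodes matter during backpropagation: the gradient of a leaf node (one with requires_grad=True) is kept in its grad attribute automatically,

whereas the gradient of a non-leaf node is freed automatically, and once freed its grad can no longer be read.

To keep the gradient of a non-leaf node, call tensor.retain_grad() on it before the backward pass. For example,

if w and x are leaf nodes, a, b and y are non-leaf nodes, and retain_grad was called on a, then w, x and a have gradients while b and y do not.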
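The original illustration for this case is not available here, so the following is a reconstruction under assumptions: the concrete values and the formula y = (w + x) * (w + 1) are chosen only to match the node names above.

```python
import warnings
import torch

w = torch.tensor([1.], requires_grad=True)   # leaf
x = torch.tensor([2.], requires_grad=True)   # leaf

a = w + x          # non-leaf
a.retain_grad()    # ask autograd to keep a.grad
b = w + 1          # non-leaf
y = a * b          # non-leaf

y.backward()

print(w.is_leaf, x.is_leaf, a.is_leaf, b.is_leaf, y.is_leaf)
print(w.grad)      # dy/dw = b + a = 2 + 3 = 5
print(x.grad)      # dy/dx = b = 2
print(a.grad)      # dy/da = b = 2, kept because of retain_grad()

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # reading .grad of a non-leaf emits a UserWarning
    print(b.grad, y.grad)             # None None
```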

Autograd

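autograd means automatic differentiation: it computes the derivative of a function for you. It evaluates the derivative at the current input values rather than returning a symbolic derivative function.

In PyTorch, autograd is a module (torch.autograd), and the Variable class belongs to it.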

backward

def backward(self, gradient=None, retain_graph=None, create_graph=False):
        r"""Computes the gradient of current tensor w.r.t. graph leaves.

        Arguments:
            gradient (Tensor or None): Gradient w.r.t. the
                tensor. If it is a tensor, it will be automatically converted
                to a Tensor that does not require grad unless ``create_graph`` is True.
                None values can be specified for scalar Tensors or ones that
                don't require grad. If a None value would be acceptable then
                this argument is optional.
            retain_graph (bool, optional): If ``False``, the graph used to compute
                the grads will be freed. Note that in nearly all cases setting
                this option to True is not needed and often can be worked around
                in a much more efficient way. Defaults to the value of
                ``create_graph``.
            create_graph (bool, optional): If ``True``, graph of the derivative will
                be constructed, allowing to compute higher order derivative
                products. Defaults to ``False``.
        """
        torch.autograd.backward(self, gradient, retain_graph, create_graph)

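A Variable (Tensor) can call its backward method to run backpropagation and compute gradients automatically.

gradient: its shape must match that of the tensor on which backward is called.

In other words, if y is a scalar no gradient argument is needed; if y is a vector (or any other non-scalar), gradient must be supplied.

y is a scalar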

import torch
from torch.autograd import Variable
x = Variable(torch.Tensor([16]), requires_grad=True) # x needs a gradient
y = x * x
print(y)        # tensor([256.], grad_fn=<MulBackward0>)
### y holds a single element, so no extra argument is needed
y.backward()
print(x.grad)   # tensor([32.])  ### dy/dx = 2x = 32

y is a vector

torch.manual_seed(10000)
a = torch.ones(2, 2, requires_grad=True)
b = torch.ones(2, 2, requires_grad=True)
c = a + 2 * b
print(c)                ### c is not a scalar (it is a 2x2 tensor); it is also a non-leaf node
# tensor([[3., 3.],
#         [3., 3.]], grad_fn=<AddBackward0>)
### so the gradient argument is required
d = torch.randn(2, 2)   ### gradient has the same shape as c
print(d)
# tensor([[ 2.0065,  1.9535],
#         [ 0.1517, -0.4269]])
c.backward(d)
print(a.grad)
# tensor([[ 2.0065,  1.9535],
#         [ 0.1517, -0.4269]])          ### a.grad = d * dc/da = d * 1 = d
print(b.grad)
# tensor([[ 4.0129,  3.9069],
#         [ 0.3034, -0.8538]])          ### b.grad = d * dc/db = d * 2 = 2d

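In this element-wise example the result is simply gradient multiplied by the corresponding local derivative; in general backward computes a vector-Jacobian product with gradient as the vector.

You can think of gradient as a set of weights on the gradients.

For example, when there are several losses, gradient weights each loss.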
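The weighting view can be checked with a small sketch. Here each element of losses is treated as a separate loss, and weights is a hypothetical set of per-loss weights; passing it as gradient should give the same result as summing the weighted losses into one scalar first.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
weights = torch.tensor([0.1, 1.0, 10.0])    # hypothetical per-loss weights

# Route 1: pass the weights as the gradient argument
losses = x ** 2
losses.backward(gradient=weights)
grad_via_gradient = x.grad.clone()

# Route 2: build the weighted scalar loss explicitly
x.grad.zero_()
losses = x ** 2
(losses * weights).sum().backward()
grad_via_sum = x.grad.clone()

print(grad_via_gradient)    # 2 * x * weights
print(grad_via_sum)         # same values
```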

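If y is a vector but no gradient is given, the following error is raised

RuntimeError: grad can be implicitly created only for scalar outputs

retain_graph: if True, backward can be called on the same graph again, and the new gradients are accumulated into grad; if False, a second backward call on the same graph raises an error.

Backpropagation relies on intermediate results cached during the forward pass. These buffers are freed once the backward pass finishes; retain_graph=True keeps them, so a second pass can run and its result is added to the existing grad.

The default is False, i.e. the buffers are freed.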

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x.pow(2)
print(y)        # tensor([4.], grad_fn=<PowBackward0>)

y.backward(retain_graph=True)
print(x.grad)   # tensor([4.])

### Without retain_graph=True in the previous backward call, calling backward again raises
# RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
y.backward()
print(x.grad)   # tensor([8.])      ### gradients accumulate

A special case

In the example below backward can be called several times without retain_graph=True

import torch as t
from torch.autograd import Variable

x = Variable(t.ones(2, 2), requires_grad=True)  ### wrap a Tensor as an autograd Variable
print(x)
# tensor([[1., 1.],
#         [1., 1.]], requires_grad=True)

y = x.sum()     ### a Variable can call Tensor methods directly
print(y)                    # tensor(4., grad_fn=<SumBackward0>)
print(y.data)               # tensor(4.)
print(y.grad_fn)            # <SumBackward0 object at 0x0000000001E9F2E8>
print(y.requires_grad)      # True

y.backward()    ### backward pass: computes dy/dx and accumulates it into x.grad
### reading x.grad does not recompute anything, so repeated reads give the same value
print(x.grad)   ### gradient of y with respect to x
print(x.grad)
# tensor([[1., 1.],
#         [1., 1.]])

### grad accumulates across backward calls: each call adds its gradient to what is already stored
y.backward()    ### second backward pass
print(x.grad)
# tensor([[2., 2.],
#         [2., 2.]])

### to avoid accumulating, zero the gradient first
x.grad.data.zero_()     ### reset the gradient to 0
y.backward()            ### third backward pass
print(x.grad)           ### no accumulation this time
# tensor([[1., 1.],
#         [1., 1.]])

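x is a leaf node, so its gradient is not freed automatically and can be read any number of times. The repeated backward calls themselves succeed because sum saves no intermediate buffers for its backward pass, so there is nothing to free; an operation such as pow, which saves its input, would raise the error shown earlier.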
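A minimal sketch contrasting the two situations, assuming the usual behavior that only operations which save tensors for backward are affected by buffer freeing. The variable second_call_failed is introduced here just to record the outcome.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

# sum() keeps no saved tensors, so repeated backward calls work
y = x.sum()
y.backward()
y.backward()
grad_after_sum = x.grad.clone()
print(grad_after_sum)          # every entry is 2 (two accumulated passes)

# pow() saves its input; the buffers are freed after the first pass
z = x.pow(2).sum()
z.backward()
try:
    z.backward()               # no retain_graph=True on the first call
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("RuntimeError:", err)
```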

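A few points to note:

1. Every gradient computation requires an explicit call to backward.

2. Gradients accumulate automatically; to avoid that, explicitly call grad.data.zero_() (or simply grad.zero_()).

3. The Variable objects above can be replaced with plain Tensors

x = t.ones(2, 2, requires_grad=True)        ### a plain Tensor works without Variable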
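Points 1 and 2 are why a training loop calls backward and zeroes the gradient on every iteration. Below is a minimal hand-written gradient descent sketch; the objective (w - 3)^2, the learning rate lr and the step count are assumptions chosen only for illustration.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)
lr = 0.1                          # hypothetical learning rate

for step in range(50):
    loss = (w - 3.0) ** 2         # toy objective with its minimum at w = 3
    loss.backward()               # point 1: explicit backward call each step
    with torch.no_grad():         # the update itself must not be recorded
        w -= lr * w.grad
    w.grad.zero_()                # point 2: clear the accumulated gradient

print(w.item())                   # close to 3
```

Without the zero_() call each step would use the sum of all previous gradients, and the update would no longer follow the current slope.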

Gradients

We will not go into gradient theory here; the following simply checks that backward computes gradients correctly.

Example

import torch as t

x = t.Tensor([1])
a = t.tensor(2., requires_grad=True)
b = t.tensor(2., requires_grad=True)
c = t.tensor(3., requires_grad=True)

loss = a * a * x + b * x + c

print(a.grad, b.grad, c.grad)       # None None None
loss.backward()
print(a.grad, b.grad, c.grad)       # tensor(4.) tensor(1.) tensor(1.)

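Computing the gradients by hand

With respect to a: the derivative of a^2 x is 2ax; with x=1 and a=2, the derivative is 4.

With respect to b: the derivative of bx is x; with x=1, the derivative is 1.

With respect to c: the derivative of c is 1.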
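The hand calculation can also be cross-checked numerically. This sketch compares central finite differences with the values from backward; loss_fn, eps and the helper names are introduced here for illustration only.

```python
import torch

def loss_fn(a, b, c, x=1.0):
    """Same expression as the example: a^2 * x + b * x + c."""
    return a * a * x + b * x + c

a0, b0, c0 = 2.0, 2.0, 3.0
eps = 1e-4                        # finite-difference step (an assumption)

# Central differences on plain Python floats
fd_a = (loss_fn(a0 + eps, b0, c0) - loss_fn(a0 - eps, b0, c0)) / (2 * eps)
fd_b = (loss_fn(a0, b0 + eps, c0) - loss_fn(a0, b0 - eps, c0)) / (2 * eps)
fd_c = (loss_fn(a0, b0, c0 + eps) - loss_fn(a0, b0, c0 - eps)) / (2 * eps)

# The same gradients from autograd
a = torch.tensor(a0, requires_grad=True)
b = torch.tensor(b0, requires_grad=True)
c = torch.tensor(c0, requires_grad=True)
loss_fn(a, b, c).backward()

print(fd_a, fd_b, fd_c)                              # approximately 4, 1, 1
print(a.grad.item(), b.grad.item(), c.grad.item())   # 4.0 1.0 1.0
```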

Another way to compute derivatives

import torch as t
from torch import autograd

x = t.Tensor([1])
a = t.tensor(2., requires_grad=True)
b = t.tensor(2., requires_grad=True)
c = t.tensor(3., requires_grad=True)
loss = a * a * x + b * x + c

print(a.grad, b.grad, c.grad)       # None None None
grads = autograd.grad(loss, [a, b, c])
print(grads)                        # (tensor(4.), tensor(1.), tensor(1.))

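Note: do not mix this with backward. autograd.grad returns the gradients instead of storing them in grad, and like backward it frees the graph buffers by default, so calling one after the other on the same graph fails unless retain_graph=True was passed to the first call.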
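If both are really needed on one graph, the following sketch shows one way it can work, assuming retain_graph=True is passed to the first call. The tensors and the name ga are made up for illustration.

```python
import torch
from torch import autograd

a = torch.tensor(2., requires_grad=True)
b = torch.tensor(2., requires_grad=True)
loss = a * a + b

# autograd.grad returns the gradients and leaves .grad untouched
(ga,) = autograd.grad(loss, [a], retain_graph=True)
print(ga)                  # tensor(4.)
print(a.grad)              # None

# The graph was retained, so backward can still run and fill .grad
loss.backward()
print(a.grad, b.grad)      # tensor(4.) tensor(1.)
```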



Original article: https://www.cnblogs.com/yanshw/p/12193115.html