PyTorch has built-in analytical derivatives for common tensor operations. The path from the loss back to a parameter is a composite function made up of several stacked ops, so the chain rule is applied to them one by one.
tensor.grad_fn records which operation produced a tensor, together with the analytical derivative of that operation. Note that the gradient is not computed as a numerical approximation from the definition $y'_x=\frac{\Delta y}{\Delta x}$.
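As a minimal sketch of this point (reusing the x, u, y from the example further below): the grad_fn nodes link into a backward graph via next_functions, and backward() produces the exact analytical value; the finite-difference computation at the end is added here purely for comparison and is not what autograd does internally.

import torch

x = torch.tensor([2.0, 2.0], requires_grad=True)
u = 4 * x
y = u.norm()

# each non-leaf tensor records the op that produced it, and the graph
# links backwards to the grad_fn of its inputs
print(y.grad_fn)                   # <NormBackward... object at ...>
print(y.grad_fn.next_functions)    # ((<MulBackward0 object at ...>, 0),)

y.backward()
print(x.grad)                      # analytical result: tensor([2.8284, 2.8284])

# a central difference quotient only approximates the same value
eps = 1e-4
f = lambda t: (4 * t).norm()
numeric = (f(torch.tensor([2.0 + eps, 2.0])) - f(torch.tensor([2.0 - eps, 2.0]))) / (2 * eps)
print(numeric)                     # roughly 2.8284, only to finite-difference accuracy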
tensor.requires_grad: bool
tensor.grad_fn
tensor.backward()
tensor.retain_grad(): enables this Tensor to have its grad populated during backward(); this is a no-op for leaf tensors.
tensor.grad: populated during backward().
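A minimal sketch of how these attributes behave on a leaf versus a non-leaf tensor (is_leaf is used here in addition to the attributes listed above, and the tensors mirror the example further below):

import torch

x = torch.tensor([2.0, 2.0], requires_grad=True)   # leaf tensor created by the user
u = 4 * x                                           # non-leaf tensor produced by an op

print(x.is_leaf, x.grad_fn)     # True  None
print(u.is_leaf, u.grad_fn)     # False <MulBackward0 object at ...>

u.retain_grad()                 # without this, only the leaf's .grad is populated
u.norm().backward()
print(x.grad)                   # tensor([2.8284, 2.8284])
print(u.grad)                   # tensor([0.7071, 0.7071]), kept because of retain_grad()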
Autograd differentiates this composite function with the chain rule. In the example used below, $x = [2, 2]$, $u = 4x$ and $y = \lVert u \rVert_2$, so $u_1 = u_2 = 8$. For the first component:

$$\frac{\partial y}{\partial x_1} = \frac{\partial y}{\partial u_1} \cdot \frac{\partial u_1}{\partial x_1} \tag{1}$$
Compute the two factors separately:
$$\frac{\partial y}{\partial u_1}=\frac{\partial(u_1^2+u_2^2)^{\frac12}}{\partial u_1}\\ =\frac12 \times (u_1^2+u_2^2)^{-\frac12}\times 2u_1\\ =\frac12\times\frac1{\sqrt{64+64}}\times 2\times 8\\ =0.7071 \tag{2}$$
Note that because of the $u_1^2$ term this is itself a composite function; both steps use the power rule $(x^a)'=ax^{a-1}$.
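As a quick check of (2), autograd gives the same 0.7071 when u itself is treated as the input; this small snippet is separate from the main example below:

import torch

u = torch.tensor([8.0, 8.0], requires_grad=True)   # u = 4x with x = [2, 2]
y = u.norm()                                        # (u1^2 + u2^2) ** 0.5
y.backward()
print(u.grad)                                       # tensor([0.7071, 0.7071]), matching (2)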
$$\frac{\partial u_1}{\partial x_1} = 4 \tag{3}$$
Substituting the results of (2) and (3) into (1):
$$res = \frac{\partial y}{\partial u_1} \cdot \frac{\partial u_1}{\partial x_1}\\ =0.7071\times 4\\ =2.8284 \tag{4}$$
import torch

x = torch.tensor([2, 2], dtype=torch.float)  # input tensor
x.requires_grad = True
u = 4 * x
y: torch.Tensor = u.norm()

print('x.grad_fn', x.grad_fn)
print('y.grad_fn', y.grad_fn)
print('u.grad_fn', u.grad_fn)

print(f'before y.backward(), x.grad = {x.grad}, u.grad = {u.grad}')
y.backward()
print(f'after y.backward(), x.grad = {x.grad}, u.grad = {u.grad}')

"""
x.grad_fn None
y.grad_fn <NormBackward1 object at 0x0000027620F89A90>
u.grad_fn <MulBackward0 object at 0x0000027620F89A90>
before y.backward(), x.grad = None, u.grad = None
after y.backward(), x.grad = tensor([2.8284, 2.8284]), u.grad = None
D:\code_study\torch_study\test\auto_grad_test.py:10: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\build\aten\src\ATen/core/TensorBody.h:485.)
  print(f'before y.backward(), x.grad = {x.grad}, u.grad = {u.grad}')
"""
We can clearly see that the partial derivative of y with respect to $x_1$ is 2.8284, which matches the hand calculation.
There is also a warning: tensor u is not a leaf node, so its .grad attribute is not populated automatically. If you do want it computed, just call u.retain_grad() before backward(), as the next example does.
Some piecewise functions introduce discontinuities of the first kind (jump discontinuities), etc.
todo
zero_grad()
Call .zero_grad() after a round of loss.backward() (i.e. before the next backward pass), because gradients otherwise accumulate in .grad.
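The reason is that backward() accumulates into .grad rather than overwriting it; a minimal sketch of what happens without clearing (the in-place grad.zero_() here stands in for what optimizer.zero_grad() does per parameter):

import torch

x = torch.tensor([2.0, 2.0], requires_grad=True)

(4 * x).norm().backward()
print(x.grad)              # tensor([2.8284, 2.8284])

(4 * x).norm().backward()  # second backward pass without clearing
print(x.grad)              # tensor([5.6569, 5.6569]), accumulated rather than replaced

x.grad.zero_()             # roughly what optimizer.zero_grad() does (it may set .grad to None instead)
(4 * x).norm().backward()
print(x.grad)              # tensor([2.8284, 2.8284]) again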
step()
Using the simplest SGD with a learning rate of 0.1, after one step the new value of x is x = x + (-1) * gradient * learning_rate; substituting the numbers gives x = 2 - 0.1 * 2.8284 = 1.7172.
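Written in the notation of the equations above, with the learning rate denoted $\eta = 0.1$:

$$x_1^{new} = x_1 - \eta \cdot \frac{\partial y}{\partial x_1} = 2 - 0.1 \times 2.8284 = 1.7172$$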
import torch

x = torch.tensor([2, 2], dtype=torch.float)  # input tensor
x.requires_grad = True
u: torch.Tensor = 4 * x
y: torch.Tensor = u.norm()

print('y.grad_fn = ', y.grad_fn)
print('u.grad_fn = ', u.grad_fn)
print('x.grad_fn = ', x.grad_fn)

loss = y
optimizer = torch.optim.SGD(params=[x], lr=0.1)
u.retain_grad()
optimizer.zero_grad()

print(f'before loss.backward(), x.grad = {x.grad}, u.grad = {u.grad}')
loss.backward()
print(f'after loss.backward(), x.grad = {x.grad}, u.grad = {u.grad}')

print(f'before optimizer.step(), x = {x}')
optimizer.step()
print(f'after optimizer.step(), x = {x}')

"""
y.grad_fn <CopyBackwards object at 0x0000022F2CDA1BE0>
u.grad_fn <MulBackward0 object at 0x0000022F2CDA1BE0>
x.grad_fn None
before loss.backward(), x.grad = None, u.grad = None
after loss.backward(), x.grad = tensor([2.8284, 2.8284]), u.grad = tensor([0.7071, 0.7071])
before optimizer.step(), x = tensor([2., 2.], requires_grad=True)
after optimizer.step(), x = tensor([1.7172, 1.7172], requires_grad=True)
"""
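Putting zero_grad(), backward() and step() together, here is a minimal sketch (not part of the original script) that repeats the update a few times; since $y = \lVert 4x \rVert$ has the same per-component gradient 2.8284 while both components stay positive and equal, x shrinks by 0.1 × 2.8284 each step:

import torch

x = torch.tensor([2.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD(params=[x], lr=0.1)

for i in range(3):
    optimizer.zero_grad()        # clear the gradient from the previous iteration
    loss = (4 * x).norm()
    loss.backward()              # x.grad == tensor([2.8284, 2.8284]) at every step here
    optimizer.step()             # x <- x - 0.1 * x.grad
    print(i, x.detach())         # components: 1.7172, then 1.4343, then 1.1515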