Backpropagation in Convolutional Neural Networks (CNN)
A detailed derivation of the backpropagation process in a CNN
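1. Forward propagation
1.1 Convolution layer: the convolution operation
Assume the convolution operation below, where conv denotes the convolution, $w$ holds the entries of the $2 \times 2$ convolution kernel, and $b$ is the bias. Suppose the feature map output by the previous layer is $3 \times 3$. Applying the convolution and adding the bias gives: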
$$
\begin{bmatrix} a_{11}^{l-1} & a_{12}^{l-1} & a_{13}^{l-1} \\ a_{21}^{l-1} & a_{22}^{l-1} & a_{23}^{l-1} \\ a_{31}^{l-1} & a_{32}^{l-1} & a_{33}^{l-1} \end{bmatrix} \operatorname{conv} \begin{bmatrix} w_{11}^{l} & w_{12}^{l} \\ w_{21}^{l} & w_{22}^{l} \end{bmatrix} + \begin{bmatrix} b_{11}^{l} & b_{12}^{l} \\ b_{21}^{l} & b_{22}^{l} \end{bmatrix} = \begin{bmatrix} z_{11}^{l} & z_{12}^{l} \\ z_{21}^{l} & z_{22}^{l} \end{bmatrix} \tag{1}
$$
The predicted value $\hat y$ is the result of applying the activation function $\sigma$ to this output:
$$
\hat y = \sigma(z^{l}) \tag{2}
$$
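The forward step in (1) and (2) can be sketched in NumPy. This is only an illustration: the helper names `conv2d_valid` and `sigmoid`, the input values, and the kernel values are assumptions made for the example, and the sigmoid is used because the article chooses it later as the activation function.

```python
import numpy as np

def conv2d_valid(a, w):
    """Slide kernel w over a without flipping it, as in formula (1)."""
    kh, kw = w.shape
    out_h = a.shape[0] - kh + 1
    out_w = a.shape[1] - kw + 1
    z = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the kernel, then sum.
            z[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return z

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only (not taken from the article).
a_prev = np.arange(1.0, 10.0).reshape(3, 3)   # 3x3 feature map a^{l-1}
w = np.array([[1.0, 0.0], [0.0, -1.0]])       # 2x2 kernel w^l
b = np.zeros((2, 2))                          # bias b^l

z = conv2d_valid(a_prev, w) + b               # formula (1)
y_hat = sigmoid(z)                            # formula (2)
```

A $3 \times 3$ input and a $2 \times 2$ kernel give a $2 \times 2$ output because the kernel fits into the input at only two positions along each axis.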
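1.2 Pooling layer: downsampling
Pooling comes in two common forms, average pooling and max pooling; average pooling is used as the example here. It shrinks the original matrix by a fixed ratio: each element of the smaller output matrix is the mean of a block of adjacent elements of the original.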
$$
matrix = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{bmatrix}
$$
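With a pooling window of $2 \times 2$, the $4 \times 4$ matrix is reduced to a $2 \times 2$ matrix.
- For the element in the first row and first column: average the elements of the original matrix in the region {(0, 0), (0, 1), (1, 0), (1, 1)}: (1 + 2 + 5 + 6) / 4 = 3.5, and assign the result to scaledMatrix[0][0].
- The final matrix: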
$$
\begin{bmatrix} 3.5 & 5.5 \\ 11.5 & 13.5 \end{bmatrix}
$$
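The average-pooling example can be reproduced with a few lines of NumPy. The function name `avg_pool` is an assumption for this sketch, which also assumes non-overlapping windows and an input whose sides are divisible by the window size.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool x with non-overlapping k x k windows."""
    h, w = x.shape
    # Split each axis into (block index, offset inside block), then
    # average over the two offset axes.
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

matrix = np.arange(1.0, 17.0).reshape(4, 4)   # the 4x4 matrix from the text
scaled = avg_pool(matrix, 2)
print(scaled)
```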
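2. Output-layer error term
The error term of the output layer is computed from the gradient of the loss function with respect to the output.
2.1 Loss functions
- Mean squared error (MSE), suited to regression problems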
$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y_{i}} - y_{i})^{2}
$$
where $y_i$ is the true value and $\hat{y_{i}}$ is the predicted value.
- Cross-entropy loss, suited to classification problems
$$
\text{Cross-entropy loss} = -\sum_{i=1}^{n} y_{i} \log(\hat{y_{i}})
$$
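Both loss functions are short to write in NumPy. The function names, the sample vectors, and the small clipping constant `eps` (added only to avoid `log(0)`) are assumptions of this sketch rather than part of the article.

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error over n values."""
    return np.mean((y_hat - y) ** 2)

def cross_entropy(y_hat, y, eps=1e-12):
    """Cross-entropy for a one-hot (or probability) target y."""
    # Clipping is a numerical safeguard, not part of the formula.
    return -np.sum(y * np.log(np.clip(y_hat, eps, 1.0)))

# Illustrative values only.
y = np.array([0.0, 1.0, 0.0])        # true values
y_hat = np.array([0.1, 0.7, 0.2])    # predicted values

mse_value = mse(y_hat, y)
ce_value = cross_entropy(y_hat, y)
```

With a one-hot target, only the predicted probability of the true class contributes to the cross-entropy, so `ce_value` here equals $-\log(0.7)$.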
2.2 Deriving the error term
To simplify the calculation we use MSE as the loss function and replace the factor $\frac{1}{n}$ by $\frac{1}{2}$, so that the 2 produced by differentiating the square cancels. With $y_i$ the true value and $\hat{y_{i}}$ the predicted value, the loss function $J$ is:
$$
J = \frac{1}{2}(\hat{y_{i}} - y_{i})^{2} \tag{3}
$$
Formula (2) gives the expression for $\hat{y_{i}}$, so we can compute the partial derivative of the loss with respect to the weighted input $z^{l}$ of the output layer using the chain rule:
$$
\frac{\partial J}{\partial z^l} = \frac{\partial J}{\partial \hat y} \cdot \frac{\partial \hat y}{\partial z^l} \tag{4}
$$
The factor $\frac{\partial J}{\partial \hat y}$ follows directly from (3), because $J$ is the mean squared error:
$$
\frac{\partial J}{\partial \hat y} = \hat{y} - y \tag{5}
$$
Substituting (5) and (2) into (4), the error term of the output layer is:
$$
\delta^{l} = \frac{\partial J}{\partial z^l} = (\hat{y} - y) \cdot \sigma'(z^{l}) \tag{6}
$$
Assume the activation function is the sigmoid function. Then:
$$
\frac{\partial \hat y}{\partial z^l} = \sigma'(z^{l}) = \sigma(z^{l}) \cdot (1 - \sigma(z^{l})) \tag{7}
$$
$$
\frac{\partial J}{\partial z^l} = (\hat{y} - y) \cdot \sigma'(z^{l}) = (\hat{y} - y) \cdot \sigma(z^{l}) \cdot (1 - \sigma(z^{l})) \tag{8}
$$
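Formula (8) can be checked numerically by comparing it with central finite differences of the loss. In this sketch the loss is summed over all output entries, which is an assumption (formula (3) is written for a single value); the names and the sample values are likewise illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(z, y):
    """Formula (3), summed over all output entries (assumption)."""
    return 0.5 * np.sum((sigmoid(z) - y) ** 2)

def output_delta(z, y):
    """Formula (8): (y_hat - y) * sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return (s - y) * s * (1.0 - s)

# Illustrative values only.
z = np.array([[0.5, -1.0], [2.0, 0.0]])
y = np.array([[1.0, 0.0], [1.0, 0.0]])
delta = output_delta(z, y)

# Central finite differences as an independent check of the derivative.
eps = 1e-6
numeric = np.zeros_like(z)
for idx in np.ndindex(z.shape):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[idx] += eps
    z_minus[idx] -= eps
    numeric[idx] = (loss(z_plus, y) - loss(z_minus, y)) / (2 * eps)

print(np.max(np.abs(delta - numeric)))
```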
3. Given the convolution layer's error, derive the previous layer's error (deconvolution)
3.1 Error term of the pooling layer
Suppose the error term of the convolution layer is $\delta^{l}$ and the previous layer is a pooling layer with error term $\delta^{l-1}$. We want to derive $\delta^{l-1}$ from $\delta^{l}$.
3.1.1 Derivation
In a convolution layer the result of the convolution is passed through an activation function, as in formula (2). Written out in more detail:
$$
\hat y = a^{l} = \sigma(z^{l}) = \sigma(a^{l-1} * W^{l} + b^{l}) \tag{9}
$$
$$
\frac{\partial \hat y}{\partial z^l} = \frac{\partial a^l}{\partial z^l} = \sigma'(z^{l}) \tag{10}
$$
The error term $\delta^{l-1}$ is then:
$$
\begin{aligned}
\delta^{l-1} &= \frac{\partial J}{\partial z^{l-1}} \quad \text{(expand with the chain rule)} \\
&= \frac{\partial J}{\partial z^{l}} \cdot \frac{\partial z^{l}}{\partial z^{l-1}} \\
&= \delta^{l} \cdot \frac{\partial z^{l}}{\partial a^{l-1}} \cdot \frac{\partial a^{l-1}}{\partial z^{l-1}} \\
&= \delta^{l} \cdot \frac{\partial z^{l}}{\partial a^{l-1}} \cdot \sigma'(z^{l-1})
\end{aligned} \tag{11}
$$
Let us now look at the individual factors:
- $\frac{\partial z^{l}}{\partial a^{l-1}}$
From the convolution formula in section 1.1 we have, written schematically:
$$
z^{l} = w^{l} \cdot a^{l-1} + b^{l} \tag{12}
$$
Differentiating (12) with respect to $a^{l-1}$ gives:
$$
\frac{\partial z^{l}}{\partial a^{l-1}} = w^{l} \tag{13}
$$
- How are $\delta^{l}$ and $\frac{\partial z^{l}}{\partial a^{l-1}}$ related?
$$
\begin{aligned}
\nabla a &= \delta^{l} \cdot \frac{\partial z^{l}}{\partial a^{l-1}} \quad \text{(chain rule)} \\
&= \delta^{l} \cdot w^{l} \\
&= \frac{\partial J}{\partial z^{l}} \cdot \frac{\partial z^{l}}{\partial a^{l-1}} \\
&= \frac{\partial J}{\partial a^{l-1}}
\end{aligned} \tag{14}
$$
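This shows that $\nabla a$ is the derivative of the loss function $J$ with respect to $a^{l-1}$, that is, the error attributed to each entry of the input feature map. Gradient descent uses these derivatives of the loss to update the network parameters and so improve the network's performance.
- Using $\nabla a$, let us compute the error term entry by entry and look for a pattern.
Following the convolution example in section 1.1, each $z$ is written out as: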
$$
\begin{aligned}
z_{11} &= a_{11} \cdot w_{11} + a_{12} \cdot w_{12} + a_{21} \cdot w_{21} + a_{22} \cdot w_{22} + b_{11} \\
z_{12} &= a_{12} \cdot w_{11} + a_{13} \cdot w_{12} + a_{22} \cdot w_{21} + a_{23} \cdot w_{22} + b_{12} \\
z_{21} &= a_{21} \cdot w_{11} + a_{22} \cdot w_{12} + a_{31} \cdot w_{21} + a_{32} \cdot w_{22} + b_{21} \\
z_{22} &= a_{22} \cdot w_{11} + a_{23} \cdot w_{12} + a_{32} \cdot w_{21} + a_{33} \cdot w_{22} + b_{22}
\end{aligned} \tag{15}
$$
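From (15) we obtain each component of $\nabla a$. An input entry can appear in several of the $z$ values, so its gradient is the sum of the contributions from every $z$ that contains it: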
$$
\begin{aligned}
\nabla a_{11} &= \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{11}} = \delta_{11} \cdot w_{11} \\
\nabla a_{12} &= \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{12}} + \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{12}} = \delta_{12} \cdot w_{11} + \delta_{11} \cdot w_{12} \\
\nabla a_{13} &= \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{13}} = \delta_{12} \cdot w_{12} \\
\nabla a_{21} &= \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{21}} + \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{21}} = \delta_{11} \cdot w_{21} + \delta_{21} \cdot w_{11} \\
\nabla a_{22} &= \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{22}} + \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{22}} + \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{22}} + \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{22}} \\
&= \delta_{11} \cdot w_{22} + \delta_{12} \cdot w_{21} + \delta_{21} \cdot w_{12} + \delta_{22} \cdot w_{11} \\
\nabla a_{23} &= \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{23}} + \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{23}} = \delta_{12} \cdot w_{22} + \delta_{22} \cdot w_{12} \\
\nabla a_{31} &= \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{31}} = \delta_{21} \cdot w_{21} \\
\nabla a_{32} &= \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{32}} + \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{32}} = \delta_{21} \cdot w_{22} + \delta_{22} \cdot w_{21} \\
\nabla a_{33} &= \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{33}} = \delta_{22} \cdot w_{22}
\end{aligned}
$$
Arranging these components as a matrix:
$$
\begin{bmatrix} \nabla a_{11} & \nabla a_{12} & \nabla a_{13} \\ \nabla a_{21} & \nabla a_{22} & \nabla a_{23} \\ \nabla a_{31} & \nabla a_{32} & \nabla a_{33} \end{bmatrix} = \begin{bmatrix} \delta_{11} \cdot w_{11} & \delta_{12} \cdot w_{11} + \delta_{11} \cdot w_{12} & \delta_{12} \cdot w_{12} \\ \delta_{11} \cdot w_{21} + \delta_{21} \cdot w_{11} & \delta_{11} \cdot w_{22} + \delta_{12} \cdot w_{21} + \delta_{21} \cdot w_{12} + \delta_{22} \cdot w_{11} & \delta_{12} \cdot w_{22} + \delta_{22} \cdot w_{12} \\ \delta_{21} \cdot w_{21} & \delta_{21} \cdot w_{22} + \delta_{22} \cdot w_{21} & \delta_{22} \cdot w_{22} \end{bmatrix}
$$
The same matrix is produced by a convolution of the zero-padded $\delta$ with the kernel rotated by 180 degrees:
$$
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & \delta_{11} & \delta_{12} & 0 \\ 0 & \delta_{21} & \delta_{22} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \operatorname{conv} \begin{bmatrix} w_{22} & w_{21} \\ w_{12} & w_{11} \end{bmatrix} = \begin{bmatrix} \delta_{11} \cdot w_{11} & \delta_{12} \cdot w_{11} + \delta_{11} \cdot w_{12} & \delta_{12} \cdot w_{12} \\ \delta_{11} \cdot w_{21} + \delta_{21} \cdot w_{11} & \delta_{11} \cdot w_{22} + \delta_{12} \cdot w_{21} + \delta_{21} \cdot w_{12} + \delta_{22} \cdot w_{11} & \delta_{12} \cdot w_{22} + \delta_{22} \cdot w_{12} \\ \delta_{21} \cdot w_{21} & \delta_{21} \cdot w_{22} + \delta_{22} \cdot w_{21} & \delta_{22} \cdot w_{22} \end{bmatrix}
$$
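The identity between the padded convolution and the entry-by-entry sums can be verified numerically. The sketch below uses made-up values for $\delta$ and $w$; `conv2d_valid` is a hypothetical helper that slides the kernel without flipping it, matching how the $z$ values are written out in (15).

```python
import numpy as np

def conv2d_valid(a, w):
    """Slide kernel w over a without flipping it."""
    kh, kw = w.shape
    out = np.zeros((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

# Illustrative values only.
delta = np.array([[1.0, 2.0], [3.0, 4.0]])    # error term of the conv layer
w = np.array([[0.5, -1.0], [2.0, 0.25]])      # 2x2 kernel

# Route 1: zero-pad delta by one cell and convolve with the rotated kernel.
padded = np.pad(delta, 1)                     # 4x4, zeros around delta
grad_a = conv2d_valid(padded, np.rot90(w, 2)) # rot90 twice = 180 degrees

# Route 2: accumulate delta_ij * w into the window that produced z_ij.
direct = np.zeros((3, 3))
for i in range(2):
    for j in range(2):
        direct[i:i + 2, j:j + 2] += delta[i, j] * w

print(np.allclose(grad_a, direct))
```

Route 2 is the chain rule applied term by term, so agreement between the two routes confirms the rotated-kernel form for this example.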
3.1.2 Expression for the error term
In other words, the error term of the previous (pooling) layer is the convolution of the zero-padded error term of the convolution layer with the kernel rotated by 180 degrees. Formula (11) therefore simplifies as follows (a pooling layer has no activation function, which is equivalent to $\sigma(x) = x$ with derivative 1):
$$
\delta^{l-1} = \delta^{l} \cdot \frac{\partial z^{l}}{\partial a^{l-1}} \cdot \sigma'(z^{l-1}) = \delta^{l} \operatorname{conv} (\operatorname{rot180}(w^{l})) \cdot \sigma'(z^{l-1}) \tag{16}
$$
3.2 Gradients of W and b
3.2.1 Derivation
Recall that the error term is defined with respect to the values of the feature map:
$$
\delta^{l-1} = \frac{\partial J}{\partial z^{l-1}}
$$
In the same way, the gradient with respect to $W$ is:
$$
\frac{\partial J}{\partial W^{l}} = \frac{\partial J}{\partial z^{l}} \cdot \frac{\partial z^{l}}{\partial W^{l}} = \delta^{l} \frac{\partial z^{l}}{\partial W^{l}} \tag{17}
$$
Just as for $\nabla a$, the gradient of $W$, written $\nabla W$, can be computed entry by entry from (15). For example:
$$
\begin{aligned}
\nabla W_{11} &= \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial W_{11}} + \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial W_{11}} + \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial W_{11}} + \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial W_{11}} \\
&= \delta_{11} a_{11} + \delta_{12} a_{12} + \delta_{21} a_{21} + \delta_{22} a_{22}
\end{aligned}
$$
The same holds for $\nabla W_{12}$, $\nabla W_{21}$ and $\nabla W_{22}$, so in matrix form:
$$
\begin{bmatrix} \nabla W_{11} & \nabla W_{12} \\ \nabla W_{21} & \nabla W_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \operatorname{conv} \begin{bmatrix} \delta_{11} & \delta_{12} \\ \delta_{21} & \delta_{22} \end{bmatrix} \tag{18}
$$
3.2.2 Expression for the gradient
The gradient of the weights is therefore:
$$
\frac{\partial J}{\partial W^{l}} = a^{l-1} \operatorname{conv} (\delta^{l}) \tag{19}
$$
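Formula (19) says that the weight gradient is obtained by sliding $\delta^{l}$ over $a^{l-1}$ like a kernel. A small NumPy sketch with made-up values shows this; `conv2d_valid` is a hypothetical helper that slides the kernel without flipping it.

```python
import numpy as np

def conv2d_valid(a, w):
    """Slide kernel w over a without flipping it."""
    kh, kw = w.shape
    out = np.zeros((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

# Illustrative values only.
a_prev = np.arange(1.0, 10.0).reshape(3, 3)   # a^{l-1}, entries 1..9
delta = np.array([[1.0, 2.0], [3.0, 4.0]])    # delta^l

grad_w = conv2d_valid(a_prev, delta)          # formula (19)

# Entry-by-entry sum: grad_W[p, q] = sum_ij delta[i, j] * a[i + p, j + q].
direct = np.zeros((2, 2))
for p in range(2):
    for q in range(2):
        for i in range(2):
            for j in range(2):
                direct[p, q] += delta[i, j] * a_prev[i + p, j + q]

print(grad_w)
```

For these values the top-left entry is $1 \cdot 1 + 2 \cdot 2 + 3 \cdot 4 + 4 \cdot 5 = 37$, matching the pattern $\delta_{11} a_{11} + \delta_{12} a_{12} + \delta_{21} a_{21} + \delta_{22} a_{22}$.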
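3.2.3 Gradient of the bias b
When one bias value is shared by every position of the output feature map, each $z_{uv}$ depends on it with derivative 1, so its gradient is the sum of all entries of $\delta^{l}$:
$$
\frac{\partial J}{\partial b^{l}} = \sum_{u,v} (\delta^{l})_{uv} \tag{20}
$$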
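A short numerical check of formula (20), assuming a single scalar bias shared by all output positions, a sigmoid activation, and the squared-error loss summed over the output. The array `z_no_bias` stands in for the result of the convolution before the bias is added; all values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only.
z_no_bias = np.array([[0.2, -0.4], [1.0, 0.3]])  # conv output before bias
y = np.array([[1.0, 0.0], [0.0, 1.0]])           # target values
b = 0.1                                          # shared scalar bias

def loss(bias):
    return 0.5 * np.sum((sigmoid(z_no_bias + bias) - y) ** 2)

# Error term from formula (8), then formula (20): sum over all positions.
s = sigmoid(z_no_bias + b)
delta = (s - y) * s * (1.0 - s)
grad_b = delta.sum()

# Central finite difference as an independent check.
eps = 1e-6
numeric = (loss(b + eps) - loss(b - eps)) / (2 * eps)
print(grad_b, numeric)
```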
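4. Given the pooling layer's error, derive the previous layer's error (unpooling)
In a CNN the pooling layer rescales the matrix. In forward propagation it downsamples; in backpropagation we move in the opposite direction, so the error term has to be upsampled to fill the larger matrix.
A pooling layer has no activation function. There are two main variants: max pooling and average pooling. Denote the error term of the pooling layer by $\delta^{l}$.
4.1 Error term for average pooling
Suppose the pooling layer uses a $2 \times 2$ window, so that for example an $8 \times 8$ matrix is reduced to a $4 \times 4$ feature map. In backpropagation each error value is divided evenly among the four positions of its window. A smaller example, going from a $2 \times 2$ error term back to $4 \times 4$: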
$$
\text{Average pooling:}\quad \begin{bmatrix} 1 & 2 \\ 8 & 4 \end{bmatrix} \xrightarrow{\text{backpropagation}} \begin{bmatrix} 0.25 & 0.25 & 0.5 & 0.5 \\ 0.25 & 0.25 & 0.5 & 0.5 \\ 2 & 2 & 1 & 1 \\ 2 & 2 & 1 & 1 \end{bmatrix}
$$
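The upsampling for average pooling can be sketched with `np.kron`, which replaces each error value by a block filled with that value. The function name `avg_pool_backward` is an assumption, and the sketch assumes non-overlapping $k \times k$ windows.

```python
import numpy as np

def avg_pool_backward(delta, k):
    """Spread each error value evenly over its k x k pooling window."""
    # kron copies each entry of delta into a k x k block; dividing by
    # k * k gives every position an equal share of the error.
    return np.kron(delta, np.ones((k, k))) / (k * k)

delta = np.array([[1.0, 2.0], [8.0, 4.0]])   # error term from the example
upsampled = avg_pool_backward(delta, 2)
print(upsampled)
```

The entries of `upsampled` sum to the same total as `delta`, since each value is only redistributed over its window.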
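4.2 Error term for max pooling
In backpropagation through max pooling, each error value is placed at the position where the maximum was found during the forward pass, and every other position in the window receives 0. (The original position of each maximum therefore has to be recorded during the forward pooling pass.)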
$$
\text{Max pooling:}\quad \begin{bmatrix} 1 & 2 \\ 8 & 4 \end{bmatrix} \xrightarrow{\text{backpropagation}} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 4 & 0 \end{bmatrix}
$$
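The bookkeeping for max pooling can be sketched as follows. The function names are assumptions, and the input `x` is invented so that its window maxima fall at the same positions as in the example above; the article itself does not give the forward input.

```python
import numpy as np

def max_pool_forward(x, k):
    """Max-pool with k x k windows and record where each maximum came from."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    argmax = np.zeros((h // k, w // k, 2), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            p, q = np.unravel_index(np.argmax(window), window.shape)
            out[i, j] = window[p, q]
            argmax[i, j] = (i * k + p, j * k + q)  # position in x
    return out, argmax

def max_pool_backward(delta, argmax, in_shape):
    """Route each error value to the recorded position; zeros elsewhere."""
    grad = np.zeros(in_shape)
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            r, c = argmax[i, j]
            grad[r, c] += delta[i, j]
    return grad

# Hypothetical forward input whose maxima sit at (0,0), (1,2), (2,1), (3,2).
x = np.array([[9.0, 1.0, 0.0, 2.0],
              [3.0, 4.0, 7.0, 1.0],
              [2.0, 6.0, 0.0, 1.0],
              [1.0, 5.0, 8.0, 3.0]])
pooled, argmax = max_pool_forward(x, 2)

delta = np.array([[1.0, 2.0], [8.0, 4.0]])   # error term from the example
grad = max_pool_backward(delta, argmax, x.shape)
print(grad)
```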