Purpose: I have recently been writing optimization code that requires differentiating the variables of a function and computing their Jacobian matrices, which means taking derivatives with respect to vectors and matrices.

A column vector can be written as $Y=[y_1,y_2,\ldots,y_m]^T$.
The basic rules for vector derivatives fall into the following cases:
1) Derivative of a vector $Y=[y_1,y_2,\ldots,y_m]^T$ with respect to a scalar $x$:
$$\frac{\partial Y}{\partial x}=\begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix}$$
If $Y=[y_1,y_2,\ldots,y_m]$ is a row vector, the derivative is
$$\frac{\partial Y}{\partial x}=\begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \ldots & \frac{\partial y_m}{\partial x} \end{bmatrix}$$

2) Derivative of a scalar $y$ with respect to a vector $X=[x_1,x_2,\ldots,x_m]^T$:
$$\frac{\partial y}{\partial X}=\begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_m} \end{bmatrix}$$
If $X=[x_1,x_2,\ldots,x_m]$ is a row vector:
$$\frac{\partial y}{\partial X}=\begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \ldots & \frac{\partial y}{\partial x_m} \end{bmatrix}$$

3) Derivative of a vector $Y=[y_1,y_2,\ldots,y_m]^T$ with respect to a vector $X=[x_1,x_2,\ldots,x_n]$:
$$\frac{\partial Y}{\partial X}=\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \ldots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \ldots & \frac{\partial y_2}{\partial x_n} \\ \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \ldots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}$$
The derivative of a vector with respect to a vector is the so-called Jacobian matrix, which appears everywhere in optimization.
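As a quick sanity check on the row/column layout of the Jacobian defined above, it can be approximated by central differences. A minimal NumPy sketch; the helper `numerical_jacobian` and the sample map are illustrative choices, not from the text:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate dY/dX for a vector-valued f at x by central differences.
    Entry (i, j) holds d f_i / d x_j, matching the layout above."""
    y = f(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

# Illustrative map: Y = [x0*x1, sin(x0)] has an easy analytic Jacobian.
f = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
x = np.array([1.0, 2.0])
analytic = np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0]])
print(np.allclose(numerical_jacobian(f, x), analytic, atol=1e-6))  # True
```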

For the matrix case:
when $Y$ is a matrix, it is written as
$$Y=\begin{bmatrix} y_{11} & y_{12} & \ldots & y_{1n} \\ y_{21} & y_{22} & \ldots & y_{2n} \\ \vdots \\ y_{m1} & y_{m2} & \ldots & y_{mn} \end{bmatrix}$$
and when $X$ is a matrix,
$$X=\begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \end{bmatrix}$$

There are two kinds of matrix derivatives:
1) Derivative of a matrix $Y$ with respect to a scalar $x$:
$$\frac{\partial Y}{\partial x}=\begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \ldots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \ldots & \frac{\partial y_{2n}}{\partial x} \\ \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \ldots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}$$
2) Derivative of a scalar $y$ with respect to a matrix $X$:
$$\frac{\partial y}{\partial X}=\begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \ldots & \frac{\partial y}{\partial x_{1n}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \ldots & \frac{\partial y}{\partial x_{2n}} \\ \vdots \\ \frac{\partial y}{\partial x_{m1}} & \frac{\partial y}{\partial x_{m2}} & \ldots & \frac{\partial y}{\partial x_{mn}} \end{bmatrix}$$
These are the basic definitions of vector and matrix derivatives. From these definitions and a few elementary rules, we can derive some composite formulas that are very useful when programming geometric algorithms.
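The scalar-by-matrix layout in 2) above can also be verified entrywise with finite differences. A sketch assuming NumPy; the helper name and the sample function $y=\sum_{ij} x_{ij}^2$ (whose gradient is $2X$) are illustrative:

```python
import numpy as np

def numerical_grad_matrix(f, X, eps=1e-6):
    """Approximate dy/dX entrywise: G[i, j] = d f / d X[i, j]."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

X = np.array([[1.0, 2.0], [3.0, 4.0]])
f = lambda X: np.sum(X ** 2)   # y = sum of squares, so dy/dX = 2X
print(np.allclose(numerical_grad_matrix(f, X), 2 * X, atol=1e-6))  # True
```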

In practice a formula usually involves several vectors that depend on one another, so when differentiating we would like a chain rule analogous to the scalar one.
Suppose the dependency chain between the vectors is $U \rightarrow V \rightarrow W$.
Then the partial derivative is:
$$\frac{\partial W}{\partial U}=\frac{\partial W}{\partial V}\,\frac{\partial V}{\partial U}$$

Proof: simply expand and differentiate element by element:
$$\frac{\partial w_i}{\partial u_j} = \sum_{k}\frac{\partial w_i}{\partial v_k}\,\frac{\partial v_k}{\partial u_j} =\frac{\partial w_i}{\partial V}\,\frac{\partial V}{\partial u_j}$$
This shows that $\frac{\partial w_i}{\partial u_j}$ equals the inner product of the $i$-th row of the matrix $\frac{\partial W}{\partial V}$ with the $j$-th column of the matrix $\frac{\partial V}{\partial U}$, which is exactly the definition of matrix multiplication.
This extends easily to any number of intermediate variables.
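The chain rule $\frac{\partial W}{\partial U}=\frac{\partial W}{\partial V}\frac{\partial V}{\partial U}$ can be checked numerically for a concrete chain. A NumPy sketch; the maps ($V=AU$ linear, $W=\sin V$ elementwise) are hypothetical examples chosen so both Jacobians are easy to write down:

```python
import numpy as np

# Hypothetical chain U -> V -> W with V = A @ U and W = sin(V) elementwise.
A = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])  # V is 3-dim, U is 2-dim
U = np.array([0.3, -0.7])

V = A @ U
dV_dU = A                      # Jacobian of the linear map is just A
dW_dV = np.diag(np.cos(V))     # elementwise sin has a diagonal Jacobian
chain = dW_dV @ dV_dU          # dW/dU = (dW/dV)(dV/dU)

# Central-difference Jacobian of the composite map for comparison.
eps = 1e-6
num = np.zeros((3, 2))
for j in range(2):
    step = np.zeros(2); step[j] = eps
    num[:, j] = (np.sin(A @ (U + step)) - np.sin(A @ (U - step))) / (2 * eps)
print(np.allclose(chain, num, atol=1e-6))  # True
```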

A case that comes up constantly is when the final quantity $f$ is a real number while all the intermediate variables are vectors, with the dependency chain
$$X \rightarrow V \rightarrow U \rightarrow f$$
By the transitivity of the Jacobian we get:
$$\frac{\partial f}{\partial X} = \frac{\partial f}{\partial U}\,\frac{\partial U}{\partial V}\,\frac{\partial V}{\partial X}$$
Since $f$ is a scalar, this is written in the form
$$\frac{\partial f}{\partial X^T} = \frac{\partial f}{\partial U^T}\,\frac{\partial U}{\partial V}\,\frac{\partial V}{\partial X}$$
For the matrix products to be well defined, the scalar's derivative must be taken with respect to the row vectors $U^T$ and $X^T$, i.e. as a row vector. This point is very important.
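Written with row-vector gradients as above, the scalar chain can be verified numerically. A sketch assuming NumPy; the maps $V=CX$, $U=BV$ and the cost $f=\frac{1}{2}\|U\|^2$ (so $\frac{\partial f}{\partial U^T}=U^T$) are illustrative choices:

```python
import numpy as np

# Hypothetical chain X -> V -> U -> f with V = C @ X, U = B @ V,
# and f = 0.5 * ||U||^2, whose row gradient df/dU^T is U^T.
C = np.array([[1.0, -2.0], [0.5, 1.5]])
B = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, -1.0]])
X = np.array([0.4, -0.2])

V = C @ X
U = B @ V
row_grad = U @ B @ C   # df/dX^T = (df/dU^T)(dU/dV)(dV/dX), a row vector

# Central-difference gradient of the composite scalar for comparison.
eps = 1e-6
f = lambda X: 0.5 * np.dot(B @ (C @ X), B @ (C @ X))
num = np.array([(f(X + e) - f(X - e)) / (2 * eps)
                for e in np.eye(2) * eps])
print(np.allclose(row_grad, num, atol=1e-6))  # True
```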

Below are the most common operational formulas encountered when differentiating vectors; they come in two kinds.

1) Derivative of the dot product of two (column) vectors $U$, $V$ with respect to $W$:
$$\frac{\partial (U^T V)}{\partial W} = V^T \frac{\partial U}{\partial W} + U^T \frac{\partial V}{\partial W} \qquad (4)$$
Proof: let $U=\begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix}$ and $V=\begin{bmatrix} v_0 \\ v_1 \\ v_2 \end{bmatrix}$ be three-dimensional vectors. Their dot product $f=U^T V$ is the scalar $f=u_0v_0+u_1v_1+u_2v_2$; now differentiate it with respect to $W$:

$$\begin{aligned} \frac{\partial f}{\partial W}&=\frac{\partial (u_0v_0+u_1v_1+u_2v_2)}{\partial W} \\ &=\frac{\partial u_0}{\partial W}v_0 + \frac{\partial v_0}{\partial W}u_0 + \frac{\partial u_1}{\partial W}v_1 + \frac{\partial v_1}{\partial W}u_1 + \frac{\partial u_2}{\partial W}v_2 + \frac{\partial v_2}{\partial W}u_2 \\ &=\left(\frac{\partial u_0}{\partial W}v_0 + \frac{\partial u_1}{\partial W}v_1 + \frac{\partial u_2}{\partial W}v_2\right) + \left(\frac{\partial v_0}{\partial W}u_0 + \frac{\partial v_1}{\partial W}u_1 + \frac{\partial v_2}{\partial W}u_2\right) \\ &=V^T \frac{\partial U}{\partial W} + U^T \frac{\partial V}{\partial W} \end{aligned}$$

This generalizes to any dimension. QED.

If $W$ is a scalar, formula $(4)$ applies directly. But if $W$ is a vector, in the computation $W$ must be treated as a row vector, because by the definition of the Jacobian a column vector is differentiated with respect to a row vector. So when $W$ is a column vector (like $U$ and $V$), we usually write $W^T$ (a row vector), and in general formula $(4)$ is written as:
$$\frac{\partial (U^T V)}{\partial W^T} = V^T \frac{\partial U}{\partial W^T} + U^T \frac{\partial V}{\partial W^T}$$
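Formula (4) in its row-vector form can be checked numerically. A NumPy sketch, taking the hypothetical linear maps $U=AW$, $V=BW$ so that the Jacobians $\frac{\partial U}{\partial W^T}$, $\frac{\partial V}{\partial W^T}$ are just $A$ and $B$:

```python
import numpy as np

# Hypothetical linear maps: U = A @ W and V = B @ W.
A = np.array([[1.0, 2.0], [3.0, -1.0]])
B = np.array([[0.5, 0.0], [1.0, 2.0]])
W = np.array([0.7, -0.3])

U, V = A @ W, B @ W
rhs = V @ A + U @ B                 # V^T (dU/dW^T) + U^T (dV/dW^T)

# Central-difference gradient of f = U^T V for comparison.
eps = 1e-6
f = lambda W: np.dot(A @ W, B @ W)
num = np.array([(f(W + e) - f(W - e)) / (2 * eps)
                for e in np.eye(2) * eps])
print(np.allclose(rhs, num, atol=1e-6))  # True
```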

2) Derivative of the cross product of two (column) vectors $U$, $V$ with respect to $W$:
$$\frac{\partial (U \times V)}{\partial W} = -\mathrm{Skew}(V)\,\frac{\partial U}{\partial W} + \mathrm{Skew}(U)\,\frac{\partial V}{\partial W} \qquad (5)$$
where
$$\mathrm{Skew}(U) = \begin{bmatrix} 0 & -u_2 & u_1 \\ u_2 & 0 & -u_0 \\ -u_1 & u_0 & 0 \end{bmatrix}$$
$\mathrm{Skew}(U)$ is the matrix that turns a cross product into a matrix-vector product, $U \times V = \mathrm{Skew}(U)\,V$; this is easy to verify by simply expanding the matrix product.
When several cross products are chained together, it helps to restate the formula. Since the cross product is distributive over addition,
$$\frac{\partial (U \times V)}{\partial W} = \frac{\partial U}{\partial W} \times V + U \times \frac{\partial V}{\partial W} \qquad (6)$$
Formulas (5) and (6) express the same identity; only the notation differs, and the derivation below shows how one converts into the other.

Proof: let $U=\begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix}$ and $V=\begin{bmatrix} v_0 \\ v_1 \\ v_2 \end{bmatrix}$ be three-dimensional vectors. Then

$$U \times V = \begin{vmatrix} i & j & k \\ u_0 & u_1 & u_2 \\ v_0 & v_1 & v_2 \end{vmatrix} = (u_1v_2 - u_2v_1)\,i + (u_2v_0 - u_0v_2)\,j + (u_0v_1 - u_1v_0)\,k$$

This is a vector, so written out in components:

$$U \times V = \begin{bmatrix} u_1v_2 - u_2v_1 \\ u_2v_0 - u_0v_2 \\ u_0v_1 - u_1v_0 \end{bmatrix}$$

Differentiating component by component gives:

$$\begin{aligned} \frac{\partial (U \times V)}{\partial W} &= \begin{bmatrix} \frac{\partial (u_1v_2 - u_2v_1)}{\partial W} \\ \frac{\partial (u_2v_0 - u_0v_2)}{\partial W} \\ \frac{\partial (u_0v_1 - u_1v_0)}{\partial W} \end{bmatrix} = \frac{\partial (u_1v_2 - u_2v_1)}{\partial W} I + \frac{\partial (u_2v_0 - u_0v_2)}{\partial W} J + \frac{\partial (u_0v_1 - u_1v_0)}{\partial W} K \\ &= \left(\frac{\partial u_1}{\partial W}v_2+\frac{\partial v_2}{\partial W}u_1-\frac{\partial u_2}{\partial W}v_1-\frac{\partial v_1}{\partial W}u_2\right)I + \left(\frac{\partial u_2}{\partial W}v_0+\frac{\partial v_0}{\partial W}u_2-\frac{\partial u_0}{\partial W}v_2-\frac{\partial v_2}{\partial W}u_0\right)J + \left(\frac{\partial u_0}{\partial W}v_1+\frac{\partial v_1}{\partial W}u_0-\frac{\partial u_1}{\partial W}v_0-\frac{\partial v_0}{\partial W}u_1\right)K \\ &= \left[\left(\frac{\partial u_1}{\partial W}v_2 -\frac{\partial u_2}{\partial W}v_1\right)I + \left(\frac{\partial u_2}{\partial W}v_0 - \frac{\partial u_0}{\partial W}v_2\right)J + \left(\frac{\partial u_0}{\partial W}v_1 - \frac{\partial u_1}{\partial W}v_0\right)K\right] + \left[\left(\frac{\partial v_2}{\partial W}u_1 -\frac{\partial v_1}{\partial W}u_2\right)I + \left(\frac{\partial v_0}{\partial W}u_2 - \frac{\partial v_2}{\partial W}u_0\right)J + \left(\frac{\partial v_1}{\partial W}u_0 - \frac{\partial v_0}{\partial W}u_1\right)K\right] \\ &= \frac{\partial U}{\partial W} \times V + U \times \frac{\partial V}{\partial W} = -V \times \frac{\partial U}{\partial W} + U \times \frac{\partial V}{\partial W} = -\mathrm{Skew}(V)\,\frac{\partial U}{\partial W} + \mathrm{Skew}(U)\,\frac{\partial V}{\partial W} \end{aligned}$$

Here we used the fact that for any vectors $a$, $b$:
$$a \times b = -b \times a$$

This completes the proof for the three-dimensional case, which is where the cross product lives. QED.
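Formula (5) can likewise be verified numerically via skew-symmetric matrices. A sketch assuming NumPy; the linear maps $U=AW$, $V=BW$ and the `skew` helper are illustrative:

```python
import numpy as np

def skew(u):
    """Skew(U): the matrix with skew(u) @ v == np.cross(u, v)."""
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

# Hypothetical linear maps: U = A @ W and V = B @ W, with W a 3-vector.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
W = rng.standard_normal(3)

U, V = A @ W, B @ W
J = -skew(V) @ A + skew(U) @ B   # -Skew(V) dU/dW + Skew(U) dV/dW, formula (5)

# Central-difference Jacobian of f = U x V for comparison.
eps = 1e-6
f = lambda W: np.cross(A @ W, B @ W)
num = np.column_stack([(f(W + e) - f(W - e)) / (2 * eps)
                       for e in np.eye(3) * eps])
print(np.allclose(J, num, atol=1e-5))  # True
```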
