Derivatives of Vectors, and Derivatives of Vector Cross Products and Dot Products
Derivatives of vectors with respect to vectors, and derivatives of their cross products
Purpose: I have recently been writing optimization code that requires differentiating the variables of a function and computing their Jacobian matrices, which in turn requires derivatives of vectors and matrices.
A vector can be written as $Y=[y_1,y_2,...,y_m]^T$.
The basics of vector derivatives fall into the following cases:
1) Derivative of a vector $Y=[y_1,y_2,...,y_m]^T$ with respect to a scalar $x$:
$$\cfrac{\partial{Y}}{\partial{x}}=\begin{bmatrix} \cfrac{\partial{y_1}}{\partial{x}} \\ \cfrac{\partial{y_2}}{\partial{x}} \\ \vdots \\ \cfrac{\partial{y_m}}{\partial{x}} \end{bmatrix}$$
If $Y=[y_1,y_2,...,y_m]$ is a row vector, the derivative is
$$\cfrac{\partial{Y}}{\partial{x}}=\begin{bmatrix} \cfrac{\partial{y_1}}{\partial{x}} & \cfrac{\partial{y_2}}{\partial{x}} & \ldots & \cfrac{\partial{y_m}}{\partial{x}} \end{bmatrix}$$
2) Derivative of a scalar $y$ with respect to a vector $X=[x_1,x_2,...,x_m]^T$:
$$\cfrac{\partial{y}}{\partial{X}}=\begin{bmatrix} \cfrac{\partial{y}}{\partial{x_1}} \\ \cfrac{\partial{y}}{\partial{x_2}} \\ \vdots \\ \cfrac{\partial{y}}{\partial{x_m}} \end{bmatrix}$$
If $X=[x_1,x_2,...,x_m]$ is a row vector:
$$\cfrac{\partial{y}}{\partial{X}}=\begin{bmatrix} \cfrac{\partial{y}}{\partial{x_1}} & \cfrac{\partial{y}}{\partial{x_2}} & \ldots & \cfrac{\partial{y}}{\partial{x_m}} \end{bmatrix}$$
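As a quick sanity check on these layouts, here is a minimal numpy sketch that approximates the gradient of a scalar function with respect to a column vector by central differences. The helper name `numerical_gradient` and the test function $f(x)=x^T x$ are my own choices for illustration:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    # Central-difference gradient of scalar f at vector x,
    # returned in the same (column) layout as x, as in case 2).
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Example: f(x) = x^T x has gradient 2x.
x = np.array([1.0, 2.0, 3.0])
print(numerical_gradient(lambda v: v @ v, x))  # approximately [2. 4. 6.]
```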
3) Derivative of a vector $Y=[y_1,y_2,...,y_m]^T$ with respect to a vector $X=[x_1,x_2,...,x_n]$:
$$\cfrac{\partial{Y}}{\partial{X}}=\begin{bmatrix} \cfrac{\partial{y_1}}{\partial{x_1}} & \cfrac{\partial{y_1}}{\partial{x_2}} & \ldots & \cfrac{\partial{y_1}}{\partial{x_n}} \\ \cfrac{\partial{y_2}}{\partial{x_1}} & \cfrac{\partial{y_2}}{\partial{x_2}} & \ldots & \cfrac{\partial{y_2}}{\partial{x_n}} \\ \vdots \\ \cfrac{\partial{y_m}}{\partial{x_1}} & \cfrac{\partial{y_m}}{\partial{x_2}} & \ldots & \cfrac{\partial{y_m}}{\partial{x_n}} \end{bmatrix}$$
The derivative of a vector with respect to a vector is the so-called Jacobian matrix, which appears very frequently in optimization.
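Since the Jacobian is what the optimization code ultimately needs, a finite-difference version is handy both as a reference and for checking analytic derivatives. A minimal sketch, with an arbitrary sample function $F$ (the helper name `numerical_jacobian` is mine):

```python
import numpy as np

def numerical_jacobian(F, x, eps=1e-6):
    # Central-difference Jacobian of F: R^n -> R^m.
    # Row i, column j holds dF_i/dx_j, the layout defined in case 3).
    m = F(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[j] = eps
        J[:, j] = (F(x + step) - F(x - step)) / (2 * eps)
    return J

# Example: F(x) = (x0*x1, sin(x2)) has a 2x3 Jacobian.
x = np.array([1.0, 2.0, 0.5])
F = lambda v: np.array([v[0] * v[1], np.sin(v[2])])
print(numerical_jacobian(F, x))
# approximately [[2., 1., 0.], [0., 0., cos(0.5)]]
```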
When $Y$ is a matrix, it is written as:
$$Y=\begin{bmatrix} y_{11} & y_{12} & \ldots & y_{1n} \\ y_{21} & y_{22} & \ldots & y_{2n} \\ \vdots \\ y_{m1} & y_{m2} & \ldots & y_{mn} \end{bmatrix}$$
When $X$ is a matrix, it is written as:
$$X=\begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \end{bmatrix}$$
There are two kinds of matrix derivatives:
1) Derivative of a matrix $Y$ with respect to a scalar $x$:
$$\cfrac{\partial{Y}}{\partial{x}}=\begin{bmatrix} \cfrac{\partial{y_{11}}}{\partial{x}} & \cfrac{\partial{y_{12}}}{\partial{x}} & \ldots & \cfrac{\partial{y_{1n}}}{\partial{x}} \\ \cfrac{\partial{y_{21}}}{\partial{x}} & \cfrac{\partial{y_{22}}}{\partial{x}} & \ldots & \cfrac{\partial{y_{2n}}}{\partial{x}} \\ \vdots \\ \cfrac{\partial{y_{m1}}}{\partial{x}} & \cfrac{\partial{y_{m2}}}{\partial{x}} & \ldots & \cfrac{\partial{y_{mn}}}{\partial{x}} \end{bmatrix}$$
2) Derivative of a scalar $y$ with respect to a matrix $X$:
$$\cfrac{\partial{y}}{\partial{X}}=\begin{bmatrix} \cfrac{\partial{y}}{\partial{x_{11}}} & \cfrac{\partial{y}}{\partial{x_{12}}} & \ldots & \cfrac{\partial{y}}{\partial{x_{1n}}} \\ \cfrac{\partial{y}}{\partial{x_{21}}} & \cfrac{\partial{y}}{\partial{x_{22}}} & \ldots & \cfrac{\partial{y}}{\partial{x_{2n}}} \\ \vdots \\ \cfrac{\partial{y}}{\partial{x_{m1}}} & \cfrac{\partial{y}}{\partial{x_{m2}}} & \ldots & \cfrac{\partial{y}}{\partial{x_{mn}}} \end{bmatrix}$$
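The matrix-by-scalar case is just an element-wise derivative, which is easy to approximate numerically. A minimal sketch, using a $2 \times 2$ rotation matrix as an arbitrary example:

```python
import numpy as np

def matrix_derivative_wrt_scalar(Y, x, eps=1e-6):
    # Element-wise central difference of a matrix-valued function Y(x)
    # with respect to the scalar x, matching case 1) above.
    return (Y(x + eps) - Y(x - eps)) / (2 * eps)

# Example: the derivative of a 2x2 rotation matrix R(t) is R(t + pi/2),
# each entry differentiated in place.
R = lambda t: np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])
print(np.allclose(matrix_derivative_wrt_scalar(R, 0.3),
                  R(0.3 + np.pi / 2), atol=1e-5))  # True
```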
These are the basic definitions of vector and matrix derivatives. From them and a few elementary rules we can derive composite formulas that are very useful when programming geometric algorithms.
In practice a formula usually involves several mutually dependent vectors, so when differentiating we would like a chain rule analogous to the scalar one.
Suppose the vectors depend on one another as $U \rightarrow V \rightarrow W$ (that is, $V$ is a function of $U$ and $W$ is a function of $V$). Then the partial derivative satisfies:
$$\cfrac{\partial{W}}{\partial{U}}=\cfrac{\partial{W}}{\partial{V}}\,\cfrac{\partial{V}}{\partial{U}}$$
Proof: it suffices to differentiate element by element:
$$\cfrac{\partial{w_i}}{\partial{u_j}} = \sum_{k}\cfrac{\partial{w_i}}{\partial{v_k}}\,\cfrac{\partial{v_k}}{\partial{u_j}} =\cfrac{\partial{w_i}}{\partial{V}}\,\cfrac{\partial{V}}{\partial{u_j}}$$
This shows that $\cfrac{\partial{w_i}}{\partial{u_j}}$ equals the inner product of row $i$ of the matrix $\cfrac{\partial{W}}{\partial{V}}$ with column $j$ of the matrix $\cfrac{\partial{V}}{\partial{U}}$, which is exactly the definition of matrix multiplication.
This extends easily to any number of intermediate variables.
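The chain rule is easy to verify numerically. In this sketch the maps $V$ and $W$ are arbitrary illustrations, and `jac` is a compact central-difference Jacobian helper of my own (equivalent to the earlier sketch):

```python
import numpy as np

def jac(F, x, eps=1e-6):
    # Central-difference Jacobian: column j is dF/dx_j.
    return np.column_stack([(F(x + e) - F(x - e)) / (2 * eps)
                            for e in eps * np.eye(x.size)])

# Hypothetical maps in the chain U -> V -> W.
V = lambda u: np.array([u[0] * u[1], u[0] + u[1]])
W = lambda v: np.array([np.sin(v[0]), v[0] * v[1], v[1] ** 2])

u = np.array([0.7, 1.3])
# Product of Jacobians vs. Jacobian of the composition.
print(np.allclose(jac(W, V(u)) @ jac(V, u),
                  jac(lambda t: W(V(t)), u), atol=1e-5))  # True
```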
The case encountered most often is a function $f$ that is a real scalar while all the intermediate variables are vectors, with the dependency chain:
$X \rightarrow V \rightarrow U \rightarrow f$
Applying the chain rule for Jacobian matrices repeatedly gives:
$$\cfrac{\partial{F}}{\partial{X}} = \cfrac{\partial{F}}{\partial{U}}\,\cfrac{\partial{U}}{\partial{V}}\,\cfrac{\partial{V}}{\partial{X}}$$
Since $f$ is a scalar, this is written in the form:
$$\cfrac{\partial{f}}{\partial{X^T}} = \cfrac{\partial{f}}{\partial{U^T}}\,\cfrac{\partial{U}}{\partial{V}}\,\cfrac{\partial{V}}{\partial{X}}$$
To make the dimensions work out, the derivatives above must be taken with respect to the row vectors $U^T$ and $X^T$. This is very important.
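The row-vector convention can likewise be checked numerically. In this sketch (with arbitrary maps $V$, $U$ and a scalar $f$ of my own choosing), the $1 \times m$ row vector $\partial f/\partial U^T$ is propagated through the Jacobians to give the row gradient $\partial f/\partial X^T$:

```python
import numpy as np

def jac(F, x, eps=1e-6):
    # Central-difference Jacobian: column j is dF/dx_j.
    return np.column_stack([(F(x + e) - F(x - e)) / (2 * eps)
                            for e in eps * np.eye(x.size)])

# Hypothetical chain X -> V -> U -> f with f scalar.
V = lambda x: np.array([x[0] + x[1], x[1] * x[2]])
U = lambda v: np.array([v[0] * v[1], v[0] - v[1], v[1] ** 2])
f = lambda u: float(u @ u)

x = np.array([0.5, 1.0, 2.0])
v = V(x); u = U(v)

df_dUT = jac(lambda t: np.array([f(t)]), u)   # 1x3 row vector df/dU^T
row_grad = df_dUT @ jac(U, v) @ jac(V, x)     # 1x3 row vector df/dX^T
print(np.allclose(row_grad,
                  jac(lambda t: np.array([f(t)]), x), atol=1e-4))  # True
```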
Next are the operational formulas most often needed when differentiating vectors. They come in two kinds.
1) Derivative of the dot product of two (column) vectors $U$, $V$ with respect to $W$:
$$\cfrac{\partial{(U^T V)}}{\partial{W}} = V^T \cfrac{\partial{U}}{\partial{W}} + U^T \cfrac{\partial{V}}{\partial{W}} \qquad (4)$$
Proof: let $U=\begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix}$ and $V=\begin{bmatrix} v_0 \\ v_1 \\ v_2 \end{bmatrix}$ be three-dimensional vectors. Their dot product $f=U^T V$ is the scalar $f=u_0v_0+u_1v_1+u_2v_2$. Now differentiate it with respect to $W$:
$$\begin{aligned} \cfrac{\partial{f}}{\partial{W}} &= \cfrac{\partial{(u_0v_0+u_1v_1+u_2v_2)}}{\partial{W}} \\ &= \cfrac{\partial{u_0}}{\partial{W}}v_0 + \cfrac{\partial{v_0}}{\partial{W}}u_0 + \cfrac{\partial{u_1}}{\partial{W}}v_1 + \cfrac{\partial{v_1}}{\partial{W}}u_1 + \cfrac{\partial{u_2}}{\partial{W}}v_2 + \cfrac{\partial{v_2}}{\partial{W}}u_2 \\ &= \left(\cfrac{\partial{u_0}}{\partial{W}}v_0 + \cfrac{\partial{u_1}}{\partial{W}}v_1 + \cfrac{\partial{u_2}}{\partial{W}}v_2\right) + \left(\cfrac{\partial{v_0}}{\partial{W}}u_0 + \cfrac{\partial{v_1}}{\partial{W}}u_1 + \cfrac{\partial{v_2}}{\partial{W}}u_2\right) \\ &= V^T \cfrac{\partial{U}}{\partial{W}} + U^T \cfrac{\partial{V}}{\partial{W}} \end{aligned}$$
The same argument extends to any dimension. This completes the proof.
If $W$ is a scalar, formula (4) can be used directly. If $W$ is a vector, however, it must enter the computation as a row vector, because the Jacobian matrix is defined by differentiating a column vector with respect to a row vector. So when $W$ is a column vector (like $U$ and $V$), we differentiate with respect to $W^T$ (a row vector), and in the general case formula (4) is written as:
$$\cfrac{\partial{(U^T V)}}{\partial{W^T}} = V^T \cfrac{\partial{U}}{\partial{W^T}} + U^T \cfrac{\partial{V}}{\partial{W^T}}$$
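A quick numerical check of formula (4) in its row-vector form; $U(W)$ and $V(W)$ are arbitrary functions chosen for illustration, and `jac` is the same central-difference helper as before:

```python
import numpy as np

def jac(F, x, eps=1e-6):
    # Central-difference Jacobian: column j is dF/dx_j.
    return np.column_stack([(F(x + e) - F(x - e)) / (2 * eps)
                            for e in eps * np.eye(x.size)])

# Hypothetical vector functions U(W), V(W) of a parameter vector W.
U = lambda w: np.array([w[0] * w[1], np.sin(w[1]), w[0] ** 2])
V = lambda w: np.array([w[1], w[0] + w[1], np.cos(w[0])])

w = np.array([0.4, 1.1])
lhs = jac(lambda t: np.array([U(t) @ V(t)]), w)   # d(U^T V)/dW^T, 1x2
rhs = V(w) @ jac(U, w) + U(w) @ jac(V, w)         # formula (4)
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```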
2) Derivative of the cross product of two (column) vectors $U$, $V$ with respect to $W$:
$$\cfrac{\partial{(U \times V)}}{\partial{W}} = -Skew(V)\,\cfrac{\partial{U}}{\partial{W}} + Skew(U)\,\cfrac{\partial{V}}{\partial{W}} \qquad (5)$$
where, for $U=[u_0,u_1,u_2]^T$,
$$Skew(U) = \begin{bmatrix} 0 & -u_2 & u_1 \\ u_2 & 0 & -u_0 \\ -u_1 & u_0 & 0 \end{bmatrix}$$
Here $Skew(\cdot)$ is the matrix that turns a cross product into a matrix multiplication, $U \times V = Skew(U)\,V$, which is easy to verify by expanding the product.
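A minimal sketch of the $Skew$ matrix, using the 0-based indexing above; numpy's `cross` serves as the reference (the helper name `skew` is mine):

```python
import numpy as np

def skew(u):
    # Skew-symmetric matrix such that skew(u) @ v == np.cross(u, v),
    # with the 0-based component indexing used in the text.
    return np.array([[0.0,  -u[2],  u[1]],
                     [u[2],  0.0,  -u[0]],
                     [-u[1], u[0],  0.0]])

u = np.array([1.0, 2.0, 3.0])
v = np.array([-1.0, 0.5, 2.0])
print(np.allclose(skew(u) @ v, np.cross(u, v)))  # True
```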
When several cross products are chained together, it is convenient to transform the formula. Since the cross product is distributive over addition, the derivative also obeys a product rule:
$$\cfrac{\partial{(U \times V)}}{\partial{W}} = \cfrac{\partial{U}}{\partial{W}} \times V + U \times \cfrac{\partial{V}}{\partial{W}} \qquad (6)$$
Formulas (5) and (6) express the same result in different notation; converting between them uses the $Skew$ form of the cross product. The proof below covers both.
Proof: let $U=\begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix}$ and $V=\begin{bmatrix} v_0 \\ v_1 \\ v_2 \end{bmatrix}$ be three-dimensional vectors.
$$U \times V = \begin{vmatrix} i & j & k \\ u_0 & u_1 & u_2 \\ v_0 & v_1 & v_2 \end{vmatrix} = (u_1v_2 - u_2v_1)\,i + (u_2v_0 - u_0v_2)\,j + (u_0v_1 - u_1v_0)\,k$$
This is a vector, so written out in components:
$$U \times V = \begin{bmatrix} u_1v_2 - u_2v_1 \\ u_2v_0 - u_0v_2 \\ u_0v_1 - u_1v_0 \end{bmatrix}$$
Differentiating component by component and expanding with the product rule (writing $I$, $J$, $K$ for the standard basis vectors):
$$\begin{aligned} \cfrac{\partial{(U \times V)}}{\partial{W}} &= \begin{bmatrix} \cfrac{\partial{(u_1v_2 - u_2v_1)}}{\partial{W}} \\ \cfrac{\partial{(u_2v_0 - u_0v_2)}}{\partial{W}} \\ \cfrac{\partial{(u_0v_1 - u_1v_0)}}{\partial{W}} \end{bmatrix} = \cfrac{\partial{(u_1v_2 - u_2v_1)}}{\partial{W}} I + \cfrac{\partial{(u_2v_0 - u_0v_2)}}{\partial{W}} J + \cfrac{\partial{(u_0v_1 - u_1v_0)}}{\partial{W}} K \\ &= \left(\cfrac{\partial{u_1}}{\partial{W}}v_2+\cfrac{\partial{v_2}}{\partial{W}}u_1-\cfrac{\partial{u_2}}{\partial{W}}v_1-\cfrac{\partial{v_1}}{\partial{W}}u_2\right)I + \left(\cfrac{\partial{u_2}}{\partial{W}}v_0+\cfrac{\partial{v_0}}{\partial{W}}u_2-\cfrac{\partial{u_0}}{\partial{W}}v_2-\cfrac{\partial{v_2}}{\partial{W}}u_0\right)J + \left(\cfrac{\partial{u_0}}{\partial{W}}v_1+\cfrac{\partial{v_1}}{\partial{W}}u_0-\cfrac{\partial{u_1}}{\partial{W}}v_0-\cfrac{\partial{v_0}}{\partial{W}}u_1\right)K \\ &= \left[\left(\cfrac{\partial{u_1}}{\partial{W}}v_2 -\cfrac{\partial{u_2}}{\partial{W}}v_1\right)I + \left(\cfrac{\partial{u_2}}{\partial{W}}v_0 - \cfrac{\partial{u_0}}{\partial{W}}v_2\right)J + \left(\cfrac{\partial{u_0}}{\partial{W}}v_1 - \cfrac{\partial{u_1}}{\partial{W}}v_0\right)K\right] + \left[\left(\cfrac{\partial{v_2}}{\partial{W}}u_1 -\cfrac{\partial{v_1}}{\partial{W}}u_2\right)I + \left(\cfrac{\partial{v_0}}{\partial{W}}u_2 - \cfrac{\partial{v_2}}{\partial{W}}u_0\right)J + \left(\cfrac{\partial{v_1}}{\partial{W}}u_0 - \cfrac{\partial{v_0}}{\partial{W}}u_1\right)K\right] \\ &= \cfrac{\partial{U}}{\partial{W}} \times V - \cfrac{\partial{V}}{\partial{W}} \times U = -V \times \cfrac{\partial{U}}{\partial{W}} + U \times \cfrac{\partial{V}}{\partial{W}} = -Skew(V)\,\cfrac{\partial{U}}{\partial{W}} + Skew(U)\,\cfrac{\partial{V}}{\partial{W}} \end{aligned}$$
The last step uses the identity that, for any vectors $a$ and $b$:
$$a \times b = -b \times a$$
Note that, unlike the dot-product formula, this result is specific to three dimensions, where the cross product is defined. This completes the proof.
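Finally, formula (5) itself can be checked numerically; as before, $U(W)$ and $V(W)$ are arbitrary 3-vector functions of a parameter vector $W$, and `jac` and `skew` are the helpers from the earlier sketches:

```python
import numpy as np

def jac(F, x, eps=1e-6):
    # Central-difference Jacobian: column j is dF/dx_j.
    return np.column_stack([(F(x + e) - F(x - e)) / (2 * eps)
                            for e in eps * np.eye(x.size)])

def skew(u):
    # Matrix form of the cross product: skew(u) @ v == np.cross(u, v).
    return np.array([[0.0,  -u[2],  u[1]],
                     [u[2],  0.0,  -u[0]],
                     [-u[1], u[0],  0.0]])

# Hypothetical 3-vector functions U(W), V(W) of a parameter vector W.
U = lambda w: np.array([w[0] * w[1], np.sin(w[0]), w[1] ** 2])
V = lambda w: np.array([w[1], np.cos(w[1]), w[0] + w[1]])

w = np.array([0.8, -0.3])
lhs = jac(lambda t: np.cross(U(t), V(t)), w)              # 3x2 Jacobian
rhs = -skew(V(w)) @ jac(U, w) + skew(U(w)) @ jac(V, w)    # formula (5)
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```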