一 Hessian矩阵
实值函数f(x)相对于n×1n×1<script type="math/tex" id="MathJax-Element-48">n\times 1</script>实向量x的二阶偏导是一个由m2m2<script type="math/tex" id="MathJax-Element-49">m^2</script>个二阶偏导组成的矩阵(称为Hessian矩阵),定义为:

2f(x)xxT=xT[f(x)x]∂2f(x)∂x∂xT=∂∂xT[∂f(x)∂x]
<script type="math/tex; mode=display" id="MathJax-Element-50">{\partial ^2f(x)\over \partial x \partial x^T}={\partial \over \partial x^T}[{\partial f(x)\over \partial x}]</script>
或者简写为梯度的梯度:
2xf(x)=x(xf(x))∇x2f(x)=∇x(∇xf(x))
<script type="math/tex; mode=display" id="MathJax-Element-51">\nabla_x^2f(x)=\nabla_x(\nabla_xf(x))</script>
根据定义,Hessian矩阵的第i行第j列是梯度f(x)xi=xif(x)∂f(x)∂xi=∇xif(x)<script type="math/tex" id="MathJax-Element-52">{\partial f(x)\over \partial {x_i}}=\nabla_{x_i}f(x)</script>第j个分量的梯度,即:
[2f(x)xxT]i,j=2f(x)xixj[∂2f(x)∂x∂xT]i,j=∂2f(x)∂xi∂xj
<script type="math/tex; mode=display" id="MathJax-Element-53">[{\partial ^2f(x)\over \partial x \partial x^T}]_{i,j}={\partial ^2f(x)\over \partial x_i \partial x_j}</script>,
或者写做:
2f(x)xxT=2f(x)x1x12f(x)x2x12f(x)xnx12f(x)x1x22f(x)x2x22f(x)xnx22f(x)x1xn2f(x)x2xn2f(x)xnxn∂2f(x)∂x∂xT=[∂2f(x)∂x1∂x1∂2f(x)∂x1∂x2⋯∂2f(x)∂x1∂xn∂2f(x)∂x2∂x1∂2f(x)∂x2∂x2⋯∂2f(x)∂x2∂xn⋮⋮⋱⋮∂2f(x)∂xn∂x1∂2f(x)∂xn∂x2⋯∂2f(x)∂xn∂xn]
<script type="math/tex; mode=display" id="MathJax-Element-54">{\partial ^2f(x)\over \partial x \partial x^T}= \begin{bmatrix} {\partial ^2f(x)\over \partial x_1 \partial x_1} & {\partial ^2f(x)\over \partial x_1 \partial x_2} & \cdots & {\partial ^2f(x)\over \partial x_1 \partial x_n} \\ {\partial ^2f(x)\over \partial x_2 \partial x_1} & {\partial ^2f(x)\over \partial x_2 \partial x_2} & \cdots & {\partial ^2f(x)\over \partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ {\partial ^2f(x)\over \partial x_n \partial x_1} & {\partial ^2f(x)\over \partial x_n \partial x_2} & \cdots & {\partial ^2f(x)\over \partial x_n \partial x_n} \\ \end{bmatrix}</script>
因此,Hessian矩阵可以用两步法求出来:
(1)求实值函数f(x)关于向量变元x的偏导数,得到实值函数的梯度f(x)x∂f(x)∂x;<script type="math/tex" id="MathJax-Element-55">{\partial f(x)\over \partial {x}};</script>
(2)再求梯度f(x)x∂f(x)∂x<script type="math/tex" id="MathJax-Element-56">{\partial f(x)\over \partial {x}}</script>相对于1×n1×n<script type="math/tex" id="MathJax-Element-57">1\times n</script>行向量xTxT<script type="math/tex" id="MathJax-Element-58">x^T</script>的偏导数,得到梯度的梯度即Hessian矩阵。

二 局部极小点的条件
根据定义确定某个点xx∗<script type="math/tex" id="MathJax-Element-59">x_*</script>是否为目标函数的局部极小点,需要将目标函数在该点的取值与函数在该点领域里所有点的取值进行比较。这显然是不实际的做法。然而,如果f(x)是二次连续可微分的话,直接通过检验梯度xf(x)∇xf(x∗)<script type="math/tex" id="MathJax-Element-60">\nabla _xf(x_*)</script>和Hessian矩阵2xf(x)∇x2f(x∗)<script type="math/tex" id="MathJax-Element-61">\nabla _x^2f(x_*)</script>, 即可判断点xx∗<script type="math/tex" id="MathJax-Element-62">x_*</script>是否为局部极小点(甚至是严格局部极小点)。
(Δx)TΔx(Δx)TΔx<script type="math/tex" id="MathJax-Element-63">(\Delta x)^T\Delta x</script>很小, 即函数f(x)的二阶Taylor级数展开为:

f(x+Δx)=f(x)+(Δx)Txf(x)+12(Δx)T2xf(x)Δxf(x+Δx)=f(x)+(Δx)T∇xf(x)+12(Δx)T∇x2f(x)Δx
<script type="math/tex; mode=display" id="MathJax-Element-64">f(x+\Delta x)=f(x)+(\Delta x)^T\nabla_xf(x)+{1\over 2}(\Delta x)^T\nabla_x^2f(x)\Delta x</script>
关于判断一个局部极小点的一阶必要条件和一阶充分条件,请参考《矩阵分析与应用》270页(张贤达著),下面主要讲解其二阶充分条件:
定理:假设2xf(x)∇x2f(x)<script type="math/tex" id="MathJax-Element-65">\nabla_x^2f(x)</script>在xx∗<script type="math/tex" id="MathJax-Element-66">x_*</script>的开邻域内连续,并且
xf(x)=0, 2xf(x)>0∇xf(x∗)=0, ∇x2f(x∗)>0
<script type="math/tex; mode=display" id="MathJax-Element-67">\nabla_xf(x_*)=0, \ \nabla_x^2f(x_*)>0</script>
xx∗<script type="math/tex" id="MathJax-Element-68">x_*</script>是函数f(x)的严格局部极小点。式中2xf(x)>0∇x2f(x∗)>0<script type="math/tex" id="MathJax-Element-69">\nabla_x^2f(x_*)>0</script>表示Hessian矩阵2xf(x)∇x2f(x∗)<script type="math/tex" id="MathJax-Element-70">\nabla_x^2f(x_*)</script>正定。(具体即(Δx)T2xf(x)Δx>0(Δx)T∇x2f(x)Δx>0<script type="math/tex" id="MathJax-Element-71">(\Delta x)^T\nabla_x^2f(x)\Delta x>0</script>)
证明:由函数f(x)的二阶Taylor级数展开f(x+Δx)=f(x)+(Δx)Txf(x)+12(Δx)T2xf(x)Δxf(x∗+Δx)=f(x∗)+(Δx)T∇xf(x∗)+12(Δx)T∇x2f(x∗)Δx<script type="math/tex" id="MathJax-Element-72">f(x_*+\Delta x)=f(x_*)+(\Delta x)^T\nabla_xf(x_*)+{1\over 2}(\Delta x)^T\nabla_x^2f(x_*)\Delta x</script>,且xf(x)=0, (Δx)T2xf(x)Δx>0∇xf(x∗)=0, (Δx)T∇x2f(x∗)Δx>0<script type="math/tex" id="MathJax-Element-73">\nabla_xf(x_*)=0, \ (\Delta x)^T\nabla_x^2f(x_*)\Delta x>0</script>可得:f(x+Δx)>f(x)f(x∗+Δx)>f(x∗)<script type="math/tex" id="MathJax-Element-74">f(x_*+\Delta x)>f(x_*)</script>,所以xx∗<script type="math/tex" id="MathJax-Element-75">x_*</script>是函数f(x)的严格局部极小点。
应当注意的是,该二阶充分条件并不是必要条件:有的点xx∗<script type="math/tex" id="MathJax-Element-76">x_*</script>可能是函数f(x)的严格局部极小点,但是在该点的Hessian矩阵却不是正定的。例如,观察知,点x=0x=0<script type="math/tex" id="MathJax-Element-77">x=0</script>是函数f(x)=(xTx)2f(x)=(xTx)2<script type="math/tex" id="MathJax-Element-78">f(x)=(x^Tx)^2</script>的严格局部极小点,但是Hessian矩阵
2f(x)xxT=2xxT(xTx)2=12xTx∂2f(x)∂x∂xT=∂2∂x∂xT(xTx)2=12xTx
<script type="math/tex; mode=display" id="MathJax-Element-79">{\partial ^2f(x)\over \partial x \partial x^T}={\partial ^2\over \partial x \partial x^T}(x^Tx)^2=12x^Tx</script>
在严格局部极小点x=0x=0<script type="math/tex" id="MathJax-Element-80">x=0</script>处为零矩阵,不是正定矩阵。

定理:凸函数f(x)的任何局部极小点xx∗<script type="math/tex" id="MathJax-Element-81">x_*</script>都是该函数的一个全局极小点。
证明:假设xx∗<script type="math/tex" id="MathJax-Element-82">x_*</script>是局部极小点,但不是一个全局极小点。于是,可以求出一点zRz∈R<script type="math/tex" id="MathJax-Element-83">z\in R</script>满足f(z)<f(x)f(z)<f(x∗)<script type="math/tex" id="MathJax-Element-84">f(z)

x=λz+(1λ)x,  λ(0,1]x=λz+(1−λ)x∗,  λ∈(0,1]
<script type="math/tex; mode=display" id="MathJax-Element-88">x=\lambda z+(1-\lambda)x_*, \ \ \lambda \in (0,1]</script>
根据凸函数的性质,有
f(x)λf(z)+(1λ)f(x)<f(x)f(x)≤λf(z)+(1−λ)f(x∗)<f(x∗)
<script type="math/tex; mode=display" id="MathJax-Element-89">f(x)\le \lambda f(z)+(1-\lambda)f(x_*) 则当xx<script type="math/tex" id="MathJax-Element-90">x</script>趋近于 x <script type="math/tex" id="MathJax-Element-91">x_*</script>时,有f(x)<f(x)f(x)<f(x∗)<script type="math/tex" id="MathJax-Element-92">f(x)
Logo

有“AI”的1024 = 2048,欢迎大家加入2048 AI社区

更多推荐