【OR】约束优化:二阶充分条件
约束优化:二阶充分条件
二阶充分条件
当问题 ( P ) (P) (P)中约束规范
(1) f ( x ) , g i ( x ) , i = 1 , 2 , … , m f(x), g_i(x), i=1, 2,\dots, m f(x),gi(x),i=1,2,…,m均为凸函数
(2) h i ( x ) , i = 1 , … , I h_i(x),i=1,\dots, I hi(x),i=1,…,I为线性函数
满足KKT条件的点是 ( P ) (P) (P)的最优解.
考虑如果 ( 1 ) (1) (1)或者 ( 2 ) (2) (2)无法满足时,最优解的条件,需要使用二阶信息.
设 x ∗ x^* x∗满足KKT条件
{ ∇ f ( x ∗ ) + ∑ λ i ∇ g i ( x ∗ ) + ∑ μ i ∇ h i ( x ∗ ) = 0 λ i ≥ 0 λ i g i ( x ∗ ) = 0 \left\{ \begin{aligned} &\nabla f(x^*)+\sum\lambda_i\nabla g_i(x^*)+\sum\mu_i\nabla h_i(x^*)=0\\ &\lambda_i\geq 0\\ &\lambda_ig_i(x^*)=0 \end{aligned} \right. ⎩⎪⎪⎨⎪⎪⎧∇f(x∗)+∑λi∇gi(x∗)+∑μi∇hi(x∗)=0λi≥0λigi(x∗)=0
建立拉格朗日函数 L ( x ) L(x) L(x), 令
L ( x ) = f ( x ) + ∑ i λ i g i ( x ) + ∑ i μ i h i ( x ) L(x)=f(x)+\sum_i\lambda_ig_i(x)+\sum_i\mu_ih_i(x) L(x)=f(x)+i∑λigi(x)+i∑μihi(x)
可以得到
∇ L ( x ∗ ) = ∇ f ( x ∗ ) + ∑ λ i ∇ g i ( x ∗ ) + ∑ μ i ∇ h i ( x ∗ ) = 0 L ( x ∗ ) = f ( x ∗ ) + ∑ λ i g i ( x ∗ ) + ∑ μ i h i ( x ∗ ) = f ( x ∗ ) ∀ x ∈ S , L ( x ) = f ( x ) + ∑ λ i g i ( x ) ⏟ ≤ 0 + ∑ μ i h i ( x ) ⏟ = 0 ⇒ L ( x ) ≤ f ( x ) \begin{aligned} &\nabla L(x^*)=\nabla f(x^*)+\sum\lambda_i\nabla g_i(x^*)+\sum\mu_i\nabla h_i(x^*)=0\\ &L(x^*)=f(x^*)+\sum\lambda_ig_i(x^*)+\sum\mu_ih_i(x^*)=f(x^*)\\ &\forall x\in S, L(x)=f(x)+\sum\underbrace{\lambda_ig_i(x)}_{\leq 0}+\sum\underbrace{\mu_ih_i(x)}_{=0}\Rightarrow L(x)\leq f(x) \end{aligned} ∇L(x∗)=∇f(x∗)+∑λi∇gi(x∗)+∑μi∇hi(x∗)=0L(x∗)=f(x∗)+∑λigi(x∗)+∑μihi(x∗)=f(x∗)∀x∈S,L(x)=f(x)+∑≤0
λigi(x)+∑=0
μihi(x)⇒L(x)≤f(x)
可以知道,如果 x ∗ x^* x∗是 L ( x ) L(x) L(x)的最优解,则 x ∗ x^* x∗是 ( P ) (P) (P)的最优解.
1.设 x ∗ x^* x∗是KKT点, λ , μ , L ( x ) , ∇ L ( x ∗ ) = 0 \lambda, \mu, L(x),\nabla L(x^*)=0 λ,μ,L(x),∇L(x∗)=0
(1). 如果 ∇ 2 L ( x ) ⪰ 0 , ∀ x ∈ S \nabla^2L(x)\succeq 0, \forall x\in S ∇2L(x)⪰0,∀x∈S,则 x ∗ x^* x∗是 L ( x ) L(x) L(x)的全局最优解,也是 ( P ) (P) (P)的全局最优解.
(2). 如果 ∇ 2 L ( x ) ⪰ 0 , ∀ x ∈ S ∩ N s ( x ∗ ) \nabla^2L(x)\succeq 0, \forall x\in S\cap N_s(x^*) ∇2L(x)⪰0,∀x∈S∩Ns(x∗),则 x ∗ x^* x∗是局部最优解.
(3). 如果 ∇ 2 L ( x ) ≻ 0 \nabla^2L(x)\succ 0 ∇2L(x)≻0, 则 x ∗ x^* x∗是 ( P ) (P) (P)问题的严格局部最优解, d T ∇ 2 L ( x ∗ ) d > 0 , ∀ d ≠ 0 d^T\nabla^2L(x^*)d>0, \forall d\neq 0 dT∇2L(x∗)d>0,∀d=0 任何一个方向都会使函数值上升.
设 d T ∇ 2 L ( x ∗ ) d > 0 , ∀ d ∈ F 1 ( x ∗ ) d^T\nabla^2L(x^*)d>0, \forall d\in F_1(x^*) dT∇2L(x∗)d>0,∀d∈F1(x∗)
F 1 ( x ∗ ) = { d ∣ { ∇ g i ( x ∗ ) T d ≤ 0 , i ∈ I ∇ h i ( x ∗ ) T d = 0 , i = 1 , 2 , … , l } F_1(x^*)=\left\{ d\Bigg| \begin{cases} \nabla g_i(x^*)^Td\leq 0, i\in I\\ \nabla h_i(x^*)^Td=0, i=1, 2, \dots, l \end{cases} \right\} F1(x∗)={d∣∣∣∣∣{∇gi(x∗)Td≤0,i∈I∇hi(x∗)Td=0,i=1,2,…,l}
F 1 ( x ∗ ) F_1(x^*) F1(x∗)中目标函数值会上升的方向需要去除,考虑 λ i \lambda_i λi,根据敏感系数 λ i \lambda_i λi的定义,在 λ i > 0 \lambda_i>0 λi>0的情况下, g i ( x ) ≤ 0 , g i ( x ∗ ) = 0 g_i(x)\leq 0, g_i(x^*)=0 gi(x)≤0,gi(x∗)=0,所以在 x ∗ → x x^*\to x x∗→x方向上有
∇ g i ( x ∗ ) T d < 0 \nabla g_i(x^*)^Td<0 ∇gi(x∗)Td<0
令
I + = { i ∣ λ i > 0 , i ∈ I } I 0 = { i ∣ λ i = 0 , i ∈ I } \begin{aligned} I^+=\{i\mid \lambda_i>0, i\in I\}\\ I^0=\{i\mid \lambda_i=0, i\in I\} \end{aligned} I+={i∣λi>0,i∈I}I0={i∣λi=0,i∈I}
子集合 F 2 ( x ) F_2(x) F2(x)
F 2 ( x ∗ ) = { d ∣ ∇ g i ( x ∗ ) ≤ 0 , i ∈ I 0 g i ( x ∗ ) T d = 0 , i ∈ I + ∇ h i ( x ∗ ) T d = 0 , i = 1 , 2 … , l } ⊂ F 1 ( x ∗ ) F_2(x^*)=\left\{d\Bigg| \begin{aligned} &\nabla g_i(x^*)\leq 0, i\in I^0\\ &g_i(x^*)^Td=0, i\in I^+\\ &\nabla h_i(x^*)^Td=0, i=1, 2\dots, l \end{aligned} \right\}\subset F_1(x^*) F2(x∗)=⎩⎪⎨⎪⎧d∣∣∣∣∣∇gi(x∗)≤0,i∈I0gi(x∗)Td=0,i∈I+∇hi(x∗)Td=0,i=1,2…,l⎭⎪⎬⎪⎫⊂F1(x∗)
定理
假设 x ∗ x^* x∗满足KKT条件, λ , μ , L ( x ) = f ( x ) + ∑ λ i g i ( x ) + μ i h i ( x ) \lambda, \mu, L(x)=f(x)+\sum\lambda_ig_i(x)+\mu_ih_i(x) λ,μ,L(x)=f(x)+∑λigi(x)+μihi(x)可以知道 ∇ L ( x ∗ ) = 0 \nabla L(x^*)=0 ∇L(x∗)=0, 已知 d T ∇ 2 L ( x ∗ ) d > 0 , ∀ d ∈ F 2 ( x ∗ ) d^T\nabla^2L(x^*)d>0, \forall d\in F_2(x^*) dT∇2L(x∗)d>0,∀d∈F2(x∗).
则 x ∗ x^* x∗是 ( P ) (P) (P)问题的严格局部最优解.
证明:
使用反证法,设 x ∗ x^* x∗不是严格局部最优解,则存在点列 x k → x ∗ , x k ∈ S , f ( x k ) ≤ f ( x ∗ ) x_k\to x^*, x_k\in S, f(x_k)\leq f(x^*) xk→x∗,xk∈S,f(xk)≤f(x∗),如果可以找打方向 d ∈ F 2 d\in F_2 d∈F2,但是 d T ∇ 2 L ( x ∗ ) d ≤ 0 d^T\nabla^2L(x^*)d\leq 0 dT∇2L(x∗)d≤0则定理不成立.
记
d k = x k − x ∗ ∥ x k − x ∗ ∥ α k = ∥ x k − x ∗ ∥ d_k=\frac{x_k-x^*}{\lVert x_k-x^*\rVert}\\ \alpha_k=\lVert x_k-x^*\rVert dk=∥xk−x∗∥xk−x∗αk=∥xk−x∗∥
显然, { α k } → 0 \{\alpha_k\}\to 0 {αk}→0且 { d k } \{d_k\} {dk}有界, x k = x ∗ + α k d k x_k=x^*+\alpha_kd_k xk=x∗+αkdk
计算
f ( x k ) − f ( x ∗ ) = α k ∇ f ( x ∗ ) T d k + α k 2 2 d k T ∇ 2 f ( x ∗ ) d k + o ( α k 2 ) ≤ 0 ( 1 ) g i ( x k ) − g i ( x ∗ ) = α k ∇ g i ( x ∗ ) T d k + α k 2 2 d k T ∇ 2 g i ( x ∗ ) d k + o ( α k 2 ) ≤ 0 , i ∈ I ( 2 ) h i ( x k ) − h i ( x ∗ ) = α k ∇ h i ( x ∗ ) d k + α k 2 2 d k T ∇ 2 h i ( x ∗ ) d k + o ( α k 2 ) = 0 , i = 1 , 2 … , l ( 3 ) \begin{aligned} &f(x_k)-f(x^*)=\alpha_k\nabla f(x^*)^Td_k+\frac{\alpha_k^2}{2}d_k^T\nabla^2f(x^*)d_k+\mathcal{o}(\alpha_k^2)\leq 0 \quad (1)\\ &g_i(x_k)-g_i(x^*)=\alpha_k\nabla g_i(x^*)^Td_k+\frac{\alpha_k^2}{2}d_k^T\nabla^2g_i(x^*)d_k+\mathcal{o}(\alpha_k^2)\leq 0, i\in I \quad (2)\\ &h_i(x_k)-h_i(x^*)=\alpha_k\nabla h_i(x^*)d_k+\frac{\alpha_k^2}{2}d_k^T\nabla^2h_i(x^*)d_k+\mathcal{o}(\alpha_k^2)= 0, i=1, 2\dots, l \quad (3) \end{aligned} f(xk)−f(x∗)=αk∇f(x∗)Tdk+2αk2dkT∇2f(x∗)dk+o(αk2)≤0(1)gi(xk)−gi(x∗)=αk∇gi(x∗)Tdk+2αk2dkT∇2gi(x∗)dk+o(αk2)≤0,i∈I(2)hi(xk)−hi(x∗)=αk∇hi(x∗)dk+2αk2dkT∇2hi(x∗)dk+o(αk2)=0,i=1,2…,l(3)
对方程(1),除以 α k \alpha_k αk,令 k → ∞ k\to \infty k→∞,得到
∇ f ( x ∗ ) T d k ≤ 0 \nabla f(x^*)^Td_k\leq 0 ∇f(x∗)Tdk≤0
对方程(2),除以 α k \alpha_k αk,令 k → ∞ k\to \infty k→∞,得到
∇ g i ( x ∗ ) T d k ≤ 0 \nabla g_i(x^*)^Td_k\leq 0 ∇gi(x∗)Tdk≤0
对方程(3),除以 α k \alpha_k αk,令 k → ∞ k\to \infty k→∞,得到
∇ h i ( x ∗ ) T d k = 0 \nabla h_i(x^*)^Td_k=0 ∇hi(x∗)Tdk=0
由KKT条件(两边同时乘以 d d d)可以得到
∇ f ( x ∗ ) T d ⏟ ≤ 0 + ∑ λ i ∇ g i ( x ∗ ) T d ⏟ ≤ 0 + ∑ μ i ∇ h i ( x ∗ ) T d ⏟ = 0 = 0 \underbrace{\nabla f(x^*)^Td}_{\leq 0}+\sum\underbrace{\lambda_i\nabla g_i(x^*)^Td}_{\leq 0}+\sum\underbrace{\mu_i\nabla h_i(x^*)^Td}_{=0}=0 ≤0
∇f(x∗)Td+∑≤0
λi∇gi(x∗)Td+∑=0
μi∇hi(x∗)Td=0
所以可以得到
∇ f ( x ∗ ) T d = 0 (3) \nabla f(x^*)^Td=0 \tag{3}\\ ∇f(x∗)Td=0(3)
∑ λ i ∇ g i ( x ∗ ) T d = 0 (4) \sum\lambda_i\nabla g_i(x^*)^Td=0 \tag{4} ∑λi∇gi(x∗)Td=0(4)
由方程 ( 4 ) (4) (4)可以得到
∇ g i ( x ∗ ) T d = 0 , i ∈ I + ∇ g i ( x ∗ ) T d ≤ 0 , i ∈ I 0 ∇ h i ( x ∗ ) T d = 0 \begin{aligned} &\nabla g_i(x^*)^Td=0, i\in I^+\\ &\nabla g_i(x^*)^Td\leq 0, i \in I^0 \\ &\nabla h_i(x^*)^Td=0 \end{aligned} ∇gi(x∗)Td=0,i∈I+∇gi(x∗)Td≤0,i∈I0∇hi(x∗)Td=0
即 d ∈ F 2 ( x ∗ ) d\in F_2(x^*) d∈F2(x∗)
计算 ( 1 ) + ∑ λ i ∗ ( 2 ) + ∑ μ i ∗ ( 3 ) (1)+\sum \lambda_i *(2)+\sum \mu_i *(3) (1)+∑λi∗(2)+∑μi∗(3)
α k ∇ f ( x ∗ ) T d k + ∑ λ i ∇ g i ( x ∗ ) T d k + ∑ μ i ∇ h i ( x ∗ ) T d k + α k 2 2 ( d k T ∇ 2 f ( x k ∗ ) d k + ∑ λ i d k T ∇ 2 g i ( x ∗ ) d k + ∑ μ i d k T ∇ 2 h i ( x ∗ ) d k ) + o ( α k 2 ) ≤ 0 \begin{aligned} &\alpha_k\nabla f(x^*)^Td_k+\sum \lambda_i\nabla g_i(x^*)^Td_k+\sum \mu_i\nabla h_i(x^*)^Td_k+\\ &\frac{\alpha_k^2}{2}(d_k^T\nabla^2 f(x_k^*)d_k+\sum\lambda_id_k^T\nabla^2g_i(x^*)d_k+\sum\mu_i d_k^T\nabla^2 h_i(x^*)d_k)+\mathcal{o}(\alpha_k^2)\leq 0 \end{aligned} αk∇f(x∗)Tdk+∑λi∇gi(x∗)Tdk+∑μi∇hi(x∗)Tdk+2αk2(dkT∇2f(xk∗)dk+∑λidkT∇2gi(x∗)dk+∑μidkT∇2hi(x∗)dk)+o(αk2)≤0
整理可得
α k ( ∇ f ( x ∗ ) + ∑ λ i ∇ g i ( x ∗ ) + ∑ μ i ∇ h i ( x ∗ ) ) T d k ⏟ = 0 + α k 2 2 d k T ∇ 2 L ( x ∗ ) d k + o ( α k 2 ) ≤ 0 \begin{aligned} &\underbrace{\alpha_k(\nabla f(x^*)+\sum \lambda_i\nabla g_i(x^*)+\sum\mu_i \nabla h_i(x^*))^Td_k}_{=0}+\\ &\frac{\alpha_k^2}{2}d_k^T\nabla^2L(x^*)d_k+\mathcal{o}(\alpha_k^2)\leq 0 \end{aligned} =0
αk(∇f(x∗)+∑λi∇gi(x∗)+∑μi∇hi(x∗))Tdk+2αk2dkT∇2L(x∗)dk+o(αk2)≤0
即
1 2 d k T ∇ 2 L ( x ∗ ) d k + o ( α k 2 ) α k 2 ≤ 0 \frac{1}{2}d_k^T\nabla^2L(x^*)d_k+\frac{\mathcal{o}(\alpha_k^2)}{\alpha_k^2}\leq 0 21dkT∇2L(x∗)dk+αk2o(αk2)≤0
令 k → ∞ k\to\infty k→∞,可知 ∃ d ∈ F 2 ( x ∗ ) \exist d\in F_2(x^*) ∃d∈F2(x∗),使 d T ∇ 2 L ( x ∗ ) d ≤ 0 d^T\nabla^2L(x^*)d\leq 0 dT∇2L(x∗)d≤0.与已知条件矛盾.
综上, x ∗ x^* x∗是 ( P ) (P) (P)的严格局部最优解.
参考资料
更多推荐
所有评论(0)