微分动态规划(Differential Dynamic Programming, DDP)

变分法思路

由贝尔曼最优性原理得到:
V(X,t)=min⁡u∈Ω{ϕ[X(tf),tf]+∫t0tfL(x(t),u(t),t)dt}=min⁡u∈Ω{∫t0t0+dtL(x(τ),u(τ),τ)dτ+V(X+ΔX,t+dt)} \begin{aligned} V(X, t)&=\min _{u \in \Omega}\left\{\phi\left[X\left(t_{f}\right), t_{f}\right]+\int_{t_{0}}^{t_{f}} L(x(t), u(t), t) d t\right\}\\ &=\min _{u \in \Omega}\left\{\int_{t_{0}}^{t_{0}+d t} L(x(\tau), u(\tau), \tau) d \tau+V(X+\Delta X, t+d t)\right\} \end{aligned} V(X,t)=uΩmin{ϕ[X(tf),tf]+t0tfL(x(t),u(t),t)dt}=uΩmin{t0t0+dtL(x(τ),u(τ),τ)dτ+V(X+ΔX,t+dt)}
离散化得:
xk+1=f(xk,uk)Vk=min⁡u[l(xk,uk)+Vk+1(xk+1)] \begin{array}{l}{x_{k+1}=f\left(x_{k}, u_{k}\right)} \\ \\{V_{k}=\min _{u}\left[l\left(x_{k},u_{k}\right)+V_{k+1}\left(x_{k+1}\right)\right]}\end{array} xk+1=f(xk,uk)Vk=minu[l(xkuk)+Vk+1(xk+1)]
kkk时刻的值函数V(x,k)V(x,k)V(x,k)做变分,令Q(δx,δu)=Vk(x+δx)−Vk(x)Q(\delta x,\delta u)=V_k(x+\delta x)-V_k(x)Q(δx,δu)=Vk(x+δx)Vk(x),对QQQ在标称轨迹(xk,uk)(x_k,u_k)(xk,uk)进行二阶泰勒展开:
Q(δx,δu)=V(x+δx)−V(x)=l(xk+δxk,uk+δuk)+Vk+1(xk+1+δxk+1)−(l(xk,uk)+Vk+1(xk+1))≈δxkTlxk+δukTluk+12(δxkTlxxkδxk+2δxkTlxukδuk+δukTluukδuk)+δxk+1TVxk+1+12δxk+1TVxxk+1δxk+1 \begin{aligned} Q(\delta x, \delta u)&=V(x+\delta x)-V(x)\\ &=l\left(x_{k}+\delta x_{k}, u_{k}+\delta u_{k}\right)+V_{k+1}\left(x_{k+1}+\delta x_{k+1}\right)-\left(l\left(x_{k}, u_{k}\right)+V_{k+1}\left(x_{k+1}\right)\right)\\ &\approx \delta x_{k}^{T} l_{x_{k}}+\delta u_{k}^{T} l_{u_{k}}+\frac{1}{2}\left(\delta x_{k}^{T} l_{x x_{k}} \delta x_{k}+2 \delta x_{k}^{T} l_{x u_{k}} \delta u_{k}+\delta u_{k}^{T} l_{u u_{k}} \delta u_{k}\right)+\\&\qquad \qquad\qquad\qquad\quad\delta x_{k+1}^{T} V_{x_{k+1}}+\frac{1}{2} \delta x_{k+1}^{T} V_{x x_{k+1}} \delta x_{k+1} \end{aligned} Q(δx,δu)=V(x+δx)V(x)=l(xk+δxk,uk+δuk)+Vk+1(xk+1+δxk+1)(l(xk,uk)+Vk+1(xk+1))δxkTlxk+δukTluk+21(δxkTlxxkδxk+2δxkTlxukδuk+δukTluukδuk)+δxk+1TVxk+1+21δxk+1TVxxk+1δxk+1
又有xk+1=f(xk,uk)x_{k+1}=f(x_k,u_k)xk+1=f(xk,uk),二阶泰勒展开得到:
δxk+1=δf(xk,uk)=fxkδxk+fukδuk+12(δxkTfxxkδx+2δxkTfxukδuk+δukTfuukδu) \delta x_{k+1}=\delta f\left(x_{k}, u_{k}\right)=f_{x_{k}} \delta x_{k}+f_{u_{k}} \delta u_{k}+\frac{1}{2}\left(\delta x_{k}^{T} f_{x x_{k}} \delta x+2 \delta x_{k}^{T} f_{x u_{k}} \delta u_{k}+\delta u_{k}^{T} f_{u u_{k}} \delta u\right) δxk+1=δf(xk,uk)=fxkδxk+fukδuk+21(δxkTfxxkδx+2δxkTfxukδuk+δukTfuukδu)
δxk+1\delta x_{k+1}δxk+1带入Q(δx,δu)Q(\delta x,\delta u)Q(δx,δu)中,得到:
Q(δx,δu)≈12[1δxkδuk]T[0QxkTQukTQxkQxxkQxukTQukQuxkQuuk][1δxkδuk] Q(\delta x,\delta u) \approx\frac{1}{2}\left[\begin{array}{c}{1} \\ {\delta x_k} \\ {\delta u_k}\end{array}\right]^T\left[\begin{array}{ccc}{0} & {Q_{\mathrm{x}_k}^{T}} & {Q_{\mathrm{u}_k}^{T}} \\ {Q_{\mathrm{x}_k}} & {Q_{\mathrm{xx}_k}} & {Q_{\mathrm{xu}_k}^T} \\ {Q_{\mathrm{u}_k}} & {Q_{\mathrm{ux}_k}} & {Q_{\mathrm{uu}_k}}\end{array}\right]\left[\begin{array}{c}{1} \\ {\delta x_k} \\ {\delta u_k}\end{array}\right] Q(δx,δu)211δxkδukT0QxkQukQxkTQxxkQuxkQukTQxukTQuuk1δxkδuk
其中:
Qx=lxk+fxkTVxk+1Qu=luk+fukTVxk+1Qxx=lxxk+fxkTVxxk+1fxk+Vxk+1fxkxkQuu=luuk+fukTVxxk+1fuk+Vxk+1fukukQux=luxk+fukTVxxk+1fxk+Vxk+1fukxk \begin{aligned} &Q_{x} =l_{x_{k}}+f_{x_{k}}^{T} V_{x_{k+1}} \\ &Q_{u} =l_{u_{k}}+f_{u_{k}}^{T} V_{x_{k+1}} \\ &Q_{x x} =l_{x x_{k}}+f_{x_{k}}^{T} V_{x x_{k+1}} f_{x_{k}}+V_{x_{k+1}} f_{x_{k} x_{k}} \\ &Q_{u u} =l_{u u_{k}}+f_{u_{k}}^{T} V_{x x_{k+1}} f_{u_{k}}+V_{x_{k+1}} f_{u_{k} u_{k}} \\ &Q_{u x} =l_{u x_{k}}+f_{u_{k}}^{T} V_{x x_{k+1}} f_{x_{k}}+V_{x_{k+1}} f_{u_{k} x_{k}} \end{aligned} Qx=lxk+fxkTVxk+1Qu=luk+fukTVxk+1Qxx=lxxk+fxkTVxxk+1fxk+Vxk+1fxkxkQuu=luuk+fukTVxxk+1fuk+Vxk+1fukukQux=luxk+fukTVxxk+1fxk+Vxk+1fukxk
Q(δx,δu)Q(\delta x, \delta u)Q(δx,δu)视为δu\delta uδu的二次函数,令∂Q(δx,δu)δu=0\frac{\partial Q(\delta x,\delta u)}{\delta u}=0δuQ(δx,δu)=0,有:
∂Q(δx,δu)δu=12(2Quukδu+Quxkδx+δxTQxuk+Quk+Quk)=Quukδu+Quxkδx+Quk=0 \begin{aligned} \frac{\partial Q(\delta x, \delta u)}{\delta u} &= \frac{1}{2}(2Q_{\mathbf{uu}_k}\delta u+Q_{\mathbf{ux}_k}\delta x+\delta x^TQ_{\mathbf{xu}_k}+Q_{\mathbf{u}_k}+Q_{\mathbf{u} _k})\\ &=Q_{\mathbf{uu}_k}\delta u+Q_{\mathbf{ux}_k}\delta x+Q_{\mathbf{u}_k}\\ &=0 \end{aligned} δuQ(δx,δu)=21(2Quukδu+Quxkδx+δxTQxuk+Quk+Quk)=Quukδu+Quxkδx+Quk=0
计算得:δu∗=−Quuk−1(Quk+Quxkδx)\delta u^* = -Q_{\mathbf{uu}_k}^{-1}(Q_{\mathbf{u}_k}+Q_{\mathbf{ux}_k}\delta x)δu=Quuk1(Quk+Quxkδx)
原理如下图:
在这里插入图片描述
k=−Quuk−1Quk, K=−Quuk−1Quxkk=-Q_{\mathbf{uu}_k}^{-1}Q_{\mathbf{u}_k},\ K=-Q_{\mathbf{uu}_k}^{-1}Q_{\mathbf{ux}_k}k=Quuk1Quk, K=Quuk1Quxk,则δu∗=arg⁡min⁡δuQ(δx,δu)=k+Kδx\delta u^{*}=\underset{\delta u}{\arg \min } Q(\delta x, \delta u)=k+K \delta xδu=δuargminQ(δx,δu)=k+Kδx,将其带入Q(δx,δu)Q(\delta x,\delta u)Q(δx,δu),可以整理成如下形式:
Q(δx,δu)≈ΔV+VxkTδx+12!δxTVxxkδx Q(\delta x,\delta u)\approx\Delta V+V_{\mathbf{x}_k}^T\delta{x}+\frac{1}{2!}\delta x^TV_{\mathbf{xx}_k}\delta x Q(δx,δu)ΔV+VxkTδx+2!1δxTVxxkδx
对比可以得到:
ΔV=−12QukTQuuk−1QukVxk=Qxk−QuxkTQuuk−1QukVxxk=Qxxk−QxukTQuuk−1Quxk \begin{aligned} &\Delta V=-\frac{1}{2}Q_{\mathbf{u}_k}^TQ_{\mathbf{uu}_k}^{-1}Q_{\mathbf{u}_k}\\ &V_{\mathbf{x}_k}=Q_{\mathbf{x}_k}-Q_{\mathbf{ux}_k}^TQ_{\mathbf{uu}_k}^{-1}Q_{\mathbf{u}_k}\\ &V_{\mathbf{xx}_k}=Q_{\mathbf{xx}_k}-Q_{\mathbf{xu}_k}^TQ_{\mathbf{uu}_k}^{-1}Q_{\mathbf{ux}_k} \end{aligned} ΔV=21QukTQuuk1QukVxk=QxkQuxkTQuuk1QukVxxk=QxxkQxukTQuuk1Quxk
其中QuukQ_{\mathbf{uu}_k}Quuk是半正定对称阵。

DDP伪代码:

  1. 由给定的控制序列uˉk\bar{u}_kuˉk,正向得带计算标称轨迹。
    xˉk+1=f(xˉk,uˉk)lxk,luk,lxxk,luxk,luukfxk,fuk,fxkxk,fukuk,fukxk \begin{aligned} &\bar{x}_{k+1}=f(\bar{x}_k,\bar{u}_k)\\ &l_{x_k},l_{u_k},l_{xx_k},l_{ux_k},l_{uu_k}\\ &f_{x_k},f_{u_k},f_{x_kx_k},f_{u_ku_k},f_{u_kx_k} \end{aligned} xˉk+1=f(xˉk,uˉk)lxk,luk,lxxk,luxk,luukfxk,fuk,fxkxk,fukuk,fukxk
  2. 反向迭代:从TTT111迭代
    • 计算Vxk+1V_{x_{k+1}}Vxk+1Vxxk+1V_{xx_{k+1}}Vxxk+1
    • 计算QQQ函数
    • 计算δuk∗\delta u_k^*δuk,得到kkk_kkkKkK_kKk
  3. 正向迭代更新控制序列
    x1=xˉ(1)uk=uˉk+kk+Kk(xk−xˉk)xk+1=f(xk,uk) \begin{aligned} &x_1 = \bar{x}(1)\\ &u_k=\bar{u}_k+k_k+K_k(x_k-\bar{x}_k)\\ &x_{k+1}=f(x_k,u_k) \end{aligned} x1=xˉ(1)uk=uˉk+kk+Kk(xkxˉk)xk+1=f(xk,uk)
  4. 是否收敛,否就跳转1,是就结束
Logo

有“AI”的1024 = 2048,欢迎大家加入2048 AI社区

更多推荐