KL Divergence of Multivariate t Distributions
The multivariate Student's t distribution (multivariate t distribution for short) is defined as follows:

$$
f(\mathbf{x})=C_n(\det \Sigma)^{-1/2}\left[1+\frac{1}{\nu}(\mathbf{x}-\mu)^T\Sigma^{-1}(\mathbf{x}-\mu)\right]^{-(\nu+n)/2}
$$
where $\mathbf{x}\in \mathbb{R}^n$ is the random vector, $\mu\in \mathbb{R}^n$ is the location vector (equal to the mean when $\nu>1$), $\Sigma\in \mathbb{R}^{n\times n}$ is the scale matrix (also called the correlation matrix), $\nu$ is the degrees of freedom, $n$ is the dimension of $\mathbf{x}$, and $C_n$ is the normalizing constant, defined as:

$$
C_n=(\pi\nu)^{-n/2}\,\Gamma[(\nu+n)/2]/\Gamma(\nu/2)
$$

where $\Gamma(\cdot)$ is the Gamma function.
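- Note that the scale matrix is not, in general, the covariance matrix in the usual statistical sense, although the two are related; the relation is given later.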
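To make the definition concrete, here is a minimal sketch of the log-density in Python with NumPy. The function name `mvt_logpdf` and its argument order are illustrative assumptions, not part of the original text; the code simply transcribes $f(\mathbf{x})$ and $C_n$ as defined above and works in log space for numerical stability.

```python
import math
import numpy as np

def mvt_logpdf(x, mu, sigma, nu):
    """Log-density of the multivariate t distribution as defined above.

    Hypothetical helper: x and mu are length-n vectors, sigma is the
    n-by-n scale matrix, nu is the degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = mu.shape[0]

    # log C_n = log Gamma((nu+n)/2) - log Gamma(nu/2) - (n/2) log(pi * nu)
    log_cn = (math.lgamma((nu + n) / 2) - math.lgamma(nu / 2)
              - 0.5 * n * math.log(math.pi * nu))

    # log det(sigma), computed without forming the determinant itself
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("sigma must have a positive determinant")

    # Quadratic form (x - mu)^T sigma^{-1} (x - mu), without an explicit inverse
    diff = x - mu
    delta = diff @ np.linalg.solve(sigma, diff)

    return log_cn - 0.5 * logdet - 0.5 * (nu + n) * math.log1p(delta / nu)

# With n = 1, sigma = 1 and nu = 1 the distribution is the standard Cauchy,
# whose density at 0 is 1/pi.
print(mvt_logpdf([0.0], [0.0], [[1.0]], 1.0), -math.log(math.pi))
```

Using `slogdet` and `solve` avoids computing $\det\Sigma$ and $\Sigma^{-1}$ explicitly, which tends to behave better when $\Sigma$ is poorly conditioned.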
Consider two multivariate t distributions $p(x)$ and $q(x)$, where $p(x)$ is the known true distribution and $q(x)$ is an unknown multivariate t distribution used to approximate $p(x)$. The two distributions are written as:

$$
\begin{aligned}
p(x)&=\mathrm{St}(x;\mu_1,\Sigma_1,\nu_1)\\
q(x)&=\mathrm{St}(x;\mu_2,\Sigma_2,\nu_2)
\end{aligned}
$$
By the definition of the KL divergence, $D_{KL}(p(x)\|q(x))$ can be written as:

$$
\begin{aligned}
&\quad D_{KL}(p(x)\|q(x))=\mathbb{E}_{p(x)}[\log p(x)-\log q(x)]\\
&=\mathbb{E}_{p(x)}\left\{ \{\log \Gamma(\frac{\nu_1+n}{2})-\log \Gamma(\frac{\nu_1}{2})-\frac{1}{2}\log (\det \Sigma_1)-\frac{n}{2}\log(\nu_1\pi)\right.\\
&\quad \left. -\frac{\nu_1+n}{2}\log[1+\frac{1}{\nu_1}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)]\} -\{\log \Gamma(\frac{\nu_2+n}{2})-\log \Gamma(\frac{\nu_2}{2})\right.\\
&\quad \left. -\frac{1}{2}\log (\det \Sigma_2)-\frac{n}{2}\log(\nu_2\pi)-\frac{\nu_2+n}{2}\log[1+\frac{1}{\nu_2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)]\} \right\}\\
&=\frac{1}{2}\log \frac{\det \Sigma_2}{\det \Sigma_1}+\frac{n}{2}\log \frac{\nu_2}{\nu_1}+\log \Gamma(\frac{\nu_1+n}{2})-\log \Gamma(\frac{\nu_1}{2})-\log \Gamma(\frac{\nu_2+n}{2})+\log \Gamma(\frac{\nu_2}{2})\\
&\quad -\frac{\nu_1+n}{2}\mathbb{E}_{p(x)}\{\log[1+\frac{1}{\nu_1}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)]\}\\
&\quad +\frac{\nu_2+n}{2}\mathbb{E}_{p(x)}\{\log[1+\frac{1}{\nu_2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)]\}
\end{aligned}
$$
From the maximum-entropy derivation of the multivariate t distribution, it can be shown that:

$$
\mathbb{E}_{p(x)}\{\log[1+\frac{1}{\nu_1}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)]\}=w(\frac{n+\nu_1}{2};\frac{n}{2})
$$

Here $w(x;\alpha)=\psi(x)-\psi(x-\alpha)$ for $x>\alpha$, so the right-hand side equals $\psi(\frac{\nu_1+n}{2})-\psi(\frac{\nu_1}{2})$, and $\psi(\cdot)$ denotes the digamma function, defined as:

$$
\psi(t)=\mathrm{d}\log \Gamma(t)/\mathrm{d}t
$$
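The identity above can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation. It uses a hand-written `digamma` (recurrence plus a truncated asymptotic series; `scipy.special.digamma` is the library alternative) and a hypothetical sampler `sample_mvt` based on the standard Gaussian and chi-square construction of the multivariate t distribution. The parameter values are arbitrary.

```python
import math
import numpy as np

def digamma(x):
    """psi(x) for x > 0: shift x upward with psi(x) = psi(x+1) - 1/x,
    then apply a truncated asymptotic series (accurate to roughly 1e-8)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def w(x, alpha):
    """w(x; alpha) = psi(x) - psi(x - alpha), valid for x > alpha."""
    return digamma(x) - digamma(x - alpha)

def sample_mvt(rng, mu, sigma, nu, size):
    """Draw samples as x = mu + L z / sqrt(g / nu),
    with z ~ N(0, I), g ~ chi-square(nu) and L L^T = sigma."""
    mu = np.asarray(mu, dtype=float)
    chol = np.linalg.cholesky(np.asarray(sigma, dtype=float))
    z = rng.standard_normal((size, mu.shape[0]))
    g = rng.chisquare(nu, size=size)
    return mu + (z @ chol.T) / np.sqrt(g / nu)[:, None]

# Arbitrary example parameters for p(x) with n = 3
rng = np.random.default_rng(0)
mu1 = np.array([1.0, -2.0, 0.5])
sigma1 = np.array([[2.0, 0.3, 0.0],
                   [0.3, 1.0, 0.2],
                   [0.0, 0.2, 0.5]])
nu1 = 5.0
n = mu1.shape[0]

x = sample_mvt(rng, mu1, sigma1, nu1, 200_000)
diff = x - mu1
# Row-wise quadratic form (x - mu1)^T sigma1^{-1} (x - mu1)
delta = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma1, diff.T).T)

mc_estimate = np.mean(np.log1p(delta / nu1))
closed_form = w((n + nu1) / 2, n / 2)
print(f"Monte Carlo: {mc_estimate:.4f}, closed form: {closed_form:.4f}")
```

The two printed numbers should agree up to Monte Carlo noise, which shrinks as the sample size grows.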
Since the natural logarithm $\log(\cdot)$ is a concave function, Jensen's inequality yields the following very useful bound:

$$
\mathbb{E}_{p(x)}\{\log(\cdot)\} \leq \log\{\mathbb{E}_{p(x)}(\cdot)\}
$$
Therefore, using $\mathbb{E}_{p(x)}[x-\mu_1]=0$ to drop the cross terms and $a^TAa=\operatorname{tr}(Aaa^T)$ to rewrite the quadratic forms:

$$
\begin{aligned}
&\quad \mathbb{E}_{p(x)}\{\log[1+\frac{1}{\nu_2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)]\}\\
& \leq \log\{\mathbb{E}_{p(x)}[1+\frac{1}{\nu_2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)]\}\\
& =\log\{\mathbb{E}_{p(x)}[1+\frac{1}{\nu_2}(x-\mu_1+\mu_1-\mu_2)^T\Sigma_2^{-1}(x-\mu_1+\mu_1-\mu_2)]\}\\
&=\log\{\mathbb{E}_{p(x)}[1+\frac{1}{\nu_2}(x-\mu_1)^T\Sigma_2^{-1}(x-\mu_1)+\frac{1}{\nu_2}(\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)\\
&\quad +\frac{1}{\nu_2}(x-\mu_1)^T\Sigma_2^{-1}(\mu_1-\mu_2)+\frac{1}{\nu_2}(\mu_1-\mu_2)^T\Sigma_2^{-1}(x-\mu_1)]\}\\
&=\log\left\{\mathbb{E}_{p(x)}\{ 1+\frac{1}{\nu_2}\operatorname{tr}[\Sigma_2^{-1}(x-\mu_1)(x-\mu_1)^T]+\frac{1}{\nu_2}\operatorname{tr}[\Sigma_2^{-1}(\mu_1-\mu_2)(\mu_1-\mu_2)^T] \}\right\}\\
&=\log\left\{1+\frac{1}{\nu_2}\operatorname{tr}(\Sigma_2^{-1}\tilde\Sigma_1)+\frac{1}{\nu_2}\operatorname{tr}[\Sigma_2^{-1}(\mu_1-\mu_2)(\mu_1-\mu_2)^T] \right\}
\end{aligned}
$$
where $\tilde\Sigma_1$ denotes the covariance matrix of the multivariate t distribution $p(x)$, which is related to the scale matrix by:

$$
\tilde\Sigma_1=\frac{\nu_1}{\nu_1-2}\Sigma_1
$$

This relation requires the degrees of freedom $\nu_1$ of $p(x)$ to satisfy:

$$
\nu_1>2
$$
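As a quick sanity check of the Jensen step, the sketch below evaluates the closed-form right-hand side using $\tilde\Sigma_1=\frac{\nu_1}{\nu_1-2}\Sigma_1$ and compares it with a Monte Carlo estimate of the left-hand side. The helper names `sample_mvt` and `jensen_log_term`, and all parameter values, are assumptions made for illustration only.

```python
import numpy as np

def sample_mvt(rng, mu, sigma, nu, size):
    """Draw samples as x = mu + L z / sqrt(g / nu),
    with z ~ N(0, I), g ~ chi-square(nu) and L L^T = sigma."""
    chol = np.linalg.cholesky(sigma)
    z = rng.standard_normal((size, mu.shape[0]))
    g = rng.chisquare(nu, size=size)
    return mu + (z @ chol.T) / np.sqrt(g / nu)[:, None]

def jensen_log_term(mu1, sigma1, nu1, mu2, sigma2, nu2):
    """log{1 + [tr(S2^{-1} cov1) + d^T S2^{-1} d] / nu2},
    where cov1 = nu1 / (nu1 - 2) * S1 and d = mu1 - mu2."""
    if nu1 <= 2:
        raise ValueError("nu1 must exceed 2 for the covariance of p to exist")
    cov1 = nu1 / (nu1 - 2.0) * sigma1
    d = mu1 - mu2
    trace_term = np.trace(np.linalg.solve(sigma2, cov1))
    mean_term = d @ np.linalg.solve(sigma2, d)
    return np.log1p((trace_term + mean_term) / nu2)

# Arbitrary example parameters (n = 2)
rng = np.random.default_rng(1)
mu1, sigma1, nu1 = np.array([0.0, 0.0]), np.eye(2), 6.0
mu2, sigma2, nu2 = np.array([1.0, 0.0]), np.array([[1.5, 0.2], [0.2, 0.8]]), 4.0

# Left-hand side: E_p{ log[1 + (x - mu2)^T S2^{-1} (x - mu2) / nu2] }
x = sample_mvt(rng, mu1, sigma1, nu1, 200_000)
diff = x - mu2
quad = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma2, diff.T).T)
lhs_mc = np.mean(np.log1p(quad / nu2))

rhs = jensen_log_term(mu1, sigma1, nu1, mu2, sigma2, nu2)
print(f"Monte Carlo left-hand side: {lhs_mc:.4f}, Jensen bound: {rhs:.4f}")
```

The Monte Carlo value should come out below the closed-form term; the gap between them is the slack introduced by Jensen's inequality.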
We therefore obtain an upper bound on the KL divergence between two multivariate t distributions:

$$
\begin{aligned}
&\quad D_{KL}(p(x)\|q(x))=\mathbb{E}_{p(x)}[\log p(x)-\log q(x)]\\
&=\frac{1}{2}\log \frac{\det \Sigma_2}{\det \Sigma_1}+\frac{n}{2}\log \frac{\nu_2}{\nu_1}+\log \Gamma(\frac{\nu_1+n}{2})-\log \Gamma(\frac{\nu_1}{2})-\log \Gamma(\frac{\nu_2+n}{2})+\log \Gamma(\frac{\nu_2}{2})\\
&\quad -\frac{\nu_1+n}{2}\mathbb{E}_{p(x)}\{\log[1+\frac{1}{\nu_1}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)]\}\\
&\quad +\frac{\nu_2+n}{2}\mathbb{E}_{p(x)}\{\log[1+\frac{1}{\nu_2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)]\}\\
&\leq \frac{1}{2}\log \frac{\det \Sigma_2}{\det \Sigma_1}+\frac{n}{2}\log \frac{\nu_2}{\nu_1}+\log \Gamma(\frac{\nu_1+n}{2})-\log \Gamma(\frac{\nu_1}{2})-\log \Gamma(\frac{\nu_2+n}{2})+\log \Gamma(\frac{\nu_2}{2})\\
&\quad -\frac{\nu_1+n}{2}[\psi(\frac{\nu_1+n}{2})-\psi(\frac{\nu_1}{2})]\\
&\quad +\frac{\nu_2+n}{2}\log\left\{1+\frac{1}{\nu_2}\operatorname{tr}(\Sigma_2^{-1}\tilde\Sigma_1)+\frac{1}{\nu_2}\operatorname{tr}[\Sigma_2^{-1}(\mu_1-\mu_2)(\mu_1-\mu_2)^T] \right\}
\end{aligned}
$$
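The bound can be evaluated in closed form and compared with a Monte Carlo estimate of the exact KL divergence $\mathbb{E}_{p(x)}[\log p(x)-\log q(x)]$. The following is a minimal sketch under stated assumptions: the function names (`digamma`, `mvt_logpdf`, `sample_mvt`, `kl_upper_bound`) and the parameter values are hypothetical, and the hand-written `digamma` is a truncated-series approximation (`scipy.special.digamma` is the library alternative).

```python
import math
import numpy as np

def digamma(x):
    """psi(x) for x > 0 via recurrence plus a truncated asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def mvt_logpdf(x, mu, sigma, nu):
    """Row-wise log-density of St(x; mu, sigma, nu) for x of shape (N, n)."""
    n = mu.shape[0]
    diff = x - mu
    delta = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    log_cn = (math.lgamma((nu + n) / 2) - math.lgamma(nu / 2)
              - 0.5 * n * math.log(math.pi * nu))
    return log_cn - 0.5 * logdet - 0.5 * (nu + n) * np.log1p(delta / nu)

def sample_mvt(rng, mu, sigma, nu, size):
    """x = mu + L z / sqrt(g / nu), z ~ N(0, I), g ~ chi-square(nu)."""
    chol = np.linalg.cholesky(sigma)
    z = rng.standard_normal((size, mu.shape[0]))
    g = rng.chisquare(nu, size=size)
    return mu + (z @ chol.T) / np.sqrt(g / nu)[:, None]

def kl_upper_bound(mu1, sigma1, nu1, mu2, sigma2, nu2):
    """Closed-form upper bound on KL(p || q) from the text (requires nu1 > 2)."""
    if nu1 <= 2:
        raise ValueError("nu1 must exceed 2")
    n = mu1.shape[0]
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    # Determinant, degrees-of-freedom and Gamma-function terms
    const = (0.5 * (logdet2 - logdet1) + 0.5 * n * math.log(nu2 / nu1)
             + math.lgamma((nu1 + n) / 2) - math.lgamma(nu1 / 2)
             - math.lgamma((nu2 + n) / 2) + math.lgamma(nu2 / 2))
    # Exact expectation under p, expressed with the digamma function
    exact_term = -0.5 * (nu1 + n) * (digamma((nu1 + n) / 2) - digamma(nu1 / 2))
    # Jensen-bounded expectation, using cov1 = nu1 / (nu1 - 2) * sigma1
    cov1 = nu1 / (nu1 - 2.0) * sigma1
    d = mu1 - mu2
    quad = np.trace(np.linalg.solve(sigma2, cov1)) + d @ np.linalg.solve(sigma2, d)
    jensen_term = 0.5 * (nu2 + n) * math.log1p(quad / nu2)
    return const + exact_term + jensen_term

# Arbitrary example parameters (n = 2)
rng = np.random.default_rng(2)
mu1, sigma1, nu1 = np.array([0.0, 0.0]), np.eye(2), 6.0
mu2, sigma2, nu2 = np.array([1.0, 0.0]), np.array([[1.5, 0.2], [0.2, 0.8]]), 4.0

x = sample_mvt(rng, mu1, sigma1, nu1, 200_000)
kl_mc = np.mean(mvt_logpdf(x, mu1, sigma1, nu1) - mvt_logpdf(x, mu2, sigma2, nu2))
bound = kl_upper_bound(mu1, sigma1, nu1, mu2, sigma2, nu2)
print(f"Monte Carlo KL: {kl_mc:.4f}, upper bound: {bound:.4f}")
```

Because of the Jensen step, the bound is not tight: it stays strictly positive even when $q(x)=p(x)$, where the true KL divergence is zero.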
References:
[1]: https://www.researchgate.net/publication/335580775_A_Novel_Kullback-Leilber_Divergence_Minimization-Based_Adaptive_Student%27s_t-Filter
[2]: Kotz S, Nadarajah S. Multivariate t-distributions and their applications [M]. Cambridge University Press, 2004.