Conclusions first:
  • As objective functions, cross entropy and KL divergence have the same effect: mathematically, they differ only by a constant.
  • Logistic loss is a special case of cross entropy.

1. Cross entropy and KL divergence

Suppose we have two probability distributions $p(x)$ and $q(x)$. Let $H(p,q)$ denote the cross entropy and $D_{KL}(p|q)$ the KL divergence.

Definition of cross entropy:

$$H(p,q) = -\sum_x p(x)\log q(x)$$
Definition of KL divergence:

$$D_{KL}(p|q) = \sum_x p(x)\log\frac{p(x)}{q(x)}$$
Derivation:

$$
\begin{aligned}
D_{KL}(p|q) &= \sum_x p(x)\log\frac{p(x)}{q(x)} \\
&= \sum_x \big(p(x)\log p(x) - p(x)\log q(x)\big) \\
&= -H(p) - \sum_x p(x)\log q(x) \\
&= -H(p) + H(p,q)
\end{aligned}
$$

In other words, cross entropy can equivalently be written as:

$$H(p,q) = D_{KL}(p|q) + H(p)$$

Intuitively, since $p(x)$ is the known (fixed) distribution, $H(p)$ is a constant, so cross entropy and KL divergence differ only by that constant.
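
A quick numerical sanity check of this identity, sketched with `scipy.stats.entropy` (which returns the Shannon entropy of `p` when called with one argument and $D_{KL}(p|q)$ when called with two); the distributions are again arbitrary examples:

```python
import numpy as np
from scipy.stats import entropy

# Arbitrary example distributions.
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.8, 0.1, 0.1])

H_p  = entropy(p)                 # Shannon entropy H(p), natural log
D_kl = entropy(p, q)              # KL divergence D_KL(p | q)
H_pq = -np.sum(p * np.log(q))     # cross entropy H(p, q) from its definition

# H(p, q) = D_KL(p | q) + H(p), up to floating-point error.
assert np.isclose(H_pq, D_kl + H_p)
```

Because the $H(p)$ term does not depend on $q$, minimizing $H(p,q)$ over the model distribution $q$ and minimizing $D_{KL}(p|q)$ over $q$ lead to the same optimum.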


2. Logistic loss and cross entropy

Suppose $p \in \{y, 1-y\}$ and $q \in \{\hat y, 1-\hat y\}$, i.e. binary distributions given by the label $y$ and the prediction $\hat y$. Then the cross entropy can be written as the logistic loss:

$$H(p,q) = -\sum_x p(x)\log q(x) = -y\log\hat y - (1-y)\log(1-\hat y)$$