Cross-Entropy Loss Function

  • When computing the cross-entropy loss, the Softmax function and the cross-entropy function are usually implemented together. We first derive the gradient of the Softmax function, then the gradient of the cross-entropy function.

Gradient of the Softmax Function

  • Softmax function:
    $$
    p_i = \frac{e^{z_i}}{\sum\limits_{k=1}^K e^{z_k}}
    $$
    It maps the values of the $K$ output nodes to probabilities that sum to 1. Here $z_i$ is the output of the $i$-th node, and $p_i$ is the probability obtained by applying softmax to that output.
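The formula above can be sketched directly in code. This is a minimal illustration assuming NumPy is available; subtracting the maximum logit is a standard numerical-stability trick that cancels in the ratio and so does not change the result:

```python
import numpy as np

def softmax(z):
    """Map K logits z to probabilities that sum to 1.

    Subtracting max(z) before exponentiating avoids overflow for
    large logits; the shift cancels between numerator and denominator.
    """
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
print(p.sum())  # probabilities sum to 1
```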

  • Partial derivative, when i = j:
    $$
    \begin{aligned}
    \frac{\partial p_i}{\partial z_j} &= \frac{\partial \frac{e^{z_i}}{\sum\limits_{k=1}^K e^{z_k}}}{\partial z_j} \\
    & = \frac{e^{z_i} \cdot \sum\limits_{k=1}^K e^{z_k} - e^{z_i} \cdot e^{z_j}}{(\sum\limits_{k=1}^K e^{z_k})^2} \\
    & = \frac{e^{z_i} \cdot (\sum\limits_{k=1}^K e^{z_k} - e^{z_j})}{(\sum\limits_{k=1}^K e^{z_k})^2} \\
    & = \frac{e^{z_i}}{\sum\limits_{k=1}^K e^{z_k}} \cdot \frac{\sum\limits_{k=1}^K e^{z_k} - e^{z_j}}{\sum\limits_{k=1}^K e^{z_k}} \\
    & = p_i \cdot (1 - p_j)
    \end{aligned}
    $$
    Note that the result is the product of the probabilities $p_i$ and $1 - p_j$:
    $$
    \frac{\partial p_i}{\partial z_j} = p_i \cdot (1 - p_j), \quad i = j
    $$

  • Partial derivative, when i ≠ j:
    $$
    \begin{aligned}
    \frac{\partial p_i}{\partial z_j} &= \frac{\partial \frac{e^{z_i}}{\sum\limits_{k=1}^K e^{z_k}}}{\partial z_j} \\
    & = \frac{0 - e^{z_i} \cdot e^{z_j}}{(\sum\limits_{k=1}^K e^{z_k})^2} \\
    & = -\frac{e^{z_i}}{\sum\limits_{k=1}^K e^{z_k}} \cdot \frac{e^{z_j}}{\sum\limits_{k=1}^K e^{z_k}} \\
    & = -p_i \cdot p_j
    \end{aligned}
    $$

  • Piecewise expression for the softmax partial derivatives:
    $$
    \frac{\partial p_i}{\partial z_j} = \left\{
    \begin{array}{ll}
    p_i \cdot (1 - p_j) & \text{when } i = j \\
    -p_i \cdot p_j & \text{when } i \ne j
    \end{array}
    \right.
    $$
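The piecewise derivative above can be checked numerically. The sketch below (NumPy assumed; `softmax_jacobian` is an illustrative helper name) builds the analytical Jacobian as `diag(p) - outer(p, p)`, whose diagonal is $p_i(1-p_i)$ and whose off-diagonal entries are $-p_i p_j$, and compares it against central finite differences:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(p):
    """J[i, j] = p_i*(1-p_j) if i == j, else -p_i*p_j."""
    return np.diag(p) - np.outer(p, p)

z = np.array([0.5, -1.2, 2.0])
p = softmax(z)

# Central finite differences, one logit at a time.
eps = 1e-6
J_num = np.empty((3, 3))
for j in range(3):
    dz = np.zeros(3)
    dz[j] = eps
    J_num[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

assert np.allclose(J_num, softmax_jacobian(p), atol=1e-8)
```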


Gradient of the Cross-Entropy Loss

  • Cross-entropy loss function:
    $$
    L = -\sum_k y_k \log p_k
    $$

  • We now derive the partial derivative of the final loss $L$ with respect to the network output (logits) variable $z_i$ directly. Expanding:
    $$
    \begin{aligned}
    \frac{\partial L}{\partial z_i} & = -\sum_k y_k \frac{\partial \log p_k}{\partial z_i} \\
    & = -\sum_k y_k \frac{\partial \log p_k}{\partial p_k} \frac{\partial p_k}{\partial z_i} \\
    & = -\sum_k y_k \frac{1}{p_k} \frac{\partial p_k}{\partial z_i} \\
    & \text{Substituting the piecewise softmax derivative, the sum splits into the } k = i \text{ and } k \ne i \text{ terms:} \\
    & = \left\{
    \begin{array}{ll}
    -\sum_k y_k \frac{1}{p_k} p_i (1 - p_k) & k = i \\
    -\sum_k y_k \frac{1}{p_k} (-p_i p_k) & k \ne i
    \end{array}
    \right. \\
    & = -y_i (1 - p_i) - \sum_{k \ne i} y_k \frac{1}{p_k} (-p_i p_k) \\
    & = -y_i + y_i p_i + \sum_{k \ne i} y_k p_i \\
    & = p_i \left(y_i + \sum_{k \ne i} y_k\right) - y_i
    \end{aligned}
    $$
    This completes the gradient derivation of the cross-entropy loss.

  • In particular, for classification problems where the label $y$ is one-hot encoded, we have:
    $$
    \sum_k y_k = 1, \qquad y_i + \sum_{k \ne i} y_k = 1
    $$
    The partial derivative of the cross-entropy loss therefore simplifies to
    $$
    \frac{\partial L}{\partial z_i} = p_i - y_i
    $$
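The simplified gradient $p - y$ is exactly why softmax and cross-entropy are implemented together, as noted at the start. The sketch below (NumPy assumed; helper names are illustrative) computes the loss via the numerically stable log-sum-exp form and confirms that finite differences of the loss match `softmax(z) - y` for a one-hot label:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy_loss(z, y):
    """L = -sum_k y_k log p_k, with log p_k = z_k - logsumexp(z)."""
    m = np.max(z)
    logp = z - m - np.log(np.exp(z - m).sum())
    return -(y * logp).sum()

def cross_entropy_grad(z, y):
    """Gradient of L w.r.t. the logits z, for one-hot y: p - y."""
    return softmax(z) - y

z = np.array([2.0, -0.5, 1.0])
y = np.array([0.0, 1.0, 0.0])  # one-hot label

# Central finite differences of the loss w.r.t. each logit.
eps = 1e-6
g_num = np.array([
    (cross_entropy_loss(z + eps * e, y) - cross_entropy_loss(z - eps * e, y)) / (2 * eps)
    for e in np.eye(3)
])

assert np.allclose(g_num, cross_entropy_grad(z, y), atol=1e-8)
```

Fusing the two functions also sidesteps the division by $p_k$ that appears mid-derivation, which would be unstable when a predicted probability is close to zero.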
