Exercise Solutions

$\color{red}{\text{These answers are for reference only.}}$

4.1

$$
\begin{aligned}
\hat\mu,\hat\sigma^2 & = \underset{\mu,\sigma^2}{\text{argmax}} \left[ \sum_{i=1}^I \log \left[ \text{Norm}_{x_i} [\mu,\sigma^2]\right] \right] \\
& = \underset{\mu,\sigma^2}{\text{argmax}} \left[ -0.5I\log[2\pi]-0.5I\log\sigma^2-0.5\sum_{i=1}^I \frac{(x_i-\mu)^2}{\sigma^2} \right]
\end{aligned}
$$

Differentiate the log-likelihood $L$ with respect to $\sigma^2$ and set the result to zero:

$$
\frac{\partial L}{\partial \sigma^2} = -0.5I \frac{1}{\sigma^2}+0.5 \sum_{i=1}^I \frac{(x_i-\mu)^2}{\sigma^4}=0
$$

Rearranging, with $\mu$ set to its estimate $\hat\mu$, gives

$$
\hat{\sigma}^2=\sum_{i=1}^I \frac{(x_i-\hat\mu)^2}{I}
$$

as required.
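As a quick numerical sanity check of this result, the sketch below evaluates the log-likelihood at the closed-form variance and at nearby values. The data values and the helper names `log_likelihood` and `ml_variance` are illustrative assumptions, and $\hat\mu$ is taken to be the sample mean.

```python
import math

def log_likelihood(xs, mu, var):
    """Sum of log Norm_x[mu, var] over the data, as in the expansion above."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(var)
            - 0.5 * sum((x - mu) ** 2 for x in xs) / var)

def ml_variance(xs):
    """Closed-form estimate: mean squared deviation from the sample mean."""
    mu_hat = sum(xs) / len(xs)
    return sum((x - mu_hat) ** 2 for x in xs) / len(xs)

xs = [1.0, 2.0, 4.0, 7.0]            # hypothetical data
mu_hat = sum(xs) / len(xs)
var_hat = ml_variance(xs)
best = log_likelihood(xs, mu_hat, var_hat)

# Perturbing the variance in either direction should not increase the log-likelihood.
lower = log_likelihood(xs, mu_hat, 0.9 * var_hat)
higher = log_likelihood(xs, mu_hat, 1.1 * var_hat)
```

If the derivation is right, `best` should be at least as large as both `lower` and `higher`.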

4.2

$$
\begin{aligned}
\hat\mu,\hat\sigma^2 & = \underset{\mu,\sigma^2}{\text{argmax}} \left[ \sum_{i=1}^I \log \left[ \text{Norm}_{x_i} [\mu,\sigma^2]\right] + \log [\text{NormInvGam}_{\mu,\sigma^2}[\alpha,\beta,\gamma,\delta]] \right] \\
& = \underset{\mu,\sigma^2}{\text{argmax}} \left[ -0.5I\log[2\pi]-0.5I\log\sigma^2-0.5\sum_{i=1}^I \frac{(x_i-\mu)^2}{\sigma^2} + \log \left[ \frac{\sqrt\gamma \beta^{\alpha}}{\sqrt{2\pi}\Gamma[\alpha]} \right] -(\alpha+1.5)\log[\sigma^2]-\frac{2\beta+\gamma(\delta-\mu)^2}{2\sigma^2} \right]
\end{aligned}
$$

Call the bracketed objective $L$ (the log-likelihood plus the log prior). Differentiate $L$ with respect to $\mu$ and set the result to zero:

$$
\begin{aligned}
\frac{\partial L}{\partial \mu} & = \sum_{i=1}^I\frac{x_i-\mu}{\sigma^2}+\frac{\gamma(\delta-\mu)}{\sigma^2} \\
& =\frac{\sum_{i=1}^I x_i -I\mu+\gamma\delta-\gamma\mu}{\sigma^2} \\
& = 0
\end{aligned}
$$

Rearranging gives

$$
\hat{\mu}=\frac{\sum_{i=1}^I x_i+\gamma\delta}{I+\gamma}
$$

Similarly, differentiate $L$ with respect to $\sigma^2$ and set the result to zero:

$$
\begin{aligned}
\frac{\partial L}{\partial \sigma^2} & = - \frac{I}{2\sigma^2}+\frac{\sum_{i=1}^I (x_i-\mu)^2}{2\sigma^4} - \frac{2\alpha+3}{2\sigma^2}+\frac{2\beta+\gamma(\delta-\mu)^2}{2\sigma^4} \\
& = \frac{\sum_{i=1}^I(x_i-\mu)^2+2\beta+\gamma(\delta-\mu)^2}{2\sigma^4}-\frac{I+3+2\alpha}{2\sigma^2} \\
& = 0
\end{aligned}
$$

Rearranging, with $\mu$ set to $\hat\mu$, gives

$$
\hat{\sigma}^2=\frac{\sum_{i=1}^I(x_i-\hat\mu)^2+2\beta+\gamma(\delta-\hat\mu)^2}{I+3+2\alpha}
$$
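The two closed-form MAP estimates can be evaluated directly. The following is a minimal sketch; the function name `map_normal`, the data, and the hyperparameter values are assumptions chosen only for illustration.

```python
def map_normal(xs, alpha, beta, gamma, delta):
    """MAP estimates of (mu, sigma^2) under a normal inverse gamma prior."""
    n = len(xs)
    # The mean is pulled from the sample mean toward the prior mean delta.
    mu_hat = (sum(xs) + gamma * delta) / (n + gamma)
    # The variance estimate is evaluated at mu_hat.
    numerator = (sum((x - mu_hat) ** 2 for x in xs)
                 + 2 * beta
                 + gamma * (delta - mu_hat) ** 2)
    var_hat = numerator / (n + 3 + 2 * alpha)
    return mu_hat, var_hat

xs = [1.0, 2.0, 4.0, 7.0]            # hypothetical data
mu_hat, var_hat = map_normal(xs, alpha=1.0, beta=1.0, gamma=1.0, delta=0.0)
```

With these made-up numbers the prior mean `delta = 0` drags the estimate of the mean below the sample mean of 3.5.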

4.3

We are given

$$
L=\sum_{k=1}^6N_k\log[\lambda_k]+\nu\left( \sum_{k=1}^6\lambda_k-1 \right)
$$

Differentiate $L$ with respect to $\lambda_k$ and set the result to zero:

$$
\begin{aligned}
\frac{\partial L}{\partial \lambda_k} & = \frac{N_k}{\lambda_k}+\nu \\
& = 0
\end{aligned}
$$

Rearranging gives

$$
\hat{\lambda}_k=\frac{N_k}{-\nu}
$$

Since

$$
\sum_{k=1}^6 \lambda_k=1
$$

it follows that

$$
-\nu=\sum_{m=1}^6N_m
$$

Hence

$$
\hat{\lambda}_k=\frac{N_k}{\sum_{m=1}^6N_m}
$$

as required.
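In code, this estimate is just the normalized histogram of the observations. The sketch below assumes categories are labeled 1 to 6 as for a die; the function name `ml_categorical` and the sample rolls are made up for illustration.

```python
from collections import Counter

def ml_categorical(samples, num_categories=6):
    """ML estimate of categorical parameters: count of each label over the total."""
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(k, 0) / total for k in range(1, num_categories + 1)]

rolls = [1, 3, 3, 6, 2, 3, 6, 1]     # hypothetical die rolls
lam = ml_categorical(rolls)
```

Categories that never appear in the data receive probability zero under this estimate.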

4.4

We are given

$$
\begin{aligned}
\hat{\lambda}_{1\cdots6} & =\underset{\lambda_{1\cdots6}} {\text{argmax}} \left[ \prod_{i=1}^I Pr(x_i|\lambda_{1\cdots6})Pr(\lambda_{1\cdots6}) \right] \\
& =\underset{\lambda_{1\cdots6}} {\text{argmax}} \left[ \prod_{i=1}^I \text{Cat}_{x_i}[\lambda_{1\cdots6}] \text{Dir}_{\lambda_{1\cdots6}}[\alpha_{1\cdots6}] \right] \\
& = \underset{\lambda_{1\cdots6}} {\text{argmax}} \left[ \prod_{k=1}^6\lambda_k^{N_k}\cdot(\text{terms independent of } \lambda_k)\cdot \prod_{k=1}^6 \lambda_k^{\alpha_k-1} \right] \\
& = \underset{\lambda_{1\cdots6}} {\text{argmax}} \left[ \prod_{k=1}^6 \lambda_k^{N_k+\alpha_k-1} \right]
\end{aligned}
$$

Taking the logarithm and enforcing the sum-to-one constraint with a Lagrange multiplier $\nu$, the objective becomes

$$
L=\sum_{k=1}^6(N_k+\alpha_k-1) \log[\lambda_k]+\nu\left(\sum_{k=1}^6\lambda_k-1\right)
$$

Differentiate $L$ with respect to $\lambda_k$ and set the result to zero:

$$
\begin{aligned}
\frac{\partial L}{\partial \lambda_k} & = \frac{N_k+\alpha_k-1}{\lambda_k}+\nu \\
& = 0
\end{aligned}
$$

Rearranging gives

$$
\hat{\lambda}_k=\frac{N_k+\alpha_k-1}{-\nu}
$$

Since

$$
\sum_{k=1}^6 \lambda_k=1
$$

it follows that

$$
-\nu=\sum_{m=1}^6(N_m+\alpha_m-1)
$$

Hence

$$
\hat{\lambda}_k=\frac{N_k+\alpha_k-1}{\sum_{m=1}^6(N_m+\alpha_m-1)}
$$

as required.
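The effect of the Dirichlet prior is easy to see numerically. The sketch below is an illustration only: the name `map_categorical`, the counts, and the choice $\alpha_k = 2$ are assumptions, and it presumes every $N_k+\alpha_k-1$ is non-negative so that the result is a valid probability vector.

```python
def map_categorical(counts, alphas):
    """MAP estimate of categorical parameters under a Dirichlet prior."""
    # Each category contributes its count plus (alpha_k - 1) pseudo-observations.
    numerators = [n + a - 1 for n, a in zip(counts, alphas)]
    total = sum(numerators)
    return [v / total for v in numerators]

counts = [2, 1, 3, 0, 0, 2]          # hypothetical counts N_1..N_6
alphas = [2.0] * 6                   # hypothetical prior parameters
lam = map_categorical(counts, alphas)
```

Unlike the maximum likelihood estimate, categories with zero counts still receive non-zero probability here, because each $\alpha_k - 1$ acts like one extra observation.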

4.5

(i)

$$
\begin{aligned}
Pr(x_{1\cdots I}) & = \int\prod_{i=1}^I Pr(x_i|\theta)Pr(\theta) \,\text{d}\theta \\
& = \iint \prod_{i=1}^I \text{Norm}_{x_i}[\mu,\sigma^2] \cdot \text{NormInvGam}_{\mu,\sigma^2}[\alpha,\beta,\gamma,\delta]\,\text{d}\mu \,\text{d}\sigma^2 \\
& = \iint \kappa \cdot\text{NormInvGam}_{\mu,\sigma^2}[\widetilde\alpha,\widetilde\beta,\widetilde\gamma,\widetilde\delta]\,\text{d}\mu \,\text{d}\sigma^2 \\
& = \kappa
\end{aligned}
$$

Here $\kappa$ is the constant left over when the product of likelihood and prior is rewritten in conjugate form; its explicit expression is not written out here.

(ii)

$$
\begin{aligned}
Pr(x_{1\cdots I}) & = \int\prod_{i=1}^I Pr(x_i|\theta)Pr(\theta) \,\text{d}\theta \\
& = \int \prod_{i=1}^I \text{Cat}_{x_i}[\lambda_{1\cdots K}] \cdot \text{Dir}_{\lambda_{1\cdots K}}[\alpha_{1\cdots K}] \,\text{d}\lambda_{1\cdots K} \\
& = \int \kappa \cdot\text{Dir}_{\lambda_{1\cdots K}}[\widetilde\alpha_{1\cdots K}] \,\text{d}\lambda_{1\cdots K} \\
& = \kappa
\end{aligned}
$$

Again $\kappa$ is the constant from the conjugate relation, and its explicit expression is not written out here.

4.6

$\color{red}{\text{ToDo}}$

4.7

$$
\begin{aligned}
\hat\lambda & = \underset{\lambda}{\text{argmax}} \left[ \sum_{i=1}^I \log \left[ \text{Bern}_{x_i} [\lambda]\right] \right] \\
& = \underset{\lambda}{\text{argmax}} \left[\left(\sum_{i=1}^I x_i \right) \log[\lambda] +\left(\sum_{i=1}^I (1-x_i) \right) \log[1-\lambda] \right]
\end{aligned}
$$

Differentiate the log-likelihood $L$ with respect to $\lambda$ and set the result to zero:

$$
\begin{aligned}
\frac{\partial L}{\partial \lambda} & =\frac{ \sum_{i=1}^Ix_i}{\lambda}-\frac{ \sum_{i=1}^I (1-x_i)}{1-\lambda} \\
& = 0
\end{aligned}
$$

Rearranging gives

$$
\hat{\lambda}=\frac{\sum_{i=1}^I x_i}{I}
$$
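To check this result numerically, the sketch below evaluates the derivative of the log-likelihood at the estimate and confirms that it vanishes. The helper name `bernoulli_score` and the data are illustrative assumptions.

```python
def bernoulli_score(xs, lam):
    """Derivative of the Bernoulli log-likelihood with respect to lam."""
    ones = sum(xs)
    zeros = len(xs) - ones
    return ones / lam - zeros / (1 - lam)

xs = [1, 0, 1, 1, 0, 1, 0, 1]        # hypothetical binary data
lam_hat = sum(xs) / len(xs)          # fraction of ones
score = bernoulli_score(xs, lam_hat)
```

The score is only defined for $0 < \lambda < 1$, so this check needs data containing both zeros and ones.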

4.8

$$
\begin{aligned}
\hat{\lambda} & =\underset{\lambda} {\text{argmax}} \left[ \log \left[\frac{ \prod_{i=1}^I Pr(x_i|\lambda)Pr(\lambda)}{Pr(x_{1\cdots I})} \right] \right] \\
& =\underset{\lambda} {\text{argmax}} \left[ \log \left[\prod_{i=1}^I Pr(x_i|\lambda)Pr(\lambda)\right] \right] \\
& =\underset{\lambda} {\text{argmax}} \left[\sum_{i=1}^I \log \left[ \text{Bern}_{x_i} [\lambda]\right]+ \log [\text{Beta}_{\lambda} [\alpha,\beta] ]\right] \\
& = \underset{\lambda}{\text{argmax}} \left[\left(\sum_{i=1}^I x_i \right) \log[\lambda] +\left(\sum_{i=1}^I (1-x_i) \right) \log[1-\lambda] +(\alpha-1) \log[\lambda]+(\beta-1) \log[1-\lambda] \right]
\end{aligned}
$$

Call the bracketed objective $L$. Differentiate $L$ with respect to $\lambda$ and set the result to zero:

$$
\begin{aligned}
\frac{\partial L}{\partial \lambda} & =\frac{ \sum_{i=1}^Ix_i}{\lambda}-\frac{ \sum_{i=1}^I (1-x_i)}{1-\lambda} +\frac{\alpha-1}{\lambda} -\frac{\beta-1}{1-\lambda}\\
& = 0
\end{aligned}
$$

Rearranging gives

$$
\hat{\lambda}=\frac{\sum_{i=1}^I x_i+\alpha-1}{I+\alpha+\beta-2}
$$
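A short sketch shows how the prior shifts this estimate. The function name `map_bernoulli`, the data, and the two prior settings are assumptions for illustration.

```python
def map_bernoulli(xs, alpha, beta):
    """MAP estimate of the Bernoulli parameter under a Beta(alpha, beta) prior."""
    return (sum(xs) + alpha - 1) / (len(xs) + alpha + beta - 2)

xs = [1, 0, 1, 1, 0, 1, 0, 1]        # hypothetical binary data

# A uniform prior (alpha = beta = 1) reproduces the fraction of ones.
lam_uniform = map_bernoulli(xs, alpha=1.0, beta=1.0)

# A symmetric prior with alpha = beta = 3 pulls the estimate toward 0.5.
lam_prior = map_bernoulli(xs, alpha=3.0, beta=3.0)
```

With a uniform prior the extra terms vanish and the estimate reduces to $\sum_i x_i / I$.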

4.9

(i)
$$
\begin{aligned}
Pr(\lambda|x_{1\cdots I}) & = \frac{Pr(x_{1\cdots I}|\lambda) Pr(\lambda) }{Pr(x_{1 \cdots I})} \\
& =\frac{\prod_{i=1}^I \text{Bern}_{x_i}[\lambda]\cdot \text{Beta}_{\lambda}[\alpha,\beta]} {Pr(x_{1 \cdots I})} \\
& =\frac{\kappa \cdot \text{Beta}_{\lambda}[\widetilde\alpha,\widetilde\beta]} {Pr(x_{1 \cdots I})} \\
& = \text{Beta}_{\lambda}[\widetilde\alpha,\widetilde\beta]
\end{aligned}
$$

Here $\widetilde\alpha=\alpha+\sum_{i=1}^I x_i$ and $\widetilde\beta=\beta+\sum_{i=1}^I(1-x_i)$. The last step holds because both sides must integrate to one over $\lambda$, so $\kappa = Pr(x_{1\cdots I})$.

(ii)
$$
\begin{aligned}
Pr(x^*|x_{1\cdots I}) & = \int Pr(x^*|\lambda)Pr(\lambda|x_{1\cdots I}) \,\text{d}\lambda \\
& = \int \text{Bern}_{x^*}[\lambda]\cdot \text{Beta}_{\lambda}[\widetilde\alpha,\widetilde\beta] \,\text{d}\lambda \\
& = \int \kappa(x^*,\widetilde\alpha,\widetilde\beta) \text{Beta}_{\lambda} [\breve\alpha,\breve\beta] \,\text{d}\lambda \\
& = \kappa(x^*,\widetilde\alpha,\widetilde\beta)
\end{aligned}
$$
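The posterior and predictive distributions can be sketched in a few lines. This sketch relies on two standard Beta-Bernoulli conjugacy facts: the posterior parameters are the prior parameters plus the counts of ones and zeros, and the predictive probability of $x^*=1$ is $\widetilde\alpha/(\widetilde\alpha+\widetilde\beta)$. The function names, data, and prior values are assumptions for illustration.

```python
def beta_posterior(xs, alpha, beta):
    """Posterior Beta parameters after observing binary data xs."""
    ones = sum(xs)
    zeros = len(xs) - ones
    return alpha + ones, beta + zeros

def predictive(x_star, alpha_post, beta_post):
    """Pr(x* | data): the posterior integrated against the Bernoulli likelihood."""
    p_one = alpha_post / (alpha_post + beta_post)
    return p_one if x_star == 1 else 1.0 - p_one

xs = [1, 0, 1, 1]                    # hypothetical binary data
alpha_post, beta_post = beta_posterior(xs, alpha=1.0, beta=1.0)
p1 = predictive(1, alpha_post, beta_post)
p0 = predictive(0, alpha_post, beta_post)
```

The two predictive probabilities sum to one, as they must for a distribution over $x^* \in \{0, 1\}$.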

4.10

The method is the same as above, so the derivation is omitted.

(i)

$$
\hat\lambda=\frac{\sum_{i=1}^I x_i}{I}=0
$$

Then compute $Pr(x^*|\hat\lambda)$.

(ii)

$$
\hat{\lambda}=\frac{\sum_{i=1}^I x_i+\alpha-1}{I+\alpha+\beta-2}=0
$$

Then compute $Pr(x^*|\hat\lambda)$.

(iii)

Compute $Pr(x^*|x_{1\cdots 4})= \kappa(x^*,\widetilde\alpha,\widetilde\beta)$.
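The three approaches can be compared side by side. The sketch below assumes the data are four zeros (consistent with $\hat\lambda = 0$ and $I = 4$ above) and assumes a uniform prior $\alpha=\beta=1$, which is one setting that makes the MAP estimate zero; neither is stated explicitly here, so treat both as assumptions. The Bayesian line uses the standard Beta-Bernoulli predictive $\widetilde\alpha/(\widetilde\alpha+\widetilde\beta)$.

```python
xs = [0, 0, 0, 0]                    # assumed data: four zeros
alpha, beta = 1.0, 1.0               # assumed uniform Beta prior

ones = sum(xs)
n = len(xs)

# (i) Maximum likelihood: Pr(x* = 1) equals the point estimate.
p_ml = ones / n

# (ii) MAP: Pr(x* = 1) equals the MAP point estimate.
p_map = (ones + alpha - 1) / (n + alpha + beta - 2)

# (iii) Bayesian: integrate over the Beta posterior instead of using a point estimate.
alpha_post = alpha + ones
beta_post = beta + n - ones
p_bayes = alpha_post / (alpha_post + beta_post)
```

Under these assumptions both point estimates assign zero probability to ever seeing $x^*=1$, while the Bayesian prediction stays strictly positive.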
