Machine Learning Notes (VI): Linear Models (II): Multivariate Least Squares
The dataset is

$$D=\left\{(\mathbf{x}_1,y_1),(\mathbf{x}_2,y_2),\dots,(\mathbf{x}_m,y_m)\right\}$$

where $\mathbf{x}_i=(x_{i1};x_{i2};\dots;x_{id})$ and $y_i\in\mathbb{R}$.
We now try to learn

$$f(\mathbf{x}_i)=\mathbf{w}^T\mathbf{x}_i+b,\ \text{such that}\ f(\mathbf{x}_i)\approx y_i.$$
This is also called multivariate linear regression.
Here we can use the method of least squares to estimate $\mathbf{w}$ and $b$.
Steps:
1: Absorb $\mathbf{w}$ and $b$ into a single vector $\hat{\mathbf{w}}=(\mathbf{w};b)$.
2: Represent the dataset $D$ as an $m\times(d+1)$ matrix $X$, in which each row corresponds to one example: the first $d$ entries are that example's $d$ attribute values and the last entry is fixed to 1.
3: Write the labels in vector form:

$$\mathbf{y}=(y_1;y_2;\cdots;y_m)$$
Then, analogously to the one-dimensional case,

$$\hat{\mathbf{w}}^*=\mathop{\arg\min}\limits_{\hat{\mathbf{w}}}(\mathbf{y}-X\hat{\mathbf{w}})^{T}(\mathbf{y}-X\hat{\mathbf{w}})$$
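As a quick illustration, here is a minimal NumPy sketch (not from the original text; the data and variable names are made up) that carries out steps 1 through 3 on a toy dataset and evaluates this objective for a candidate $\hat{\mathbf{w}}$:

```python
import numpy as np

# Toy dataset: m = 5 examples, d = 3 attributes (purely illustrative).
rng = np.random.default_rng(0)
m, d = 5, 3
X_raw = rng.normal(size=(m, d))      # attribute values x_{ij}
y = rng.normal(size=m)               # labels y_i

# Step 2: augment each row with a trailing 1, giving the m x (d+1) matrix X.
X = np.hstack([X_raw, np.ones((m, 1))])

# Step 1: w_hat = (w; b) is a single (d+1)-vector; here just a random candidate.
w_hat = rng.normal(size=d + 1)

# The least-squares objective E_{w_hat} = (y - X w_hat)^T (y - X w_hat).
residual = y - X @ w_hat
E = residual @ residual
print(E)
```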
4: Let $E_{\hat{\mathbf{w}}}=(\mathbf{y}-X\hat{\mathbf{w}})^{T}(\mathbf{y}-X\hat{\mathbf{w}})$ and differentiate with respect to $\hat{\mathbf{w}}$:

$$\dfrac{\partial{E_{\hat{\mathbf{w}}}}}{\partial{\hat{\mathbf{w}}}}$$
Expanding:

$$\begin{aligned}(\mathbf{y}-X\hat{\mathbf{w}})^{T}(\mathbf{y}-X\hat{\mathbf{w}})&=(\mathbf{y}^T-\hat{\mathbf{w}}^TX^T)(\mathbf{y}-X\hat{\mathbf{w}})\\&=\mathbf{y}^T\mathbf{y}-\mathbf{y}^TX\hat{\mathbf{w}}-\hat{\mathbf{w}}^TX^T\mathbf{y}+\hat{\mathbf{w}}^TX^TX\hat{\mathbf{w}}\end{aligned}\tag{1}$$
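If you want to sanity-check this expansion, a small numerical sketch (random data, purely illustrative):

```python
import numpy as np

# Check that the four-term expansion in Eq. (1) equals the compact quadratic form.
rng = np.random.default_rng(1)
m, d = 6, 2
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])
y = rng.normal(size=m)
w = rng.normal(size=d + 1)

lhs = (y - X @ w) @ (y - X @ w)
rhs = y @ y - y @ X @ w - w @ X.T @ y + w @ X.T @ X @ w
assert np.isclose(lhs, rhs)
```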
How do we simplify Equation (1)? There are three parts in total.
First part:

$$\dfrac{\partial{\mathbf{y}^T\mathbf{y}}}{\partial{\hat{\mathbf{w}}}}=0$$
Since we are differentiating with respect to $\hat{\mathbf{w}}$, the term $\mathbf{y}^T\mathbf{y}$ acts as a constant, so its partial derivative is 0.
Second part: consider

$$\mathbf{y}^TX\hat{\mathbf{w}}+\hat{\mathbf{w}}^TX^T\mathbf{y}\tag{2}$$
Here both $\mathbf{y}^TX\hat{\mathbf{w}}$ and $\hat{\mathbf{w}}^TX^T\mathbf{y}$ are $1\times1$ matrices, and

$$\mathbf{y}^TX\hat{\mathbf{w}}=(\hat{\mathbf{w}}^TX^T\mathbf{y})^T$$
For a $1\times1$ matrix $A$ we have $A^T=A$, so for Equation (2):

$$(2)=2(\mathbf{y}^TX\hat{\mathbf{w}})$$
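A quick numerical illustration of this $1\times1$ transpose trick (shapes chosen arbitrarily for the example):

```python
import numpy as np

# y^T X w_hat and w_hat^T X^T y are the same scalar, so their sum is 2 y^T X w_hat.
rng = np.random.default_rng(2)
X = rng.normal(size=(4, 3))
y = rng.normal(size=4)
w = rng.normal(size=3)

a = y @ X @ w          # y^T X w_hat
b = w @ X.T @ y        # w_hat^T X^T y (equal to its own transpose, being 1x1)
assert np.isclose(a, b) and np.isclose(a + b, 2 * a)
```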
So we need to compute

$$\dfrac{\partial{(\mathbf{y}^TX\hat{\mathbf{w}})}}{\partial{\hat{\mathbf{w}}}}=\ ?$$
Let's look at the factors separately:

$$\mathbf{y}^T=(y_1,y_2,\dots,y_m);\quad X=\begin{pmatrix} x_{11}&x_{12}&\dots&x_{1d}&1\\ x_{21}&x_{22}&\dots&x_{2d}&1\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ x_{m1}&x_{m2}&\dots&x_{md}&1 \end{pmatrix};\quad \hat{\mathbf{w}}=(w_1;w_2;\dots;w_d;b)$$
The product is

$$\mathbf{y}^TX=\left(\sum\limits_{i=1}^{m}x_{i1}y_i,\ \sum\limits_{i=1}^{m}x_{i2}y_i,\ \dots,\ \sum\limits_{i=1}^{m}x_{id}y_i,\ \sum\limits_{i=1}^{m}y_i\right)\tag{part1}$$
$$\begin{aligned}(\text{part1})\,\hat{\mathbf{w}}&=\left(\sum\limits_{i=1}^{m}x_{i1}y_i,\ \sum\limits_{i=1}^{m}x_{i2}y_i,\ \dots,\ \sum\limits_{i=1}^{m}x_{id}y_i,\ \sum\limits_{i=1}^{m}y_i\right)\times(w_1;w_2;\dots;w_d;b)\\&=\sum\limits_{j=1}^{d}\sum\limits_{i=1}^{m}x_{ij}y_iw_j+b\sum\limits_{i=1}^{m}y_i\end{aligned}\tag{part1sum}$$
Differentiating:

$$\dfrac{\partial\,\text{part1sum}}{\partial{\hat{\mathbf{w}}}}=\begin{pmatrix}\dfrac{\partial\,\text{part1sum}}{\partial{w_1}}\\ \dfrac{\partial\,\text{part1sum}}{\partial{w_2}}\\ \vdots\\ \dfrac{\partial\,\text{part1sum}}{\partial{w_d}}\\ \dfrac{\partial\,\text{part1sum}}{\partial{b}}\end{pmatrix}=\begin{pmatrix}\sum\limits_{i=1}^{m}x_{i1}y_i\\ \sum\limits_{i=1}^{m}x_{i2}y_i\\ \vdots\\ \sum\limits_{i=1}^{m}x_{id}y_i\\ \sum\limits_{i=1}^{m}y_i\end{pmatrix}$$
The result is a $(d+1)\times1$ matrix, i.e., a column vector.
Meanwhile,

$$X^T\mathbf{y}=\begin{pmatrix} x_{11}&x_{21}&\cdots&x_{m1}\\ x_{12}&x_{22}&\cdots&x_{m2}\\ \vdots&\vdots&\ddots&\vdots\\ x_{1d}&x_{2d}&\cdots&x_{md}\\ 1&1&\cdots&1 \end{pmatrix}\times\begin{pmatrix}y_1\\y_2\\\vdots\\y_m\end{pmatrix}=\begin{pmatrix}\sum\limits_{i=1}^{m}x_{i1}y_i\\ \sum\limits_{i=1}^{m}x_{i2}y_i\\ \vdots\\ \sum\limits_{i=1}^{m}x_{id}y_i\\ \sum\limits_{i=1}^{m}y_i\end{pmatrix}=\dfrac{\partial\,\text{part1sum}}{\partial{\hat{\mathbf{w}}}}$$
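This summation form is easy to verify numerically; the sketch below (toy data assumed) checks that the entries of $X^T\mathbf{y}$ match the sums derived above:

```python
import numpy as np

# X^T y entry j is sum_i x_{ij} y_i; its last entry is sum_i y_i,
# because the last column of X is all ones.
rng = np.random.default_rng(6)
m, d = 7, 2
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])
y = rng.normal(size=m)

grad = X.T @ y                      # the (d+1)-dimensional column vector
assert np.allclose(grad[:-1], (X[:, :-1] * y[:, None]).sum(axis=0))
assert np.isclose(grad[-1], y.sum())
```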
By the same method we obtain

$$\dfrac{\partial{(\hat{\mathbf{w}}^TX^TX\hat{\mathbf{w}})}}{\partial{\hat{\mathbf{w}}}}=2X^TX\hat{\mathbf{w}}$$
So we arrive at the final result:

$$\dfrac{\partial{E_{\hat{\mathbf{w}}}}}{\partial{\hat{\mathbf{w}}}}=2X^T(X\hat{\mathbf{w}}-\mathbf{y})$$
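A finite-difference check of this gradient, as a sketch on random data (not part of the derivation itself):

```python
import numpy as np

# Compare the analytic gradient 2 X^T (X w_hat - y) against central differences.
rng = np.random.default_rng(3)
m, d = 8, 3
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])
y = rng.normal(size=m)
w = rng.normal(size=d + 1)

E = lambda w: (y - X @ w) @ (y - X @ w)
grad_analytic = 2 * X.T @ (X @ w - y)

eps = 1e-6
grad_numeric = np.array([
    (E(w + eps * e) - E(w - eps * e)) / (2 * eps)
    for e in np.eye(d + 1)
])
assert np.allclose(grad_analytic, grad_numeric, atol=1e-5)
```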
5: Set the derivative equal to zero:

$$\begin{aligned}\dfrac{\partial{E_{\hat{\mathbf{w}}}}}{\partial{\hat{\mathbf{w}}}}=2X^T(X\hat{\mathbf{w}}-\mathbf{y})&=0\\ X^TX\hat{\mathbf{w}}&=X^T\mathbf{y}\end{aligned}$$
For this equation to have a unique solution, $X^TX$ must be an invertible matrix. We then obtain:

$$\hat{\mathbf{w}}^*=(X^TX)^{-1}X^T\mathbf{y}$$
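In code, the normal equations can be solved directly. A minimal sketch, assuming $X^TX$ is invertible; `np.linalg.lstsq` is the safer route when it may be singular or ill-conditioned:

```python
import numpy as np

# Solve the normal equations X^T X w_hat = X^T y on toy data.
rng = np.random.default_rng(4)
m, d = 20, 3
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])
y = rng.normal(size=m)

w_star = np.linalg.solve(X.T @ X, X.T @ y)       # assumes X^T X invertible
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # robust alternative
assert np.allclose(w_star, w_lstsq)
```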
Recall that we are trying to learn

$$f(\mathbf{x}_i)=\mathbf{w}^T\mathbf{x}_i+b,\ \text{such that}\ f(\mathbf{x}_i)\approx y_i,$$
but earlier we made an adjustment: we absorbed $\mathbf{w}$ and $b$ into a single vector.
Now let $\hat{\mathbf{x}}_i=(\mathbf{x}_i;1)$; the learned model becomes

$$f(\mathbf{x}_i)=(\mathbf{w};b)^T(\mathbf{x}_i;1)\ \rightarrow\ f(\hat{\mathbf{x}}_i)=\hat{\mathbf{w}}^T\hat{\mathbf{x}}_i$$
Substituting $\hat{\mathbf{w}}^*$ gives:

$$f(\hat{\mathbf{x}}_i)=((X^TX)^{-1}X^T\mathbf{y})^T\hat{\mathbf{x}}_i\\ \Updownarrow\\ f(\hat{\mathbf{x}}_i)=\hat{\mathbf{x}}_i^T(X^TX)^{-1}X^T\mathbf{y}$$
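Putting it together, prediction for a new point just augments it with a trailing 1 before taking the inner product with $\hat{\mathbf{w}}^*$; a short illustrative sketch:

```python
import numpy as np

# Fit on toy data, then predict a new point (all values illustrative).
rng = np.random.default_rng(5)
m, d = 20, 3
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])
y = rng.normal(size=m)
w_star = np.linalg.solve(X.T @ X, X.T @ y)

x_new = rng.normal(size=d)
x_hat = np.append(x_new, 1.0)   # x_hat_i = (x_i; 1)
print(x_hat @ w_star)           # f(x_hat_i) = w_hat^T x_hat_i
```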