The dataset is

<script type="math/tex; mode=display" id="MathJax-Element-1"> D=\left\{(\mathbf{x_1},y_1),(\mathbf{x_2},y_2),\dots,(\mathbf{x_m},y_m)\right\}\\ \text{where}\\ \mathbf{x_i}=(x_{i1};x_{i2};\dots;x_{id}),\ y_i\in\mathbb{R} </script>
We now try to learn
<script type="math/tex; mode=display" id="MathJax-Element-2"> f(\mathbf{x_i})=\mathbf{w}^T\mathbf{x_i}+b,\ \text{such that}\ f(\mathbf{x_i})\approx{y_i} </script>
This is known as multivariate linear regression.
We can estimate w<script type="math/tex" id="MathJax-Element-3">\mathbf{w}</script> and b<script type="math/tex" id="MathJax-Element-4">b</script> by the method of least squares.
Steps:
1: Absorb w<script type="math/tex" id="MathJax-Element-5"> \mathbf w</script> and b<script type="math/tex" id="MathJax-Element-6">b</script> into a single vector w^=(w;b)<script type="math/tex" id="MathJax-Element-7">\hat{\mathbf{w}}=(\mathbf{w};b)</script>.
2: Represent the dataset D<script type="math/tex" id="MathJax-Element-8">D</script> as an m×(d+1)<script type="math/tex" id="MathJax-Element-9">m\times{(d+1)}</script> matrix X<script type="math/tex" id="MathJax-Element-10">X</script>:
<script type="math/tex; mode=display" id="MathJax-Element-11"> X=\left( \begin{matrix} x_{11}& x_{12}&\dots&x_{1d}&1 \\ x_{21}& x_{22}&\dots&x_{2d}&1 \\ \vdots&\vdots&\ddots&\vdots&\vdots \\ x_{m1}& x_{m2}&\dots&x_{md}&1 \\ \end{matrix} \right)=\left( \begin{matrix} \mathbf x_{1}^{T}& 1 \\ \mathbf x_{2}^{T}& 1 \\ \vdots& \vdots \\ \mathbf x_{m}^{T}& 1 \\ \end{matrix} \right) </script>
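As a quick illustration (the data and variable names below are our own, not from the text), this m×(d+1) matrix can be built in NumPy by appending a column of ones to the raw feature matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 3                                  # m samples, d features
X_raw = rng.standard_normal((m, d))          # rows are x_i^T

# Append a column of ones so that b is absorbed into w_hat = (w; b)
X = np.hstack([X_raw, np.ones((m, 1))])
print(X.shape)                               # (5, 4), i.e. m x (d+1)
```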
3: Write the labels in vector form:
<script type="math/tex; mode=display" id="MathJax-Element-12"> \mathbf {y}=(y_1;y_2;\cdots;y_m) </script>
Then, analogously to the one-variable case,
<script type="math/tex; mode=display" id="MathJax-Element-13"> \hat{\mathbf {w}}^*=\mathop{\arg\min}\limits_{\hat{\mathbf {w}}}(\mathbf {y}-X\hat{\mathbf {w}})^{T}(\mathbf {y}-X\hat{\mathbf {w}}) </script>
4: Let Ew^=(y−Xw^)T(y−Xw^)<script type="math/tex" id="MathJax-Element-14">E_{\hat{\mathbf {w}}}=(\mathbf {y}-X\hat{\mathbf {w}})^{T}(\mathbf {y}-X\hat{\mathbf {w}})</script> and differentiate it with respect to w^<script type="math/tex" id="MathJax-Element-15">\hat{\mathbf {w}}</script>:
<script type="math/tex; mode=display" id="MathJax-Element-16"> \dfrac{\partial{E_{\hat{\mathbf {w}}}}}{\partial{\hat{\mathbf {w}}}} </script>
Expanding,
<script type="math/tex; mode=display" id="MathJax-Element-17"> \begin{aligned} (\mathbf {y}-X\hat{\mathbf {w}})^{T}(\mathbf {y}-X\hat{\mathbf {w}})&=(\mathbf {y}^T-\hat{\mathbf {w}}^TX^T)(\mathbf {y}-X\hat{\mathbf {w}})\\ &=\mathbf {y}^T\mathbf {y}-\mathbf {y}^TX\hat{\mathbf {w}}-\hat{\mathbf {w}}^TX^T\mathbf {y}+\hat{\mathbf{w}}^TX^TX\hat{\mathbf{w}}\tag{1}\\ \end{aligned} </script>
To simplify equation 1<script type="math/tex" id="MathJax-Element-18">1</script>, group its terms:
<script type="math/tex; mode=display" id="MathJax-Element-19"> \mathbf {y}^T\mathbf {y}-\mathbf {y}^TX\hat{\mathbf {w}}-\hat{\mathbf {w}}^TX^T\mathbf {y}+\hat{\mathbf{w}}^TX^TX\hat{\mathbf{w}}\\ \downarrow\downarrow\\ (\mathbf {y}^T\mathbf {y})-(\mathbf {y}^TX\hat{\mathbf {w}}+\hat{\mathbf {w}}^TX^T\mathbf {y})+(\hat{\mathbf{w}}^TX^TX\hat{\mathbf{w}}) </script>
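This expansion can be sanity-checked numerically; a minimal sketch with random, purely illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 6, 3
X = np.hstack([rng.standard_normal((m, d)), np.ones((m, 1))])
y = rng.standard_normal(m)
w_hat = rng.standard_normal(d + 1)

# Left side: (y - X w)^T (y - X w); right side: the four expanded terms of (1)
lhs = (y - X @ w_hat) @ (y - X @ w_hat)
rhs = y @ y - y @ X @ w_hat - w_hat @ X.T @ y + w_hat @ X.T @ X @ w_hat
assert np.isclose(lhs, rhs)
```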
There are three parts in total.
Part one:

<script type="math/tex; mode=display" id="MathJax-Element-5128"> \dfrac{\partial{\mathbf{y^Ty}}}{\partial{\mathbf{\hat{w}}}}=0 </script>
Since we differentiate with respect to w^<script type="math/tex" id="MathJax-Element-5129">\mathbf{\hat{w}}</script>, the term yTy<script type="math/tex" id="MathJax-Element-5130">\mathbf{y}^T\mathbf{y}</script> is effectively a constant, so its partial derivative is 0.
Part two:
Consider
<script type="math/tex; mode=display" id="MathJax-Element-5131"> \mathbf {y}^TX\hat{\mathbf {w}}+\hat{\mathbf {w}}^TX^T\mathbf {y}\tag{2} </script>
Here yTXw^<script type="math/tex" id="MathJax-Element-5132">\mathbf {y}^TX\hat{\mathbf {w}}</script> and w^TXTy<script type="math/tex" id="MathJax-Element-5133">\hat{\mathbf {w}}^TX^T\mathbf {y}</script> are both 1×1<script type="math/tex" id="MathJax-Element-5134">1\times1</script> matrices (i.e. scalars), and
<script type="math/tex; mode=display" id="MathJax-Element-5135"> \mathbf {y}^TX\hat{\mathbf {w}}=(\hat{\mathbf {w}}^TX^T\mathbf {y})^T </script>
because for any 1×1<script type="math/tex" id="MathJax-Element-5136">1\times1</script> matrix A<script type="math/tex" id="MathJax-Element-5137">\mathbf{A}</script> we have AT=A<script type="math/tex" id="MathJax-Element-5138">\mathbf{A}^T=\mathbf{A}</script>.
Therefore, for expression (2)<script type="math/tex" id="MathJax-Element-5139">(2)</script>, we have
<script type="math/tex; mode=display" id="MathJax-Element-5140"> (2)=2(\mathbf {y}^TX\hat{\mathbf {w}}) </script>
So we need to compute
<script type="math/tex; mode=display" id="MathJax-Element-5141"> \dfrac{\partial{\mathbf {y}^TX\hat{\mathbf {w}}}}{\partial{\mathbf{\hat{w}}}}=? </script>
Writing out each factor:
<script type="math/tex; mode=display" id="MathJax-Element-5142"> \mathbf{y^T}=(y_1,y_2,\dots,y_m);\\ X=\left( \begin{matrix} x_{11}& x_{12}&\dots&x_{1d}&1 \\ x_{21}& x_{22}&\dots&x_{2d}&1 \\ \vdots&\vdots&\ddots&\vdots&\vdots&\\ x_{m1}& x_{m2}&\dots&x_{md}&1 \\ \end{matrix} \right);\\ \mathbf{\hat{w}}=(w_1;w_2;\dots;w_d;b); </script>
The product is
<script type="math/tex; mode=display" id="MathJax-Element-5143"> \mathbf{y^T}X=\left( \sum\limits_{i=1}^{m}x_{i1}y_i, \sum\limits_{i=1}^{m}x_{i2}y_i\dots,\sum\limits_{i=1}^{m}x_{id}y_i,\sum\limits_{i=1}^{m}y_i \right)\tag{part1} </script>
<script type="math/tex; mode=display" id="MathJax-Element-5144"> \begin{aligned} (part1)\mathbf{\hat{w}}&=\left( \sum\limits_{i=1}^{m}x_{i1}y_i, \sum\limits_{i=1}^{m}x_{i2}y_i\dots,\sum\limits_{i=1}^{m}x_{id}y_i,\sum\limits_{i=1}^{m}y_i \right)\times(w_1;w_2;\dots;w_d;b)\\ &=\sum\limits_{j=1}^{d}\sum\limits_{i=1}^{m}x_{ij}y_iw_j+b\sum\limits_{i=1}^{m}y_i \end{aligned}\tag{part1sum} </script>
Differentiating,
<script type="math/tex; mode=display" id="MathJax-Element-5145"> \dfrac{\partial{part1sum}}{\partial{\mathbf{\hat{w}}}}=\left( \begin{matrix} \dfrac{\partial{part1sum}}{\partial{w_1}}\\ \dfrac{\partial{part1sum}}{\partial{w_2}}\\ \vdots \\ \dfrac{\partial{part1sum}}{\partial{w_d}}\\ \dfrac{\partial{part1sum}}{\partial{b}} \end{matrix} \right)=\left( \begin{matrix} \sum\limits_{i=1}^{m}x_{i1}y_i\\ \sum\limits_{i=1}^{m}x_{i2}y_i\\ \vdots \\ \sum\limits_{i=1}^{m}x_{id}y_i\\ \sum\limits_{i=1}^{m}y_i \\ \end{matrix} \right) </script>
The result is a (d+1)×1<script type="math/tex" id="MathJax-Element-5146">(d+1)\times1</script> matrix, i.e. a column vector. Meanwhile,
<script type="math/tex; mode=display" id="MathJax-Element-5147">X^T\mathbf{y}=\left(\begin{matrix} x_{11}&x_{21}&\cdots&x_{m1}\\ x_{12}&x_{22}&\cdots&x_{m2}\\ \vdots&\vdots&\ddots&\vdots\\ x_{1d}&x_{2d}&\cdots&x_{md}\\ 1&1&\cdots&1\\ \end{matrix} \right)\times\left(\begin{matrix} y_1\\ y_2\\ \vdots\\ y_m \end{matrix}\right)=\left( \begin{matrix} \sum\limits_{i=1}^{m}x_{i1}y_i\\ \sum\limits_{i=1}^{m}x_{i2}y_i\\ \vdots \\ \sum\limits_{i=1}^{m}x_{id}y_i\\ \sum\limits_{i=1}^{m}y_i \\ \end{matrix} \right)=\dfrac{\partial{part1sum}}{\partial{\mathbf{\hat{w}}}}</script>
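Since y^TXŵ is linear in ŵ, its gradient should be exactly X^Ty; a finite-difference check (random illustrative data, names are ours) agrees:

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 6, 3
X = np.hstack([rng.standard_normal((m, d)), np.ones((m, 1))])
y = rng.standard_normal(m)
w_hat = rng.standard_normal(d + 1)

f = lambda w: y @ X @ w                      # the scalar y^T X w
eps = 1e-6
# Central differences along each coordinate direction e
grad_fd = np.array([(f(w_hat + eps * e) - f(w_hat - eps * e)) / (2 * eps)
                    for e in np.eye(d + 1)])
assert np.allclose(grad_fd, X.T @ y, atol=1e-5)
```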
By the same method we obtain
<script type="math/tex; mode=display" id="MathJax-Element-5148"> \dfrac{\partial{(\hat{\mathbf{w}}^TX^TX\hat{\mathbf{w}})}}{\partial{\mathbf{\hat{w}}}}=2X^TX\hat{\mathbf{w}} </script>
Combining the parts gives the final result
<script type="math/tex; mode=display" id="MathJax-Element-5149"> \dfrac{\partial{E_{\hat{\mathbf{w}}}}}{\partial{\hat{\mathbf{w}}}}=2X^T(X\hat{\mathbf{w}}-\mathbf{y}) </script>
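This gradient can be verified numerically as well; the sketch below (with made-up data) compares a finite-difference gradient of E against 2X^T(Xŵ − y):

```python
import numpy as np

rng = np.random.default_rng(3)
m, d = 8, 4
X = np.hstack([rng.standard_normal((m, d)), np.ones((m, 1))])
y = rng.standard_normal(m)
w_hat = rng.standard_normal(d + 1)

E = lambda w: (y - X @ w) @ (y - X @ w)      # the squared-error objective
eps = 1e-6
grad_fd = np.array([(E(w_hat + eps * e) - E(w_hat - eps * e)) / (2 * eps)
                    for e in np.eye(d + 1)])
assert np.allclose(grad_fd, 2 * X.T @ (X @ w_hat - y), atol=1e-4)
```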
5: Set the derivative to zero:
<script type="math/tex; mode=display" id="MathJax-Element-5150"> \begin{aligned} \dfrac{\partial{E_{\hat{\mathbf{w}}}}}{\partial{\hat{\mathbf{w}}}}=2X^T(X\hat{\mathbf{w}}-\mathbf{y})&=0\\ X^TX\hat{\mathbf{w}}&=X^T\mathbf{y} \end{aligned} </script>
For this system to have a unique solution, XTX<script type="math/tex" id="MathJax-Element-5151">X^TX</script> must be an invertible matrix.
Assuming it is, we obtain:
<script type="math/tex; mode=display" id="MathJax-Element-5152"> \hat{\mathbf{w}}^*=(X^TX)^{-1}X^T\mathbf{y} </script>
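In practice the normal equations are solved directly rather than by forming the inverse; a minimal NumPy sketch (synthetic data, names are ours) that cross-checks against NumPy's own least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 50, 3
X = np.hstack([rng.standard_normal((m, d)), np.ones((m, 1))])
w_true = np.array([2.0, -1.0, 0.5, 3.0])     # last entry plays the role of b
y = X @ w_true + 0.01 * rng.standard_normal(m)

# Solve X^T X w = X^T y; this assumes X^T X is invertible.
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_star, w_lstsq)
```

When X^TX is singular (e.g. more features than samples), `np.linalg.pinv` yields the minimum-norm least-squares solution instead.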
Recall that we are trying to learn
<script type="math/tex; mode=display" id="MathJax-Element-5153"> f(\mathbf{x_i})=\mathbf{w}^T\mathbf{x_i}+b,\ \text{such that}\ f(\mathbf{x_i})\approx{y_i} </script>
but earlier we made an adjustment: we absorbed w<script type="math/tex" id="MathJax-Element-5154"> \mathbf w</script> and b<script type="math/tex" id="MathJax-Element-5155">b</script> into the vector w^=(w;b)<script type="math/tex" id="MathJax-Element-5156">\hat{\mathbf{w}}=(\mathbf{w};b)</script>.
Accordingly, let x^i=(xi;1)<script type="math/tex" id="MathJax-Element-5157">\hat{\mathbf{x}}_i=(\mathbf{x}_i;1)</script>; the learned model then becomes
<script type="math/tex; mode=display" id="MathJax-Element-5158"> f(\mathbf{x}_i)=(\mathbf{w};b)^T(\mathbf{x}_i;1)\rightarrow f(\hat{\mathbf{x}}_i)=\hat{\mathbf{w}}^T\hat{\mathbf{x}}_i </script>
Substituting w^*<script type="math/tex" id="MathJax-Element-5159">\hat{\mathbf{w}}^*</script> into this model
gives:
<script type="math/tex; mode=display" id="MathJax-Element-5160"> f(\hat{\mathbf{x}}_i)=((X^TX)^{-1}X^T\mathbf{y})^T\hat{\mathbf{x}}_i\\ \Updownarrow\\ f(\hat{\mathbf{x}}_i)=\hat{\mathbf{x}}_i^T(X^TX)^{-1}X^T\mathbf{y}\\ </script>
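Putting it all together, predicting for a new sample just augments it with a 1 and applies ŵ*; a small sketch with synthetic data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
m, d = 30, 2
X = np.hstack([rng.standard_normal((m, d)), np.ones((m, 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(m)

w_star = np.linalg.solve(X.T @ X, X.T @ y)   # (X^T X)^{-1} X^T y

x_new = np.array([0.3, -1.2])                # a new sample x_i
x_hat = np.append(x_new, 1.0)                # augmented (x_i; 1)
prediction = x_hat @ w_star                  # f(x_hat) = w_hat^T x_hat
print(prediction)                            # close to 0.3*1 + (-1.2)*(-2) + 0.5 = 3.2
```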