20241230 基础数学-线性代数-(2)线性方程求解(numpy, scipy,scikitlearn)
线性代数理论及证明过程请参考教材。* 练习代码的实现很粗(极度简化,甚至在某些条件下错误的),目的是帮助自己(AI幼儿园水平)练习,验证,理解理论。* 开源库scipy/scikit-lean的实现是严谨非常强大的,同时API使用起来非常简单。
所有代码实现,基于教程中的理论通过python实现出来的。效率不高,但有代码可以看。
由于scipy/sckitlearn/sparkx 底层的实现都被封装了(小白兔水平有限,fortran代码实在没看懂)这里的实现至少可以和理论公式对应的上。
1. 对称矩阵分解(相似矩阵),解方程(Ax=y)
P02_eig_resolver01.py P03_svd_sym_resolver02.py
solver |
SimMat(symmetric) |
SVD(symmetric) |
ax=y | numpy.la/scipy.la中,当A对称时,本质上就是相似矩阵计算方法; | |
证明过程 | 基于特征值特征矩阵的定义: Av=lambda v Av=vLamda Avv^-1 = v Lambda v^-1 A = v Lambda v^-1 |
|
Equation | Ax=Y AtAx=AtY At=PVP' x=AtY(PV'P') |
Ax=Y A'Ax=A'Y A'A=USV' x= (V S'U') A'Y |
PYTHON 代码 练习 |
ATA=A.T @ A eva, evc = la.eig(ATA) X = A.T @ Y @ evc @ la.pinv(np.diag(eva)) @ evc.T |
ATA=A.T @ A s, u = la.eigh(ATA) s=np.diag(s) Vt=u.T X = Vt.T @ la.pinv(s) @ U.T @ A.T @ Y |
2. SVD分解,解方程(Ax=y)
P04_svd_mn_resolver02.py
有很多分解方法,这里只实现了适用性最广泛的SVD方法。
solver | SVD (m,n) |
ax=y | A非对称时,SVD的简单实现; *np中给予LAPACK的_umath_linalg.svd_f,<Fortran>umath_linalg.sgesdd_ , extern /* Subroutine */ int slasdq_, 源码:f2c_s_lapack.c |
证明过程 |
* 个人理解 (可能不准确,请参考上面高人的文章): 如果A可以表达为 A=U Sigma V^t (其中 V,U 为酉矩阵),下面3个等式必须成立: |
Ax=Y A=USV' x= (V S'U') Y |
|
PYTHON 代码 练习 |
U, s, Vt = customized_svd(A) X = Vt.T @ la.pinv(s) @ U.T @ Y def customized_svd(A): debug("---------------customized_svd (m,n)------------------") AAt=A@A.T AtA=A.T@A # debug(AAt) # debug(AtA) U_eva, U_evc = la.eig(AAt) # U_eva=sw_col(U_eva,0,1) # U_evc=sw_col(U_evc,0,1) debug(U_evc.shape, U_eva) Vt_eva, Vt_evc = la.eig(AtA) debug(Vt_evc.shape, Vt_eva) k=min(len(U_eva),len(Vt_eva)) # # for i in range(k): # if U_eva[i]<0: # U_evc[:, i] = -U_evc[:, i] # if Vt_eva[i] < 0: # Vt_evc[:, i] = -Vt_evc[:, i] s_eva=np.sqrt(np.abs(Vt_eva[:k])) #直接将特征值sqrt, 取V对应的特征值,V算U (特征值一样,但顺序不同) debug(s_eva) #real singular value s_mn = np.zeros((k, k)) for i in range(k): s_mn[i][i] = s_eva[i] #Vt (fit shape) Vt_evc = Vt_evc[:, 0:k] #calulate U based on U = A * (V * S^-1^) (remove unnecessary eigvec, corresponding to 0 eigval?) U_evc= A @ Vt_evc @ la.pinv(s_mn) #U (fit shape) U_evc = U_evc[:, 0:k] #debug debug('U_evc\n',U_evc) debug('s_mn\n',s_mn) debug('Vt_evc\n',Vt_evc.T) #re-construct debug(U_evc @s_mn @ Vt_evc.T) debug(A, np.allclose(A,U_evc@s_mn@Vt_evc.T)) return U_evc,s_mn, Vt_evc.T |
3. 梯度下降法
当数据量大,层数多时(MLP),解方程效率可能不如梯度下降;在深度学习火热的今天,掌握梯度下降的方法有必要,而且很基础。
- 最简单的梯度下降(1阶段导数)
- 牛顿法(二阶导数,减少迭代次数,提高效率)
- BFGS(拟牛顿法,不用计算海森矩阵的逆矩阵了(e.g.: XtX的逆矩阵),减少计算量,提高效率)
- BFGS-sherman-morrison (也是直接算H,但是不用解方程,基于sherman-Morrison,增量计算H)//虽然都是BFGS,但更进一步,也是LBFGS的基础,推理过程待研究
- LBFGS (不存H,通过历史s,y,H0,增量计算出来当前H;只需要最近K个历史s,y)
GD | ||
TR01_LR_gd.py | TR01_LR_nt.py | TR01_LR_bfgs.py |
w=w-f'/f'' //from Tylor Eq f(x)=f(x1) + f'(x)(x-x1)/1! + f''(x)(x-x1)^2/2! ... delta=-f'/f'' =-f''^-1@f' f' = 2 * X.T @ ( Y- X @ W) f'' = X.T@X |
f'' ~ H = B^-1 -- BFGS B' = B- (BSStB/StBS - YYt/ Yts) -- steps: s= X'-x y= f'(x') - f'(x) B'=B + E B's = (B+E)s = y to calculate B', just need to find E: set E= aUUt + b VVt; we need to find [a, U, b, V] which together can makeup (B+E)s=y; => (B+aUUt + b VVt)s=y => a U Ut S + b V Vt S=y-Bs =>a (UtS) U + b(VtS)V = y - Bs //这里有点巧妙,因为UtS,VtS是实数,所以可以换位置 set u=pBS, v=qY; so that only need to find [a and b]: => a (p B S)t S p B S + b (qY)t s qY = y - Bs => a p^2 (St Bt S) B S + b q^2 (Yt S) Y =Y - BS => a p^2 (StBtS)BS + b q^2 (YtS)Y =Y-BS => a p^2 (StBtS)BS + BS + b q^2 (YtS)Y-Y =0 => [a p^2 (StBtS) + 1]BS + [b q^2 (YtS)-1]Y =0 if set ap^2 = -1/(StBtS) //(1,n)(n,n)(n,1)=(1,1) set bq^2 = 1/YtS //(1,n)(n,1) = (1,1) then equation above solved. so, we find all of [a, U, b, V]. -------reconstruct E, and calcuate B'=B+E --------------- B' = B + E = B + aUUt + b VVt => B+ -1/((StBtS)p^2) * (pBS)(StBtp) + 1/(YtS q^2) * qY (qY)t => B- BSStBt/StBtS + YYt/YtS |
|
# lr=1e-2 #87182 itr lr=1e-1 #10084 itr # lr=0.2 #5245 itr delta0=None delta=None N=diabetes_X_train.shape[0] for i in range(100*1000): W0=np.copy(W) if delta is not None: delta0 = np.copy(delta) delta = (2 * diabetes_X_train.T @ (diabetes_y_train - diabetes_X_train @ W)) W = W + lr * delta if delta is not None and delta0 is not None: # if i % 100 == 0: # print('delta diff = ', la.norm(delta-delta0)) if la.norm(delta-delta0) < 1e-9: print("stop by delta at iteration: ",i) break if W0 is not None and W is not None: # if i % 100 == 0: # print('W0 diff = ', la.norm(W0-W)) if la.norm(W0-W) < 1e-9: print("stop by W at iteration: ",i) break # W = W + lr * delta print("(RECON) Mean squared error: regr.coef_", np.round(regr.coef_,2)) print("(RECON) Mean squared error: W ", np.round(W,2)) print("(RECON) Mean squared error: W comp ", np.allclose(np.round(regr.coef_,2), np.round(W,2))) # print("(RECON) Mean squared error: comp ", la.norm(diabetes_X_train@W), la.norm(diabetes_y_train)) # print("(RECON) Mean squared error: comp ", np.allclose(np.round((diabetes_X_train@W),2), np.round(diabetes_y_train,2))) print("(RECON) Mean squared error: %.2f" % ( np.sum((diabetes_y_train-diabetes_X_train@W)**2)/N)) |
delta = la.inv(diabetes_X_train.T @ diabetes_X_train) @ (2 * diabetes_X_train.T @ (diabetes_y_train - diabetes_X_train @ W)) | delta = computeG(diabetes_X_train, diabetes_y_train, W) # d = la.inv(B) @ delta d = la.solve(B,delta) W = W + lr * d s = W - W0 y = computeG(diabetes_X_train, diabetes_y_train, W) - delta # print('s .shape',s.shape) # print('y .shape',y.shape) s = s.reshape(10, 1) y = - y.reshape(10, 1) B = B - (B@s@s.T@B)/(s.T@B@s) + y@y.T/(y.T@s) |
stop by delta at iteration: 10084 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34 |
//迭代确实快好多118<10084 stop by W at iteration: 118 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34 |
stop by W at iteration: 196 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34 |
TR01_LR_bfgs_sherman_morrison.py | TR01_LR_Lbfgs_recentK.py |
# sherman-morrison calculate H = B^-1 by [Y,S,Y.T,S.T] instead of solving from (B,delta) //直接计算H,不用解方程;但是推导过程可能较复杂; V= I - YSt/YtS, P = SSt/YtS H' = Vt H V + P |
不用存储H,改用从[s,y]history和H0 构建出H’(目标是减少内存占用) H公式不变; H' = Vt H V + P 构建: H1 = Vt H0 V + P H2 =Vt1( Vt H0 V + P )V1+P1 ... |
# sherman-morrison calculate H = B^-1 by [Y,S,Y.T,S.T] instead of solving from (B,delta) d = H @ delta ... H = (np.eye(10) - (s @ y.T)/ (y.T @ s)).T @ H @ (np.eye(10) - (y@s.T)/ (y.T @ s)) + (s @ s.T) / (y.T @ s) |
H = H0 # K = 20 # iterate 187 # K = 10 # iterate 169 K = 9 # iterate 164 (best) seems approximation by K=9 steps leading to least bias # K = 8 # iterate 178 # K = 6 # iterate 176 # K = 3 # iterate 209 a little more than full history # K = 1 # iterate 596 even more for j in range(max(0, len(Sh)-K), len(Sh)): # H' = Vt H V + P | V= I - YSt/YtS, P = SSt/YtS _s = Sh[j] _y = Yh[j] _V_ = (np.eye(10) - (_y@ _s.T)/ (_y.T @ _s)) _P_ = (_s @ _s.T) / (_y.T @ _s) H = _V_.T @ H @ _V_ + _P_ |
stop by W at iteration: 362 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34 |
stop by delta at iteration: 164 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34 |
4. 含bias/intercept,解方程(Ax+b=y)
solver | intercept 基于偏移后的X,Y计算Xw=Y,得出w;intercept 就是原始Xo |
Xw+b=y |
-- preprocess |
4. 线性回归(验证练习)
TR01_LR.py
用上面的练习solver,验证线性回归,求出的coef,intercept是一样的。
验证 |
LinearRegression(fit_intercept=False) |
min(||Ax-y||**2) |
# Create linear regression object regr = linear_model.LinearRegression(fit_intercept=False) # Train the model using the training sets regr.fit(diabetes_X_train, diabetes_y_train) # The coefficients print("Coefficients: \n", regr.coef_) print("intercept_: \n", regr.intercept_) # =================================== # other solver test # =================================== x, res, rk, sig = la.lstsq(diabetes_X_train,diabetes_y_train) print("lstsq: \n", x, np.ravel(x), res, np.sqrt(res), rk, sig) x, res, rk, sig =scipy.linalg.lstsq(diabetes_X_train,diabetes_y_train) print("lstsq: \n", x, np.ravel(x), res, np.sqrt(res), rk, sig) x = solve01(diabetes_X_train,diabetes_y_train, _debug=False) print("solve01: \n",x) x = solve_02_svd_sym(diabetes_X_train,diabetes_y_train, _debug=False) print("solve_02_svd_sym: \n",x) x = solve_02_svd_mn(diabetes_X_train,diabetes_y_train, _debug=False) print("solve_02_svd_mn: \n",x) |
Coefficients: [970.17] intercept_: 0.0 is sparse False lstsq: [970.17] [970.17] [11536167.22] [3396.49] 1 [0.98] lstsq: [970.17] [970.17] 11536167.220484123 3396.493371182126 1 [0.98] solve01: [970.17] solve_02_svd_sym: [970.17] solve_02_svd_mn: [970.17] |
参考代码
参考代码: AITutorial02: AI初学,AI幼儿园练习
写在最后
* 线性代数理论及证明过程请参考教材。公式推理可以参考AI。感谢数学大师们。
* 练习代码的实现很粗(极度简化,甚至在某些条件下错误的),目的是帮助自己(AI幼儿园水平)练习,验证,理解理论。
* 开源库scipy/scikit-lean的实现是严谨非常强大的,同时API使用起来非常简单。
更多推荐
所有评论(0)