20241230 基础数学-线性代数-（2）线性方程求解（numpy, scipy，scikitlearn）

线性代数理论及证明过程请参考教材。* 练习代码的实现很粗（极度简化，甚至在某些条件下错误的），目的是帮助自己（AI幼儿园水平）练习，验证，理解理论。* 开源库scipy/scikit-lean的实现是严谨非常强大的，同时API使用起来非常简单。

weixin_46449024

93人浏览 · 2024-12-07 21:40:36

weixin_46449024 · 2024-12-07 21:40:36 发布

所有代码实现，基于教程中的理论通过python实现出来的。效率不高，但有代码可以看。

由于scipy/sckitlearn/sparkx 底层的实现都被封装了（小白兔水平有限，fortran代码实在没看懂）这里的实现至少可以和理论公式对应的上。

1. 对称矩阵分解（相似矩阵），解方程（Ax=y)

P02_eig_resolver01.py  P03_svd_sym_resolver02.py

solver	SimMat(symmetric)	SVD(symmetric)
ax=y		numpy.la/scipy.la中，当A对称时，本质上就是相似矩阵计算方法；
证明过程	基于特征值特征矩阵的定义： Av=lambda v Av=vLamda Avv^-1 = v Lambda v^-1 A = v Lambda v^-1
Equation	Ax=Y AtAx=AtY At=PVP' x=AtY(PV'P')	Ax=Y A'Ax=A'Y A'A=USV' x= (V S'U') A'Y
PYTHON 代码练习	ATA=A.T @ A eva, evc = la.eig(ATA) X = A.T @ Y @ evc @ la.pinv(np.diag(eva)) @ evc.T	ATA=A.T @ A s, u = la.eigh(ATA) s=np.diag(s) Vt=u.T X = Vt.T @ la.pinv(s) @ U.T @ A.T @ Y

2. SVD分解，解方程（Ax=y)

P04_svd_mn_resolver02.py

有很多分解方法，这里只实现了适用性最广泛的SVD方法。

solver	SVD (m,n)
ax=y	A非对称时，SVD的简单实现； np中给予LAPACK的_umath_linalg.svd_f，<Fortran>umath_linalg.sgesdd_ , extern / Subroutine */ int slasdq_，源码：f2c_s_lapack.c
证明过程	高人给的证明： http://什么是奇异值分解？线性代数之奇异值分解SVD的深度解析与应用 https://baijiahao.baidu.com/s?id=1795462577820713099&wfr=spider&for=pc&searchword=%E7%BA%BF%E6%80%A7%E4%BB%A3%E6%95%B0%20%E9%AB%98%E5%A5%87%E5%BC%82%E5%80%BC%E5%88%86%E8%A7%A3%20%E8%AF%81%E6%98%8E Equation * 个人理解 (可能不准确，请参考上面高人的文章）：如果A可以表达为 A=U Sigma V^t (其中 V,U 为酉矩阵），下面3个等式必须成立： 1）A^t A = (U Sigma V^t)^t (U Sigma V^t) 2）A^t A = V Sigma U^t U Signam V^t 3）A^t A = V Sigma^2 V^t 由于ATA是对称矩阵，等式3满足相似矩阵的定义由于等式3成立，等式1也成立，即A可以表达为 A=U Sigma V^t，其中 V=就是 ATA 对角化后的P Sigma = 就是 ATA 对角化后的sqrt(Lambda)
	Ax=Y A=USV' x= (V S'U') Y
PYTHON 代码练习	U, s, Vt = customized_svd(A) X = Vt.T @ la.pinv(s) @ U.T @ Y def customized_svd(A): debug("---------------customized_svd (m,n)------------------") AAt=A@A.T AtA=A.T@A # debug(AAt) # debug(AtA) U_eva, U_evc = la.eig(AAt) # U_eva=sw_col(U_eva,0,1) # U_evc=sw_col(U_evc,0,1) debug(U_evc.shape, U_eva) Vt_eva, Vt_evc = la.eig(AtA) debug(Vt_evc.shape, Vt_eva) k=min(len(U_eva),len(Vt_eva)) # # for i in range(k): # if U_eva[i]<0: # U_evc[:, i] = -U_evc[:, i] # if Vt_eva[i] < 0: # Vt_evc[:, i] = -Vt_evc[:, i] s_eva=np.sqrt(np.abs(Vt_eva[:k])) #直接将特征值sqrt，取V对应的特征值，V算U （特征值一样，但顺序不同） debug(s_eva) #real singular value s_mn = np.zeros((k, k)) for i in range(k): s_mn[i][i] = s_eva[i] #Vt (fit shape) Vt_evc = Vt_evc[:, 0:k] #calulate U based on U = A * (V * S^-1^) (remove unnecessary eigvec, corresponding to 0 eigval?) U_evc= A @ Vt_evc @ la.pinv(s_mn) #U (fit shape) U_evc = U_evc[:, 0:k] #debug debug('U_evc\n',U_evc) debug('s_mn\n',s_mn) debug('Vt_evc\n',Vt_evc.T) #re-construct debug(U_evc @s_mn @ Vt_evc.T) debug(A, np.allclose(A,U_evc@s_mn@Vt_evc.T)) return U_evc,s_mn, Vt_evc.T

3. 梯度下降法

当数据量大，层数多时（MLP），解方程效率可能不如梯度下降；在深度学习火热的今天，掌握梯度下降的方法有必要，而且很基础。

最简单的梯度下降（1阶段导数）
牛顿法（二阶导数，减少迭代次数，提高效率）
BFGS（拟牛顿法，不用计算海森矩阵的逆矩阵了（e.g.: XtX的逆矩阵），减少计算量，提高效率）
BFGS-sherman-morrison (也是直接算H，但是不用解方程，基于sherman-Morrison，增量计算H）//虽然都是BFGS，但更进一步，也是LBFGS的基础，推理过程待研究
LBFGS (不存H，通过历史s，y，H0，增量计算出来当前H；只需要最近K个历史s，y）

GD
TR01_LR_gd.py	TR01_LR_nt.py	TR01_LR_bfgs.py
	w=w-f'/f'' //from Tylor Eq f(x)=f(x1) + f'(x)(x-x1)/1! + f''(x)(x-x1)^2/2! ... delta=-f'/f'' =-f''^-1@f' f' = 2 * X.T @ ( Y- X @ W) f'' = X.T@X	f'' ~ H = B^-1 -- BFGS B' = B- (BSStB/StBS - YYt/ Yts) -- steps: s= X'-x y= f'(x') - f'(x) B'=B + E B's = (B+E)s = y to calculate B', just need to find E: set E= aUUt + b VVt; we need to find [a, U, b, V] which together can makeup (B+E)s=y; => (B+aUUt + b VVt)s=y => a U Ut S + b V Vt S=y-Bs =>a (UtS) U + b(VtS)V = y - Bs //这里有点巧妙，因为UtS，VtS是实数，所以可以换位置 set u=pBS, v=qY; so that only need to find [a and b]: => a (p B S)t S p B S + b (qY)t s qY = y - Bs => a p^2 (St Bt S) B S + b q^2 (Yt S) Y =Y - BS => a p^2 (StBtS)BS + b q^2 (YtS)Y =Y-BS => a p^2 (StBtS)BS + BS + b q^2 (YtS)Y-Y =0 => [a p^2 (StBtS) + 1]BS + [b q^2 (YtS)-1]Y =0 if set ap^2 = -1/(StBtS) //(1,n)(n,n)(n,1)=(1,1) set bq^2 = 1/YtS //(1,n)(n,1) = (1,1) then equation above solved. so, we find all of [a, U, b, V]. -------reconstruct E, and calcuate B'=B+E --------------- B' = B + E = B + aUUt + b VVt => B+ -1/((StBtS)p^2) * (pBS)(StBtp) + 1/(YtS q^2) * qY (qY)t => B- BSStBt/StBtS + YYt/YtS
# lr=1e-2 #87182 itr lr=1e-1 #10084 itr # lr=0.2 #5245 itr delta0=None delta=None N=diabetes_X_train.shape[0] for i in range(1001000): W0=np.copy(W) if delta is not None: delta0 = np.copy(delta) delta = (2 diabetes_X_train.T @ (diabetes_y_train - diabetes_X_train @ W)) W = W + lr * delta if delta is not None and delta0 is not None: # if i % 100 == 0: # print('delta diff = ', la.norm(delta-delta0)) if la.norm(delta-delta0) < 1e-9: print("stop by delta at iteration: ",i) break if W0 is not None and W is not None: # if i % 100 == 0: # print('W0 diff = ', la.norm(W0-W)) if la.norm(W0-W) < 1e-9: print("stop by W at iteration: ",i) break # W = W + lr * delta print("(RECON) Mean squared error: regr.coef_", np.round(regr.coef_,2)) print("(RECON) Mean squared error: W ", np.round(W,2)) print("(RECON) Mean squared error: W comp ", np.allclose(np.round(regr.coef_,2), np.round(W,2))) # print("(RECON) Mean squared error: comp ", la.norm(diabetes_X_train@W), la.norm(diabetes_y_train)) # print("(RECON) Mean squared error: comp ", np.allclose(np.round((diabetes_X_train@W),2), np.round(diabetes_y_train,2))) print("(RECON) Mean squared error: %.2f" % ( np.sum((diabetes_y_train-diabetes_X_train@W)**2)/N))	delta = la.inv(diabetes_X_train.T @ diabetes_X_train) @ (2 * diabetes_X_train.T @ (diabetes_y_train - diabetes_X_train @ W))	delta = computeG(diabetes_X_train, diabetes_y_train, W) # d = la.inv(B) @ delta d = la.solve(B,delta) W = W + lr * d s = W - W0 y = computeG(diabetes_X_train, diabetes_y_train, W) - delta # print('s .shape',s.shape) # print('y .shape',y.shape) s = s.reshape(10, 1) y = - y.reshape(10, 1) B = B - (B@s@s.T@B)/(s.T@B@s) + y@y.T/(y.T@s)
stop by delta at iteration: 10084 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34	//迭代确实快好多118<10084 stop by W at iteration: 118 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34	stop by W at iteration: 196 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34

TR01_LR_bfgs_sherman_morrison.py	TR01_LR_Lbfgs_recentK.py
# sherman-morrison calculate H = B^-1 by [Y,S,Y.T,S.T] instead of solving from (B,delta) //直接计算H，不用解方程；但是推导过程可能较复杂； V= I - YSt/YtS, P = SSt/YtS H' = Vt H V + P	不用存储H，改用从[s,y]history和H0 构建出H’（目标是减少内存占用） H公式不变； H' = Vt H V + P 构建： H1 = Vt H0 V + P H2 =Vt1( Vt H0 V + P )V1+P1 ...
# sherman-morrison calculate H = B^-1 by [Y,S,Y.T,S.T] instead of solving from (B,delta) d = H @ delta ... H = (np.eye(10) - (s @ y.T)/ (y.T @ s)).T @ H @ (np.eye(10) - (y@s.T)/ (y.T @ s)) + (s @ s.T) / (y.T @ s)	H = H0 # K = 20 # iterate 187 # K = 10 # iterate 169 K = 9 # iterate 164 (best) seems approximation by K=9 steps leading to least bias # K = 8 # iterate 178 # K = 6 # iterate 176 # K = 3 # iterate 209 a little more than full history # K = 1 # iterate 596 even more for j in range(max(0, len(Sh)-K), len(Sh)): # H' = Vt H V + P \| V= I - YSt/YtS, P = SSt/YtS _s = Sh[j] _y = Yh[j] _V_ = (np.eye(10) - (_y@ _s.T)/ (_y.T @ _s)) _P_ = (_s @ _s.T) / (_y.T @ _s) H = _V_.T @ H @ _V_ + _P_
stop by W at iteration: 362 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34	stop by delta at iteration: 164 (RECON) Mean squared error: regr.coef_ [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W [ 42.92 -261.96 547.54 352.46 -634.05 285.12 -9.39 197.5 670.76 11.67] (RECON) Mean squared error: W comp True (RECON) Mean squared error: 26217.34

4. 含bias/intercept，解方程（Ax+b=y)

solver

intercept
基于偏移后的X，Y计算Xw=Y,得出w；intercept 就是原始Xo

Xw+b=y

-- preprocess
X= Xo - X_offset
y= yo - y_offset

-- solve
X@w = y

-- 得出w后，计算intercept
Xo@w - X_offset@w = yo - y_offset

Xo@w - X_offset@w + y_offset = yo

Xo@w + （y_offset - X_offset@w） = yo

intercept_ = y_offset - X_offset @ coef_

4. 线性回归（验证练习）

TR01_LR.py

用上面的练习solver，验证线性回归，求出的coef，intercept是一样的。

验证

LinearRegression(fit_intercept=False)

min(||Ax-y||**2)

# Create linear regression object
regr = linear_model.LinearRegression(fit_intercept=False)

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# The coefficients
print("Coefficients: \n", regr.coef_)
print("intercept_: \n", regr.intercept_)

# ===================================
# other solver test
# ===================================
x, res, rk, sig = la.lstsq(diabetes_X_train,diabetes_y_train)
print("lstsq: \n", x, np.ravel(x), res, np.sqrt(res), rk, sig)

x, res, rk, sig =scipy.linalg.lstsq(diabetes_X_train,diabetes_y_train)
print("lstsq: \n", x, np.ravel(x), res, np.sqrt(res), rk, sig)

x = solve01(diabetes_X_train,diabetes_y_train, _debug=False)
print("solve01: \n",x)

x = solve_02_svd_sym(diabetes_X_train,diabetes_y_train, _debug=False)
print("solve_02_svd_sym: \n",x)

x = solve_02_svd_mn(diabetes_X_train,diabetes_y_train, _debug=False)
print("solve_02_svd_mn: \n",x)

Coefficients:
[970.17]
intercept_:
0.0
is sparse False
lstsq:
[970.17] [970.17] [11536167.22] [3396.49] 1 [0.98]
lstsq:
[970.17] [970.17] 11536167.220484123 3396.493371182126 1 [0.98]
solve01:
[970.17]
solve_02_svd_sym:
[970.17]
solve_02_svd_mn:
[970.17]