所有代码实现,基于教程中的理论通过python实现出来的。效率不高,但有代码可以看。

由于scipy/sckitlearn/sparkx 底层的实现都被封装了(小白兔水平有限,fortran代码实在没看懂)这里的实现至少可以和理论公式对应的上。

1. 对称矩阵分解(相似矩阵),解方程(Ax=y)

P02_eig_resolver01.py  P03_svd_sym_resolver02.py
solver

SimMat(symmetric) 

SVD(symmetric) 

ax=y numpy.la/scipy.la中,当A对称时,本质上就是相似矩阵计算方法;
证明过程 基于特征值特征矩阵的定义:
Av=lambda v
Av=vLamda
Avv^-1 = v Lambda v^-1
A = v Lambda v^-1
Equation Ax=Y
AtAx=AtY
At=PVP'
x=AtY(PV'P')
Ax=Y
A'Ax=A'Y
A'A=USV'
x= (V S'U') A'Y

PYTHON

代码

练习

ATA=A.T @ A
eva, evc = la.eig(ATA)
X = A.T @ Y @ evc @ la.pinv(np.diag(eva)) @ evc.T
ATA=A.T @ A
s, u = la.eigh(ATA)
s=np.diag(s)
Vt=u.T
X = Vt.T @ la.pinv(s) @ U.T @ A.T @ Y

2. SVD分解,解方程(Ax=y)

P04_svd_mn_resolver02.py

有很多分解方法,这里只实现了适用性最广泛的SVD方法。

solver SVD (m,n)
ax=y A非对称时,SVD的简单实现;
*np中给予LAPACK的_umath_linalg.svd_f,<Fortran>umath_linalg.sgesdd_ , extern /* Subroutine */ int slasdq_,  源码:f2c_s_lapack.c
证明过程


高人给的证明:

http://什么是奇异值分解?线性代数之奇异值分解SVD的深度解析与应用 https://baijiahao.baidu.com/s?id=1795462577820713099&wfr=spider&for=pc&searchword=%E7%BA%BF%E6%80%A7%E4%BB%A3%E6%95%B0%20%E9%AB%98%E5%A5%87%E5%BC%82%E5%80%BC%E5%88%86%E8%A7%A3%20%E8%AF%81%E6%98%8E Equation

* 个人理解 (可能不准确,请参考上面高人的文章):

如果A可以表达为 A=U Sigma V^t  (其中 V,U 为酉矩阵),下面3个等式必须成立:
1)A^t A = (U Sigma V^t)^t (U Sigma V^t)
2)A^t A = V Sigma U^t U Signam V^t
3)A^t A = V Sigma^2 V^t   

由于ATA是对称矩阵,等式3满足相似矩阵的定义
由于等式3成立,等式1也成立,即A可以表达为 A=U Sigma V^t, 其中
V=就是 ATA 对角化后的P
Sigma = 就是 ATA 对角化后的sqrt(Lambda)

Ax=Y
A=USV'
x= (V S'U') Y 

PYTHON

代码

练习

U, s, Vt = customized_svd(A)
X = Vt.T @ la.pinv(s) @ U.T @ Y


def customized_svd(A):
    debug("---------------customized_svd (m,n)------------------")
    AAt=A@A.T
    AtA=A.T@A
    # debug(AAt)
    # debug(AtA)

    U_eva, U_evc = la.eig(AAt)

    # U_eva=sw_col(U_eva,0,1)
    # U_evc=sw_col(U_evc,0,1)
    debug(U_evc.shape, U_eva)

    Vt_eva, Vt_evc = la.eig(AtA)
    debug(Vt_evc.shape, Vt_eva)


    k=min(len(U_eva),len(Vt_eva))
    #
    # for i in range(k):
    #     if U_eva[i]<0:
    #         U_evc[:, i] = -U_evc[:, i]
    #     if Vt_eva[i] < 0:
    #         Vt_evc[:, i] = -Vt_evc[:, i]

    s_eva=np.sqrt(np.abs(Vt_eva[:k])) #直接将特征值sqrt, 取V对应的特征值,V算U (特征值一样,但顺序不同)
    debug(s_eva)

    #real singular value
    s_mn = np.zeros((k, k))
    for i in range(k):
        s_mn[i][i] = s_eva[i]

    #Vt (fit shape)
    Vt_evc = Vt_evc[:, 0:k]

    #calulate U based on U = A * (V * S^-1^) (remove unnecessary eigvec, corresponding to 0 eigval?)
    U_evc= A @ Vt_evc @ la.pinv(s_mn)
    #U (fit shape)
    U_evc = U_evc[:, 0:k]

    #debug
    debug('U_evc\n',U_evc)
    debug('s_mn\n',s_mn)
    debug('Vt_evc\n',Vt_evc.T)

    #re-construct
    debug(U_evc @s_mn @ Vt_evc.T)
    debug(A, np.allclose(A,U_evc@s_mn@Vt_evc.T))
    return U_evc,s_mn, Vt_evc.T

3. 梯度下降法

当数据量大,层数多时(MLP),解方程效率可能不如梯度下降;在深度学习火热的今天,掌握梯度下降的方法有必要,而且很基础。

  • 最简单的梯度下降(1阶段导数)
  • 牛顿法(二阶导数,减少迭代次数,提高效率)
  • BFGS(拟牛顿法,不用计算海森矩阵的逆矩阵了(e.g.: XtX的逆矩阵),减少计算量,提高效率)
  • BFGS-sherman-morrison (也是直接算H,但是不用解方程,基于sherman-Morrison,增量计算H)//虽然都是BFGS,但更进一步,也是LBFGS的基础,推理过程待研究
  • LBFGS (不存H,通过历史s,y,H0,增量计算出来当前H;只需要最近K个历史s,y)
GD
TR01_LR_gd.py TR01_LR_nt.py TR01_LR_bfgs.py
w=w-f'/f'' 
//from Tylor Eq f(x)=f(x1) + f'(x)(x-x1)/1! + f''(x)(x-x1)^2/2! ...

delta=-f'/f'' =-f''^-1@f'
f' = 2 * X.T @ ( Y- X @ W)
f'' = X.T@X
f'' ~ H = B^-1

-- BFGS
B' = B-  (BSStB/StBS - YYt/ Yts)

-- steps:
s= X'-x
y= f'(x') - f'(x)
B'=B + E
B's = (B+E)s = y

to calculate B', just need to find E:

set E= aUUt + b VVt; we need to find [a, U, b, V] which together can makeup (B+E)s=y;

=> (B+aUUt + b VVt)s=y
=>    a U Ut S + b V Vt S=y-Bs
=>a (UtS) U + b(VtS)V = y - Bs
//这里有点巧妙,因为UtS,VtS是实数,所以可以换位置

set u=pBS, v=qY; so that only need to find [a and b]:
=> a (p B S)t S p B S  + b (qY)t s qY  = y - Bs

=> a p^2 (St Bt S) B S + b q^2 (Yt S) Y =Y - BS

=> a p^2 (StBtS)BS + b q^2 (YtS)Y =Y-BS

=> a p^2 (StBtS)BS + BS + b q^2 (YtS)Y-Y =0

=> [a p^2 (StBtS) + 1]BS + [b q^2 (YtS)-1]Y =0

if
set ap^2 = -1/(StBtS) //(1,n)(n,n)(n,1)=(1,1)
set bq^2 = 1/YtS      //(1,n)(n,1) = (1,1)
then equation above solved.

so, we find all of [a, U, b, V].

-------reconstruct E, and calcuate B'=B+E ---------------

B' = B + E = B + aUUt + b VVt

=> B+ -1/((StBtS)p^2) * (pBS)(StBtp) + 1/(YtS q^2) * qY (qY)t

=> B- BSStBt/StBtS + YYt/YtS


 
# lr=1e-2     #87182 itr
lr=1e-1   #10084 itr
# lr=0.2    #5245  itr
delta0=None
delta=None
N=diabetes_X_train.shape[0]

for i in range(100*1000):
    W0=np.copy(W)
    if delta is not None:
        delta0 = np.copy(delta)

    delta = (2 * diabetes_X_train.T @ (diabetes_y_train - diabetes_X_train @ W))

    W = W + lr * delta

    if delta is not None and delta0 is not None:
        # if i % 100 == 0:
        #     print('delta diff = ', la.norm(delta-delta0))
        if la.norm(delta-delta0) < 1e-9:
            print("stop by delta at iteration: ",i)
            break

    if W0 is not None and W is not None:
        # if i % 100 == 0:
        #     print('W0    diff = ', la.norm(W0-W))
        if la.norm(W0-W) < 1e-9:
            print("stop by W at iteration: ",i)
            break

    # W = W + lr * delta

print("(RECON) Mean squared error: regr.coef_", np.round(regr.coef_,2))
print("(RECON) Mean squared error: W         ", np.round(W,2))
print("(RECON) Mean squared error: W comp  ", np.allclose(np.round(regr.coef_,2), np.round(W,2)))
# print("(RECON) Mean squared error: comp  ", la.norm(diabetes_X_train@W), la.norm(diabetes_y_train))
# print("(RECON) Mean squared error: comp  ", np.allclose(np.round((diabetes_X_train@W),2), np.round(diabetes_y_train,2)))
print("(RECON) Mean squared error: %.2f" % ( np.sum((diabetes_y_train-diabetes_X_train@W)**2)/N))
delta =  la.inv(diabetes_X_train.T @ diabetes_X_train) @ (2 * diabetes_X_train.T @ (diabetes_y_train - diabetes_X_train @ W))     delta = computeG(diabetes_X_train, diabetes_y_train, W)

    # d = la.inv(B) @ delta
    d = la.solve(B,delta)
    W = W + lr * d

    s = W - W0
    y = computeG(diabetes_X_train, diabetes_y_train, W) - delta

    # print('s .shape',s.shape)
    # print('y .shape',y.shape)
    s = s.reshape(10, 1)
    y = - y.reshape(10, 1)
B = B - (B@s@s.T@B)/(s.T@B@s) + y@y.T/(y.T@s)
stop by delta at iteration:  10084
(RECON) Mean squared error: regr.coef_ [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W          [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W comp   
True
(RECON) Mean squared error: 26217.34
//迭代确实快好多118<10084
stop by W at iteration:  
118
(RECON) Mean squared error: regr.coef_ [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W          [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W comp   True
(RECON) Mean squared error: 26217.34
stop by W at iteration:  196
(RECON) Mean squared error: regr.coef_ [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W          [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W comp   True
(RECON) Mean squared error: 26217.34

TR01_LR_bfgs_sherman_morrison.py TR01_LR_Lbfgs_recentK.py
# sherman-morrison calculate H = B^-1 by [Y,S,Y.T,S.T] instead of solving from (B,delta)

//直接计算H,不用解方程;但是推导过程可能较复杂;
V= I - YSt/YtS,
P = SSt/YtS

H' = Vt H V + P 
不用存储H,改用从[s,y]history和H0 构建出H’(目标是减少内存占用)

H公式不变;
H' = Vt H V + P


构建:
H1 = Vt H0 V + P
H2 =Vt1( Vt H0 V + P )V1+P1
...
# sherman-morrison calculate H = B^-1 by [Y,S,Y.T,S.T] instead of solving from (B,delta)
    d = H @ delta

...
H =  (np.eye(10) - (s @ y.T)/ (y.T @ s)).T @ H @ (np.eye(10) - (y@s.T)/ (y.T @ s))  + (s @ s.T) / (y.T @ s)
H = H0
    # K = 20  # iterate 187
    # K = 10  # iterate 169
    K = 9     # iterate 164  (best)  seems approximation by K=9 steps leading to least bias
    # K = 8   # iterate 178
    # K = 6   # iterate 176
    # K = 3   # iterate 209 a little more than full history
    # K = 1   # iterate 596 even more
    for j in range(max(0, len(Sh)-K), len(Sh)):
        # H' = Vt H V + P   |  V= I - YSt/YtS, P = SSt/YtS
        _s = Sh[j]
        _y = Yh[j]
        _V_ = (np.eye(10) - (_y@ _s.T)/ (_y.T @ _s))
        _P_ = (_s @ _s.T) / (_y.T @ _s)
        H = _V_.T @ H @ _V_ + _P_
stop by W at iteration:  362
(RECON) Mean squared error: regr.coef_ [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W          [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W comp   True
(RECON) Mean squared error: 26217.34
stop by delta at iteration:  164
(RECON) Mean squared error: regr.coef_ [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W          [  42.92 -261.96  547.54  352.46 -634.05  285.12   -9.39  197.5   670.76
   11.67]
(RECON) Mean squared error: W comp   True
(RECON) Mean squared error: 26217.34

4. 含bias/intercept,解方程(Ax+b=y)

solver intercept
基于偏移后的X,Y计算Xw=Y,得出w;intercept 就是原始Xo 
Xw+b=y

-- preprocess
X= Xo - X_offset
y= yo - y_offset

-- solve
X@w = y

-- 得出w后,计算intercept
Xo@w - X_offset@w =  yo - y_offset

Xo@w - X_offset@w + y_offset =  yo

Xo@w + (y_offset - X_offset@w) =  yo

        intercept_ = y_offset - X_offset @ coef_
  

4. 线性回归(验证练习)

TR01_LR.py

用上面的练习solver,验证线性回归,求出的coef,intercept是一样的。

验证
LinearRegression(fit_intercept=False)
min(||Ax-y||**2)
# Create linear regression object
regr = linear_model.LinearRegression(fit_intercept=False)

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)


# The coefficients
print("Coefficients: \n", regr.coef_)
print("intercept_: \n", regr.intercept_)

# ===================================
#  other solver test
# ===================================
x, res, rk, sig = la.lstsq(diabetes_X_train,diabetes_y_train)
print("lstsq: \n", x, np.ravel(x), res, np.sqrt(res), rk, sig)

x, res, rk, sig =scipy.linalg.lstsq(diabetes_X_train,diabetes_y_train)
print("lstsq: \n", x, np.ravel(x), res, np.sqrt(res), rk, sig)


x = solve01(diabetes_X_train,diabetes_y_train, _debug=False)
print("solve01: \n",x)

x = solve_02_svd_sym(diabetes_X_train,diabetes_y_train, _debug=False)
print("solve_02_svd_sym: \n",x)

x = solve_02_svd_mn(diabetes_X_train,diabetes_y_train, _debug=False)
print("solve_02_svd_mn: \n",x)
Coefficients:
 [970.17]
intercept_:
 0.0
is sparse False
lstsq:
 [970.17] [970.17] [11536167.22] [3396.49] 1 [0.98]
lstsq:
 [970.17] [970.17] 11536167.220484123 3396.493371182126 1 [0.98]
solve01:
 [970.17]
solve_02_svd_sym:
 [970.17]
solve_02_svd_mn:
 [970.17]

参考代码

参考代码: AITutorial02: AI初学,AI幼儿园练习

写在最后

* 线性代数理论及证明过程请参考教材。公式推理可以参考AI。感谢数学大师们。

* 练习代码的实现很粗(极度简化,甚至在某些条件下错误的),目的是帮助自己(AI幼儿园水平)练习,验证,理解理论。

* 开源库scipy/scikit-lean的实现是严谨非常强大的,同时API使用起来非常简单。

Logo

有“AI”的1024 = 2048,欢迎大家加入2048 AI社区

更多推荐