Table of Contents

Background

Core Principle

A matrix can be written in terms of linearly independent vectors; the maximum number of linearly independent vectors is its rank

PyTorch Code Implementation

Acknowledgments


Background

Whether you are fine-tuning one of today's popular large language models (LLMs) or a text-to-image model (Stable Diffusion), a large amount of GPU memory is required, which is hard to come by on a personal graphics card. As a result, all kinds of parameter-efficient methods have emerged, and the most popular of them is LoRA, from the paper "LoRA: Low-Rank Adaptation of Large Language Models".

LoRA has many advantages: it saves GPU memory, trains quickly, loses little quality (roughly on par with full-parameter fine-tuning), adds no extra latency at inference time, and can be used as a pluggable component. Of course it also has a drawback: there is still some loss in quality.

Core Principle

The core idea is very simple. Any matrix W_0 can be given a low-rank decomposition: the large matrix is split into two small matrices (A, B). During training, the parameters of W_0 are left unchanged and only A and B are updated. This can be written as

                                                        W_{new} = W_0 + AB

During training, the actual computation is

                                               h = W_0 x + \frac{\alpha}{r} A B x

                                               s.t.\; W_0 \in \mathbb{R}^{n\times m},\; A \in \mathbb{R}^{n\times r},\; B \in \mathbb{R}^{r\times m}

where r \ll n and r \ll m; r can even be set to 1. Why is it enough to optimize only the two matrices A and B? What is the underlying assumption? The assumption is that W_0 is not full rank and contains a large number of redundant parameters, so the weight update can be captured by the low-rank product AB instead.
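As a rough illustration of the parameter savings this buys (a minimal sketch; the sizes 4096 × 4096 and r = 8 below are illustrative assumptions, not taken from the text):

n, m, r = 4096, 4096, 8          # illustrative sizes only

full_params = n * m              # parameters in W_0
lora_params = n * r + r * m      # parameters in A and B combined

print(full_params)                         # 16777216
print(lora_params)                         # 65536
print(f"{lora_params / full_params:.2%}")  # 0.39% of the original parameter count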

A matrix can be written in terms of linearly independent vectors; the maximum number of linearly independent vectors is its rank
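A minimal PyTorch sketch of why the product AB is low-rank (the sizes here are arbitrary, chosen only for illustration): an n×r matrix times an r×m matrix can have rank at most r.

import torch

n, m, r = 64, 48, 8                 # arbitrary illustrative sizes
A = torch.randn(n, r)
B = torch.randn(r, m)

W = A @ B                           # an (n, m) matrix built from rank-r factors
print(torch.linalg.matrix_rank(W))  # tensor(8): the rank cannot exceed r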

PyTorch Code Implementation

import torch
import torch.nn as nn

class LinearLoRALayer(nn.Module):
    def __init__(self, in_features, out_features, merge=False, rank=8, lora_alpha=16, dropout=0.1):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.rank = rank
        # self.merge tracks whether the LoRA update is currently folded into linear.weight
        self.merge = False

        self.linear = nn.Linear(in_features, out_features)
        # nn.Linear stores its weight with shape (out_features, in_features);
        # for input x of shape (batch_size, seq_len, in_features) it computes x @ weight.T,
        # so the LoRA product below is also built with shape (out_features, in_features).

        if rank > 0:
            # Wrapping the tensors in nn.Parameter registers lora_a and lora_b as trainable parameters.
            self.lora_a = nn.Parameter(torch.zeros(out_features, rank))
            # lora_a is initialized with a Kaiming (Gaussian) distribution;
            # `a` is the leaky_relu negative-slope argument used by the initializer,
            # normally a small value such as 0.01.
            nn.init.kaiming_normal_(self.lora_a, a=0.01)

            # lora_b stays at zero, so A @ B is zero at the start of training and the
            # layer initially behaves exactly like the frozen linear layer.
            self.lora_b = nn.Parameter(torch.zeros(rank, in_features))
            self.scale = lora_alpha / rank

            # The base linear layer is frozen; only lora_a and lora_b are trained.
            self.linear.weight.requires_grad = False
            self.linear.bias.requires_grad = False

        self.dropout = nn.Dropout(dropout) if dropout > 0 else nn.Identity()

        # merge is a bool; if True, fold the LoRA weights into the linear weights right away.
        if merge:
            self.merge_weight()

    def merge_weight(self):
        # Fold the LoRA update into the frozen weight so forward() only needs one matmul.
        if self.rank > 0 and not self.merge:
            # (out_features, rank) @ (rank, in_features) = (out_features, in_features)
            self.linear.weight.data += self.scale * (self.lora_a @ self.lora_b)
            self.merge = True

    def unmerge_weight(self):
        # Undo merge_weight so the LoRA update is applied explicitly again in forward().
        if self.rank > 0 and self.merge:
            self.linear.weight.data -= self.scale * (self.lora_a @ self.lora_b)
            self.merge = False

    def forward(self, x):
        # x shape is (batch_size, seq_len, in_features)
        if self.rank > 0 and not self.merge:
            # Unmerged path: frozen linear output plus the scaled low-rank update.
            output = self.linear(x) + self.scale * (x @ (self.lora_a @ self.lora_b).T)
        else:
            # Merged (or rank == 0) path: any LoRA update already lives in linear.weight.
            output = self.linear(x)

        # Note: dropout is applied to the combined output here, not to the LoRA branch only.
        output = self.dropout(output)
        return output
    

# Test the LinearLoRALayer
batch_size = 32
seq_len = 128
in_features = 768
out_features = 512
rank = 8
lora_alpha = 16
dropout = 0.1

X = torch.randn(batch_size, seq_len, in_features)

lora_layer = LinearLoRALayer(
    in_features=in_features,
    out_features=out_features,
    rank=rank,
    lora_alpha=lora_alpha,
    dropout=dropout,
    merge=False
)
# Use eval mode so dropout is disabled and the outputs below can be compared exactly.
lora_layer.eval()

# Forward pass
output = lora_layer(X)
print(f"Output shape (no merge): {output.shape}")

# Test weight merging/unmerging
lora_layer.merge_weight()
output_after_merge = lora_layer(X)
lora_layer.unmerge_weight()
output_after_unmerge = lora_layer(X)

print("Max difference after merge/unmerge cycle:",
      torch.max(torch.abs(output - output_after_unmerge)).item())
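To show how such a layer might be dropped into an existing network, here is a minimal, hypothetical sketch (the TinyMLP model and the step of copying pretrained weights into the frozen base linear are illustrative assumptions, not part of the original code):

# A tiny toy model; purely illustrative, not from the original post.
class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(768, 512)
        self.fc2 = nn.Linear(512, 768)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyMLP()

# Replace fc1 with a LoRA-wrapped linear layer and copy the "pretrained" weights into
# the frozen base linear, so that only lora_a / lora_b of this layer remain trainable.
lora_fc1 = LinearLoRALayer(768, 512, rank=8, lora_alpha=16, dropout=0.0)
lora_fc1.linear.weight.data.copy_(model.fc1.weight.data)
lora_fc1.linear.bias.data.copy_(model.fc1.bias.data)
model.fc1 = lora_fc1

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # fc1.lora_a, fc1.lora_b and fc2.* remain trainable; fc1.linear.* is frozen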

Acknowledgments

My understanding is limited, so if there are any mistakes or misinterpretations in this post, corrections are very welcome.
