LoRA is a low-resource method for fine-tuning large models. Its advantages include adding no extra inference latency, being easy to optimize, and greatly reducing the GPU memory required for fine-tuning.

In standard full fine-tuning, all parameters are updated, i.e., the parameter update $\Delta\Phi$ has the same dimensionality as the pretrained parameters $\Phi$. LoRA argues that a comparable fine-tuning effect can be achieved by training far fewer parameters.

For a pretrained weight matrix $W_0\in\mathbb{R}^{d\times k}$, the parameter update $\Delta W$ can be represented by a low-rank decomposition:

$$W_0 + \Delta W = W_0 + BA, \quad B\in\mathbb{R}^{d\times r},\ A\in\mathbb{R}^{r\times k},\ r\ll\min(d,k)$$
Only $B$ and $A$ need to be trained, which drastically reduces the number of trainable parameters. LoRA can be applied to the weights of embedding layers, linear layers, and convolution layers.
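As a quick back-of-the-envelope check on the savings, the sketch below compares the trainable parameter count of a full $\Delta W$ update against the LoRA factors $B$ and $A$ (the sizes $d=k=4096$, $r=16$ are illustrative values, not taken from the text above):

```python
# Illustrative comparison of trainable parameters (assumed sizes).
d, k, r = 4096, 4096, 16

full_update = d * k          # fine-tuning Delta W directly
lora_update = d * r + r * k  # fine-tuning B (d x r) and A (r x k)

print(f"full dW : {full_update:,} params")              # 16,777,216
print(f"LoRA B+A: {lora_update:,} params")              # 131,072
print(f"reduction: {full_update / lora_update:.0f}x")   # 128x
```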

A PyTorch implementation of LoRA fine-tuning for a linear layer:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, merge, rank=16, lora_alpha=16, dropout=0.5):
        super().__init__()

        self.in_dim = in_dim
        self.out_dim = out_dim
        self.merge = merge          # if True, fold the LoRA update into the frozen weight
        self.rank = rank
        self.lora_alpha = lora_alpha

        # Frozen pretrained projection W_0
        self.linear = nn.Linear(in_dim, out_dim)
        self.linear.weight.requires_grad = False

        # Low-rank factors: Delta W = B @ A, with B zero-initialized so training starts from W_0
        self.lora_b = nn.Parameter(torch.zeros(out_dim, rank))
        self.lora_a = nn.Parameter(torch.zeros(rank, in_dim))
        self.scale = self.lora_alpha / self.rank

        self.dropout = nn.Dropout(dropout) if dropout > 0 else nn.Identity()

        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        nn.init.zeros_(self.lora_b)

    def forward(self, x):
        if self.rank > 0 and self.merge:
            # Merged path: use W_0 + scale * B @ A in a single matmul (no extra inference latency)
            output = F.linear(x, self.linear.weight + self.lora_b @ self.lora_a * self.scale, self.linear.bias)
        elif self.rank > 0:
            # Unmerged path: frozen W_0 branch plus the trainable low-rank branch
            output = self.linear(x) + F.linear(F.linear(x, self.lora_a), self.lora_b) * self.scale
        else:
            output = self.linear(x)
        return self.dropout(output)
```
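
A minimal usage sketch (the layer sizes and dummy input below are made up for illustration) showing that gradients only flow to the LoRA factors, since the pretrained weight is frozen:

```python
layer = LoRALinear(in_dim=128, out_dim=64, merge=False, rank=8, lora_alpha=16, dropout=0.0)

x = torch.randn(4, 128)            # dummy batch
layer(x).sum().backward()

print(layer.linear.weight.grad)    # None: the pretrained weight is frozen
print(layer.lora_a.grad.shape)     # torch.Size([8, 128])
print(layer.lora_b.grad.shape)     # torch.Size([64, 8])

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)                   # 8*128 + 64*8 + 64 = 1600 (includes the nn.Linear bias)
```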