LoRA (Low-Rank Adaptation) is a method for fine-tuning large models with limited resources. Its advantages include adding no inference latency (the low-rank update can be merged into the original weights), being easy to optimize, and greatly reducing the GPU memory required for fine-tuning.
Full fine-tuning updates all parameters, i.e. the parameter update $\Delta\Phi$ has the same dimensionality as the pretrained parameters $\Phi$. LoRA instead assumes that a comparable fine-tuning effect can be achieved by training far fewer parameters.
For a pretrained weight matrix $W_0\in\mathbb{R}^{d\times k}$, the fine-tuning update $\Delta W$ can be represented by a low-rank decomposition, i.e.

$$W_0 + \Delta W = W_0 + BA,\qquad B\in\mathbb{R}^{d\times r},\ A\in\mathbb{R}^{r\times k},\ r\ll\min(d,k)$$
Only $B$ and $A$ need to be trained, which greatly reduces the number of trainable parameters. LoRA can be applied to the weights of embedding layers, linear layers, and convolutional layers.
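As a rough illustration of the savings (the dimensions here are assumed, not from the text): with $d = k = 4096$ and $r = 8$, a full update $\Delta W$ has $d\cdot k \approx 16.8\text{M}$ parameters, while $B$ and $A$ together have only $r(d+k) = 65{,}536$, a reduction of roughly $256\times$.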
A PyTorch implementation of LoRA fine-tuning for a linear layer:
```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, merge, rank=16, lora_alpha=16, dropout=0.5):
        super().__init__()
        self.in_dim = in_dim
        self.out_dim = out_dim
        self.merge = merge              # if True, add the low-rank update into the frozen weight at forward time
        self.rank = rank
        self.lora_alpha = lora_alpha

        # Frozen pretrained linear layer
        self.linear = nn.Linear(in_dim, out_dim)
        self.linear.weight.requires_grad = False

        # Low-rank factors: delta_W = lora_b @ lora_a, with shape (out_dim, in_dim)
        self.lora_b = nn.Parameter(torch.zeros(out_dim, rank))
        self.lora_a = nn.Parameter(torch.zeros(rank, in_dim))
        self.scale = self.lora_alpha / self.rank

        self.dropout = nn.Dropout(dropout) if dropout > 0 else nn.Identity()

        # A gets a random init while B starts at zero, so delta_W is zero at the start of training
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        nn.init.zeros_(self.lora_b)

    def forward(self, x):
        if self.rank > 0 and self.merge:
            # Compute with the merged weight W0 + scale * B @ A
            output = F.linear(x, self.linear.weight + self.lora_b @ self.lora_a * self.scale,
                              self.linear.bias)
        elif self.rank > 0:
            # Keep the frozen weight and the low-rank update separate
            output = self.linear(x) + F.linear(x, self.lora_b @ self.lora_a) * self.scale
        else:
            output = self.linear(x)
        return self.dropout(output)
```
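A minimal usage sketch (the dimensions, batch size, and hyperparameters below are arbitrary choices for illustration):

```python
layer = LoRALinear(in_dim=1024, out_dim=1024, merge=True, rank=8, lora_alpha=16, dropout=0.0)
x = torch.randn(4, 1024)     # a batch of 4 input vectors
y = layer(x)                 # shape (4, 1024)

# Only the low-rank factors (and the bias) are trainable; the pretrained weight is frozen.
for name, p in layer.named_parameters():
    print(name, p.requires_grad)
```

Because $BA$ has the same shape as $W_0$, the update can be folded into the pretrained weight once training is done, which is why LoRA adds no inference latency.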