PHASE 3 LLM Architecture · Day 32 of 80 · Raschka LLMs From Scratch

LoRA — Parameter-Efficient Finetuning

Low-Rank Adaptation: finetune a fraction of parameters while matching full finetuning quality.

Leverage in finance means doing more with less capital. LoRA is leverage for model training: achieve full-finetune quality with 0.1% of the trainable parameters.
— Day 32 Principle

I. LoRA Mechanism

Instead of updating the full weight matrix W, LoRA adds a low-rank decomposition: W + BA, where B is [d, r] and A is [r, d] with r ≪ d. W stays frozen; only A and B are trained, and the update BA is scaled by alpha / rank.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        # A: small random init; B: zeros, so the low-rank update starts at zero
        self.lora_A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scaling = alpha / rank
        linear.weight.requires_grad = False  # freeze the pretrained weight

    def forward(self, x):
        # frozen pretrained path + scaled low-rank update (x @ A @ B)
        return self.linear(x) + (x @ self.lora_A @ self.lora_B) * self.scaling
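A quick sanity check, sketched under the assumptions above: because lora_B is zero-initialized, the wrapped layer's output exactly matches the frozen base layer at initialization, and only the two small LoRA matrices (plus the base layer's unfrozen bias) are trainable. The class is repeated here so the snippet runs standalone; the 768-dimensional layer is a hypothetical GPT-2-sized example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):  # repeated from above so this snippet runs standalone
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        self.lora_A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scaling = alpha / rank
        linear.weight.requires_grad = False

    def forward(self, x):
        return self.linear(x) + (x @ self.lora_A @ self.lora_B) * self.scaling

torch.manual_seed(0)
base = nn.Linear(768, 768)        # hypothetical GPT-2-sized projection
lora = LoRALinear(base, rank=8, alpha=16)

x = torch.randn(4, 768)
# B starts at zero, so the LoRA path contributes nothing yet
assert torch.allclose(lora(x), base(x))

trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora.parameters())
# trainable: lora_A (6144) + lora_B (6144) + base bias (768) = 13056
# total: frozen weight (589824) + bias + LoRA params = 602880
print(trainable, total)
```

Note that the original code freezes only the weight, so the base layer's bias remains trainable; freeze it too if you want strictly the LoRA matrices updated.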

Why Low Rank Works

Weight updates during finetuning are empirically low-rank: the change between finetuned and pretrained weights has a rapidly decaying singular value spectrum, so a small-rank approximation captures most of it. LoRA exploits this by constraining the update to a low-rank subspace from the start. With rank=8 on a large model, you train roughly 0.1% of the parameters and typically match full finetuning quality.
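To make the ~0.1% figure concrete, a back-of-the-envelope sketch with hypothetical 7B-class dimensions (not taken from the text): rank-8 LoRA applied to the four attention projection matrices of each of 32 layers with hidden size 4096.

```python
# Hypothetical 7B-class config: hidden size, LoRA rank, layers, adapted matrices per layer
d, r, layers, mats = 4096, 8, 32, 4
lora_params = 2 * d * r * mats * layers   # A ([d, r]) plus B ([r, d]) per adapted matrix
model_params = 7_000_000_000
print(f"{lora_params:,} trainable -> {lora_params / model_params:.2%} of the model")
# roughly 8.4M trainable parameters, about 0.12% of 7B
```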

V. Deliverables

LoRA makes LLM finetuning accessible on consumer GPUs. Tomorrow: RLHF.
— Day 32 Closing