Leverage in finance means doing more with less capital. LoRA is leverage for model training: achieve full-finetune quality while training roughly 0.1% of the parameters.
— Day 32 Principle
I. LoRA Mechanism
Instead of updating the full weight matrix W, LoRA freezes W and adds a trainable low-rank decomposition: W + BA, where B is [d, r] and A is [r, d] with r ≪ d. Only A and B are trained, so the trainable parameter count per matrix drops from d² to 2dr.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        # A: small random init; B: zero init, so BA = 0 at the start and
        # training begins exactly from the pretrained weights.
        self.lora_A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scaling = alpha / rank
        # Freeze the pretrained layer; only lora_A and lora_B get gradients.
        linear.weight.requires_grad = False
        if linear.bias is not None:
            linear.bias.requires_grad = False

    def forward(self, x):
        # Frozen path plus scaled low-rank update: W x + (alpha / r) * x A B
        return self.linear(x) + (x @ self.lora_A @ self.lora_B) * self.scaling
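A useful consequence of this formulation is that after training the low-rank update can be folded into the frozen weight, so inference pays no extra cost. A minimal sketch (dimensions and the nonzero stand-in for trained factors are illustrative, not from the text):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(64, 64)
rank, alpha = 8, 16
scaling = alpha / rank
lora_A = torch.randn(64, rank) * 0.01  # matches the init convention above
lora_B = torch.randn(rank, 64)         # nonzero, standing in for trained weights

x = torch.randn(2, 64)
unmerged = linear(x) + (x @ lora_A @ lora_B) * scaling

# Merge: nn.Linear stores weight as [out, in], while the update as applied
# above is [in, out], so transpose before adding.
with torch.no_grad():
    linear.weight += (lora_A @ lora_B).T * scaling
merged = linear(x)

print(torch.allclose(unmerged, merged, atol=1e-5))  # True
```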
Why Low Rank Works
Empirically, the weight updates learned during finetuning have low intrinsic rank: most of the change lies in a small subspace. LoRA exploits this by constraining the update to rank r. With rank=8, the trainable parameters are a fraction of a percent of the model, yet quality typically matches full finetuning.
V. Deliverables
- LoRA implementation
- Rank selection
- Alpha/rank scaling
- Apply to attention layers
- Parameter count comparison
- Quality benchmark
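The "apply to attention layers" and "parameter count comparison" items above can be sketched together. Module names (q_proj, v_proj, etc.) and sizes here are hypothetical stand-ins for a real transformer block; LoRALinear is restated compactly so the example runs standalone:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        self.lora_A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scaling = alpha / rank
        for p in linear.parameters():  # freeze weight and bias
            p.requires_grad = False

    def forward(self, x):
        return self.linear(x) + (x @ self.lora_A @ self.lora_B) * self.scaling

# Toy stand-in for one transformer layer's linear maps.
block = nn.ModuleDict({
    "q_proj": nn.Linear(256, 256),
    "k_proj": nn.Linear(256, 256),
    "v_proj": nn.Linear(256, 256),
    "mlp":    nn.Linear(256, 1024),
})

for name in ("q_proj", "v_proj"):   # wrap q and v projections only
    block[name] = LoRALinear(block[name])
for name in ("k_proj", "mlp"):      # freeze everything else
    for p in block[name].parameters():
        p.requires_grad = False

trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
total = sum(p.numel() for p in block.parameters())
print(f"trainable: {trainable:,} / {total:,} ({trainable / total:.2%})")
```

Only lora_A and lora_B contribute to the trainable count (2 wrapped layers × 2 × 256 × 8 = 8,192 parameters), which is what the parameter count comparison should report against the full block.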
LoRA makes LLM finetuning accessible on consumer GPUs. Tomorrow: RLHF.
— Day 32 Closing