Leverage in finance means doing more with less capital. LoRA is leverage for model training: achieve full-finetune quality while training roughly 0.1% of the parameters.
— Day 32 Principle
I. LoRA Mechanism
Instead of updating the full weight matrix W, LoRA freezes W and adds a trainable low-rank decomposition: W + BA, where B is [d, r] and A is [r, d] with r ≪ d. Only A and B are trained, so the trainable parameter count per matrix drops from d² to 2dr.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        # A: small random init; B: zero init, so BA = 0 at the start and
        # training begins exactly from the pretrained weights.
        self.lora_A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scaling = alpha / rank
        # Freeze the pretrained layer; only lora_A and lora_B get gradients.
        linear.weight.requires_grad = False
        if linear.bias is not None:
            linear.bias.requires_grad = False

    def forward(self, x):
        # Frozen path plus scaled low-rank update: W x + (alpha / r) * x A B
        return self.linear(x) + (x @ self.lora_A @ self.lora_B) * self.scaling
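A useful consequence of this formulation is that after training the low-rank update can be folded into the frozen weight, so inference pays no extra cost. A minimal sketch (dimensions and the nonzero stand-in for trained factors are illustrative, not from the text):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(64, 64)
rank, alpha = 8, 16
scaling = alpha / rank
lora_A = torch.randn(64, rank) * 0.01  # matches the init convention above
lora_B = torch.randn(rank, 64)         # nonzero, standing in for trained weights

x = torch.randn(2, 64)
unmerged = linear(x) + (x @ lora_A @ lora_B) * scaling

# Merge: nn.Linear stores weight as [out, in], while the update as applied
# above is [in, out], so transpose before adding.
with torch.no_grad():
    linear.weight += (lora_A @ lora_B).T * scaling
merged = linear(x)

print(torch.allclose(unmerged, merged, atol=1e-5))  # True
```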
Why Low Rank Works
Empirically, the weight updates learned during finetuning have low intrinsic rank: most of the change lies in a small subspace. LoRA exploits this by constraining the update to rank r. With rank=8, the trainable parameters are a fraction of a percent of the model, yet quality typically matches full finetuning.
V. Deliverables
- LoRA implementation
- Rank selection
- Alpha/rank scaling
- Apply to attention layers
- Parameter count comparison
- Quality benchmark
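The "apply to attention layers" and "parameter count comparison" items above can be sketched together. Module names (q_proj, v_proj, etc.) and sizes here are hypothetical stand-ins for a real transformer block; LoRALinear is restated compactly so the example runs standalone:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        self.lora_A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scaling = alpha / rank
        for p in linear.parameters():  # freeze weight and bias
            p.requires_grad = False

    def forward(self, x):
        return self.linear(x) + (x @ self.lora_A @ self.lora_B) * self.scaling

# Toy stand-in for one transformer layer's linear maps.
block = nn.ModuleDict({
    "q_proj": nn.Linear(256, 256),
    "k_proj": nn.Linear(256, 256),
    "v_proj": nn.Linear(256, 256),
    "mlp":    nn.Linear(256, 1024),
})

for name in ("q_proj", "v_proj"):   # wrap q and v projections only
    block[name] = LoRALinear(block[name])
for name in ("k_proj", "mlp"):      # freeze everything else
    for p in block[name].parameters():
        p.requires_grad = False

trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
total = sum(p.numel() for p in block.parameters())
print(f"trainable: {trainable:,} / {total:,} ({trainable / total:.2%})")
```

Only lora_A and lora_B contribute to the trainable count (2 wrapped layers × 2 × 256 × 8 = 8,192 parameters), which is what the parameter count comparison should report against the full block.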
LoRA makes LLM finetuning accessible on consumer GPUs. Tomorrow: RLHF.
— Day 32 Closing