Not every model needs to generate. Sometimes the most valuable output is a single decision: buy or sell, spam or not, positive or negative. — Day 29 Principle
I. Classification Head
Take the last token’s hidden state from the LLM and pass it through a linear layer to produce class logits. Either fine-tune end-to-end, or freeze the backbone and train only the head.
import torch.nn.functional as F

hidden = model.transformer(input_ids).last_hidden_state[:, -1, :]  # [B, H] last-token state
logits = classifier_head(hidden)  # [B, num_classes]
loss = F.cross_entropy(logits, labels)
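The feature-extraction variant can be sketched end-to-end. This is a minimal, self-contained illustration: a tiny stand-in module replaces the pretrained LLM backbone, and `classifier_head`, the hidden size (32), and the class count (2) are all illustrative assumptions, not values from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in backbone: in practice this is the pretrained LLM's transformer.
backbone = nn.Sequential(nn.Embedding(100, 32), nn.Linear(32, 32), nn.Tanh())
classifier_head = nn.Linear(32, 2)  # hidden size 32, 2 classes (illustrative)

# Feature extraction: freeze the backbone, train only the head.
for p in backbone.parameters():
    p.requires_grad = False

opt = torch.optim.Adam(classifier_head.parameters(), lr=1e-2)

input_ids = torch.randint(0, 100, (8, 16))  # [B, T] toy token ids
labels = torch.randint(0, 2, (8,))          # [B] toy class labels

for _ in range(20):
    with torch.no_grad():                       # frozen backbone = fixed features
        hidden = backbone(input_ids)[:, -1, :]  # last-token pooling -> [B, H]
    logits = classifier_head(hidden)            # [B, num_classes]
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because gradients never flow into the backbone, per-step cost is dominated by a single frozen forward pass, which is why feature extraction is so much cheaper than full fine-tuning.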
V. Deliverables
- Classification head
- Feature extraction vs fine-tuning
- Last token pooling
- Accuracy metrics
- Comparison to traditional ML
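Two of the deliverables above, last-token pooling and accuracy, have a subtlety worth making concrete: with right-padded batches, "last token" must mean the last non-pad token, not position -1. A hedged sketch, where the helper names and shapes are illustrative assumptions:

```python
import torch

def last_token_pool(hidden, attention_mask):
    """Select each sequence's final non-pad hidden state (assumes right padding)."""
    last = attention_mask.sum(dim=1) - 1                  # [B] index of last real token
    return hidden[torch.arange(hidden.size(0)), last]     # [B, H]

def accuracy(logits, labels):
    """Fraction of examples where the argmax class matches the label."""
    return (logits.argmax(dim=-1) == labels).float().mean().item()

hidden = torch.randn(4, 6, 8)  # [B, T, H] toy hidden states
mask = torch.tensor([[1, 1, 1, 0, 0, 0],
                     [1, 1, 1, 1, 1, 1],
                     [1, 0, 0, 0, 0, 0],
                     [1, 1, 1, 1, 0, 0]])
pooled = last_token_pool(hidden, mask)  # [4, 8]
```

Naively taking `hidden[:, -1, :]` here would pool pad-token states for three of the four sequences, which silently degrades classification quality.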
LLMs are powerful feature extractors. Tomorrow: instruction fine-tuning. — Day 29 Closing