Training a specialist from a generalist. SFT narrows the model’s behavior from ‘predict any text’ to ‘follow instructions helpfully.’ — Day 31 Principle
I. SFT Training
# Only compute loss on assistant tokens (V = vocab size)
labels = input_ids.clone()
labels[~assistant_mask] = -100  # non-assistant tokens contribute no gradient
# Note: for causal LM, logits must be shifted one position left of labels
# (HF models do this internally when you pass labels=...)
loss = F.cross_entropy(logits.view(-1, V), labels.view(-1), ignore_index=-100)
Loss Masking
During SFT, you only want to optimize the model’s responses, not its ability to reproduce the instructions. Setting non-response tokens to -100 in the labels tells cross_entropy to skip them, since -100 is PyTorch’s default ignore_index.
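A minimal sketch of how the labels get built in the first place. This assumes a conversation is already tokenized into (role, token_ids) segments; the `build_labels` helper and the segment format are hypothetical, not from any particular library:

```python
# Hypothetical helper: concatenate segments, masking non-assistant tokens.
def build_labels(segments, ignore_index=-100):
    """segments: list of (role, token_ids) pairs in conversation order."""
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        # Assistant tokens keep their ids as targets; everything else is ignored.
        labels.extend(ids if role == "assistant" else [ignore_index] * len(ids))
    return input_ids, labels

segments = [("user", [5, 6, 7]), ("assistant", [8, 9])]
ids, labels = build_labels(segments)
# labels line up with ids position-for-position; only the assistant span is supervised
```

The resulting `labels` list is what you would convert to a tensor and pass to cross_entropy with `ignore_index=-100`.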
V. Deliverables
- SFT loop
- Loss masking
- Freeze strategies
- Eval on held-out instructions
- Generation quality comparison
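One deliverable above is freeze strategies. A common sketch, on a toy stand-in model (the layer sizes here are illustrative, not from the source): freeze every parameter, then unfreeze only the final layer so the optimizer touches a small fraction of the weights.

```python
import torch.nn as nn

# Toy stand-in for a language model: embedding -> hidden -> LM head.
model = nn.Sequential(
    nn.Embedding(100, 16),  # vocab 100, dim 16 (illustrative sizes)
    nn.Linear(16, 16),
    nn.Linear(16, 100),     # LM head
)

# Freeze everything...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze only the head. Optimizers should receive just these params.
for p in model[-1].parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Passing only `filter(lambda p: p.requires_grad, model.parameters())` to the optimizer then keeps optimizer state (e.g. Adam moments) small as well.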
SFT transforms a base model into an assistant. Tomorrow: parameter-efficient methods. — Day 31 Closing