PHASE 3 LLM Architecture · Day 31 of 80 · Raschka LLMs From Scratch

Supervised Finetuning (SFT) in Practice

Run the SFT training loop: freeze/unfreeze strategies, loss masking, and evaluation.

Training a specialist from a generalist. SFT narrows the model’s behavior from ‘predict any text’ to ‘follow instructions helpfully.’ — Day 31 Principle

I. SFT Training

import torch.nn.functional as F

# Only compute loss on assistant tokens
labels = input_ids.clone()
labels[~assistant_mask] = -100  # ignore non-assistant tokens
loss = F.cross_entropy(logits.view(-1, V), labels.view(-1), ignore_index=-100)

Loss Masking

During SFT, you only want to optimize the model’s responses, not its ability to reproduce the instructions. Setting the labels of non-response tokens to -100 tells cross_entropy (via ignore_index=-100) to skip those positions entirely, so gradients flow only from the assistant’s tokens.
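To make the masking concrete, here is a minimal pure-Python sketch of what cross_entropy computes when ignore_index=-100 is set: positions with a masked label contribute nothing to the mean loss. The function name and toy logits are illustrative, not from the book.

```python
import math

def masked_cross_entropy(logits, labels, ignore_index=-100):
    """Mean negative log-likelihood over positions whose label != ignore_index.
    Mirrors F.cross_entropy(..., ignore_index=-100) for a flat (N, V) input."""
    total, count = 0.0, 0
    for row, y in zip(logits, labels):
        if y == ignore_index:
            continue  # masked position: no loss, no gradient
        m = max(row)  # stabilized log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[y]  # -log softmax(row)[y]
        count += 1
    return total / count

# Two prompt tokens (labels set to -100) followed by two assistant tokens:
logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
labels = [-100, -100, 2, 0]
loss = masked_cross_entropy(logits, labels)  # averaged over the 2 unmasked positions only
```

Changing the prompt logits in this example leaves the loss unchanged, which is exactly the property SFT relies on: the model is graded only on its answers.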

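The freeze/unfreeze strategy mentioned in the day’s overview can be sketched in a few lines: freeze every parameter, then re-enable gradients only for the layers you want to adapt. The toy model below is an illustrative stand-in, not the book’s GPT architecture; unfreezing only the output layer is one common starting point.

```python
import torch.nn as nn

# Toy stand-in for a transformer stack (sizes are arbitrary, for illustration).
model = nn.Sequential(
    nn.Embedding(100, 32),  # token embeddings
    nn.Linear(32, 32),      # "body" of the network
    nn.Linear(32, 100),     # output head
)

# Freeze everything, then unfreeze only the final (output) layer.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")  # → trainable: 3300 / 7556
```

The optimizer should then be built from only the trainable parameters, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`, so frozen weights cost no optimizer state.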
V. Deliverables

SFT transforms a base model into an assistant. Tomorrow: parameter-efficient methods. — Day 31 Closing