PHASE 3 LLM Architecture · Day 27 of 80 · Raschka LLMs From Scratch

Pre-training a Small GPT on Text Data

Run the full pre-training loop on a text corpus. Monitor loss, generate samples, save checkpoints.

Execution follows preparation. With proper init and LR schedule, training is remarkably stable. Watch the loss curve and the generated text evolve together.— Day 27 Principle

I. The Training Loop

for step in range(max_steps):
    # Set this step's learning rate (warmup + decay schedule)
    lr = get_lr(step, warmup_steps, max_steps, max_lr, min_lr)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    # Sample a training batch and compute the loss
    x, y = get_batch('train')
    logits, loss = model(x, y)

    # Backward pass with gradient clipping for stability
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()

    # Periodically evaluate on the validation split
    if step % eval_interval == 0:
        val_loss = estimate_loss()
        print(f"step {step}: train {loss.item():.4f}, val {val_loss:.4f}")
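The loop above calls a `get_lr` helper that is not shown. A minimal sketch of one common choice, linear warmup followed by cosine decay, might look like this (the exact schedule is an assumption; the loop only requires that `get_lr` return a learning rate for the given step):

    import math

    def get_lr(step, warmup_steps, max_steps, max_lr, min_lr):
        """Linear warmup to max_lr, then cosine decay down to min_lr."""
        # Warmup phase: ramp linearly from ~0 up to max_lr
        if step < warmup_steps:
            return max_lr * (step + 1) / warmup_steps
        # After the schedule ends, hold at the floor
        if step >= max_steps:
            return min_lr
        # Cosine decay: coeff goes smoothly from 1 to 0
        progress = (step - warmup_steps) / (max_steps - warmup_steps)
        coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
        return min_lr + coeff * (max_lr - min_lr)

At the end of warmup this returns exactly `max_lr`, and it never drops below `min_lr`, so a bad step count cannot zero out the learning rate.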

V. Deliverables
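The day's goal includes saving checkpoints. One minimal sketch, assuming you want to be able to resume training exactly (function names and the bundled fields here are illustrative, not from the original):

    import torch

    def save_checkpoint(model, optimizer, step, val_loss, path="ckpt.pt"):
        # Bundle everything needed to resume: weights, optimizer state, progress
        torch.save({
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": step,
            "val_loss": val_loss,
        }, path)

    def load_checkpoint(model, optimizer, path="ckpt.pt"):
        ckpt = torch.load(path, map_location="cpu")
        model.load_state_dict(ckpt["model"])
        optimizer.load_state_dict(ckpt["optimizer"])
        return ckpt["step"], ckpt["val_loss"]

Saving the optimizer state matters for AdamW: its per-parameter moment estimates are part of the training trajectory, and resuming from weights alone causes a loss spike.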

You’ve pre-trained a language model. Tomorrow: text generation strategies.— Day 27 Closing