Execution follows preparation. With proper init and LR schedule, training is remarkably stable. Watch the loss curve and the generated text evolve together.
— Day 27 Principle
I. The Training Loop
for step in range(max_steps):
    # Set the scheduled learning rate for this step.
    lr = get_lr(step, warmup_steps, max_steps, max_lr, min_lr)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    # Forward pass on a fresh training batch.
    x, y = get_batch('train')
    logits, loss = model(x, y)

    # Backward pass: clear stale grads, backprop, clip, then update.
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # returns the pre-clip grad norm, useful for logging
    optimizer.step()

    # Periodically evaluate on the validation set.
    if step % eval_interval == 0:
        val_loss = estimate_loss()
        print(f"step {step}: train {loss:.4f}, val {val_loss:.4f}")
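The loop above assumes a `get_lr` helper but does not define one. A common choice for LLM pre-training, and a reasonable sketch here, is linear warmup to `max_lr` followed by cosine decay down to `min_lr`:

```python
import math

def get_lr(step, warmup_steps, max_steps, max_lr, min_lr):
    # Linear warmup: ramp from ~0 up to max_lr over warmup_steps.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # Past the schedule's end, hold at the floor.
    if step > max_steps:
        return min_lr
    # Cosine decay from max_lr to min_lr between warmup_steps and max_steps.
    decay_ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # 1 → 0
    return min_lr + coeff * (max_lr - min_lr)
```

With `warmup_steps=10`, `max_steps=100`, `max_lr=1e-3`, `min_lr=1e-4`, the rate peaks at `1e-3` when warmup ends, sits exactly halfway (`5.5e-4`) at step 55, and lands on `1e-4` at step 100.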
V. Deliverables
- Full training loop
- Loss monitoring
- Checkpoint saving
- Sample generation during training
- Train vs val tracking
- Gradient norm logging
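The checkpoint-saving deliverable is not shown in the loop above. A minimal sketch (function names `save_checkpoint`/`load_checkpoint` are my own): save both the model and optimizer state dicts, plus the step counter, so training can resume where it left off rather than restart:

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    # Optimizer state (e.g. AdamW moments) must be saved too,
    # or resumed training will behave like a cold start.
    torch.save({
        'step': step,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path):
    ckpt = torch.load(path, map_location='cpu')
    model.load_state_dict(ckpt['model'])
    optimizer.load_state_dict(ckpt['optimizer'])
    return ckpt['step']  # resume the loop from this step
```

In the training loop, this would typically be called inside the same `if step % eval_interval == 0:` branch as the validation pass.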
You’ve pre-trained a language model. Tomorrow: text generation strategies.
— Day 27 Closing