Execution follows preparation. With proper init and LR schedule, training is remarkably stable. Watch the loss curve and the generated text evolve together.
— Day 27 Principle
I. The Training Loop
for step in range(max_steps):
    # Set the scheduled learning rate for this step.
    lr = get_lr(step, warmup_steps, max_steps, max_lr, min_lr)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    # Forward pass on a fresh training batch.
    x, y = get_batch('train')
    logits, loss = model(x, y)

    # Backward pass: clear stale grads, backprop, clip, then update.
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # returns the pre-clip grad norm, useful for logging
    optimizer.step()

    # Periodically evaluate on the validation set.
    if step % eval_interval == 0:
        val_loss = estimate_loss()
        print(f"step {step}: train {loss:.4f}, val {val_loss:.4f}")
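The loop above assumes a `get_lr` helper but does not define one. A common choice for LLM pre-training, and a reasonable sketch here, is linear warmup to `max_lr` followed by cosine decay down to `min_lr`:

```python
import math

def get_lr(step, warmup_steps, max_steps, max_lr, min_lr):
    # Linear warmup: ramp from ~0 up to max_lr over warmup_steps.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # Past the schedule's end, hold at the floor.
    if step > max_steps:
        return min_lr
    # Cosine decay from max_lr to min_lr between warmup_steps and max_steps.
    decay_ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # 1 → 0
    return min_lr + coeff * (max_lr - min_lr)
```

With `warmup_steps=10`, `max_steps=100`, `max_lr=1e-3`, `min_lr=1e-4`, the rate peaks at `1e-3` when warmup ends, sits exactly halfway (`5.5e-4`) at step 55, and lands on `1e-4` at step 100.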
V. Deliverables
- Full training loop
- Loss monitoring
- Checkpoint saving
- Sample generation during training
- Train vs val tracking
- Gradient norm logging
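The checkpoint-saving deliverable is not shown in the loop above. A minimal sketch (function names `save_checkpoint`/`load_checkpoint` are my own): save both the model and optimizer state dicts, plus the step counter, so training can resume where it left off rather than restart:

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    # Optimizer state (e.g. AdamW moments) must be saved too,
    # or resumed training will behave like a cold start.
    torch.save({
        'step': step,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path):
    ckpt = torch.load(path, map_location='cpu')
    model.load_state_dict(ckpt['model'])
    optimizer.load_state_dict(ckpt['optimizer'])
    return ckpt['step']  # resume the loop from this step
```

In the training loop, this would typically be called inside the same `if step % eval_interval == 0:` branch as the validation pass.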
You’ve pre-trained a language model. Tomorrow: text generation strategies.
— Day 27 Closing