PHASE 2 Deep Networks · Day 20 of 80 · makemore & GPT

Training GPT on Shakespeare

Train your GPT. Watch it go from random characters to coherent text. Phase 2 finale.

The moment a thesis generates its first real return. Today: months of foundation work culminate in watching GPT generate Shakespeare from scratch. — Day 20 Principle

I. Training Setup

import torch

batch_size = 64
block_size = 256
max_iters = 5000
lr = 3e-4

model = GPT()
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

for step in range(max_iters):
    xb, yb = get_batch('train')
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
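The loop assumes a `get_batch` helper that isn't shown. A minimal sketch, assuming the Shakespeare text has already been encoded into a 1-D `LongTensor` and split 90/10 into train and val (the `data` tensor below is a random placeholder, not the real corpus):

```python
import torch

# Placeholder corpus: in practice, `data` = torch.tensor(encode(text)).
data = torch.randint(0, 65, (10_000,))
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]

batch_size = 64
block_size = 256

def get_batch(split):
    """Sample a random batch of (input, target) character windows."""
    d = train_data if split == 'train' else val_data
    ix = torch.randint(len(d) - block_size, (batch_size,))
    x = torch.stack([d[i:i + block_size] for i in ix])
    # Targets are the inputs shifted one character to the right.
    y = torch.stack([d[i + 1:i + block_size + 1] for i in ix])
    return x, y

xb, yb = get_batch('train')
print(xb.shape, yb.shape)  # torch.Size([64, 256]) torch.Size([64, 256])
```

The one-position shift between `x` and `y` is the entire supervision signal: every position predicts the next character.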

II. Sampling Strategies

Greedy decoding (argmax), temperature sampling, and top-k sampling. Each trades coherence against diversity: greedy is deterministic but repetitive, higher temperatures are varied but noisier, and top-k cuts off the unlikely tail.

context = torch.zeros((1, 1), dtype=torch.long)
print(decode(model.generate(context, 500)[0].tolist()))
# "ROMEO: And therefore, since you have to the market
#  Of fortune's fall, I pray you tell me this..."
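The three strategies above can be sketched as a single sampling step. This is a hedged illustration over a hypothetical `logits` vector (one row of the model's final-position output over a 65-character vocabulary), not the course's exact `generate` code:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(65)  # hypothetical logits for the next character

# Greedy: always take the single most likely character (deterministic).
greedy_idx = logits.argmax().item()

# Temperature: divide logits before softmax. <1.0 sharpens, >1.0 flattens.
temperature = 0.8
probs = F.softmax(logits / temperature, dim=-1)
temp_idx = torch.multinomial(probs, num_samples=1).item()

# Top-k: mask everything outside the k most likely characters, then sample.
k = 10
v, _ = torch.topk(logits, k)
clipped = logits.clone()
clipped[clipped < v[-1]] = -float('inf')  # v[-1] is the k-th largest logit
topk_idx = torch.multinomial(F.softmax(clipped, dim=-1), num_samples=1).item()
```

Temperature 0.5 makes generations safer and more repetitive; 1.5 makes them inventive but error-prone, which is exactly the experiment suggested in the Matrix below.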

III. Phase 2 Complete

From micrograd Value to GPT generating Shakespeare in 20 days. Every component built from scratch. Phase 3: Raschka’s systematic LLM engineering.

IV. The Matrix

|          | Deep Intuition | Surface Only |
|----------|----------------|--------------|
| Quick 🎯 | **DO FIRST** — Train 5000 steps. Generate 500 chars. Marvel. | **IF TIME** — Temperature experiments: 0.5, 1.0, 1.5. |
| Slow 🖐  | **CAREFULLY** — Plot train vs val loss curves. | **AVOID** 🚫 — Scaling up. Celebrate the milestone. |
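For the "plot train vs val loss curves" item, the usual pattern is to average the loss over several random batches per split rather than trusting a single noisy batch. A minimal sketch, assuming the `model` and `get_batch` from §I:

```python
import torch

@torch.no_grad()
def estimate_loss(model, get_batch, eval_iters=200):
    """Average loss over eval_iters random batches for each split."""
    model.eval()  # disable dropout etc. during evaluation
    out = {}
    for split in ('train', 'val'):
        losses = torch.zeros(eval_iters)
        for i in range(eval_iters):
            xb, yb = get_batch(split)
            _, loss = model(xb, yb)
            losses[i] = loss.item()
        out[split] = losses.mean().item()
    model.train()  # restore training mode
    return out
```

Logging `estimate_loss(...)` every few hundred steps and plotting both curves is what reveals overfitting: the train loss keeps falling while the val loss flattens or rises.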

V. Today’s Deliverables

Phase 2 complete. From Value to GPT. Phase 3: Raschka's production LLM engineering. — Day 20 Closing · End of Phase 2