The moment a thesis generates its first real return. Today: months of foundation work culminate in watching a GPT generate Shakespeare from scratch.
— Day 20 Principle
I. Training Setup
import torch

# Hyperparameters
batch_size = 64    # sequences per batch
block_size = 256   # context length in tokens
max_iters = 5000
lr = 3e-4

model = GPT()
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

for step in range(max_iters):
    xb, yb = get_batch('train')            # sample a batch of (input, target) token blocks
    logits, loss = model(xb, yb)           # forward pass returns cross-entropy loss
    optimizer.zero_grad(set_to_none=True)  # clear stale gradients before backward
    loss.backward()
    optimizer.step()
    if step % 500 == 0:
        print(f"step {step}: train loss {loss.item():.4f}")
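The loop above leans on a `get_batch` helper and an `encode`/`decode` pair that the data pipeline provides. A minimal character-level sketch, assuming the tiny-Shakespeare corpus (in practice loaded from a file such as `input.txt`; an inline stand-in string is used here so the snippet runs on its own):

```python
import torch

# Stand-in corpus; in practice: text = open('input.txt').read()
text = "O Romeo, Romeo! wherefore art thou Romeo?\n" * 200

# Character-level vocabulary and codec
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))                 # 90/10 train/val split
train_data, val_data = data[:n], data[n:]

block_size, batch_size = 256, 64

def get_batch(split):
    """Sample a random batch of contiguous blocks; targets are inputs shifted by one."""
    d = train_data if split == 'train' else val_data
    ix = torch.randint(len(d) - block_size, (batch_size,))
    x = torch.stack([d[i:i + block_size] for i in ix])
    y = torch.stack([d[i + 1:i + block_size + 1] for i in ix])
    return x, y
```

The one-token shift between `x` and `y` is the whole supervision signal: every position predicts the next character.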
II. Sampling Strategies
Greedy decoding (argmax) is deterministic but tends toward repetition; temperature sampling rescales logits before the softmax, where values below 1 sharpen the distribution and values above 1 flatten it; top-k restricts sampling to the k most likely tokens. Each trades coherence against diversity.
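The three strategies can be folded into one decoding step. A sketch over a single logits vector (the function name and signature are illustrative, not part of the model above):

```python
import torch
import torch.nn.functional as F

def sample_next(logits, temperature=1.0, top_k=None):
    """Pick the next token id from a (vocab_size,) logits vector."""
    if temperature == 0.0:
        return int(torch.argmax(logits))           # greedy: always the most likely token
    logits = logits / temperature                  # <1 sharpens, >1 flattens the distribution
    if top_k is not None:
        v, _ = torch.topk(logits, top_k)
        logits[logits < v[-1]] = float('-inf')     # mask everything outside the top k
    probs = F.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```

Calling this once per step inside a generation loop, with the model's logits at the last position, reproduces the behavior of the `generate` call below.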
context = torch.zeros((1, 1), dtype=torch.long)  # start from a single zero token
print(decode(model.generate(context, 500)[0].tolist()))  # generate 500 new tokens
# "ROMEO: And therefore, since you have to the market
# Of fortune's fall, I pray you tell me this..."
III. Phase 2 Complete
From micrograd Value to GPT generating Shakespeare in 20 days. Every component built from scratch. Phase 3: Raschka’s systematic LLM engineering.
IV. The Matrix

|       | Deep Intuition | Surface Only |
|-------|----------------|--------------|
| Quick | 🎯 DO FIRST: Train 5000 steps. Generate 500 chars. Marvel. | ⏭ IF TIME: Temperature experiments at 0.5, 1.0, 1.5. |
| Slow  | 🖐 CAREFULLY: Plot train vs val loss curves. | 🚫 AVOID: Scaling up. Celebrate the milestone. |
V. Today’s Deliverables
- Data pipeline for Shakespeare
- Training 5000 steps
- Loss tracking
- Text generation
- Temperature experiments
- Phase 2 review
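For the loss-tracking deliverable, the raw per-batch training loss is too noisy to plot directly. A hedged sketch of the usual averaged-evaluation helper, assuming (as in the training loop above) that `model(x, y)` returns `(logits, loss)` and `get_batch(split)` yields a batch:

```python
import torch

@torch.no_grad()
def estimate_loss(model, get_batch, eval_iters=200):
    """Average the loss over several random batches per split for a smooth curve."""
    model.eval()                     # disable dropout etc. during evaluation
    out = {}
    for split in ('train', 'val'):
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            x, y = get_batch(split)
            _, loss = model(x, y)
            losses[k] = loss.item()
        out[split] = losses.mean().item()
    model.train()                    # restore training mode
    return out
```

Calling this every few hundred steps and plotting the two series gives the train-vs-val curves; a widening gap between them is the overfitting signal to watch for on a corpus this small.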
Phase 2 complete. From Value to GPT. Phase 3: Raschka’s production LLM engineering.
— Day 20 Closing · End of Phase 2