The moment a thesis generates its first real return. Today: months of foundation work culminate in watching a GPT generate Shakespeare from scratch.
— Day 20 Principle
I. Training Setup
import torch

# Hyperparameters
batch_size = 64    # sequences per batch
block_size = 256   # context length in tokens
max_iters = 5000
lr = 3e-4

model = GPT()
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

for step in range(max_iters):
    xb, yb = get_batch('train')            # sample a batch of (input, target) token blocks
    logits, loss = model(xb, yb)           # forward pass returns cross-entropy loss
    optimizer.zero_grad(set_to_none=True)  # clear stale gradients before backward
    loss.backward()
    optimizer.step()
    if step % 500 == 0:
        print(f"step {step}: train loss {loss.item():.4f}")
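The loop above leans on a `get_batch` helper and an `encode`/`decode` pair that the data pipeline provides. A minimal character-level sketch, assuming the tiny-Shakespeare corpus (in practice loaded from a file such as `input.txt`; an inline stand-in string is used here so the snippet runs on its own):

```python
import torch

# Stand-in corpus; in practice: text = open('input.txt').read()
text = "O Romeo, Romeo! wherefore art thou Romeo?\n" * 200

# Character-level vocabulary and codec
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))                 # 90/10 train/val split
train_data, val_data = data[:n], data[n:]

block_size, batch_size = 256, 64

def get_batch(split):
    """Sample a random batch of contiguous blocks; targets are inputs shifted by one."""
    d = train_data if split == 'train' else val_data
    ix = torch.randint(len(d) - block_size, (batch_size,))
    x = torch.stack([d[i:i + block_size] for i in ix])
    y = torch.stack([d[i + 1:i + block_size + 1] for i in ix])
    return x, y
```

The one-token shift between `x` and `y` is the whole supervision signal: every position predicts the next character.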
II. Sampling Strategies
Greedy decoding (argmax) is deterministic but tends toward repetition; temperature sampling rescales logits before the softmax, where values below 1 sharpen the distribution and values above 1 flatten it; top-k restricts sampling to the k most likely tokens. Each trades coherence against diversity.
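The three strategies can be folded into one decoding step. A sketch over a single logits vector (the function name and signature are illustrative, not part of the model above):

```python
import torch
import torch.nn.functional as F

def sample_next(logits, temperature=1.0, top_k=None):
    """Pick the next token id from a (vocab_size,) logits vector."""
    if temperature == 0.0:
        return int(torch.argmax(logits))           # greedy: always the most likely token
    logits = logits / temperature                  # <1 sharpens, >1 flattens the distribution
    if top_k is not None:
        v, _ = torch.topk(logits, top_k)
        logits[logits < v[-1]] = float('-inf')     # mask everything outside the top k
    probs = F.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```

Calling this once per step inside a generation loop, with the model's logits at the last position, reproduces the behavior of the `generate` call below.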
context = torch.zeros((1, 1), dtype=torch.long)  # start from a single zero token
print(decode(model.generate(context, 500)[0].tolist()))  # generate 500 new tokens
# "ROMEO: And therefore, since you have to the market
# Of fortune's fall, I pray you tell me this..."
III. Phase 2 Complete
From micrograd Value to GPT generating Shakespeare in 20 days. Every component built from scratch. Phase 3: Raschka’s systematic LLM engineering.
IV. The Matrix

|       | Deep Intuition | Surface Only |
|-------|----------------|--------------|
| Quick | 🎯 DO FIRST: Train 5000 steps. Generate 500 chars. Marvel. | ⏭ IF TIME: Temperature experiments at 0.5, 1.0, 1.5. |
| Slow  | 🖐 CAREFULLY: Plot train vs val loss curves. | 🚫 AVOID: Scaling up. Celebrate the milestone. |
V. Today’s Deliverables
- Data pipeline for Shakespeare
- Training 5000 steps
- Loss tracking
- Text generation
- Temperature experiments
- Phase 2 review
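For the loss-tracking deliverable, the raw per-batch training loss is too noisy to plot directly. A hedged sketch of the usual averaged-evaluation helper, assuming (as in the training loop above) that `model(x, y)` returns `(logits, loss)` and `get_batch(split)` yields a batch:

```python
import torch

@torch.no_grad()
def estimate_loss(model, get_batch, eval_iters=200):
    """Average the loss over several random batches per split for a smooth curve."""
    model.eval()                     # disable dropout etc. during evaluation
    out = {}
    for split in ('train', 'val'):
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            x, y = get_batch(split)
            _, loss = model(x, y)
            losses[k] = loss.item()
        out[split] = losses.mean().item()
    model.train()                    # restore training mode
    return out
```

Calling this every few hundred steps and plotting the two series gives the train-vs-val curves; a widening gap between them is the overfitting signal to watch for on a corpus this small.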
Phase 2 complete. From Value to GPT. Phase 3: Raschka’s production LLM engineering.
— Day 20 Closing · End of Phase 2