PHASE 2 Deep Networks · Day 15 of 80 · makemore & GPT

WaveNet Architecture

Build a hierarchical language model using dilated causal convolutions and the WaveNet tree-like structure.

A hierarchical thesis stacks micro, meso, and macro analysis. WaveNet uses the same principle: inputs merge in pairs, level by level, so the receptive field grows exponentially. — Day 15 Principle

I. Hierarchical Merge

Instead of concatenating all context characters into one flat vector, merge them in pairs hierarchically. Three levels of pairwise merges give an 8-character context with a nonlinearity at every level.

# WaveNet-style hierarchical model
emb = C[X]                          # [B, 8, emb_dim]
x = emb.view(B, 4, emb_dim * 2)     # merge adjacent character pairs
x = torch.tanh(x @ W1 + b1)         # [B, 4, n_hidden]
x = x.view(B, 2, n_hidden * 2)      # merge adjacent pairs again
x = torch.tanh(x @ W2 + b2)         # [B, 2, n_hidden]
x = x.view(B, n_hidden * 2)         # final merge
logits = x @ W3 + b3                # [B, vocab_size]
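For the shapes in the snippet to line up, each weight matrix must consume the concatenation of two merged units. A minimal, runnable initialization sketch — the names `C`, `W1`–`W3` match the snippet, while the concrete sizes (`vocab_size=27`, `emb_dim=10`, `n_hidden=68`) are assumptions for illustration:

```python
import torch

torch.manual_seed(42)
vocab_size, emb_dim, n_hidden, block_size = 27, 10, 68, 8  # assumed sizes

C  = torch.randn(vocab_size, emb_dim)             # character embedding table
W1 = torch.randn(emb_dim * 2, n_hidden) * 0.1     # level 1: merges char pairs
b1 = torch.zeros(n_hidden)
W2 = torch.randn(n_hidden * 2, n_hidden) * 0.1    # level 2: merges level-1 pairs
b2 = torch.zeros(n_hidden)
W3 = torch.randn(n_hidden * 2, vocab_size) * 0.1  # output: final pair -> logits
b3 = torch.zeros(vocab_size)

# forward pass over a dummy batch, mirroring the snippet above
B = 4
X = torch.randint(0, vocab_size, (B, block_size))
emb = C[X]                          # [B, 8, emb_dim]
x = emb.view(B, 4, emb_dim * 2)     # pairs (0,1), (2,3), (4,5), (6,7)
x = torch.tanh(x @ W1 + b1)         # [B, 4, n_hidden]
x = x.view(B, 2, n_hidden * 2)      # merge adjacent pairs again
x = torch.tanh(x @ W2 + b2)         # [B, 2, n_hidden]
x = x.view(B, n_hidden * 2)         # final merge
logits = x @ W3 + b3                # [B, vocab_size]
```

Note the pattern: every level's input width is twice a single unit's width, because `view` concatenates two neighbors before the matmul.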

Exponential Receptive Field

Each merge level doubles the context. Three levels: 2 × 2 × 2 = 8 characters. Parameter count grows linearly with depth while context grows exponentially.
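The linear-parameters / exponential-context trade-off can be checked with a few lines of arithmetic. A small sketch, assuming each level uses one `[n_hidden*2, n_hidden]` weight plus a bias (the width `n_hidden=68` is an assumed example value):

```python
# Receptive field doubles per merge level, while the per-level parameter
# count is constant -- so total parameters grow only linearly with depth.
n_hidden = 68  # assumed hidden width

for levels in range(1, 6):
    context = 2 ** levels                                    # chars seen at the top
    params_per_level = (n_hidden * 2) * n_hidden + n_hidden  # weight + bias
    total = levels * params_per_level
    print(f"{levels} levels -> context {context:3d} chars, ~{total:,} params")
```

Three levels already cover the 8-character block, and each extra level buys double the context for the same fixed per-level cost.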

IV. The Matrix

Deep Intuition
Surface Only
Quick
🎯

DO FIRST

Implement the hierarchical merge with block_size=8. Beat the flat MLP baseline.

IF TIME

Add BatchNorm between levels.

Slow
🖐

CAREFULLY

Trace character influence through levels.

🚫

AVOID

Implementing full dilated convolutions; the view-based merge captures the same tree structure today.
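For the "add BatchNorm between levels" stretch goal, note that mid-network activations here are 3D, `[B, T, C]`, while `torch.nn.BatchNorm1d` expects `[B, C]` or `[B, C, L]` and would require a transpose. A minimal hand-rolled sketch (an illustrative assumption, not the course's exact class; running statistics and train/eval modes are omitted) that normalizes per channel over dims (0, 1):

```python
import torch

# Minimal batch norm for 3D activations [B, T, C]: treats every (batch,
# merge-group) position as a sample and keeps one statistic per channel.
class BatchNorm:
    def __init__(self, dim, eps=1e-5):
        self.eps = eps
        self.gamma = torch.ones(dim)   # learnable scale
        self.beta = torch.zeros(dim)   # learnable shift

    def __call__(self, x):
        dims = (0, 1) if x.ndim == 3 else 0  # reduce all but the channel dim
        mean = x.mean(dims, keepdim=True)
        var = x.var(dims, keepdim=True)
        self.out = self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta
        return self.out

bn = BatchNorm(68)
h = bn(torch.randn(32, 4, 68))  # e.g. level-1 activations: [B, 4, n_hidden]
print(h.shape)
```

Normalizing over dims (0, 1) rather than dim 0 alone matters: with dim 0 only, each of the 4 merge groups would get its own statistics, which is usually not what you want for a shared layer.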

V. Today’s Deliverables

The hierarchical structure previews Transformers. Tomorrow: self-attention. — Day 15 Closing