A hierarchical thesis stacks micro, meso, and macro analysis. WaveNet uses the same principle: pairs merge in a tree, so the receptive field grows exponentially. — Day 15 Principle
I. Hierarchical Merge
Instead of concatenating all context characters into one flat vector, merge them in pairs hierarchically. Three levels of pairwise merging cover an 8-character context (2·2·2) while adding a nonlinearity at every level.
# WaveNet-style hierarchical model
# X: [B, 8] character indices; C: [vocab_size, emb_dim] embedding table
emb = C[X]                       # [B, 8, emb_dim]
x = emb.view(B, 4, emb_dim * 2)  # pair up adjacent characters
x = torch.tanh(x @ W1 + b1)      # [B, 4, n_hidden]
x = x.view(B, 2, n_hidden * 2)   # pair up adjacent pairs
x = torch.tanh(x @ W2 + b2)      # [B, 2, n_hidden]
x = x.view(B, n_hidden * 2)      # final merge: the whole 8-char context
logits = x @ W3 + b3             # [B, vocab_size]
II. Exponential Receptive Field
Each merge level doubles the receptive field, so three levels cover 2x2x2 = 8 characters. Parameter count grows roughly linearly with depth while context grows exponentially.
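The scaling claim can be sanity-checked with a quick count. This sketch uses an illustrative hidden width and counts one [n_hidden*2, n_hidden] weight matrix plus bias per level (the first level's input width is emb_dim*2 in practice, approximated here for simplicity):

```python
n_hidden = 68  # hypothetical hidden width

for levels in range(1, 6):
    receptive_field = 2 ** levels  # context characters covered at the output
    # one merge matrix + bias per level; linear in depth
    params = levels * (n_hidden * 2 * n_hidden + n_hidden)
    print(f"{levels} levels: {receptive_field:3d} chars, {params} params")
```

Each extra level doubles the context for a fixed-size increment in parameters; a flat MLP would instead need its first layer to widen with the full context length.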
IV. The Matrix
| | Deep Intuition | Surface Only |
|---|---|---|
| **Quick** | 🎯 DO FIRST: Implement hierarchical merge with block_size=8. Beat the flat MLP. | ⏭ IF TIME: Add BatchNorm between levels. |
| **Slow** | 🖐 CAREFULLY: Trace character influence through the merge levels. | 🚫 AVOID: Full dilated convolutions. |
V. Today’s Deliverables
- Hierarchical model: 3-level merge
- Beat flat MLP dev NLL
- BatchNorm between levels
- Receptive field trace
- Sampling from hierarchical model
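For the sampling deliverable, a sketch like the following could work. It assumes the model is callable on a [1, block_size] index tensor and that index 0 is the end-of-word token (both assumptions):

```python
import torch

@torch.no_grad()
def sample(model, block_size=8, end_token=0, max_len=30):
    """Sample one word by sliding a block_size window over generated tokens."""
    model.eval()  # use running BatchNorm statistics, not batch statistics
    context = [end_token] * block_size
    out = []
    for _ in range(max_len):
        logits = model(torch.tensor([context]))  # [1, vocab_size]
        probs = torch.softmax(logits, dim=-1)
        ix = torch.multinomial(probs, num_samples=1).item()
        if ix == end_token:
            break
        out.append(ix)
        context = context[1:] + [ix]  # slide the window forward
    return out
```

Calling `model.eval()` first matters once BatchNorm is in the stack: sampling feeds batches of size 1, where per-batch statistics are undefined.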
The hierarchical structure previews Transformers. Tomorrow: self-attention. — Day 15 Closing