A word in isolation means little. Its meaning emerges from context and position. Embeddings capture the first; positional encodings capture the second.
— Day 23 Principle
I. Embedding Tables
An embedding table is a learnable matrix E of shape [vocab_size, d_model]. Looking up token index i returns row E[i]. After training, semantically similar tokens end up with similar vectors.
import torch
import torch.nn as nn

token_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(max_seq_len, d_model)
positions = torch.arange(token_ids.size(1), device=token_ids.device)  # [0 .. seq_len-1]
x = token_emb(token_ids) + pos_emb(positions)  # [batch, seq_len, d_model]
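To make the lookup concrete, here is a minimal sketch (with made-up sizes) verifying that an `nn.Embedding` lookup is exactly row selection from its weight matrix:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model = 10, 4  # toy sizes for illustration
emb = nn.Embedding(vocab_size, d_model)

# Looking up token index i returns row i of the learnable weight matrix.
i = 3
looked_up = emb(torch.tensor(i))
row = emb.weight[i]
print(torch.equal(looked_up, row))  # True
```

Because the lookup is just indexing, gradients flow only into the rows that were actually used in a batch; the rest of the table is untouched by that update.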
Embeddings Are Learned Representations
The embedding table is randomly initialized. Through training, gradient descent organizes it so similar tokens cluster. This emergent structure is one of deep learning’s most elegant phenomena.
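One way to probe that clustering is cosine similarity between rows. A minimal sketch, assuming two hypothetical token indices (0 and 1) stand in for a related pair like "cat"/"dog"; with the random table below the similarity is near zero, and training is what pushes related tokens' similarity up:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
E = torch.randn(100, 64)  # stand-in for an embedding table (random, untrained)
a, b = E[0], E[1]         # hypothetical indices of two related tokens
sim = F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
print(f"cosine similarity: {sim:.3f}")
```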
V. Deliverables
- Token embedding table
- Positional embedding
- Sinusoidal vs learned comparison
- Embedding visualization
- Dimension analysis
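For the sinusoidal-vs-learned comparison, a sketch of the fixed sin/cos encoding from "Attention Is All You Need" (the learned variant is the `nn.Embedding(max_seq_len, d_model)` shown earlier; the fixed variant has no parameters):

```python
import math
import torch

def sinusoidal_pe(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal positional encodings: sin on even dims, cos on odd dims."""
    pos = torch.arange(max_len).unsqueeze(1).float()            # [max_len, 1]
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))           # [d_model / 2]
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

pe = sinusoidal_pe(128, 64)
print(pe.shape)  # torch.Size([128, 64])
```

The fixed encoding adds no trainable parameters and extends to positions beyond those seen in training, while learned positional embeddings can adapt to the data but are capped at max_seq_len.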
Embeddings convert discrete symbols to continuous geometry. Tomorrow: causal attention revisited.
— Day 23 Closing