A word in isolation means little. Its meaning emerges from context and position. Embeddings capture the first; positional encodings capture the second.
— Day 23 Principle
I. Embedding Tables
An embedding table is a learnable matrix E of shape [vocab_size, d_model]. Looking up token index i returns row E[i]. After training, semantically similar tokens end up with similar vectors.
import torch
import torch.nn as nn

token_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(max_seq_len, d_model)
positions = torch.arange(token_ids.size(1), device=token_ids.device)  # [0 .. seq_len-1]
x = token_emb(token_ids) + pos_emb(positions)  # [batch, seq_len, d_model]
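To make the lookup concrete, here is a minimal sketch (with made-up sizes) verifying that an `nn.Embedding` lookup is exactly row selection from its weight matrix:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model = 10, 4  # toy sizes for illustration
emb = nn.Embedding(vocab_size, d_model)

# Looking up token index i returns row i of the learnable weight matrix.
i = 3
looked_up = emb(torch.tensor(i))
row = emb.weight[i]
print(torch.equal(looked_up, row))  # True
```

Because the lookup is just indexing, gradients flow only into the rows that were actually used in a batch; the rest of the table is untouched by that update.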
Embeddings Are Learned Representations
The embedding table is randomly initialized. Through training, gradient descent organizes it so similar tokens cluster. This emergent structure is one of deep learning’s most elegant phenomena.
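One way to probe that clustering is cosine similarity between rows. A minimal sketch, assuming two hypothetical token indices (0 and 1) stand in for a related pair like "cat"/"dog"; with the random table below the similarity is near zero, and training is what pushes related tokens' similarity up:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
E = torch.randn(100, 64)  # stand-in for an embedding table (random, untrained)
a, b = E[0], E[1]         # hypothetical indices of two related tokens
sim = F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
print(f"cosine similarity: {sim:.3f}")
```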
V. Deliverables
- Token embedding table
- Positional embedding
- Sinusoidal vs learned comparison
- Embedding visualization
- Dimension analysis
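For the sinusoidal-vs-learned comparison, a sketch of the fixed sin/cos encoding from "Attention Is All You Need" (the learned variant is the `nn.Embedding(max_seq_len, d_model)` shown earlier; the fixed variant has no parameters):

```python
import math
import torch

def sinusoidal_pe(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal positional encodings: sin on even dims, cos on odd dims."""
    pos = torch.arange(max_len).unsqueeze(1).float()            # [max_len, 1]
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))           # [d_model / 2]
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

pe = sinusoidal_pe(128, 64)
print(pe.shape)  # torch.Size([128, 64])
```

The fixed encoding adds no trainable parameters and extends to positions beyond those seen in training, while learned positional embeddings can adapt to the data but are capped at max_seq_len.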
Embeddings convert discrete symbols to continuous geometry. Tomorrow: causal attention revisited.
— Day 23 Closing