PHASE 3 LLM Architecture · Day 23 of 80 · Raschka LLMs From Scratch

Token Embeddings & Positional Embeddings

Deep dive into how tokens become vectors and how position information is encoded.

A word in isolation means little. Its meaning emerges from context and position. Embeddings capture the first; positional encodings capture the second. — Day 23 Principle

I. Embedding Tables

An embedding table is a learnable matrix E of shape [vocab_size, d_model]. Looking up token index i simply returns row E[i]. After training, semantically similar tokens end up with similar vectors.

```python
import torch
import torch.nn as nn

token_emb = nn.Embedding(vocab_size, d_model)   # one learnable row per token
pos_emb = nn.Embedding(max_seq_len, d_model)    # one learnable row per position

# token_ids: [batch, seq_len]; positions are just 0..seq_len-1
positions = torch.arange(token_ids.shape[1], device=token_ids.device)
x = token_emb(token_ids) + pos_emb(positions)   # pos_emb broadcasts over the batch
```
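A useful way to see why the lookup is "just row indexing": it is mathematically equivalent to multiplying a one-hot vector by the embedding matrix. A minimal sketch with toy sizes (the dimensions below are illustrative, not from the text):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model = 10, 4          # toy sizes for illustration
emb = nn.Embedding(vocab_size, d_model)

token_id = 3
# Direct lookup: returns row 3 of the weight matrix.
looked_up = emb(torch.tensor(token_id))

# Equivalent view: a one-hot vector times the embedding matrix
# selects the same row, just far less efficiently.
one_hot = torch.zeros(vocab_size)
one_hot[token_id] = 1.0
via_matmul = one_hot @ emb.weight

assert torch.allclose(looked_up, via_matmul)
```

The lookup view is what makes embeddings cheap: no matrix multiply happens at all, only an index into memory.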

Embeddings Are Learned Representations

The embedding table is randomly initialized. Through training, gradient descent organizes it so similar tokens cluster. This emergent structure is one of deep learning’s most elegant phenomena.

V. Deliverables

Embeddings convert discrete symbols to continuous geometry. Tomorrow: causal attention revisited. — Day 23 Closing