PHASE 2 Deep Networks · Day 19 of 80 · makemore & GPT

Building GPT from Scratch

Assemble token embeddings, positional embeddings, Transformer blocks, LayerNorm, and output projection into a complete GPT.

Building a firm from scratch means assembling research, trading, risk, compliance. Each well-understood individually. The art is in assembly. Today you assemble GPT.— Day 19 Principle

I. The Full GPT

import torch
import torch.nn as nn
import torch.nn.functional as F

class GPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)              # final LayerNorm before the head
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        # token + positional embeddings: (B, T, n_embd)
        # arange must live on idx's device or this breaks on GPU
        x = self.token_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        x = self.blocks(x)
        logits = self.lm_head(self.ln_f(x))           # (B, T, vocab_size)
        if targets is None:
            loss = None
        else:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss

    def generate(self, idx, max_new):
        for _ in range(max_new):
            logits, _ = self(idx[:, -block_size:])    # crop to the last block_size tokens
            probs = F.softmax(logits[:, -1, :], dim=-1)   # next-token distribution
            idx = torch.cat((idx, torch.multinomial(probs, 1)), dim=1)
        return idx
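The sampling step inside generate can be exercised in isolation. A minimal sketch, with a random logits tensor standing in for the model's last-position output (the vocab size of 65 matches the config below):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 65
idx = torch.zeros(1, 3, dtype=torch.long)   # a running (B, T) context of token ids
logits = torch.randn(1, vocab_size)         # stand-in for logits[:, -1, :]
probs = F.softmax(logits, dim=-1)           # (1, vocab_size), sums to 1
next_id = torch.multinomial(probs, 1)       # (1, 1) sampled token id
idx = torch.cat((idx, next_id), dim=1)      # context grows by one token per step
print(idx.shape)  # torch.Size([1, 4])
```

Repeating this loop max_new times is all generation is: sample, append, re-run the model on the (cropped) longer context.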

~10M Parameters

n_embd=384, n_head=6, n_layer=6, block_size=256, vocab=65: ~10.8M params. GPT-2 small: 124M. Same architecture, different scale.
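A back-of-envelope tally reproduces that figure. The layout below assumes Karpathy-style blocks (bias-free per-head Q/K/V projections, a 4× MLP, two LayerNorms per block), which is an assumption about the Block class, not shown here:

```python
n_embd, n_head, n_layer, block_size, vocab = 384, 6, 6, 256, 65

token_emb = vocab * n_embd                                 # 24,960
pos_emb   = block_size * n_embd                            # 98,304
qkv   = 3 * n_embd * n_embd                                # 442,368 (bias-free heads)
proj  = n_embd * n_embd + n_embd                           # 147,840 (output projection)
ffn   = 2 * (n_embd * 4 * n_embd) + 4 * n_embd + n_embd    # 1,181,568 (4x MLP, biases)
lns   = 2 * 2 * n_embd                                     # two LayerNorms per block
block = qkv + proj + ffn + lns                             # 1,773,312 per block
ln_f    = 2 * n_embd
lm_head = n_embd * vocab + vocab

total = token_emb + pos_emb + n_layer * block + ln_f + lm_head
print(total)  # 10788929 ≈ 10.8M
```

Note the blocks dominate: six of them account for ~10.6M of the ~10.8M total, with embeddings and the head contributing the rest.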

IV. The Matrix

Quick · Deep Intuition · 🎯 DO FIRST: Assemble the full GPT. Print the param count. Run a forward-pass test.

Quick · Surface Only · IF TIME: Load the Shakespeare dataset.

Slow · Deep Intuition · 🖐 CAREFULLY: Trace shapes through the full forward pass.

Slow · Surface Only · 🚫 AVOID: Training. Assembly first.
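The shape trace can be done standalone with untrained layers. Dimensions come from the config above; the Transformer blocks are omitted here since they preserve shape, mapping (B, T, n_embd) to (B, T, n_embd):

```python
import torch
import torch.nn as nn

B, T, n_embd, vocab, block_size = 4, 8, 384, 65, 256
idx = torch.randint(vocab, (B, T))                        # (B, T) token ids
tok = nn.Embedding(vocab, n_embd)(idx)                    # (B, T, n_embd)
pos = nn.Embedding(block_size, n_embd)(torch.arange(T))   # (T, n_embd), broadcasts over B
x = tok + pos                                             # (B, T, n_embd)
# ... n_layer Transformer blocks, each keeping (B, T, n_embd) ...
logits = nn.Linear(n_embd, vocab)(nn.LayerNorm(n_embd)(x))  # (B, T, vocab)
print(logits.shape)  # torch.Size([4, 8, 65])
```

The one broadcast worth pausing on: the positional embedding is (T, n_embd) and stretches across the batch dimension when added to the (B, T, n_embd) token embeddings.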

V. Today’s Deliverables

You have built GPT. The actual architecture. Tomorrow you train it on Shakespeare.— Day 19 Closing