Building a firm from scratch means assembling research, trading, risk, compliance. Each is well understood individually. The art is in assembly. Today you assemble GPT.— Day 19 Principle
I. The Full GPT
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters for the ~10M-parameter configuration described below.
vocab_size, n_embd, n_head, n_layer, block_size = 65, 384, 6, 6, 256

class GPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, n_embd)   # token identity -> vector
        self.pos_emb = nn.Embedding(block_size, n_embd)     # position -> vector
        # Block is the Transformer block assembled on earlier days
        self.blocks = nn.Sequential(*[Block(n_embd, n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)                    # final layer norm
        self.lm_head = nn.Linear(n_embd, vocab_size)        # project back to vocab

    def forward(self, idx, targets=None):
        B, T = idx.shape
        x = self.token_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        x = self.blocks(x)                                  # (B, T, n_embd)
        logits = self.lm_head(self.ln_f(x))                 # (B, T, vocab_size)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss

    def generate(self, idx, max_new):
        for _ in range(max_new):
            logits, _ = self(idx[:, -block_size:])          # crop to context window
            probs = F.softmax(logits[:, -1, :], dim=-1)     # next-token distribution
            idx = torch.cat((idx, torch.multinomial(probs, 1)), dim=1)
        return idx
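The shape flow through the forward pass can be traced standalone with toy tensors, no Block needed (a sketch; the dimensions match the configuration above, with a small batch and sequence chosen for illustration):

```python
import torch
import torch.nn as nn

B, T = 4, 8                                              # toy batch and sequence length
vocab_size, n_embd, block_size = 65, 384, 256

idx = torch.randint(0, vocab_size, (B, T))               # (B, T) integer token ids
tok = nn.Embedding(vocab_size, n_embd)(idx)              # (B, T, n_embd)
pos = nn.Embedding(block_size, n_embd)(torch.arange(T))  # (T, n_embd)
x = tok + pos                                            # broadcasts -> (B, T, n_embd)
logits = nn.Linear(n_embd, vocab_size)(x)                # (B, T, vocab_size)
print(idx.shape, x.shape, logits.shape)
```

The key step is the broadcast: the positional embedding `(T, n_embd)` is added to every batch element of the token embedding `(B, T, n_embd)`.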
~10M Parameters
n_embd=384, n_head=6, n_layer=6, block_size=256, vocab=65: ~10.6M params. GPT-2 small: 124M. Same architecture, different scale.
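The count can be sanity-checked analytically (a sketch; it assumes bias-carrying Linear layers, a 4x MLP, and two LayerNorms per block, so the exact total shifts between roughly 10.6M and 10.8M depending on bias conventions):

```python
def count_params(vocab=65, n_embd=384, n_layer=6, block_size=256):
    # Token and position embedding tables
    emb = vocab * n_embd + block_size * n_embd
    # Attention: q, k, v, and output projections (weights + biases)
    attn = 4 * (n_embd * n_embd + n_embd)
    # MLP: expand to 4*n_embd, project back
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Two LayerNorms per block, each with scale and shift
    lns = 2 * 2 * n_embd
    block = attn + mlp + lns
    # Final LayerNorm plus lm_head
    head = 2 * n_embd + n_embd * vocab + vocab
    return emb + n_layer * block + head

print(f"{count_params() / 1e6:.1f}M")
```

In the real model, `sum(p.numel() for p in model.parameters())` gives the exact figure.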
IV. The Matrix
|       | Deep Intuition | Surface Only |
|-------|----------------|--------------|
| Quick | 🎯 DO FIRST: Assemble the full GPT. Print the parameter count. Run a forward-pass test. | ⏭ IF TIME: Load the Shakespeare dataset. |
| Slow  | 🖐 CAREFULLY: Trace shapes through the full forward pass. | 🚫 AVOID: Training. Assembly first. |
V. Today’s Deliverables
- GPT class with all components
- Forward pass verification
- Generate method
- Parameter count
- Shape trace
- Smoke test
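The smoke-test deliverable might look like the following (a sketch; `Block` here is a placeholder residual MLP standing in for the attention block built on earlier days, and the config is shrunk so the test runs in milliseconds):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny config for testing only; the real model uses 65/384/6/6/256.
vocab_size, n_embd, n_head, n_layer, block_size = 65, 32, 4, 2, 16

class Block(nn.Module):
    # Placeholder (LayerNorm + residual MLP); swap in the real attention block.
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd))
    def forward(self, x):
        return x + self.mlp(self.ln(x))

class GPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        x = self.token_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        x = self.blocks(x)
        logits = self.lm_head(self.ln_f(x))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new):
        for _ in range(max_new):
            logits, _ = self(idx[:, -block_size:])
            probs = F.softmax(logits[:, -1, :], dim=-1)
            idx = torch.cat((idx, torch.multinomial(probs, 1)), dim=1)
        return idx

model = GPT()
idx = torch.randint(0, vocab_size, (2, 8))   # batch of 2, context of 8
logits, loss = model(idx, targets=idx)       # logits: (2, 8, 65)
out = model.generate(idx, max_new=5)         # context grows 8 -> 13
print(logits.shape, out.shape)
```

If the shapes come out right and the loss is finite, the assembly is sound; training is tomorrow's job.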
You have built GPT. The actual architecture. Tomorrow you train it on Shakespeare.— Day 19 Closing