Agent Architecture · Tool Design · February 2026

Seeing Like an Agent

A first-principles walkthrough of tool design, action spaces, and progressive disclosure — from the lessons of building Claude Code.

One of the hardest parts of building an agent harness is constructing its action space — the set of tools and capabilities the model can invoke. Too few tools and the agent is crippled. Too many and it's paralyzed by choice.

"You want to give it tools that are shaped to its own abilities. But how do you know what those abilities are? You pay attention, read its outputs, experiment."
— Claude Code Team, Anthropic
01
Foundation

The Tool Power Spectrum

[diagram: the spectrum from low to high power. Paper: manual only, no tools at all, plain-text output. Calculator: structured tools (JSON in, JSON out), powerful but specific. Computer: bash / code execution, unlimited composability, highest ceiling. Skill requirement increases with power.]
fig. 1 — power vs. skill tradeoff in tool design

Paper = No Tools

The model just outputs text. It can reason but can't act on the world. Like solving math with just pen and paper.

Calculator = Defined Tools

Structured tool calls (search, file_read, API). Powerful but each call round-trips through context — the "composition tax."
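To make the "calculator" rung concrete, here is a minimal sketch of a structured tool: a JSON schema the model sees, and a dispatcher that fulfills the call. The tool name and field layout are illustrative assumptions, not Claude's actual API schema.

```python
import json

# Hypothetical structured tool definition (names and fields are illustrative,
# not the real Claude tool-use schema).
file_read_tool = {
    "name": "file_read",
    "description": "Read a file and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def handle_tool_call(call: dict) -> str:
    """Dispatch a structured tool call: JSON in, JSON out."""
    if call["name"] == "file_read":
        with open(call["input"]["path"]) as f:
            return json.dumps({"content": f.read()})
    return json.dumps({"error": f"unknown tool {call['name']}"})
```

The fixed schema is what makes the call reliable for the harness; it is also what forces every result back through the context window.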

Computer = Code Execution

Bash, Python, or programmatic tool calling (PTC). The agent writes code that orchestrates tools directly. Highest ceiling, but the model must know how to code.

02
The Core Problem

The Composition Tax

[diagram: traditional tool calling. Claude issues Tool Call 1 → full result enters context → reasoning → Tool Call 2 → full result enters context → reasoning → Tool Call 3; each call adds latency, tokens, and a reasoning step.]

Every tool call pays a tax

In traditional tool calling, each action round-trips through Claude's context window. The result gets serialized (even thousands of rows when you only need five), triggers a new reasoning step, and adds latency.

With 3 sequential tool calls, you're paying 3× the latency, 3× the context bloat, and 3 full reasoning steps.
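The tax is easy to see in a sketch of the traditional loop. Both `call_model` and `run_tool` here are stand-in stubs, not a real API; the point is the shape of the loop, where every full result lands in context and every call triggers a fresh reasoning pass.

```python
# Sketch of the traditional tool-calling loop. Every tool result is
# serialized back into the model's context, and each round trip triggers
# a fresh reasoning step. `call_model` and `run_tool` are stubs.

def run_tool(name: str, args: dict) -> list[dict]:
    # Stub: pretend every call returns 1,000 rows even when 5 are needed.
    return [{"row": i} for i in range(1000)]

def call_model(context: list) -> dict:
    # Stub for a model reasoning step; returns the next tool call.
    return {"tool": "search", "args": {"q": "..."}}

context: list = [{"role": "user", "content": "find five matching rows"}]
reasoning_steps = 0
for _ in range(3):                       # three sequential tool calls
    action = call_model(context)         # full reasoning pass over context
    reasoning_steps += 1
    result = run_tool(action["tool"], action["args"])
    context.append({"role": "tool", "content": result})  # full result into context

# 3 calls: 3 reasoning passes, and 3,000 serialized rows sitting in context.
```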

"The composition tax grows with the number of actions. This is the fundamental tension in tool design."
TAKEAWAY → This is why "just add more tools" doesn't scale. Each tool adds cognitive and computational overhead.
03
Programmatic Tool Calling

Compose in Code, Not in Context

[diagram: programmatic tool calling (PTC). Claude writes code that runs in a code execution container: await search(q1) → result returns to the code → filter(results) → await search(q2) → return summary. Only the final output reaches context; no round trips.]

Code as the orchestration layer

With PTC, Claude writes code that calls tools as functions inside a container. Intermediate results stay in the code — they never bloat the context window.

The container pauses when a tool is invoked, the call crosses the sandbox boundary, gets fulfilled externally, and the result returns to the running code.
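A minimal sketch of what that agent-written code can look like, assuming tools are exposed to the sandbox as async functions (the `search` stub stands in for a real tool crossing the boundary):

```python
import asyncio

# PTC-style orchestration sketch: tools are called as functions inside the
# container, intermediate results stay in local variables, and only the
# final return value reaches the model's context. `search` is a stub.

async def search(query: str) -> list[dict]:
    # Stub: in a real container this call crosses the sandbox boundary,
    # is fulfilled externally, and the result returns to the running code.
    return [{"title": f"{query} result {i}", "score": i} for i in range(50)]

async def agent_code() -> list[str]:
    results = await search("composition tax")           # 50 raw results stay here
    relevant = [r for r in results if r["score"] > 47]  # filter in code, not context
    more = await search(relevant[0]["title"])           # chain a second call freely
    return [r["title"] for r in relevant + more[:2]]    # only this summary returns

summary = asyncio.run(agent_code())  # the single thing Claude ever sees
```

The 50-result lists never leave the container; the composition happens in code, so adding a fourth or fifth tool call costs no extra context or reasoning passes.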

"Rather than pulling 50 raw search results into context, the code can parse, filter, and cross-reference results programmatically. This keeps what's relevant and discards the rest."
RESULT → +11% accuracy, −24% input tokens. Opus 4.6 with PTC is #1 on LMArena Search Arena.
04
Elicitation Design

Finding the Sweet Spot

[diagram: spectrum from no structure to too rigid. Modified Markdown output: model free but messy, hard to format ✗. AskUserQuestion tool: structured and composable, clear UI surface ✓ (sweet spot). ExitPlanTool parameter: plan already formed, questions come too late ✗. Key insight: even the best tool fails if the model doesn't like calling it.]
fig. 4 — three attempts at elicitation in claude code

The Claude Code team tried three approaches to get Claude to ask better clarifying questions. Modified markdown was too loose (Claude broke format). ExitPlanTool parameter was too rigid (questions came after the plan was already made). The AskUserQuestion tool hit the sweet spot — structured enough for reliable UI, flexible enough that Claude actually liked calling it.
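A sketch of why the structured approach wins: a fixed call shape gives the harness something it can always render. The field names below are assumptions for illustration, not the actual AskUserQuestion schema.

```python
# Illustrative shape of a structured clarifying-question tool call, in the
# spirit of AskUserQuestion. Field names are assumptions, not the real schema.

question_call = {
    "name": "AskUserQuestion",
    "input": {
        "question": "Which database should the migration target?",
        "options": [
            {"label": "PostgreSQL", "description": "current production DB"},
            {"label": "SQLite", "description": "local development only"},
        ],
        "allow_free_text": True,
    },
}

def render(call: dict) -> str:
    """The harness can render this reliably because the structure is fixed."""
    q = call["input"]
    lines = [q["question"]]
    lines += [f"  [{i + 1}] {o['label']}" for i, o in enumerate(q["options"])]
    return "\n".join(lines)
```

Free-form markdown gives the model the same expressive room but no guaranteed structure to render; a plan-tool parameter has the structure but arrives after the decisions are made.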

· · ·
05
Capability Drift

Tools That Helped Become Constraints

[diagram: utility vs. model strength. TodoWrite plus 5-turn reminders peaks in utility for weaker models, then falls into the constraint zone as models strengthen; at the crossover it is replaced by the Task Tool for inter-agent coordination.]

The TodoWrite → Task evolution

Early Claude needed a Todo list + reminders every 5 turns to stay on track. But as models improved, the reminders became a cage — Claude felt it had to stick to the list instead of adapting.

Opus 4.5 got better at subagents, but TodoWrite couldn't handle inter-agent coordination. The Task Tool replaced it with dependencies, shared updates, and deletable tasks.

"As model capabilities increase, the tools that your models once needed might now be constraining them. It's important to constantly revisit previous assumptions."
TAKEAWAY → Schedule regular "tool audits." What helped your weak model may be hurting your strong one.
06
Search Design

From RAG to Self-Built Context

[diagram: evolution over one year. RAG database: context given to Claude; fragile, needs indexing (passive). Grep tool: Claude searches itself and builds its own context (active search). Agent Skills: recursive file discovery, files reference files (progressive disclosure). Nested multi-layer: API docs, DB queries, skill → skill → file (full autonomy).]

Key shift: don't give the agent context; give it tools to find context. As Claude gets smarter, it becomes increasingly good at building its own context.
fig. 6 — context acquisition evolution in claude code
07
Key Pattern

Progressive Disclosure

[diagram: adding capability without adding a tool. The system prompt contains a link to SKILL.md, which references api_docs.md (endpoints), auth.md, and db_schema.md (tables, queries). The context window only loads what's needed, when needed.]

The tree, not the library

Instead of stuffing all knowledge into the system prompt or a RAG index, progressive disclosure lets the agent explore a tree of files. Each file references others. The agent only loads what's relevant.

This is how Claude Code added self-documentation without adding a tool. A "Guide" subagent follows links, searches docs, and returns just the answer.
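A guide-style traversal can be sketched in a few lines: start from a root skill file and load only the files it actually links to. The file names and the markdown-style link syntax are illustrative assumptions, not Claude Code's implementation.

```python
import re
from pathlib import Path

# Progressive-disclosure sketch: follow markdown-style links like
# (api_docs.md) from a root SKILL.md, loading each referenced file once.
# File layout and link syntax are illustrative assumptions.

def load_linked_docs(root: Path, max_files: int = 10) -> dict[str, str]:
    """Walk the reference tree breadth-first; unreferenced files stay on disk."""
    loaded: dict[str, str] = {}
    queue = [root]
    while queue and len(loaded) < max_files:
        doc = queue.pop(0)
        if doc.name in loaded or not doc.exists():
            continue
        text = doc.read_text()
        loaded[doc.name] = text
        for ref in re.findall(r"\(([\w/]+\.md)\)", text):
            queue.append(doc.parent / ref)
    return loaded
```

Only the followed path enters context; a file nobody links to is never loaded, which is exactly the "tree, not library" property.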

"Progressive disclosure is now a common technique we use to add new functionality without adding a tool."
RULE → Claude Code has ~20 tools. The bar to add a new one is high. Before adding tool #21, ask: can progressive disclosure handle this?
08
Decision Framework

When to Add a Tool vs. Not

Does the agent need a new capability?
→ NO: don't touch.
→ YES: does the call need to be caught by the harness?
    → NO: progressive disclosure (skills, subagents, linked docs).
    → YES: does it need guardrails or special UI rendering?
        → YES: dedicated tool (file_edit, AskUser).
        → NO: bash / PTC (code handles it).
fig. 8 — decision tree for adding tools to your agent
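The same decision tree, written out as a function. The three boolean flags paraphrase the figure's questions, and the return values name its four outcomes:

```python
# The decision tree from fig. 8 as a function. Flag names paraphrase the
# figure's questions; return values name its four outcomes.

def tool_decision(needs_capability: bool,
                  harness_must_catch: bool,
                  needs_guardrails_or_ui: bool) -> str:
    if not needs_capability:
        return "don't touch"
    if not harness_must_catch:
        return "progressive disclosure"  # skills, subagents, linked docs
    if needs_guardrails_or_ui:
        return "dedicated tool"          # e.g. file_edit, AskUser
    return "bash / PTC"                  # code handles it
```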
· · ·
09
Cognitive Load

The 20-Tool Ceiling

[chart: agent performance vs. number of tools. Performance peaks at a sweet spot around 20 tools; too few leaves the agent limited, too many causes choice paralysis.]

More tools ≠ more capability

Claude Code operates with ~20 tools. Each additional tool gives the model one more option to reason about — increasing decision complexity and the chance of misuse.

Before adding tool #21, the team asks: can this be handled by progressive disclosure (skills, docs, subagents) instead?

"The bar to add a new tool is high, because this gives the model one more option to think about."
ANALOGY → Think of portfolio diversification: the 21st holding rarely improves a well-constructed portfolio. Same with tools — quality of curation beats raw quantity.
10
Complete Mental Model

The Agent Action Space

[diagram: Claude as the reasoning core, surrounded by its action space: Bash and PTC (code execution, composability), File Edit (guardrailed), Grep (context building), AskUser (elicitation), Task (coordination), Skills and linked docs, Subagents (guide agent); grouped by composability, guardrails, context building, and user interaction.]
fig. 10 — claude code's complete action space architecture

Every tool in Claude Code's ~20-tool set serves one of four roles: composability (bash, PTC for code execution), guardrails (file_edit with staleness checks), context building (grep, skills, subagents), or user interaction (AskUser for elicitation). Progressive disclosure — skills, subagents, linked docs — extends the action space without adding tools.