Tool definitions are a fixed tax
MCP tools aren't loaded lazily. When your agent starts, the full JSON schema for every tool on every connected server is serialized into the model's context as static overhead. That cost is paid on every single turn, whether or not the agent ever calls those tools.
In a neutral benchmark, 756 tools wired directly into an agent consumed 138,417 tokens of static context. At that scale the direct path is effectively unusable: the agent degrades, struggles to reason over the tool set, and burns budget before doing any work.
Front everything with three tools
Universal MCP Bridge sits in front of all your MCP servers and exposes only three meta-tools — list_tools, list_mcps, and route_mcp_call. The agent discovers tools on demand and pulls a full schema only for the one it's about to use.
That same 756-tool surface collapses from 138,417 tokens to ~1,200 — a ~99.1% reduction in static tool context. The figures below are measured benchmark results, not estimates.
From ~99.1% static to 55–75% working
The ~99.1% headline is the reduction in static tool context specifically. In a real session, tool definitions are only one slice of everything in the window — system prompt, history, files, and outputs share the budget too.
Measured across real agent sessions, removing the tool-definition tax yields a 55%–75% reduction in total working context. More room for reasoning, longer sessions before compaction, and lower per-turn cost — without giving up access to any backing tool.
Why does adding MCP servers slow my agent down?
Every tool from every connected MCP server is loaded into the model’s context as a static tool definition before the conversation starts. In a neutral benchmark, 756 tools wired directly into an agent consumed 138,417 tokens of static context — spent before the agent does any work.
How does a gateway reduce MCP context?
Instead of exposing every tool, Universal MCP Bridge presents three meta-tools — list_tools, list_mcps, and route_mcp_call — and serves full tool schemas only on demand. The same 756-tool surface collapses from 138,417 tokens to roughly 1,200, a ~99.1% reduction in static tool context.
How much working context does this actually save?
The ~99.1% figure is the static tool-context reduction. Across real agent sessions, where tool definitions are one part of total context, the practical working-context reduction measures 55%–75%.