Semantic Tool Selection: Building Smarter AI Agents with Context-Aware Routing
Anthropic recently published an insightful blog post on code execution with MCP, highlighting a critical challenge in modern AI systems: as agents connect to more tools, loading all tool definitions upfront becomes increasingly inefficient. Their solution—using code execution to load tools on-demand—demonstrates how established software engineering patterns can dramatically improve agent efficiency.
This resonates deeply with our experience building the vLLM Semantic Router. We've observed the same problem from a different angle: when AI agents have access to hundreds or thousands of tools, how do they know which tools are relevant for a given task?
Our solution: semantic tool selection—using semantic similarity to automatically select the most relevant tools for each user query before the request even reaches the LLM.

The Problem: Tool Overload in AI Agents
Context Window Bloat
Consider an AI agent with access to hundreds of tools across multiple domains. Loading all tool definitions into the context window for every request:
- Consumes significant tokens for tool definitions (e.g., 741 tools require ~120K tokens)
- Increases latency as the model processes a large number of tools
- Raises costs due to increased token usage
- May reduce accuracy as the model faces more complex selection decisions
The Relevance Problem
In many cases, most tools are not relevant for a given query:
- User asks: "What's the weather in San Francisco?"
- Agent receives: Hundreds of tool definitions (weather, finance, database, email, calendar, etc.)
- Reality: Only a small subset of tools are actually relevant
This creates inefficiency in terms of tokens, latency, cost, and model decision-making complexity.
The Research Evidence
Recent academic studies have measured the impact of large tool catalogs on LLM performance:
Accuracy Degradation: Research testing tool selection with growing catalogs found that:
- With ~50 tools (8K tokens): Most models maintain 84-95% accuracy
- With ~200 tools (32K tokens): Accuracy ranges from 41-83% depending on model
- With ~740 tools (120K tokens): Accuracy drops to 0-20% for most models
Different models show varying degrees of degradation, with open-source models showing 79-100% degradation when scaling from small to large tool catalogs.
The "Lost in the Middle" Effect: Research has documented position bias, where tools in the middle of long lists are less likely to be selected correctly. For example, with 741 tools, tools placed 40-60% of the way through the list showed 22-52% selection accuracy, compared to 31-32% at the beginning and end positions for some models.
Non-Linear Degradation: Performance degradation is not gradual. Research shows that accuracy can drop sharply as tool count increases, with the transition from 207 to 417 tools showing particularly steep declines (e.g., from 64% to 20% for one model tested).
Our Solution: Semantic Tool Selection
The vLLM Semantic Router implements semantic tool selection as an intelligent filter that sits between the user and the LLM:
How It Works
Step 1: Tool Database with Embeddings
Each tool in our database has:
- Tool definition (name, parameters, schema)
- Rich description optimized for semantic matching
- Pre-computed embedding vector
- Optional metadata (category, tags)
Step 2: Query Embedding and Similarity Search
When a user query arrives:
- Generate an embedding for the query text
- Calculate cosine similarity with all tool embeddings
- Select top-K tools above a similarity threshold
- Inject only relevant tools into the request
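The four steps above can be sketched in a few lines. This is a minimal illustration using toy 3-dimensional vectors and hypothetical tool names; a real deployment would use embeddings from a model and an indexed vector store rather than a linear scan.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_tools(query_embedding, tools, top_k=3, threshold=0.6):
    """Rank tools by similarity to the query, then keep the top-K above threshold."""
    scored = [(cosine_similarity(query_embedding, t["embedding"]), t) for t in tools]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:top_k] if score >= threshold]

# Toy embeddings for illustration; real vectors come from an embedding model.
tools = [
    {"name": "get_weather",  "embedding": [0.9, 0.1, 0.0]},
    {"name": "send_email",   "embedding": [0.0, 0.9, 0.1]},
    {"name": "query_stocks", "embedding": [0.1, 0.0, 0.9]},
]
query = [0.95, 0.05, 0.0]  # pretend embedding of "What's the weather in San Francisco?"
selected = select_tools(query, tools, top_k=2, threshold=0.6)
print([t["name"] for t in selected])  # ['get_weather']
```

Note that both the threshold and top-K filter: `query_stocks` makes the top-2 cut but falls below the similarity threshold, so only the genuinely relevant tool survives.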
Step 3: Request Modification
The router modifies the API request to include only selected tools, dramatically reducing token usage.
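As a rough sketch of that modification step, assuming an OpenAI-style chat completion payload with a top-level "tools" array (the function name and payload shape here are illustrative, not the router's actual implementation):

```python
def filter_request_tools(request: dict, selected_tools: list[dict]) -> dict:
    """Return a copy of the request whose "tools" array contains only the
    semantically selected tools; everything else passes through unchanged."""
    filtered = dict(request)
    filtered["tools"] = [{"type": "function", "function": t} for t in selected_tools]
    return filtered

original = {
    "model": "llama-3.1-8b",
    "messages": [
        {"role": "user", "content": "What's the weather in San Francisco?"}
    ],
    # In the unfiltered case this array would hold hundreds of definitions.
    "tools": ["...hundreds of tool definitions..."],
}
selected = [{"name": "get_weather", "parameters": {"type": "object"}}]
small_request = filter_request_tools(original, selected)
print(len(small_request["tools"]))  # 1
```

Because only the "tools" field is rewritten, the change is transparent to both the client and the model: the LLM simply sees a short, relevant tool list instead of the full catalog.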
Experimental Results
We conducted extensive experiments comparing traditional "load all tools" approaches with our semantic tool selection system across three real-world scenarios. Our findings align with recent research showing that LLMs struggle significantly with large tool catalogs and long contexts in tool-calling scenarios.

