How to Track Per-Tool Costs Across MCP Servers
This article explains how organizations can track per-tool costs across MCP servers using centralized AI gateways, unified audit logs, virtual key budgets, and observability tools like OpenTelemetry and Prometheus. It covers the challenges of cost attribution in distributed AI systems, explains model-side versus tool-side costs, and highlights optimization strategies such as semantic caching and Code Mode to reduce LLM and API expenses.
As AI agents integrate with a growing number of Model Context Protocol (MCP) servers, cost management becomes more complex than tracking LLM token usage alone. A single agent execution may involve filesystem operations, web searches, database queries, and paid external APIs, each with distinct pricing models and cost implications. Teams attempting to track per-tool costs across MCP servers often find that conventional observability platforms attribute spend to the model rather than to the underlying tools invoked.
Bifrost, the open-source AI gateway from Maxim AI, addresses this by routing every MCP tool invocation through a centralized audit layer. Each execution captures the tool name, originating server, virtual key, latency, token impact, and execution status, with model-side cost computed in the same trace. The result is a unified dataset for cost attribution across LLM and tool usage.
Why Per-Tool Cost Tracking Across MCP Servers Is Challenging
In a basic agent architecture, requests are sent directly from the application to both the LLM provider and individual MCP servers. This design introduces several challenges for accurate cost attribution:
- Fragmented audit trails: Each MCP server maintains its own logs, typically without including the virtual key, team, or session context responsible for the request.
- Tool name ambiguity: Different MCP servers frequently expose tools with identical names. For example, both a GitHub server and a Jira server may include a search tool, making aggregated logs difficult to interpret.
- Separated cost sources: LLM token usage is tracked in the model provider's dashboard, while paid MCP tool usage appears in separate vendor billing systems. There is no unified schema connecting these costs to a single agent execution.
Without a centralized layer, answering questions such as "which tool caused the $8,000 increase last week?" requires time-consuming analysis across multiple logging systems.
Defining Cost in MCP Tooling
Effective MCP cost tracking must account for two cost contributors for each tool invocation:
- Model-side costs: Every tool definition exposed to the LLM consumes input tokens. In environments running multiple MCP servers with hundreds of tools, the definitions alone can consume thousands of tokens per interaction before any computation begins.
- Tool-side costs: Many MCP servers act as wrappers around paid APIs, including search services, geocoding platforms, SaaS integrations, and image generation systems. Each invocation may generate a separate billable event in the upstream vendor's system.
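To make the model-side overhead concrete, here is a back-of-the-envelope sketch. Every number in it (tool counts per server, an average of 150 tokens per schema, a $3.00 per million input-token price) is a hypothetical placeholder rather than a measured value or a real price quote; substitute your own servers and your provider's rate sheet.

```python
# Back-of-the-envelope estimate of model-side tool-definition overhead.
# All numbers below are hypothetical placeholders, not measured values.

TOOLS_PER_SERVER = {"filesystem": 12, "github": 40, "jira": 35, "search": 8}
AVG_TOKENS_PER_DEFINITION = 150        # assumed average size of one tool schema
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # assumed USD price; use your provider's rate


def definition_overhead_tokens(tools_per_server: dict[str, int], avg_tokens: int) -> int:
    """Input tokens spent on tool schemas in every request, before any real work."""
    return sum(tools_per_server.values()) * avg_tokens


def definition_overhead_cost(tokens_per_request: int, requests: int, price_per_million: float) -> float:
    """USD spent purely on re-sending tool definitions across `requests` LLM calls."""
    return tokens_per_request * requests * price_per_million / 1_000_000


tokens = definition_overhead_tokens(TOOLS_PER_SERVER, AVG_TOKENS_PER_DEFINITION)
cost = definition_overhead_cost(tokens, 100_000, PRICE_PER_MILLION_INPUT_TOKENS)
print(f"{tokens} tokens per request, ${cost:,.2f} per 100k requests")
```

With these placeholder figures, 95 tools at roughly 150 tokens each add 14,250 input tokens to every request, which comes to about $4,275 per 100,000 requests before a single tool is actually called.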
A robust cost tracking solution must capture both dimensions, associate them with a shared request identifier, and attribute them to the same virtual key, team, or customer. The MCP gateway is designed around this principle, providing a unified logging schema that incorporates model costs and tool execution metadata per request.
How the Gateway Enables Per-Tool Cost Tracking
The platform functions as both an MCP client for upstream tool servers and an MCP server for clients such as Claude Code, Claude Desktop, and Cursor. By routing all requests and tool invocations through the gateway, the system ensures that every action is recorded within a single structured logging stream. Explore the MCP gateway for full architectural details.
Unified audit trail for each tool execution
When an LLM response triggers tool calls that are executed via the gateway, the platform records:
- The originating MCP client or server
- The fully qualified tool name
- Input arguments passed to the tool
- The associated virtual key
- Latency, token usage, and execution status
- Any guardrail policies applied before or after execution
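The recorded fields can be pictured as a single structured log entry per tool execution. The sketch below is a minimal illustration, assuming one JSON line per execution; the class name `ToolAuditEntry` and all field names are hypothetical and are not the gateway's actual log schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolAuditEntry:
    """One tool execution, mirroring the fields listed above.

    Field names are hypothetical; the gateway's actual log schema may differ.
    """
    request_id: str      # shared with the LLM call that emitted the tool call
    mcp_client: str      # originating MCP client/server, e.g. "github"
    tool_name: str       # fully qualified tool name, e.g. "github_search"
    arguments: dict      # input arguments passed to the tool
    virtual_key: str     # unit of attribution for budgets and reporting
    latency_ms: float
    tokens: int          # token impact attributed to this invocation
    status: str          # e.g. "success" or "error"
    guardrails: list[str] = field(default_factory=list)  # policies applied pre/post


def to_json_line(entry: ToolAuditEntry) -> str:
    """Serialize an entry as one JSON line, convenient for log pipelines."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = ToolAuditEntry(
    request_id="req-42",
    mcp_client="github",
    tool_name="github_search",
    arguments={"query": "is:pr is:open"},
    virtual_key="vk-support",
    latency_ms=312.5,
    tokens=840,
    status="success",
)
print(to_json_line(entry))
```

Keeping the request ID and virtual key on every entry is what later allows tool activity to be joined with model spend and budgets.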
Tool names are automatically prefixed with the MCP client name, such as filesystem_list_directory or github_search, ensuring uniqueness across servers. This naming convention enables precise per-tool attribution by mapping each log entry to a specific tool and source.
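A small illustration of why the client prefix matters for aggregation: counting raw tool names merges unrelated servers, while counting prefixed names does not. The `qualify` helper is hypothetical and only mimics the `<client>_<tool>` pattern from the examples above; the gateway applies its own prefixing.

```python
from collections import Counter


def qualify(client_name: str, tool_name: str) -> str:
    """Hypothetical helper: prefix a tool with its MCP client name."""
    return f"{client_name}_{tool_name}"


# (client, tool) pairs as they might appear in raw, per-server logs.
calls = [
    ("github", "search"),
    ("jira", "search"),
    ("github", "search"),
    ("filesystem", "list_directory"),
]

by_bare_name = Counter(tool for _, tool in calls)
by_qualified_name = Counter(qualify(client, tool) for client, tool in calls)

print(by_bare_name)       # "search" merges GitHub and Jira usage
print(by_qualified_name)  # each count maps to exactly one server
```

One design caveat: if client names themselves contain underscores, a joined string can be ambiguous to split later, so analytics tables are easier to query when the client and tool are also kept as separate columns.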
Combined model and tool cost attribution
For each request, the gateway calculates LLM costs using real-time provider pricing, token consumption, request type (such as chat, embeddings, or speech), cache utilization, and batch optimizations. Tool execution metadata is recorded within the same request trace, allowing platform teams to correlate model spend and tool activity through a shared request identifier.
Since tool execution remains controlled by the application, each invocation maintains a clear execution context that can be joined with upstream vendor billing data through the captured tool name and virtual key.
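The join described above can be sketched in a few lines: model-side cost comes from the gateway's per-request records, and tool-side cost is estimated by pricing each logged invocation with the vendor's per-call rate. All records, tool names, and prices here are hypothetical placeholders, and real vendor billing may use tiers or usage-based pricing rather than a flat per-call fee.

```python
from collections import defaultdict

# Hypothetical exports. Model costs come from the gateway's per-request records;
# per-call prices come from each upstream vendor's billing (placeholder values).
model_calls = [
    {"request_id": "req-1", "virtual_key": "vk-support", "cost_usd": 0.012},
    {"request_id": "req-2", "virtual_key": "vk-research", "cost_usd": 0.030},
]
tool_calls = [
    {"request_id": "req-1", "virtual_key": "vk-support", "tool_name": "websearch_query"},
    {"request_id": "req-1", "virtual_key": "vk-support", "tool_name": "websearch_query"},
    {"request_id": "req-2", "virtual_key": "vk-research", "tool_name": "maps_geocode"},
]
vendor_price_per_call = {"websearch_query": 0.005, "maps_geocode": 0.004}


def attribute(model_calls, tool_calls, prices):
    """Sum model-side and tool-side spend per virtual key."""
    totals = defaultdict(lambda: {"model_usd": 0.0, "tool_usd": 0.0})
    for call in model_calls:
        totals[call["virtual_key"]]["model_usd"] += call["cost_usd"]
    for call in tool_calls:
        # Tools without a listed price (free or internal) add nothing tool-side.
        totals[call["virtual_key"]]["tool_usd"] += prices.get(call["tool_name"], 0.0)
    return dict(totals)


for key, spend in attribute(model_calls, tool_calls, vendor_price_per_call).items():
    print(key, spend)
```

Grouping by `request_id` instead of `virtual_key` in the same loops gives the per-execution view of combined model and tool spend.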
Configuring Cost Attribution with Virtual Keys
Virtual keys serve as the primary mechanism for governance and cost attribution within the platform. Each key is associated with its own budget, rate limits, tool access policies, and reporting scope, ensuring that all activity can be attributed to a specific team, project, or customer.
Assign one virtual key per team, feature, or environment so that every request has an unambiguous owner. The budget and limits framework supports a hierarchical structure:
- Customer: Budget allocation for a business unit or external account
- Team: Subdivision with independent budget controls
- Virtual key: Granular control over usage, rate limits, and tool access
- Provider configuration: Separate tracking for providers such as OpenAI and Anthropic within a single virtual key
When a request is processed, the platform evaluates all applicable budget constraints. If any limit is exceeded, the request is blocked. Costs are deducted across all levels, enabling both granular and aggregated visibility. Budget reset intervals range from minutes to yearly cycles, with calendar-aligned resets occurring at UTC boundaries for daily, weekly, monthly, and yearly periods.
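The hierarchical rule (block if any level is exhausted, deduct from every level) can be sketched as follows. This is an illustrative model, not the gateway's code: it assumes the request's actual cost is only known after the response arrives, so admission checks current spend and deduction happens afterwards. The names `Budget`, `admit`, and `record` are hypothetical, and reset intervals are omitted.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    level: str          # e.g. "customer", "team", "virtual_key"
    limit_usd: float
    spent_usd: float = 0.0

    def exhausted(self) -> bool:
        return self.spent_usd >= self.limit_usd


class BudgetExceeded(Exception):
    pass


def admit(chain: list[Budget]) -> None:
    """Before a request: block it if ANY level in the hierarchy is exhausted."""
    for budget in chain:
        if budget.exhausted():
            raise BudgetExceeded(f"{budget.level} budget exhausted")


def record(chain: list[Budget], cost_usd: float) -> None:
    """After a request: deduct its actual cost at EVERY level."""
    for budget in chain:
        budget.spent_usd += cost_usd


customer = Budget("customer", 100.0)
team = Budget("team", 50.0)
vkey = Budget("virtual_key", 10.0)
chain = [customer, team, vkey]

admit(chain)
record(chain, 8.0)   # all three levels now show 8.0 spent
admit(chain)
record(chain, 3.0)   # the virtual key is now at 11.0 of 10.0
try:
    admit(chain)
except BudgetExceeded as err:
    print(err)
```

Because deduction follows the response, a single request can push a level slightly past its limit; the next request is the one that gets blocked.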
Tool access can also be restricted per virtual key. Using MCP tool filtering, teams can expose only relevant tools to specific environments. For example, a production-support key limited to filesystem.read_file and github.list_prs cannot incur charges from external APIs that are not explicitly enabled in its allowlist. This is the most direct mechanism for capping tool-side spend at the gateway layer.
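A minimal sketch of per-key tool filtering, assuming an in-memory allowlist keyed by virtual key. The tool identifiers are copied from the example above and treated as opaque strings; the dictionary, the `websearch.query` tool, and both helper functions are hypothetical and do not reflect the gateway's configuration format.

```python
# Hypothetical per-virtual-key allowlists; identifiers copied from the example above.
TOOL_ALLOWLIST: dict[str, set[str]] = {
    "vk-prod-support": {"filesystem.read_file", "github.list_prs"},
}


def visible_tools(virtual_key: str, all_tools: list[dict]) -> list[dict]:
    """Tool definitions to expose to the LLM for this key (unknown keys get none)."""
    allowed = TOOL_ALLOWLIST.get(virtual_key, set())
    return [tool for tool in all_tools if tool["name"] in allowed]


def authorize(virtual_key: str, tool_name: str) -> bool:
    """Second check at execution time, in case a model requests a hidden tool."""
    return tool_name in TOOL_ALLOWLIST.get(virtual_key, set())


all_tools = [
    {"name": "filesystem.read_file"},
    {"name": "github.list_prs"},
    {"name": "websearch.query"},  # paid upstream API, not allowlisted
]
print([t["name"] for t in visible_tools("vk-prod-support", all_tools)])
print(authorize("vk-prod-support", "websearch.query"))
```

Filtering before the LLM call also trims tool definitions from the prompt, so the same allowlist lowers model-side cost as well as tool-side spend.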
Observability and Telemetry for Tool-Level Cost Analysis
Comprehensive cost analysis requires more than raw logs. The platform provides multiple telemetry integrations to support production-grade observability:
- Native Prometheus metrics: Counters and histograms segmented by tool, virtual key, and provider
- OpenTelemetry tracing: OTLP export to backends such as Grafana Tempo, Honeycomb, and New Relic
- Datadog and BigQuery integrations: APM traces, metrics, and LLM observability in Datadog, plus direct export to BigQuery, without additional infrastructure
- Log exports: Long-term storage and analysis in data lakes or external systems
By correlating traces through a shared request identifier, teams can quickly answer operational and financial questions, such as identifying the most expensive tool invocation, determining which virtual key contributed to a spike in API usage, or analyzing latency trends across MCP servers.
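Once traces are correlated, both example questions reduce to simple aggregations. The sketch below uses hypothetical in-memory records with an illustrative `week` field; in practice the same logic would typically run as a query in the analytics backend receiving the exported data.

```python
from collections import defaultdict

# Hypothetical tool-cost records, already joined on request_id.
records = [
    {"request_id": "r1", "virtual_key": "vk-a", "tool_name": "github_search", "cost_usd": 0.02, "week": 1},
    {"request_id": "r2", "virtual_key": "vk-b", "tool_name": "websearch_query", "cost_usd": 0.50, "week": 1},
    {"request_id": "r3", "virtual_key": "vk-a", "tool_name": "github_search", "cost_usd": 0.03, "week": 2},
    {"request_id": "r4", "virtual_key": "vk-b", "tool_name": "websearch_query", "cost_usd": 0.40, "week": 2},
    {"request_id": "r5", "virtual_key": "vk-c", "tool_name": "maps_geocode", "cost_usd": 1.20, "week": 2},
]


def most_expensive_invocation(records):
    """The single tool invocation with the highest cost."""
    return max(records, key=lambda r: r["cost_usd"])


def largest_increase(records, previous_week, current_week):
    """Virtual key whose spend grew the most between two weeks."""
    delta = defaultdict(float)
    for r in records:
        if r["week"] == current_week:
            delta[r["virtual_key"]] += r["cost_usd"]
        elif r["week"] == previous_week:
            delta[r["virtual_key"]] -= r["cost_usd"]
    return max(delta.items(), key=lambda item: item[1])


print(most_expensive_invocation(records)["request_id"])
print(largest_increase(records, 1, 2))
```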
For environments with strict compliance requirements, content logging can be disabled while still retaining metadata such as tool name, server, latency, and execution status. This preserves complete cost visibility without exposing sensitive data, which is essential for regulated industries deploying the platform in enterprise environments.
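Metadata-only logging can be modeled as an allowlist applied to each entry before it is persisted. This is a hypothetical sketch: the field names are illustrative, and the gateway exposes content logging as its own setting rather than through code like this. An allowlist, rather than a denylist of sensitive fields, means any new payload field stays out of the logs by default.

```python
# Allowlist of metadata fields kept when content logging is disabled.
# Field names are hypothetical; map them to your gateway's actual schema.
METADATA_FIELDS = ("request_id", "virtual_key", "mcp_client", "tool_name",
                   "latency_ms", "status", "cost_usd")


def metadata_only(entry: dict) -> dict:
    """Drop everything not explicitly allowlisted (arguments, results, prompts)."""
    return {name: entry[name] for name in METADATA_FIELDS if name in entry}


raw = {
    "request_id": "req-7",
    "virtual_key": "vk-claims",
    "mcp_client": "filesystem",
    "tool_name": "filesystem_read_file",
    "arguments": {"path": "/records/patient-123.json"},
    "result": "<file contents>",
    "latency_ms": 18.0,
    "status": "success",
}
print(metadata_only(raw))
```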
Transitioning From Cost Tracking to Cost Reduction
Visibility is essential, but reducing cost is the ultimate objective. The gateway includes features designed to translate cost insights into measurable savings.
Code Mode addresses a significant inefficiency in multi-server environments: excessive token consumption from tool definitions. In traditional MCP setups, hundreds of tool schemas are included in every LLM interaction. Code Mode replaces this with four meta-tools that let the model write Python-style code, executed inside a Starlark sandbox (Starlark is a restricted dialect of Python), to orchestrate multiple tool calls programmatically. Tool definitions are loaded on demand from virtual .pyi stub files rather than injected into every request. According to the project, this reduces token usage by 50% or more and improves execution latency by 40-50% in typical multi-server workflows.
Semantic caching operates at the gateway level to eliminate redundant LLM calls. When incoming requests are semantically similar to previous ones, cached responses are served instead of re-executing the model. This reduction is directly reflected in per-virtual-key cost metrics as fewer billable operations.
Together, these capabilities enable teams to move from cost visibility to actionable optimization using the same dataset that powers attribution.
Get Started With Per-Tool Cost Tracking
Tracking per-tool costs across MCP servers becomes feasible when all tool invocations pass through a centralized gateway that provides structured logging, consistent attribution, and hierarchical budget enforcement. The platform unifies LLM and tool activity within a single audit trail, associates each cost with a virtual key, and exports the data to systems such as Prometheus, OpenTelemetry, BigQuery, or Datadog. The open-source implementation is available on GitHub and can be deployed with a single command.