AI agents now burn more tokens than humans, blowing enterprise budgets

Agentic AI usage just overtook human chat in real production data from OpenRouter, which routes 28 trillion tokens weekly. The problem: one agent task burns more tokens than 100 human chats, and most sales teams budgeted for chat, not agents. If your AI spend forecast still models people typing into boxes, you are about to blow your annual budget early.

The shift happened faster than anyone budgeted for

OpenRouter processes roughly 28 trillion tokens per week, about 1% of all global AI inference. That is more than Salesforce has run through in the company's entire history. Half of that volume comes from the US and half from the rest of the world, which makes OpenRouter's data a solid proxy for what is actually happening in production.

The big shift: agentic token usage just overtook human usage. For two years, teams dropped custom data into chat interfaces and got modest results. In the last few months, agents started working. You ask an agent to do something, and it gets done. The catch is the cost, which nobody priced in.

One agentic task costs more than 100 human chats

A human turn in a chat is short. An agentic turn carries heavy context: tool call definitions, MCP gateway specs, skill front matter, plus reasoning and tool calls looping before the agent returns anything. The token burn for one agentic task can dwarf a hundred human interactions.
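To make the gap concrete, here is a minimal sketch of the token accounting, assuming the agent re-sends its fixed context plus the growing history on every loop step and that no prompt caching applies. Every number and name below (AgentTask, the token counts, the step count) is an illustrative assumption, not a figure from OpenRouter.

```python
from dataclasses import dataclass


@dataclass
class AgentTask:
    """Hypothetical token profile of one agentic task (all values are assumptions)."""
    fixed_context_tokens: int    # tool definitions, MCP specs, skill front matter
    steps: int                   # reasoning / tool-call loop iterations
    output_tokens_per_step: int  # reasoning plus tool-call arguments per step
    result_tokens_per_step: int  # tool result appended to the context per step


def agent_task_tokens(task: AgentTask) -> int:
    """Total tokens processed when the full context is re-sent on every step."""
    total = 0
    history = 0  # tokens accumulated from earlier steps
    for _ in range(task.steps):
        prompt = task.fixed_context_tokens + history
        total += prompt + task.output_tokens_per_step
        history += task.output_tokens_per_step + task.result_tokens_per_step
    return total


def chat_turn_tokens(prompt_tokens: int, reply_tokens: int) -> int:
    """Tokens for one short human chat turn."""
    return prompt_tokens + reply_tokens


# Assumed profile: 8k tokens of fixed context, 10 loop steps.
task = AgentTask(
    fixed_context_tokens=8_000,
    steps=10,
    output_tokens_per_step=300,
    result_tokens_per_step=700,
)
agent_total = agent_task_tokens(task)   # 128000 under these assumptions
chat_total = chat_turn_tokens(200, 400)  # 600
ratio = agent_total / chat_total         # about 213 chat turns per task
```

Under these assumed numbers, one task equals roughly 213 chat turns. The fixed context dominates because it is paid again on every step, so the step count matters as much as the context size.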

Chris Clark, COO of OpenRouter, put it plainly: if your forecast still models AI spend like people typing into a box, your forecast is wrong. Large enterprises are already blowing through annual AI budgets early because they sized for chat and got hit with agents.

What this means for sales teams using AI

Sales orgs running AI SDR tools, conversation intelligence, or deal-room agents need to re-forecast now. The token economics changed. Budget for agentic usage as a multiple of human chat, not an extension of it.
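A rough way to re-forecast is to add agent tasks as their own line item and see how long the annual budget lasts. The sketch below assumes a single blended price per million tokens, which is a simplification since providers usually price input and output tokens differently. The function name, volumes, and prices are all hypothetical.

```python
def months_until_budget_exhausted(
    annual_budget_usd: float,
    monthly_chat_tokens: int,
    monthly_agent_tasks: int,
    tokens_per_agent_task: int,
    usd_per_million_tokens: float,
) -> float:
    """Months of runway at a constant monthly token volume (blended price assumed)."""
    monthly_tokens = monthly_chat_tokens + monthly_agent_tasks * tokens_per_agent_task
    monthly_cost = monthly_tokens / 1_000_000 * usd_per_million_tokens
    return annual_budget_usd / monthly_cost


# Budget sized for chat only: 200M tokens a month at an assumed $5 per million.
chat_only = months_until_budget_exhausted(12_000, 200_000_000, 0, 0, 5.0)

# Same budget after adding 5,000 agent tasks a month at an assumed 128k tokens each.
with_agents = months_until_budget_exhausted(12_000, 200_000_000, 5_000, 128_000, 5.0)
```

With these assumed inputs, the chat-only forecast lasts the full 12 months, while the same budget with agents added runs out in under 3 months.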

Three technical realities matter:

Inference quality varies by provider. Same model, same weights, different performance depending on who serves it. Artificial Analysis benchmarked one open-weight model across providers and got different scores from identical math. The software between raw weights and API response introduces bugs, misconfigurations, and broken tool calls.

Tool calling is load-bearing. Looking at one frontier model on OpenRouter: 55% of requests asked for tools, the model used those tools 83% of the time, and 46% of completions finished because of a tool call. Agents are not chatting. They are calling tools, reading results, calling more tools. A model that reasons well but botches tool calls is useless in an agent.

Success rates vary by provider. Clark ran a live demo: he sent 213 tool calls to an open-weight model on one provider and got errors. He switched providers, keeping the same model and the same code, and the errors disappeared. OpenRouter monitors thousands of API endpoints in real time and routes around providers that are failing.
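The simplest form of routing around a failing provider is ordered failover: try each provider in turn and return the first successful response. The sketch below is not OpenRouter's implementation, which relies on real-time monitoring. ProviderError, call_with_failover, and the stub providers are hypothetical names used only for illustration.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a hypothetical provider client when a call fails."""


Provider = tuple[str, Callable[[dict], dict]]


def call_with_failover(providers: Sequence[Provider], request: dict) -> tuple[str, dict]:
    """Send the same request to each provider in order; return the first success."""
    errors = []
    for name, send in providers:
        try:
            return name, send(request)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why this provider was skipped
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for two hosts serving the same model.
def flaky_provider(request: dict) -> dict:
    raise ProviderError("malformed tool call")


def healthy_provider(request: dict) -> dict:
    return {"tool_call": request["tool"], "status": "ok"}


served_by, response = call_with_failover(
    [("provider_a", flaky_provider), ("provider_b", healthy_provider)],
    {"tool": "lookup_account"},
)
```

Here the request falls through to provider_b with no change to the calling code. A production router would also track error rates over time so that a degraded provider is skipped before the next request fails.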

The cost reality

If you are running AI automation for sales, marketing, or support, the hard part is no longer just picking a good model. It is managing inference quality that varies by provider, tool-call success that varies by provider, and failover when things degrade.

The token blowups are real. Forecast agentic spend for what it is: expensive, context-heavy, and fundamentally different from human chat. The chat era is the baseline. Agents are the new load, and they cost more than anyone budgeted for.