Cloudflare announces AI usage budget management feature, allowing users to track who spent how much on which AI.

Cloudflare has announced the addition of 'Spend limits' to its 'Cloudflare AI Gateway' service, which acts as an intermediary between AI applications and various AI providers to manage usage. This new feature allows users to set budget limits for AI usage fees.
Your AI bill is out of control. Cloudflare can fix it now.

AI use within companies is growing rapidly: internal chatbots summarize documents, development teams use AI for code reviews, and sales representatives draft customer emails with it. While widespread AI use streamlines operations, companies sometimes notice a problem only when the invoice arrives at the end of the month: 'We don't know which department spent how much on which model.'
Cloudflare explains that 'there are cases where companies distribute shared API keys to all their engineers, giving them access to high-performance AI models, and as a result of a sudden surge in usage, the invoice amount balloons without being able to explain the breakdown of usage.' They also state that 'without budget limits or visibility into usage, users tend to choose the most powerful model without considering the cost.'

Managing expenses becomes even harder when each AI provider has its own API keys and management screens. To keep costs under control while expanding AI use within a company, a system is needed that handles requests centrally before they reach the AI provider.
Cloudflare AI Gateway is a gateway service that acts as an intermediary between your application and AI providers such as OpenAI, Anthropic, and Google. By routing your application through AI Gateway instead of directly calling AI providers, you can centrally record usage from multiple providers, monitor request counts and costs, and utilize features like caching and rate limiting.
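As a rough illustration of what "routing through AI Gateway" means in practice, the application swaps the provider's base URL for a gateway URL. The sketch below assumes the URL pattern Cloudflare's documentation has used for AI Gateway; the account ID and gateway name are placeholders, and the current documentation should be checked before relying on it.

```python
def gateway_base_url(account_id: str, gateway_id: str, provider: str) -> str:
    """Build a gateway base URL for one provider.

    Assumption: the pattern follows Cloudflare's published AI Gateway docs;
    account_id and gateway_id are placeholders supplied by the caller.
    """
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}"


# The application keeps using its usual client, but points it at this URL
# instead of the provider's own endpoint.
openai_via_gateway = gateway_base_url("ACCOUNT_ID", "my-gateway", "openai")
```

Because every provider is reached through the same gateway, usage from all of them can be recorded in one place.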
Until now, AI Gateway let users view usage for the account as a whole, but it had no mechanism for tracking costs per individual or allocating budgets to teams or users. Cloudflare's newly announced Spend limits feature extends the existing AI Gateway functionality, allowing users to set budget limits on AI usage fees.
Spend limits are set as a budget in US dollars, not in terms of tokens. While rate limits control the number of requests that can be sent in a given time, Spend limits calculate the cost of each request based on token usage and the price of the AI model, and track whether the cumulative spending within a specified period has reached the limit.
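The difference from a rate limit can be sketched in a few lines. The example below is an illustration only, not Cloudflare's implementation: the model names and per-million-token prices are hypothetical, and `SpendTracker` is a name invented here.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per one million tokens; real prices vary by
# provider and model.
PRICES_PER_MILLION = {
    "large-model": {"input": 3.00, "output": 15.00},
    "small-model": {"input": 0.25, "output": 1.25},
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token usage and model price."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


@dataclass
class SpendTracker:
    """Accumulates estimated spend for one period and compares it to a dollar limit."""
    limit_usd: float
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd

    def limit_reached(self) -> bool:
        return self.spent_usd >= self.limit_usd
```

A single large request can move the tracker much further than many small ones, which is exactly what a request-count rate limit cannot express.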

The budget range can be specified by a combination of model, provider, and custom attributes defined by the administrator. For example, you can set a budget only for specific AI models, create separate budget limits for each user ID, or set a monthly budget only for engineering teams. The period can be set daily, weekly, or monthly, and you can choose between a fixed period or a rolling period.
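The distinction between a fixed and a rolling period is easiest to see with timestamps. The sketch below assumes a daily budget, a fixed window that resets at midnight, and a rolling window covering the trailing 24 hours; the article does not state how Cloudflare defines the window boundaries, so these are assumptions.

```python
from datetime import datetime, timedelta


def fixed_daily_spend(events, now):
    """Sum costs since the most recent midnight (fixed window, assumed reset at 00:00)."""
    start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return sum(cost for ts, cost in events if start <= ts <= now)


def rolling_daily_spend(events, now):
    """Sum costs over the trailing 24 hours (rolling window)."""
    start = now - timedelta(days=1)
    return sum(cost for ts, cost in events if start < ts <= now)


# Two requests: one late on 1 January, one early on 2 January.
events = [
    (datetime(2025, 1, 1, 23, 0), 4.0),
    (datetime(2025, 1, 2, 1, 0), 2.0),
]
now = datetime(2025, 1, 2, 2, 0)
```

With these events, the fixed window counts only the request made after midnight, while the rolling window still includes the one from the previous evening.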
The standard action when the limit is reached is to block requests, but if you don't want to interrupt operations, it's also possible to switch to a less expensive alternative model after the budget for the main, more expensive model has been exhausted.
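Putting scope and action together, a rule can be pictured roughly as follows. This is a sketch under assumptions, not Cloudflare's rule schema: `SpendRule`, its field names, and the request dictionary layout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpendRule:
    """Hypothetical budget rule: a scope, a dollar limit, and an action."""
    limit_usd: float
    model: Optional[str] = None          # None means "any model"
    provider: Optional[str] = None       # None means "any provider"
    attributes: dict = field(default_factory=dict)  # custom attributes, e.g. team
    fallback_model: Optional[str] = None  # None means "block when exhausted"

    def matches(self, request: dict) -> bool:
        if self.model is not None and request.get("model") != self.model:
            return False
        if self.provider is not None and request.get("provider") != self.provider:
            return False
        meta = request.get("attributes", {})
        return all(meta.get(k) == v for k, v in self.attributes.items())


def decide(rule: SpendRule, request: dict, spent_usd: float):
    """Return ('allow', model), ('fallback', model) or ('block', None)."""
    if not rule.matches(request) or spent_usd < rule.limit_usd:
        return ("allow", request["model"])
    if rule.fallback_model is not None:
        return ("fallback", rule.fallback_model)
    return ("block", None)
```

A rule scoped to the engineering team's use of an expensive model would leave other teams' requests untouched, and would either block or downgrade matching requests once the limit is hit.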
Spend limits are available as an open beta to AI Gateway users on all plans at the time of writing. Settings can be configured via the Cloudflare dashboard or API, and according to Cloudflare's documentation, up to 20 rules can be defined per gateway. Note that the tracked cost is an estimate based on token counts and model pricing; the actual billed amounts should be checked on each provider's dashboard.
In addition to Spend limits, Cloudflare also announced 'Identity-Based Budgeting and Policies,' which combines Cloudflare Access with existing identity providers, as a closed beta.
When combined with Cloudflare Access, AI Gateway can see information such as the employee who sent the request, their group, and their service account. According to Cloudflare, it's possible to set user-level budgets, such as $500 per month for individual contributors and $2,000 per month for senior engineers, as well as team-level policies, such as allowing machine learning teams to use high-performance models and letting interns use open-source models.
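The examples Cloudflare gives can be written down as two lookup tables, one for user-level budgets and one for team-level model policies. The sketch below is hypothetical: the key names, the model tiers, the deny-by-default behavior, and the assumption that the machine learning team may also use open-source models are not from Cloudflare.

```python
# User-level monthly budgets in USD, using the figures quoted by Cloudflare.
USER_BUDGETS_USD = {
    "individual-contributor": 500.0,
    "senior-engineer": 2000.0,
}

# Team-level model policies. Tier names are hypothetical labels.
TEAM_MODEL_TIERS = {
    "machine-learning": {"high-performance", "open-source"},
    "interns": {"open-source"},
}


def authorize(identity: dict, model_tier: str, spent_this_month_usd: float) -> bool:
    """Allow a request only if the user is under budget and the team may use the tier.

    Assumption: unknown roles or teams are denied by default.
    """
    budget = USER_BUDGETS_USD.get(identity["role"])
    tiers = TEAM_MODEL_TIERS.get(identity["team"], set())
    within_budget = budget is not None and spent_this_month_usd < budget
    return within_budget and model_tier in tiers
```

In this picture, the identity dictionary stands in for whatever user and group information the identity provider supplies with each request.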

For automated CI/CD pipelines and AI agents, named identities can be assigned using Cloudflare Access service tokens. This makes it possible to track how many model tokens each automated process consumes (for example, a code review bot using 5 million tokens per week and a documentation generation bot using 500,000 tokens), so that if one automated process malfunctions, a budget policy can be applied to it without affecting the others.

Cloudflare states that one of its future initiatives is task-based routing, which distributes requests to the model that can deliver the most appropriate results at the lowest cost. By assigning tasks like summarization to small, inexpensive models and directing only large-scale code modifications to high-performance models, they aim to move from simple spending constraints to cost optimization.
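Since task-based routing is described only as a future initiative, the following is purely a sketch of the idea: pick the cheapest model that is considered capable of the task. The model names, prices, and capability sets are hypothetical.

```python
# Hypothetical model catalog: price and the tasks each model is trusted with.
MODELS = [
    {"name": "small-model", "cost_per_million_tokens": 0.5,
     "tasks": {"summarization"}},
    {"name": "large-model", "cost_per_million_tokens": 10.0,
     "tasks": {"summarization", "code-modification"}},
]


def route(task: str) -> str:
    """Return the name of the cheapest model that can handle the task."""
    candidates = [m for m in MODELS if task in m["tasks"]]
    if not candidates:
        raise ValueError(f"no model available for task: {task}")
    return min(candidates, key=lambda m: m["cost_per_million_tokens"])["name"]
```

Here summarization goes to the inexpensive model, while large code modifications fall through to the only model listed as capable of them.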
in AI, Web Service, Posted by log1d_ts







