LLM10:2025 - Unbounded Consumption
Unbounded Consumption is the tenth risk in the OWASP Top 10 for LLM Applications 2025. It occurs when resource usage spirals out of control through excessively large inputs, repeated requests, or recursive operations, causing performance degradation, denial of service, or unexpected costs that can exhaust budgets and destabilize production systems.
Overview
LLM inference is computationally expensive. Every API call consumes tokens, compute time, and money. Without explicit resource controls, a single malicious user, a misconfigured agent loop, or a burst of legitimate traffic can generate runaway costs, degrade service for all users, or trigger a complete denial of service. Unlike traditional web endpoints where resource consumption is relatively predictable, LLM calls have highly variable costs depending on input length, output length, model selection, and the number of completions requested.

Applications that expose user-controlled parameters such as `max_tokens`, `temperature`, or `n` without server-side bounds hand attackers direct control over resource consumption. Recursive AI agent architectures amplify the risk further: an agent that calls itself can generate exponentially growing API costs in seconds. This category also covers input-side abuse, where attackers submit excessively large context windows, documents, or conversation histories designed to maximize per-request compute costs.
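Server-side bounding of those user-controlled parameters can be sketched as follows. This is a minimal illustration, not a definitive implementation: the parameter names mirror common provider APIs, and the ceiling values are assumptions chosen for the example.

```python
# Illustrative server-side ceilings for cost-affecting LLM parameters.
# The specific limits are assumptions; tune them to your budget and provider.
SERVER_LIMITS = {
    "max_tokens": 1024,   # hard ceiling on response length
    "temperature": 2.0,   # provider-documented maximum
    "n": 1,               # never let clients request multiple completions
}

def bound_llm_params(client_params: dict) -> dict:
    """Clamp user-controlled parameters to server-side maxima.

    Whatever the client requests, the value actually sent to the
    LLM API never exceeds the configured ceiling.
    """
    bounded = {}
    for key, ceiling in SERVER_LIMITS.items():
        requested = client_params.get(key, ceiling)
        bounded[key] = min(requested, ceiling)
    return bounded

# A client asking for 100000 tokens and 50 completions gets clamped.
safe_params = bound_llm_params({"max_tokens": 100000, "n": 50})
```

Clamping (rather than rejecting) keeps the endpoint usable for well-behaved clients while denying attackers any leverage over per-request cost.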
What Radar Detects
- **LLM API calls without token or context length limits.** API invocations that do not set `max_tokens`, `max_completion_tokens`, or equivalent parameters, allowing the model to generate arbitrarily long responses that consume excessive compute and budget.
- **Missing rate limiting on endpoints that trigger LLM inference.** Application routes that invoke LLM APIs without per-user, per-session, or per-endpoint rate limits, enabling a single client to submit unlimited requests.
- **No timeout configuration on LLM API calls.** HTTP client or SDK configurations that lack explicit timeout settings for LLM provider calls, allowing requests to hang indefinitely during provider outages or slow responses.
- **Missing cost controls on LLM usage.** Application architectures that invoke LLM APIs without budget limits, spending alerts, or usage tracking mechanisms to detect and halt unexpected cost spikes.
- **Recursive or looping LLM calls without depth or iteration limits.** AI agent implementations that call themselves or other agents in loops without enforcing a maximum recursion depth, iteration count, or total token budget per session.
- **User-controlled parameters that directly affect LLM resource consumption without server-side bounds.** Request parameters such as `max_tokens`, `temperature`, `top_p`, or `n` (number of completions) that are passed directly from user input to the LLM API without server-side validation and capping.
- **Missing retry backoff on LLM API calls.** Retry logic that resubmits failed LLM requests immediately or at fixed intervals without exponential backoff and jitter, amplifying resource consumption during provider outages.
- **Missing input size validation before LLM processing.** Endpoints that accept user-submitted text, documents, or conversation histories without validating their size before passing them to the LLM, allowing attackers to submit excessively large inputs that maximize compute costs.
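The recursive-agent pattern flagged above can be bounded with both an iteration cap and a session token budget. A minimal sketch, where `call_model` is a hypothetical stand-in for a real provider call (it simply charges a fixed token cost per invocation):

```python
# Assumed limits for illustration; real values depend on your workload.
MAX_ITERATIONS = 10
TOKEN_BUDGET = 4000

def call_model(prompt: str) -> tuple[str, int]:
    """Hypothetical LLM call: returns (response, tokens_consumed).

    A stand-in so the sketch is self-contained; pretend every
    call costs 500 tokens.
    """
    return f"response to: {prompt[:20]}", 500

def run_agent(task: str) -> list[str]:
    responses, tokens_used = [], 0
    prompt = task
    for _step in range(MAX_ITERATIONS):   # hard cap on loop depth
        response, cost = call_model(prompt)
        tokens_used += cost
        responses.append(response)
        if tokens_used >= TOKEN_BUDGET:   # session-level token budget
            break
        prompt = response                 # feed output back in (the risky pattern)
    return responses
```

With a 4,000-token budget and 500 tokens per call, the loop halts after 8 calls even though `MAX_ITERATIONS` would allow 10; whichever limit trips first terminates the session.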
Cost = Security
Unlike traditional denial-of-service attacks that degrade availability, unbounded LLM consumption also has a direct financial dimension. A single misconfigured endpoint or runaway agent loop can generate thousands of dollars in API costs within minutes.
Related CWEs
CWE-400 (Uncontrolled Resource Consumption), CWE-770 (Allocation of Resources Without Limits or Throttling), CWE-834 (Excessive Iteration).
See the CWE Reference for details.
Overlap with OWASP Top 10 Web
Unbounded Consumption relates to A06:2025 Insecure Design in the traditional OWASP Top 10. Missing rate limiting, absent resource controls, and unbounded loops are design-level issues that cannot be patched after deployment. They must be addressed in the application architecture. The same principles that prevent traditional denial-of-service attacks apply to LLM-facing endpoints, with the added dimension of direct financial cost per request.
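The rate-limiting principle above can be sketched as a per-user token bucket placed in front of the LLM endpoint. This is a minimal in-process sketch; the capacity and refill rate are illustrative assumptions, and a production deployment would typically back the bucket with a shared store such as Redis.

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter: each allowed request spends
    one token; tokens refill continuously up to a fixed capacity."""

    def __init__(self, capacity: int = 10, refill_per_second: float = 1.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Zero refill makes the behavior deterministic: 3 requests pass, the rest fail.
bucket = TokenBucket(capacity=3, refill_per_second=0.0)
results = [bucket.allow() for _ in range(5)]
```

Requests rejected by the bucket never reach the LLM provider, so excess traffic costs nothing beyond the cheap local check.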
Prevention
- Set explicit `max_tokens` or equivalent limits on every LLM API call to bound the maximum response length and cost per request.
- Implement per-user, per-session, and per-endpoint rate limiting on all routes that trigger LLM inference.
- Configure explicit timeouts on all LLM API calls (both connection timeouts and read timeouts) to prevent requests from hanging indefinitely.
- Set budget alerts and spending caps at the provider level and within the application to detect and halt unexpected cost spikes before they escalate.
- Enforce maximum recursion depth and total token budgets for AI agent loops. Terminate agent sessions that exceed predefined iteration or cost thresholds.
- Validate and bound all user-controlled LLM parameters server-side. Enforce maximum values for `max_tokens`, `n`, and other cost-affecting parameters regardless of client input.
- Implement exponential backoff with jitter for retry logic on failed LLM API calls to prevent resource amplification during provider outages.
- Validate input size before LLM processing. Enforce maximum character counts, document sizes, and conversation history lengths to prevent resource abuse through oversized inputs.
- Monitor LLM usage metrics (tokens consumed, requests per user, cost per session) and implement automated circuit breakers that throttle or halt processing when thresholds are exceeded.
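The backoff recommendation above can be sketched as a small retry helper. The names and defaults are illustrative, and `flaky_call` simulates a provider timeout so the sketch is self-contained.

```python
import random
import time

def retry_with_backoff(call, max_attempts: int = 5,
                       base_delay: float = 0.5, max_delay: float = 30.0):
    """Retry `call` with exponential backoff and full jitter.

    The delay ceiling doubles on each attempt (base * 2**attempt, capped
    at max_delay); the actual sleep is uniform in [0, ceiling] so that
    synchronized clients do not retry in lockstep during an outage.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, ceiling))  # full jitter

# Example: a call that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("provider timeout")
    return "ok"

result = retry_with_backoff(flaky_call, base_delay=0.01)
```

Without the growing, randomized delay, every failed request would be resubmitted immediately, multiplying load (and cost) exactly when the provider is least able to absorb it.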
Next Steps
Previous: LLM09:2025 Misinformation. LLM generates false but credible content.
2021 Reference
OWASP Top 10 2021 mapping for compliance.
OWASP Top 10 Overview
All OWASP standards mapped by Radar.