API Rate Limiting Algorithms [2026 Dev Cheat Sheet]
Bottom Line
Use Token Bucket for burst-friendly APIs, Leaky Bucket for steady downstream protection, and Fixed Window only when simplicity outweighs edge-case fairness.
Key Takeaways
- Token Bucket allows short bursts while keeping a hard long-term refill rate.
- Leaky Bucket smooths traffic best, but it can add queueing latency under load.
- Fixed Window is easiest to ship, but boundary spikes can double effective traffic.
- Distributed rate limits fail on identity, clock, and storage design before math fails.
API rate limiting looks simple until real traffic shows up: bursts, retries, uneven tenants, and horizontally scaled gateways all stress the model. The three algorithms most teams compare are Token Bucket, Leaky Bucket, and Fixed Window. This cheat sheet is built for fast engineering decisions, with copy-ready commands, practical configuration patterns, and the tradeoffs that matter when you are protecting APIs instead of explaining them on a whiteboard.
- Token Bucket is the usual default for public APIs because it tolerates bursts without losing long-term control.
- Leaky Bucket wins when you need smooth output rates for workers, queues, or fragile backends.
- Fixed Window is operationally simple, but window boundaries can create unfair spikes.
- Most production incidents come from bad key design, unsynced counters, or weak observability, not from the algorithm name.
At a Glance
| Dimension | Token Bucket | Leaky Bucket | Fixed Window | Edge |
|---|---|---|---|---|
| Burst tolerance | High up to bucket capacity | Low to medium | High near reset boundaries | Token Bucket |
| Output smoothness | Moderate | High, steady drain | Low | Leaky Bucket |
| Implementation simplicity | Moderate | Moderate | High | Fixed Window |
| Fairness over time | Good | Good | Weak at window edges | Token Bucket / Leaky Bucket |
| Latency behavior | Reject or delay on empty bucket | Often queues first, then rejects | Hard allow or reject | Depends on product goal |
| Best fit | Public APIs, tiered quotas | Traffic shaping, worker protection | Simple per-minute caps | No single winner |
Mental model
- Token Bucket: tokens refill at a steady rate; each request spends one or more tokens.
- Leaky Bucket: incoming requests enter a bucket that drains at a fixed rate.
- Fixed Window: count requests in a time bucket such as 1 minute, then reset the counter.
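The Token Bucket mental model above can be sketched in a few lines. This is a single-process illustration, not a production limiter; the `TokenBucket` class and its parameter names are illustrative, and tokens refill lazily on each check using a monotonic clock.

```python
import time

class TokenBucket:
    """Minimal token bucket sketch: `capacity` is the burst allowance,
    `refill_rate` is tokens per second. Refill happens lazily on each check."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Add tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `clock` parameter exists so tests can inject a fake clock; it also keeps the limiter on monotonic time rather than wall time, which matters once refill schedules meet clock adjustments.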
What teams usually get wrong
- They rate limit by IP when the real fairness unit is tenant, API key, user, or route.
- They use a global limit but forget expensive endpoints need tighter per-route controls.
- They ship counters without exposing Retry-After and remaining-budget headers.
- They pick a simple algorithm, then quietly rebuild it later to handle distributed state.
When to Choose Each Algorithm
Choose Token Bucket when:
- Your API should absorb short spikes such as page loads, retries, or batch fan-out.
- You sell usage tiers and want a clear refill rate plus a clear burst allowance.
- You want better fairness than a simple window without building a more complex sliding model.
Choose Leaky Bucket when:
- Your backend, queue consumer, or third-party dependency needs a steady request drain.
- You prefer smoothing traffic over maximizing user-visible burst capacity.
- You are comfortable with queueing behavior and possible added latency under load.
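For contrast, here is a minimal sketch of the leaky-bucket idea, implemented as a meter rather than an actual request queue (the `LeakyBucket` name and fields are illustrative): the backlog level drains at a fixed rate, and requests that would overflow the bucket are rejected.

```python
import time

class LeakyBucket:
    """Leaky bucket sketch (meter variant): `queue_capacity` caps the backlog,
    `drain_rate` is requests per second drained from it."""

    def __init__(self, queue_capacity, drain_rate, clock=time.monotonic):
        self.queue_capacity = queue_capacity
        self.drain_rate = float(drain_rate)
        self.level = 0.0  # current backlog, measured in requests
        self.clock = clock
        self.last = clock()

    def offer(self):
        """Return True if the request fits in the bucket, False to reject."""
        now = self.clock()
        # Drain the backlog at the fixed rate since the last check.
        self.level = max(0.0, self.level - (now - self.last) * self.drain_rate)
        self.last = now
        if self.level + 1 <= self.queue_capacity:
            self.level += 1
            return True
        return False
```

A real shaping implementation would hold requests and release them on the drain schedule (the queueing latency mentioned above); this meter variant only decides admission.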
Choose Fixed Window when:
- You need the fastest implementation for a low-risk internal or admin API.
- Your traffic pattern is predictable enough that boundary spikes are acceptable.
- You plan to graduate later to Token Bucket or a sliding approach once usage grows.
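The boundary-spike concern is easy to demonstrate with a toy fixed-window counter (names here are illustrative): a client can exhaust one window just before the reset and the next window just after it, landing roughly double the nominal limit in a very short span.

```python
import math

class FixedWindow:
    """Fixed window counter sketch: at most `limit` requests per window,
    with a hard reset at each window boundary."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # window index -> request count

    def allow(self, now):
        idx = math.floor(now / self.window)
        self.counts[idx] = self.counts.get(idx, 0) + 1
        return self.counts[idx] <= self.limit

# Boundary spike: 100 requests just before the reset and 100 just after
# all succeed, so ~200 requests land within 0.2 seconds of each other.
fw = FixedWindow(limit=100, window_seconds=60)
burst_one = sum(fw.allow(59.9) for _ in range(100))
burst_two = sum(fw.allow(60.1) for _ in range(100))
```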
Live Search and Shortcuts
For a cheat-sheet style page, a lightweight client-side filter is enough. Tag each block with `data-cheat-item`, let users press `/` to focus search, and keep the results purely local for instant filtering.
Live search filter
const input = document.querySelector('[data-filter-input]');
const items = [...document.querySelectorAll('[data-cheat-item]')];
function applyFilter(query) {
const q = query.trim().toLowerCase();
items.forEach((item) => {
const haystack = item.textContent.toLowerCase();
const match = q === '' || haystack.includes(q);
item.hidden = !match;
});
const url = new URL(window.location.href);
if (q) url.searchParams.set('q', q);
else url.searchParams.delete('q');
history.replaceState(null, '', url);
}
input.addEventListener('input', (event) => {
applyFilter(event.target.value);
});
window.addEventListener('keydown', (event) => {
// Skip when the search box is already focused, so users can type '/'.
if (event.key === '/' && document.activeElement !== input) {
event.preventDefault();
input.focus();
} else if (event.key === 'Escape') {
input.blur();
}
});
const params = new URL(window.location.href).searchParams;
const initialQuery = params.get('q') || '';
input.value = initialQuery;
applyFilter(initialQuery);
Keyboard shortcuts
| Shortcut | Action | Why it matters |
|---|---|---|
| `/` | Focus search | Fastest way to jump to a command, header, or algorithm note. |
| `Esc` | Clear focus | Lets users return to reading without touching the mouse. |
| `j` / `k` | Next or previous section | Useful for dense reference content with a sticky table of contents. |
| `c` | Copy active code block | Pairs well with automatic copy buttons on every `<pre>`. |
Commands by Purpose
These commands are for verification, not synthetic benchmarks. Use `curl -i` to print response headers inline, `curl -D -` to dump headers to stdout, and `xargs -P` to parallelize burst traffic tests.
Inspect rate-limit headers
curl -i https://api.example.com/v1/search
curl -s -D - -o /dev/null https://api.example.com/v1/search \
| grep -iE 'x-ratelimit|retry-after'
Simulate a burst
seq 1 20 | xargs -P 20 -I{} curl -s -o /dev/null -w '%{http_code}\n' \
https://api.example.com/v1/search | sort | uniq -c
Watch refill behavior over time
for n in 1 2 3 4 5 6 7 8 9 10; do
date -u '+%H:%M:%S'
curl -s -D - -o /dev/null https://api.example.com/v1/search \
| grep -iE 'x-ratelimit-remaining|retry-after'
sleep 1
done
Compare multiple identities
curl -s -H 'Authorization: Bearer tenant-a-key' -D - -o /dev/null \
https://api.example.com/v1/search
curl -s -H 'Authorization: Bearer tenant-b-key' -D - -o /dev/null \
https://api.example.com/v1/search
Configuration Patterns
Keep configs explicit about four things: identity key, refill or drain behavior, burst allowance, and response headers. If those are vague, operators will reverse-engineer policy from production logs.
Token Bucket baseline
{
"algorithm": "token_bucket",
"key": "tenant_id:route",
"capacity": 120,
"refill_tokens": 2,
"refill_interval": "1s",
"cost_by_method": {
"GET": 1,
"POST": 2
},
"headers": true
}
- Map `capacity` to the tolerated burst size.
- Map refill values to the steady-state rate you want to sell or enforce.
- Use request cost when some routes are materially more expensive than others.
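As a sketch of how request cost might be applied at the gateway, mirroring the `cost_by_method` field in the config above (the helper function and `COST_BY_METHOD` table are hypothetical):

```python
# Method-dependent token costs, mirroring "cost_by_method" in the config.
COST_BY_METHOD = {"GET": 1, "POST": 2}

def charge(tokens_remaining, method, default_cost=1):
    """Return (allowed, new_balance) after spending the method's token cost.

    Unknown methods fall back to `default_cost` so a new verb cannot
    bypass the limiter entirely.
    """
    cost = COST_BY_METHOD.get(method, default_cost)
    if tokens_remaining >= cost:
        return True, tokens_remaining - cost
    return False, tokens_remaining
```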
Leaky Bucket baseline
{
"algorithm": "leaky_bucket",
"key": "api_key",
"queue_capacity": 200,
"drain_rate": 25,
"drain_interval": "1s",
"overflow_policy": "reject"
}
- Use queue capacity to cap backlog growth before latency becomes product debt.
- Pick `reject` or `delay` based on whether clients can tolerate waiting.
- Drain rate should reflect backend safety, not just gateway throughput.
Fixed Window baseline
{
"algorithm": "fixed_window",
"key": "user_id",
"limit": 100,
"window": "1m",
"headers": true,
"reset_header": true
}
- Use per-route overrides for login, search, export, and webhook endpoints.
- Keep window sizes human-readable for support teams and customer-facing docs.
- If traffic is spiky, plan a migration path before customers discover edge bursts.
Advanced Usage and Gotchas
Layer limits instead of forcing one global rule
- Apply a global tenant limit, then narrower route limits for expensive endpoints.
- Separate write-heavy limits from read-heavy limits when backend cost differs sharply.
- Use concurrency limits alongside rate limits for long-running operations.
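Layering can be as simple as requiring every limiter in a chain to agree. The sketch below is illustrative (any object with an `allow()` method works, such as a per-key token bucket); note the commit-ordering wrinkle flagged in the docstring.

```python
def check_layers(limiters):
    """Allow a request only if every layer allows it.

    Caveat: because `all()` short-circuits, earlier layers are charged
    even when a later layer rejects. Production systems often need a
    peek-then-commit step to avoid burning the tenant budget on
    requests a route limit would refuse anyway.
    """
    return all(limiter.allow() for limiter in limiters)

class StubLimiter:
    """Test double: allows a fixed number of calls, then rejects."""
    def __init__(self, budget):
        self.budget = budget
    def allow(self):
        if self.budget > 0:
            self.budget -= 1
            return True
        return False
```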
Design identities before counters
- Prefer stable business identities such as tenant ID or API key over raw IP address.
- Store enough metadata to debug collisions, but redact sensitive values in logs and dashboards.
- If you need to sanitize exported traces or support samples, the Data Masking Tool is a practical fit for scrubbing tokens, emails, and customer identifiers.
Distributed systems change the failure mode
- Cross-node counters introduce consistency tradeoffs that do not exist in single-process demos.
- Clock drift matters when resets or refill schedules are time-based.
- Hot keys appear fast on shared infrastructure, especially when one tenant dominates traffic.
Expose policy clearly to clients
- Return common headers such as current limit, remaining budget, reset time, and Retry-After.
- Document whether rejected requests are safe to retry and how backoff should work.
- Keep examples synchronized with production behavior or customers will code around the docs.
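A sketch of emitting those headers follows. The `X-RateLimit-*` names are a widely used convention rather than a standard, and exact header names vary by provider; treat this helper as illustrative.

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Build common rate-limit response headers.

    `reset_epoch` is the Unix time when the budget resets; `retry_after`
    (seconds) is included only for rejected requests.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if retry_after is not None:
        headers["Retry-After"] = str(int(retry_after))
    return headers
```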
Frequently Asked Questions
What is the best rate limiting algorithm for a public API?
Token Bucket is the usual default: it absorbs short bursts up to the bucket capacity while the refill rate keeps a hard long-term cap.
Why is Fixed Window considered unfair at scale?
A client can spend its full limit at the end of one window and again at the start of the next, so boundary spikes can nearly double effective traffic.
When should I use Leaky Bucket instead of Token Bucket?
When a backend, queue consumer, or third-party dependency needs a steady request drain and smoothing matters more than user-visible burst capacity.
Should API rate limits be keyed by IP address or API key?
Key limits by a stable business identity such as tenant_id, user_id, or API key rather than raw IP address. IP-based limiting is still useful as a coarse abuse control, but it is a weak primary key for shared networks, NAT, proxies, and enterprise traffic.