API Rate Limiting Algorithms [2026 Dev Cheat Sheet]

Dillip Chowdary
Tech Entrepreneur & Innovator · April 29, 2026 · 11 min read

Bottom Line

Use Token Bucket for burst-friendly APIs, Leaky Bucket for steady downstream protection, and Fixed Window only when simplicity outweighs edge-case fairness.

Key Takeaways

  • Token Bucket allows short bursts while keeping a hard long-term refill rate.
  • Leaky Bucket smooths traffic best, but it can add queueing latency under load.
  • Fixed Window is easiest to ship, but boundary spikes can double effective traffic.
  • Distributed rate limits fail on identity, clock, and storage design long before the math does.

API rate limiting looks simple until real traffic shows up: bursts, retries, uneven tenants, and horizontally scaled gateways all stress the model. The three algorithms most teams compare are Token Bucket, Leaky Bucket, and Fixed Window. This cheat sheet is built for fast engineering decisions, with copy-ready commands, practical configuration patterns, and the tradeoffs that matter when you are protecting APIs instead of explaining them on a whiteboard.

  • Token Bucket is the usual default for public APIs because it tolerates bursts without losing long-term control.
  • Leaky Bucket wins when you need smooth output rates for workers, queues, or fragile backends.
  • Fixed Window is operationally simple, but window boundaries can create unfair spikes.
  • Most production incidents come from bad key design, unsynced counters, or weak observability, not from the algorithm name.
| Dimension | Token Bucket | Leaky Bucket | Fixed Window | Edge |
| --- | --- | --- | --- | --- |
| Burst tolerance | High, up to bucket capacity | Low to medium | High near reset boundaries | Token Bucket |
| Output smoothness | Moderate | High, steady drain | Low | Leaky Bucket |
| Implementation simplicity | Moderate | Moderate | High | Fixed Window |
| Fairness over time | Good | Good | Weak at window edges | Token Bucket / Leaky Bucket |
| Latency behavior | Reject or delay on empty bucket | Often queues first, then rejects | Hard allow or reject | Depends on product goal |
| Best fit | Public APIs, tiered quotas | Traffic shaping, worker protection | Simple per-minute caps | No single winner |

At a Glance

Token Bucket is the default choice for most API gateways. Use Leaky Bucket when backend stability matters more than burst tolerance, and keep Fixed Window for simpler limits where boundary unfairness is acceptable.

Mental model

  • Token Bucket: tokens refill at a steady rate; each request spends one or more tokens.
  • Leaky Bucket: incoming requests enter a bucket that drains at a fixed rate.
  • Fixed Window: count requests in a time bucket such as 1 minute, then reset the counter.
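The token-bucket model above fits in a few lines of code. This is a minimal sketch, not any particular gateway's implementation; the class name, parameters, and the explicit `now` argument (which makes refill math testable) are illustrative choices.

```javascript
// Minimal token-bucket sketch: `capacity` caps the burst, `refillPerSec`
// sets the long-term rate. Illustrative names, not a real library API.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;          // start full so an initial burst is allowed
    this.lastRefill = Date.now();
  }

  refill(now = Date.now()) {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
  }

  tryRemove(cost = 1, now = Date.now()) {
    this.refill(now);
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;                   // request allowed
    }
    return false;                    // over budget: reject or delay
  }
}

// A bucket of capacity 3 with a 1 token/s refill absorbs a 3-request
// burst immediately, then rejects until time passes and tokens refill.
const bucket = new TokenBucket(3, 1);
console.log([1, 2, 3, 4].map(() => bucket.tryRemove()));
```

Note that capacity and refill rate are independent knobs: the first bounds the spike a client can send at once, the second bounds what they can sustain.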

What teams usually get wrong

  • They rate limit by IP when the real fairness unit is tenant, API key, user, or route.
  • They use a global limit but forget expensive endpoints need tighter per-route controls.
  • They ship counters without exposing Retry-After and remaining-budget headers.
  • They pick a simple algorithm, then quietly rebuild it later to handle distributed state.

When to Choose Each Algorithm

Choose Token Bucket when:

  • Your API should absorb short spikes such as page loads, retries, or batch fan-out.
  • You sell usage tiers and want a clear refill rate plus a clear burst allowance.
  • You want better fairness than a simple window without building a more complex sliding model.

Choose Leaky Bucket when:

  • Your backend, queue consumer, or third-party dependency needs a steady request drain.
  • You prefer smoothing traffic over maximizing user-visible burst capacity.
  • You are comfortable with queueing behavior and possible added latency under load.

Choose Fixed Window when:

  • You need the fastest implementation for a low-risk internal or admin API.
  • Your traffic pattern is predictable enough that boundary spikes are acceptable.
  • You plan to graduate later to Token Bucket or a sliding approach once usage grows.
Watch out: Fixed Window can effectively allow nearly 2x the intended rate across a window boundary, such as 100 requests in the last second of one minute and 100 more in the first second of the next.
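The boundary effect is easy to reproduce with a toy fixed-window counter. This sketch identifies the window by flooring the timestamp; the function name and single-key scope are illustrative simplifications.

```javascript
// Toy fixed-window counter: allows `limit` requests per window, where the
// window is identified by flooring the timestamp. Illustrative sketch only.
function makeFixedWindow(limit, windowMs) {
  let windowId = null;
  let count = 0;
  return (nowMs) => {
    const id = Math.floor(nowMs / windowMs);
    if (id !== windowId) {        // new window: counter resets to zero
      windowId = id;
      count = 0;
    }
    if (count < limit) {
      count += 1;
      return true;
    }
    return false;
  };
}

// 100/minute limit: 100 requests at t=59.9s and 100 more at t=60.1s all
// pass, so ~200 requests land inside a 0.2-second span.
const allow = makeFixedWindow(100, 60000);
let passed = 0;
for (let i = 0; i < 100; i++) if (allow(59900)) passed += 1;
for (let i = 0; i < 100; i++) if (allow(60100)) passed += 1;
console.log(passed); // 200
```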

Live Search and Shortcuts

For a cheat-sheet style page, a lightweight client-side filter is enough. Tag each block with data-cheat-item, let users press / to focus search, and keep the results purely local for instant filtering.

Live search filter

const input = document.querySelector('[data-filter-input]');
const items = [...document.querySelectorAll('[data-cheat-item]')];

function applyFilter(query) {
  const q = query.trim().toLowerCase();

  items.forEach((item) => {
    const haystack = item.textContent.toLowerCase();
    const match = q === '' || haystack.includes(q);
    item.hidden = !match;
  });

  const url = new URL(window.location.href);
  if (q) url.searchParams.set('q', q);
  else url.searchParams.delete('q');
  history.replaceState(null, '', url);
}

input.addEventListener('input', (event) => {
  applyFilter(event.target.value);
});

window.addEventListener('keydown', (event) => {
  // Ignore "/" typed inside the search field itself, so users can
  // still search for paths and regex patterns containing slashes.
  if (event.key === '/' && document.activeElement !== input) {
    event.preventDefault();
    input.focus();
  }
});

const params = new URL(window.location.href).searchParams;
const initialQuery = params.get('q') || '';
input.value = initialQuery;
applyFilter(initialQuery);

Keyboard shortcuts

| Shortcut | Action | Why it matters |
| --- | --- | --- |
| / | Focus search | Fastest way to jump to a command, header, or algorithm note. |
| Esc | Clear focus | Lets users return to reading without touching the mouse. |
| j / k | Next or previous section | Useful for dense reference content with a sticky table of contents. |
| c | Copy active code block | Pairs well with automatic copy buttons on every <pre>. |

Commands by Purpose

These commands are for verification, not synthetic benchmarks. Use curl's -i to print response headers inline, -D - to dump headers to stdout, and xargs -P to run burst requests in parallel.

Inspect rate-limit headers

curl -i https://api.example.com/v1/search

curl -s -D - -o /dev/null https://api.example.com/v1/search \
  | grep -iE 'x-ratelimit|retry-after'

Simulate a burst

seq 1 20 | xargs -P 20 -I{} curl -s -o /dev/null -w '%{http_code}\n' \
  https://api.example.com/v1/search | sort | uniq -c

Watch refill behavior over time

for n in 1 2 3 4 5 6 7 8 9 10; do
  date -u '+%H:%M:%S'
  curl -s -D - -o /dev/null https://api.example.com/v1/search \
    | grep -iE 'x-ratelimit-remaining|retry-after'
  sleep 1
done

Compare multiple identities

curl -s -H 'Authorization: Bearer tenant-a-key' -D - -o /dev/null \
  https://api.example.com/v1/search

curl -s -H 'Authorization: Bearer tenant-b-key' -D - -o /dev/null \
  https://api.example.com/v1/search

Configuration Patterns

Keep configs explicit about four things: identity key, refill or drain behavior, burst allowance, and response headers. If those are vague, operators will reverse-engineer policy from production logs.

Token Bucket baseline

{
  "algorithm": "token_bucket",
  "key": "tenant_id:route",
  "capacity": 120,
  "refill_tokens": 2,
  "refill_interval": "1s",
  "cost_by_method": {
    "GET": 1,
    "POST": 2
  },
  "headers": true
}
  • Map capacity to tolerated burst size.
  • Map refill values to the steady-state rate you want to sell or enforce.
  • Use request cost when some routes are materially more expensive than others.
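Request cost is applied at admission time: the limiter deducts the route's or method's cost instead of a flat 1. A self-contained sketch of that idea, using the cost_by_method table from the config above (the function name and default cost are illustrative assumptions):

```javascript
// Sketch of cost-weighted admission mirroring cost_by_method above.
// `state.tokens` is the bucket's current balance; names are illustrative.
const costByMethod = { GET: 1, POST: 2 };

function admit(state, method) {
  const cost = costByMethod[method] ?? 1;   // assumed default for unlisted methods
  if (state.tokens >= cost) {
    state.tokens -= cost;
    return true;
  }
  return false;
}

// With 3 tokens, one POST (cost 2) plus one GET (cost 1) fits; a second GET does not.
const state = { tokens: 3 };
console.log(admit(state, 'POST'), admit(state, 'GET'), admit(state, 'GET'));
// → true true false
```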

Leaky Bucket baseline

{
  "algorithm": "leaky_bucket",
  "key": "api_key",
  "queue_capacity": 200,
  "drain_rate": 25,
  "drain_interval": "1s",
  "overflow_policy": "reject"
}
  • Use queue capacity to cap backlog growth before latency becomes product debt.
  • Pick reject or delay based on whether clients can tolerate waiting.
  • Drain rate should reflect backend safety, not just gateway throughput.
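The queue-plus-drain behavior in the config above can be sketched as follows. This is a minimal in-memory model assuming overflow_policy "reject" and an external timer driving the drain step; the class and method names are illustrative.

```javascript
// Leaky-bucket sketch: arrivals join a queue capped at `queueCapacity`;
// a periodic drain step releases up to `drainRate` requests per interval.
class LeakyBucket {
  constructor(queueCapacity, drainRate) {
    this.queueCapacity = queueCapacity;
    this.drainRate = drainRate;
    this.queue = [];
  }

  offer(request) {
    // overflow_policy "reject": drop new arrivals once the backlog is full
    if (this.queue.length >= this.queueCapacity) return false;
    this.queue.push(request);
    return true;
  }

  drain() {
    // Called once per drain_interval: steady output regardless of input spikes.
    return this.queue.splice(0, this.drainRate);
  }
}

// Capacity 3, drain 2: a burst of 5 keeps 3, rejects 2, then drains 2 per tick.
const lb = new LeakyBucket(3, 2);
console.log([1, 2, 3, 4, 5].filter((r) => lb.offer(r))); // [1, 2, 3]
console.log(lb.drain()); // [1, 2]
console.log(lb.drain()); // [3]
```

The latency tradeoff is visible here: request 3 is accepted immediately but not released until the second drain tick.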

Fixed Window baseline

{
  "algorithm": "fixed_window",
  "key": "user_id",
  "limit": 100,
  "window": "1m",
  "headers": true,
  "reset_header": true
}
  • Use per-route overrides for login, search, export, and webhook endpoints.
  • Keep window sizes human-readable for support teams and customer-facing docs.
  • If traffic is spiky, plan a migration path before customers discover edge bursts.

Advanced Usage and Gotchas

Layer limits instead of forcing one global rule

  • Apply a global tenant limit, then narrower route limits for expensive endpoints.
  • Separate write-heavy limits from read-heavy limits when backend cost differs sharply.
  • Use concurrency limits alongside rate limits for long-running operations.
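A concurrency cap is a different primitive from a rate limit: it bounds in-flight work rather than arrival rate, which is what protects backends from slow, long-running calls. A minimal sketch, with illustrative names:

```javascript
// Concurrency limiter: at most `max` operations in flight at once.
// Complements (does not replace) a rate limit. Illustrative sketch.
function makeConcurrencyLimiter(max) {
  let inFlight = 0;
  return {
    tryAcquire() {
      if (inFlight >= max) return false;  // at capacity: shed or queue the call
      inFlight += 1;
      return true;
    },
    release() {
      inFlight = Math.max(0, inFlight - 1);
    },
  };
}

// Two slots: the third acquire fails until one caller releases.
const limiter = makeConcurrencyLimiter(2);
console.log(limiter.tryAcquire()); // true
console.log(limiter.tryAcquire()); // true
console.log(limiter.tryAcquire()); // false
limiter.release();
console.log(limiter.tryAcquire()); // true
```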

Design identities before counters

  • Prefer stable business identities such as tenant ID or API key over raw IP address.
  • Store enough metadata to debug collisions, but redact sensitive values in logs and dashboards.
  • If you need to sanitize exported traces or support samples, the Data Masking Tool is a practical fit for scrubbing tokens, emails, and customer identifiers.
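Composite keys like the tenant_id:route pattern from the earlier config are best built in one place, so every limiter layer agrees on the fairness unit. A sketch of that idea; the function name, field names, and fallback order are illustrative assumptions:

```javascript
// Build a stable limiter key from business identity, not connection details.
// Field names and the fallback chain here are illustrative assumptions.
function rateLimitKey({ tenantId, apiKey, ip }, route) {
  // Prefer stable business identity; fall back to IP only as a coarse last resort.
  const identity = tenantId ?? apiKey ?? ip;
  return `${identity}:${route}`;
}

console.log(rateLimitKey({ tenantId: 'acme' }, 'GET /v1/search'));
// → "acme:GET /v1/search"
console.log(rateLimitKey({ ip: '203.0.113.7' }, 'GET /v1/search'));
// → "203.0.113.7:GET /v1/search"
```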

Distributed systems change the failure mode

  • Cross-node counters introduce consistency tradeoffs that do not exist in single-process demos.
  • Clock drift matters when resets or refill schedules are time-based.
  • Hot keys appear fast on shared infrastructure, especially when one tenant dominates traffic.

Expose policy clearly to clients

  • Return common headers such as current limit, remaining budget, reset time, and Retry-After.
  • Document whether rejected requests are safe to retry and how backoff should work.
  • Keep examples synchronized with production behavior or customers will code around the docs.
Pro tip: If your product has both interactive users and batch clients, ship different limits for each class instead of pretending one number is fair to both.
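Turning limiter state into those headers can be centralized in one small function. The X-RateLimit-* names below follow a common convention rather than a finalized standard, and exact names vary by gateway, so treat this as an illustrative shape:

```javascript
// Translate limiter state into response headers clients can act on.
// Header names follow the common X-RateLimit-* convention; adjust to your gateway.
function rateLimitHeaders({ limit, remaining, resetEpochSec }, nowEpochSec) {
  const headers = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, remaining)),
    'X-RateLimit-Reset': String(resetEpochSec),
  };
  if (remaining <= 0) {
    // Tell rejected clients exactly how long to back off before retrying.
    headers['Retry-After'] = String(Math.max(0, resetEpochSec - nowEpochSec));
  }
  return headers;
}

// A rejected request 60 seconds before reset gets Retry-After: "60".
console.log(rateLimitHeaders({ limit: 100, remaining: 0, resetEpochSec: 1760000060 }, 1760000000));
```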

Frequently Asked Questions

What is the best rate limiting algorithm for a public API?
For most public APIs, Token Bucket is the safest default because it supports short bursts while still enforcing a predictable long-term rate. It is easier to explain to customers than queue-based shaping, and it avoids the worst fairness problems of Fixed Window.
Why is Fixed Window considered unfair at scale?
Fixed Window resets counters on hard boundaries, so clients can send a full limit at the end of one window and another full limit at the start of the next. That edge case can create a short spike far above the intended steady rate, which is why boundary-sensitive systems often move away from it.
When should I use Leaky Bucket instead of Token Bucket?
Use Leaky Bucket when downstream services need smooth, predictable throughput more than clients need burst flexibility. It is especially useful when your system would rather queue or steadily drain work than let request spikes hit a fragile dependency.
Should API rate limits be keyed by IP address or API key?
Key by the identity that matches product fairness, usually tenant_id, user_id, or API key rather than raw IP address. IP-based limiting is still useful as a coarse abuse control, but it is a weak primary key for shared networks, NAT, proxies, and enterprise traffic.
