Cache Stampede Prevention

Prevent multiple clients from simultaneously regenerating an expired cache key using locking, probabilistic early refresh, or request coalescing.

A cache stampede (also called "thundering herd") occurs when a popular cache key expires, causing many concurrent requests to simultaneously query the database to regenerate the value. This can overwhelm the database and cause cascading failures.

The Problem

Consider a cache key accessed 10,000 times per second. When it expires:

  1. Thousands of requests arrive within milliseconds
  2. All see a cache miss
  3. All query the database simultaneously
  4. The database becomes overloaded
  5. Response times spike across the entire system

This is particularly dangerous for keys with expensive regeneration costs (complex queries, aggregations, external API calls).
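
For concreteness, here is the vulnerable read path as a minimal Python sketch (redis-py is assumed, and load_from_database is a hypothetical stand-in for the expensive work):

import redis

r = redis.Redis()

def load_from_database(key):
    # Hypothetical stand-in for a complex query or external API call
    return b"expensive-result"

def get_value(key):
    value = r.get(key)
    if value is not None:
        return value                # cache hit
    # Cache miss: every concurrent request reaches this line at once
    value = load_from_database(key)
    r.set(key, value, ex=3600)      # repopulate with a 1-hour TTL
    return value

Nothing here serializes the misses: between the expiry and the first successful SET, every request pays the full regeneration cost.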

Solution 1: Probabilistic Early Expiration (X-Fetch)

This technique decouples logical expiration from physical expiration: each incoming request carries a small probability of refreshing the value before it actually expires.

How It Works

Set the physical TTL longer than needed (e.g., 1 hour). As requests arrive, each one calculates whether it should proactively refresh based on how close the key is to expiration and a random factor.

When the key is fresh, the probability of refresh is near zero. As expiration approaches, the probability increases. Eventually, one "lucky" request triggers a refresh while all others continue serving the slightly stale data.

The Algorithm

A request decides to refresh when:

current_time - delta * beta * log(random()) >= expiry_time

Where:

  - delta is the time needed to regenerate the value
  - beta is a tuning parameter (typically 1.0; values above 1 favor earlier refresh)
  - random() produces a uniform value between 0 and 1

Because log(random()) is negative, the subtraction shifts the effective current time forward by a random amount scaled by delta * beta. While the key is fresh this shifted time almost never reaches expiry_time; as the gap to expiration shrinks, the probability that any single request crosses the threshold and triggers a refresh increases.
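
A compact sketch of this decision in Python, using an in-process dict in place of Redis (BETA and the recompute callable are illustrative assumptions):

import math
import random
import time

BETA = 1.0    # tuning parameter; values above 1 favor earlier refresh
cache = {}    # key -> (value, delta, expiry); stands in for Redis here

def xfetch(key, ttl, recompute):
    entry = cache.get(key)
    if entry is not None:
        value, delta, expiry = entry
        # log(random()) is negative, so this shifts "now" forward by a
        # random amount scaled by the regeneration cost delta
        if time.time() - delta * BETA * math.log(random.random()) < expiry:
            return value            # not selected for early refresh
    # Either a hard miss or the one "lucky" early-refresh request
    start = time.time()
    value = recompute()
    delta = time.time() - start     # measured regeneration time
    cache[key] = (value, delta, start + ttl)
    return value

A call such as xfetch("report", ttl=300, recompute=run_expensive_query) then serves cached data almost always, with a small chance of an early refresh that grows as expiry nears.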

Outcome

Instead of thousands of simultaneous database queries at expiration, a single request refreshes the cache moments before expiration while others continue serving cached data. The traffic spike is smoothed away.

Solution 2: Mutex Locking

A deterministic approach where only one process is allowed to regenerate the cache.

How It Works

When a cache miss occurs:

  1. Attempt to acquire a lock using SET lock:mykey <token> NX PX 5000
  2. If the lock is acquired, query the database and update the cache
  3. Other requests either wait and poll, or return a stale/default value
  4. Release the lock after cache is populated

Redis Commands

SET lock:popular_key abc123 NX PX 5000

The NX ensures only one client acquires the lock. The PX 5000 sets a 5-second timeout to prevent deadlocks if the lock holder crashes.
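
Putting the steps together in Python with redis-py (the recompute callable, TTLs, and polling bounds are illustrative; release_lock is the token-checked delete covered in the next subsection):

import time
import uuid
import redis

r = redis.Redis()

def get_with_lock(key, recompute, ttl=3600, lock_ttl_ms=5000):
    value = r.get(key)
    if value is not None:
        return value
    token = uuid.uuid4().hex
    # Step 1: SET ... NX PX -- at most one client sees a truthy reply
    if r.set("lock:" + key, token, nx=True, px=lock_ttl_ms):
        try:
            # Step 2: only the lock holder queries the database
            value = recompute()
            r.set(key, value, ex=ttl)
            return value
        finally:
            # Step 4: release safely (see below)
            release_lock("lock:" + key, token)
    # Step 3: everyone else waits and polls for the refreshed value
    deadline = time.time() + 5
    while time.time() < deadline:
        time.sleep(0.05)
        value = r.get(key)
        if value is not None:
            return value
    raise TimeoutError("cache was not repopulated before the deadline")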

Handling Waiting Clients

Clients that fail to acquire the lock have a few options:

  - Wait and poll: sleep briefly and re-check the cache key until the lock holder populates it
  - Serve stale data: return the previous value if a stale copy is still retained
  - Return a default: respond with a placeholder or degraded result rather than blocking

Releasing the Lock

The lock must be released safely. Simply calling DEL risks deleting another client's lock if yours expired. Use a Lua script that checks the token matches before deleting:

-- Delete the lock only if the stored token is still ours
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
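
On the client side the script runs atomically via EVAL. Continuing the redis-py sketch above, a plausible release_lock helper (the one referenced earlier) looks like:

# The token-checked delete shown above, as a one-line Lua script
RELEASE_LUA = (
    'if redis.call("get", KEYS[1]) == ARGV[1] then '
    'return redis.call("del", KEYS[1]) end return 0'
)
_release = r.register_script(RELEASE_LUA)

def release_lock(lock_key, token):
    # Returns 1 if our lock was deleted, 0 if it had already expired
    return _release(keys=[lock_key], args=[token])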

Comparison

Approach         Complexity   Database Queries      Staleness
No protection    Low          N (one per request)   None
X-Fetch          Medium       1-2                   Brief window
Mutex lock       Higher       Exactly 1             None

Recommendation

For most applications, start with mutex locking—it's easier to reason about and provides strong protection. Consider X-Fetch for extremely high-traffic keys where even brief lock contention is problematic.

Some systems combine both: use X-Fetch for normal operation, with a mutex fallback for hard cache misses.
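
Reusing the sketches above, the glue might look like this (the split on raw key presence is one possible policy, and it assumes the X-Fetch metadata is kept alongside the cached value; here the in-process dict from the earlier sketch stands in for that):

def get_resilient(key, recompute, ttl=3600):
    if key in cache:
        # Entry present: X-Fetch decides probabilistically whether to
        # refresh early while other requests keep serving the cached value
        return xfetch(key, ttl, recompute)
    # Hard miss (cold start or eviction): serialize regeneration
    return get_with_lock(key, recompute, ttl=ttl)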

