Count unique items with HyperLogLog (12KB fixed), test set membership with Bloom filters, or estimate frequencies with Count-Min Sketch—trading small accuracy loss for massive memory savings.
When datasets grow to billions of items, exact counting and membership testing become prohibitively expensive. Redis provides probabilistic data structures that offer 1000x memory reduction with configurable error rates.
HyperLogLog (HLL) estimates the cardinality (count of unique items) in a set. A standard Set storing 1 billion unique user IDs would require roughly 12GB of RAM. An HLL requires exactly 12KB regardless of how many elements you add, with about 0.81% standard error.
HLL hashes each element and observes the maximum number of leading zeros in the resulting binary string. The probability of seeing N leading zeros is 1/2^N. By tracking the maximum leading zeros seen, HLL estimates how many unique elements have been added.
To reduce variance, HLL uses multiple "registers" (buckets) and combines their estimates using harmonic mean.
Add elements:
PFADD visitors:2024-01-30 "user:123" "user:456" "user:789"
Get the estimated count:
PFCOUNT visitors:2024-01-30
Merge multiple HLLs (e.g., daily into weekly):
PFMERGE visitors:week visitors:2024-01-28 visitors:2024-01-29 visitors:2024-01-30
HLL provides an estimate with ~0.81% standard error. For 1 million actual unique items, the estimate will typically be between 991,900 and 1,008,100.
A Bloom Filter answers "Is this item in the set?" with possible false positives but zero false negatives. If it says "no," the item is definitely not in the set. If it says "yes," the item is probably in the set.
Items are hashed by multiple hash functions, each setting a bit in a shared bit array. To check membership, compute the same hashes—if all corresponding bits are set, the item might be present. If any bit is zero, the item is definitely absent.
Create a filter with 1% false positive rate for 1 million items:
BF.RESERVE usernames 0.01 1000000
Add items:
BF.ADD usernames "john_doe"
Check membership:
BF.EXISTS usernames "john_doe"
Without a Bloom Filter, attackers can flood your system with requests for non-existent keys. Each request misses the cache and hits the database. A Bloom Filter lets you reject these requests immediately—if the filter says the key doesn't exist, don't bother checking the cache or database.
Similar to Bloom Filters but supports deletion. Uses Cuckoo Hashing to store fingerprints of items.
CF.RESERVE sessions 1000000
CF.ADD sessions "session:abc123"
CF.EXISTS sessions "session:abc123"
CF.DEL sessions "session:abc123"
| Feature | Bloom Filter | Cuckoo Filter |
|---|---|---|
| Deletion | Not supported | Supported |
| Space efficiency | Good | Better for low FP rates |
| Insert performance | Faster | Slightly slower |
| Lookup performance | Slightly slower | Faster |
Use Cuckoo when you need to remove items from the filter (e.g., tracking active sessions that expire).
Computing exact percentiles requires sorting all data points—impractical for streaming data. T-Digest maintains a compressed representation that provides accurate percentile estimates, especially at the tails (p99, p99.9).
Create a T-Digest:
TDIGEST.CREATE latencies
Add observations:
TDIGEST.ADD latencies 45.2 89.1 12.5 156.7 23.4
Get percentiles:
TDIGEST.QUANTILE latencies 0.5 0.95 0.99
This returns the estimated 50th, 95th, and 99th percentile values.
Estimates how many times each item has been seen, with possible over-counting but never under-counting.
CMS.INITBYDIM page_views 2000 5
CMS.INCRBY page_views "/home" 1 "/about" 1
CMS.QUERY page_views "/home"
Useful for estimating frequencies in streaming data where exact counts are impractical.
Track the K most frequent items in a stream without storing all items:
TOPK.RESERVE trending 10 50 3 0.9
TOPK.ADD trending "topic:redis" "topic:python" "topic:redis" "topic:go"
TOPK.LIST trending
Returns the top 10 most frequently added items. The algorithm may temporarily misrank items but converges to accuracy over time. Ideal for trending topics, popular products, or frequent errors.
| Need | Structure | Error Type |
|---|---|---|
| Count unique items | HyperLogLog | Cardinality ±0.81% |
| Check set membership | Bloom/Cuckoo Filter | False positives possible |
| Calculate percentiles | T-Digest | Percentile estimates |
| Count item frequency | Count-Min Sketch | Over-counting possible |