Linux Kernel Tuning for Redis

Configure Linux kernel parameters to prevent latency spikes, persistence failures, and connection drops in production Redis deployments—essential settings that override defaults hostile to in-memory databases.

Standard Linux kernel defaults are optimized for general workloads, not in-memory databases. Production Redis instances require specific kernel tuning to achieve consistent sub-millisecond latency.

Transparent Huge Pages (THP)

THP is the most critical setting. When enabled, it causes severe latency spikes during persistence operations.

The Problem:

Redis uses fork() to create child processes for RDB snapshots and AOF rewrites. Linux uses copy-on-write (COW) to share memory between parent and child. With THP enabled, the kernel uses 2MB pages instead of 4KB pages. When the parent modifies a single byte, the kernel must copy the entire 2MB page, a 512x amplification over a 4KB page.
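The amplification follows directly from the two page sizes. A minimal sketch of the arithmetic, assuming 4 KiB base pages and 2 MiB huge pages (the usual x86-64 values) and the worst case where every write lands on a different page; the function names are illustrative only:

```shell
# Worst-case copy-on-write cost. Sizes are in KiB; 4 and 2048 are
# assumed page sizes (check the base size with: getconf PAGESIZE).

# cow_amplification BASE_KIB HUGE_KIB -> factor by which copying grows
cow_amplification() {
    echo $(( $2 / $1 ))
}

# cow_copied_mib PAGES_TOUCHED PAGE_KIB -> MiB copied while the child runs
cow_copied_mib() {
    echo $(( $1 * $2 / 1024 ))
}

cow_amplification 4 2048      # prints 512
cow_copied_mib 10000 4        # prints 39    (4 KiB pages)
cow_copied_mib 10000 2048     # prints 20000 (2 MiB pages)
```

Under these assumptions, 10,000 scattered writes during a snapshot cost about 39 MiB of copying with regular pages and about 20 GiB with huge pages.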

Symptoms:

- Latency spikes of 10-100x during BGSAVE or BGREWRITEAOF
- Memory usage doubling temporarily during persistence
- Client timeouts during background saves

Fix (required):

echo never > /sys/kernel/mm/transparent_hugepage/enabled

Make it permanent in /etc/rc.local or via systemd (the command must run as root, before Redis starts):

[Service]
ExecStartPre=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'

Redis logs a warning at startup if THP is enabled. Never ignore this warning.
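The current mode can also be checked directly rather than waiting for the startup warning. The kernel marks the active mode with square brackets, for example "always madvise [never]". A small sketch that extracts it; the helper names are hypothetical, and the content is passed as an argument so the functions can be tried without root:

```shell
# thp_mode CONTENT -> prints the active mode, i.e. the bracketed word
thp_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# thp_is_safe CONTENT -> exit status 0 only when the active mode is "never"
thp_is_safe() {
    [ "$(thp_mode "$1")" = "never" ]
}

# Usage on a live host:
#   thp_is_safe "$(cat /sys/kernel/mm/transparent_hugepage/enabled)" \
#       || echo "THP is still active"
```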

Memory Overcommit

The fork() system call can fail if the kernel thinks there's insufficient memory.

The Problem:

When Redis forks, the kernel may pessimistically assume the child needs as much memory as the parent. On a 32GB instance with 30GB used, the fork can fail even though COW means minimal actual memory is needed.

Fix:

sysctl vm.overcommit_memory=1

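This tells the kernel to always grant memory allocation requests instead of applying its heuristic check, on the expectation that COW keeps the child's real footprint small. To persist the setting across reboots, add to /etc/sysctl.conf: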

vm.overcommit_memory=1
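To confirm which mode a host is actually running, the value can be read back and interpreted. A sketch, assuming the standard kernel meanings of the three modes (0 heuristic, 1 always allow, 2 strict accounting); the function name and messages are illustrative:

```shell
# overcommit_verdict VALUE -> prints a short verdict for a Redis host
overcommit_verdict() {
    case "$1" in
        1) echo "ok: allocations are always granted" ;;
        0) echo "warn: heuristic mode, fork may fail on a nearly full host" ;;
        2) echo "warn: strict mode, fork may fail on a nearly full host" ;;
        *) echo "unknown value: $1"; return 1 ;;
    esac
}

# Usage on a live host:
#   overcommit_verdict "$(cat /proc/sys/vm/overcommit_memory)"
```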

Swappiness

Redis data in swap is a catastrophic failure mode.

The Problem:

If any Redis memory pages are swapped to disk, access times go from roughly 100 nanoseconds for RAM to microseconds or milliseconds for disk, a slowdown of several orders of magnitude. A single swapped page can cause client timeouts.

Fix:

sysctl vm.swappiness=1

Setting it to 0 is not recommended, as it can cause OOM kills. Setting it to 1 makes swapping extremely unlikely while still allowing the kernel to swap in desperate situations:

vm.swappiness=1
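Whether a running instance already has pages in swap can be read from the VmSwap line of /proc/<pid>/status. A minimal sketch; the function name is hypothetical, and the usage line assumes the process is named redis-server and that pgrep is installed:

```shell
# swapped_kib STATUS_FILE -> KiB of the process currently held in swap,
# taken from the VmSwap line (prints 0 if the line is absent)
swapped_kib() {
    awk '/^VmSwap:/ { kib = $2 } END { print kib + 0 }' "$1"
}

# Usage on a live host:
#   swapped_kib "/proc/$(pgrep -o redis-server)/status"
```

Any result above 0 means some Redis pages are already on disk and is worth investigating.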

TCP Stack Tuning

Under high connection rates, default TCP settings cause connection failures.

Connection Backlog:

The listen queue for incoming connections is capped by net.core.somaxconn, which defaults to 128 on kernels before 5.4 (4096 since). During traffic spikes, a full queue means new connections are dropped before Redis sees them.

sysctl net.core.somaxconn=65536
sysctl net.ipv4.tcp_max_syn_backlog=65536

Also set in redis.conf:

tcp-backlog 65536

The effective backlog is min(tcp-backlog, somaxconn).
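The min() rule explains why raising only one of the two settings has no effect. A small sketch of the calculation; the function name is illustrative, and 511 is the redis.conf default for tcp-backlog:

```shell
# effective_backlog TCP_BACKLOG SOMAXCONN -> the smaller of the two
effective_backlog() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

effective_backlog 65536 128     # prints 128: the kernel limit wins
effective_backlog 511 65536     # prints 511: redis.conf is now the limit
```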

TIME_WAIT Connections:

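With many short-lived connections, ephemeral ports can be exhausted by sockets in TIME_WAIT. Reuse applies only to outgoing connections, so set it on the hosts that open connections (application servers, proxies) as well as on the Redis host: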

sysctl net.ipv4.tcp_tw_reuse=1

TCP Keepalive:

For long-lived connections through load balancers, set in redis.conf:

tcp-keepalive 300

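Redis will send TCP keepalive probes on connections that have been idle for 300 seconds, which prevents idle connection termination and lets Redis detect dead peers. Choose a value below the load balancer's idle timeout.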

File Descriptor Limits

Each Redis connection consumes a file descriptor.

Check Current Limits:

ulimit -n

This reports the limit of the current shell. For a running Redis process, check the "Max open files" row in /proc/<pid>/limits.

Set Appropriately:

In /etc/security/limits.conf:

redis soft nofile 65536
redis hard nofile 65536

Or, if Redis runs as a systemd service (where limits.conf is not applied), in the service file:

[Service]
LimitNOFILE=65536

Redis's maxclients is limited by available file descriptors minus ~32 for internal use.
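A quick way to see how the descriptor limit caps maxclients is to work through the subtraction. This is a sketch only: the reserve of 32 is the approximate figure quoted above, the function name is hypothetical, and 10000 is the redis.conf default for maxclients:

```shell
RESERVED_FDS=32   # approximate reserve for internal use (from the text)

# usable_maxclients NOFILE_LIMIT REQUESTED_MAXCLIENTS
# -> the client count that fits under the descriptor limit
usable_maxclients() {
    avail=$(( $1 - RESERVED_FDS ))
    if [ "$2" -lt "$avail" ]; then echo "$2"; else echo "$avail"; fi
}

usable_maxclients 1024 10000    # prints 992: the limit cuts maxclients
usable_maxclients 65536 10000   # prints 10000: the limit is no longer binding
```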

Summary Configuration

Add to /etc/sysctl.conf:

# Redis kernel tuning
vm.overcommit_memory=1
vm.swappiness=1
net.core.somaxconn=65536
net.ipv4.tcp_max_syn_backlog=65536
net.ipv4.tcp_tw_reuse=1

Disable THP at boot:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Apply without reboot:

sysctl -p
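
After applying, it can be worth confirming that each value actually took effect. Every sysctl key maps to a file under /proc/sys with dots replaced by slashes, so a check can be sketched as follows (helper names are illustrative, not part of any tool):

```shell
# sysctl_path KEY -> matching file under /proc/sys
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# check_setting KEY EXPECTED ACTUAL -> reports ok or MISMATCH;
# exit status is non-zero on a mismatch
check_setting() {
    if [ "$2" = "$3" ]; then
        echo "ok       $1=$3"
    else
        echo "MISMATCH $1=$3 (expected $2)"
        return 1
    fi
}

# Usage on a live host:
#   check_setting vm.swappiness 1 "$(cat "$(sysctl_path vm.swappiness)")"
```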

Verification

Check Redis startup logs for warnings:

grep -i warning /var/log/redis/redis-server.log

Common warnings to address:

- "THP is enabled" - disable transparent huge pages
- "overcommit_memory is set to 0" - set to 1
- "somaxconn is lower than" - increase TCP backlog

Memory Allocator

Redis uses jemalloc by default, which is well-suited for its allocation patterns. Key behaviors:

High-Water Mark: After deleting large amounts of data, jemalloc retains memory to satisfy future allocations. The process RSS can therefore stay near peak usage rather than tracking the current dataset size.

Fragmentation: Monitor mem_fragmentation_ratio in INFO MEMORY:

- Below 1.0: Redis is using swap (critical)
- 1.0-1.5: Normal
- Above 1.5: Significant fragmentation
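
The ratio is used_memory_rss divided by used_memory, both reported by INFO MEMORY. A sketch that computes it and applies the thresholds above; the function names and labels are illustrative, and awk handles the floating-point comparison:

```shell
# frag_ratio RSS_BYTES USED_BYTES -> ratio with two decimals
frag_ratio() {
    awk -v rss="$1" -v used="$2" 'BEGIN { printf "%.2f\n", rss / used }'
}

# frag_status RATIO -> swap | normal | fragmented
frag_status() {
    awk -v r="$1" 'BEGIN {
        if (r < 1.0)       print "swap"
        else if (r <= 1.5) print "normal"
        else               print "fragmented"
    }'
}

frag_ratio 1610612736 1073741824   # prints 1.50
frag_status 0.85                   # prints swap
frag_status 1.80                   # prints fragmented
```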

If fragmentation is high:

MEMORY PURGE

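This asks jemalloc to return unused dirty pages to the OS; it releases retained memory but does not compact fragmented allocations. In Redis 4.0+, active defragmentation can be enabled in redis.conf when Redis is built with jemalloc: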

activedefrag yes

Source

Shopify RedisDays presentation, Redis documentation, and production post-mortems from Netflix, Uber, and AWS ElastiCache teams.

