CRE-2025-0072

Redis Out-Of-Memory → Persistence Crash → Replica/ACL Write Failures
Severity: Critical
Impact: 10/10
Mitigation: 7/10

Description

Detects a cascade of critical Redis failure modes in a single session:

- Redis refuses writes when `maxmemory` is exceeded (OOM).
- RDB snapshot (BGSAVE) fails (MISCONF) due to a simulated full disk.
- Replica refuses writes (READONLY).
- ACL denies a write (NOPERM).
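Each of the four failure modes reaches clients as an error reply whose first token names the condition. A minimal sketch of classifying replies by that prefix is shown here. The function name `classify_redis_error`, the label strings, and the handling of `-` and `(error)` markers are illustrative assumptions, not part of the CRE rule, and the message text after the prefix varies by Redis version.

```python
from typing import Optional

# Error-reply prefixes named in the description, mapped to hypothetical labels.
FAILURE_MODES = {
    "OOM": "write refused: maxmemory exceeded",
    "MISCONF": "write refused: last RDB snapshot (BGSAVE) failed",
    "READONLY": "write refused: instance is a read-only replica",
    "NOPERM": "write refused: ACL denies the command or key",
}


def classify_redis_error(message: str) -> Optional[str]:
    """Return a label for a Redis error reply, or None if unrecognized."""
    text = message.strip()
    # redis-cli prints errors as "(error) ..."; raw RESP errors start with "-".
    if text.startswith("(error)"):
        text = text[len("(error)"):].strip()
    text = text.lstrip("-")
    parts = text.split(maxsplit=1)
    if not parts:
        return None
    # Only the first token is inspected; the rest of the message is ignored.
    return FAILURE_MODES.get(parts[0])
```

Matching on the prefix alone keeps the check independent of the exact wording of the message, which is why the rest of the reply is ignored.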

Mitigation

IMMEDIATE:

- Check Redis memory usage: `INFO memory`
- Inspect `maxmemory` / `maxmemory-policy`: `CONFIG GET maxmemory maxmemory-policy` (passing several parameters in one call requires Redis 7.0 or later; on older versions, query each separately).
- Free up memory or increase `maxmemory`.
- Clear disk space so BGSAVE can succeed (remove the dummy “/data/filler” file).
- Restart Redis if it was killed.

RECOVERY ACTIONS (15-60 minutes):

- Restore from the last valid RDB/AOF snapshot.
- Change the eviction policy (e.g. `volatile-lru`).
- Monitor memory, disk, and persistence errors (e.g. via RedisExporter → Prometheus alerts).
- Scale out / shard large keys to keep a single Redis instance from hitting 100 MB.

PREVENTION STRATEGIES:

- Avoid `noeviction` unless absolutely needed; use a TTL/eviction policy.
- Ensure the persistence disk has enough headroom.
- Configure `stop-writes-on-bgsave-error` carefully.
- Use ACLs and replica roles deliberately, but monitor for “READONLY” or “NOPERM” events.
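The memory checks listed under IMMEDIATE can be turned into a small headroom calculation. The sketch below parses text in the `key:value` format returned by `INFO memory` and reports how close `used_memory` is to `maxmemory`. The function names, the 0.9 alert threshold, and the sample payload are assumptions for illustration; fetching the text from a live server is deliberately left out.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse the key:value lines of a Redis INFO reply into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def memory_usage_ratio(info: Dict[str, str]) -> Optional[float]:
    """Return used_memory / maxmemory, or None when maxmemory is 0 (no limit)."""
    used = int(info.get("used_memory", "0"))
    limit = int(info.get("maxmemory", "0"))
    if limit == 0:
        return None
    return used / limit


# Hypothetical sample, trimmed to the fields used here.
SAMPLE = (
    "# Memory\r\n"
    "used_memory:99614720\r\n"
    "maxmemory:104857600\r\n"
    "maxmemory_policy:noeviction\r\n"
)

info = parse_info(SAMPLE)
ratio = memory_usage_ratio(info)
ALERT_THRESHOLD = 0.9  # assumed value; tune per deployment
if ratio is not None and ratio >= ALERT_THRESHOLD:
    print(f"memory at {ratio:.0%} of maxmemory, policy={info['maxmemory_policy']}")
```

A `maxmemory` of 0 means no limit is configured, so the ratio is undefined and the function returns `None` rather than dividing by zero. With a client library that already returns `INFO` as a parsed mapping, only the ratio step would be needed.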

References