CRE-2025-0072

Redis Out-Of-Memory → Persistence Crash → Replica/ACL Write Failures
Severity: Critical
Impact: 10/10
Mitigation: 7/10

Description

Detects a cascade of critical Redis failure modes in a single session:

  • Redis refuses writes when maxmemory is exceeded (OOM).
  • An RDB snapshot (BGSAVE) fails because of a simulated full disk, and writes are then rejected (MISCONF).
  • Replica refuses writes (READONLY).
  • ACL denies a write (NOPERM).
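
The correlation described above can be sketched as follows. This is a minimal illustration that assumes each input line is an error reply as printed by `redis-cli` (the `(error) CODE message` form). The names `classify`, `detect_cascade` and `SAMPLE_SESSION` are hypothetical. The sample messages are illustrative only, since exact wording varies between Redis versions. This is not the rule's actual matching logic.

```python
import re
from typing import Iterable, Optional

# Error codes that prefix the four Redis error replies this rule correlates.
CASCADE_CODES = {"OOM", "MISCONF", "READONLY", "NOPERM"}

# Match one of the codes as a whole uppercase word, e.g. in
# "(error) OOM command not allowed ..." as printed by redis-cli.
_CODE_RE = re.compile(r"\b(OOM|MISCONF|READONLY|NOPERM)\b")


def classify(line: str) -> Optional[str]:
    """Return the error code found in a reply or log line, or None."""
    match = _CODE_RE.search(line)
    return match.group(1) if match else None


def detect_cascade(lines: Iterable[str]) -> bool:
    """True when all four failure modes appear in the same session."""
    seen = {code for code in map(classify, lines) if code is not None}
    return CASCADE_CODES <= seen


# Hypothetical session; exact message wording differs between Redis versions.
SAMPLE_SESSION = [
    "(error) OOM command not allowed when used memory > 'maxmemory'.",
    "(error) MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk.",
    "(error) READONLY You can't write against a read only replica.",
    "(error) NOPERM User reader has no permissions to run the 'set' command",
]

print(detect_cascade(SAMPLE_SESSION))  # True
```

Matching on the leading code word rather than the full message keeps the sketch tolerant of wording changes across Redis releases.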

Cause

ROOT CAUSES:

  • Redis is configured with `maxmemory 100mb` and `maxmemory-policy noeviction`.
  • A Lua EVAL pushes memory usage over that cap → OOM refusal.
  • A manual BGSAVE is then forced while the disk is full; the save fails and subsequent writes are rejected with MISCONF.
  • The instance is switched to replica mode → READONLY on write.
  • A read-only ACL user attempts a SET → NOPERM.
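
To make the chain of root causes concrete, here is a simplified model that maps each condition to the error a write such as SET would hit. `InstanceState` and `write_errors` are hypothetical names. The model assumes `replica-read-only yes` (the Redis default) and ignores every eviction policy except `noeviction`. A real command receives a single error reply; this sketch instead reports every condition that applies.

```python
from dataclasses import dataclass
from typing import Set

MB = 1024 * 1024  # redis.conf treats "mb" as 1024*1024 bytes


@dataclass
class InstanceState:
    """Hypothetical snapshot of the conditions listed under ROOT CAUSES."""
    used_memory: int                  # bytes currently used
    maxmemory: int                    # bytes; 0 means "no limit"
    maxmemory_policy: str
    last_bgsave_ok: bool
    stop_writes_on_bgsave_error: bool
    is_replica: bool                  # assumes replica-read-only yes (the default)
    user_may_write: bool              # whether the ACL grants write commands


def write_errors(s: InstanceState) -> Set[str]:
    """Error codes a write would run into under this simplified model."""
    errors = set()
    if not s.user_may_write:
        errors.add("NOPERM")
    # Simplified: only noeviction is modelled; other policies would try to evict.
    if s.maxmemory and s.used_memory > s.maxmemory and s.maxmemory_policy == "noeviction":
        errors.add("OOM")
    if s.stop_writes_on_bgsave_error and not s.last_bgsave_ok:
        errors.add("MISCONF")
    if s.is_replica:
        errors.add("READONLY")
    return errors


# All four root causes combined: over the 100 MB cap with noeviction,
# a failed BGSAVE, replica mode, and a read-only ACL user.
scenario = InstanceState(
    used_memory=101 * MB,
    maxmemory=100 * MB,
    maxmemory_policy="noeviction",
    last_bgsave_ok=False,
    stop_writes_on_bgsave_error=True,
    is_replica=True,
    user_may_write=False,
)
print(sorted(write_errors(scenario)))  # ['MISCONF', 'NOPERM', 'OOM', 'READONLY']
```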

Mitigation

IMMEDIATE:

  • Check Redis memory usage: `INFO memory`
  • Inspect `maxmemory` / `maxmemory-policy`: `CONFIG GET maxmemory maxmemory-policy` (passing several parameters requires Redis 7.0+; on older versions use `CONFIG GET maxmemory*`).
  • Free up memory or increase `maxmemory`.
  • Clear disk space so BGSAVE can succeed (e.g. remove the dummy `/data/filler` file that simulates the full disk).
  • Restart Redis if it was killed.
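
The first two checks can be combined into a quick pressure reading from `INFO memory`, whose output consists of `key:value` lines under `# Section` headers and includes `used_memory`, `maxmemory` and `maxmemory_policy`. The sketch below parses that text; `parse_info`, `memory_status` and the sample values are hypothetical. Redis compares a slightly adjusted figure (it excludes some replication and AOF buffers), so treat the ratio as approximate. In practice, a client such as redis-py already returns `info("memory")` as a parsed dictionary.

```python
from typing import Dict, Tuple


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_status(info_memory: str) -> Tuple[float, str]:
    """Return (used_memory / maxmemory, maxmemory_policy); ratio is 0.0 with no limit."""
    fields = parse_info(info_memory)
    used = int(fields["used_memory"])
    limit = int(fields["maxmemory"])
    ratio = used / limit if limit else 0.0
    return ratio, fields["maxmemory_policy"]


# Trimmed, illustrative INFO memory output (values are made up).
SAMPLE = "\r\n".join([
    "# Memory",
    "used_memory:106954752",
    "used_memory_human:102.00M",
    "maxmemory:104857600",
    "maxmemory_human:100.00M",
    "maxmemory_policy:noeviction",
])

ratio, policy = memory_status(SAMPLE)
if ratio > 1.0 and policy == "noeviction":
    print(f"at {ratio:.0%} of maxmemory with noeviction: writes are refused with OOM")
```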

RECOVERY ACTIONS (15-60 minutes):

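  • Restore from the last valid RDB/AOF snapshot.
  • Change the eviction policy (e.g. `allkeys-lru`; `volatile-lru` only evicts keys that have a TTL, so without TTLs writes still fail with OOM).
  • Monitor memory, disk, and persistence errors (e.g. via redis_exporter metrics and Prometheus alerts).
  • Scale out or shard the keyspace, and split large keys, so no single Redis instance reaches its 100 MB cap.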
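
Why the choice of eviction policy matters can be shown with a deliberately simplified model of what happens to a write once memory exceeds `maxmemory`. `write_outcome` is a hypothetical helper. It does not reproduce Redis's eviction loop and ignores whether evicting actually frees enough memory. It only captures which keys each policy family is allowed to consider.

```python
ALLKEYS_POLICIES = {"allkeys-lru", "allkeys-lfu", "allkeys-random"}
VOLATILE_POLICIES = {"volatile-lru", "volatile-lfu", "volatile-random", "volatile-ttl"}


def write_outcome(used: int, maxmemory: int, policy: str, keys_with_ttl: int) -> str:
    """Simplified outcome of a write checked against maxmemory.

    Returns "ok" (under the cap), "evict" (Redis can try to free memory first)
    or "OOM" (nothing is eligible for eviction, so the write is refused).
    """
    if maxmemory == 0 or used <= maxmemory:
        return "ok"
    if policy in ALLKEYS_POLICIES:
        return "evict"
    if policy in VOLATILE_POLICIES:
        # volatile-* policies only consider keys that have a TTL set.
        return "evict" if keys_with_ttl > 0 else "OOM"
    return "OOM"  # noeviction: writes are refused while over the cap


# Units are arbitrary here; only the comparison matters.
for policy in ("noeviction", "volatile-lru", "allkeys-lru"):
    print(policy, write_outcome(used=101, maxmemory=100, policy=policy, keys_with_ttl=0))
```

With no TTLs set, switching from `noeviction` to `volatile-lru` changes nothing in this model, which is why `allkeys-*` policies or consistent TTLs matter.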

PREVENTION STRATEGIES:

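  • Avoid `noeviction` unless you truly need it; set TTLs and an eviction policy instead.
  • Ensure the persistence disk has enough headroom to write a new RDB file alongside the existing one.
  • Set `stop-writes-on-bgsave-error` deliberately: with `yes` (the default), a failed BGSAVE makes Redis reject writes with MISCONF until a save succeeds; `no` keeps accepting writes at the risk of data that is never persisted.
  • Use ACLs and replica roles deliberately, and monitor for `READONLY` and `NOPERM` errors (`ACL LOG` lists recent permission denials).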
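
Some of these prevention strategies can be turned into a simple configuration audit. The sketch assumes the settings have already been fetched as name/value strings (for example from `CONFIG GET`) and that free disk space and `used_memory` are known. `parse_size` follows the size units documented in `redis.conf` (k/m/g are powers of 1000, kb/mb/gb powers of 1024). `audit` and its disk-headroom threshold are hypothetical heuristics, not Redis rules.

```python
import re
from typing import Dict, List

# redis.conf size units: k/m/g are powers of 1000, kb/mb/gb powers of 1024.
_UNITS = {"": 1, "k": 10**3, "kb": 2**10, "m": 10**6, "mb": 2**20, "g": 10**9, "gb": 2**30}


def parse_size(value: str) -> int:
    """Convert a redis.conf size such as "100mb" (or a plain byte count) to bytes."""
    match = re.fullmatch(r"(\d+)([a-z]*)", value.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def audit(config: Dict[str, str], free_disk_bytes: int, used_memory: int) -> List[str]:
    """Flag risky settings from the prevention strategies (hypothetical heuristics)."""
    warnings = []
    maxmemory = parse_size(config.get("maxmemory", "0"))
    policy = config.get("maxmemory-policy", "noeviction")  # Redis default
    if maxmemory and policy == "noeviction":
        warnings.append("noeviction with a maxmemory cap: writes fail with OOM at the cap")
    if policy.startswith("volatile-"):
        warnings.append(f"{policy} only evicts keys with a TTL; make sure keys have one")
    # Hypothetical headroom rule: require free disk >= in-memory dataset size,
    # a conservative stand-in for the size of the next RDB file.
    if config.get("stop-writes-on-bgsave-error", "yes") == "yes" and free_disk_bytes < used_memory:
        warnings.append("low disk headroom: a failed BGSAVE will block writes with MISCONF")
    return warnings


cfg = {"maxmemory": "100mb", "maxmemory-policy": "noeviction", "stop-writes-on-bgsave-error": "yes"}
for warning in audit(cfg, free_disk_bytes=10 * 2**20, used_memory=100 * 2**20):
    print("WARN:", warning)
```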
