CRE-2025-0107

Redpanda Node Missing State Files on Startup
Severity: Medium
Impact: 3/10
Mitigation: 6/10

Description

Detects when a Redpanda node starts up but cannot find key state files, such as the key-value store snapshot or configuration cache. This is normal behavior for a brand-new node starting for the first time but can indicate a problem (like a cleared or misconfigured volume) if it occurs on an existing node that is expected to have state.
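The distinction above can be sketched as a small triage function. This is only an illustration: the function name, its two inputs, and the returned labels are hypothetical, and whether a node is "expected to have state" is assumed to come from the operator's own inventory or deployment records, not from Redpanda itself.

```python
def triage_missing_state(state_files_missing: bool,
                         node_expected_to_have_state: bool) -> str:
    """Classify a startup event for a Redpanda node.

    Both inputs are assumptions supplied by the operator:
    - state_files_missing: the startup logs reported missing state files
    - node_expected_to_have_state: the node has run before and should
      still have its data volume
    """
    if not state_files_missing:
        # State files were found, so this rule does not apply.
        return "ok"
    if not node_expected_to_have_state:
        # Brand-new node starting for the first time: normal behavior.
        return "expected_first_boot"
    # Existing node lost its state: possibly a cleared or misconfigured volume.
    return "investigate"
```

For example, `triage_missing_state(True, True)` flags an existing node for investigation, while `triage_missing_state(True, False)` treats the same log message as a normal first boot.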

Mitigation

- **On first boot:** No action is required. This is expected.
- **On an unexpected restart of an existing node:**
  1. Immediately investigate why the data volume was cleared. Check for manual errors, incorrect automation scripts, or issues with the underlying storage infrastructure.
  2. Monitor the cluster health and resource utilization (CPU, network, disk I/O) on the other nodes, as they will be under high load to replicate data to the new empty node.
  3. Consider throttling the recovery process (`raft_learner_recovery_rate` setting) if the replication load is impacting production traffic.

PREVENTIVE MEASURES:
- Use persistent, reliable storage for Redpanda's data directory.
- Implement strict access controls and change management procedures for the Redpanda data volumes to prevent accidental deletion.
- Use `rpk` to decommission a node properly before removing its data volumes.
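As a rough sketch of the throttling and decommissioning steps, the helper below only builds `rpk` command lines and does not run them. The function names are hypothetical. The invocations `rpk cluster config set` and `rpk redpanda admin brokers decommission` are assumed to be available in the installed `rpk` version, and the rate is assumed to be expressed in bytes per second. Verify both against the documentation for your Redpanda release before use.

```python
import shlex
from typing import List


def throttle_recovery_command(bytes_per_second: int) -> List[str]:
    """Build the rpk command that sets raft_learner_recovery_rate.

    Assumption: the property takes a positive rate in bytes per second.
    """
    if bytes_per_second <= 0:
        raise ValueError("recovery rate must be positive")
    return ["rpk", "cluster", "config", "set",
            "raft_learner_recovery_rate", str(bytes_per_second)]


def decommission_command(broker_id: int) -> List[str]:
    """Build the rpk command that decommissions a broker by ID."""
    if broker_id < 0:
        raise ValueError("broker ID must be non-negative")
    return ["rpk", "redpanda", "admin", "brokers",
            "decommission", str(broker_id)]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    # 50 MiB/s and broker ID 3 are illustrative values only.
    print(shlex.join(throttle_recovery_command(50 * 1024 * 1024)))
    print(shlex.join(decommission_command(3)))
```

Printing the commands first keeps a human in the loop, which fits the change management practice recommended above.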

References