CRE-2025-0080
Redpanda High Severity Issues
Severity: High
Mitigation: 9/10
Description
Detects when Redpanda hits any of the following during startup or early runtime:
1. It fails to create its `crash_reports` directory (error 13, `EACCES`: permission denied).
2. Heartbeat or node-status RPC failures indicate that a broker is down.
3. A Raft group fails.
4. A data center fails.
Cause
1. The runtime user lacks write permission on `/var/lib/redpanda/data`.
2. A broker is unreachable or out of sync.
3. A network partition within a Raft group leaves it without a quorum.
4. Brokers or VMs hosted in a data center are lost, or connectivity to them is lost.
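Causes 2 to 4 share the same arithmetic. Each Redpanda partition is replicated by its own Raft group, and the group stays available only while a majority of its replicas can reach each other. The following is a minimal sketch of that majority rule. The function names and rack labels are hypothetical helpers for illustration and are not part of Redpanda or `rpk`.

```bash
#!/usr/bin/env bash
# Sketch of Raft majority arithmetic. Function names and rack labels
# are hypothetical illustrations, not Redpanda or rpk commands.

# quorum_size N: prints the majority needed by a Raft group with
# N voting replicas, floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: prints how many replicas can be lost or
# partitioned away while a majority is still reachable.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# survives_rack_loss RACK...: takes one rack/AZ label per replica and
# returns 0 only if losing any single rack still leaves a quorum.
survives_rack_loss() {
  local n=$# quorum rack r count
  [ "$n" -gt 0 ] || return 1          # no replicas at all
  quorum=$(quorum_size "$n")
  for rack in "$@"; do
    # Count the replicas that would disappear with this rack.
    count=0
    for r in "$@"; do
      if [ "$r" = "$rack" ]; then
        count=$((count + 1))
      fi
    done
    if [ $((n - count)) -lt "$quorum" ]; then
      return 1                        # majority lost with this rack
    fi
  done
  return 0
}
```

For example, `survives_rack_loss az1 az2 az3` succeeds because losing any one zone leaves two of three replicas, which is still a majority. `survives_rack_loss az1 az1 az2` fails because losing `az1` leaves only one. This is why multiple brokers alone are not enough: replicas must also be spread across racks or availability zones.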
Mitigation
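Immediate Actions:

```bash
# Give the Redpanda service user ownership of its data directory
chown -R redpanda:redpanda /var/lib/redpanda/data
# Directories become 750 and files 640 (capital X grants execute only
# on directories and on files that already have an execute bit)
chmod -R u=rwX,g=rX,o= /var/lib/redpanda/data
# If the volume is full, free up disk space or expand the data volume
systemctl restart redpanda
```

Long-term Fixes:
- Add an initContainer or boot script that validates permissions before Redpanda starts.
- Run a multi-broker deployment spread across multiple racks or network failure domains.
- Use a multi-AZ or otherwise replicated deployment.
- Monitor cluster health and broker reachability.
- Keep offline backups.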
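The boot-script fix can be sketched as a pre-start check that performs the same kind of operation that fails at startup: creating a directory inside the data directory. It also flags a nearly full data volume. The function names, the probe directory name and the 90% threshold are assumptions for illustration. They are not Redpanda defaults.

```bash
#!/usr/bin/env bash
# Sketch of a pre-start check. Function names, the probe name and the
# default 90% threshold are hypothetical, not Redpanda settings.

# check_data_dir DIR: returns 0 if DIR exists and the current user can
# create (and remove) a subdirectory in it, as Redpanda must do for
# crash_reports.
check_data_dir() {
  local dir=$1 probe
  if [ ! -d "$dir" ]; then
    echo "preflight: $dir does not exist" >&2
    return 1
  fi
  probe="$dir/.preflight.$$"
  if ! mkdir "$probe" 2>/dev/null; then
    echo "preflight: cannot create a directory in $dir" \
         "(check ownership, permissions and free space)" >&2
    return 1
  fi
  rmdir "$probe"
}

# data_volume_usage DIR: prints the Capacity column of POSIX `df -P`
# (used percentage, % sign stripped) for the filesystem holding DIR.
data_volume_usage() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# preflight [DIR] [MAX_PCT]: runs both checks and returns non-zero if
# either fails. DIR defaults to the Redpanda data path, MAX_PCT to 90.
preflight() {
  local dir=${1:-/var/lib/redpanda/data} max=${2:-90} used
  check_data_dir "$dir" || return 1
  used=$(data_volume_usage "$dir")
  if [ "$used" -ge "$max" ]; then
    echo "preflight: $dir is ${used}% full (limit ${max}%)" >&2
    return 1
  fi
  echo "preflight: $dir OK (${used}% used)"
}
```

Source these functions and call `preflight /var/lib/redpanda/data` from the boot script or initContainer. Run the check as the Redpanda service user rather than as root, because root bypasses permission bits and the probe would pass even when `redpanda` cannot write. Under systemd, `ExecStartPre=` commands run as the unit's `User=` unless they are prefixed with `+`.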
References
- https://docs.redpanda.com/current/deploy/deployment-option/self-hosted/manual/high-availability/
- https://docs.redpanda.com/current/manage/cluster-maintenance/disk-utilization/
- https://docs.redpanda.com/current/manage/rack-awareness/
- https://docs.redpanda.com/current/manage/recovery-mode/
- https://docs.redpanda.com/current/manage/cluster-maintenance/nodewise-partition-recovery/
- https://docs.redpanda.com/current/manage/raft-group-reconfiguration/
- https://docs.redpanda.com/current/manage/cluster-maintenance/node-property-configuration/
- https://docs.redpanda.com/current/manage/monitoring/
- https://vectorized.io