
PREQUEL-2025-0082

HashiCorp Vault Raft Cluster Communication Failure
Severity: High
Impact: 9/10
Mitigation: 7/10


Description

HashiCorp Vault nodes in a Raft cluster cannot communicate with each other for an extended period. This disrupts Raft consensus, which Vault's integrated storage relies on for high availability and data consistency. If enough nodes lose contact, the cluster loses quorum (a majority of its voting nodes) and can no longer elect a leader or commit writes. This may prevent operations such as unsealing, authentication, and secret retrieval.
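Whether the cluster keeps quorum comes down to simple majority arithmetic over the voting members (non-voters do not count). A minimal Python sketch of that arithmetic; the function names are illustrative and not part of Vault:

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority of the voting members."""
    if voters < 1:
        raise ValueError("a Raft cluster needs at least one voter")
    return voters // 2 + 1


def failure_tolerance(voters: int) -> int:
    """How many voters can be unreachable while a majority can still form."""
    return voters - quorum_size(voters)


def has_quorum(voters: int, reachable: int) -> bool:
    """True if the reachable voters can still elect a leader and commit writes."""
    return reachable >= quorum_size(voters)


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(f"{n} voters: quorum={quorum_size(n)}, "
              f"tolerates {failure_tolerance(n)} unreachable voter(s)")
    # A 5-voter cluster split 3/2 by a network partition:
    # only the side with 3 reachable voters keeps quorum.
    print(has_quorum(5, 3), has_quorum(5, 2))
```

A 4-voter cluster needs 3 voters for quorum and so still tolerates only one failure, which is why odd cluster sizes such as 3 or 5 are the usual recommendation.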

Mitigation

- Verify network connectivity between all Vault nodes with tools such as `ping`, `telnet`, or `nc` (netcat).
- Check that each node can resolve the hostnames of its peers.
- Review firewall rules to ensure the cluster port (8201 by default) is open between all nodes.
- Inspect the Vault server logs for specific communication errors.
- Check Raft cluster membership and leader status with `vault operator raft list-peers`.
- Run network diagnostics to identify packet loss or high latency between nodes.
- If DNS resolution is the problem, add entries to `/etc/hosts` as a temporary workaround.
- For persistent failures, remove the problematic node with `vault operator raft remove-peer`, but only after confirming that the remaining nodes can still maintain quorum.
- Set up monitoring that detects cluster communication problems early.
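The DNS and port checks in the list above can be scripted. The following is a minimal Python sketch, assuming the default cluster port 8201 and peer hostnames passed on the command line; the helper names and the script name `check_peers.py` are hypothetical. A successful TCP connection only shows that the port accepts connections and is not blocked by a firewall. It does not prove that Raft traffic, which Vault protects with TLS on this port, actually succeeds, so still confirm membership with `vault operator raft list-peers`.

```python
import socket
import sys
from dataclasses import dataclass
from typing import List, Optional

# Assumption: peers listen on Vault's default cluster port; change this if
# your cluster_addr uses a different port.
CLUSTER_PORT = 8201


@dataclass
class PeerCheck:
    host: str
    port: int
    resolved: Optional[str]  # first resolved IP, or None if the DNS lookup failed
    reachable: bool          # True if a TCP connection to the port succeeded
    error: Optional[str] = None


def check_peer(host: str, port: int = CLUSTER_PORT, timeout: float = 3.0) -> PeerCheck:
    """Resolve a peer hostname, then attempt a plain TCP connect to its cluster port."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return PeerCheck(host, port, None, False, f"DNS: {exc}")
    ip = infos[0][4][0]  # sockaddr is (ip, port) or (ip, port, flowinfo, scope_id)
    try:
        # Only proves the port accepts connections; no TLS handshake is attempted.
        with socket.create_connection((ip, port), timeout=timeout):
            pass
    except OSError as exc:
        return PeerCheck(host, port, ip, False, f"TCP: {exc}")
    return PeerCheck(host, port, ip, True)


def check_cluster(peers: List[str], port: int = CLUSTER_PORT) -> List[PeerCheck]:
    """Run check_peer against every peer hostname."""
    return [check_peer(p, port) for p in peers]


if __name__ == "__main__":
    peers = sys.argv[1:]
    if not peers:
        print("usage: python check_peers.py HOST [HOST ...]")
    for r in check_cluster(peers):
        status = "OK" if r.reachable else f"FAIL ({r.error})"
        print(f"{r.host}:{r.port} -> {r.resolved or 'unresolved'} {status}")
```

Run it from each node in turn, for example `python check_peers.py vault-0 vault-1 vault-2` (hypothetical hostnames), because a firewall rule may block traffic in only one direction.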

References