CRE-2024-0014
RabbitMQ busy distribution port performance issue
High
Impact: 8/10
Mitigation: 5/10
Description
The Erlang VM has reported a **`busy_dist_port`** condition, meaning the send buffer of a distribution port (used for inter-node traffic inside a
RabbitMQ cluster) is full. When this happens the VM suspends any process that tries to send through that port, stalling inter-node replication, management
calls, and any other RabbitMQ process that must use it. Throughput drops and latency rises until the buffer drains or the node is restarted.
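To see which nodes have reported the condition, one option is to count log lines that mention `busy_dist_port` per node. The sketch below is a minimal illustration: the function name, the node names, and the sample log lines are hypothetical, and it matches on the substring only because the exact log layout differs between RabbitMQ versions.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_busy_dist_port(logs: Mapping[str, Iterable[str]]) -> Counter:
    """Count log lines mentioning busy_dist_port, keyed by node name."""
    counts: Counter = Counter()
    for node, lines in logs.items():
        # Substring match only: the surrounding log format is not assumed.
        counts[node] = sum(1 for line in lines if "busy_dist_port" in line)
    return counts


# Hypothetical log excerpts, for illustration only.
sample_logs = {
    "rabbit@node-a": [
        "[warning] rabbit_sysmon_handler busy_dist_port <0.1.0>",
        "[info] connection accepted",
        "[warning] rabbit_sysmon_handler busy_dist_port <0.2.0>",
    ],
    "rabbit@node-b": ["[info] connection accepted"],
}

counts = count_busy_dist_port(sample_logs)
# Nodes with at least one warning are the ones to investigate first.
affected = sorted(node for node, n in counts.items() if n > 0)
print(affected)
```

In practice the input would come from each node's log file rather than an in-memory dictionary; a rising count on one node suggests that its outbound distribution link is the bottleneck.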
Mitigation
- **Diagnose** – run `rabbitmq-diagnostics busy_dist_port` (3.13+) or inspect the node logs for `busy_dist_port` warnings to identify affected nodes.
- **Raise the buffer limit** – set `RABBITMQ_DISTRIBUTION_BUFFER_SIZE=512000` (the value is in kilobytes, so ≈ 512 MB) or pass `+zdbbl 512000` to the Erlang VM; restart the node. Ensure the pod / host memory limit is increased accordingly (≥ 512 MB × node count).
- **Reduce message size / batch weight** – split or compress payloads larger than ~100 MB.
- **Minimise cross-node chatter** – co-locate heavily interacting queues or prefer quorum/stream queues that require fewer replicas.
- **Check network health** – keep RTT < 1 ms and eliminate packet loss between nodes (same AZ / rack where possible).
- **Upgrade** – run RabbitMQ ≥ 3.13 and Erlang ≥ 26.2, which improve distribution buffer handling.
- **Alert & monitor** – track `busy_dist_port`, distribution send queue length, and memory alarms to intervene before throughput collapses.
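The memory sizing rule in the buffer-limit step (limit × node count) can be worked through as a small calculation. This is a rough sketch under stated assumptions: the helper name and the three-node cluster are hypothetical, and the buffer limit is treated as decimal kilobytes.

```python
def dist_buffer_headroom_mb(buffer_limit_kb: int, node_count: int) -> float:
    """Extra memory (MB) to budget, using the 'limit x node count' rule."""
    if buffer_limit_kb <= 0 or node_count <= 0:
        raise ValueError("buffer_limit_kb and node_count must be positive")
    # The buffer limit is expressed in kilobytes; convert to decimal MB.
    return buffer_limit_kb * node_count / 1000


# Hypothetical example: the 512000 kB limit on a three-node cluster.
headroom = dist_buffer_headroom_mb(512000, 3)
print(f"{headroom:.0f} MB")  # prints "1536 MB"
```

The result is an upper bound on the extra memory the distribution buffers could occupy, not a measured figure; the pod or host limit should leave at least this much room on top of normal broker usage.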