Skip to main content

Tag: Replication

Replication failures, lag, or divergence in stateful systems.

IDTitleDescriptionCategoryTechnologyTags
CRE-2025-0020
High
Impact: 10/10
Mitigation: 6/10
Self-hosted PostgreSQL HA: WAL Streaming & HA Controller Crisis (Replication Slot Loss, Disk Full, Etcd Quorum Failure)
Detects high-severity failures in self-hosted PostgreSQL high-availability clusters managed by Patroni, Zalando, or similar HA controllers.This rule targets catastrophic conditions that break replication or cluster consensus:
  • WAL streaming failures due to missing replication slots (usually after disk full or crash events)
  • Persistent errors resolving HA controller endpoints (etcd/consul) and loss of HA controller quorum
  • Disk saturation leading to WAL write errors and replication breakage
PostgreSQL High AvailabilitypostgresqlHigh AvailabilityPatroniZalandoEtcdReplicationWALStorageQuorumCrashData LossTimeout
CRE-2025-0070
Critical
Impact: 10/10
Mitigation: 6/10
Kafka Under-Replicated Partitions Crisis
Critical Kafka cluster degradation detected: Multiple partitions have lost replicas due to broker failure,resulting in an under-replicated state. This pattern indicates a broker has become unavailable, causingpartition leadership changes and In-Sync Replica (ISR) shrinkage across multiple topics.
Message Queue ProblemskafkaKafkaReplicationData LossHigh AvailabilityBroker FailureCluster Degradation