Tag: Known Problem
This is a documented known problem with known mitigations
ID | Severity | Impact | Mitigation | Title | Description | Category | Technology | Tags
---|---|---|---|---|---|---|---|---
CRE-2024-0007 | Critical | 9/10 | 8/10 | RabbitMQ Mnesia overloaded | The underlying Erlang database, Mnesia, is overloaded (`WARNING Mnesia is overloaded`). (Mitigation sketch below the table.) | Message Queue Problems | rabbitmq | Known Problem, RabbitMQ, Public
CRE-2024-0008 | High | 9/10 | 6/10 | RabbitMQ memory alarm | A RabbitMQ node has entered the “memory alarm” state because the total memory used by the Erlang VM (plus allocated binaries, ETS tables, and processes) has exceeded the configured `vm_memory_high_watermark`. While the alarm is active the broker applies flow control, blocking publishers and pausing most ingress activity to protect itself from running out of RAM. (Mitigation sketch below the table.) | Message Queue Problems | rabbitmq | Known Problem, RabbitMQ, Public
CRE-2024-0014 | High | 8/10 | 5/10 | RabbitMQ busy distribution port performance issue | The Erlang VM has reported a `busy_dist_port` condition, meaning the send buffer of a distribution port (used for inter-node traffic inside a RabbitMQ cluster) is full. When this happens the scheduler suspends the process owning the port, stalling inter-node replication, management calls, and any RabbitMQ process that must use that port. Throughput drops and latency rises until the buffer drains or the node is restarted. (Mitigation sketch below the table.) | Message Queue Performance | rabbitmq | Known Problem, RabbitMQ, Public
CRE-2024-0016 | Low | 4/10 | 2/10 | Google Kubernetes Engine metrics agent failing to export metrics | The Google Kubernetes Engine metrics agent is failing to export metrics. | Observability Problems | gke-metrics-agent | Known Problem, GKE, Public
CRE-2024-0018 | Medium | 4/10 | 5/10 | Neutron Open Virtual Network (OVN) high CPU usage | OVN daemons (e.g., ovn-controller) are stuck in a tight poll loop, driving CPU to 100%. Logs show “Dropped … due to excessive rate” or “Unreasonably long … poll interval,” slowing port binding and network traffic. | Networking Problems | neutron | Known Problem, OVN, Public
CRE-2024-0021 | High | 4/10 | 5/10 | KEDA operator reconciler ScaledObject panic | KEDA provides fine-grained autoscaling (including to/from zero) for event-driven Kubernetes workloads. It acts as a Kubernetes metrics server and lets users define autoscaling rules with a dedicated custom resource definition. This entry covers a panic in the KEDA operator's reconciler while it processes a ScaledObject resource. | Operator Problems | Unspecified | KEDA, Crash, Known Problem, Public
CRE-2024-0043 | Medium | 6/10 | 5/10 | NGINX Upstream DNS Failure | When an NGINX upstream becomes unreachable or its DNS entry disappears, requests proxied to that upstream begin to fail. (Mitigation sketch below the table.) | Proxy Problems | nginx | Kafka, Known Problem, Public
CRE-2025-0025 | Medium | 6/10 | 5/10 | Kafka broker replication mismatch | When the configured replication factor for a Kafka topic is greater than the actual number of brokers in the cluster, Kafka repeatedly fails to assign partitions and logs replication-related errors. This results in persistent warnings or an `InvalidReplicationFactorException` when the broker tries to create internal or user-defined topics. (Reproduction sketch below the table.) | Message Queue Problems | topic-operator | Kafka, Known Problem, Public
CRE-2025-0112 | Critical | 10/10 | 4/10 | AWS VPC CNI Node IP Pool Depletion Crisis | Critical AWS VPC CNI node IP pool depletion detected, causing cascading pod scheduling failures. This pattern indicates severe subnet IP address exhaustion combined with ENI allocation failures, leading to complete cluster networking breakdown. The failure sequence shows ipamd errors, kubelet scheduling failures, and controller-level pod creation blocks that leave clusters unable to deploy new workloads, scale existing services, or recover from node failures. This is one of the most severe Kubernetes infrastructure failures, often requiring immediate manual intervention such as subnet expansion, secondary CIDR provisioning, or emergency workload termination to restore cluster functionality. (Mitigation sketch below the table.) | VPC CNI Problems | aws-vpc-cni | AWS, EKS, Kubernetes, Networking, VPC CNI, AWS CNI, IP Exhaustion, ENI Allocation, Subnet Exhaustion, Pod Scheduling Failure, Cluster Paralysis, AWS API Limits, Known Problem, Critical Infrastructure, Service Outage, Cascading Failure, Capacity Exceeded, Scalability Issue, Revenue Impact, Compliance Violation, Threshold Exceeded, Infrastructure, Public
CRE-2025-0119 | High | 8/10 | 7/10 | Kubernetes Pod Disruption Budget (PDB) Violation During Rolling Updates | During rolling updates, when a deployment's maxUnavailable setting conflicts with a Pod Disruption Budget's minAvailable requirement, too many pods can be terminated simultaneously, violating the availability guarantees and causing a service outage. The same conflict can surface during node drains, cluster autoscaling, or maintenance operations. (Illustrative manifests below the table.) | Kubernetes Problems | kubernetes | K8s, Known Problem, Misconfiguration, Operational error, High Availability
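
For CRE-2024-0007, a commonly cited mitigation is to let Mnesia batch more writes before dumping its transaction log, which RabbitMQ exposes through `advanced.config`. The sketch below is illustrative only: the threshold value is an assumption to tune against the workload, not a recommended setting.

```erlang
%% advanced.config -- raise Mnesia's dump-log threshold so the transaction log
%% is dumped less often under write bursts. 10000 is an illustrative value
%% (the Mnesia default is 1000); tune it for your workload.
[
  {mnesia, [
    {dump_log_write_threshold, 10000}
  ]}
].
```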
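
For CRE-2024-0008, the immediate lever is the memory watermark itself, though draining queue backlogs or adding RAM is the durable fix. A minimal sketch, assuming the default relative watermark of 0.4 and an illustrative new value of 0.6:

```bash
# Raise the watermark on a running node (transient; reverts on restart).
rabbitmqctl set_vm_memory_high_watermark 0.6

# Persistent equivalent in rabbitmq.conf:
#   vm_memory_high_watermark.relative = 0.6
```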
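
For CRE-2024-0014, the usual mitigation is to enlarge the Erlang distribution buffer so inter-node traffic is less likely to fill it. The buffer size below is an assumption; the flag maps to the Erlang VM's `+zdbbl` setting, expressed in kilobytes.

```bash
# rabbitmq-env.conf -- enlarge the inter-node (distribution) send buffer.
# 192000 KB is illustrative; larger buffers trade memory for fewer busy_dist_port stalls.
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+zdbbl 192000"
```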
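
For CRE-2024-0043, one widely used mitigation is to resolve the upstream name at request time instead of once at configuration load, by pairing a `resolver` directive with a variable in `proxy_pass`. The resolver address, hostname, and port below are placeholders.

```nginx
resolver 10.0.0.2 valid=30s;          # placeholder DNS server; re-resolve every 30s

server {
    listen 80;
    location / {
        set $backend "api.internal.example.com";   # placeholder upstream name
        proxy_pass http://$backend:8080;           # variable forces runtime resolution
    }
}
```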
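
For CRE-2025-0025, the mismatch is straightforward to reproduce: a topic's replication factor can never exceed the number of live brokers. The broker address, topic name, and single-broker assumption below are illustrative.

```bash
# On a one-broker cluster this fails with InvalidReplicationFactorException,
# because the requested replication factor (3) exceeds the broker count (1).
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic demo --partitions 3 --replication-factor 3

# Fix: add brokers, or lower the replication factor to <= broker count.
# Single-broker dev clusters also need matching settings for internal topics
# in server.properties (illustrative values):
#   offsets.topic.replication.factor=1
#   transaction.state.log.replication.factor=1
#   transaction.state.log.min.isr=1
```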
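
For CRE-2025-0112, subnet expansion or a secondary CIDR is the structural fix, but the VPC CNI's allocation behaviour can also be tuned so nodes hold fewer idle IPs or use prefix delegation. The environment variable values below are assumptions to adapt per instance type, subnet sizing, and CNI version.

```bash
# Hold fewer warm IPs per node so over-allocation does not drain the subnet:
kubectl set env daemonset/aws-node -n kube-system WARM_IP_TARGET=2 MINIMUM_IP_TARGET=10

# On Nitro instance types with a recent VPC CNI, prefix delegation assigns /28
# prefixes per ENI slot, multiplying the pod IPs available on each node:
kubectl set env daemonset/aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
```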
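
For CRE-2025-0119, the conflict shows up directly in the manifests: the rollout strategy permits more pods to be down at once than the PDB promises to keep up. The names, image, and replica counts below are hypothetical; the point is that `maxUnavailable: 2` against `minAvailable: 2` on a 3-replica deployment can drop the service to a single pod. Aligning the two (for example `maxUnavailable: 1`) removes the conflict.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels: { app: web }
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2          # allows the rollout to drop to 1 ready pod
      maxSurge: 0
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
      - name: web
        image: nginx:1.27        # placeholder image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2                # promises at least 2 pods during voluntary disruptions
  selector:
    matchLabels: { app: web }
```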