Tag: Cascading Failure

Problems related to cascading failures, where one failure leads to multiple dependent failures

Columns: ID, Title, Description, Category, Technology, Tags
CRE-2025-0075
Critical
Impact: 10/10
Mitigation: 6/10
Title: Nginx Upstream Failure Cascade Crisis
Description: Detects critical Nginx upstream failure cascades that lead to complete service unavailability. This advanced rule identifies comprehensive upstream failure patterns including DNS resolution failures, connection timeouts, SSL/TLS handshake errors, protocol violations, and server unavailability, followed by HTTP 5xx error responses within a 60-second window. The rule uses optimized regex patterns for maximum detection coverage while maintaining high performance and low false-positive rates. It captures both the root cause (upstream failures) and the user-facing impact (HTTP errors) to provide complete incident context.
Category: Load Balancer Problems
Technology: nginx
Tags: Nginx, Reverse Proxy, Service Outage, High Availability, Load Balancer, Cascading Failure
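
The correlation described for CRE-2025-0075 (an upstream failure followed by an HTTP 5xx response within 60 seconds) can be illustrated with a minimal Python sketch. This is not the rule's actual implementation: the regex patterns, the `find_cascades` helper, and the assumed log shapes (Nginx error-log phrasing and a combined-format access log) are illustrative assumptions only.

```python
import re
from datetime import datetime, timedelta

# Illustrative patterns based on common Nginx error-log phrasing.
# They are assumptions for this sketch, not the patterns used by the rule.
UPSTREAM_FAILURE = re.compile(
    r"could not be resolved"            # DNS resolution failure
    r"|upstream timed out"              # connection or read timeout
    r"|SSL_do_handshake\(\) failed"     # SSL/TLS handshake error
    r"|upstream sent invalid header"    # protocol violation
    r"|no live upstreams"               # all upstream servers unavailable
    r"|connect\(\) failed"              # connection refused or unreachable
)

# Assumes a combined-format access log, where the status code follows
# the quoted request line: "GET / HTTP/1.1" 502 157 ...
HTTP_5XX = re.compile(r'"\s(5\d{2})\s')

WINDOW = timedelta(seconds=60)


def find_cascades(events):
    """Return (failure_time, error_time, status) for each 5xx response
    seen within WINDOW of the most recent upstream failure.

    `events` is an iterable of (datetime, log_line) pairs sorted by time.
    """
    last_failure = None
    cascades = []
    for ts, line in events:
        if UPSTREAM_FAILURE.search(line):
            last_failure = ts
            continue
        match = HTTP_5XX.search(line)
        if match and last_failure is not None and ts - last_failure <= WINDOW:
            cascades.append((last_failure, ts, int(match.group(1))))
    return cascades
```

Pairing each 5xx with the most recent upstream failure keeps both the root cause and the user-facing symptom in one record, which mirrors the "complete incident context" the description mentions. A production detector would also need to merge and order the error and access logs, which this sketch leaves to the caller.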
CRE-2025-0112
Critical
Impact: 10/10
Mitigation: 4/10
Title: AWS VPC CNI Node IP Pool Depletion Crisis
Description: Critical AWS VPC CNI node IP pool depletion detected, causing cascading pod scheduling failures. This pattern indicates severe subnet IP address exhaustion combined with ENI allocation failures, leading to complete cluster networking breakdown. The failure sequence shows ipamd errors, kubelet scheduling failures, and controller-level pod creation blocks that leave clusters unable to deploy new workloads, scale existing services, or recover from node failures. This is one of the most severe Kubernetes infrastructure failures and often requires immediate manual intervention, such as subnet expansion, secondary CIDR provisioning, or emergency workload termination, to restore cluster functionality.
Category: VPC CNI Problems
Technology: aws-vpc-cni
Tags: AWS, EKS, Kubernetes, Networking, VPC CNI, AWS CNI, IP Exhaustion, ENI Allocation, Subnet Exhaustion, Pod Scheduling Failure, Cluster Paralysis, AWS API Limits, Known Problem, Critical Infrastructure, Service Outage, Cascading Failure, Capacity Exceeded, Scalability Issue, Revenue Impact, Compliance Violation, Threshold Exceeded, Infrastructure, Public
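
To make the subnet exhaustion behind CRE-2025-0112 concrete, the following Python sketch estimates how many addresses a subnet can actually hand out and flags when usage approaches that limit. It relies on the fact that AWS reserves five addresses in every subnet; the helper names, the example CIDRs, and the 90% threshold are assumptions chosen for illustration, not values taken from the rule.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses in an AWS subnet that can be assigned to ENIs."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def is_near_exhaustion(cidr: str, in_use: int, threshold: float = 0.9) -> bool:
    """True when the share of assigned addresses reaches `threshold`.

    `threshold` is a hypothetical alerting level chosen for this sketch.
    """
    capacity = usable_ips(cidr)
    return capacity == 0 or in_use / capacity >= threshold
```

For example, a /24 subnet has 256 addresses but only 251 assignable ones, so `is_near_exhaustion("10.0.1.0/24", 240)` reports a problem, while the same usage in a /20 subnet does not. A check like this, run before the pool is empty, leaves time for subnet expansion or secondary CIDR provisioning.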