Tag: Service Outage

Problems related to service outages, such as complete service unavailability or critical failures

ID, Title, Description, Category, Technology, Tags
CRE-2025-0075
Severity: Critical
Impact: 10/10
Mitigation: 6/10
Title: Nginx Upstream Failure Cascade Crisis
Description: Detects critical Nginx upstream failure cascades that lead to complete service unavailability. This rule identifies upstream failure patterns, including DNS resolution failures, connection timeouts, SSL/TLS handshake errors, protocol violations, and server unavailability, followed by HTTP 5xx error responses within a 60-second window. The rule uses optimized regex patterns for maximum detection coverage while maintaining high performance and low false-positive rates. It captures both the root cause (upstream failures) and the user-facing impact (HTTP errors) to provide complete incident context.
Category: Load Balancer Problems
Technology: nginx
Tags: Nginx, Reverse Proxy, Service Outage, High Availability, Load Balancer, Cascading Failure
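
The correlation described for CRE-2025-0075 (an upstream failure followed by an HTTP 5xx response within 60 seconds) can be sketched roughly as below. This is a minimal illustration, not the rule's actual implementation: the regexes are a small sample of common Nginx error-log messages chosen for this example, and `find_cascades` and the `(timestamp, line)` input shape are hypothetical names and assumptions, not part of the published rule.

```python
import re
from datetime import datetime, timedelta

# Illustrative patterns only; the real rule's regexes are not shown in this listing.
UPSTREAM_FAILURE = re.compile(
    r"upstream timed out"
    r"|connect\(\) failed"
    r"|no live upstreams"
    r"|could not be resolved"
    r"|SSL_do_handshake\(\) failed"
    r"|upstream prematurely closed connection"
)
# Status code that follows the quoted request line in the combined access log format.
HTTP_5XX = re.compile(r'" (5\d\d) ')
WINDOW = timedelta(seconds=60)


def find_cascades(events):
    """Pair each upstream failure with the next 5xx response seen within WINDOW.

    `events` is an iterable of (timestamp, log_line) tuples sorted by time.
    Returns a list of (failure_timestamp, error_timestamp) pairs.
    """
    cascades = []
    last_failure = None
    for ts, line in events:
        if UPSTREAM_FAILURE.search(line):
            last_failure = ts
        elif last_failure is not None and HTTP_5XX.search(line):
            if ts - last_failure <= WINDOW:
                cascades.append((last_failure, ts))
            # Whether paired or stale, the failure is consumed at this point.
            last_failure = None
    return cascades


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 12, 0, 0)
    sample = [
        (t0, "[error] 7#7: *1 upstream timed out (110: Connection timed out) while connecting to upstream"),
        (t0 + timedelta(seconds=30), '10.0.0.1 - - [01/Jan/2025:12:00:30 +0000] "GET / HTTP/1.1" 504 167 "-" "curl/8.0"'),
    ]
    print(find_cascades(sample))
```

Tracking only the most recent failure keeps the sketch short; a production rule would more likely keep every failure in the window so that the full root-cause context is retained.
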
CRE-2025-0112
Severity: Critical
Impact: 10/10
Mitigation: 4/10
Title: AWS VPC CNI Node IP Pool Depletion Crisis
Description: Critical AWS VPC CNI node IP pool depletion detected causing cascading pod scheduling failures. This pattern indicates severe subnet IP address exhaustion combined with ENI allocation failures, leading to complete cluster networking breakdown. The failure sequence shows ipamd errors, kubelet scheduling failures, and controller-level pod creation blocks that render clusters unable to deploy new workloads, scale existing services, or recover from node failures. This represents one of the most severe Kubernetes infrastructure failures, often requiring immediate manual intervention including subnet expansion, secondary CIDR provisioning, or emergency workload termination to restore cluster functionality.
Category: VPC CNI Problems
Technology: aws-vpc-cni
Tags: AWS, EKS, Kubernetes, Networking, VPC CNI, AWS CNI, IP Exhaustion, ENI Allocation, Subnet Exhaustion, Pod Scheduling Failure, Cluster Paralysis, AWS API Limits, Known Problem, Critical Infrastructure, Service Outage, Cascading Failure, Capacity Exceeded, Scalability Issue, Revenue Impact, Compliance Violation, Threshold Exceeded, Infrastructure, Public
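
The failure sequence described for CRE-2025-0112 (ipamd errors, then kubelet failures, then controller-level pod creation blocks) amounts to matching log stages in order. The sketch below shows one way that ordering check might look. The stage patterns are illustrative assumptions picked for this example, not the rule's real patterns, and `STAGES`, `matched_stages`, and `is_full_cascade` are hypothetical names.

```python
import re

# Hypothetical stage patterns for illustration; the rule's own patterns
# are not given in this listing.
STAGES = [
    ("ipamd", re.compile(r"InsufficientFreeAddressesInSubnet")),
    ("kubelet", re.compile(r"failed to assign an IP address to container")),
    ("controller", re.compile(r"\bFailedCreate\b")),
]


def matched_stages(lines):
    """Return the names of the stages matched so far, in strict stage order.

    A later stage is only considered once every earlier stage has matched,
    so out-of-order lines do not advance the sequence.
    """
    next_stage = 0
    seen = []
    for line in lines:
        if next_stage < len(STAGES) and STAGES[next_stage][1].search(line):
            seen.append(STAGES[next_stage][0])
            next_stage += 1
    return seen


def is_full_cascade(lines):
    """True only when all stages appear in the expected order."""
    return len(matched_stages(lines)) == len(STAGES)


if __name__ == "__main__":
    sample = [
        "ipamd: failed to allocate ENI: InsufficientFreeAddressesInSubnet",
        "kubelet: add cmd: failed to assign an IP address to container",
        "replicaset-controller: FailedCreate: Error creating pod",
    ]
    print(matched_stages(sample), is_full_cascade(sample))
```

Requiring the stages in order is what separates a cascade from unrelated, isolated errors; a real rule would presumably also bound the sequence by a time window, which this sketch omits.
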
CRE-2025-0114
Severity: High
Impact: 0/10
Mitigation: 0/10
Title: Nginx Ingress Controller rewritten URI has a zero length
Description: Detects a rewrite error that leads to service unavailability. An incorrect rewrite rule causes responses with HTTP status code 500 or 400. This CRE detects rewrites that produce an empty URI.
Category: Load Balancer Problems
Technology: nginx
Tags: Nginx, Reverse Proxy, Service Outage, Ingress Controller, NGINX Ingress, Load Balancer, Kubernetes
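
To see how a rewrite can end up with a zero-length URI, the sketch below models a regex rewrite in plain Python. It is a simplified stand-in, not Nginx's rewrite engine: `apply_rewrite` and `is_empty_rewrite` are hypothetical helpers, and the `/api` pattern is an invented example. The point it illustrates is that a replacement made up only of a capture group becomes empty whenever that group captures nothing.

```python
import re


def apply_rewrite(pattern, replacement, uri):
    """Apply one regex rewrite, mapping nginx-style $N to Python group references.

    Simplified model for illustration; it ignores flags, variables,
    and everything else a real rewrite directive supports.
    """
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, uri, count=1)


def is_empty_rewrite(pattern, replacement, uri):
    """True when the rewritten URI would have zero length."""
    return apply_rewrite(pattern, replacement, uri) == ""


if __name__ == "__main__":
    pattern = r"^/api/?(.*)$"  # hypothetical rule that strips an /api prefix
    # A replacement of just "$1" is empty when the request is exactly "/api".
    print(repr(apply_rewrite(pattern, "$1", "/api")))
    # Keeping a leading slash in the replacement avoids the empty result.
    print(repr(apply_rewrite(pattern, "/$1", "/api")))
    print(repr(apply_rewrite(pattern, "/$1", "/api/users")))
```

In this model the safer replacement always begins with a literal `/`, so the rewritten URI can never be empty regardless of what the capture group matches.
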