Tag: Service Outage

Problems related to service outages, such as complete service unavailability or critical failures

ID	Title	Description	Category	Technology	Tags
CRE-2025-0075 Critical Impact: 10/10 Mitigation: 6/10	Nginx Upstream Failure Cascade Crisis	Detects critical Nginx upstream failure cascades that lead to complete service unavailability. This advanced rule identifies comprehensive upstream failure patterns including DNS resolution failures, connection timeouts, SSL/TLS handshake errors, protocol violations, and server unavailability, followed by HTTP 5xx error responses within a 60-second window. The rule uses optimized regex patterns for maximum detection coverage while maintaining high performance and low false-positive rates. It captures both the root cause (upstream failures) and the user-facing impact (HTTP errors) to provide complete incident context.	Load Balancer Problems	nginx	Nginx Reverse Proxy Service Outage High Availability Load Balancer Cascading Failure
CRE-2025-0112 Critical Impact: 10/10 Mitigation: 4/10	AWS VPC CNI Node IP Pool Depletion Crisis	Critical AWS VPC CNI node IP pool depletion detected causing cascading pod scheduling failures. This pattern indicates severe subnet IP address exhaustion combined with ENI allocation failures, leading to complete cluster networking breakdown. The failure sequence shows ipamd errors, kubelet scheduling failures, and controller-level pod creation blocks that render clusters unable to deploy new workloads, scale existing services, or recover from node failures. This represents one of the most severe Kubernetes infrastructure failures, often requiring immediate manual intervention including subnet expansion, secondary CIDR provisioning, or emergency workload termination to restore cluster functionality.	VPC CNI Problems	aws-vpc-cni	AWS EKS Kubernetes Networking VPC CNI AWS CNI IP Exhaustion ENI Allocation Subnet Exhaustion Pod Scheduling Failure Cluster Paralysis AWS API Limits Known Problem Critical Infrastructure Service Outage Cascading Failure Capacity Exceeded Scalability Issue Revenue Impact Compliance Violation Threshold Exceeded Infrastructure Public
CRE-2025-0114 High Impact: 8/10 Mitigation: 6/10	Nginx Ingress Controller rewritten URI has a zero length	Detects rewrite error which leads to service unavailability. Wrong rewrite causes responses with HTTP code 500 or 400. This CRE detects empty rewrite.	Load Balancer Problems	nginx	Nginx Reverse Proxy Service Outage Ingress Controller NGINX Ingress Load Balancer Kubernetes