CRE-2025-0122

AWS VPC CNI IP Address Exhaustion Crisis
Severity: Critical
Impact: 10/10
Mitigation: 6/10

Description

Critical AWS VPC CNI IP address exhaustion detected. This pattern indicates a cascading failure in which subnet IP exhaustion leads to ENI allocation failures, pod scheduling failures, and complete service unavailability. The failure sequence shows IP allocation errors, then ENI attachment failures, then pod startup failures that block cluster scaling and workload deployment.
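
To make the cascade concrete, the arithmetic behind it can be sketched as below. This is a minimal illustration, not part of the rule: the function names are hypothetical, the instance figures (3 ENIs, 10 IPv4 addresses per ENI, as on an m5.large) are example inputs, and it assumes the default secondary-IP mode with every ENI in the same subnet.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating why a subnet runs dry.

# Usable addresses in a subnet: AWS reserves 5 addresses in every subnet.
usable_subnet_ips() {
  local prefix_len=$1
  echo $(( (1 << (32 - prefix_len)) - 5 ))
}

# Worst-case subnet addresses held by one node when all of its ENIs are
# attached and fully populated (primary + secondary addresses on each ENI).
node_ip_footprint() {
  local enis=$1 ips_per_eni=$2
  echo $(( enis * ips_per_eni ))
}

# Pod capacity per node in secondary-IP mode:
# ENIs * (IPs per ENI - 1) + 2
max_pods() {
  local enis=$1 ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

# Example: a /24 subnet backing nodes with 3 ENIs x 10 addresses each.
subnet_ips=$(usable_subnet_ips 24)          # 251
per_node=$(node_ip_footprint 3 10)          # 30
echo "usable IPs in /24:        $subnet_ips"
echo "max pods per node:        $(max_pods 3 10)"
echo "fully populated nodes:    $(( subnet_ips / per_node ))"
```

Under these assumptions a /24 can fully back only about eight such nodes, which is why a scale-out event can exhaust a subnet well before node count looks alarming.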

Mitigation

IMMEDIATE ACTIONS:
- Check available IPs in subnets (see `AvailableIpAddressCount` in the output): `aws ec2 describe-subnets --subnet-ids subnet-xxx`
- List the ENIs attached to a node and compare the count against the instance type's ENI limit: `aws ec2 describe-network-interfaces --filters Name=attachment.instance-id,Values=i-xxx`
- Review VPC CNI logs: `kubectl logs -n kube-system -l k8s-app=aws-node`
- Check pod scheduling: `kubectl get pods --all-namespaces | grep Pending`
- Verify CNI configuration (set as environment variables on the daemonset): `kubectl describe daemonset aws-node -n kube-system`

RECOVERY STEPS:
1. Add additional subnets with larger CIDR blocks
2. Increase ENI warm pool size: `kubectl set env daemonset aws-node -n kube-system WARM_ENI_TARGET=2` (a larger warm pool reserves more IPs per node, so apply this only after subnet capacity has been added)
3. Enable prefix delegation: `kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true` (requires Nitro-based instances)
4. Scale down non-critical workloads to free IPs
5. Restart VPC CNI daemonset: `kubectl rollout restart daemonset/aws-node -n kube-system`
6. Monitor IP allocation recovery: `kubectl get pods -n kube-system -l k8s-app=aws-node`

PREVENTION:
- Implement IP address monitoring and alerting
- Provision subnets with larger CIDR blocks (an existing subnet's CIDR cannot be resized)
- Set up VPC CNI metrics monitoring in CloudWatch
- Implement pod density limits per node
- Use prefix delegation for improved IP efficiency
- Perform regular capacity planning for cluster growth
- Implement network policy optimization
- Set up automated subnet provisioning
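
The subnet check and the IP alerting item can be combined into a small script. The following is a sketch under stated assumptions rather than a reference implementation: the function names and the default threshold of 20 free addresses are hypothetical, and `subnet-xxx` remains a placeholder exactly as in the commands above. Only the filter runs when the file is sourced; the functions that call `aws` and `kubectl` must be invoked explicitly and need working credentials.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "subnet-id <whitespace> free-count" rows on
# stdin and prints the subnets whose free address count is below the
# threshold (default 20, an arbitrary example value).
low_ip_subnets() {
  local threshold=${1:-20}
  awk -v t="$threshold" '$2 + 0 < t { print $1 " " $2 }'
}

# Fetch free-address counts for the given subnet IDs and filter them.
# --output text prints one tab-separated row per subnet.
check_subnets() {
  local threshold=$1
  shift
  aws ec2 describe-subnets --subnet-ids "$@" \
    --query 'Subnets[].[SubnetId,AvailableIpAddressCount]' \
    --output text | low_ip_subnets "$threshold"
}

# Count pods stuck in Pending across all namespaces.
pending_pod_count() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase=Pending --no-headers | wc -l
}

# Example invocation (placeholder subnet ID, not run automatically):
#   check_subnets 20 subnet-xxx
#   pending_pod_count
```

Separating the pure filter from the CLI calls keeps the threshold logic easy to test with canned input, and the same filter could feed whatever alerting channel the cluster already uses.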

References