
CRE-2025-0122

AWS VPC CNI IP Address Exhaustion Crisis
Severity: Critical
Impact: 10/10
Mitigation: 6/10

Description

Critical AWS VPC CNI IP address exhaustion detected. This pattern indicates cascading failures where subnet IP exhaustion leads to ENI allocation failures, pod scheduling failures, and complete service unavailability. The failure sequence shows IP allocation errors, ENI attachment failures, and resulting pod startup failures that affect cluster scalability and workload deployment.


Cause

  • Subnet IP address pool exhaustion in VPC
  • Maximum ENI limit reached per EC2 instance
  • Secondary IP allocation failures on existing ENIs
  • VPC CNI plugin configuration errors
  • Insufficient subnet CIDR block size for cluster scale
  • ENI warm pool depletion during traffic spikes
  • AWS API rate limiting on EC2 ENI operations
  • Security group or NACL blocking ENI operations
  • IAM permissions missing for ENI management
  • Cross-AZ networking constraints affecting IP allocation
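
The ENI-limit and subnet-size causes above come down to simple arithmetic. The sketch below assumes the VPC CNI's default secondary-IP mode, where each ENI's primary address is not given to pods, and the usual AWS rule that five addresses are reserved in every subnet. The function names `max_pods` and `subnet_usable_ips` are illustrative, not part of any AWS tool.

```shell
#!/usr/bin/env bash
# Sketch: per-node and per-subnet IPv4 capacity in secondary-IP mode.

# max_pods ENIS IPS_PER_ENI
# One address per ENI is the ENI's primary IP and is not assigned to pods.
# The +2 accounts for host-network pods (aws-node and kube-proxy).
max_pods() {
  local enis=$1 ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

# subnet_usable_ips PREFIX_LENGTH
# AWS reserves 5 addresses in every subnet.
subnet_usable_ips() {
  local prefix=$1
  echo $(( (1 << (32 - prefix)) - 5 ))
}

max_pods 3 10          # an instance type with 3 ENIs and 10 IPv4 addresses per ENI
subnet_usable_ips 24   # a /24 subnet
```

With these inputs the two calls print 29 and 251, so a single /24 subnet fills up after roughly eight fully packed nodes of that size, before counting the nodes' own addresses and warm pool reservations.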

Mitigation

IMMEDIATE ACTIONS:

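  • Check available IPs in subnets (see the `AvailableIpAddressCount` field): `aws ec2 describe-subnets --subnet-ids subnet-xxx`
  • Verify ENI usage against limits: list attached ENIs with `aws ec2 describe-network-interfaces --filters Name=attachment.instance-id,Values=i-xxx` and compare with the instance type's limit from `aws ec2 describe-instance-types --instance-types <instance-type> --query "InstanceTypes[].NetworkInfo.MaximumNetworkInterfaces"`
  • Monitor VPC CNI logs: `kubectl logs -n kube-system -l k8s-app=aws-node`
  • Check pod scheduling (pods stuck in ContainerCreating also report phase Pending): `kubectl get pods --all-namespaces --field-selector=status.phase=Pending`
  • Verify CNI configuration (set as environment variables on the daemonset): `kubectl describe daemonset aws-node -n kube-system`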
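
When many subnets are involved, the first check can be scripted. The following is a minimal sketch, assuming tab-separated `subnet-id` and free-address pairs on standard input. The function name `low_ip_subnets`, the threshold of 16, and the subnet IDs in the example are all hypothetical.

```shell
#!/usr/bin/env bash
# low_ip_subnets THRESHOLD
# Reads "subnet-id<TAB>available-count" lines on stdin and prints the
# subnets whose free address count is below THRESHOLD.
low_ip_subnets() {
  local threshold=$1
  awk -F'\t' -v t="$threshold" '$2 + 0 < t { print $1 " has only " $2 " free IPs" }'
}

# Example with canned input (made-up subnet IDs, not real AWS output):
printf 'subnet-aaa\t3\nsubnet-bbb\t120\n' | low_ip_subnets 16
```

Against a live account, the input could come from `aws ec2 describe-subnets --query 'Subnets[].[SubnetId,AvailableIpAddressCount]' --output text`, which emits one tab-separated row per subnet.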

RECOVERY STEPS:

  1. Add additional subnets with larger CIDR blocks
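  2. Increase ENI warm pool size, but only if the subnet still has free addresses (a larger warm pool reserves more IPs per node and makes an already exhausted subnet worse): `kubectl set env daemonset aws-node -n kube-system WARM_ENI_TARGET=2`
  3. Enable prefix delegation (supported on Nitro-based instances; existing nodes typically need to be replaced to pick up a higher max-pods value): `kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true`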
  4. Scale down non-critical workloads to free IPs
  5. Restart VPC CNI daemonset: `kubectl rollout restart daemonset/aws-node -n kube-system`
  6. Monitor IP allocation recovery: `kubectl get pods -n kube-system -l k8s-app=aws-node`
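
To see why prefix delegation helps, compare the per-node address ceiling with and without it. The sketch below assumes each secondary address slot on an ENI holds a /28 prefix (16 addresses) once prefix delegation is enabled. The function name `max_pods_prefix` is illustrative.

```shell
#!/usr/bin/env bash
# max_pods_prefix ENIS IPS_PER_ENI
# With prefix delegation, every secondary slot on an ENI carries a /28
# prefix (16 addresses) instead of a single address.
# The +2 accounts for host-network pods (aws-node and kube-proxy).
max_pods_prefix() {
  local enis=$1 ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) * 16 + 2 ))
}

max_pods_prefix 3 10   # same 3-ENI, 10-address instance shape
```

This prints 434, against 29 for the same instance shape in secondary-IP mode. Treat the raw figure as an upper bound: AWS guidance recommends capping kubelet max pods well below it (110 on smaller instance types), and each /28 still has to come out of the subnet as a contiguous block.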

PREVENTION:

  • Implement IP address monitoring and alerting
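  • Plan subnet capacity ahead of growth with larger CIDR blocks (subnets cannot be resized in place, so add new subnets or a secondary VPC CIDR before the pool runs low)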
  • Set up VPC CNI metrics monitoring in CloudWatch
  • Implement pod density limits per node
  • Use prefix delegation for improved IP efficiency
  • Regular capacity planning for cluster growth
  • Implement network policy optimization
  • Set up automated subnet provisioning
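
The monitoring and alerting items above can start as a simple utilization check. This is a rough sketch, assuming the subnet's prefix length and free-address count are already known (for example from `aws ec2 describe-subnets`). The function names, the 80% limit, and the subnet ID are hypothetical.

```shell
#!/usr/bin/env bash
# subnet_utilization_pct PREFIX_LENGTH AVAILABLE
# Integer percentage of usable addresses currently in use.
# AWS reserves 5 addresses per subnet, so they are excluded from the total.
subnet_utilization_pct() {
  local prefix=$1 available=$2
  local total=$(( (1 << (32 - prefix)) - 5 ))
  echo $(( (total - available) * 100 / total ))
}

# check_subnet ID PREFIX_LENGTH AVAILABLE LIMIT_PCT
# Prints WARN when utilization is at or above LIMIT_PCT, OK otherwise.
check_subnet() {
  local id=$1 prefix=$2 available=$3 limit=$4
  local pct
  pct=$(subnet_utilization_pct "$prefix" "$available")
  if [ "$pct" -ge "$limit" ]; then
    echo "WARN $id ${pct}% used"
  else
    echo "OK $id ${pct}% used"
  fi
}

check_subnet subnet-aaa 24 25 80   # hypothetical subnet ID and alert limit
```

The example prints `WARN subnet-aaa 90% used`. A percentage threshold scales across subnets of different sizes, which an absolute free-address count does not.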
