PREQUEL-2025-0078

AWS LoadBalancer Security Group Failure
Severity: Low
Impact: 6/10
Mitigation: 5/10

Description

While reconciling a TargetGroupBinding, the AWS Load Balancer Controller inspects the ENI attached to each pod (IP mode) or worker node (instance mode). If it finds **zero or more than one** security group carrying the cluster-ownership tag `kubernetes.io/cluster/<cluster-name>: owned`, it aborts and logs:

```
Reconciler error … targetGroupBinding … expected exactly one
securityGroup tagged …
```

When this happens, the controller never attaches nodes/pods to target groups, so the load balancer comes up with **0 healthy targets**.
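To confirm that this failure is what the controller is hitting, its logs can be scanned for the message fragment quoted above. The following is a minimal sketch: `count_sg_tag_errors` is a hypothetical helper name, and the `kube-system` namespace and `aws-load-balancer-controller` deployment name in the usage comment are assumptions based on a default install and may differ in your cluster.

```shell
#!/usr/bin/env bash
# Count log lines that contain the security-group tagging error.
# Reads log text on stdin, so it works with live kubectl output or a saved file.
count_sg_tag_errors() {
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps the function's exit status clean.
  grep -c 'exactly one securityGroup tagged' || true
}

# Usage (namespace and deployment name are assumptions):
#   kubectl logs -n kube-system deployment/aws-load-balancer-controller \
#     | count_sg_tag_errors
```

A non-zero count suggests that at least one ENI has no tagged security group or has more than one.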

Mitigation

1. **Verify SG count per ENI / node**

   Run on the affected node:

   ```bash
   # ENI group entries expose only GroupId/GroupName, so tags are checked via describe-security-groups
   SG_IDS=$(aws ec2 describe-network-interfaces \
     --filters "Name=attachment.instance-id,Values=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)" \
     --query 'NetworkInterfaces[].Groups[].GroupId' --output text)
   aws ec2 describe-security-groups --group-ids $SG_IDS \
     --filters "Name=tag:kubernetes.io/cluster/<cluster-name>,Values=owned" --query 'SecurityGroups[].GroupId'
   ```

   Exactly **one** ID should be returned.

2. **Remove the duplicate tag** from extra SGs **or** detach the extra SG entirely:

   ```bash
   aws ec2 delete-tags --resources sg-xxxx \
     --tags Key="kubernetes.io/cluster/<cluster-name>",Value="owned"
   ```

3. **Let the controller recreate** the backend SG: scale the deployment to 0 → 1 or run `kubectl annotate svc <svc> alb.ingress.kubernetes.io/healthcheck-port=traffic-port --overwrite`.

4. **Prevent drift**: update IaC so only **one** SG carries the `owned` tag, or use the Controller annotation `alb.ingress.kubernetes.io/security-groups` to pin a single SG.
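The "exactly one" rule from step 1 can be turned into a small check that is easy to reuse in a script or CI job. This is a sketch only: `check_owned_sg_count` is a hypothetical helper, and it assumes you pass it the security group IDs that carry the `owned` tag (for example, the IDs returned by the query in step 1).

```shell
#!/usr/bin/env bash
# Classify a list of tagged security group IDs against the "exactly one" rule.
# Arguments: the IDs of security groups carrying the cluster-ownership tag.
check_owned_sg_count() {
  case $# in
    0) echo "missing: no tagged security group"; return 1 ;;
    1) echo "ok: $1"; return 0 ;;
    *) echo "duplicate: $*"; return 1 ;;
  esac
}

# Usage with placeholder IDs:
#   check_owned_sg_count sg-aaaa          # succeeds
#   check_owned_sg_count sg-aaaa sg-bbbb  # fails, lists both IDs
```

Returning a non-zero status for both the zero and the duplicate case lets a pipeline fail early, before the controller reaches the same conclusion at reconcile time.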

References