CRE-2025-0061

Karpenter Stability Issues on EKS During Leader Election
Severity: Medium
Impact: 7/10
Mitigation: 4/10

Description

- The EKS control plane generally handles steady, predictable load, but can struggle during large-scale autoscaling events, when many workloads and nodes start or stop simultaneously.
- This instability affects components that implement leader election through the Kubernetes API. When API requests are slow or throttled, these components can fail to renew their leader election lease in time and restart. Affected components include:
  - aws-load-balancer-controller
  - karpenter
  - keda-operator
  - ebs-csi-controller
  - efs-csi-controller

Mitigation

- Use Kubernetes API Priority and Fairness (FlowSchema and PriorityLevelConfiguration) to prioritize leader election traffic during high load.
- Assign the `workload-high` priority level to requests from critical components such as the Karpenter controller.
- Monitor etcd size and schedule regular defragmentation to reduce unplanned contention.
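
The first two mitigations can be sketched as a FlowSchema that routes a controller's Lease requests to the `workload-high` priority level. The Python snippet below only builds and prints such a manifest as JSON; it is a minimal sketch, not a tested production configuration. The schema name `karpenter-leader-election`, the `kube-system` namespace, the `karpenter` service account, and the `matchingPrecedence` value of 1000 are all assumptions to adjust for the actual installation. The `flowcontrol.apiserver.k8s.io/v1` API version assumes Kubernetes 1.29 or later; older clusters serve beta versions of this API instead.

```python
import json


def leader_election_flow_schema(name, namespace, service_account,
                                priority_level="workload-high",
                                precedence=1000):
    """Build a FlowSchema dict that maps one service account's Lease
    requests to the given priority level."""
    return {
        "apiVersion": "flowcontrol.apiserver.k8s.io/v1",
        "kind": "FlowSchema",
        "metadata": {"name": name},
        "spec": {
            # Priority level that matching requests are assigned to.
            "priorityLevelConfiguration": {"name": priority_level},
            # Lower values are evaluated first (assumed value, tune as needed).
            "matchingPrecedence": precedence,
            "distinguisherMethod": {"type": "ByUser"},
            "rules": [{
                # Requests made by the controller's service account.
                "subjects": [{
                    "kind": "ServiceAccount",
                    "serviceAccount": {
                        "name": service_account,
                        "namespace": namespace,
                    },
                }],
                # Leader election is implemented with Lease objects.
                "resourceRules": [{
                    "verbs": ["get", "create", "update"],
                    "apiGroups": ["coordination.k8s.io"],
                    "resources": ["leases"],
                    "namespaces": ["*"],
                }],
            }],
        },
    }


# Hypothetical names: replace with the service account and namespace
# that the Karpenter controller actually runs under.
manifest = leader_election_flow_schema(
    name="karpenter-leader-election",
    namespace="kube-system",
    service_account="karpenter",
)
print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON manifests, the printed output can be reviewed and then piped into `kubectl apply -f -`. Restricting the rule to `leases` keeps the change narrow; widening it to all of a controller's requests is a separate decision that depends on how much API server capacity the cluster can spare.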

References

https://github.com/aws/karpenter-provider-aws/issues/4168
https://aws.amazon.com/blogs/containers/explore-etcd-defragmentation-in-amazon-eks/
https://aws.amazon.com/blogs/containers/managing-etcd-database-size-on-amazon-eks-clusters/