CRE-2025-0125
Kubelet EventedPLEG Panic Causes NodeFailureHighImpact: 9/10Mitigation: 6/10
CRE-2025-0125View on GitHub
Description
Detects a critical kubelet panic in the EventedPLEG subsystem under rapid pod launch pressure. When triggered, the node's kubelet crashes, the node becomes NotReady and all resident pods are evicted resulting in a full node-level outage until manual intervention.
Cause
- High pod creation rate on a constrained node
- EventedPLEG feature gate enabled (default in v1.33+)
- Concurrent CRI events exceeding kubelet's goroutine handling
Mitigation
IMMEDIATE ACTIONS:
- Restart kubelet on the affected node
- Distribute load across nodes
RECOVERY:
- Disable `EventedPLEG` via feature gate if persistent
- Monitor kubelet logs for panics related to evented.go
PREVENTION:
- Test EventedPLEG in staging before enabling in production
- Use pod admission control to slow pod floods on small nodes