18 docs tagged with "kubernetes"

CRE-2025-0032

Loki instances using memcached for caching may emit excessive warning or error logs when the configured `memcached_client` service port name does not match the actual Kubernetes service port. This does not cause a crash or failure, but it produces noisy logs and makes caching ineffective.

CRE-2025-0048

A Kubernetes worker node has entered the **NotReady** state.

CRE-2025-0069

Pods that mount NFS volumes and set `securityContext.fsGroup` still have the directory owned by `root:root`. The kubelet does not chown the share, so non-root containers fail with "Permission denied".

CRE-2025-0071

CoreDNS deployment is unavailable or has no ready endpoints, indicating an imminent cluster-wide DNS outage.

CRE-2025-0112

Critical AWS VPC CNI node IP pool depletion detected, causing cascading pod scheduling failures.

CRE-2025-0114

Detects a rewrite error that leads to service unavailability.

CRE-2025-0121

Critical NGINX Ingress Controller SSL certificate validation failure detected. This pattern indicates

CRE-2025-0122

Critical AWS VPC CNI IP address exhaustion detected. This pattern indicates cascading failures

CRE-2025-0125

Detects a critical kubelet panic in the EventedPLEG subsystem under rapid pod launch pressure. When triggered, the node's kubelet crashes, the node becomes NotReady, and all resident pods are evicted, resulting in a full node-level outage until manual intervention.

PREQUEL-2025-0001

One or more cluster components (kubectl sessions, operators, controllers, CI/CD jobs, etc.) hit the **default client-side rate-limiter in client-go** (QPS = 5, Burst = 10). The client logs messages such as `Waited for <N>s due to client-side throttling, not priority and fairness` and delays each request until a token is available. Although the API server itself may still have spare capacity, and Priority & Fairness queueing is not the bottleneck, end-user actions and controllers feel sluggish or appear to “stall”.
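The effect of a QPS = 5, Burst = 10 limit can be illustrated with a small token-bucket simulation. This is a simplified sketch, not client-go's actual implementation: the function name `throttle_delays` and the assumption that requests are issued back to back by a single blocking client are hypothetical choices made for illustration.

```python
def throttle_delays(num_requests, qps=5.0, burst=10):
    """Simulate a token bucket and return the wait (in seconds) per request.

    Simplified model (an assumption, not client-go code): one client issues
    requests back to back, and each request blocks until a token is available.
    """
    tokens = float(burst)  # the bucket starts full
    delays = []
    for _ in range(num_requests):
        if tokens >= 1.0:
            # A token is available, so the request goes out immediately.
            tokens -= 1.0
            delays.append(0.0)
        else:
            # Wait until the bucket has refilled to one token, then spend it.
            wait = (1.0 - tokens) / qps
            tokens = 0.0
            delays.append(wait)
    return delays


if __name__ == "__main__":
    delays = throttle_delays(20)
    print("requests sent without waiting:", delays.count(0.0))
    print("total time spent throttled (s):", round(sum(delays), 2))
```

In this model the first ten requests consume the burst allowance, and every later request waits for the bucket to refill at the QPS rate, which is why a busy controller can feel slow even when the API server is idle.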

PREQUEL-2025-0002

Prometheus is failing to scrape and write Envoy metrics from Istio sidecars due to an unexpected EOF error. This occurs when trying to collect metrics from services that don't have proper protocol selection configured in their Kubernetes Service definition.

PREQUEL-2025-0010

Telepresence 2.5.x versions suffer from a critical TLS handshake error between the mutating webhook and the agent injector.

PREQUEL-2025-0020

80% or more of a deployment's replica pods are scheduled on the same Kubernetes node. If this node shuts down or experiences a problem, the service will experience an outage.
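The 80% condition above can be sketched as a small check over the node assignments of a deployment's pods. This is a minimal illustration, not the rule's actual detection logic: the function names and the input shape (a plain list of node names, one per replica pod) are assumptions.

```python
from collections import Counter


def most_loaded_node(pod_nodes):
    """Return (node, share) for the node hosting the most replicas.

    pod_nodes is a hypothetical input: one node name per replica pod.
    """
    counts = Counter(pod_nodes)
    node, count = counts.most_common(1)[0]
    return node, count / len(pod_nodes)


def is_concentrated(pod_nodes, threshold=0.8):
    """True when a single node hosts at least `threshold` of the replicas."""
    if not pod_nodes:
        return False  # nothing scheduled, so nothing to flag
    _, share = most_loaded_node(pod_nodes)
    return share >= threshold


if __name__ == "__main__":
    replicas = ["node-a", "node-a", "node-a", "node-a", "node-b"]
    print(most_loaded_node(replicas), is_concentrated(replicas))
```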

PREQUEL-2025-0081

ArgoCD application controller fails to process certain custom resources due to being unable to find API fields in struct RawExtension. This commonly affects users deploying Datadog Operator CRDs, resulting in application sync errors for these resources.

PREQUEL-2025-0087

Kyverno policies with JMESPath expressions are failing due to references to keys that don't exist in the target resources. This happens when policies attempt to access object properties that aren't present in the resources being validated, resulting in "Unknown key" errors during policy validation.

PREQUEL-2025-0090

Karpenter is unable to provision new nodes because the current Karpenter version is not compatible with the cluster's Kubernetes version. This incompatibility causes validation errors in the nodeclass controller and prevents pods from being scheduled properly in the cluster.

PREQUEL-2025-0093

The aws-load-balancer-controller is unable to translate an Ingress resource into an AWS ALB Listener Rule when the path contains a wildcard (`*`) and the pathType is set to `Prefix`.