38 docs tagged with "public"

CRE-2024-0007

The underlying Erlang process, Mnesia, is overloaded (`** WARNING ** Mnesia is overloaded`).

CRE-2024-0008

A RabbitMQ node has entered the “memory alarm” state because the total memory used by the Erlang VM (plus allocated binaries, ETS tables,

CRE-2024-0014

The Erlang VM has reported a **`busy_dist_port`** condition, meaning the send buffer of a distribution port (used for inter-node traffic inside a

CRE-2024-0016

The Google Kubernetes Engine metrics agent is failing to export metrics.

CRE-2024-0018

OVN daemons (e.g., ovn-controller) are stuck in a tight poll loop, driving CPU to 100%. Logs show “Dropped … due to excessive rate” or

Grafana can get into a state where it writes more error messages than it can process. The problem is compounded when Grafana collects its own error logs, which include the warnings that it can no longer keep up.

CRE-2024-0020

Grafana Alloy Loki fanout crashes when the number of log files exceeds the number of ingesters.

CRE-2024-0021

KEDA allows for fine-grained autoscaling (including to/from zero) for event-driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition.

CRE-2024-0043

When an NGINX upstream becomes unreachable or its DNS entry disappears, NGINX requests begin to fail.

CRE-2025-0025

When the configured replication factor for a Kafka topic is greater than the actual number of brokers in the cluster, Kafka repeatedly fails to assign partitions and logs replication-related errors. This results in persistent warnings or an `InvalidReplicationFactorException` when the broker tries to create internal or user-defined topics.
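The condition described above can be checked before a topic is created. The following is a minimal sketch in plain Python; the function name `check_replication_factor` and its arguments are hypothetical and not part of any Kafka client library.

```python
def check_replication_factor(replication_factor: int, broker_count: int) -> None:
    """Raise ValueError if the requested factor cannot be satisfied.

    Hypothetical pre-flight helper: the caller is assumed to already know
    how many brokers are currently alive in the cluster.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > broker_count:
        raise ValueError(
            f"replication factor {replication_factor} exceeds "
            f"available brokers {broker_count}"
        )


# A factor of 3 on a 3-broker cluster passes silently.
check_replication_factor(3, 3)
```

A single-broker development cluster with a default factor of 3 would fail this check, which matches the situation the rule describes.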

CRE-2025-0026

In clusters using the AWS EBS CSI driver, the controller may fail to detach a volume if the associated VolumeAttachment resource has an empty `spec.nodeName`. This results in a log error and skipped detachment, which may block PVC reuse or node cleanup.

CRE-2025-0027

In OpenStack deployments using Neutron with the OVN ML2 driver, ports could be bound to agents that were not alive. This behavior led to virtual machines experiencing network interface plug timeouts during provisioning, as the port binding would not complete successfully.

CRE-2025-0028

In OpenTelemetry Python, detaching a context token that was created in a different context can raise a `ValueError`. This occurs when asynchronous operations, such as generators or coroutines, are finalized in a different context than they were created, leading to context management errors and potential trace data loss.
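The underlying mechanism can be reproduced with the standard library alone. The sketch below assumes that the context implementation in use is built on `contextvars`; it does not call OpenTelemetry itself, and the variable name `current_span` is purely illustrative.

```python
import contextvars

# Illustrative stand-in for whatever a tracing library stores per context.
current_span = contextvars.ContextVar("current_span", default=None)


def attach():
    # Setting the variable returns a token bound to the Context
    # in which set() was called.
    return current_span.set("span-a")


# Create the token inside a copied Context ...
other_context = contextvars.copy_context()
token = other_context.run(attach)

# ... and try to reset it from the outer Context.
reset_failed = False
try:
    current_span.reset(token)
except ValueError as exc:
    reset_failed = True
    print(f"reset failed: {exc}")
```

Async generators and coroutines that are finalized by the event loop or garbage collector can end up in the same position: the token was created in one context and is released in another.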

CRE-2025-0029

When deploying Grafana Loki with AWS S3 as the storage backend and specifying a custom S3 endpoint (e.g., for FIPS compliance or GovCloud regions), Loki may fail to retrieve AWS credentials via IAM Roles for Service Accounts (IRSA). This results in errors during startup or when attempting to upload index tables, preventing Loki from functioning correctly.

CRE-2025-0030

SQLAlchemy applications using `create_engine()` may fail to connect to a database if the username or password contains special characters (e.g., `@`, `:`, `/`, `#`). These characters must be URL-encoded when included in the database connection string. Failure to encode them leads to parsing errors or incorrect credential usage.
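As an illustration of the encoding step, the sketch below builds a connection string with the standard library's `urllib.parse.quote_plus`. The helper name `build_db_url`, the driver prefix, and all credentials and host names are made-up examples; the resulting string is what would be passed to `create_engine()`.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, port, database,
                 driver="postgresql+psycopg2"):
    """Return a database URL with the credentials percent-encoded.

    Hypothetical helper: only the user and password are encoded here,
    since those are the parts the rule is concerned with.
    """
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = build_db_url("app_user", "p@ss:w/rd#1", "db.example.internal", 5432, "appdb")
print(url)
```

With the password left unencoded, the `@` and `/` characters would be read as URL delimiters, so the host and database name would be parsed from the wrong parts of the string.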

CRE-2025-0031

Django applications may return a "DisallowedHost" error when receiving requests with an unrecognized or missing Host header. This typically occurs in production environments where reverse proxies, load balancers, or external clients send requests using an unexpected domain or IP address. Django blocks these requests unless the domain is explicitly listed in `ALLOWED_HOSTS`.
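A settings fragment along the following lines is one way to keep `ALLOWED_HOSTS` explicit per environment. This is a sketch only: the environment variable name `DJANGO_ALLOWED_HOSTS` and the helper `parse_allowed_hosts` are assumptions, not Django conventions.

```python
import os


def parse_allowed_hosts(raw: str) -> list:
    """Split a comma-separated host list, dropping blanks and whitespace."""
    return [host.strip() for host in raw.split(",") if host.strip()]


# In settings.py: read the list from the environment (hypothetical variable
# name) and fall back to local-only hosts for development.
ALLOWED_HOSTS = parse_allowed_hosts(
    os.environ.get("DJANGO_ALLOWED_HOSTS", "localhost,127.0.0.1")
)
```

Django treats an entry that starts with a dot, such as `.example.com`, as matching the domain and all of its subdomains, which can help when a proxy forwards several host names.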

CRE-2025-0032

Loki instances using memcached for caching may emit excessive warning or error logs when the configured `memcached_client` service port name does not match the actual Kubernetes service port. This does not cause a crash or failure, but it results in noisy logs and ineffective caching behavior.

CRE-2025-0033

The OpenTelemetry Collector may refuse to ingest metrics during a Prometheus scrape if it exceeds its configured memory limits. When the `memory_limiter` processor is enabled, the Collector actively drops data to prevent out-of-memory errors, resulting in log messages indicating that data was refused due to high memory usage.

CRE-2025-0034

If the Datadog agent or client libraries do not detect a configured API key, they will skip sending metrics, logs, and events. This results in a silent failure of observability reporting, often visible only through startup log messages.

CRE-2025-0035

Applications using psycopg2 with OpenTelemetry instrumentation or threading may fail with SSL-related errors such as "decryption failed or bad record mac". This often occurs when a database connection is created before a fork or from an unsafe thread context, causing the SSL state to become invalid.
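One common way to avoid sharing a connection across a fork is to open it lazily in each process. The sketch below shows that pattern with a hypothetical `PerProcessConnection` wrapper; the `factory` argument stands in for a real call such as `psycopg2.connect(...)`, and a placeholder factory is used here so the example runs without a database.

```python
import os


class PerProcessConnection:
    """Lazily create one connection per process ID (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory  # callable that opens a new connection
        self._pid = None
        self._conn = None

    def get(self):
        pid = os.getpid()
        # Open a fresh connection if none exists yet, or if this object
        # was inherited from a parent process through fork().
        if self._conn is None or self._pid != pid:
            self._conn = self._factory()
            self._pid = pid
        return self._conn


# Placeholder factory; a real application would open a database connection.
holder = PerProcessConnection(lambda: object())
first = holder.get()
second = holder.get()  # same process, so the same object is reused
```

The wrapper does not close the inherited connection in the child, which is deliberate in this sketch: closing it there could disturb the SSL session that the parent is still using.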

CRE-2025-0036

The OpenTelemetry Collector may drop telemetry data when an exporter backend responds with a 413 Payload Too Large error. This typically happens when large batches of metrics, logs, or traces exceed the maximum payload size accepted by the backend. By default, the collector drops these payloads unless retry behavior is explicitly enabled.

CRE-2025-0037

The OpenTelemetry Collector can panic due to a nil pointer dereference in the Prometheus Remote Write exporter. The issue occurs when attribute values are assumed to be strings, but the internal representation is nil or incompatible, leading to a runtime `SIGSEGV` segmentation fault and crashing the collector.

CRE-2025-0038

Grafana Loki may emit errors when attempting to write to a Memcached backend that has run out of available memory. This results in dropped index or query cache entries, which can degrade query performance but does not interrupt ingestion.

CRE-2025-0039

The OpenTelemetry Collector may intermittently fail to export telemetry data when the backend API is unavailable or overloaded. These failures manifest as timeouts (`context deadline exceeded`) or transient HTTP 502 responses. While retry logic is typically enabled, repeated failures can introduce delay or backpressure.

CRE-2025-0040

During load balancer creation or other operations involving logical router and logical switch associations, Neutron OVN may raise a `RowNotFound` exception when attempting to reference a logical switch that has just been deleted. This leads to a port binding failure and a rollback of the affected operation.

CRE-2025-0041

In redis-py v5.x, sharing a single Redis client across async tasks or subprocesses can result in:

CRE-2025-0042

Applications using Django with PostgreSQL and psycopg2 may encounter `deadlock detected` errors under concurrent write-heavy workloads.

CRE-2025-0043

Grafana may reject custom or third-party plugins at runtime if they are not digitally signed. When plugin signature validation is enabled (default since Grafana 8+), unsigned plugins are blocked and logged as validation errors during startup or plugin loading.

CRE-2025-0044

Detects NGINX configuration files that advertise obsolete and cryptographically weak ciphers (RC4-MD5, RC4-SHA, DES-CBC3-SHA).
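A check of this kind can be sketched as a scan of the `ssl_ciphers` directive. The code below is an illustration only and is not the rule's actual detection logic; the function name `find_weak_ciphers` and the sample configuration are made up, and the parsing is deliberately simple.

```python
import re

# The three cipher names listed in the rule description.
WEAK_CIPHERS = {"RC4-MD5", "RC4-SHA", "DES-CBC3-SHA"}

# Matches "ssl_ciphers <value>;" at the start of a line, ignoring indentation.
DIRECTIVE = re.compile(r"^\s*ssl_ciphers\s+([^;]+);", re.MULTILINE)


def find_weak_ciphers(config_text: str) -> list:
    """Return the weak ciphers that a config enables, sorted by name."""
    found = set()
    for match in DIRECTIVE.finditer(config_text):
        value = match.group(1).strip().strip("'\"")
        for entry in value.split(":"):
            entry = entry.strip()
            # Entries prefixed with "!" or "-" remove a cipher, so skip them.
            if entry.startswith(("!", "-")):
                continue
            if entry.lstrip("+") in WEAK_CIPHERS:
                found.add(entry.lstrip("+"))
    return sorted(found)


sample = """
server {
    ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:RC4-SHA:!DES-CBC3-SHA';
}
"""
print(find_weak_ciphers(sample))
```

The sketch does not follow `include` directives or expand cipher group aliases, so a real scanner would need more than this.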

CRE-2025-0045

The NATS server has emitted an **Authorization Violation** log entry, meaning a client attempted to connect, publish, subscribe, or perform another operation for which it lacks permission. Intermittent violations often point to misconfiguration or start-up chaos. However, sustained or widespread violations can signal credential expiry or missing secrets.

CRE-2025-0046

The NATS server has emitted a **Permission Violation** log entry, meaning

CRE-2025-0048

A Kubernetes worker node has entered the **NotReady** state.

CRE-2025-0049

The NATS server is configured to publish messages with payloads that may

CRE-2025-0056

NGINX has reported that the configured `worker_connections` limit has been reached. This indicates that the web server

CRE-2025-0073

The Redis instance has reached its configured `maxmemory` limit. Because its active memory

CRE-2025-0077

PostgreSQL logs an error when it cannot extend a data file (table/index) because

CRE-2025-0112

Critical AWS VPC CNI node IP pool depletion detected causing cascading pod scheduling failures.

PREQUEL-2025-0094

cert-manager is unable to clean up Cloudflare DNS-01 challenges due to a change in the Cloudflare API, which no longer returns zone information in individual DNS records. This breaks the interaction when cert-manager attempts to delete the TXT record, resulting in a failed certificate generation.