Category: Observability Problems
Problems related to observability, such as monitoring, logging, and tracing.
ID | Title | Description | Category | Technology | Tags |
---|---|---|---|---|---|
CRE-2024-0016 (Low, Impact: 4/10, Mitigation: 2/10) | Google Kubernetes Engine metrics agent failing to export metrics | The Google Kubernetes Engine metrics agent is failing to export metrics. | Observability Problems | gke-metrics-agent | Known Problem, GKE, Public |
CRE-2025-0028 (Low, Impact: 6/10, Mitigation: 1/10) | OpenTelemetry Python fails to detach context token across async boundaries | In OpenTelemetry Python, detaching a context token that was created in a different context can raise a `ValueError`. This occurs when asynchronous operations, such as generators or coroutines, are finalized in a different context than the one in which they were created, leading to context management errors and potential trace data loss (sketch below the table). | Observability Problems | opentelemetry-python | Opentelemetry, Python, Contextvars, Async, Observability, Public |
CRE-2025-0032 (Low, Impact: 2/10, Mitigation: 4/10) | Loki generates excessive logs when memcached service port name is incorrect | Loki instances using memcached for caching may emit excessive warning or error logs when the configured `memcached_client` service port name does not match the actual Kubernetes service port name. This does not cause a crash or failure, but it results in noisy logs and ineffective caching behavior (sketch below the table). | Observability Problems | loki | Loki, Memcached, Configuration, Service, Cache, Known Issue, Kubernetes, Public |
CRE-2025-0033 (Low, Impact: 7/10, Mitigation: 4/10) | OpenTelemetry Collector refuses to scrape due to memory pressure | The OpenTelemetry Collector may refuse to ingest metrics during a Prometheus scrape if it exceeds its configured memory limits. When the `memory_limiter` processor is enabled, the Collector actively drops data to prevent out-of-memory errors, resulting in log messages indicating that data was refused due to high memory usage (sketch below the table). | Observability Problems | opentelemetry-collector | Otel Collector, Prometheus, Memory, Metrics, Backpressure, Data Loss, Known Issue, Public |
CRE-2025-0034 (Medium, Impact: 6/10, Mitigation: 2/10) | Datadog agent disabled due to missing API key | If the Datadog agent or client libraries do not detect a configured API key, they will skip sending metrics, logs, and events. This results in a silent failure of observability reporting, often visible only through startup log messages (sketch below the table). | Observability Problems | datadog | Datadog, Configuration, Api Key, Observability, Environment, Telemetry, Known Issue, Public |
CRE-2025-0036 (Low, Impact: 6/10, Mitigation: 3/10) | OpenTelemetry Collector drops data due to 413 Payload Too Large from exporter target | The OpenTelemetry Collector may drop telemetry data when an exporter backend responds with a 413 Payload Too Large error. This typically happens when large batches of metrics, logs, or traces exceed the maximum payload size accepted by the backend. By default, the collector drops these payloads unless retry behavior is explicitly enabled (sketch below the table). | Observability Problems | opentelemetry-collector | Otel Collector, Exporter, Payload, Batch, Drop, Observability, Telemetry, Known Issue, Public |
CRE-2025-0037 (Low, Impact: 8/10, Mitigation: 4/10) | OpenTelemetry Collector panics on nil attribute value in Prometheus Remote Write translator | The OpenTelemetry Collector can panic due to a nil pointer dereference in the Prometheus Remote Write exporter. The issue occurs when attribute values are assumed to be strings, but the internal representation is nil or incompatible, leading to a runtime `SIGSEGV` segmentation fault and crashing the collector. | Observability Problems | opentelemetry-collector | Crash, Prometheus, Otel Collector, Exporter, Panic, Translation, Attribute, Nil Pointer, Known Issue, Public |
CRE-2025-0038 (Low, Impact: 5/10, Mitigation: 3/10) | Loki fails to cache entries due to Memcached out-of-memory error | Grafana Loki may emit errors when attempting to write to a Memcached backend that has run out of available memory. This results in dropped index or query cache entries, which can degrade query performance but does not interrupt ingestion. | Observability Problems | loki | Loki, Memcached, Cache, Memory, Infrastructure, Known Issue, Public |
CRE-2025-0039 (Medium, Impact: 5/10, Mitigation: 3/10) | OpenTelemetry Collector exporter experiences retryable errors due to backend unavailability | The OpenTelemetry Collector may intermittently fail to export telemetry data when the backend API is unavailable or overloaded. These failures manifest as timeouts (`context deadline exceeded`) or transient HTTP 502 responses. While retry logic is typically enabled, repeated failures can introduce delay or backpressure. | Observability Problems | opentelemetry-collector | Otel Collector, Exporter, Timeout, Retry, Network, Telemetry, Known Issue, Public |
CRE-2025-0043 (Medium, Impact: 4/10, Mitigation: 2/10) | Grafana fails to load plugin due to missing signature | Grafana may reject custom or third-party plugins at runtime if they are not digitally signed. When plugin signature validation is enabled (the default since Grafana 8), unsigned plugins are blocked and logged as validation errors during startup or plugin loading (sketch below the table). | Observability Problems | grafana | Grafana, Plugin, Validation, Signature, Configuration, Security, Known Issue, Public |
CRE-2025-0090 (Low, Impact: 0/10, Mitigation: 0/10) | Loki Log Line Exceeds Max Size Limit | Alloy detects that Loki is dropping log lines because they exceed the configured maximum line size. This typically indicates that applications are emitting extremely long log entries, which Loki is configured to reject by default (sketch below the table). | Observability Problems | alloy log | Alloy, Loki, Logs, Observability, Grafana |
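
The sketches below are illustrative only; any name, path, or threshold not stated in a CRE above is an assumption. For CRE-2025-0028: OpenTelemetry Python's `context.attach()`/`context.detach()` sit on top of `contextvars`, where a token can only be reset in the `Context` that created it, so the failure mode is reproducible with a bare `ContextVar`:

```python
# Sketch of CRE-2025-0028 using plain contextvars, the mechanism underneath
# opentelemetry.context.attach()/detach(). Names are hypothetical.
import contextvars

request_id = contextvars.ContextVar("request.id")

def instrumented_gen():
    token = request_id.set("abc123")   # analogous to context.attach()
    try:
        yield 1
    finally:
        request_id.reset(token)        # analogous to context.detach()

gen = instrumented_gen()
next(gen)  # the token is created in the current Context

# Finalizing the generator under a different Context, as async frameworks
# or the garbage collector may do, makes the reset/detach fail:
try:
    contextvars.copy_context().run(gen.close)
except ValueError as exc:
    # ValueError: <Token ...> was created in a different Context
    print(f"trace context lost: {exc}")
```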
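
For CRE-2025-0032, Loki's `memcached_client` block addresses memcached through a named Kubernetes Service port, so a drifted port name fails quietly. A minimal detection sketch, assuming the `kubernetes` Python client and hypothetical Service and namespace names:

```python
# Hypothetical check for CRE-2025-0032: confirm the port name that Loki's
# memcached_client `service` setting points at actually exists on the
# Kubernetes Service. Service name and namespace are assumptions.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

configured = "memcached-client"  # value of memcached_client.service in Loki's config
svc = v1.read_namespaced_service(name="memcached", namespace="loki")
port_names = [p.name for p in svc.spec.ports]

if configured not in port_names:
    print(f"Loki expects port name {configured!r} but the Service exposes "
          f"{port_names}; expect noisy logs and ineffective caching")
```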
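
For CRE-2025-0033, the refusals are the `memory_limiter` processor working as designed; the real risk is a pipeline that omits it and runs out of memory instead. A sketch that lints a Collector config for this, assuming PyYAML and the standard config layout:

```python
# Hypothetical lint for CRE-2025-0033: every pipeline in an OpenTelemetry
# Collector config should run memory_limiter so the collector applies
# backpressure instead of being OOM-killed. Requires PyYAML.
import sys
import yaml

with open(sys.argv[1]) as f:
    cfg = yaml.safe_load(f)

limiter = cfg.get("processors", {}).get("memory_limiter")
if limiter:
    print(f"memory_limiter settings: {limiter}")  # limit_mib, spike_limit_mib, ...

for name, pipeline in cfg.get("service", {}).get("pipelines", {}).items():
    if "memory_limiter" not in (pipeline.get("processors") or []):
        print(f"pipeline {name!r} does not apply memory_limiter")
```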
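
For CRE-2025-0034, the cheapest mitigation is to fail fast at startup instead of letting telemetry silently no-op. A sketch assuming the official `datadog` Python package and the conventional `DD_API_KEY` environment variable:

```python
# Hypothetical startup guard for CRE-2025-0034: turn a missing Datadog API
# key into a loud failure rather than silently skipped telemetry.
import os
import sys

from datadog import api, initialize

api_key = os.environ.get("DD_API_KEY")
if not api_key:
    sys.exit("DD_API_KEY is not set; metrics, logs, and events would be "
             "silently skipped")

initialize(api_key=api_key)
api.Event.create(title="service started", text="telemetry reporting enabled")
```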
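
For CRE-2025-0036, the usual levers are the batch processor's size caps and the exporter's retry settings. Extending the config lint above, under the same assumptions and with illustrative thresholds:

```python
# Hypothetical lint for CRE-2025-0036: flag Collector configs likely to
# trigger 413s or to drop oversized payloads silently. Requires PyYAML.
import sys
import yaml

with open(sys.argv[1]) as f:
    cfg = yaml.safe_load(f)

batch = cfg.get("processors", {}).get("batch") or {}
if not batch.get("send_batch_max_size"):
    print("batch processor sets no send_batch_max_size; very large batches "
          "may exceed the backend's payload limit")

for name, exporter in (cfg.get("exporters") or {}).items():
    retry = (exporter or {}).get("retry_on_failure") or {}
    if retry.get("enabled") is False:
        print(f"exporter {name!r} has retry_on_failure disabled; refused "
              "payloads will be dropped")
```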
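
For CRE-2025-0043, Grafana loads unsigned plugins only if they are explicitly allow-listed via `allow_loading_unsigned_plugins` in the `[plugins]` section of `grafana.ini`. A sketch that spots plugins Grafana will block, with assumed paths and relying on the convention that signed plugins ship a `MANIFEST.txt`:

```python
# Hypothetical audit for CRE-2025-0043: list installed plugins that are
# unsigned and not allow-listed, i.e. the ones Grafana 8+ will refuse.
import configparser
import json
import pathlib

ini = configparser.ConfigParser()
ini.read("/etc/grafana/grafana.ini")                      # assumed path
raw = ini.get("plugins", "allow_loading_unsigned_plugins", fallback="")
allowed = {p.strip() for p in raw.split(",") if p.strip()}

for manifest in pathlib.Path("/var/lib/grafana/plugins").glob("*/plugin.json"):
    meta = json.loads(manifest.read_text())
    plugin_id = meta.get("id", manifest.parent.name)
    signed = (manifest.parent / "MANIFEST.txt").exists()  # signature file
    if not signed and plugin_id not in allowed:
        print(f"{plugin_id}: unsigned and not allow-listed; will be blocked")
```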
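
For CRE-2025-0090, besides raising `max_line_size` (or enabling `max_line_size_truncate`) in Loki's `limits_config`, oversized lines can be cut at the source. A client-side sketch using Python's standard `logging`; the threshold is illustrative and should match your Loki limit:

```python
# Hypothetical client-side guard for CRE-2025-0090: truncate log lines
# before they exceed Loki's configured maximum line size.
import logging

MAX_LINE_BYTES = 256 * 1024  # keep in sync with limits_config.max_line_size

class TruncatingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        encoded = record.getMessage().encode("utf-8")
        if len(encoded) > MAX_LINE_BYTES:
            record.msg = (encoded[:MAX_LINE_BYTES].decode("utf-8", "ignore")
                          + " [truncated]")
            record.args = ()
        return True  # always emit; we only rewrite the message

logger = logging.getLogger("app")
logger.addHandler(logging.StreamHandler())
logger.addFilter(TruncatingFilter())
logger.info("normal lines pass through unchanged")
```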