CRE-2025-0039
OpenTelemetry Collector exporter experiences retryable errors due to backend unavailability
Medium, Impact: 5/10, Mitigation: 3/10
Description
The OpenTelemetry Collector may intermittently fail to export telemetry data when the backend API is unavailable or overloaded. These failures manifest as timeouts (`context deadline exceeded`) or transient HTTP 502 responses. Retry logic is typically enabled, so individual failures are usually recovered, but repeated failures delay delivery and can fill the sending queue, causing backpressure or dropped data.
Mitigation
- Ensure `retry_on_failure` is enabled for all exporters.
- Tune `timeout` and `sending_queue` settings to absorb temporary backend disruptions.
- Monitor retry counts and dropped items via Collector metrics (e.g., `exporter/send_failed_spans`).
- Consider buffering via Kafka or persistent queues for high-reliability workloads.
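
The exporter settings named above could be combined roughly as in the following sketch. It assumes an OTLP gRPC exporter and, for the persistent queue, the `file_storage` extension from the Collector contrib distribution. The endpoint, directory path, and all numeric values are illustrative assumptions rather than recommendations; tune them against observed backend behavior.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

extensions:
  # Assumption: contrib distribution; the directory is a hypothetical path
  # that must exist and be writable by the Collector process.
  file_storage:
    directory: /var/lib/otelcol/file_storage

exporters:
  otlp:
    # Hypothetical backend endpoint.
    endpoint: backend.example.com:4317
    # Per-request timeout; raise it if the backend is slow but healthy.
    timeout: 10s
    retry_on_failure:
      enabled: true
      initial_interval: 5s    # wait before the first retry
      max_interval: 30s       # cap on the backoff between retries
      max_elapsed_time: 300s  # stop retrying a batch after this long
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000        # illustrative; size for the outage you want to absorb
      # Back the queue with disk so queued data survives a Collector restart.
      storage: file_storage

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```

A larger `queue_size` absorbs longer outages at the cost of memory or disk, and `max_elapsed_time` bounds how long a batch is retried before it is dropped and counted as a send failure.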