
PREQUEL-2025-0024

Istio Traffic Timeout

Severity: High
Impact: 6/10
Mitigation: 7/10


Description

Connections routed through **ztunnel** stop after the default 10s deadline. Ztunnel logs show

`error access connection complete ... error="io error: deadline has elapsed"`

or

`error="connection timed out, maybe a NetworkPolicy is blocking HBONE port 15008"`

while clients see 504 Gateway Timeout or connection-reset errors. The issue is limited to workloads enrolled in Ambient mode; sidecar-injected or "no-mesh" pods continue to work.
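The two log signatures point at different causes (a plain deadline expiry versus blocked HBONE traffic), so counting each one is a quick way to triage. The following is a minimal sketch that assumes the log lines contain the substrings quoted above verbatim; the function name `classify_ztunnel_errors` is hypothetical.

```bash
# Hypothetical triage helper: count the two ztunnel error signatures.
# Reads ztunnel log lines on stdin and prints "deadline=<n> hbone_blocked=<n>".
classify_ztunnel_errors() {
  awk '
    # Generic timeout: the 10s deadline expired.
    /deadline has elapsed/ { deadline++ }
    # Timeout that ztunnel attributes to blocked HBONE traffic on port 15008.
    /maybe a NetworkPolicy is blocking HBONE port 15008/ { hbone++ }
    # Unset awk counters format as 0 with %d.
    END { printf "deadline=%d hbone_blocked=%d\n", deadline, hbone }
  '
}
```

Example usage: `kubectl logs ds/ztunnel -n istio-system | classify_ztunnel_errors`. Note that `kubectl logs ds/...` reads from a single pod of the DaemonSet, so point it at the ztunnel pod on the affected node when the counts look empty. A nonzero `hbone_blocked` count suggests checking port 15008 first.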

Mitigation

1. **Allow TCP 15008** on every path (NetworkPolicy, security groups, node firewall).
2. Verify ztunnel reachability (requires `nc` in the container image):

   ```bash
   kubectl exec -it <pod> -- \
     nc -vz "$(kubectl get pod -l app=ztunnel -n istio-system \
       -o jsonpath='{.items[0].status.podIP}')" 15008
   ```

3. If quick relief is needed, remove the `istio.io/dataplane-mode=ambient` label from the namespace or pod to fall back to sidecars (only if sidecar injection is enabled and the pods are restarted) or to no mesh.
4. Upgrade to **Istio ≥ 1.24** (Ambient GA) or at least 1.23.4, which resolves several timeout bugs.
5. For Cilium, set `bpf.masquerade=false`; for Calico, upgrade to ≥ 3.29 or disable the eBPF dataplane.
6. After making changes, follow the ztunnel logs with `kubectl logs ds/ztunnel -n istio-system -f` (ztunnel runs as a DaemonSet; this follows one of its pods) and confirm that `deadline has elapsed` messages no longer appear.
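For the Kubernetes side of step 1, a NetworkPolicy that admits HBONE traffic might look like the sketch below. The policy name `allow-hbone-15008` and the namespace `my-app` are hypothetical. Because NetworkPolicies are additive, this widens whatever stricter ingress policy already selects those pods. It admits TCP 15008 from any source, and it does not cover egress policies, cloud security groups, or node firewalls; those must be opened separately.

```bash
# Sketch: admit HBONE (TCP 15008) to every pod in the hypothetical namespace "my-app".
# An ingress rule with only "ports" and no "from" allows that port from any source.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-hbone-15008   # hypothetical name
  namespace: my-app         # hypothetical namespace
spec:
  podSelector: {}           # selects all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 15008
EOF
```

If the namespace also restricts egress, a matching `Egress` rule for TCP 15008 would be needed as well.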

References