securek8s

How my NetworkPolicy silently turned ingress-nginx into a 5-second tarpit

18 May 2026 · networkpolicy · ingress-nginx · calico · rke2

This was supposed to be a celebration post. The site you’re reading had just gone live: an Astro static build served by a hardened nginx pod, two replicas, a namespace enforcing the PSA restricted profile, and a NetworkPolicy locking it down. The cert was green. curl -I https://securek8s.de returned 200. I clicked around to admire my work and noticed that half the page loads felt strange. Not broken. Just off.

So I measured.

The symptom

$ for p in / /blog/ /blog/hello-world/ /projects/ /infrastructure/ /about/ /rss.xml; do
    curl -ks -o /dev/null \
      -w "%{http_code} %{size_download}B  total=%{time_total}s  $p\n" \
      "https://securek8s.de$p"
  done
200 4487B  total=0.177s  /
200 3435B  total=0.131s  /blog/
200 4019B  total=5.174s  /blog/hello-world/
200 3667B  total=5.210s  /projects/
200 4271B  total=5.158s  /infrastructure/
200 3379B  total=5.211s  /about/
200  499B  total=5.187s  /rss.xml

200 OK across the board. No errors, and no retries from the client’s perspective. But five of the seven requests took roughly five seconds, and the other two finished in well under 200 milliseconds. Same domain, a fresh TLS handshake on every request, payloads of the same order of magnitude, all static files served from a read-only filesystem.

Five seconds is a suspiciously round number.
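One way to make that shape explicit is to bucket the timings around a threshold and look at the spread of each group. A minimal bash and awk sketch; the function name bucket_timings and the default 1-second threshold are my own choices, not anything curl provides:

```shell
# Bucket request timings into "fast" and "stalled" and print each group's
# count and spread. Values may carry curl's trailing "s" (e.g. 5.174s).
# Assumption: the 1-second default threshold is arbitrary; pick any value
# that sits between the two clusters you actually observe.
bucket_timings() {
  awk -v t="${1:-1}" '
    {
      for (i = 1; i <= NF; i++) {
        v = $i
        sub(/s$/, "", v)        # drop a trailing "s"
        v += 0                  # force numeric
        b = (v < t) ? "fast" : "stalled"
        n[b]++
        if (!(b in lo) || v < lo[b]) lo[b] = v
        if (!(b in hi) || v > hi[b]) hi[b] = v
      }
    }
    END {
      split("fast stalled", order, " ")
      for (k = 1; k <= 2; k++) {
        b = order[k]
        if (b in n) printf "%-7s n=%d min=%.2fs max=%.2fs\n", b, n[b], lo[b], hi[b]
        else        printf "%-7s n=0\n", b
      }
    }'
}
```

For example, piping in the seven total= values from the run above (0.177 0.131 5.174 5.210 5.158 5.211 5.187) yields two fast requests between 0.13s and 0.18s and five stalled ones between 5.16s and 5.21s, with nothing in the gap.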

Where the latency wasn’t

A static site has a short list of suspects. I crossed them off in order.

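Not the pod. Port-forwarding directly to the Deployment bypassed everything upstream of the container: no Service, no kube-proxy, no ingress, no DNS, and, because the connection is opened from inside the pod’s network namespace, no NetworkPolicy either: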

$ kubectl -n securek8s port-forward deploy/web-securek8s 18088:8088 &
$ for i in 1 2 3 4 5; do
    curl -s -o /dev/null -w "  pf run $i: %{time_total}s\n" \
      http://127.0.0.1:18088/about/
  done
  pf run 1: 0.277831s
  pf run 2: 0.259784s
  pf run 3: 0.257599s
  pf run 4: 0.196659s
  pf run 5: 0.257579s

Five consecutive sub-300-ms responses. The pod was fine. Whatever was eating the five seconds lived between the client and the pod.

Not a restart loop. The pods showed RESTARTS 0, 87 minutes of uptime, and zero events in the namespace.

Not DNS, not IPv6. Only an A record exists for securek8s.de. Forcing -4 on curl reproduced the issue. So did skipping DNS entirely with --resolve securek8s.de:443:91.98.218.42.

That left the path between curl and the pod: the public IP, the ingress controller, the Service, the CNI, the NetworkPolicy. Time to look at each.

Pinning the bad hop

The cluster’s ingress-nginx runs as a DaemonSet with hostNetwork: true on two worker nodes. So I targeted the front IP and then each worker directly with --resolve, ten requests each:

=== 10x via 91.98.218.42 (Hetzner front IP) ===
  5.27s 5.18s 5.13s 5.19s 5.20s 5.16s 5.16s 5.16s 0.12s 5.16s

=== 10x directly to worker-1 (167.235.37.40) ===
  5.13s 5.16s 5.11s 5.17s 5.13s 5.25s 5.19s 5.18s 0.12s 5.24s

=== 10x directly to worker-2 (168.119.213.133) ===
  0.17s 5.13s 5.11s 0.10s 5.11s 5.13s 0.11s 0.16s 5.14s 5.13s
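The per-target runs above can be reproduced with a loop along these lines. A bash sketch: time_via and PATH_UNDER_TEST are my own names, and CURL is overridable only so the loop can be dry-run with a stub instead of real requests:

```shell
# Time repeated requests for one path through a specific IP, skipping DNS
# with curl --resolve so each ingress node (or the front IP) is hit directly.
# Host and path are the ones from this post; substitute your own.
HOST="securek8s.de"
PATH_UNDER_TEST="/about/"
: "${CURL:=curl}"   # override with a stub for a dry run

time_via() {
  local ip="$1" runs="${2:-10}" i
  for ((i = 1; i <= runs; i++)); do
    "$CURL" -ks -o /dev/null -w '%{time_total}s ' \
      --resolve "$HOST:443:$ip" "https://$HOST$PATH_UNDER_TEST"
  done
  echo
}

# Usage:
#   for ip in 91.98.218.42 167.235.37.40 168.119.213.133; do
#     echo "=== 10x via $ip ==="
#     time_via "$ip" 10
#   done
```

curl prints time_total with microsecond precision, so the raw output carries more decimals than the rounded numbers shown above.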

The pattern wasn’t random.

  • Worker-1: nine out of ten requests stalled.
  • Worker-2: roughly fifty-fifty.

Same Service. Same Deployment. Same two pods. Same NetworkPolicy. The only thing that changes when curl hits worker-1 versus worker-2 is which ingress-nginx pod terminates the TLS, and which backend pod it picks for proxying.

ingress-nginx on a node sometimes proxies to a backend pod on the same node (fast — packet never leaves the host), and sometimes to a backend pod on the other node (slow — packet crosses the CNI overlay). Worker-2 was 50:50 because one of the two backend pods happened to live there. Worker-1 was 90:10 because neither pod ran locally and one routing path was reliably broken.

Cross-node pod-to-pod traffic on the cluster was tarpitting.
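Which of those two cases a given ingress node hits depends only on where the backend pods are scheduled. A small bash sketch (classify_backends is my own name; node names like worker-1 are hypothetical) that labels each backend as local or cross-node for one ingress node, fed from kubectl’s jsonpath output for the web pods:

```shell
# Label each backend pod as "local" (same node as this ingress replica) or
# "cross-node". Reads "pod node" pairs on stdin, for example from:
#   kubectl -n securek8s get pods -l app.kubernetes.io/component=web \
#     -o jsonpath='{range .items[*]}{.metadata.name} {.spec.nodeName}{"\n"}{end}'
classify_backends() {
  local ingress_node="$1"
  awk -v me="$ingress_node" 'NF == 2 { print $1, ($2 == me ? "local" : "cross-node") }'
}

# Usage: <kubectl command above> | classify_backends worker-1
```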

The NetworkPolicy

Here is what the chart had installed, condensed:

# default deny everything for pods in securek8s
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
# allow ingress from the ingress controller namespace
spec:
  podSelector: { matchLabels: { app.kubernetes.io/component: web } }
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - { protocol: TCP, port: 8088 }
---
# allow egress for DNS only
spec:
  podSelector: { matchLabels: { app.kubernetes.io/component: web } }
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }

This reads correct. Default-deny, allow the ingress controller in, allow CoreDNS out. It is the textbook NetworkPolicy for a static site. Lint-clean. Reviewer-clean.

It is also wrong, for a reason that doesn’t appear in the YAML.

Why hostNetwork breaks namespaceSelector

NetworkPolicy classifies traffic by the source pod’s labels. The CNI looks at the packet’s source IP, maps it to a pod, looks up that pod’s namespace and labels, and decides whether to allow it.

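hostNetwork: true short-circuits that map. The ingress-nginx pod has no IP of its own; it shares the node’s IP. From the CNI’s point of view, the incoming packet comes from the host, not from a pod. There is no pod object to look up and no ingress-nginx namespace label to match, so the namespaceSelector: ingress-nginx rule never fires for this traffic. The Kubernetes documentation calls NetworkPolicy behavior for hostNetwork pods undefined, and notes that the most common implementation treats their traffic like any other traffic from the node’s IP.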

Default-deny wins.
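The shared address is visible from the API: a hostNetwork pod reports the node’s IP as its podIP. A bash and awk sketch (hostnet_pods is my own name) that filters kubectl’s jsonpath output down to those pods:

```shell
# Print pods that run with hostNetwork: true and flag the ones whose podIP
# equals their hostIP. Reads "ns/name hostNetwork podIP hostIP" lines, e.g.:
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.spec.hostNetwork} {.status.podIP} {.status.hostIP}{"\n"}{end}'
# Pods without hostNetwork print an empty second field, so awk sees the
# podIP there instead and the "true" check skips them.
hostnet_pods() {
  awk '$2 == "true" {
    printf "%s podIP=%s hostIP=%s%s\n", $1, $3, $4, ($3 == $4 ? " (shares node IP)" : "")
  }'
}
```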

It didn’t hang completely because Kubernetes NetworkPolicy always allows connections into a pod from the node that pod runs on. When ingress-nginx proxies to a backend on its own node, the packet comes from that node and gets through; on worker-2, that same-node case covered half the requests. The cross-node case, where ingress-nginx on one node proxies to a pod on the other, is the path default-deny refuses.

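The reason it stalled instead of failing is ingress-nginx’s upstream retry. Its default proxy-connect-timeout is five seconds, and its default proxy-next-upstream setting (error timeout) hands a timed-out connect to the next endpoint. The SYN to the unreachable pod goes unanswered, nginx abandons that attempt after five seconds, retries against the other pod, and that attempt succeeds. The client sees no error, just an awkward pause. Kernel TCP timers don’t produce the number: SYN retransmissions fire at roughly 1, 3 and 7 seconds, and tcp_retries2 governs established connections, not connection setup.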
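The “fast or five seconds” shape follows from that retry. A toy model rather than a measurement, in bash and awk: assume each request’s first upstream choice is either answered or silently dropped, a dropped attempt costs exactly a fixed timeout T before the retry, and the retry always succeeds in BASE seconds. The function name and the D/A encoding are hypothetical:

```shell
# Toy latency model: one letter per request in PATTERN.
#   D = first upstream choice silently dropped -> wait T, then retry succeeds
#   A = first upstream choice answers          -> BASE only
# Assumptions: T and BASE are fixed; real timings add jitter around both.
model_latency() {
  local t="$1" base="$2" pattern="$3"
  awk -v t="$t" -v base="$base" -v p="$pattern" 'BEGIN {
    out = ""
    for (i = 1; i <= length(p); i++) {
      c = substr(p, i, 1)
      v = (c == "D") ? t + base : base
      sep = (i > 1) ? " " : ""
      out = out sep sprintf("%.2fs", v)
    }
    print out
  }'
}
```

With T = 5 and BASE = 0.15, model_latency 5 0.15 DAAD prints 5.15s 0.15s 0.15s 5.15s: every request lands in one of two clusters, and the share of slow ones is simply the share of first attempts that hit the dropped path.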

The fix

There are three working options, in increasing order of correctness.

1. Drop NetworkPolicy and rely on the rest of the stack. Pragmatic for a public static site that has nothing internal to protect. Pod hardening (non-root, read-only rootfs, all caps dropped, RuntimeDefault seccomp) and PSA restricted are doing the load-bearing security work. This was the temporary fix while I diagnosed.

2. Add an ipBlock rule for the worker subnets. Match the host-network traffic by source IP instead of by pod label:

ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: ingress-nginx
      - ipBlock:
          cidr: 167.235.37.0/24       # worker-1 subnet
      - ipBlock:
          cidr: 168.119.213.0/24      # worker-2 subnet
    ports:
      - { protocol: TCP, port: 8088 }

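Functional, but brittle. New workers, IP renumbering, or a CIDR-block change all silently re-break the path. It’s also strictly less specific than the namespaceSelector rule was supposed to be. And check which source address actually arrives before relying on it: depending on the CNI’s encapsulation mode, host-originated traffic to a pod on another node can leave through the overlay’s tunnel interface and carry that interface’s pod-CIDR address instead of the node’s public IP.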
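Because an ipBlock matches only the source address that actually reaches the pod, it’s worth testing observed addresses against the CIDRs before trusting them. A bash sketch (ip_to_int and ip_in_cidr are my own helper names; 10.42.1.0 in the usage comment is a hypothetical overlay address) that checks IPv4 membership:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit 0) if IPv4 address $1 lies inside CIDR $2, e.g. 10.0.0.0/8.
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}" mask
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Usage: capture SYNs arriving at the web pods on the destination node, e.g.
#   tcpdump -ni any 'tcp dst port 8088 and tcp[tcpflags] & tcp-syn != 0'
# then check each observed source address:
#   ip_in_cidr 167.235.37.40 167.235.37.0/24 && echo matched    # worker-1 node IP
#   ip_in_cidr 10.42.1.0 167.235.37.0/24 || echo "not matched"  # hypothetical tunnel IP
```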

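3. Switch the CNI to Cilium. Cilium doesn’t give hostNetwork pods labels of their own either: their traffic carries the reserved host identity on the local node and remote-node on every other node, so the original namespaceSelector: ingress-nginx rule still won’t match it. What Cilium adds is a way to name that traffic without IP addresses. A CiliumNetworkPolicy rule with fromEntities: host and remote-node survives new workers and renumbering. That’s a bigger change than the bug warrants, but it removes the CIDR bookkeeping that makes option 2 brittle.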
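Under Cilium, traffic from the nodes themselves, hostNetwork pods included, can also be admitted by entity rather than by address, using Cilium’s own CRD. A bash sketch that prints such a policy for kubectl apply; it assumes Cilium is the CNI with its CRDs installed, the function and policy names are hypothetical, and the label and port come from the chart’s manifests above:

```shell
# Print a CiliumNetworkPolicy that lets node-originated traffic (the "host"
# and "remote-node" entities, which cover hostNetwork pods) reach the web
# pods on one TCP port. Namespace and port are arguments.
# Assumptions: Cilium CRDs are installed; the policy name is hypothetical.
cilium_node_ingress_policy() {
  local ns="$1" port="$2"
  cat <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-web-from-nodes
  namespace: ${ns}
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/component: web
  ingress:
    - fromEntities:
        - host
        - remote-node
      toPorts:
        - ports:
            - port: "${port}"
              protocol: TCP
EOF
}

# Usage: cilium_node_ingress_policy securek8s 8088 | kubectl apply -f -
```

This admits traffic from every process on every node, not only ingress-nginx, so it’s broader than what the namespaceSelector rule was meant to express; the trade is that nothing in it depends on node IPs.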

There’s also a fourth option: run ingress-nginx without hostNetwork, behind a LoadBalancer Service. The controller pods then have their own pod IPs, so the namespaceSelector approach works as written, at the cost of an extra hop and, in the cloud, a per-load-balancer bill. Worth it on managed clusters; on a homelab with hostNetwork ingress, less so.

What I’d remember

  • NetworkPolicy plus hostNetwork pods is a known sharp edge, not a bug in your YAML. The YAML lints. The drop is invisible.
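  • Five seconds in a curl is almost never application latency. It’s usually a fixed timeout firing: ingress-nginx’s default proxy-connect-timeout is 5 seconds, and so is the glibc resolver’s default wait before it retries against the next nameserver.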
  • Half-broken intermittent slowness is the fingerprint of a partial network drop, especially when “fast or 5s” appears with no values in between.
  • When you can, prove the pod is innocent before you investigate the network. kubectl port-forward to the Deployment took ninety seconds and saved an hour.

The site is fast now. The NetworkPolicy is off until the chart grows a Cilium-compatible variant or an ipBlock overlay. Either one is a separate PR.