When a security incident occurs in your Kubernetes cluster, having a practiced response plan is essential. Here’s a playbook for common scenarios.
Preparation
Before incidents happen:
- Document your architecture
- Set up monitoring and alerting
- Practice incident response
- Prepare forensic tools
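One preparation step that can be staged ahead of time is a quarantine policy, so that isolating a pod during an incident is a single `kubectl label` call. The following is a minimal sketch, not a definitive setup: the function name `quarantine_policy`, the policy name `quarantine`, and the `quarantine=true` label are all illustrative assumptions, and the policy only takes effect if your CNI plugin enforces NetworkPolicy.

```shell
# Print a deny-all NetworkPolicy manifest for the given namespace.
# Pods labeled quarantine=true are selected; because no ingress or egress
# rules are listed, all traffic to and from those pods is denied.
# The names "quarantine" and "quarantine=true" are hypothetical choices.
quarantine_policy() {
  local namespace="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine
  namespace: ${namespace}
spec:
  podSelector:
    matchLabels:
      quarantine: "true"
  policyTypes:
    - Ingress
    - Egress
EOF
}
```

To stage it in a namespace, pipe the output to kubectl, for example `quarantine_policy production | kubectl apply -f -`. NetworkPolicy objects are namespaced, so this would need to be repeated per namespace.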
Common Incident Types
Compromised Pod
Detection signals:
- Unusual network connections
- Unexpected processes
- Cryptomining activity
- Data exfiltration attempts
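The "unexpected processes" signal can be checked roughly by comparing what is running in a pod against a list of process names you expect. The sketch below assumes a hypothetical allowlist file with one command name per line; the helper name `unexpected_processes` is illustrative, and the usage line assumes the container image ships a `ps` that supports `-eo comm=` (many minimal images do not).

```shell
# Read observed process names on stdin and print those that are
# not listed in the allowlist file given as $1 (one name per line).
unexpected_processes() {
  local allowlist="$1"
  # -F: fixed strings, -x: match whole lines, -v: invert, -f: patterns from file
  sort -u | grep -v -x -F -f "$allowlist"
}

# Hypothetical usage against a running pod:
#   kubectl exec compromised-pod -- ps -eo comm= | unexpected_processes allowed.txt
```

A name match is a weak signal on its own, since an attacker can rename a binary, so treat the output as a starting point for investigation rather than a verdict.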
Response steps:
BASH
# 1. Capture pod state
kubectl get pod compromised-pod -o yaml > pod-state.yaml
# 2. Capture logs
kubectl logs compromised-pod > pod-logs.txt
# 3. Isolate the pod from the network (assumes a deny-all NetworkPolicy selecting quarantine=true already exists and your CNI, such as Cilium, enforces it)
kubectl label pod compromised-pod quarantine=true
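# 4. Create a forensic snapshot (stream it out instead of writing inside the pod; /proc is a virtual filesystem and does not archive cleanly)
kubectl exec compromised-pod -- tar czf - /etc > evidence-etc.tar.gz
kubectl exec compromised-pod -- cat /proc/net/tcp > proc-net-tcp.txt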
# 5. Delete and replace
kubectl delete pod compromised-pod
Stolen Credentials
Detection signals:
- API calls from unexpected IPs
- Access during unusual hours
- Privilege escalation attempts
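To make the "API calls from unexpected IPs" signal concrete, the audit log can be summarized by source IP for a single user. This is a rough sketch that assumes the API server writes audit events as one JSON object per line with `user.username` and `sourceIPs` fields; the function name `audit_source_ips` is illustrative, and it uses plain text matching rather than a JSON parser, so a tool like jq would be more robust where available.

```shell
# Count audit events per source IP for one user.
# $1: username to look for, $2: path to a JSON-lines audit log.
# Only the first entry of each event's sourceIPs array is counted.
audit_source_ips() {
  local user="$1" log="$2"
  grep -F "\"username\":\"${user}\"" "$log" \
    | grep -o '"sourceIPs":\["[^"]*"' \
    | cut -d'"' -f4 \
    | sort | uniq -c | sort -rn
}
```

For example, `audit_source_ips compromised-user /var/log/kubernetes/audit.log` prints one line per IP with its request count, most frequent first. IPs that appear only a few times or fall outside your known ranges are the ones to look at first.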
Response steps:
BASH
# 1. Identify the compromised credential
kubectl get secrets -A
kubectl auth can-i --list --as=system:serviceaccount:ns:sa
# 2. Rotate credentials
kubectl delete secret compromised-secret
kubectl create secret generic new-secret --from-literal=key=newvalue
# 3. Review audit logs
grep "username\":\"compromised-user" /var/log/kubernetes/audit.log
# 4. Revoke access
kubectl delete rolebinding suspicious-binding
Malicious Container Image
Response steps:
BASH
# 1. Find all pods using the image
kubectl get pods -A -o json | jq -r '.items[] | select(any(.spec.containers[]; .image | contains("malicious-image"))) | "\(.metadata.namespace)/\(.metadata.name)"'
# 2. Block the image
# Add an admission policy that rejects the image (for example with Kyverno or OPA Gatekeeper)
# 3. Delete affected pods
kubectl delete pods -l app=affected-app
# 4. Scan other images
trivy image --severity CRITICAL myregistry/myimage
Forensic Tools
Essential tools to have ready:
- kubectl-forensic: Capture pod state
- inspektor-gadget: eBPF-based investigation
- tracee: Runtime security forensics
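Having tools "ready" is easy to verify before an incident with a small preflight check. The sketch below is one possible approach; `check_tools` is a hypothetical helper name, and it only confirms that a command is on `PATH`, not that it is the right version or can reach the cluster.

```shell
# Report which of the given commands are missing from PATH.
# Prints "missing: <name>" for each one and returns non-zero if any are missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

A call such as `check_tools kubectl jq trivy` covers the commands used in the response steps above and could run as part of a periodic incident response drill.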
Post-Incident
- Conduct post-mortem
- Document lessons learned
- Update runbooks
- Implement preventive controls
A well-practiced response minimizes damage and recovery time.