Troubleshooting Common Kubernetes Performance Bottlenecks

Learn to systematically diagnose and resolve common Kubernetes performance bottlenecks, including CPU throttling, memory OOMKills, and scheduling delays. This guide provides actionable commands and best practices for tuning resource requests, optimizing HPA scaling, and identifying underlying cluster constraints to ensure optimal application performance.

Kubernetes performance problems rarely announce themselves as "Kubernetes is slow." You see a rollout that hangs, an API that suddenly returns 5xx errors, a queue that stops draining, or pods that look healthy while users complain about latency. The cluster is only one part of that story, but it is the part you can inspect with a consistent set of commands.

The trick is to avoid jumping straight to node size or replica count. First decide what kind of bottleneck you have: CPU throttling, memory pressure, scheduling delay, slow scaling, network latency, storage latency, or an application that is simply doing more work than expected.

Phase 1: Identifying the Symptoms

Before diving into specific components, clearly define the observed performance degradation. Most symptoms fall into one of these categories:

  • Slow Deployments/Updates: Pod creation takes an excessive amount of time, or rolling updates stall.
  • Unresponsive Applications: Pods are running but failing to respond to application-level traffic (e.g., high latency, 5xx errors).
  • High Resource Spikes: Unexplained CPU or memory utilization spikes across nodes or specific deployments.
  • Scheduling Delays: New pods remain in the Pending state indefinitely.

Phase 2: Diagnosing Resource Constraints (CPU and Memory)

Resource mismanagement is one of the most frequent causes of Kubernetes performance issues. Requests and limits that do not match real usage lead to throttling, OOMKills, or overcommitted nodes.

1. Checking Resource Utilization and Limits

Start by inspecting the resource allocations for the affected application using kubectl describe and kubectl top.

Actionable Check: Compare the configured requests and limits against the actual usage reported by the Metrics Server, which is what kubectl top reads from.

# Get resource usage for all pods in a namespace
kubectl top pods -n <namespace>

# Examine resource requests/limits for a specific pod
kubectl describe pod <pod-name> -n <namespace>

Also inspect the owning workload so you understand whether the problem affects one pod or the whole Deployment:

kubectl get deploy <deployment-name> -n <namespace> -o yaml
kubectl get pods -n <namespace> -l app=<label> -o wide

If only one pod is slow and it sits on a different node from the others, node-level pressure is more likely. If every replica is slow, the resource settings, downstream dependencies, or application behavior deserve more attention.

2. CPU Throttling

If a container uses up its CPU limit within a scheduling period (100 ms by default), the kernel throttles it until the next period begins, leading to severe latency spikes even if the node itself has available capacity. This is often mistaken for general CPU starvation.

Diagnosis Tip: Look for high latency responses, even when kubectl top doesn't show 100% CPU usage on the node. Throttling happens per container.

For deeper confirmation, use your metrics system if it exposes container CPU throttling metrics. In Prometheus-based setups, teams often watch metrics such as throttled CPU periods alongside request latency. Raw CPU usage alone can hide throttling because a container can be throttled before it ever appears to use a full node core.
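As a rough illustration of what "throttled periods" means, the sketch below computes the share of scheduling periods in which a container was throttled, using the contents of its cgroup cpu.stat file. The helper name throttle_pct is made up for this example. The nr_periods and nr_throttled fields exist in both cgroup v1 and v2, but the file path in the usage comment assumes cgroup v2 and an image that ships cat. In Prometheus setups that scrape cAdvisor, the same ratio is usually built from container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total.

```shell
# throttle_pct (hypothetical helper): read cpu.stat content on stdin and print
# the percentage of CPU scheduling periods in which the cgroup was throttled.
throttle_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) { printf "%.1f\n", 100 * throttled / periods }
      else             { print "0.0" }
    }'
}

# Example usage (path assumes cgroup v2; v1 nodes use a different path):
#   kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/cpu.stat | throttle_pct
```

The counters are cumulative since the container started, so compare two readings taken a few minutes apart if you care about current behavior rather than lifetime totals.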

Resolution:

  • Increase the CPU limit if the workload legitimately requires more processing power.
  • If the application is busy-waiting, optimize the application code rather than simply increasing limits.
  • Consider removing CPU limits for some latency-sensitive services while keeping CPU requests, if that matches your platform policy. This avoids hard throttling while still giving the scheduler useful placement information.

3. Memory Pressure and OOMKills

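If a container exceeds its memory limit, the Linux kernel's Out-Of-Memory (OOM) killer terminates it, and the kubelet restarts the container according to the pod's restart policy. A leaking or undersized container can repeat this cycle indefinitely.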

Diagnosis: Check for frequent restarts (the RESTARTS column in kubectl get pods) and look for Reason: OOMKilled under Last State in the output of kubectl describe pod.

# List recent events for the pod (look for restarts, back-off, and evictions)
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>
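To make the restart check repeatable, a small filter over kubectl get pods output can flag pods worth a closer look. This is a sketch only: pods_with_restarts is a hypothetical helper name, the default threshold of 3 is arbitrary, and the column positions assume the default single-namespace output (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# pods_with_restarts (hypothetical helper): read `kubectl get pods` output on
# stdin and print "<pod> <restarts>" for pods at or above a restart threshold.
pods_with_restarts() {
  awk -v min="${1:-3}" 'NR > 1 && $4 + 0 >= min { print $1, $4 }'
}

# Example usage:
#   kubectl get pods -n <namespace> | pods_with_restarts 3
#
# For a flagged pod, the last termination reason can be read directly:
#   kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# A value of OOMKilled means the container was terminated by the OOM killer.
```

Other termination reasons, such as Error, point away from memory and toward probes or application crashes.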

Resolution:

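  • If OOMKills are frequent, raise the memory limit as a short-term mitigation so the service stays up while you investigate.
  • For long-term fixes, profile the application to find and fix memory leaks, or size the runtime heap so it fits inside the container limit.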

Memory behaves differently from CPU. CPU can be throttled and the process keeps running slowly. Memory limit breaches usually end with the process being killed. That makes memory issues look like reliability incidents: restarts, dropped connections, cold caches, and failed in-flight requests.

Best Practice: Set Requests Wisely. Set resource requests close to the workload's typical steady-state usage. If requests are too low, the scheduler packs more pods onto a node than it can serve, leading to contention when those pods hit their peak demand simultaneously.

Phase 3: Investigating Scheduling Bottlenecks

When pods remain in the Pending state, the issue lies in the scheduler's inability to find a suitable node.

1. Analyzing Pending Pods

Use kubectl describe pod on a pending pod to read the Events section. This section usually contains a clear explanation for the failure to schedule.

Common Scheduler Messages:

  • 0/3 nodes are available: 3 Insufficient cpu. (Node capacity issue)
  • 0/3 nodes are available: 3 node(s) had taint {dedicated: infra}, that the pod didn't tolerate. (Taints/Tolerations mismatch)
  • 0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/memory-pressure: }, that the pod didn't tolerate. (Node under pressure; Kubernetes applies this taint automatically)
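When many pods are pending at once, it can help to bucket the scheduler messages instead of reading them one by one. The sketch below is an illustration under loose assumptions: classify_sched_msg is a hypothetical helper, its categories are invented for this example, and the patterns match only the message shapes listed above, whose exact wording varies between Kubernetes versions.

```shell
# classify_sched_msg (hypothetical helper): map a FailedScheduling message to a
# coarse category. The patterns are illustrative, not an exhaustive list.
classify_sched_msg() {
  case "$1" in
    *"Insufficient "*)     echo "capacity" ;;
    *taint*)               echo "taint" ;;
    *affinity*|*selector*) echo "placement" ;;
    *)                     echo "other" ;;
  esac
}

# The raw messages can be listed with:
#   kubectl get events -n <namespace> --field-selector reason=FailedScheduling
```

A message that names several causes lands in the first matching category, so treat the result as a starting point rather than a verdict.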

2. Cluster Resource Saturation

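If the events report insufficient CPU or memory, no single node has enough unreserved capacity to fit the pod. The scheduler works from resource requests rather than live usage, so a cluster can be "full" on paper while its nodes sit mostly idle.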

Resolution:

  • Add more nodes to the cluster.
  • Verify that node utilization is not artificially high due to misconfigured resource requests (see Phase 2).
  • Use Cluster Autoscaler (CA) if running on cloud providers to dynamically add nodes when pending pods accumulate.

If Cluster Autoscaler is enabled but nodes are not being added, read its logs before assuming the cloud provider is broken. Autoscaler may refuse to add nodes because node groups hit their maximum size, the pending pod has constraints no node group can satisfy, or quotas prevent new instances.

Phase 4: Performance Issues in Scaling Mechanisms

Automated scaling should react quickly, but misconfigured Horizontal Pod Autoscalers (HPA) or Vertical Pod Autoscalers (VPA) can scale too late or disrupt running pods.

1. Horizontal Pod Autoscaler (HPA) Lag

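For CPU and memory targets, HPA reads resource metrics from the Metrics Server through the metrics.k8s.io API. Custom and external metrics come from a separate metrics adapter (for example, one backed by Prometheus) through the custom.metrics.k8s.io and external.metrics.k8s.io APIs.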

Diagnosis Steps:

  1. Verify Metrics Server Health: Ensure the Metrics Server is running and accessible.
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
    
  2. Check HPA Status: Inspect the HPA configuration and recent events.
    kubectl describe hpa <hpa-name> -n <namespace>
    
    Look for messages indicating if the metrics source is unavailable or if the scaling decision loop is functioning.

Bottlenecks: If custom metrics are used, ensure the external metric provider is functioning correctly and reporting data often enough for the HPA to make useful decisions.

HPA is reactive. It does not know a traffic spike is coming unless your metric reflects it. For workloads with sudden bursts, you may need higher minimum replicas, faster custom metrics, queue-based scaling, or pre-scaling before known events.
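To see why HPA can only follow the metric, it helps to work through the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric). The sketch below implements only that core formula. The function name is hypothetical, and it deliberately ignores the tolerance band, minReplicas/maxReplicas, and stabilization windows that the real controller also applies.

```shell
# hpa_desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
# (hypothetical helper) prints ceil(current_replicas * current_metric / target_metric).
hpa_desired_replicas() {
  awk -v r="$1" -v cur="$2" -v tgt="$3" 'BEGIN {
    x = r * cur / tgt
    n = int(x)
    if (x > n) { n++ }   # round up any fractional replica
    print n
  }'
}

# Example: 4 replicas averaging 90% CPU against a 60% target
#   hpa_desired_replicas 4 90 60
```

With 4 replicas at 90% against a 60% target, the formula asks for 6 replicas, but only after the metric has already risen, which is why bursty workloads often need a higher floor or a leading indicator such as queue depth.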

2. Vertical Pod Autoscaler (VPA) Interactions

While VPA automatically adjusts resource requests, it can cause performance instability during its adjustment phase if it frequently restarts or resizes pods, especially for stateful applications that cannot tolerate restarts.

Recommendation: Start with VPA in recommendation-only mode by setting updateMode: "Off". The VPA then publishes recommendations without evicting or resizing pods, so you can review them before allowing automatic updates.
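A recommendation-only VPA object might look like the sketch below. It assumes the VPA components and CRDs are already installed in the cluster and that the target is a Deployment. The helper name vpa_off_manifest and the "-vpa" naming suffix are placeholders chosen for this example.

```shell
# vpa_off_manifest NAME (hypothetical helper): print a VerticalPodAutoscaler
# manifest that only records recommendations for the Deployment called NAME.
vpa_off_manifest() {
  cat <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: $1-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $1
  updatePolicy:
    updateMode: "Off"   # recommend only; never evict or resize pods
EOF
}

# Example usage:
#   vpa_off_manifest <deployment-name> | kubectl apply -n <namespace> -f -
#   kubectl describe vpa <deployment-name>-vpa -n <namespace>
```

After the recommender has observed the workload for a while, the describe output lists suggested requests per container, which you can compare against the values currently set on the Deployment.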

Phase 5: Network and Storage Performance

When compute resources look fine, networking or persistent storage might be the choke point.

1. CNI (Container Network Interface) Issues

If communication between pods (especially across nodes) is slow or failing intermittently, the CNI plugin might be overloaded or misconfigured.

Troubleshooting:

  • Check the logs of the CNI daemonset pods (e.g., Calico, Flannel).
  • Test basic connectivity using ping or curl between pods on different nodes.
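A single curl tells you little about intermittent slowness, so it is worth repeating the request and summarizing the timings. The sketch below is one way to do that. latency_summary is a hypothetical helper, and the usage comment assumes the source pod's image includes curl and that the target pod serves HTTP on the port you pass.

```shell
# latency_summary (hypothetical helper): read one duration in seconds per line
# on stdin and print the minimum, average, and maximum.
latency_summary() {
  awk '
    NR == 1 { min = $1; max = $1 }
    { sum += $1; if ($1 < min) { min = $1 }; if ($1 > max) { max = $1 } }
    END { if (NR > 0) { printf "min=%.3f avg=%.3f max=%.3f\n", min, sum / NR, max } }'
}

# Example usage: ten timed requests from one pod to a pod on another node
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#     kubectl exec <source-pod> -n <namespace> -- \
#       curl -s -o /dev/null -w '%{time_total}\n' http://<target-pod-ip>:<port>/
#   done | latency_summary
```

Running the same loop against a pod on the same node gives a baseline. A large gap between the two results points at the cross-node path rather than the application.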

2. Persistent Volume (PV) Latency

Applications relying heavily on disk I/O (databases, logging systems) will suffer if the underlying Persistent Volume latency is high.

Actionable Check: Confirm the StorageClass and underlying volume type (e.g., AWS EBS gp3 vs. io1) and verify that the provisioned IOPS and throughput meet the workload's requirements.

Warning on Storage: Never run high-throughput databases directly on standard hostPath volumes without understanding the underlying disk performance characteristics. Use managed cloud storage solutions or high-performance local storage provisioners for demanding workloads.

Node-Level Bottlenecks

Sometimes every pod on a node gets slower at once. That is your cue to stop staring at one Deployment and inspect the node.

kubectl describe node <node-name>
kubectl top node <node-name>
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>

Look for MemoryPressure, DiskPressure, and PIDPressure conditions. Disk pressure is easy to overlook because the application symptom may be slow startup, image pull failures, or evictions rather than an obvious disk error.
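When you need to check several nodes, reading full describe output gets tedious. The sketch below filters node conditions down to the pressure conditions that are currently True. active_pressure is a hypothetical helper, and the "Type=Status" line format is simply what the jsonpath expression in the usage comment produces.

```shell
# active_pressure (hypothetical helper): read "Type=Status" lines on stdin and
# print the pressure conditions that are currently True.
active_pressure() {
  awk -F= '$1 ~ /Pressure$/ && $2 == "True" { print $1 }'
}

# Example usage:
#   kubectl get node <node-name> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | active_pressure
```

Empty output means no pressure condition is active on that node right now. It says nothing about pressure that has already cleared, so check node events for recent history.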

On the node itself, if you have access, check:

df -h
iostat -x 1
free -h
journalctl -u kubelet --since "30 minutes ago"

Managed Kubernetes services may limit direct node access, but the same idea still applies: use provider metrics, kubelet events, and node conditions to decide whether the node is the shared bottleneck.

Control Plane and API Pressure

Most application latency is not caused by the Kubernetes API server, because serving a user request usually does not involve an API server call. But control plane pressure can hurt operational performance: slow rollouts, delayed scheduling, slow endpoint updates, or controllers falling behind.

Symptoms include:

  • kubectl commands are slow across the cluster.
  • Deployments take longer than usual to create pods.
  • Controllers lag behind desired state.
  • Events show repeated API timeouts.

Check whether the problem affects normal application traffic or cluster operations. If only rollouts and scheduling are slow, look at API server health, controller manager behavior, admission webhooks, and etcd health in clusters where you manage the control plane.

Admission webhooks deserve special attention. A slow or unavailable webhook can delay pod creation even when nodes have plenty of capacity. If a rollout hangs at creation time and events mention webhook calls, investigate the webhook service before resizing nodes.

A Practical Troubleshooting Order

Start with the user-visible symptom:

  • Slow HTTP requests: compare app latency, CPU throttling, memory restarts, downstream latency, and network path.
  • Slow pod startup: check image pull time, scheduling events, volume attach time, and init containers.
  • Pods pending: check requests, node capacity, taints, affinity, quotas, and autoscaler limits.
  • Periodic latency spikes: check CPU throttling, garbage collection, noisy neighbors, storage latency, and HPA scale timing.
  • Random restarts: check OOMKilled, liveness probes, node pressure, and application logs from the previous container.

Then prove or eliminate one layer at a time. For example, if latency spikes line up exactly with CPU throttling, you have a strong lead. If latency spikes happen while CPU, memory, network, and storage all look calm, the bottleneck may be inside the application or a downstream service outside Kubernetes.

Request and Limit Tuning Without Guesswork

Bad resource settings create many performance problems:

  • Requests too low: the scheduler packs too many busy pods onto the same node.
  • Requests too high: pods stay pending even though actual usage is modest.
  • CPU limits too low: latency-sensitive apps get throttled.
  • Memory limits too low: containers get killed instead of slowing down.
  • No requests at all: scheduling becomes less predictable, and critical workloads may compete badly with noisy neighbors.

Use recent production metrics as a starting point, then leave headroom for normal spikes. For Java, Node.js, Go, Python, and database-like workloads, memory behavior can be very different, so avoid copying limits from one service to another just because the container image size looks similar.
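One simple way to turn usage samples into a starting value is to take the observed peak and add a fixed headroom percentage. The sketch below does that for CPU, under loud assumptions: suggest_cpu_request is a hypothetical helper, the 20% default headroom is arbitrary, and the input is expected to be repeated kubectl top pods --no-headers samples for a single workload. Your own policy may prefer a percentile over the peak.

```shell
# suggest_cpu_request [HEADROOM_PCT] (hypothetical helper): read
# "<pod> <cpu> <memory>" lines on stdin, find the peak CPU value, and print
# the peak plus headroom in millicores. Illustrative rule, not a standard.
suggest_cpu_request() {
  awk -v h="${1:-20}" '
    {
      v = $2
      if (v ~ /m$/) { sub(/m$/, "", v); v = v + 0 }   # "250m" -> 250
      else          { v = v * 1000 }                  # "1"    -> 1000
      if (v > peak) { peak = v }
    }
    END { printf "%.0fm\n", peak * (100 + h) / 100 }'
}

# Example usage, given samples collected over time in a file:
#   kubectl top pods -n <namespace> -l app=<label> --no-headers >> cpu-samples.txt
#   suggest_cpu_request 20 < cpu-samples.txt
```

Treat the result as a first guess to validate under load, not as a value to apply blindly. The same approach does not transfer directly to memory, where exceeding the limit ends in a kill rather than a slowdown.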

Next Steps

The best Kubernetes performance investigations are boring in a good way: define the symptom, check the pod, check the node, check scaling, then check network and storage. kubectl describe and kubectl top are only the start, but they usually tell you which direction is worth following.

  1. Implement robust Resource Quotas to prevent noisy neighbors from starving critical applications.
  2. Regularly review pod restart counts to catch subtle OOM or failing application behavior early.
  3. Use Prometheus/Grafana dashboards that track CPU throttling metrics, not just raw usage.