Kubernetes resource requests tell the scheduler how much CPU and memory a pod needs for placement. Resource limits cap consumption: CPU limits throttle the container; memory limits trigger OOMKill. Set memory limits on every production container. CPU limits are situational — required on multi-tenant clusters, optional on single-tenant node pools.
Two of the most common production Kubernetes incidents trace back to the same root cause: someone guessed at resource limits.
The first is an OOMKilled loop at 2 AM because a developer set a 256Mi memory limit on a JVM service that allocates 512Mi at startup. The second is a cluster running at 15% utilization because someone padded every request with “just in case” headroom and the scheduler can’t fit more pods onto existing nodes.
Both problems are solvable. But you need to understand what requests and limits actually do, where the community disagrees, and how to set values you can defend with data.
This guide covers all of it. Verified against Kubernetes 1.35.3.
How Do Kubernetes Resource Requests and Limits Work?
Requests — What the Scheduler Sees
Resource requests tell the kube-scheduler how much CPU and memory a container needs to run. The scheduler uses these values to decide node placement — specifically, it finds a node where the sum of all scheduled pod requests doesn’t exceed the node’s allocatable capacity.
Two things to internalize:
- Requests are not limits. A container can use more than its requested CPU (if capacity is available on the node) and can use more memory than requested, right up until it hits its limit.
- Requests determine QoS class. We’ll cover that below.
Limits — What the Kubelet Enforces
CPU limits are enforced via Linux CFS (Completely Fair Scheduler) cgroup quota. A container with a 400m CPU limit gets a 40ms time slice per 100ms scheduling period. When it exhausts that quota, the kernel throttles it until the next period — the process slows down, it doesn’t die.
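The quota arithmetic is worth internalizing. Here is a minimal sketch (Python, illustrative only) of how a millicore limit maps onto the kernel's per-period quota, assuming the default 100ms CFS period:

```python
# Sketch of CFS quota math for a CPU limit. Assumes the default
# enforcement period of 100ms (cpu.cfs_period_us = 100000).

def cfs_quota_us(cpu_limit_millicores: int, period_us: int = 100_000) -> int:
    """CPU-time quota (microseconds) granted per enforcement period.
    1000m (one full core) equals one full period of CPU time."""
    return cpu_limit_millicores * period_us // 1000

print(cfs_quota_us(400))   # 40000 -> a 400m limit is a 40ms slice per 100ms
print(cfs_quota_us(2000))  # 200000 -> a 2-core limit can burn 200ms of CPU
                           # time per wall-clock period, spread across cores
```

Once the container's threads have consumed the quota, they sit idle until the next period starts; that idle gap is where throttling-induced tail latency comes from.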
Memory limits are enforced by the kernel OOM killer. When a container exceeds its memory limit, the kernel terminates the process. There is no throttling path for memory.
CPU vs. Memory: Compressible vs. Incompressible
This distinction is why advice for CPU limits and memory limits diverges:
- CPU is compressible. Exceeding a limit causes throttling. The container slows down.
- Memory is incompressible. Exceeding a limit causes OOMKill. The container dies.
Almost every resource management decision flows from this.
Should You Set CPU Limits in Kubernetes?
This is the most actively contested topic in Kubernetes resource management. Both camps have valid arguments — here’s what each one actually says.
The Case Against CPU Limits
Tim Hockin — one of Kubernetes’ co-creators, still at Google — has argued publicly that removing CPU limits is the right call for performance-sensitive workloads. The reasoning: HPA and VPA don’t react fast enough to handle sudden traffic spikes. With a CPU limit in place, a pod that suddenly needs 2x its normal CPU gets throttled to its ceiling — even if the node has plenty of free capacity sitting idle. You’ve built an artificial bottleneck that costs latency for zero actual resource savings.
His recommended workflow: benchmark at high-end load, start with request=limit, measure p95 latency, then raise or remove limits until SLO targets are met. The caveat he acknowledges: this requires disciplined re-benchmarking and “only a few apps” at Google have truly trustworthy benchmarks for this.
There’s also a real kernel-level problem involved. A bug in CFS quota enforcement (pre-Linux 5.4) caused containers on multi-core machines to be throttled even when they hadn’t exhausted their quota — on an 88-core machine, quota expiration overhead could waste 87ms of every 100ms period. If you’re running kernel versions before 5.4 and seeing unexplained throttling on lightly loaded pods, this is likely the cause (kubernetes/kubernetes#67577, fixed in kernel 5.4 via commit 512ac999 and follow-on patches).
The Case For CPU Limits
Milan Plzik at Grafana Labs laid out the predictability argument in an official Kubernetes blog post. The core problem: without limits, the actual CPU a pod gets depends entirely on co-tenants. At peak load across a shared cluster, a noisy neighbor takes the spare capacity your pod was relying on for burst behavior. Historical performance data stops being reliable because the operating environment isn’t stable.
His recommendation: either use fixed headroom (limits = 1.5–2x requests) or set requests = limits for Guaranteed QoS on critical services. This makes pod performance reproducible and isolates teams from each other.
The Recommendation
Set CPU requests. CPU limits are situational.
- Single-tenant namespaces or trusted teams on dedicated node pools: Skip CPU limits. You get full burst access to node headroom when needed.
- Multi-tenant clusters with shared node pools: Set CPU limits to prevent one team’s spike from consuming headroom others are counting on. A 2–3x headroom ratio is reasonable.
- Always monitor throttling regardless of which path you choose: track `container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total` as a ratio. Above 25% on a normal traffic day means your limit is too tight.
Whatever you decide: document the policy and enforce it consistently with LimitRange (covered below) so individual developers aren’t making this call per-deployment.
Why Should You Always Set Memory Limits in Kubernetes?
Memory limits are not debated. Set them on every container in production.
What Happens When a Container Exceeds Its Memory Limit?
When a container crosses its memory limit, the Linux OOM killer terminates it with SIGKILL (exit code 137). In pod status you’ll see:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
The kubelet restarts the container if the restart policy permits, but repeated OOMKills push the pod into CrashLoopBackOff with exponential backoff — at some point you’re adding 5-minute delays to every restart.
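To see why a crash loop gets painful quickly, here is a toy model of the exponential restart backoff (Python; a simplification — the kubelet's actual implementation also adds jitter and resets the backoff after ten minutes of healthy running):

```python
# Toy model of the kubelet's CrashLoopBackOff delay: starts at 10s,
# doubles on each failed restart, and is capped at 300s (5 minutes).

def restart_delay_seconds(restart_count: int, base: int = 10, cap: int = 300) -> int:
    return min(base * 2 ** restart_count, cap)

print([restart_delay_seconds(n) for n in range(7)])
# [10, 20, 40, 80, 160, 300, 300] -> by the sixth OOMKill, every
# restart costs a full 5-minute wait
```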
The fix is rarely “remove the limit.” The right path is one of:
- The workload genuinely needs more memory → increase the limit
- There’s a memory leak → fix the code
- An unbounded cache → add a size cap
Memory QoS with cgroup v2
On Kubernetes 1.22+ nodes using cgroup v2 (now the default on modern distros), enabling the MemoryQoS feature gate sets memory.min = requests (the kernel won’t reclaim below this) and memory.max = limits, with memory.high placed a throttling factor below the limit to apply reclaim pressure before the OOM killer fires. This gives pods a soft landing before hard termination. On cgroup v1, there’s no soft ceiling — it’s limit or kill.
How Do You Calculate the Right Kubernetes Memory Limit?
memory_request = P95 memory working set (7-day window) + 10–20% headroom
memory_limit = memory_request × 1.25 to 1.5
For JVM workloads (Java, Spring Boot, Scala), account for the full footprint: heap + metaspace + code cache + native memory. Set `-Xmx` to ~75% of your memory limit; a service with a 1Gi limit should run with `-Xmx768m`.
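As a worked example of the formulas above (Python, illustrative numbers; the 75% JVM ratio is the rule of thumb from this section, not a universal constant):

```python
# Worked example of the sizing formulas: request from P95 plus headroom,
# limit as a multiple of the request, and -Xmx for JVM containers.

def size_memory(p95_mib: float, headroom: float = 0.15, limit_factor: float = 1.4):
    """Request = P95 working set + 10-20% headroom; limit = request x 1.25-1.5."""
    request = round(p95_mib * (1 + headroom))
    limit = round(request * limit_factor)
    return request, limit

def jvm_xmx_mib(limit_mib: int) -> int:
    """~75% of the container limit, leaving room for metaspace,
    code cache, and native allocations."""
    return int(limit_mib * 0.75)

print(size_memory(400))   # (460, 644) -> request 460Mi, limit 644Mi
print(jvm_xmx_mib(1024))  # 768 -> a 1Gi limit pairs with -Xmx768m
```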
What Are Kubernetes QoS Classes and How Do They Affect Eviction?
Kubernetes assigns every pod a QoS class at creation time based on its resource spec. This class determines eviction order when a node runs low on resources.
The Three Classes
Guaranteed: Every container has CPU and memory requests AND limits, and requests = limits. Last to be evicted. Can use exclusive CPUs with the static CPU manager policy.
Burstable: At least one container has some resource request or limit, but the pod doesn’t meet Guaranteed criteria. Middle eviction priority.
BestEffort: No containers have any requests or limits. Evicted first.
Eviction Order
Under node memory pressure, the kubelet evicts in this sequence:
- BestEffort pods
- Burstable pods consuming beyond their requests
- Guaranteed pods, and Burstable pods within their requests (only if the kubelet has no lower-priority options left)
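A simplified model of that ordering (Python sketch; the real kubelet also weighs pod priority and how far usage exceeds requests, so treat this as the coarse shape, not the algorithm):

```python
# Coarse eviction ranking under memory pressure: lower rank = evicted
# sooner. A deliberate simplification of the kubelet's actual logic.

def eviction_rank(qos: str, usage_mib: int, request_mib: int) -> int:
    if qos == "BestEffort":
        return 0                      # no requests at all: first to go
    if qos == "Burstable" and usage_mib > request_mib:
        return 1                      # bursting past its requests
    return 2                          # Guaranteed, or Burstable within requests

pods = [                              # (name, qos, usage MiB, request MiB)
    ("guaranteed-db", "Guaranteed", 900, 1024),
    ("bursty-api", "Burstable", 600, 256),
    ("batch-job", "BestEffort", 300, 0),
]
order = [name for name, qos, usage, req in
         sorted(pods, key=lambda p: eviction_rank(p[1], p[2], p[3]))]
print(order)  # ['batch-job', 'bursty-api', 'guaranteed-db']
```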
QoS Class Is Immutable
The QoS class is set at pod creation and cannot change. An in-place resource resize (see the section on Kubernetes 1.35 features below) that would change the QoS class is rejected by the API server. If you want to move a Burstable pod to Guaranteed, you need a rolling update, not a resize.
Practical guidance: Use Guaranteed QoS for anything that pages humans (databases, payment services, auth). Use Burstable for web servers and workers where some throttling under extreme pressure is acceptable.
How Do LimitRange and ResourceQuota Enforce Resource Governance in Kubernetes?
Platform teams need to enforce resource hygiene without auditing every deployment. LimitRange and ResourceQuota are the mechanisms.
LimitRange: Per-Container Defaults and Constraints
LimitRange sets per-container defaults and enforces min/max bounds at pod admission time. Pods without explicit resource specs get defaults applied automatically. Pods violating min/max are rejected with HTTP 403.
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-constraints
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "2"
      memory: 1Gi
    min:
      cpu: 50m
      memory: 64Mi
What this enforces:
- Pods submitted without resource specs get `defaultRequest` and `default` applied automatically
- Pods requesting more than `max` or less than `min` are rejected with HTTP 403
- No developer needs to know the right values from scratch — the platform enforces them
Gotcha: LimitRange doesn’t automatically fix contradictions you introduce. If a pod explicitly requests 700m CPU against this config’s 500m default limit, the admission controller rejects it because request > limit. The developer also needs to specify limits.cpu: "1" or higher.
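The gotcha is easier to see as an admission walk-through. Here is a toy validator (Python; `admit`, the hardcoded bounds, and the return strings are all invented for illustration — this is not the real admission controller) using the LimitRange values above:

```python
# Toy model of LimitRange admission for the config above. Defaults are
# applied only where the pod omits a value; then bounds and
# request <= limit are checked.

DEFAULT_REQUEST, DEFAULT_LIMIT = 100, 500   # defaultRequest / default (millicores)
MIN_CPU, MAX_CPU = 50, 2000                 # min / max (millicores)

def admit(request_m=None, limit_m=None) -> str:
    request_m = DEFAULT_REQUEST if request_m is None else request_m
    limit_m = DEFAULT_LIMIT if limit_m is None else limit_m
    if request_m < MIN_CPU or limit_m > MAX_CPU:
        return "403: outside min/max bounds"
    if request_m > limit_m:
        return "403: request exceeds limit"
    return f"admitted: request={request_m}m limit={limit_m}m"

print(admit())                             # defaults applied -> admitted
print(admit(request_m=700))                # 700m request vs 500m default limit -> 403
print(admit(request_m=700, limit_m=1000))  # explicit limit resolves the conflict
```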
ResourceQuota: Namespace-Level Budgets
ResourceQuota caps aggregate resource consumption across all pods in a namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-budget
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
LimitRange + ResourceQuota together is the standard pattern for multi-team clusters: LimitRange keeps individual pods sane, ResourceQuota prevents any one namespace from hoarding cluster capacity.
How Do You Right-Size Kubernetes Resource Requests and Limits?
Step 1 — Observe Current Usage
Start with a snapshot:
kubectl top pods -n production --sort-by=cpu
kubectl top pods -n production --sort-by=memory
This shows current utilization, not peak. For peak, you need Prometheus.
Step 2 — Pull P95 Data from Prometheus
# P95 CPU usage over 7 days — use this for CPU requests
quantile_over_time(0.95,
rate(container_cpu_usage_seconds_total{
namespace="production",
container!=""
}[5m])[7d:5m]
)
# P95 memory working set over 7 days — use this for memory requests
quantile_over_time(0.95,
container_memory_working_set_bytes{
namespace="production",
container!=""
}[7d]
)
# CPU throttle ratio — signals whether existing limits are too tight
rate(container_cpu_cfs_throttled_periods_total{namespace="production"}[5m])
/
rate(container_cpu_cfs_periods_total{namespace="production"}[5m])
7 days covers a full business week including peak periods. If your traffic has monthly spikes (end-of-month billing runs, scheduled batch jobs), extend the window to 30 days.
Step 3 — Validate With VPA in Off Mode
Before committing values to production, run VPA in recommendation-only mode and let it observe for 24–48 hours:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
kubectl describe vpa my-app-vpa -n production
# Check the Recommendation section for lowerBound, target, upperBound
VPA’s recommendations are based on observed usage history and serve as a sanity check against your Prometheus-derived values. Where they diverge significantly, investigate before assuming either is correct.
How Do You Use Vertical Pod Autoscaler (VPA) in Production?
VPA’s autoscaling.k8s.io/v1 API is stable. The right update mode depends on your Kubernetes version and workload tolerance for disruption.
| Mode | Behavior | Use When |
|---|---|---|
| `Off` | Recommendations only, no changes | Initial observation and validation |
| `Initial` | Sets requests at pod creation only | Stateful workloads where mid-life resize is too risky |
| `Recreate` | Evicts pods to apply changes, respects PDB | Stateless workloads on 1.33 and earlier |
| `InPlaceOrRecreate` (beta) | Tries in-place resize first, evicts if needed | Any workload on 1.35+ |
| `Auto` | Deprecated — do not use | — |
VPA + HPA: The Conflict and the Fix
VPA adjusts requests (the denominator in HPA’s utilization percentage). If both target CPU utilization, they fight each other in a feedback loop. Two working patterns:
Pattern 1 — Split by resource type: VPA manages memory, HPA scales on CPU.
# In the VPA spec
resourcePolicy:
  containerPolicies:
  - containerName: "*"
    controlledResources: ["memory"]
Pattern 2 — HPA targets absolute value, not utilization:
# In the HPA spec
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: AverageValue  # Not Utilization
      averageValue: 500m
With AverageValue, HPA’s scaling decision is based on raw CPU usage, not a percentage of requests — so VPA can freely adjust requests without disrupting HPA’s math.
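The arithmetic makes the difference concrete. Here is a sketch of HPA's replica math in both modes (Python; simplified to the core formula `desired = ceil(current × metric / target)`, ignoring tolerances and stabilization windows):

```python
import math

# HPA's core scaling formula in both modes, simplified.

def replicas_utilization(current: int, usage_m: float, request_m: float,
                         target_pct: float) -> int:
    # Utilization mode: usage is expressed as a percentage of *requests*.
    return math.ceil(current * (usage_m / request_m) / (target_pct / 100))

def replicas_average_value(current: int, usage_m: float, target_m: float) -> int:
    # AverageValue mode: raw usage per pod; requests never enter the math.
    return math.ceil(current * usage_m / target_m)

# 4 replicas, each averaging 400m of CPU:
print(replicas_utilization(4, 400, 500, 80))  # 4 -> at target, no scaling
print(replicas_utilization(4, 400, 250, 80))  # 8 -> VPA halved requests and
                                              #      triggered a spurious scale-up
print(replicas_average_value(4, 400, 500))    # 4 -> unchanged whatever VPA does
```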
New in 2025–2026: In-Place Pod Resize and Pod-Level Resources
In-Place Pod Resize (GA in Kubernetes 1.35)
In-place pod resize graduated to stable in Kubernetes 1.35 (December 2025). Before 1.35, changing a pod’s resources required deleting and recreating it. Now you can patch resources on a running pod:
kubectl patch pod my-app \
--subresource resize \
--type=json \
-p='[{"op":"replace","path":"/spec/containers/0/resources/requests/cpu","value":"500m"}]'
The actual applied resources are reflected in status.containerStatuses[*].resources — the spec shows what was requested; the status shows what the kubelet actually applied. Memory limit decreases are now permitted: the kubelet checks that current usage is below the new limit before applying.
Constraints: QoS-class-changing resizes are still rejected. When a node lacks capacity for the resize, Kubernetes 1.35 queues deferred resizes ordered by PriorityClass → QoS class → time waiting.
This feature is what makes VPA’s InPlaceOrRecreate mode viable for stateful workloads that couldn’t previously tolerate the pod deletion VPA requires.
Pod-Level Resources (Beta in Kubernetes 1.34)
Pod-level resource specs let you set a shared CPU and memory budget for the whole pod instead of per container:
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-app
spec:
  resources:
    requests:
      cpu: "1"
      memory: 512Mi
    limits:
      cpu: "2"
      memory: 1Gi
  containers:
  - name: app
    image: app:latest
  - name: sidecar
    image: envoy:latest
The use case: multi-container pods where the main app and sidecar don’t peak simultaneously. Instead of over-allocating per container, you give the pod a shared budget the containers compete for based on actual demand.
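The savings are just arithmetic over non-coinciding peaks. A sketch with invented usage samples (Python):

```python
# Two containers whose CPU peaks don't coincide (millicores per sample;
# numbers invented for illustration).
app_usage     = [800, 1500, 600]
sidecar_usage = [900,  200, 300]

# Per-container limits must cover each container's own peak:
per_container_budget = max(app_usage) + max(sidecar_usage)

# A pod-level limit only needs to cover the peak of the *sum*:
pod_level_budget = max(a + s for a, s in zip(app_usage, sidecar_usage))

print(per_container_budget)  # 2400
print(pod_level_budget)      # 1700 -> ~30% less CPU reserved for the same workload
```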
Current limitation: in-place resize of pod-level resources is not supported in 1.34. Not available on Windows pods.
How Do You Diagnose Common Kubernetes Resource Limit Failures?
OOMKilled Pods
Symptoms: Pod restarts with exit code 137.
Diagnose:
kubectl describe pod <name> -n production
# Look for: Last State: Terminated, Reason: OOMKilled
kubectl top pod <name> -n production --containers
# Compare current usage to configured limits
Fix: Increase memory limit. Then identify the root cause — don’t just keep bumping the limit without understanding whether it’s a leak, unbounded growth, or genuinely undersized allocation.
CPU Throttling
Symptoms: High latency, slow responses, low CPU utilization readings. The pod isn’t dying, just slow.
Diagnose:
rate(container_cpu_cfs_throttled_periods_total{namespace="production"}[5m])
/
rate(container_cpu_cfs_periods_total{namespace="production"}[5m])
Throttle ratio > 25% is a problem. If you’re on kernel < 5.4 and see high throttle ratios on pods well below their limit, you’re likely hitting the CFS quota bug (kubernetes/kubernetes#67577).
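The same ratio the PromQL computes, spelled out on raw counter deltas (Python, illustrative numbers):

```python
# Throttle ratio from raw counter deltas over a scrape window:
# the fraction of CFS periods in which the container was throttled.

def throttle_ratio(throttled_periods_delta: int, periods_delta: int) -> float:
    return throttled_periods_delta / periods_delta

# Over 5 minutes the container ran 3000 CFS periods and was throttled
# in 900 of them:
ratio = throttle_ratio(900, 3000)
print(f"{ratio:.0%}")  # 30% -> above the 25% threshold; the limit is too tight
```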
Fix: Raise or remove CPU limits. Upgrade to Linux 5.4+ if on an affected kernel.
Pods Stuck in Pending
Symptoms: Pods remain in Pending indefinitely.
Diagnose:
kubectl describe pod <name> -n production
# Events section — look for:
# Warning FailedScheduling 0/5 nodes available: Insufficient cpu
# Warning FailedScheduling exceeded quota: requests.cpu
Causes and fixes:
- Requests exceed any single node’s allocatable capacity → right-size requests using the Prometheus methodology above, or add larger nodes
- Namespace ResourceQuota exhausted → expand quota or clean up unused pods/deployments
- No nodes matching affinity/toleration rules → separate scheduling problem, but check this too
What Are the Right Resource Specs for Common Kubernetes Workload Types?
Starting points calibrated for common patterns. Adjust based on your P95 Prometheus data.
Web server (nginx, Node.js API):
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    memory: 256Mi
# CPU limit omitted — allows burst access to idle node capacity
Background worker (queue consumer, batch processor):
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    memory: 512Mi
JVM application (Java, Spring Boot, Kotlin):
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "2"        # JVM startup and GC bursts require burst capacity
    memory: 1536Mi  # Set -Xmx1152m (75% of limit) in your JVM flags
Sidecar (Envoy proxy, Fluent Bit, metrics exporter):
resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    memory: 128Mi
Production Checklist
Before a workload ships to production:
Requests
- CPU request set on every container
- Memory request set on every container
- Values based on P95 Prometheus data, not estimates
Limits
- Memory limit set on every container
- CPU limit policy documented (deliberately omitted or set with documented headroom ratio)
- JVM workloads have `-Xmx` set to ~75% of memory limit
QoS class
- Critical workloads (databases, auth, payment) use Guaranteed QoS (requests = limits)
- Actual QoS class confirmed: `kubectl get pod <name> -o=jsonpath='{.status.qosClass}'`
Namespace governance
- LimitRange applied with sensible defaults and max values
- ResourceQuota applied to cap namespace-level consumption
Monitoring
- Alert on OOMKill events (`kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}`)
- Alert on CPU throttle ratio > 25%
- VPA running in `Off` mode to surface sizing drift over time
Autoscaling
- If using VPA + HPA: split resource ownership (VPA on memory, HPA on CPU) or switch HPA to an `AverageValue` target
- VPA update mode chosen intentionally (`InPlaceOrRecreate` on 1.35+ clusters, `Recreate` on older)
All technical claims verified against Kubernetes 1.35.3. Sources: kubernetes.io/docs/concepts/configuration/manage-resources-containers, kubernetes.io/blog/2023/11/16/the-case-for-kubernetes-resource-limits, kubernetes.io/blog/2025/12/19/kubernetes-v1-35-in-place-pod-resize-ga, kubernetes.io/blog/2025/09/22/kubernetes-v1-34-pod-level-resources, Tim Hockin HN thread (item 24381813), kubernetes/kubernetes#67577, kubernetes/autoscaler#2939.