Kubernetes resource requests tell the scheduler how much CPU and memory a pod needs for placement. Resource limits cap consumption: CPU limits throttle the container; memory limits trigger OOMKill. Set memory limits on every production container. CPU limits are situational — required on multi-tenant clusters, optional on single-tenant node pools.
Two of the most common production Kubernetes incidents trace back to the same root cause: someone guessed at resource limits.
The first is an OOMKilled loop at 2 AM because a developer set a 256Mi memory limit on a JVM service that allocates 512Mi at startup. The second is a cluster running at 15% utilization because someone padded every request with “just in case” headroom and the scheduler can’t fit more pods onto existing nodes.
Both problems are solvable. But you need to understand what requests and limits actually do, where the community disagrees, and how to set values you can defend with data.
This guide covers all of it. Verified against Kubernetes 1.35.3.
How Do Kubernetes Resource Requests and Limits Work?
Requests — What the Scheduler Sees
Resource requests tell the kube-scheduler how much CPU and memory a container needs to run. The scheduler uses these values to decide node placement — specifically, it finds a node where the sum of all scheduled pod requests doesn’t exceed the node’s allocatable capacity.
Two things to internalize:
- Requests are not limits. A container can use more than its requested CPU (if capacity is available on the node) and can use more memory than requested, right up until it hits its limit.
- Requests determine QoS class. We’ll cover that below.
Limits — What the Kubelet Enforces
CPU limits are enforced via Linux CFS (Completely Fair Scheduler) cgroup quota. A container with a 400m CPU limit gets a 40ms time slice per 100ms scheduling period. When it exhausts that quota, the kernel throttles it until the next period — the process slows down, it doesn’t die.
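The quota arithmetic is worth internalizing. Here is a minimal sketch (Python, illustrative only) of how a millicore limit maps onto the kernel's per-period quota, assuming the default 100ms CFS period:

```python
# Sketch of CFS quota math for a CPU limit. Assumes the default
# enforcement period of 100ms (cpu.cfs_period_us = 100000).

def cfs_quota_us(cpu_limit_millicores: int, period_us: int = 100_000) -> int:
    """CPU-time quota (microseconds) granted per enforcement period.
    1000m (one full core) equals one full period of CPU time."""
    return cpu_limit_millicores * period_us // 1000

print(cfs_quota_us(400))   # 40000 -> a 400m limit is a 40ms slice per 100ms
print(cfs_quota_us(2000))  # 200000 -> a 2-core limit can burn 200ms of CPU
                           # time per wall-clock period, spread across cores
```

Once the container's threads have consumed the quota, they sit idle until the next period starts; that idle gap is where throttling-induced tail latency comes from.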
Memory limits are enforced by the kernel OOM killer. When a container exceeds its memory limit, the kernel terminates the process. There is no throttling path for memory.
CPU vs. Memory: Compressible vs. Incompressible
This distinction is why advice for CPU limits and memory limits diverges:
- CPU is compressible. Exceeding a limit causes throttling. The container slows down.
- Memory is incompressible. Exceeding a limit causes OOMKill. The container dies.
Almost every resource management decision flows from this.
Should You Set CPU Limits in Kubernetes?
This is the most actively contested topic in Kubernetes resource management. Both camps have valid arguments — here’s what each one actually says.
The Case Against CPU Limits
Tim Hockin — one of Kubernetes’ co-creators, still at Google — has argued publicly that removing CPU limits is the right call for performance-sensitive workloads. The reasoning: HPA and VPA don’t react fast enough to handle sudden traffic spikes. With a CPU limit in place, a pod that suddenly needs 2x its normal CPU gets throttled to its ceiling — even if the node has plenty of free capacity sitting idle. You’ve built an artificial bottleneck that costs latency for zero actual resource savings.
His recommended workflow: benchmark at high-end load, start with request=limit, measure p95 latency, then raise or remove limits until SLO targets are met. The caveat he acknowledges: this requires disciplined re-benchmarking and “only a few apps” at Google have truly trustworthy benchmarks for this.
There’s also a real kernel-level problem involved. A bug in CFS quota enforcement (pre-Linux 5.4) caused containers on multi-core machines to be throttled even when they hadn’t exhausted their quota — on an 88-core machine, quota expiration overhead could waste 87ms of every 100ms period. If you’re running kernel versions before 5.4 and seeing unexplained throttling on lightly loaded pods, this is likely the cause (kubernetes/kubernetes#67577, fixed in kernel 5.4 via commit 512ac999 and follow-on patches).
The Case For CPU Limits
Milan Plzik at Grafana Labs laid out the predictability argument in an official Kubernetes blog post. The core problem: without limits, the actual CPU a pod gets depends entirely on co-tenants. At peak load across a shared cluster, a noisy neighbor takes the spare capacity your pod was relying on for burst behavior. Historical performance data stops being reliable because the operating environment isn’t stable.
His recommendation: either use fixed headroom (limits = 1.5–2x requests) or set requests = limits for Guaranteed QoS on critical services. This makes pod performance reproducible and isolates teams from each other.
The Recommendation
Set CPU requests. CPU limits are situational.
- Single-tenant namespaces or trusted teams on dedicated node pools: Skip CPU limits. You get full burst access to node headroom when needed.
- Multi-tenant clusters with shared node pools: Set CPU limits to prevent one team’s spike from consuming headroom others are counting on. A 2–3x headroom ratio is reasonable.
- Always monitor throttling regardless of which path you choose: track `container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total` as a ratio. Above 25% on a normal traffic day means your limit is too tight.
Whatever you decide: document the policy and enforce it consistently with LimitRange (covered below) so individual developers aren’t making this call per-deployment.
Why Should You Always Set Memory Limits in Kubernetes?
Memory limits are not debated. Set them on every container in production.
What Happens When a Container Exceeds Its Memory Limit?
When a container crosses its memory limit, the Linux OOM killer terminates it with SIGKILL (exit code 137). In pod status you’ll see:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
The kubelet restarts the container if the restart policy permits, but repeated OOMKills push the pod into CrashLoopBackOff with exponential backoff — at some point you’re adding 5-minute delays to every restart.
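To see why a crash loop gets painful quickly, here is a toy model of the exponential restart backoff (Python; a simplification — the kubelet's actual implementation also adds jitter and resets the backoff after ten minutes of healthy running):

```python
# Toy model of the kubelet's CrashLoopBackOff delay: starts at 10s,
# doubles on each failed restart, and is capped at 300s (5 minutes).

def restart_delay_seconds(restart_count: int, base: int = 10, cap: int = 300) -> int:
    return min(base * 2 ** restart_count, cap)

print([restart_delay_seconds(n) for n in range(7)])
# [10, 20, 40, 80, 160, 300, 300] -> by the sixth OOMKill, every
# restart costs a full 5-minute wait
```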
The fix is rarely “remove the limit.” The right path is one of:
- The workload genuinely needs more memory → increase the limit
- There’s a memory leak → fix the code
- An unbounded cache → add a size cap
Memory QoS with cgroup v2
On Kubernetes 1.22+ nodes using cgroup v2 (now the default on modern distros), enabling the MemoryQoS feature gate sets memory.min = requests (the kernel won’t reclaim below this) and memory.max = limits, with memory.high placed a throttling factor below the limit to apply reclaim pressure before the OOM killer fires. This gives pods a soft landing before hard termination. On cgroup v1, there’s no soft ceiling — it’s limit or kill.
How Do You Calculate the Right Kubernetes Memory Limit?
memory_request = P95 memory working set (7-day window) + 10–20% headroom
memory_limit = memory_request × 1.25 to 1.5
For JVM workloads (Java, Spring Boot, Scala), account for the full footprint: heap + metaspace + code cache + native memory. Set `-Xmx` to ~75% of your memory limit; a service with a 1Gi limit should run with `-Xmx768m`.
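As a worked example of the formulas above (Python, illustrative numbers; the 75% JVM ratio is the rule of thumb from this section, not a universal constant):

```python
# Worked example of the sizing formulas: request from P95 plus headroom,
# limit as a multiple of the request, and -Xmx for JVM containers.

def size_memory(p95_mib: float, headroom: float = 0.15, limit_factor: float = 1.4):
    """Request = P95 working set + 10-20% headroom; limit = request x 1.25-1.5."""
    request = round(p95_mib * (1 + headroom))
    limit = round(request * limit_factor)
    return request, limit

def jvm_xmx_mib(limit_mib: int) -> int:
    """~75% of the container limit, leaving room for metaspace,
    code cache, and native allocations."""
    return int(limit_mib * 0.75)

print(size_memory(400))   # (460, 644) -> request 460Mi, limit 644Mi
print(jvm_xmx_mib(1024))  # 768 -> a 1Gi limit pairs with -Xmx768m
```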
What Are Kubernetes QoS Classes and How Do They Affect Eviction?
Kubernetes assigns every pod a QoS class at creation time based on its resource spec. This class determines eviction order when a node runs low on resources.
The Three Classes
Guaranteed: Every container has CPU and memory requests AND limits, and requests = limits. Last to be evicted. Can use exclusive CPUs with the static CPU manager policy.
Burstable: At least one container has some resource request or limit, but the pod doesn’t meet Guaranteed criteria. Middle eviction priority.
BestEffort: No containers have any requests or limits. Evicted first.
Eviction Order
Under node memory pressure, the kubelet evicts in this sequence:
- BestEffort pods
- Burstable pods consuming beyond their requests
- Guaranteed pods, and Burstable pods within their requests (only if the kubelet has no lower-priority options left)
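A simplified model of that ordering (Python sketch; the real kubelet also weighs pod priority and how far usage exceeds requests, so treat this as the coarse shape, not the algorithm):

```python
# Coarse eviction ranking under memory pressure: lower rank = evicted
# sooner. A deliberate simplification of the kubelet's actual logic.

def eviction_rank(qos: str, usage_mib: int, request_mib: int) -> int:
    if qos == "BestEffort":
        return 0                      # no requests at all: first to go
    if qos == "Burstable" and usage_mib > request_mib:
        return 1                      # bursting past its requests
    return 2                          # Guaranteed, or Burstable within requests

pods = [                              # (name, qos, usage MiB, request MiB)
    ("guaranteed-db", "Guaranteed", 900, 1024),
    ("bursty-api", "Burstable", 600, 256),
    ("batch-job", "BestEffort", 300, 0),
]
order = [name for name, qos, usage, req in
         sorted(pods, key=lambda p: eviction_rank(p[1], p[2], p[3]))]
print(order)  # ['batch-job', 'bursty-api', 'guaranteed-db']
```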
QoS Class Is Immutable
The QoS class is set at pod creation and cannot change. An in-place resource resize (see the section on Kubernetes 1.35 features below) that would change the QoS class is rejected by the API server. If you want to move a Burstable pod to Guaranteed, you need a rolling update, not a resize.
Practical guidance: Use Guaranteed QoS for anything that pages humans (databases, payment services, auth). Use Burstable for web servers and workers where some throttling under extreme pressure is acceptable.
How Do LimitRange and ResourceQuota Enforce Resource Governance in Kubernetes?
Platform teams need to enforce resource hygiene without auditing every deployment. LimitRange and ResourceQuota are the mechanisms.
LimitRange: Per-Container Defaults and Constraints
LimitRange sets per-container defaults and enforces min/max bounds at pod admission time. Pods without explicit resource specs get defaults applied automatically. Pods violating min/max are rejected with HTTP 403.
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-constraints
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "2"
      memory: 1Gi
    min:
      cpu: 50m
      memory: 64Mi
What this enforces:
- Pods submitted without resource specs get `defaultRequest` and `default` applied automatically
- Pods requesting more than `max` or less than `min` are rejected with HTTP 403
- No developer needs to know the right values from scratch — the platform enforces them
Gotcha: LimitRange doesn’t automatically fix contradictions you introduce. If a pod explicitly requests 700m CPU against this config’s 500m default limit, the admission controller rejects it because request > limit. The developer also needs to specify limits.cpu: "1" or higher.
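The gotcha is easier to see as an admission walk-through. Here is a toy validator (Python; `admit`, the hardcoded bounds, and the return strings are all invented for illustration — this is not the real admission controller) using the LimitRange values above:

```python
# Toy model of LimitRange admission for the config above. Defaults are
# applied only where the pod omits a value; then bounds and
# request <= limit are checked.

DEFAULT_REQUEST, DEFAULT_LIMIT = 100, 500   # defaultRequest / default (millicores)
MIN_CPU, MAX_CPU = 50, 2000                 # min / max (millicores)

def admit(request_m=None, limit_m=None) -> str:
    request_m = DEFAULT_REQUEST if request_m is None else request_m
    limit_m = DEFAULT_LIMIT if limit_m is None else limit_m
    if request_m < MIN_CPU or limit_m > MAX_CPU:
        return "403: outside min/max bounds"
    if request_m > limit_m:
        return "403: request exceeds limit"
    return f"admitted: request={request_m}m limit={limit_m}m"

print(admit())                             # defaults applied -> admitted
print(admit(request_m=700))                # 700m request vs 500m default limit -> 403
print(admit(request_m=700, limit_m=1000))  # explicit limit resolves the conflict
```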
ResourceQuota: Namespace-Level Budgets
ResourceQuota caps aggregate resource consumption across all pods in a namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-budget
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
LimitRange + ResourceQuota together is the standard pattern for multi-team clusters: LimitRange keeps individual pods sane, ResourceQuota prevents any one namespace from hoarding cluster capacity.
How Do You Right-Size Kubernetes Resource Requests and Limits?
Step 1 — Observe Current Usage
Start with a snapshot:
kubectl top pods -n production --sort-by=cpu
kubectl top pods -n production --sort-by=memory
This shows current utilization, not peak. For peak, you need Prometheus.
Step 2 — Pull P95 Data from Prometheus
# P95 CPU usage over 7 days — use this for CPU requests
quantile_over_time(0.95,
rate(container_cpu_usage_seconds_total{
namespace="production",
container!=""
}[5m])[7d:5m]
)
# P95 memory working set over 7 days — use this for memory requests
quantile_over_time(0.95,
container_memory_working_set_bytes{
namespace="production",
container!=""
}[7d]
)
# CPU throttle ratio — signals whether existing limits are too tight
rate(container_cpu_cfs_throttled_periods_total{namespace="production"}[5m])
/
rate(container_cpu_cfs_periods_total{namespace="production"}[5m])
7 days covers a full business week including peak periods. If your traffic has monthly spikes (end-of-month billing runs, scheduled batch jobs), extend the window to 30 days.
Step 3 — Validate With VPA in Off Mode
Before committing values to production, run VPA in recommendation-only mode and let it observe for 24–48 hours:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
kubectl describe vpa my-app-vpa -n production
# Check the Recommendation section for lowerBound, target, upperBound
VPA’s recommendations are based on observed usage history and serve as a sanity check against your Prometheus-derived values. Where they diverge significantly, investigate before assuming either is correct.
How Do You Use Vertical Pod Autoscaler (VPA) in Production?
VPA’s autoscaling.k8s.io/v1 API is stable. The right update mode depends on your Kubernetes version and workload tolerance for disruption.
| Mode | Behavior | Use When |
|---|---|---|
| `Off` | Recommendations only, no changes | Initial observation and validation |
| `Initial` | Sets requests at pod creation only | Stateful workloads where mid-life resize is too risky |
| `Recreate` | Evicts pods to apply changes, respects PDB | Stateless workloads on 1.33 and earlier |
| `InPlaceOrRecreate` (beta) | Tries in-place resize first, evicts if needed | Any workload on 1.35+ |
| `Auto` | Deprecated — do not use | — |
VPA + HPA: The Conflict and the Fix
VPA adjusts requests (the denominator in HPA’s utilization percentage). If both target CPU utilization, they fight each other in a feedback loop. Two working patterns:
Pattern 1 — Split by resource type: VPA manages memory, HPA scales on CPU.
# In the VPA spec
resourcePolicy:
  containerPolicies:
  - containerName: "*"
    controlledResources: ["memory"]
Pattern 2 — HPA targets absolute value, not utilization:
# In the HPA spec
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: AverageValue  # Not Utilization
      averageValue: 500m
With AverageValue, HPA’s scaling decision is based on raw CPU usage, not a percentage of requests — so VPA can freely adjust requests without disrupting HPA’s math.
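The arithmetic makes the difference concrete. Here is a sketch of HPA's replica math in both modes (Python; simplified to the core formula `desired = ceil(current × metric / target)`, ignoring tolerances and stabilization windows):

```python
import math

# HPA's core scaling formula in both modes, simplified.

def replicas_utilization(current: int, usage_m: float, request_m: float,
                         target_pct: float) -> int:
    # Utilization mode: usage is expressed as a percentage of *requests*.
    return math.ceil(current * (usage_m / request_m) / (target_pct / 100))

def replicas_average_value(current: int, usage_m: float, target_m: float) -> int:
    # AverageValue mode: raw usage per pod; requests never enter the math.
    return math.ceil(current * usage_m / target_m)

# 4 replicas, each averaging 400m of CPU:
print(replicas_utilization(4, 400, 500, 80))  # 4 -> at target, no scaling
print(replicas_utilization(4, 400, 250, 80))  # 8 -> VPA halved requests and
                                              #      triggered a spurious scale-up
print(replicas_average_value(4, 400, 500))    # 4 -> unchanged whatever VPA does
```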
New in 2025–2026: In-Place Pod Resize and Pod-Level Resources
In-Place Pod Resize (GA in Kubernetes 1.35)
In-place pod resize graduated to stable in Kubernetes 1.35 (December 2025). Before 1.35, changing a pod’s resources required deleting and recreating it. Now you can patch resources on a running pod:
kubectl patch pod my-app \
--subresource resize \
--type=json \
-p='[{"op":"replace","path":"/spec/containers/0/resources/requests/cpu","value":"500m"}]'
The actual applied resources are reflected in status.containerStatuses[*].resources — the spec shows what was requested; the status shows what the kubelet actually applied. Memory limit decreases are now permitted: the kubelet checks that current usage is below the new limit before applying.
Constraints: QoS-class-changing resizes are still rejected. When a node lacks capacity for the resize, Kubernetes 1.35 queues deferred resizes ordered by PriorityClass → QoS class → time waiting.
This feature is what makes VPA’s InPlaceOrRecreate mode viable for stateful workloads that couldn’t previously tolerate the pod deletion VPA requires.
Pod-Level Resources (Beta in Kubernetes 1.34)
Pod-level resource specs let you set a shared CPU and memory budget for the whole pod instead of per container:
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-app
spec:
  resources:
    requests:
      cpu: "1"
      memory: 512Mi
    limits:
      cpu: "2"
      memory: 1Gi
  containers:
  - name: app
    image: app:latest
  - name: sidecar
    image: envoy:latest
The use case: multi-container pods where the main app and sidecar don’t peak simultaneously. Instead of over-allocating per container, you give the pod a shared budget the containers compete for based on actual demand.
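The savings are just arithmetic over non-coinciding peaks. A sketch with invented usage samples (Python):

```python
# Two containers whose CPU peaks don't coincide (millicores per sample;
# numbers invented for illustration).
app_usage     = [800, 1500, 600]
sidecar_usage = [900,  200, 300]

# Per-container limits must cover each container's own peak:
per_container_budget = max(app_usage) + max(sidecar_usage)

# A pod-level limit only needs to cover the peak of the *sum*:
pod_level_budget = max(a + s for a, s in zip(app_usage, sidecar_usage))

print(per_container_budget)  # 2400
print(pod_level_budget)      # 1700 -> ~30% less CPU reserved for the same workload
```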
Current limitation: in-place resize of pod-level resources is not supported in 1.34. Not available on Windows pods.
How Do You Diagnose Common Kubernetes Resource Limit Failures?
OOMKilled Pods
Symptoms: Pod restarts with exit code 137.
Diagnose:
kubectl describe pod <name> -n production
# Look for: Last State: Terminated, Reason: OOMKilled
kubectl top pod <name> -n production --containers
# Compare current usage to configured limits
Fix: Increase memory limit. Then identify the root cause — don’t just keep bumping the limit without understanding whether it’s a leak, unbounded growth, or genuinely undersized allocation.
CPU Throttling
Symptoms: High latency, slow responses, low CPU utilization readings. The pod isn’t dying, just slow.
Diagnose:
rate(container_cpu_cfs_throttled_periods_total{namespace="production"}[5m])
/
rate(container_cpu_cfs_periods_total{namespace="production"}[5m])
Throttle ratio > 25% is a problem. If you’re on kernel < 5.4 and see high throttle ratios on pods well below their limit, you’re likely hitting the CFS quota bug (kubernetes/kubernetes#67577).
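The same ratio the PromQL computes, spelled out on raw counter deltas (Python, illustrative numbers):

```python
# Throttle ratio from raw counter deltas over a scrape window:
# the fraction of CFS periods in which the container was throttled.

def throttle_ratio(throttled_periods_delta: int, periods_delta: int) -> float:
    return throttled_periods_delta / periods_delta

# Over 5 minutes the container ran 3000 CFS periods and was throttled
# in 900 of them:
ratio = throttle_ratio(900, 3000)
print(f"{ratio:.0%}")  # 30% -> above the 25% threshold; the limit is too tight
```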
Fix: Raise or remove CPU limits. Upgrade to Linux 5.4+ if on an affected kernel.
Pods Stuck in Pending
Symptoms: Pods remain in Pending indefinitely.
Diagnose:
kubectl describe pod <name> -n production
# Events section — look for:
# Warning FailedScheduling 0/5 nodes available: Insufficient cpu
# Warning FailedScheduling exceeded quota: requests.cpu
Causes and fixes:
- Requests exceed any single node’s allocatable capacity → right-size requests using the Prometheus methodology above, or add larger nodes
- Namespace ResourceQuota exhausted → expand quota or clean up unused pods/deployments
- No nodes matching affinity/toleration rules → separate scheduling problem, but check this too
What Are the Right Resource Specs for Common Kubernetes Workload Types?
Starting points calibrated for common patterns. Adjust based on your P95 Prometheus data.
Web server (nginx, Node.js API):
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    memory: 256Mi
# CPU limit omitted — allows burst access to idle node capacity
Background worker (queue consumer, batch processor):
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    memory: 512Mi
JVM application (Java, Spring Boot, Kotlin):
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "2"        # JVM startup and GC bursts require burst capacity
    memory: 1536Mi  # Set -Xmx1152m (75% of limit) in your JVM flags
Sidecar (Envoy proxy, Fluent Bit, metrics exporter):
resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    memory: 128Mi
Production Checklist
Before a workload ships to production:
Requests
- CPU request set on every container
- Memory request set on every container
- Values based on P95 Prometheus data, not estimates
Limits
- Memory limit set on every container
- CPU limit policy documented (deliberately omitted or set with documented headroom ratio)
- JVM workloads have `-Xmx` set to ~75% of memory limit
QoS class
- Critical workloads (databases, auth, payment) use Guaranteed QoS (requests = limits)
- Actual QoS class confirmed: `kubectl get pod <name> -o=jsonpath='{.status.qosClass}'`
Namespace governance
- LimitRange applied with sensible defaults and max values
- ResourceQuota applied to cap namespace-level consumption
Monitoring
- Alert on OOMKill events (`kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}`)
- Alert on CPU throttle ratio > 25%
- VPA running in `Off` mode to surface sizing drift over time
Autoscaling
- If using VPA + HPA: split resource ownership (VPA on memory, HPA on CPU) or switch HPA to an `AverageValue` target
- VPA update mode chosen intentionally (`InPlaceOrRecreate` on 1.35+ clusters, `Recreate` on older)
All technical claims verified against Kubernetes 1.35.3. Sources: kubernetes.io/docs/concepts/configuration/manage-resources-containers, kubernetes.io/blog/2023/11/16/the-case-for-kubernetes-resource-limits, kubernetes.io/blog/2025/12/19/kubernetes-v1-35-in-place-pod-resize-ga, kubernetes.io/blog/2025/09/22/kubernetes-v1-34-pod-level-resources, Tim Hockin HN thread (item 24381813), kubernetes/kubernetes#67577, kubernetes/autoscaler#2939.