Now that we know how to ensure our application is healthy with probes, let’s address another critical aspect of running applications in a shared cluster: resource management.

Managing Container Resources (Requests & Limits)

By default, a Pod can consume as much CPU and memory as is available on the Node it’s running on. In a cluster with many applications, this default behavior is dangerous and leads to two major problems:

  1. The “Noisy Neighbor” Problem: A single buggy application that starts consuming 100% of a Node’s CPU or memory can starve all other applications on that same Node, causing them to fail. It can even destabilize the Node itself.
  2. Inefficient Scheduling: When Kubernetes doesn’t know how many resources your application needs, it can’t make smart decisions about where to place your Pods. It might schedule a memory-intensive Pod on a Node that has very little free memory, only for that Pod to be killed moments later when it tries to start.

To solve this, Kubernetes allows you to specify resource requests and limits for each container in your Pod.

It’s crucial to understand the difference between a request and a limit.

  • Requests: This is the amount of resources that Kubernetes guarantees your container will have.

    • Purpose: Primarily used for scheduling. The Kubernetes scheduler will not place your Pod on a Node unless that Node has enough available resources to satisfy the Pod’s requests.
    • Analogy: This is like reserving a 2-core, 4GB RAM virtual machine. You are guaranteed to get at least that much.
  • Limits: This is the maximum amount of resources your container is allowed to use.

    • Purpose: Primarily used for enforcement on the Node.
    • CPU Limit: If your container tries to use more CPU than its limit, it will be throttled (artificially slowed down).
    • Memory Limit: If your container tries to use more memory than its limit, it will be terminated. This is known as being OOMKilled (Out Of Memory Killed).

Resource Units:

  • CPU: Measured in units of “cores”. You can use 1 for a full core or 500m (500 millicores) for half a core.
  • Memory: Measured in bytes. You can use standard suffixes like Mi (Mebibytes) and Gi (Gibibytes) — for example, 128Mi or 2Gi.
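
To make the unit notation concrete, here is an illustrative resources stanza (the values are arbitrary, chosen only to show equivalent spellings):

```yaml
# Illustrative fragment: equivalent ways to write resource quantities
resources:
  requests:
    cpu: "500m"      # half a core; "0.5" means the same thing
    memory: "128Mi"  # 128 mebibytes
  limits:
    cpu: "1"         # one full core; equivalent to "1000m"
    memory: "1Gi"    # 1 gibibyte (1024 Mi)
```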

Based on the requests and limits you set, Kubernetes assigns a Quality of Service (QoS) class to your Pod. This is important because it determines which Pods get killed first when a Node runs out of memory.

  • Guaranteed: limits are set and are equal to requests for all resources. These are the highest priority Pods and the last to be killed.
  • Burstable: At least one container has a request or limit set, but the Pod doesn’t meet the criteria for Guaranteed (typically, requests are set lower than limits). These Pods are allowed to “burst” above their requests and use more resources if they are available on the Node. These are killed after BestEffort Pods but before Guaranteed Pods.
  • BestEffort: No requests or limits are set. These are the lowest priority and the first Pods to be killed if the Node is under pressure.
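
As a sketch, these resources stanzas (values are illustrative) would produce each of the three classes:

```yaml
# Guaranteed: limits equal requests for every resource
resources:
  requests: { cpu: "500m", memory: "256Mi" }
  limits:   { cpu: "500m", memory: "256Mi" }

# Burstable: requests set lower than limits (or only requests set)
resources:
  requests: { cpu: "100m", memory: "128Mi" }
  limits:   { cpu: "500m", memory: "256Mi" }

# BestEffort: the resources field is simply omitted
```

You can check the class Kubernetes assigned with kubectl get pod <name> -o jsonpath='{.status.qosClass}'.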


  1. Enable the Metrics Server. To monitor resource usage, we need to enable the metrics-server addon in Minikube. This may take a minute or two to start up.

    Terminal window
    minikube addons enable metrics-server
  2. The Deployment with Resource Limits. Create a file named deployment-with-resources.yaml. We will set a strict CPU and memory limit.

    deployment-with-resources.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: resource-demo-deployment
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: resource-demo
      template:
        metadata:
          labels:
            app: resource-demo
        spec:
          containers:
            - name: app-container
              image: busybox:1.36
              # This command will just keep the container running
              command: ['/bin/sh', '-c', 'sleep 3600']
              # This is where we define the resources for this container
              resources:
                requests:
                  memory: "50Mi"
                  cpu: "100m" # 1/10th of a core
                limits:
                  memory: "100Mi"
                  cpu: "200m" # 1/5th of a core
  3. Deploy and Observe. Apply the deployment and wait for the Pod to be ready.

    Terminal window
    kubectl apply -f deployment-with-resources.yaml
    kubectl get pods

    Once it’s running, check its baseline resource usage with kubectl top pods. It should be near zero.

  4. Stress Test the CPU Limit. Now, let’s exec into the pod and run a command that pegs the CPU. The dd command is a perfect tool for this.

    Terminal window
    # Get your Pod's name
    POD_NAME=$(kubectl get pods -l app=resource-demo -o jsonpath='{.items[0].metadata.name}')
    # Run the stress test inside the Pod
    kubectl exec -it $POD_NAME -- /bin/sh -c 'dd if=/dev/zero of=/dev/null'

    While that command is running in one terminal, open a second terminal and watch the resource usage:

    Terminal window
    # Watch the resource usage every 2 seconds
    watch kubectl top pod $POD_NAME

    You will see the CPU usage climb up to around 200m (our limit) and then stay there. It is being throttled and cannot consume more CPU, protecting the Node from this greedy process. Press Ctrl+C in the first terminal to stop the stress test.

  5. Trigger an OOMKilled Event. Now let’s test the memory limit. We’ll exec into the pod and run dd with a 150MB block size, which forces dd to allocate a single 150MB in-memory buffer, more than our 100Mi limit.

    Terminal window
    # dd allocates one 150MB buffer (bs=150M), exceeding the container's 100Mi limit
    kubectl exec -it $POD_NAME -- /bin/sh -c 'dd if=/dev/zero of=/dev/null bs=150M count=1'

    Wait a moment. You will likely see the command fail with a “Killed” message. Now check the status of your Pod:

    Terminal window
    kubectl get pod $POD_NAME

    The RESTARTS count may have increased by 1. On recent Kubernetes versions (cgroup v2 with group OOM kills), exceeding the memory limit terminates the whole container, and the Deployment automatically restarts it; on older setups the kernel may kill only the dd process, leaving the container running.

    To confirm the reason, describe the Pod:

    Terminal window
    kubectl describe pod $POD_NAME

    If the container was restarted, the State and Last State sections for the container will show the reason: OOMKilled.

  6. Clean Up

    Terminal window
    kubectl delete -f deployment-with-resources.yaml
Key Takeaways

  • Always define requests and limits for your production workloads. This is fundamental to cluster stability and application reliability.
  • Requests guarantee resources for your app and ensure it gets scheduled correctly.
  • Limits protect the rest of your cluster from misbehaving or resource-hungry applications.
  • Use monitoring tools like kubectl top (and more advanced tools like Prometheus) to observe your application’s real-world usage and adjust its requests and limits over time.
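
To help enforce the first takeaway, Kubernetes can apply default requests and limits to containers that don’t declare their own via a LimitRange object in a namespace. A minimal sketch (the name and values here are illustrative, not a recommendation):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources   # illustrative name
spec:
  limits:
    - type: Container
      defaultRequest:       # applied when a container omits requests
        cpu: "100m"
        memory: "64Mi"
      default:              # applied when a container omits limits
        cpu: "500m"
        memory: "256Mi"
```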

You now know how to build truly robust Pods that are both healthy and well-behaved citizens of the cluster. When you’re ready, let me know, and we’ll move to Chapter 3: Organizing the Cluster with Namespaces.