Compute Copilot Container Rightsizing

About Container Rightsizing

Container rightsizing uses historical data to generate resource consumption recommendations aligned with the selected policy. When the feature is enabled, the system automatically adjusts container resource requests, applying the new recommendations through the Vertical Pod Autoscaler (VPA). Because deployments are not modified directly, resource allocation becomes more efficient and dynamic while existing configurations stay intact.

The overall goal of Container Rightsizing is to minimize the waste associated with over-provisioning Kubernetes workloads. Broadly speaking, Container Rightsizing seeks to set resource requests (CPU and memory) to exactly what the workload consumes. That is, it aims for 100% utilization over time, subject to the constraints of the policies you apply when configuring Container Rightsizing.

Although Container Rightsizing adjusts resource requests automatically and dynamically, it is not designed as a fast-response, load-following scaling solution. Instead, it aims to reduce cost and maximize reliability by ensuring that your workloads have the resources they actually need.

How Does It Work?

Compute Copilot Container Rightsizing is a data-driven continuous resource optimization platform for workloads in EKS clusters. It supports automatically setting CPU and memory requests on a variety of Kubernetes workloads:

Deployments
StatefulSets
DaemonSets
CronJobs

The rightsizing process starts by collecting data about the real-world resource consumption of your Kubernetes workloads. This is done by the nOps Container Insights Agent, which is a core part of the nOps Kubernetes Agent stack. This data is ingested into a statistical data analytics pipeline that analyzes the resource consumption of your workloads over the last 30 days at a one-minute resolution. Based on this analysis, optimized recommendations are generated at four different levels. Each level is tailored to a specific use case.

The recommendation levels provided are:

Maximum savings - most aggressive recommendations for maximum savings
Balanced savings - a balance of savings and performance, biased toward savings
Balanced performance - a balance of savings and performance, biased toward performance
Maximum performance - maximum headroom for maximum workload performance and reliability

In addition to generating these recommendations, the pipeline examines the resource usage characteristics of your workloads in order to suggest an appropriate recommendation level. That is, it analyzes the resource usage of your application to see whether it has significant peaks or bursts, or whether it is closer to steady state. For example, it can suggest "maximum savings" for a steady-state workload, or "maximum performance" for a workload that sees significant peaks in its resource demand. The recommendation at the suggested level is shown on the Container Rightsizing dashboard. To see which level was suggested, look for the annotations on the recommended CPU and memory settings in the dashboard.

Recommendations are updated on an hourly basis for all workloads.

Policies

Recommendation levels are the foundation for our recommendation policies feature. Recommendation policies let you combine the desired CPU and memory recommendation levels with headroom settings to tailor the behavior of automatic recommendations to the needs of your system. User-created policies are on our roadmap, but for now we provide the following pre-made policies:

Maximum Savings
- CPU Recommendation Level: Maximum Savings
- Memory Recommendation Level: Maximum Savings
- Optimizes resource utilization and minimizes costs. This policy sets resource requests to the minimum required for your containers to function correctly, based on their observed usage patterns.
High availability
- CPU Recommendation Level: Maximum Performance
- Memory Recommendation Level: Maximum Performance
- Prioritizes resource allocation to ensure consistent uptime and resilience. Provides excess capacity to handle traffic spikes and maintain performance under heavy load, optimizing for reliability over cost savings.
Dynamic
- CPU Recommendation Level: automatically selected
- Memory Recommendation Level: automatically selected
- Dynamically adjusts both the selected recommendation level and the resource requests based on the observed demand. It helps ensure that your containers have the resources they need while avoiding both over-provisioning and under-provisioning.

Default Setting: If no policy is selected, the Maximum Savings policy is applied by default.

nOps VPA

The nOps Vertical Pod Autoscaler (VPA) is the agent that is deployed into your cluster to automatically apply CPU and memory request recommendations to your selected workloads. The nOps VPA modifies the workloads that you select for automatic rightsizing at the pod/container level, not at the controller level (Deployment, DaemonSet, etc.). It updates the CPU and memory requests of those workloads on an hourly basis. A history of the recommendations applied by the nOps VPA can be accessed through the "History" button for each workload on the Container Rightsizing dashboard.

How to Enable Container Rightsizing

Prerequisites

You must be logged in to your nOps account.
Your AWS account must be configured in your nOps account.
Your EKS cluster must be onboarded according to the instructions on the Onboarding EKS help page.
- After copying the custom command to install the Helm chart for the nOps Kubernetes Agent, but before running it, you must add the following parameter to the command to enable the Container Rightsizing VPA:
  --set containerRightsizing.enabled=true
- Optionally, add the following parameter to allow rightsizing of single-replica workloads:
  --set vpa.updater.replaceMinReplicas=1
  This allows rightsizing for single-replica deployments by permitting pod restarts even when there is only one replica. By default, rightsizing applies only to deployments with two or more replicas, to prevent downtime.

If you have already onboarded your cluster and installed the nOps Kubernetes Agent, you can re-run the helm upgrade command from the cluster configuration pane with these parameters added.
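
If you prefer to keep these settings in a values file rather than on the command line, the two `--set` parameters listed in the prerequisites map to the YAML below. This is only a sketch: it relies on Helm's standard translation of dotted `--set` keys into nested values, and the file name `nops-values.yaml` is a hypothetical choice. The rest of the install or upgrade command should stay exactly as copied from the nOps cluster configuration pane.

```yaml
# nops-values.yaml (hypothetical file name)
containerRightsizing:
  enabled: true            # same as --set containerRightsizing.enabled=true
vpa:
  updater:
    replaceMinReplicas: 1  # optional; same as --set vpa.updater.replaceMinReplicas=1
```

The file can then be passed by appending `-f nops-values.yaml` to the copied command in place of the two `--set` parameters.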

Note: nOps Kubernetes Agent Helm chart v0.2.0 or higher is required to support automatic Container Rightsizing functionality.

How to Enable Automatic Container Rightsizing

Step 1: Access the Compute Copilot Container Rightsizing Tab

Step 2: Enable Container Rightsizing On a Specific Container

How to Disable Container Rightsizing

Once container rightsizing is disabled, the Vertical Pod Autoscaler (VPA) is removed, and the pod will revert to consuming resource requests defined in its controller kind (e.g., Deployment, DaemonSet, etc.). This means that the container will no longer receive automated adjustments based on historical data and will instead rely on the initial configuration set within the controller.

Infrastructure as Code (IaC) Support

In addition to enabling container rightsizing through the nOps UI, you can now configure your Kubernetes workloads to opt into rightsizing directly through your Infrastructure as Code (IaC) tools like Helm, Kustomize, or other GitOps workflows.

How to Enable Container Rightsizing via IaC

To enable container rightsizing via IaC, you need to add specific labels and annotations to your Kubernetes resources. This declarative approach allows you to manage rightsizing configurations alongside your application code.

Required Label

To opt a workload into rightsizing, add the following label:

metadata:
  labels:
    nops-vpa/enabled: "true"

This label is mandatory and signals that the workload is eligible for rightsizing.

Container-Specific Configuration (Optional)

By default, all containers in a workload with the nops-vpa/enabled: "true" label will be rightsized. If you want to exclude specific containers, you can use annotations:

metadata:
  annotations:
    nops-vpa/container.<container_name>: "disabled"

Replace <container_name> with the actual name of the container you want to exclude.

Policy Attachment (Optional)

You can specify which rightsizing policy to use for all containers in the workload:

metadata:
  annotations:
    nops-vpa/policy: "High availability"

Or attach different policies to specific containers:

metadata:
  annotations:
    nops-vpa/policy.<container_name>: "Maximum Savings"

Example Configuration

Here's a complete example of a Deployment with IaC-based rightsizing configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  labels:
    nops-vpa/enabled: "true"  # Enable rightsizing for this deployment
  annotations:
    nops-vpa/policy: "Dynamic"  # Use Dynamic policy by default
    nops-vpa/container.worker: "disabled"  # Exclude the worker container
    nops-vpa/policy.web: "High availability"  # Use High availability policy for web container
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
      - name: web
        image: nginx:latest
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
      - name: worker
        image: worker:latest
        resources:
          requests:
            cpu: 200m
            memory: 256Mi

Supported Controller Types

The IaC-based rightsizing currently supports the following controller types:

Deployments
StatefulSets
DaemonSets
CronJobs

Behavior Defaults

If nops-vpa/enabled: "true" is present, all containers are implicitly enabled for rightsizing unless explicitly disabled.
The nops-vpa/container.<container_name> annotation with value disabled allows opting out specific containers.
A global policy (nops-vpa/policy) applies to all containers unless overridden per container.
Container-specific policies (nops-vpa/policy.<container_name>) take precedence over the global policy.
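
To make the precedence rules above concrete, here is a small sketch of how they would resolve for a pod with three containers. The container names `api`, `cache`, and `log-shipper` are hypothetical, and the comments describe the outcome implied by the rules above rather than output captured from a cluster.

```yaml
metadata:
  labels:
    nops-vpa/enabled: "true"
  annotations:
    nops-vpa/policy: "Maximum Savings"          # global policy
    nops-vpa/policy.api: "High availability"    # per-container override
    nops-vpa/container.log-shipper: "disabled"  # opt this container out
# Expected result for containers api, cache, and log-shipper:
#   api          -> rightsized with "High availability" (override wins)
#   cache        -> rightsized with "Maximum Savings" (global policy)
#   log-shipper  -> not rightsized
```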

Discovery Mechanism

The nOps VPA uses a scanner to identify all controllers across the cluster with the nops-vpa/enabled=true label. This ensures that your IaC-defined rightsizing configurations are reliably discovered and applied.

Frequently Asked Questions (FAQ)

Does it overwrite my original workload resources?

No, container rightsizing does not overwrite your workload. The Vertical Pod Autoscaler (VPA) updates the container at the pod level, so the original workload configurations, such as those defined in your Deployment or DaemonSet, remain unchanged.

Is there downtime?

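No, there is no downtime for workloads with two or more replicas. Recommendation updates from the nOps VPA do cause pod restarts, but the process is handled in a rolling fashion, equivalent to a kubectl rollout restart. Single-replica workloads are the exception, which is why they are not rightsized by default (see Caveats).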

Should I enable container rightsizing for containers in the kube-system or default namespaces?

It is not recommended to enable container rightsizing for containers in the kube-system or default namespaces. These namespaces typically contain Kubernetes-specific workloads that are critical to the cluster's functionality. Modifying the resource requests and limits of these workloads could lead to unforeseen issues or disrupt essential services. It’s best to limit container rightsizing to application-specific namespaces.

How does container rightsizing handle limits?

Container rightsizing respects the limits that you set. If our data analytics pipeline makes a recommendation that exceeds the currently configured CPU or memory limit, the request is set to the limit value. Container Rightsizing will not change the limits on any of your workloads, and will not set requests higher than limits.
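
As an illustration of the capping behavior described above, consider the container spec below. The numbers are made up for the example, and the comments describe what the text above implies, not results observed on a real cluster.

```yaml
# Container resources as defined in the controller (illustrative values)
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m      # never changed by Container Rightsizing
    memory: 512Mi  # never changed by Container Rightsizing
# If the recommended CPU request is 300m, the pod's CPU request becomes 300m.
# If the recommended CPU request is 700m, it is capped at the 500m limit.
```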

Does Container Rightsizing work with the Horizontal Pod Autoscaler (HPA)?

Container Rightsizing and the Horizontal Pod Autoscaler are designed for different use cases. Container Rightsizing is designed to maximize utilization at the workload level, with utilization being calculated as actual resource consumption over resource request: Usage / Request. The HPA is designed to enable load following by scaling workloads horizontally (increasing the number of replicas). Since the HPA uses utilization as a scaling signal, caution must be exercised when deploying Container Rightsizing with HPA-enabled workloads.

Since Container Rightsizing aims to maximize utilization, you may have to update your HPA configurations to ensure that the HPA isn't triggered by Container Rightsizing. Our recommended rule of thumb is to adjust your HPA scaling threshold to a value greater than 100%. A good starting point is to set the HPA scaling threshold to a utilization value that matches the maximum actual utilization of your workload relative to the optimized request values provided by Container Rightsizing:

New HPA Threshold = Maximum Usage / Rightsized Request

You can find the maximum resource usage by clicking on a workload in the Container Rightsizing dashboard in the application to pop out the workload details modal. The maximum resource usage over the past 30 days is shown on the chart in the modal, where it is indicated by a red line. The rightsized request values can be found on the main Container Rightsizing dashboard.

The new HPA threshold may have to be adjusted experimentally to obtain the desired scaling behavior.
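
As a worked example of the formula above, the sketch below uses made-up numbers and a hypothetical workload named `my-application`. It assumes a standard `autoscaling/v2` HorizontalPodAutoscaler that scales on CPU utilization; the replica bounds are placeholders, and the threshold should be tuned for your own workload.

```yaml
# Worked example with illustrative numbers:
#   maximum CPU usage over the past 30 days = 400m
#   rightsized CPU request                  = 250m
#   new HPA threshold                       = 400m / 250m = 1.6 = 160%
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-application        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-application
  minReplicas: 3              # placeholder
  maxReplicas: 10             # placeholder
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 160   # percent of the (rightsized) CPU request
```

Because the HPA computes utilization against the pod's current request, a threshold above 100% keeps it from scaling out merely because rightsizing lowered the request toward actual usage.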

How does IaC-based rightsizing interact with UI-based configuration?

When you enable rightsizing for a workload through both the UI and IaC (using labels/annotations):

The IaC configuration takes precedence
For best results, choose one method (either UI or IaC) for managing each workload's rightsizing configuration

We recommend using IaC for workloads managed through GitOps workflows and the UI for workloads that require more frequent or ad-hoc adjustments.

Caveats

Showback Delay:

Certain parts of the current container rightsizing data analytics pipeline are based on showback and the AWS Cost and Usage Report (CUR). This means that there may be a delay of up to 48 hours before new workloads initially show up in the Container Rightsizing Dashboard. The product roadmap currently includes plans to reduce or eliminate this delay.
Official VPA Not Supported:

The nOps VPA uses the same CRDs as the official Kubernetes VPA, so installing both the official VPA and the nOps VPA can result in workload instability.
Single Replica Workloads:

By default, single-replica workloads will not receive resource recommendations, even if they are enabled in the UI. Applying recommendations requires a pod restart, which would cause downtime for that workload. If you are interested in automatically rightsizing single-replica workloads that are downtime-tolerant, see the vpa.updater.replaceMinReplicas parameter under Prerequisites, or contact customer success to discuss your requirements.
CronJobs:

Our Vertical Pod Autoscaler (VPA) solution fully supports CronJobs, enabling automated rightsizing for resources associated with periodic workloads. To ensure compatibility, the following requirement must be met: In the CronJob specification (spec.jobTemplate.spec.template.metadata.labels), there must be an app label with a value matching the name of the CronJob. For example, if the CronJob is named demo-cronjob, the app label should be:
```
apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cronjob
spec:
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: demo-cronjob
        spec:
          containers:
          - name: demo-container
            image: demo-image
```
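
For a CronJob managed through IaC, the `app` label requirement above would presumably be combined with the opt-in label described in the IaC section. The sketch below assumes the `nops-vpa/enabled` label belongs on the CronJob's own metadata, as for other controllers; the schedule and restart policy are illustrative values added only to make the manifest complete.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cronjob
  labels:
    nops-vpa/enabled: "true"   # IaC opt-in label on the controller
spec:
  schedule: "0 * * * *"        # illustrative schedule (hourly)
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: demo-cronjob  # must match metadata.name
        spec:
          restartPolicy: OnFailure
          containers:
          - name: demo-container
            image: demo-image
```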
IaC Reconciliation:

When using Infrastructure as Code (IaC) to configure rightsizing, be aware that there is a reconciliation process that runs periodically to detect changes in your IaC configurations. This means that changes made to labels or annotations may not be immediately reflected in the rightsizing behavior. The system typically reconciles IaC configurations every minute.
