Compute Copilot Container Rightsizing
About Container Rightsizing
Container Rightsizing uses historical usage data to generate resource request recommendations that follow the policy you select. When the feature is enabled, the system automatically adjusts container resource requests by applying new recommendations through the Vertical Pod Autoscaler (VPA). Because the VPA works at the pod level, your deployments are not directly modified, which allows more efficient and dynamic resource allocation while keeping existing configurations intact.
The overall goal of Container Rightsizing is to minimize the waste associated with over-provisioning Kubernetes workloads. Broadly speaking, Container Rightsizing seeks to set resource requests (CPU and memory) to what the workload actually consumes. That is, it aims for 100% utilization over time, subject to the constraints of the policies you apply when configuring Container Rightsizing.
Although Container Rightsizing adjusts resource requests automatically and dynamically, it is not designed as a fast-response, load-following scaling solution. Instead, it aims to reduce cost and maximize reliability by ensuring that your workloads have the resources they actually need.
Compute Copilot Container Rightsizing is currently in Early Access (EA). Please contact customer success if you are interested in participating in the EA program for Container Rightsizing.
How Does It Work?
Compute Copilot Container Rightsizing is a data-driven continuous resource optimization platform for workloads in EKS clusters. It supports automatically setting CPU and memory requests on a variety of Kubernetes workloads:
- Deployments
- StatefulSets
- DaemonSets
- CronJobs
The rightsizing process starts by collecting data about the real-world resource consumption of your Kubernetes workloads. This is done by the nOps Container Insights Agent, which is a core part of the nOps Kubernetes Agent stack. This data is ingested into a statistical data analytics pipeline that analyzes the resource consumption of your workloads over the last 30 days at a one-minute resolution. Based on this analysis, optimized recommendations are generated at four different levels. Each level is tailored to a specific use case.
The recommendation levels provided are:
- Maximum savings - most aggressive recommendations for maximum savings
- Balanced savings - a balance of savings and performance, biased toward savings
- Balanced performance - a balance of savings and performance, biased toward performance
- Maximum performance - maximum headroom for maximum workload performance and reliability
In addition to generating these recommendations, the pipeline also examines the resource usage characteristics of your workloads in order to suggest an appropriate recommendation level. That is, it analyzes the resource usage of your application to see whether it has significant peaks or bursts, or whether it is closer to steady state. Using this analysis, it can suggest, for example, "maximum savings" as the appropriate recommendation level for a steady-state workload, or "maximum performance" for a workload that sees significant peaks in its resource demand. The recommendation at the suggested level is shown on the Container Rightsizing dashboard. To see which level was suggested, look for the annotations on the recommended CPU and memory settings in the dashboard.
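The level-suggestion idea described above can be illustrated with a small sketch. The actual nOps analysis is not described on this page, so everything below is an assumption made purely for illustration: the `suggest_level` function, the use of a peak-to-median ratio as a burstiness measure, and the threshold values are all hypothetical.

```python
from statistics import median

# Recommendation level names taken from the list above.
LEVELS = [
    "Maximum savings",
    "Balanced savings",
    "Balanced performance",
    "Maximum performance",
]


def suggest_level(usage_samples, thresholds=(1.2, 1.5, 2.0)):
    """Suggest a recommendation level from a series of usage samples.

    Hypothetical heuristic: the ratio of peak usage to median usage is
    used as a burstiness measure. The thresholds are illustrative only
    and are not the values used by nOps.
    """
    if not usage_samples:
        raise ValueError("usage_samples must not be empty")
    typical = median(usage_samples)
    if typical <= 0:
        # No meaningful baseline; fall back to the most conservative level.
        return LEVELS[-1]
    burstiness = max(usage_samples) / typical
    # Walk the levels from most aggressive to most conservative.
    for level, limit in zip(LEVELS, thresholds):
        if burstiness <= limit:
            return level
    return LEVELS[-1]


steady = [0.50, 0.52, 0.49, 0.51, 0.50]  # CPU cores, nearly flat
bursty = [0.20, 0.22, 0.21, 1.40, 0.20]  # CPU cores, one large spike

print(suggest_level(steady))  # Maximum savings
print(suggest_level(bursty))  # Maximum performance
```

In this sketch, a flat usage profile maps to the most aggressive level, while a profile with a large spike maps to the level with the most headroom, mirroring the steady-state versus bursty distinction described above.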
Recommendations are updated on an hourly basis for all workloads.
Policies
Recommendation levels are the foundation for our recommendation policies feature. Recommendation policies let you combine the desired recommendation levels for CPU and memory with headroom settings to tailor the behavior of automatic recommendations to the needs of your system. User-created policies are on our roadmap, but for now we provide the following pre-made policies:
- Maximum Savings
  - CPU Recommendation Level: Maximum Savings
  - Memory Recommendation Level: Maximum Savings
  - Optimizes resource utilization and minimizes costs. This policy sets resource requests to the minimum required for your containers to function correctly, based on their observed usage patterns.
- High Availability
  - CPU Recommendation Level: Maximum Performance
  - Memory Recommendation Level: Maximum Performance
  - Prioritizes resource allocation to ensure consistent uptime and resilience. This policy provides excess capacity to handle traffic spikes and maintain performance under heavy load, optimizing for reliability over cost savings.
- Dynamic
  - CPU Recommendation Level: automatically selected
  - Memory Recommendation Level: automatically selected
  - Dynamically adjusts both the selected recommendation level and the resource requests based on the observed demand. It helps ensure that your containers have the resources they need while avoiding over-provisioning or under-provisioning.
Default Setting: If no policy is selected, the Maximum Savings policy is applied by default.
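The relationship between policies and recommendation levels can be pictured as a simple lookup table. The sketch below is only a mental model: the `Policy` class, the `POLICIES` table, and the `resolve_policy` helper are hypothetical names, not part of any nOps API, and headroom settings are omitted because their values are not listed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    name: str
    # None stands for "automatically selected" (the Dynamic policy).
    cpu_level: Optional[str]
    memory_level: Optional[str]


# Pre-made policies as described above (illustrative representation).
POLICIES = {
    "Maximum Savings": Policy(
        "Maximum Savings", "Maximum Savings", "Maximum Savings"
    ),
    "High Availability": Policy(
        "High Availability", "Maximum Performance", "Maximum Performance"
    ),
    "Dynamic": Policy("Dynamic", None, None),
}

DEFAULT_POLICY = "Maximum Savings"


def resolve_policy(selected: Optional[str] = None) -> Policy:
    """Return the selected policy, or the default when none is selected."""
    if selected is None:
        return POLICIES[DEFAULT_POLICY]
    return POLICIES[selected]


print(resolve_policy().name)                        # Maximum Savings
print(resolve_policy("High Availability").cpu_level)  # Maximum Performance
```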
nOps VPA
The nOps Vertical Pod Autoscaler (VPA) is the agent that is deployed into your cluster to automatically apply CPU and memory request recommendations to your selected workloads. The nOps VPA modifies the resources of the workloads that you select for automatic rightsizing at the pod/container level, not at the controller level (Deployment, DaemonSet, etc.). It updates the CPU and memory requests of those workloads automatically on an hourly basis. A history of the recommendations applied by the nOps VPA can be accessed through the "History" button for each workload on the Container Rightsizing dashboard.
How to Enable Container Rightsizing
Prerequisites
- You must be logged in to your nOps account.
- Your AWS account must be configured in your nOps account.
- Your EKS cluster must be onboarded according to the instructions on the Onboarding EKS help page.
  - After copying the custom command to install the Helm chart for the nOps Kubernetes Agent, but before running it, you must add the following parameter to the command to enable the Container Rightsizing VPA:
    --set containerRightsizing.enabled=true
  - If you have already onboarded your cluster and installed the nOps Kubernetes Agent, you can re-run the helm upgrade command from the cluster configuration pane with the above parameter added to the command.
Note: nOps Kubernetes Agent Helm chart v0.2.0 or higher is required to support automatic Container Rightsizing functionality.
How to Enable Automatic Container Rightsizing
Step 1: Access the Compute Copilot Container Rightsizing Tab
Step 2: Enable Container Rightsizing On a Specific Container
How to Disable Container Rightsizing
Once container rightsizing is disabled, the Vertical Pod Autoscaler (VPA) is removed, and the pod will revert to consuming resource requests defined in its controller kind (e.g., Deployment, DaemonSet, etc.). This means that the container will no longer receive automated adjustments based on historical data and will instead rely on the initial configuration set within the controller.
Frequently Asked Questions (FAQ)
Does it overwrite my original workload resources?
No, container rightsizing does not overwrite your workload. The Vertical Pod Autoscaler (VPA) updates the container at the pod level, so the original workload configurations, such as those defined in your Deployment or DaemonSet, remain unchanged.
Is there downtime?
No, there is no downtime.
While recommendation updates from the nOps VPA will cause pod restarts, the process is handled in a rolling fashion equivalent to a kubectl rollout restart.
Should I enable container rightsizing for containers in the kube-system or default namespaces?
It is not recommended to enable container rightsizing for containers in the kube-system or default namespaces. These namespaces typically contain Kubernetes-specific workloads that are critical to the cluster's functionality. Modifying the resource requests and limits of these workloads could lead to unforeseen issues or disrupt essential services. It’s best to limit container rightsizing to application-specific namespaces.
How does container rightsizing handle limits?
Container Rightsizing will respect the limits that you set. If our data analytics produce a recommendation that exceeds the currently configured CPU or memory limit, the request will be set to the limit value. Container Rightsizing will not change the limits on any of your workloads, and will not set requests higher than limits.
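The limit-handling rule above amounts to capping the recommended request at the configured limit. A minimal sketch, assuming a hypothetical `cap_request_at_limit` helper and values expressed in millicores (the same logic applies to memory):

```python
from typing import Optional


def cap_request_at_limit(recommended: int, limit: Optional[int]) -> int:
    """Return the request to apply, never exceeding the configured limit.

    `limit` is None when the container has no limit configured, in which
    case the recommendation is used unchanged. The limit itself is never
    modified.
    """
    if limit is None:
        return recommended
    return min(recommended, limit)


# Recommendation of 750m CPU with a 500m limit: the request is set to 500m.
print(cap_request_at_limit(750, 500))   # 500
# Recommendation below the limit is applied as-is.
print(cap_request_at_limit(300, 500))   # 300
# No limit configured: the recommendation is applied as-is.
print(cap_request_at_limit(750, None))  # 750
```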
Does Container Rightsizing work with the Horizontal Pod Autoscaler (HPA)?
Container Rightsizing and the Horizontal Pod Autoscaler are designed for different use cases.
Container Rightsizing is designed to maximize utilization at the workload level, with utilization being calculated as actual resource consumption over resource request: Usage / Request.
The HPA is designed to enable load following by scaling workloads horizontally (increasing the number of replicas).
Since the HPA uses utilization as a scaling signal, caution must be exercised when deploying Container Rightsizing with HPA-enabled workloads.
Because Container Rightsizing aims to maximize utilization, you may have to update your HPA configurations to ensure that the HPA isn't triggered by Container Rightsizing. Our recommended rule of thumb is to adjust your HPA scaling threshold to a value greater than 100%. A good starting point is to set the HPA scaling threshold to a utilization value that matches the maximum actual utilization of your workload relative to the optimized request values provided by Container Rightsizing:
New HPA Threshold = Maximum Usage / Rightsized Request
You can find the maximum resource usage by clicking on a workload in the Container Rightsizing dashboard in the application to pop out the workload details modal. The maximum resource usage over the past 30 days is shown on the chart in the modal, where it is indicated by a red line. The rightsized request values can be found on the main Container Rightsizing dashboard.
The new HPA threshold may have to be adjusted experimentally to obtain the desired scaling behavior.
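As a worked example of the rule of thumb above, the sketch below computes a starting HPA threshold as a percentage. The function name and the sample numbers are illustrative assumptions. The result is rounded up to a whole number because HPA utilization targets are expressed as integer percentages.

```python
import math


def hpa_threshold_percent(max_usage: float, rightsized_request: float) -> int:
    """New HPA Threshold = Maximum Usage / Rightsized Request, in percent.

    Both arguments must use the same unit (for example, CPU millicores).
    The value is rounded up so the threshold is not below observed peak
    utilization.
    """
    if rightsized_request <= 0:
        raise ValueError("rightsized_request must be positive")
    return math.ceil(100.0 * max_usage / rightsized_request)


# Example: 30-day maximum usage of 900m CPU, rightsized request of 600m.
print(hpa_threshold_percent(900, 600))  # 150
```

Treat the computed value as a starting point only; as noted above, it may need experimental adjustment.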
Caveats
- Showback Delay: Certain parts of the current container rightsizing data analytics pipeline are based on showback and the AWS Cost and Usage Report (CUR). This means that there may be a delay of up to 48 hours before new workloads initially show up in the Container Rightsizing Dashboard. The product roadmap currently includes plans to reduce or eliminate this delay.
- Official VPA Not Supported: The nOps VPA uses the same CRDs as the official Kubernetes VPA, so installing both the official VPA and the nOps VPA can result in workload instability.
- Single Replica Workloads: By default, single replica workloads will not receive resource recommendations, even if they are enabled in the UI. Applying recommendations requires a pod restart, which would cause downtime for that workload. If you are interested in automatically rightsizing single replica workloads that are downtime tolerant, contact customer success to discuss your requirements.
- CronJobs: Our Vertical Pod Autoscaler (VPA) solution fully supports CronJobs, enabling automated rightsizing for resources associated with periodic workloads. To ensure compatibility, the following requirement must be met: in the CronJob specification (spec.jobTemplate.spec.template.metadata.labels), there must be an app label with a value matching the name of the CronJob. For example, if the CronJob is named demo-cronjob, the app label should be set as follows:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cronjob
spec:
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: demo-cronjob
        spec:
          containers:
          - name: demo-container
            image: demo-image
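The label requirement above can be checked before enabling rightsizing on a CronJob. The following is a minimal sketch, assuming the manifest has already been loaded into a Python dictionary (for example, from YAML or from the Kubernetes API); the `cronjob_has_matching_app_label` helper is a hypothetical name and not part of any nOps tooling.

```python
def cronjob_has_matching_app_label(manifest: dict) -> bool:
    """Check that the pod template's `app` label equals the CronJob name."""
    name = manifest.get("metadata", {}).get("name")
    labels = (
        manifest.get("spec", {})
        .get("jobTemplate", {})
        .get("spec", {})
        .get("template", {})
        .get("metadata", {})
        .get("labels", {})
    )
    return name is not None and labels.get("app") == name


# Dictionary form of the demo-cronjob example above.
demo = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "demo-cronjob"},
    "spec": {
        "jobTemplate": {
            "spec": {
                "template": {
                    "metadata": {"labels": {"app": "demo-cronjob"}},
                    "spec": {
                        "containers": [
                            {"name": "demo-container", "image": "demo-image"}
                        ]
                    },
                }
            }
        }
    },
}

print(cronjob_has_matching_app_label(demo))  # True
```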