
Onboarding your EKS clusters to Copilot for EKS Cluster Autoscaler

About Compute Copilot for EKS

Compute Copilot for EKS is a powerful service designed to automatically optimize compute-based workloads, reducing AWS EKS costs by intelligently adjusting your cluster in three key dimensions:

  • Container Efficiency
  • Node Efficiency
  • Price Efficiency

For a detailed breakdown of each dimension and how Compute Copilot enhances them, see the EKS Insights Dashboard Documentation. This documentation explains all the metrics displayed on the EKS page, providing deeper insights into your cluster’s efficiency and potential savings.

note

For better node efficiency and automation, we recommend migrating to Karpenter and using Compute Copilot for Karpenter, which dynamically adjusts node sizes based on workload demands.

Price Efficiency with Cluster Autoscaler

Cluster Autoscaler is a key feature that dynamically adjusts the number of nodes in a node group within an EKS cluster to accommodate changes in resource requirements for pods. When a scaling decision is made, Cluster Autoscaler communicates with the associated Auto Scaling Group (ASG) to adjust the DesiredCapacity, prompting the ASG to automatically scale the cluster nodes.

Compute Copilot for Cluster Autoscaler fully manages the ASGs that belong to your cluster for additional savings and reliability. Here’s how it works:

  • ASGs managed by Copilot are converted to Mixed Instances Policy (MIP) ASGs, allowing nOps Compute Copilot to define the Spot instance types the ASG can launch.
  • Compute Copilot Lambda keeps the managed ASG's MIP in sync with the latest nOps Spot Market recommendations, taking your Reserved Instances data into account. As a result, during scale-out events, a Copilot-managed ASG launches a Spot instance that is cheaper than the On-Demand instance the ASG launched before it was configured with Compute Copilot.
  • If there are running On-Demand instances available for Spot migration, or Spot instances at risk of interruption, Compute Copilot Lambda initiates an Instance Refresh to bring the ASG to the state approved by nOps Spot Market recommendations.
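
The MIP conversion can be verified from the AWS CLI. The snippet below is a minimal sketch: it runs a presence check against a trimmed, hypothetical sample of the `describe-auto-scaling-groups` response so it can run without credentials; with real credentials you would pipe the actual CLI output into the same check.

```shell
# Hypothetical, heavily trimmed sample of what
#   aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names eks-nodegroup-asg
# returns once the group carries a Mixed Instances Policy.
sample='{"AutoScalingGroups":[{"AutoScalingGroupName":"eks-nodegroup-asg","MixedInstancesPolicy":{"InstancesDistribution":{"OnDemandPercentageAboveBaseCapacity":0}}}]}'

# Check whether the group has been converted to a MIP ASG.
mip=$(printf '%s' "$sample" | python3 -c 'import json,sys; g=json.load(sys.stdin)["AutoScalingGroups"][0]; print("present" if "MixedInstancesPolicy" in g else "absent")')
echo "MixedInstancesPolicy: $mip"
```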

By combining the automation capabilities of Cluster Autoscaler with the intelligent instance management of Compute Copilot, this solution offers a seamless and cost-effective approach to optimizing AWS EKS workloads and reducing costs.

Why Use Copilot to Manage EKS ASGs

  • Compute Copilot uses AI-driven decision making to provision and run auto scaling group instances at the cheapest price in real time, without manual effort.
  • Compute Copilot allows you to benefit from Spot savings with the same reliability as on-demand. By analyzing historical data and Spot Termination events, it ensures your critical workloads remain safe from interruption. 
  • Compute Copilot does not require your workload to be transferred to a proprietary system, but works directly with AWS ASG.
  • By utilizing an ASG Mixed Instances Policy, instances are typically not replaced after they are launched, reducing the amount of unnecessary churn your application experiences.
  • Since Compute Copilot for Cluster Autoscaler integrates directly with AWS Instance Refresh and MIP, it supports complicated use cases like AWS CodeDeploy and any other Life Cycle Hook specific cases.
  • Setting a strict MaxSpotPrice within the ASG MIP ensures the price of managed Spot instances is always lower than the On-Demand price that was initially defined, even as Spot Market prices fluctuate.

Prerequisites

  1. You must be logged in to your nOps account.
  2. Your AWS account must be connected to your nOps account.
  3. You must have an EKS Cluster with Cluster Autoscaler installed.
  4. GP3 storage class must be configured in your Kubernetes cluster for Container Insights and Container Rightsizing.
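
For prerequisite 4, a minimal GP3 StorageClass manifest looks like the sketch below. It assumes the AWS EBS CSI driver is installed in the cluster (its provisioner name is `ebs.csi.aws.com`); the class name `gp3` is a common convention, not a requirement.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com   # requires the AWS EBS CSI driver
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Apply it with `kubectl apply -f gp3-storageclass.yaml`.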

Steps to Configure Your EKS Cluster for BC+, Container Rightsizing, and Container Insights

IAM Roles and Permissions Setup

Purpose

To enable our agent stack to function within the EKS clusters, we need to create IAM roles with the following permissions:

  • Store container insights metrics in an S3 bucket for Container Efficiency: The S3 bucket will reside in the customer's AWS account, so the IAM role must be configured to allow access to that specific bucket.

  • Subscribe to an SQS queue for real-time data fetching for the Cluster Dashboard: The SQS queue will reside in the nOps AWS account, so the IAM role needs to be granted permission to access that queue across accounts.
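
Together, the two permissions above roughly take the shape of the policy sketch below. This is illustrative only — the actual policies are created for you by the Terraform or CloudFormation setup described next, and the bucket name, region, account ID, and queue name are placeholders.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ContainerInsightsMetricsBucket",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<insights-bucket-in-your-account>",
        "arn:aws:s3:::<insights-bucket-in-your-account>/*"
      ]
    },
    {
      "Sid": "CrossAccountNopsQueue",
      "Effect": "Allow",
      "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
      "Resource": "arn:aws:sqs:<region>:<nops-account-id>:<queue-name>"
    }
  ]
}
```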

Choosing Between Terraform and CloudFormation

To set up IAM roles and permissions, you can use either Terraform or CloudFormation. The key differences between these options are:

  • Automatic Updates: CloudFormation supports automatic updates when a new version is available, while Terraform requires manual updates.
  • Multi-Region Deployment: CloudFormation allows you to define AWS regions where your clusters exist and applies the setup across all specified regions. With Terraform, you must run the setup separately for each region.

Choose the tool that best fits your operational needs and update strategy.

CloudFormation Setup Steps

  1. Navigate to EKS from the Compute Copilot top menu.

  2. Click on the EKS cluster you want to configure.

  3. Go to the Cluster Configuration section.

  4. Generate an API key from the API Key section and save it for later use.

  5. Click the Setup button for CloudFormation and proceed.

  6. You will be redirected to the CloudFormation stack setup.

  7. Fill in the Input Variables, paying special attention to the Token field, which is not pre-filled. The template accepts the following parameters:

    • IncludeRegions: Comma-separated list of AWS regions where the solution should operate. Defaults to the region where the stack is created if left blank.
    • RoleName: IAM role name to attach the read policy. Created during onboarding for each AWS account into nOps.
    • CreateIAMUser: Boolean (true/false) specifying whether to create an IAM user. This is required (true) if there is no IAM OIDC provider. Default is false.
    • Environment: Specifies the nOps environment where the solution will run. Allowed values: PROD, UAT. Default: PROD.
    • Token: The nOps API token required for authentication. This is sensitive information and will not be logged.
    • AutoUpdate: Determines whether the stack should automatically update when a new version is released. Allowed values: true, false. Default: true.
  8. Enter the saved API key in the Token field.

  9. Run the CloudFormation stack and return to the nOps platform.

  10. On successful execution, Version and Status should display as Configured.

CloudFormation Update Steps

When the CloudFormation stack requires an update, the IAM Roles and Permissions Setup section in the UI will display the Status as "Outdated". To update it, ensure you are authenticated in the correct AWS account, then click "Update". This action will open the AWS Console on the CloudFormation page, where you can proceed with the update.

If the UI displays the Status as "Outdated" and the Version as "N/A", this indicates that your CloudFormation stack is running a version prior to onboarding confirmation support. In this case, you must manually update the stack from the AWS Console by following these steps:

  1. Navigate to the CloudFormation page in the AWS Console.

  2. Locate and click on the nops-container-cost-setup-${account_number} stack.

  3. Click Update.

  4. Select Replace existing template.

  5. Use the following S3 URL for the new template:

    https://nops-rules-lambda-sources.s3.us-west-2.amazonaws.com/container_cost/versions/0.2.0/nops-kubernetes-agent-setup.yml
  6. Update the IncludeRegions parameter if needed.

  7. Provide a nOps Token/API Key (a new API key can be generated in the Configuration tab, where the Helm upgrade command is available).

  8. Click Submit to apply the stack update.
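
If you prefer the CLI over the console, steps 1–8 can be approximated with `aws cloudformation update-stack`, sketched below. The command is assembled and printed for review rather than executed, since it needs credentials for the correct account; the account number, region list, and token value are placeholders you must substitute. `CAPABILITY_NAMED_IAM` is assumed because the stack manages IAM roles.

```shell
ACCOUNT_NUMBER="<your account number>"
NOPS_TOKEN="<API key from the Configuration tab>"
TEMPLATE_URL="https://nops-rules-lambda-sources.s3.us-west-2.amazonaws.com/container_cost/versions/0.2.0/nops-kubernetes-agent-setup.yml"

# Assemble the update command; review it, then run it yourself.
cmd="aws cloudformation update-stack \
  --stack-name nops-container-cost-setup-${ACCOUNT_NUMBER} \
  --template-url ${TEMPLATE_URL} \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters ParameterKey=IncludeRegions,ParameterValue=us-east-1 \
               ParameterKey=Token,ParameterValue=${NOPS_TOKEN}"
echo "$cmd"
```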

Terraform Setup Steps

  1. Navigate to EKS from the Compute Copilot top menu.

  2. Click on the EKS cluster you want to configure.

  3. Go to the Cluster Configuration section.

  4. Click the Setup button for Terraform and proceed.

  5. Generate an API key and save it for later use.

  6. Copy and paste the Terraform module call into your Terraform configuration.

  7. Update the Input Variables if necessary. All variables have default values, so none are required; set them only to override the defaults.

    • cluster_names: A list of EKS cluster names targeted for deploying resources. Leave it empty to create roles for all EKS clusters in the region. Default is an empty list ([]).
    • create_bucket: Boolean (true/false) indicating whether to create an S3 bucket. If the bucket already exists or is located in another region, set it to false. Default is true.
    • create_iam_user: Boolean (true/false) specifying whether to create an IAM user. This is required (true) if there is no IAM OIDC provider. Default is false.
    • environment: Specifies the nOps environment where the solution will operate. Allowed values: PROD, UAT. Default is PROD.
    • role_name: The IAM role name to attach the read policy. If left empty, it will be automatically fetched. Default is an empty string ("").
  8. Initialize and apply your Terraform configuration:

terraform init
terraform plan -out=plan
terraform apply plan
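
A module call for step 6 might look like the sketch below. The `source` is a placeholder — copy the real module call, including its registry source and version, from the nOps Configuration page; the variable names match the list in step 7 and are all optional.

```hcl
# Hypothetical sketch — replace the source with the module call shown in the nOps UI.
module "nops_container_cost" {
  source = "<module source from the nOps Configuration page>"

  cluster_names   = ["my-eks-cluster"] # empty list targets all EKS clusters in the region
  create_bucket   = true               # set false if the bucket exists or lives in another region
  create_iam_user = false              # set true only if the cluster has no IAM OIDC provider
  environment     = "PROD"
}
```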

Install the nOps Agent Stack

Once the Terraform or CloudFormation setup is complete, install the nOps Agent Stack within the cluster to begin data collection.

  1. On the same Cluster Configuration page, generate a new API key for the nOps Agent Stack.
  2. Copy the custom command and run it in your command line.
  3. Click Test Connectivity to confirm connectivity with the nOps Agent Stack.

How to Enable Container Rightsizing

After completing the IAM Roles and Permissions Setup and Installing the nOps Agent Stack, you can start configuring container rightsizing to optimize resource usage and reduce costs.

The nOps Container Insights Agent, a core part of the nOps Kubernetes Agent Stack, will now start collecting data on the actual resource consumption of your Kubernetes workloads. This data is used to generate rightsizing recommendations, helping you optimize CPU and memory allocations for better efficiency.

For detailed steps on enabling container rightsizing, refer to the Container Rightsizing Documentation.

Steps to Configure Your EKS Cluster for Price Efficiency

Install Compute Copilot Lambda

For instructions on how to install Compute Copilot Lambda, refer to the Compute Copilot for Auto Scaling Groups (ASG) documentation.

Configure ASGs associated with the Cluster

  1. Navigate to Compute Copilot → EKS from the nOps dashboard.

  2. Choose the EKS cluster you want to cost-optimize.

  3. Navigate to the Cluster Configuration tab.

  4. Select the ASG you want to configure:

    • Open the Configure modal. The Auto Scaling Group details section will be prefilled.

    • The AWS Lambda configuration section will show the version detail and current status of the Lambda Stack to confirm it is properly configured on the AWS account.

    • You can create or choose an existing ASG template in the Spot detail section.

      note

      Unlike Compute Copilot for Karpenter, Compute Copilot for Cluster Autoscaler does not require an agent. The Helm command for installing an agent—found in the configuration tab for clusters using the Cluster Autoscaler—is for deploying the nOps Agent Stack, which provides container utilization metrics, rightsizing data, and up-to-date information for your cluster’s dashboard.

  5. Create an ASG Template:

    • Give the ASG template a unique name.
    • Select the CPU architecture based on the AMIs of the ASG you are going to attach the template to.
    • By default, Dynamic min VCpuCount & MemoryMiB is checked, setting the minimum vCPU and RAM requirements based on the size of the On-Demand EC2 instance being replaced. You can disable this option and set the CPU or memory suitable for your workload from the Instance Requirements list.
    • You can also directly choose instance families. Compute Copilot will select the most optimal choice for price and stability out of the provided options.
    • Now click Create.
  6. (Optional) Set the Minimum Number of On-Demand Instances. This setting defines the number of On-Demand instances that should be left in the ASG and not replaced with Spot. The ASG Lambda won't perform any On-Demand-to-Spot replacement if the ASG has fewer On-Demand instances than specified in this setting.

  7. (Optional) Set the Spot percentage and Max Spot Instances using the draggable bars.

    • Spot percentage defines the desired percentage of spot instances.
    • Max Spot Instances defines the maximum number of Spot instances to be created by Compute Copilot.
  8. Select Fully Managed ASG.

    • Compute Copilot for Cluster Autoscaler only works when Fully Managed ASG is selected.
  9. Click Configure and repeat these steps for all ASGs in the cluster.


What to Expect After Configuring your Cluster for Price Efficiency

Compute Copilot will start running your workloads on Spot Instances. You can navigate to the cluster's dashboard to view the instances being launched by Compute Copilot.


FAQ

  1. What is the architecture of the CloudFormation Stack?

    • The stack automates the process of creating and managing IAM roles for EKS clusters, ensuring the proper roles are associated with the service account used by the nOps Agent Stack.


    note
    • Optional IAM User Support: The template can also handle situations where an IAM user might be needed, making it suitable for environments lacking OIDC identity providers.
    • Onboarding Confirmation: The Lambdas send a request to nOps to confirm CloudFormation onboarding.
  2. What is the architecture of the nOps Terraform module?

    • The Terraform module is hosted in the public Terraform Registry, allowing customers to use it as a source in their own Terraform configurations.

    For each AWS account and region where clusters exist, customers apply the Terraform module. This process:

    • Creates an S3 bucket and IAM roles for each cluster, enabling the agent to export data.
    • Creates a cross-account role for the backend, allowing it to copy data into the platform.

    Additionally, nOps APIs ensure that:

    • Each cluster has the necessary IAM roles for agent installation.
    • The S3 bucket is registered in the backend table, triggering the data copy workflows.