Configure nOps Kubernetes Agent
If you have previously installed the container-insights agent (nops-k8s-agent) and want to migrate to the unified agent installation, please refer to this migration guide.
The nOps Kubernetes Agent is required to fully utilize nOps features for your EKS clusters. It's a bundle of components that will enable you to leverage our Compute Copilot product and provide container visibility into the cluster.
Prerequisites
- You must have an active nOps account. If you do not have one, please register on nOps.
- Make sure you have access to the Kubernetes cluster (recommended version v1.23.6 or later) to deploy the agent.
- AWS CLI.
- Helm.
- kubectl.
- Unix-like terminal to execute the installation script.
- Terraform if installing via Infrastructure as Code (IaC).
- Ensure that the EBS CSI Driver is installed and a Storage Class is configured for the EKS cluster so that EBS gp2/gp3 volumes can be created dynamically (a quick verification sketch follows this list).
- If you want to enable the karpenOps agent, make sure Karpenter is installed in the cluster.
For karpenOps-specific documentation, please click here.
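Before starting, you can quickly verify the tooling and cluster prerequisites with commands like the following (a non-authoritative sketch; it assumes the EBS CSI driver was installed as an EKS add-on, and <your-cluster-name> is a placeholder):
# Verify local tooling
aws --version
helm version --short
kubectl version --client
# Verify the EBS CSI driver add-on is installed on the cluster
aws eks describe-addon --cluster-name <your-cluster-name> --addon-name aws-ebs-csi-driver
# Verify a gp2/gp3 StorageClass exists for dynamic EBS volume provisioning
kubectl get storageclass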
Steps to Configure nOps Kubernetes Agent
Navigate to Container Cost Tab
- Go to your Container Cost Integrations Settings.
- Setup Container Cost Integration
- Click the Setup button for the desired account. Ensure you are authenticated into that account.
- This step will:
- Create an S3 bucket in your AWS account.
- Grant the agent permissions to write files to that bucket.
- Use your OIDC Identity Provider to create a service role, or create an IAM User, with permissions to write to the S3 bucket.
- Allow nOps to copy those files.
- Create AWS Infrastructure
- Configure CloudFormation Stack
- On the CloudFormation stack creation page:
  - List the regions where your clusters for that specific account are located, separated by commas (e.g., us-east-1,us-east-2,us-west-1,us-west-2).
  - (Optional) If you don't have an IAM OIDC provider configured, you can create an IAM User to grant the agent permissions to write to the S3 bucket. To do this, set CreateIAMUser to true.
    Note: If you choose this approach, you must create and store secret key credentials and add them to the Helm values.
- Select the I acknowledge that AWS CloudFormation might create IAM resources with custom names checkbox.
- Click the Create stack button.
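If you are unsure whether the cluster already has an IAM OIDC provider (and therefore whether you need CreateIAMUser), a sketch along these lines can help; the cluster, stack, and bucket names are placeholders to replace with your own:
# Print the cluster's OIDC issuer URL
aws eks describe-cluster --name <your-cluster-name> --query "cluster.identity.oidc.issuer" --output text
# List the IAM OIDC providers registered in the account and compare against the issuer above
aws iam list-open-id-connect-providers
# After stack creation completes, confirm its status and that the container cost bucket exists
aws cloudformation describe-stacks --stack-name <your-stack-name> --query "Stacks[0].StackStatus"
aws s3 ls | grep nops-container-cost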
OR
- Create resources via Terraform
- If you prefer, you can create the infrastructure using Terraform; use the instructions from the following repo.
- Check Integration Status
- After the creation is complete, return to the nOps platform.
- Click the Check Status button to verify the integration status.
- Install Agent in your clusters
- To install the agent in your clusters, go to the EKS section in your nOps platform.
- Click on the cluster where you want to install the agent.
- Click on the Cluster Configuration tab.
- Copy the command to install the agent in that particular cluster (make sure you are authenticated and that the cluster is the current context of your kubectl; see the sketch below).
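Before running the copied command, you can point kubectl at the target cluster and confirm the active context; for example (the cluster name and region are illustrative):
# Add or refresh the cluster in your kubeconfig and switch to it
aws eks update-kubeconfig --name example-cluster --region us-east-1
# Confirm the current context points at the intended cluster
kubectl config current-context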
Example command for Karpenter enabled clusters:
helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent \
--namespace nops --create-namespace \
--set datadog.apiKey=realkeysonlyinprod \
--set containerInsights.enabled=true \
--set containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN=arn:aws:eks:us-east-1:123456789101:cluster/example-cluster \
--set containerInsights.env_variables.APP_AWS_S3_BUCKET=nops-container-cost-12345678101 \
--set karpenops.enabled=true \
--set karpenops.image.tag=1.23.2 \
--set nops.apiKey=*******************************a004eb \
--set karpenops.clusterId=D/M7Yj
Example command for Cluster Autoscaler enabled clusters:
helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent \
--namespace nops --create-namespace \
--set datadog.apiKey=realkeysonlyinprod \
--set containerInsights.enabled=true \
--set containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN=arn:aws:eks:us-east-1:123456789101:cluster/example-cluster \
--set containerInsights.env_variables.APP_AWS_S3_BUCKET=nops-container-cost-12345678101 \
--set karpenops.enabled=false
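A quick way to confirm the release is up (a sketch; exact pod names depend on the chart version and the components you enabled):
# Check the Helm release and the agent workloads in the nops namespace
helm list --namespace nops
kubectl get pods --namespace nops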
After a successful installation, you'll have our Compute Copilot (KarpenOps for Karpenter clusters) efficiently managing your node lifecycle, enhancing cost efficiency, and ensuring high availability. While EKS costs are often unclear, nOps helps by automatically finding waste through CPU and memory metrics. This lets you optimize resources and save money quickly.
Note: As part of the installation, a CRD (ServiceMonitors) is installed.
Installation via Terraform
Requirements
Name | Version |
---|---|
terraform | >= 1.3.2 |
Providers
Name | Version |
---|---|
aws | >= 5.58 |
helm | >= 2.7 |
kubernetes | >= 2.20 |
Usage
Helm Release
provider "aws" {
region = "us-east-1"
alias = "virginia"
}
data "aws_ecrpublic_authorization_token" "token" {
provider = aws.virginia
}
provider "helm" {
kubernetes {
host = var.cluster_endpoint # Replace it with your cluster endpoint
cluster_ca_certificate = base64decode(var.cluster_certificate_authority_data) # Replace it with your cluster certificate authority data
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", var.cluster_name] # Replace it with your cluster name
}
}
}
resource "helm_release" "nops_kubernetes_agent" {
name = "nops-kubernetes-agent"
namespace = "nops"
create_namespace = true
repository = "oci://public.ecr.aws/nops"
repository_username = data.aws_ecrpublic_authorization_token.token.user_name
repository_password = data.aws_ecrpublic_authorization_token.token.password
description = "Helm Chart for nOps kubernetes agent"
chart = "kubernetes-agent"
version = "0.1.6" # Ensure to update this to the latest/desired version: https://gallery.ecr.aws/nops/kubernetes-agent
# Example to place Prometheus deployment in a on-demand node provisioned by Karpenter (THIS IS THE RECOMMENDED WAY TO RUN PROMETHEUS, Note: using double backslashes (\\) to escape the dot in karpenter.sh/capacity-type)
#set {
# name = "prometheus.server.nodeSelector.karpenter\\.sh/capacity-type"
# value = "on-demand"
#}
set {
name = "nops.apiKey"
value = "<your_karpenops_api_key" # Get it from the nOps kubernetes agent onboarding process
}
set {
name = "datadog.apiKey"
value = "<datadog_api_key>" # Get it from the nOps kubernetes agent onboarding process
}
set {
name = "containerInsights.enabled"
value = "true" # Set it to true to install container insights agent to gain cost visibility in your EKS cluster
}
set {
name = "containerInsights.imageTag"
value = "2.0.4" # Ensure to update this to the latest/desired version from [here](https://gallery.ecr.aws/nops/container-insights-agent)
}
set {
name = "containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN"
value = "<your_eks_cluster_arn>" # Get it from the nOps kubernetes agent onboarding process
}
set {
name = "containerInsights.env_variables.APP_AWS_S3_BUCKET"
value = "<your_s3_bucket_name>" # Get it from the nOps kubernetes agent onboarding process
}
set {
name = "karpenops.enabled"
value = "true" # Set it to true if you have karpenter running in your EKS Cluster and want the karpenOps agent to manage your EC2NodeTemplate/NodePool
}
set {
name = "karpenops.image.tag"
value = "1.23.2" # Ensure to update this to the latest/desired version: https://gallery.ecr.aws/nops/karpenops
}
set {
name = "karpenops.clusterId"
value = "<your_karpenops_cluster_id>" # Get it from the nOps kubernetes agent onboarding process
}
}
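Once the release resource is defined, the usual Terraform workflow applies, for example:
# Initialize providers, review the plan, then apply it
terraform init
terraform plan
terraform apply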
Create Addon (EKS Blueprints)
provider "aws" {
region = "us-east-1"
alias = "virginia"
}
data "aws_ecrpublic_authorization_token" "token" {
provider = aws.virginia
}
provider "helm" {
kubernetes {
host = var.cluster_endpoint # Replace it with your cluster endpoint
cluster_ca_certificate = base64decode(var.cluster_certificate_authority_data) # Replace it with your cluster certificate authority data
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", var.cluster_name] # Replace it with your cluster name
}
}
}
module "eks_blueprints_addon" {
source = "aws-ia/eks-blueprints-addon/aws"
version = "~> 1.0"
chart = "kubernetes-agent"
chart_version = "0.1.6" # Ensure to update this to the latest/desired version: https://gallery.ecr.aws/nops/kubernetes-agent
repository = "oci://public.ecr.aws/nops"
repository_username = data.aws_ecrpublic_authorization_token.token.user_name
repository_password = data.aws_ecrpublic_authorization_token.token.password
description = "Helm Chart for nOps kubernetes agent"
namespace = "nops"
create_namespace = true
set = [
# Example to place Prometheus deployment in a on-demand node provisioned by Karpenter (THIS IS THE RECOMMENDED WAY TO RUN PROMETHEUS, Note: using double backslashes (\\) to escape the dot in karpenter.sh/capacity-type)
#{
# name = "prometheus.server.nodeSelector.karpenter\\.sh/capacity-type"
# value = "on-demand"
#},
{
name = "nops.apiKey"
value = "<your_karpenops_api_key" # Get it from the nOps kubernetes agent onboarding process
},
{
name = "datadog.apiKey" # Get it from the nOps kubernetes agent onboarding process
value = "<datadog_api_key>" # Get it from the nOps kubernetes agent onboarding process
},
{
name = "containerInsights.enabled"
value = "true" # Set it to true to install container insights agent to gain cost visibility in your EKS cluster
},
{
name = "containerInsights.imageTag"
value = "2.0.4" # Ensure to update this to the latest/desired version from [here](https://gallery.ecr.aws/nops/container-insights-agent)
},
{
name = "containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN"
value = "<your_eks_cluster_arn>" # Get it from the nOps kubernetes agent onboarding process
},
{
name = "containerInsights.env_variables.APP_AWS_S3_BUCKET"
value = "<your_s3_bucket_name>" # Get it from the nOps kubernetes agent onboarding process
},
{
name = "karpenops.enabled"
value = "true" # Set it to true if you have karpenter running in your EKS Cluster and want the karpenOps agent to manage your EC2NodeTemplate/NodePool
},
{
name = "karpenops.image.tag"
value = "1.23.2" # Ensure to update this to the latest/desired version from [here](https://gallery.ecr.aws/nops/karpenops)
},
{
name = "karpenops.clusterId"
value = "<your_karpenops_cluster_id>" # Get it from the nOps kubernetes agent onboarding process
}
]
}
Required Parameters
The following table lists required configuration parameters for the KarpenOps and Container Insights agents and their default values.
Parameter | Description | Default |
---|---|---|
nops.apiKey | nOps agent API Key | - |
datadog.apiKey | Datadog API Key. | - |
containerInsights.enabled | Whether to install the Container Insights agent. | true |
containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN | EKS Cluster ARN. | - |
containerInsights.env_variables.APP_AWS_S3_BUCKET | S3 Bucket name. | - |
karpenops.enabled | Whether to install the KarpenOps agent. | false |
karpenops.clusterId | KarpenOps Cluster ID | - |
Optional Parameters
The following table lists the optional configuration parameters for the KarpenOps and Container Insights agents and their default values.
Parameter | Description | Default |
---|---|---|
externalSecrets.enabled | Whether to enable the External Secrets integration; more info here. | false |
externalSecrets.secretStoreRef.name | Name of the ClusterSecretStore. | - |
externalSecrets.data.apiKeys.remoteRef.key | Name of the secret in AWS Secrets Manager. | - |
autoUpdater.enabled | Whether to enable the Auto Update CronJob that updates to the latest image tag for the KarpenOps, Container Insights, and Opencost Kubernetes resources. | true |
autoUpdater.schedule | Schedule for the Auto Update CronJob; the default runs every Monday at 12:00 AM. | 0 0 * * 1 |
autoUpdater.repository | Repository for the Auto Update container image | public.ecr.aws/nops/alpine/k8s |
autoUpdater.imageTag | Image tag for the autoUpdater container image | 1.30.4 |
autoUpdater.successfulJobsHistoryLimit | Number of successful finished jobs to keep for the Auto Update | 0 |
autoUpdater.failedJobsHistoryLimit | Number of failed finished jobs to keep for the Auto Update | 0 |
datadog.repository | Repository for the Datadog Agent container image | public.ecr.aws/nops/datadog/agent |
datadog.imageTag | Image tag for the Datadog Agent container image | 7.56.0 |
containerInsights.debug | Debug mode. | false |
containerInsights.repository | Repository for the nOps Container Insights Agent container image | public.ecr.aws/nops/container-insights-agent |
containerInsights.imageTag | Image tag for the nOps Container Insights Agent container image | 2.0.4 |
containerInsights.successfulJobsHistoryLimit | Number of successful finished jobs to keep for the nOps Container Insights Agent | 2 |
containerInsights.failedJobsHistoryLimit | Number of failed finished jobs to keep for the nOps Container Insights Agent | 2 |
containerInsights.resources.limits.cpu | Container Insights Agent CPU Limit | 500m |
containerInsights.resources.limits.memory | Container Insights Agent Memory Limit | 4Gi |
containerInsights.resources.requests.cpu | Container Insights Agent CPU Request | 500m |
containerInsights.resources.requests.memory | Container Insights Agent Memory Request | 2Gi |
containerInsights.backoffLimit | Number of retries for failed Container Insights cronjobs | 3 |
opencost.loglevel | Log level for nOps-cost. | info |
opencost.opencost.exporter.image.registry | Registry for the Opencost Exporter container image | public.ecr.aws |
opencost.opencost.exporter.image.repository | Repository for the Opencost Exporter container image | nops/opencost |
opencost.opencost.exporter.image.tag | Image tag for the Opencost Exporter container image | 1.111.0 |
karpenops.image.repository | Repository for the KarpenOps Agent container image | public.ecr.aws/nops/karpenops |
karpenops.image.tag | Image tag for the KarpenOps Agent container image | 1.23.2 |
heartbeat.image.repository | Repository for the Heartbeat Agent container image | public.ecr.aws/nops/k8s-heartbeat-agent |
heartbeat.image.tag | Image tag for the Heartbeat Agent container image | 0.1.4 |
dataFetcher.image.repository | Repository for the Data Fetcher Agent container image | public.ecr.aws/nops/k8s-data-fetcher-agent |
dataFetcher.image.tag | Image tag for the Data Fetcher Agent container image | 0.1.2 |
dataFetcher.replicaCount | Number of replicas for Data Fetcher Agent. | 2 |
dataFetcher.resources.requests.cpu | Data Fetcher CPU resource requests. | 200m |
dataFetcher.resources.requests.memory | Data Fetcher Memory resource requests. | 1Gi |
dataFetcher.resources.limits.cpu | Data Fetcher CPU resource limits. | 1000m |
dataFetcher.resources.limits.memory | Data Fetcher Memory resource limits. | 3Gi |
dcgmExporter.image.repository | Repository for the DCGM Exporter container image | public.ecr.aws/nops/nvidia/dcgm-exporter |
dcgmExporter.image.tag | Image tag for the DCGM Exporter container image | 3.3.6-3.4.2-ubuntu22.04 |
prometheus.ipv6_enabled | Whether IPv6 is configured for the EKS cluster | false |
prometheus.deleteLogFile.repository | Repository for the Busybox container image | public.ecr.aws/docker/library/busybox |
prometheus.deleteLogFile.imageTag | Image tag for the Busybox container image | 1.36.1 |
prometheus.configmapReload.prometheus.image.repository | Repository for the Prometheus Config Reloader container image | public.ecr.aws/nops/prom/config-reloader |
prometheus.configmapReload.prometheus.image.tag | Image tag for the Prometheus Config Reloader container image | v0.76.0 |
prometheus.kubeStateMetrics.image.registry | Registry for the Kube State Metrics container image | public.ecr.aws |
prometheus.kubeStateMetrics.image.repository | Repository for the Kube State Metrics container image | nops/kube-state-metrics |
prometheus.kubeStateMetrics.image.tag | Image tag for the Kube State Metrics container image | 2.13.0 |
prometheus.nodeExporter.image.registry | Registry for the Node Exporter container image | public.ecr.aws |
prometheus.nodeExporter.image.repository | Repository for the Node Exporter container image | nops/prom/node-exporter |
prometheus.nodeExporter.image.tag | Image tag for the Node Exporter container image | v1.8.2 |
prometheus.nodeExporter.resources | Node Exporter CPU and Memory limits and requests resources | {} |
prometheus.server.image.repository | Repository for the Prometheus Server container image | public.ecr.aws/nops/prom/prometheus |
prometheus.server.image.tag | Image tag for the Prometheus Server container image | v2.54.0 |
prometheus.server.persistentVolume.storageClass | StorageClass Name. | gp2 |
prometheus.server.resources.requests.cpu | Prometheus CPU resource requests. | 1000m |
prometheus.server.resources.requests.memory | Prometheus Memory resource requests. | 4Gi |
prometheus.server.resources.limits.cpu | Prometheus CPU resource limits. | 3000m |
prometheus.server.resources.limits.memory | Prometheus Memory resource limits. | 32Gi |
prometheus.server.nodeSelector | Node Selector labels to use for Prometheus deployment. | {} |
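As an illustration, optional values can be overridden with additional --set flags at install or upgrade time; the following sketch enables the External Secrets integration using the parameters above (the secret store and Secrets Manager names are placeholders):
helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent \
--namespace nops --create-namespace \
--set externalSecrets.enabled=true \
--set externalSecrets.secretStoreRef.name=<your-cluster-secret-store> \
--set externalSecrets.data.apiKeys.remoteRef.key=<your-secrets-manager-secret-name>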
Prometheus Resources
The table below shows three options for Prometheus memory allocation depending on your cluster size (number of pods). Use it as a baseline and adjust it according to your needs.
The default Prometheus memory limit is 32Gi.
No. of Pods | Memory Request | Memory Limits |
---|---|---|
100 - 500 | 4Gi | 16Gi |
500 - 1000 | 8Gi | 24Gi |
1000 or more | 16Gi | 32Gi |
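For example, to apply the 500-1000 pod baseline from this table and keep Prometheus on an on-demand node managed by Karpenter, the relevant values from the Optional Parameters table could be overridden like this (a sketch; adjust to your cluster):
helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent \
--namespace nops \
--reuse-values \
--set prometheus.server.resources.requests.memory=8Gi \
--set prometheus.server.resources.limits.memory=24Gi \
--set "prometheus.server.nodeSelector.karpenter\.sh/capacity-type=on-demand"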
Frequently Asked Questions
- Will the agent installation affect my existing Prometheus setup?
  Answer: No. The agent installs its own Prometheus instance in a separate namespace and runs Node Exporter on a port other than the default (9100), so it does not interfere with your current Prometheus deployment or Node Exporter DaemonSet.
- How can I remove the agent?
  Answer: To remove the agent from your cluster, follow these steps:
# Delete the nops-kubernetes-agent release
helm uninstall nops-kubernetes-agent --namespace nops
# Delete the namespace
kubectl delete namespace nops
# The ServiceMonitors CRD created by this chart is not removed by default; delete it manually if desired.
kubectl delete crd servicemonitors.monitoring.coreos.com