Configure nOps Kubernetes Agent

note

If you have previously installed the container-insights (nops-k8s-agent) and want to migrate to the unified installation of agents, please refer to this migration guide.

The nOps Kubernetes Agent is required to fully utilize nOps features for your EKS clusters. It is a bundle of components that enables our Compute Copilot product and provides container visibility into the cluster.

Prerequisites

  1. You must have an active nOps account. If you do not have one, please register on nOps.
  2. Make sure you have access to the Kubernetes cluster (recommended version v1.23.6 or later) to deploy the agent.
  3. AWS CLI.
  4. Helm.
  5. kubectl.
  6. Unix-like terminal to execute the installation script.
  7. Terraform if installing via Infrastructure as Code (IaC).
  8. Ensure that the EBS CSI Driver is installed with a Storage Class configured for the EKS cluster to dynamically create EBS gp2/gp3 volumes.
  9. If you want to enable the karpenOps agent, make sure Karpenter is installed in the cluster.

For karpenOps specific documentation, please click here.
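
Before continuing, you can sanity-check most of these prerequisites from a terminal. This is a minimal sketch, assuming your kubectl context already points at the target cluster; the karpenter namespace below is the usual Karpenter install location, so adjust it if yours differs:

    # Verify the required CLI tooling is available
    aws --version
    helm version --short
    kubectl version --client

    # Confirm cluster access and that a StorageClass exists for dynamic EBS gp2/gp3 volumes
    kubectl get nodes
    kubectl get storageclass

    # Only if you plan to enable the karpenOps agent: confirm Karpenter is running
    kubectl get pods -n karpenter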

Steps to Configure nOps Kubernetes Agent

Organization Settings

Navigate to Container Cost Tab

  1. Setup Container Cost Integration
    • Click the Setup button for the desired account. Ensure you are authenticated into that account.
    • This step will:
      • Create an S3 bucket in your AWS account.
      • Grant the agent permission to write files to that bucket.
      • Create either a service role via your OIDC Identity Provider, or an IAM User, with permissions to write to the S3 bucket.
      • Allow nOps to copy those files.
  2. Create AWS Infrastructure
    • Configure CloudFormation Stack
      • On the CloudFormation stack creation page:
      • List the regions where your clusters for that specific account are located, separated by commas (e.g., us-east-1,us-east-2,us-west-1,us-west-2).
      • (Optional) If you don't have an IAM OIDC provider configured, you can create an IAM User to grant the agent permissions to write to the S3 bucket. To do this, set CreateIAMUser to true.
        note

        If you choose this approach, you must create and store the access key credentials and add them to the Helm values.

      • Select the I acknowledge that AWS CloudFormation might create IAM resources with custom names checkbox.
      • Click the Create stack button.

OR

  • Create resources via Terraform
    • If you prefer, you can create the infrastructure with Terraform by following the instructions in the following repo.
  1. Check Integration Status

    • After the creation is complete, return to the nOps platform.
    • Click the Check Status button to verify the integration status.
  2. Install Agent in your clusters

    • To install the agent in your clusters, go to the EKS section of the nOps platform.
    • Click the cluster where you want to install the agent.
    • Click the Cluster Configuration tab.
    • Copy the command to install the agent for that particular cluster (make sure you are authenticated and that the cluster is the current context of your kubectl).

    Example command for Karpenter enabled clusters:

      helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent \
        --namespace nops --create-namespace \
        --set datadog.apiKey=realkeysonlyinprod \
        --set containerInsights.enabled=true \
        --set containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN=arn:aws:eks:us-east-1:123456789101:cluster/example-cluster \
        --set containerInsights.env_variables.APP_AWS_S3_BUCKET=nops-container-cost-12345678101 \
        --set karpenops.enabled=true \
        --set karpenops.image.tag=1.23.2 \
        --set nops.apiKey=*******************************a004eb \
        --set karpenops.clusterId=D/M7Yj

    Example command for ClusterAutoscaler enabled clusters:

      helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent \
        --namespace nops --create-namespace \
        --set datadog.apiKey=realkeysonlyinprod \
        --set containerInsights.enabled=true \
        --set containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN=arn:aws:eks:us-east-1:123456789101:cluster/example-cluster \
        --set containerInsights.env_variables.APP_AWS_S3_BUCKET=nops-container-cost-12345678101 \
        --set karpenops.enabled=false
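
    Once the installation command finishes, you can verify that the agent components are up, for example (pod names will vary by cluster):

      # All agent components are deployed into the nops namespace
      kubectl get pods -n nops
      helm status nops-kubernetes-agent -n nops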

    After a successful installation, Compute Copilot (KarpenOps for Karpenter clusters) will manage your node lifecycle, improving cost efficiency and ensuring high availability. EKS costs are often opaque; nOps automatically surfaces waste through CPU and memory metrics, letting you optimize resources and save money quickly.

    note

    As part of the installation, a CRD (ServiceMonitors) is installed.
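
    You can confirm the CRD is present with:

      kubectl get crd servicemonitors.monitoring.coreos.com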

Installation via Terraform

Requirements

| Name | Version |
| --- | --- |
| terraform | >= 1.3.2 |

Providers

| Name | Version |
| --- | --- |
| aws | >= 5.58 |
| helm | >= 2.7 |
| kubernetes | >= 2.20 |

Usage

Helm Release

provider "aws" {
region = "us-east-1"
alias = "virginia"
}

data "aws_ecrpublic_authorization_token" "token" {
provider = aws.virginia
}

provider "helm" {
kubernetes {
host = var.cluster_endpoint # Replace it with your cluster endpoint
cluster_ca_certificate = base64decode(var.cluster_certificate_authority_data) # Replace it with your cluster certificate authority data
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", var.cluster_name] # Replace it with your cluster name
}
}
}

resource "helm_release" "nops_kubernetes_agent" {
name = "nops-kubernetes-agent"
namespace = "nops"
create_namespace = true

repository = "oci://public.ecr.aws/nops"
repository_username = data.aws_ecrpublic_authorization_token.token.user_name
repository_password = data.aws_ecrpublic_authorization_token.token.password
description = "Helm Chart for nOps kubernetes agent"
chart = "kubernetes-agent"
version = "0.1.6" # Ensure to update this to the latest/desired version: https://gallery.ecr.aws/nops/kubernetes-agent

# Example to place Prometheus deployment in a on-demand node provisioned by Karpenter (THIS IS THE RECOMMENDED WAY TO RUN PROMETHEUS, Note: using double backslashes (\\) to escape the dot in karpenter.sh/capacity-type)
#set {
# name = "prometheus.server.nodeSelector.karpenter\\.sh/capacity-type"
# value = "on-demand"
#}

set {
name = "nops.apiKey"
value = "<your_karpenops_api_key" # Get it from the nOps kubernetes agent onboarding process
}

set {
name = "datadog.apiKey"
value = "<datadog_api_key>" # Get it from the nOps kubernetes agent onboarding process
}

set {
name = "containerInsights.enabled"
value = "true" # Set it to true to install container insights agent to gain cost visibility in your EKS cluster
}

set {
name = "containerInsights.imageTag"
value = "2.0.4" # Ensure to update this to the latest/desired version from [here](https://gallery.ecr.aws/nops/container-insights-agent)
}

set {
name = "containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN"
value = "<your_eks_cluster_arn>" # Get it from the nOps kubernetes agent onboarding process
}

set {
name = "containerInsights.env_variables.APP_AWS_S3_BUCKET"
value = "<your_s3_bucket_name>" # Get it from the nOps kubernetes agent onboarding process
}

set {
name = "karpenops.enabled"
value = "true" # Set it to true if you have karpenter running in your EKS Cluster and want the karpenOps agent to manage your EC2NodeTemplate/NodePool
}

set {
name = "karpenops.image.tag"
value = "1.23.2" # Ensure to update this to the latest/desired version: https://gallery.ecr.aws/nops/karpenops
}

set {
name = "karpenops.clusterId"
value = "<your_karpenops_cluster_id>" # Get it from the nOps kubernetes agent onboarding process
}
}
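
A typical workflow to apply this configuration (a sketch, assuming the snippet above is saved in your Terraform working directory and your AWS credentials are configured):

terraform init
terraform plan
terraform apply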

Create Addon (EKS Blueprints)

provider "aws" {
region = "us-east-1"
alias = "virginia"
}

data "aws_ecrpublic_authorization_token" "token" {
provider = aws.virginia
}

provider "helm" {
kubernetes {
host = var.cluster_endpoint # Replace it with your cluster endpoint
cluster_ca_certificate = base64decode(var.cluster_certificate_authority_data) # Replace it with your cluster certificate authority data
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", var.cluster_name] # Replace it with your cluster name
}
}
}

module "eks_blueprints_addon" {
source = "aws-ia/eks-blueprints-addon/aws"
version = "~> 1.0"
chart = "kubernetes-agent"
chart_version = "0.1.6" # Ensure to update this to the latest/desired version: https://gallery.ecr.aws/nops/kubernetes-agent
repository = "oci://public.ecr.aws/nops"
repository_username = data.aws_ecrpublic_authorization_token.token.user_name
repository_password = data.aws_ecrpublic_authorization_token.token.password
description = "Helm Chart for nOps kubernetes agent"
namespace = "nops"
create_namespace = true
set = [
# Example to place Prometheus deployment in a on-demand node provisioned by Karpenter (THIS IS THE RECOMMENDED WAY TO RUN PROMETHEUS, Note: using double backslashes (\\) to escape the dot in karpenter.sh/capacity-type)
#{
# name = "prometheus.server.nodeSelector.karpenter\\.sh/capacity-type"
# value = "on-demand"
#},
{
name = "nops.apiKey"
value = "<your_karpenops_api_key" # Get it from the nOps kubernetes agent onboarding process
},
{
name = "datadog.apiKey" # Get it from the nOps kubernetes agent onboarding process
value = "<datadog_api_key>" # Get it from the nOps kubernetes agent onboarding process
},
{
name = "containerInsights.enabled"
value = "true" # Set it to true to install container insights agent to gain cost visibility in your EKS cluster
},
{
name = "containerInsights.imageTag"
value = "2.0.4" # Ensure to update this to the latest/desired version from [here](https://gallery.ecr.aws/nops/container-insights-agent)
},
{
name = "containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN"
value = "<your_eks_cluster_arn>" # Get it from the nOps kubernetes agent onboarding process
},
{
name = "containerInsights.env_variables.APP_AWS_S3_BUCKET"
value = "<your_s3_bucket_name>" # Get it from the nOps kubernetes agent onboarding process
},
{
name = "karpenops.enabled"
value = "true" # Set it to true if you have karpenter running in your EKS Cluster and want the karpenOps agent to manage your EC2NodeTemplate/NodePool
},
{
name = "karpenops.image.tag"
value = "1.23.2" # Ensure to update this to the latest/desired version from [here](https://gallery.ecr.aws/nops/karpenops)
},
{
name = "karpenops.clusterId"
value = "<your_karpenops_cluster_id>" # Get it from the nOps kubernetes agent onboarding process
}
]
}
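
After the module is applied, you can confirm the release was deployed, for example:

helm list -n nops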

Required Parameters

The following table lists required configuration parameters for the KarpenOps and Container Insights agents and their default values.

| Parameter | Description | Default |
| --- | --- | --- |
| nops.apiKey | nOps agent API Key. | - |
| datadog.apiKey | Datadog API Key. | - |
| containerInsights.enabled | Whether to install the Container Insights agent. | true |
| containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN | EKS Cluster ARN. | - |
| containerInsights.env_variables.APP_AWS_S3_BUCKET | S3 Bucket name. | - |
| karpenops.enabled | Whether to install the KarpenOps agent. | false |
| karpenops.clusterId | KarpenOps Cluster ID. | - |

Optional Parameters

The following table lists the optional configuration parameters for the KarpenOps and Container Insights agents and their default values.

| Parameter | Description | Default |
| --- | --- | --- |
| externalSecrets.enabled | External Secrets integration, more info here. | false |
| externalSecrets.secretStoreRef.name | Name of the ClusterSecretStore. | - |
| externalSecrets.data.apiKeys.remoteRef.key | Name of the secret in AWS Secrets Manager. | - |
| autoUpdater.enabled | Whether to enable the Auto Update CronJob that updates the latest image tag for the KarpenOps, Container Insights, and Opencost kubernetes resources. | true |
| autoUpdater.schedule | Schedule for the Auto Update CronJob; defaults to every Monday at 12:00 AM. | 0 0 * * 1 |
| autoUpdater.repository | Repository for the Auto Update container image. | public.ecr.aws/nops/alpine/k8s |
| autoUpdater.imageTag | Image tag for the Auto Update container image. | 1.30.4 |
| autoUpdater.successfulJobsHistoryLimit | Number of successful finished jobs to keep for the Auto Update. | 0 |
| autoUpdater.failedJobsHistoryLimit | Number of failed finished jobs to keep for the Auto Update. | 0 |
| datadog.repository | Repository for the Datadog Agent container image. | public.ecr.aws/nops/datadog/agent |
| datadog.imageTag | Image tag for the Datadog Agent container image. | 7.56.0 |
| containerInsights.debug | Debug mode. | false |
| containerInsights.repository | Repository for the nOps Container Insights Agent container image. | public.ecr.aws/nops/container-insights-agent |
| containerInsights.imageTag | Image tag for the nOps Container Insights Agent container image. | 2.0.4 |
| containerInsights.successfulJobsHistoryLimit | Number of successful finished jobs to keep for the nOps Container Insights Agent. | 2 |
| containerInsights.failedJobsHistoryLimit | Number of failed finished jobs to keep for the nOps Container Insights Agent. | 2 |
| containerInsights.resources.limits.cpu | Container Insights Agent CPU limit. | 500m |
| containerInsights.resources.limits.memory | Container Insights Agent memory limit. | 4Gi |
| containerInsights.resources.requests.cpu | Container Insights Agent CPU request. | 500m |
| containerInsights.resources.requests.memory | Container Insights Agent memory request. | 2Gi |
| containerInsights.backoffLimit | Number of retries for failed Container Insights cronjobs. | 3 |
| opencost.loglevel | Log level for nOps-cost. | info |
| opencost.opencost.exporter.image.registry | Registry for the Opencost Exporter container image. | public.ecr.aws |
| opencost.opencost.exporter.image.repository | Repository for the Opencost Exporter container image. | nops/opencost |
| opencost.opencost.exporter.image.tag | Image tag for the Opencost Exporter container image. | 1.111.0 |
| karpenops.image.repository | Repository for the KarpenOps Agent container image. | public.ecr.aws/nops/karpenops |
| karpenops.image.tag | Image tag for the KarpenOps Agent container image. | 1.23.2 |
| heartbeat.image.repository | Repository for the Heartbeat Agent container image. | public.ecr.aws/nops/k8s-heartbeat-agent |
| heartbeat.image.tag | Image tag for the Heartbeat Agent container image. | 0.1.4 |
| dataFetcher.image.repository | Repository for the Data Fetcher Agent container image. | public.ecr.aws/nops/k8s-data-fetcher-agent |
| dataFetcher.image.tag | Image tag for the Data Fetcher Agent container image. | 0.1.2 |
| dataFetcher.replicaCount | Number of replicas for the Data Fetcher Agent. | 2 |
| dataFetcher.resources.requests.cpu | Data Fetcher CPU resource requests. | 200m |
| dataFetcher.resources.requests.memory | Data Fetcher memory resource requests. | 1Gi |
| dataFetcher.resources.limits.cpu | Data Fetcher CPU resource limits. | 1000m |
| dataFetcher.resources.limits.memory | Data Fetcher memory resource limits. | 3Gi |
| dcgmExporter.image.repository | Repository for the DCGM Exporter container image. | public.ecr.aws/nops/nvidia/dcgm-exporter |
| dcgmExporter.image.tag | Image tag for the DCGM Exporter container image. | 3.3.6-3.4.2-ubuntu22.04 |
| prometheus.ipv6_enabled | Whether IPv6 is configured for the EKS cluster. | false |
| prometheus.deleteLogFile.repository | Repository for the Busybox container image. | public.ecr.aws/docker/library/busybox |
| prometheus.deleteLogFile.imageTag | Image tag for the Busybox container image. | 1.36.1 |
| prometheus.configmapReload.prometheus.image.repository | Repository for the Prometheus Config Reloader container image. | public.ecr.aws/nops/prom/config-reloader |
| prometheus.configmapReload.prometheus.image.tag | Image tag for the Prometheus Config Reloader container image. | v0.76.0 |
| prometheus.kubeStateMetrics.image.registry | Registry for the Kube State Metrics container image. | public.ecr.aws |
| prometheus.kubeStateMetrics.image.repository | Repository for the Kube State Metrics container image. | nops/kube-state-metrics |
| prometheus.kubeStateMetrics.image.tag | Image tag for the Kube State Metrics container image. | 2.13.0 |
| prometheus.nodeExporter.image.registry | Registry for the Node Exporter container image. | public.ecr.aws |
| prometheus.nodeExporter.image.repository | Repository for the Node Exporter container image. | nops/prom/node-exporter |
| prometheus.nodeExporter.image.tag | Image tag for the Node Exporter container image. | v1.8.2 |
| prometheus.nodeExporter.resources | Node Exporter CPU and memory limits and requests. | {} |
| prometheus.server.image.repository | Repository for the Prometheus Server container image. | public.ecr.aws/nops/prom/prometheus |
| prometheus.server.image.tag | Image tag for the Prometheus Server container image. | v2.54.0 |
| prometheus.server.persistentVolume.storageClass | StorageClass name. | gp2 |
| prometheus.server.resources.requests.cpu | Prometheus CPU resource requests. | 1000m |
| prometheus.server.resources.requests.memory | Prometheus memory resource requests. | 4Gi |
| prometheus.server.resources.limits.cpu | Prometheus CPU resource limits. | 3000m |
| prometheus.server.resources.limits.memory | Prometheus memory resource limits. | 32Gi |
| prometheus.server.nodeSelector | Node Selector labels for the Prometheus deployment. | {} |
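
For illustration, here is one way to enable the External Secrets integration at install time. This sketch assumes you already run the External Secrets Operator and have a ClusterSecretStore configured; the store and secret names below are placeholders for values from your environment:

helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent \
  --namespace nops --create-namespace \
  --set externalSecrets.enabled=true \
  --set externalSecrets.secretStoreRef.name=<your_cluster_secret_store> \
  --set externalSecrets.data.apiKeys.remoteRef.key=<your_secrets_manager_secret_name>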

Prometheus Resources

The table below shows three options for Prometheus memory allocation depending on your cluster size (number of pods). Use it as a baseline and adjust it to your needs.

note

The default Prometheus memory limit is 32Gi.

| No. of Pods | Memory Request | Memory Limit |
| --- | --- | --- |
| 100 - 500 | 4Gi | 16Gi |
| 500 - 1000 | 8Gi | 24Gi |
| 1000 or more | 16Gi | 32Gi |
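
As an example, one way to apply the 500 - 1000 pod baseline from the table at upgrade time, keeping your other chart values unchanged:

helm upgrade nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent \
  --namespace nops \
  --reuse-values \
  --set prometheus.server.resources.requests.memory=8Gi \
  --set prometheus.server.resources.limits.memory=24Gi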

Frequently Asked Questions

  1. Will the agent installation affect my existing Prometheus setup? Answer: No. The agent installs its own Prometheus instance in a separate namespace and runs Node Exporter on a different port than the default (9100), ensuring it does not interfere with your current Prometheus deployment or Node Exporter daemonset.

  2. How can I remove the agent?

    Answer: To remove the agent from your cluster you just need to follow these steps:

    # Delete the nops-kubernetes-agent release
    helm uninstall nops-kubernetes-agent --namespace nops
    # Delete the namespace
    kubectl delete namespace nops
    # The ServiceMonitors CRD created by this chart is not removed by default; delete it manually if desired.
    kubectl delete crd servicemonitors.monitoring.coreos.com