nOps Kubernetes Agent Overview
The nOps Kubernetes Agent stack is a Helm chart containing the agent components that gather metrics data in your Kubernetes cluster and enable nOps automated optimization of Karpenter and Container Rightsizing.
Agent Stack Components
- Container Insights Agent: The Container Insights agent is responsible for gathering all of the metrics nOps needs within your cluster.
- nOps VPA: The nOps Vertical Pod Autoscaler (VPA) is an agent that enables automated container rightsizing within your cluster.
- Karpenops Agent: The Karpenops Agent is an agent that enables nOps to automatically optimize the Karpenter configuration of your cluster.
- Data Fetcher Agent: The Data Fetcher Agent gathers real-time information about the workloads in your cluster to enable the Workloads tab on your cluster dashboard in the nOps application.
- Heartbeat Agent: The Heartbeat Agent syncs the status of the components in the agent stack with the nOps API to warn of any connectivity issues.
- Image Tag Updater: The Image Tag Updater automatically updates the image versions of the containers in the agent stack, so you can stay up to date without re-running any Helm commands.
All of the agent stack components are managed, deployed, and updated by the Kubernetes Agent Helm chart. All components are installed into the nops namespace by default.
Deployment Workflow
To install the nOps Agent, click Configure to Optimize on the cluster to begin data collection.
- API Key Setup: Generate a new API key for the nOps Agent Stack.
- Copy the custom command and run it in your command line.
- Once the command completes successfully, click Test Connectivity to confirm connectivity with the nOps Agent Stack.

With this setup, your clusters are fully equipped to gather and upload data, providing comprehensive insights into your containerized workloads.
How the Container Insights Agent Works

The agent operates through a set of off-the-shelf, open-source metrics exporters along with the VictoriaMetrics agent, which forwards metrics to Amazon Managed Service for Prometheus.
- Metrics Sources:
  - Prometheus Node Exporter: node-level usage metrics
  - Kube State Metrics: workload metrics from the Kubernetes API
  - NVIDIA DCGM Exporter: GPU metrics (if your nodes have GPUs)
  - cAdvisor: container metrics from the kubelets already running in your cluster
- VictoriaMetrics Agent: A lightweight agent that scrapes the metrics sources outlined above and, using the Prometheus remote write protocol, forwards them to Amazon Managed Service for Prometheus.
- Amazon Managed Service for Prometheus (AMP): A centralized metrics store managed by Amazon and deployed in the nOps AWS environment. Data is partitioned by issuing one AMP workspace per customer.
- nOps analytics: Data is queried from AMP and analyzed in the nOps AWS environment to drive all the features of the nOps platform.
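For orientation, the pipeline above can be sketched as a Prometheus-style scrape and remote-write configuration. This is an illustrative sketch only, not the configuration the chart renders: the chart generates the VictoriaMetrics agent configuration for you, and the job names, service addresses, and workspace URL below are placeholders:

# Illustrative Prometheus-format sketch of the Container Insights pipeline.
# The chart renders the real VictoriaMetrics agent configuration; the targets
# and AMP endpoint below are placeholders.
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: node-exporter            # node-level usage metrics
    static_configs:
      - targets: ["prometheus-node-exporter.nops.svc:9100"]
  - job_name: kube-state-metrics       # workload metrics from the Kubernetes API
    static_configs:
      - targets: ["kube-state-metrics.nops.svc:8080"]
remote_write:
  # Forward scraped metrics to an AMP workspace over the Prometheus remote
  # write protocol, signing requests with AWS SigV4.
  - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/<amp-workspace-id>/api/v1/remote_write
    sigv4:
      region: us-west-2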
How the nOps Vertical Pod Autoscaler Works

The nOps Vertical Pod Autoscaler is based on the open-source Vertical Pod Autoscaler developed by the Kubernetes community. It uses a mutating admission webhook to apply resource recommendations to the workloads in your cluster. Recommended resource settings are tracked in VerticalPodAutoscaler (VPA) custom resources. The VPA custom resources used by the nOps VPA are namespaced to enable the nOps VPA to coexist with the upstream Kubernetes VPA in the same cluster.
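For reference, the upstream API that the nOps VPA builds on tracks recommendations in objects shaped like the sketch below. The resource name, namespace, and target here are hypothetical, and the exact custom resources the nOps VPA creates are managed by the agent itself:

# Generic upstream-style VerticalPodAutoscaler object (illustrative only).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa            # hypothetical name
  namespace: my-namespace     # hypothetical namespace
spec:
  targetRef:                  # workload whose containers receive recommendations
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"        # recommendations applied by the mutating admission webhook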
How the Karpenops Agent Works

The Karpenops agent enables nOps Compute Copilot to optimize the Karpenter NodeClass and NodePool settings in your cluster. It is a simple agent that uses HTTPS polling to report the status of your NodePools and NodeClasses to the nOps API and to update your NodePools based on spot compute market scores and commitment utilization from the nOps API.
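For context, the spot-versus-on-demand behavior that Karpenter applies is expressed through NodePool requirements like the excerpt below (standard Karpenter v1 fields; the concrete values nOps writes depend on spot market scores and your commitment utilization):

# Illustrative Karpenter NodePool excerpt; values are placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # The capacity-type requirement is the kind of setting Compute Copilot
        # adjusts based on spot market scores and commitment utilization.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]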
How the Data Fetcher Agent Works

The Data Fetcher Agent uses a two-way messaging pattern built on Amazon SQS for real-time communication with nOps. This design ensures that Kubernetes data is promptly fetched and made available in your Cluster Dashboard for seamless, up-to-date visibility.
Encryption: The Data Fetcher Agent communicates securely with Amazon SQS using HTTPS, ensuring encryption in transit. At this time, Server-Side Encryption (SSE) for SQS queues is not enabled.
How the Image Tag Updater Works

The Image Tag Updater is a CronJob that runs nightly at midnight. Upon execution, it scans the nOps public ECR repositories for updated images and, if any are found, uses the Kubernetes API to update the deployments/daemonsets of the nOps Kubernetes Agent components with the new image tags.
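Conceptually, the nightly run corresponds to a standard Kubernetes CronJob schedule, as in the skeleton below. This is illustrative only; the actual manifest, image, and command are rendered by the Helm chart:

# Skeleton only: the name, image, and job details are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: image-tag-updater      # hypothetical name
  namespace: nops
spec:
  schedule: "0 0 * * *"        # nightly at midnight
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: updater
              image: public.ecr.aws/nops/example-updater:latest   # placeholder image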
Helm Configuration Reference
Recommended Installation Flow
Configuration starts in the nOps UI (Optimize → EKS Optimization). Selecting Configure to Optimize or Manage Configuration opens a modal that:
- Generates an nOps API key.
- Provides a Helm command targeting oci://public.ecr.aws/nops/kubernetes-agent with all required settings and credentials pre-populated.
Run that command as-is for standard installs. When cluster policies require adjustments, copy the Helm command and set additional configuration values on the command line using --set/--set-json or supply a values file via -f values.yaml.
Example helm command from the UI:
helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent --namespace nops --create-namespace \
--set datadog.apiKey=********************** \
--set containerInsights.enabled=true \
--set containerInsights.centralized=true \
--set containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN=arn:aws:eks:us-west-2:123456789012:cluster/my-sample-cluster \
--set containerInsights.ampID=********************* \
--set containerInsights.ampRegion=us-west-2 \
--set containerInsights.ampAccessKey=****************** \
--set containerInsights.ampSecretKey=************************* \
--set karpenops.enabled=true \
--set karpenops.image.tag=1.25.1 \
--set karpenops.clusterId=***** \
--set nops.apiKey=********************************
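If your workflow favors a values file over --set flags, the same settings can be expressed roughly as follows and passed with -f values.yaml. The value paths mirror the flags above; replace the placeholders with the values from your UI-generated command:

# values.yaml - equivalent of the UI-generated --set flags above.
nops:
  apiKey: "<nops-api-key>"
datadog:
  apiKey: "<datadog-api-key>"
containerInsights:
  enabled: true
  centralized: true
  env_variables:
    APP_NOPS_K8S_AGENT_CLUSTER_ARN: arn:aws:eks:us-west-2:123456789012:cluster/my-sample-cluster
  ampID: "<amp-workspace-id>"
  ampRegion: us-west-2
  ampAccessKey: "<amp-access-key>"
  ampSecretKey: "<amp-secret-key>"
karpenops:
  enabled: true
  image:
    tag: "1.25.1"
  clusterId: "<cluster-id>"

Then run: helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent --namespace nops --create-namespace -f values.yaml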
Required Values Provided by the UI
| Value path | Description | Where to find it |
|---|---|---|
| nops.apiKey | Authenticates the agent stack with the nOps API. | Generated in the cluster configuration modal or Settings → API Key. |
| containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN | Identifies the EKS cluster in AMP and nOps backends. Unique per cluster. | Included in the cluster Helm command in the UI. |
| containerInsights.ampID / containerInsights.ampRegion | Points the agent to your AMP workspace and region. Unique per nOps account. | Included in the cluster Helm command in the UI. |
| containerInsights.ampAccessKey / containerInsights.ampSecretKey | Scoped AWS credentials for AMP remote-write. Unique per nOps account. | Included in the cluster Helm command in the UI. |
| karpenops.clusterId | Internal identifier for Karpenter optimization. Unique per cluster. | Included in the cluster Helm command in the UI for Karpenter clusters. |
| datadog.apiKey | Optional key for support-assisted diagnostics. Unique per nOps account. | Included in the cluster Helm command in the UI. |
Global Scheduling & Metadata Controls
nops.global applies shared scheduling policies and metadata to most components (daemonsets such as the node exporter maintain their own rules).
| Value path | Default | Description |
|---|---|---|
| nops.global.nodeSelector | {} | Apply a node selector to all agent workloads (excluding Prometheus node-exporter and NVIDIA dcgm-exporter daemonsets). |
| nops.global.tolerations | [] | Apply tolerations to all agent workloads to enable scheduling on tainted nodes. |
| nops.global.affinity | {} | Apply affinities to all agent workloads (excluding Prometheus node-exporter and NVIDIA dcgm-exporter daemonsets). |
| nops.global.labels | {} | Attach labels to every agent resource. Useful for ownership or cost-center metadata. |
| nops.global.annotations | {} | Attach annotations to all workloads (deployments, daemonsets, cronjobs), pods, and service resources. |
Example:
nops:
  global:
    # Node selectors limit where pods can run based on node labels.
    nodeSelector:
      kubernetes.io/os: linux
    # Tolerations allow pods to run on nodes with specific taints.
    tolerations:
      - key: nops/agents
        operator: Exists
        effect: NoSchedule
    # Labels are applied to all pods created by the chart.
    labels:
      cost-center: finops
    # Affinity defines scheduling preferences for pods.
    # In this example, pods will prefer (but not require) nodes
    # labeled with "nops/role=compute".
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
                - key: nops/role
                  operator: In
                  values:
                    - compute
    annotations:
      owner: platform
      shared: global-default
Component-level settings (containerInsights.nodeSelector, etc.) remain available if a specific workload needs different scheduling or metadata.
When combining global and component-level scheduling or metadata, component keys win on collision. Keep shared keys consistent to avoid surprises.
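For example, a minimal sketch that keeps a shared global node selector while overriding it for the Container Insights workload only (the node label key and values here are hypothetical):

nops:
  global:
    # Shared default: schedule agent workloads on a general-purpose node group.
    nodeSelector:
      nops/node-group: general
containerInsights:
  # Component-level override: this key wins over the global value for Container Insights only.
  nodeSelector:
    nops/node-group: monitoring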
Component-Specific Annotation Fields
Set component-level fields instead of, or alongside, nops.global.annotations to add or override annotation keys for a specific component (see the example after this list).
- Data Fetcher
  - Workload: dataFetcher.deploymentAnnotations
  - Pod: dataFetcher.podAnnotations
- Karpenops
  - Workload: karpenops.deploymentAnnotations
  - Pod: karpenops.podAnnotations
  - Service Account: karpenops.serviceAccount.annotations
- Heartbeat Agent
  - Workload: heartbeat.deploymentAnnotations
  - Pod: heartbeat.podAnnotations
- DCGM Exporter
  - Workload: dcgmExporter.daemonsetAnnotations
  - Pod: dcgmExporter.podAnnotations
  - Service: dcgmExporter.service.annotations
- Kube State Metrics
  - Workload: prometheus.kubeStateMetrics.annotations
  - Pod: prometheus.kubeStateMetrics.podAnnotations
  - Service: prometheus.kubeStateMetrics.service.annotations (merged with prometheus.io/scrape if enabled)
- Prometheus Node Exporter
  - Workload: prometheus.nodeExporter.daemonsetAnnotations
  - Pod: prometheus.nodeExporter.podAnnotations
  - Service: prometheus.nodeExporter.service.annotations
- Victoria Metrics Agent (centralized/server mode)
  - Workload: victoriaMetrics.annotations
  - Pod: victoriaMetrics.podAnnotations
  - Service Account: victoriaMetrics.serviceAccount.annotations
- Image Tag Updater
  - CronJob: autoUpdater.cronjobAnnotations
  - Pod: autoUpdater.podAnnotations
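Example (a sketch combining a few of the value paths above; the annotation keys and values are placeholders):

# Illustrative component-specific annotation overrides.
dataFetcher:
  podAnnotations:
    example.com/owner: platform-team        # hypothetical key
karpenops:
  serviceAccount:
    annotations:
      example.com/managed-by: platform      # hypothetical key
dcgmExporter:
  service:
    annotations:
      example.com/team: ml-infra            # hypothetical key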
Secrets & Credential Sources
| Value path | Usage | Guidance |
|---|---|---|
| externalSecrets.enabled | Legacy integration with External Secrets Operator (secretStoreRef + data.apiKeys.remoteRef.key). | Still supported but superseded by customer-managed secrets. Prefer the operator-agnostic approach below. |
| customerManagedSecrets.enabled | Reference Kubernetes secrets you provision in the nops namespace using any workflow (GitOps, SealedSecrets, manual kubectl). Populate apiSecretName, ampSecretName, etc. | Recommended when you manage secrets centrally; remove the matching --set flags from the Helm command. |
If both toggles stay false, the chart relies on credentials supplied directly through the UI-generated --set flags.
External Secrets Operator support
For details on configuring the external secrets operator, see the documentation page for external secrets operator support.
Customer Managed Secrets support
The customer managed secrets feature of the agent Helm chart is an operator-agnostic facility for managing agent secrets externally.
It assumes that the secrets required by the agent are present in a Kubernetes Secret resource (or multiple Secrets) in the chart namespace (nops).
This Secret resource can be managed by any operator (External Secrets Operator, Vault, etc.) or by hand if desired.
Example:
Assuming you have an operator-managed secret called nops-secret, you can configure the agent Helm chart to use this secret by setting chart values as shown below.
Kubernetes secret, managed by your operator:
apiVersion: v1
kind: Secret
metadata:
  name: nops-secret
  namespace: nops
type: Opaque
data:
  API_KEY: QUJDREVGR0hJSktMTU5PUA== # "ABCDEFGHIJKLMNOP"
  DD_KEY: ZGVtb19kYXRhZG9nX2tleQ== # "demo_datadog_key"
  AMP_ACCESS_KEY: QU1QX0FDQ0VTU19LRVk= # "AMP_ACCESS_KEY"
  AMP_SECRET_KEY: QU1QX1NFQ1JFVF9LRVkxMjM0NQ== # "AMP_SECRET_KEY12345"
Agent Helm chart values to set:
customerManagedSecrets:
  enabled: true
  apiSecretName: "nops-secret"
  apiSecretKey: "API_KEY"
  ddSecretName: "nops-secret"
  ddSecretKey: "DD_KEY"
  ampAccessName: "nops-secret"
  ampAccessKey: "AMP_ACCESS_KEY"
  ampSecretName: "nops-secret"
  ampSecretKey: "AMP_SECRET_KEY"
Note that, at a minimum, you must provide secrets containing the nOps API key, the AMP access key, and the AMP secret key. These do not need to live in the same Secret resource, but all of them must be specified and managed externally (by your operator or otherwise).
Disabling Components
We strive to make sure the default configuration of the agent helm chart provides valuable functionality with minimal configuration. However, if you determine that you need a custom configuration, some components of the agent helm chart can be disabled. If you have questions about disabling components, please consult nOps support.
| Feature | Toggle | Default | Notes |
|---|---|---|---|
| Image Tag Updater | autoUpdater.enabled | true | Keeps images current; disable only if change control forbids automation. |
| Karpenops | karpenops.enabled | true in UI command | Disable on clusters without Karpenter or if you manage NodePools manually. |
| Container Rightsizing | containerRightsizing.enabled | true | Turn off to disable nOps VPA (container rightsizing) automation. |
| Datadog Agent | datadog.enabled | false | Enable temporarily when nOps support requests additional telemetry. |
| NVIDIA DCGM Exporter | dcgmExporter.enabled | true | Safe to leave enabled: pods will only be scheduled on nodes with GPUs; set false on GPU-free clusters. |
| Data Fetcher | dataFetcher.enabled | true | May be disabled for sensitive environments; dashboards relying on live workload data will lose detail. |
Example Helm commands
Only install Container Insights; disable all automation/integration components (container rightsizing, Karpenter, etc.):
helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent --namespace nops --create-namespace \
--set datadog.apiKey=********************** \
--set containerInsights.enabled=true \
--set containerInsights.centralized=true \
--set containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN=arn:aws:eks:us-west-2:123456789012:cluster/my-sample-cluster \
--set containerInsights.ampID=********************* \
--set containerInsights.ampRegion=us-west-2 \
--set containerInsights.ampAccessKey=****************** \
--set containerInsights.ampSecretKey=************************* \
--set karpenops.enabled=false \
--set autoUpdater.enabled=false \
--set dataFetcher.enabled=false \
--set containerRightsizing.enabled=false \
--set nops.apiKey=********************************
Only install Karpenter integration; disable all other components:
helm upgrade -i nops-kubernetes-agent oci://public.ecr.aws/nops/kubernetes-agent --namespace nops --create-namespace \
--set datadog.apiKey=******************************** \
--set containerInsights.env_variables.APP_NOPS_K8S_AGENT_CLUSTER_ARN=arn:aws:eks:us-west-2:123456789012:cluster/my-sample-cluster \
--set containerInsights.enabled=false \
--set autoUpdater.enabled=false \
--set dataFetcher.enabled=false \
--set containerRightsizing.enabled=false \
--set karpenops.enabled=true \
--set karpenops.image.tag=1.25.1 \
--set karpenops.clusterId=***** \
--set nops.apiKey=********************************
(Note: The Heartbeat Agent is still installed in both example configurations, while auto-update functionality is not installed because autoUpdater.enabled is set to false.)
FAQ
- Where can I get information about Security and Compliance?
  You can find detailed reports on our stack and policies in this folder.
- Where can I find the CloudFormation template for inspection?
  You can find the template here.
- Do containers run as root?
  Most containers run as nobody, with the exception of the Datadog agent, which is non-root but privileged, and the DCGM exporter, which runs as root.
- Which images are used in the deployment, and are they digitally signed?
  Yes, the following images are used in our full deployment, and all of them have been digitally signed in our public ECR:
  - container-insights-agent
  - k8s-heartbeat-agent
  - k8s-data-fetcher-agent
  - karpenops
  - opencost
  - alpine/k8s
  - datadog agent
  - kube-state-metrics
  - nvidia/dcgm-exporter
  - prometheus config-reloader
  - prometheus node-exporter
  - prometheus server
  - victoriametrics agent
  These images are hosted in our public ECR to mitigate rate limit issues and are securely signed using AWS Signer. This ensures the integrity and authenticity of the images, supporting a reliable and trusted deployment pipeline.
- What is the nops-data-fetcher agent?
  The nOps Data Fetcher Agent provides real-time Kubernetes cluster insights, seamlessly integrating with your nOps dashboard. With this agent, you can visualize live data from your cluster directly within the Workloads tab, conveniently located next to the Nodes tab. To facilitate communication with nOps, the agent leverages an Amazon SQS queue for efficient and reliable data transfer.
- Where do I supply cluster-specific IDs and AMP credentials?
  Use the Helm command shown in the configuration modal. It already sets the nOps API key, AMP workspace ID and keys, and the cluster ARN. If you move to customer-managed secrets, create those secrets first, then remove the corresponding --set flags from the command.
- Can I disable Karpenops or other optional components?
  Yes, set the relevant *.enabled value (for example karpenops.enabled: false) in your override file. Keep Container Insights, VictoriaMetrics, Data Fetcher, and Heartbeat enabled so the nOps platform continues to receive metrics and health signals.
- How do I add tolerations for tainted system nodes?
  Add them under nops.global.tolerations (preferred) or within the specific component's tolerations list. The chart merges global and component-level tolerations, so you only need to define each entry once.