nOps Kubernetes Agent Overview

The nOps Kubernetes Agent stack is a Helm chart whose components gather metrics data in your Kubernetes cluster and enable nOps to automate Karpenter optimization and container rightsizing.

Kubernetes Agent Stack

Agent Stack Components

  • Container Insights Agent: Gathers all of the metrics nOps needs from within your cluster.
  • nOps VPA: The nOps Vertical Pod Autoscaler (VPA) enables automated container rightsizing within your cluster.
  • Karpenops Agent: Enables nOps to automatically optimize the Karpenter configuration of your cluster.
  • Data Fetcher Agent: Gathers real-time information about the workloads in your cluster to power the Workloads tab on your cluster dashboard in the nOps application.
  • Heartbeat Agent: Syncs the status of the components in the agent stack with the nOps API to warn of any connectivity issues.
  • Image Tag Updater: Automatically updates the image versions of the containers in the agent stack, so you can stay up to date without re-running any Helm commands.

All of the agent stack components are managed, deployed, and updated by the Kubernetes Agent Helm chart. All components are installed into the nops namespace by default.

Deployment Workflow

To install the nOps Agent, click Configure to Optimize on the cluster to begin data collection.

  1. API Key Setup: Generate a new API key for the nOps Agent Stack.
  2. Copy the custom command and run it in your command line.
  3. Once the command succeeds, click Test Connectivity to confirm connectivity with the nOps Agent Stack.

With this setup, your clusters are fully equipped to gather and upload data, providing comprehensive insights into your containerized workloads.

How the Container Insights Agent Works

Stack overview

The agent combines a set of off-the-shelf, open-source metrics exporters with the VictoriaMetrics agent, which forwards the collected metrics to Amazon Managed Service for Prometheus.

  • Metrics Sources:

    • Prometheus Node Exporter: Node-level usage metrics
    • Kube State Metrics: Workload metrics from the Kubernetes API
    • NVIDIA DCGM Exporter: GPU metrics (if your nodes have GPUs)
    • cAdvisor metrics: Container metrics from the kubelets already in your cluster
  • VictoriaMetrics Agent: A lightweight agent that scrapes the metrics sources outlined above and, using the Prometheus remote write protocol, forwards them to Amazon Managed Service for Prometheus.

  • Amazon Managed Service for Prometheus (AMP): A centralized metrics store managed by Amazon and deployed in the nOps AWS environment. Data is partitioned by issuing one AMP workspace per customer.

  • nOps analytics: Data is queried from AMP and analyzed in the nOps AWS environment to drive all the features of the nOps Platform.
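To make the scrape step more concrete, the sketch below parses one payload in the Prometheus text exposition format, which is what exporters such as Node Exporter and Kube State Metrics serve. It is an illustration only, not the VictoriaMetrics agent's implementation: the function name `parse_exposition` is hypothetical, optional timestamps are ignored, and the remote write step (a compressed protobuf request to AMP) is not shown.

```python
import re

# One sample line looks like: metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (name, labels, value) tuples for each sample line in a scrape."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # Not a sample line this simplified parser understands.
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape payload, shaped like Node Exporter output.
scrape = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
up 1
"""
parsed = parse_exposition(scrape)
```

Each parsed sample carries the metric name, its label set, and a numeric value, which is the unit of data a scraping agent batches up before forwarding.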

How the nOps Vertical Pod Autoscaler Works

VPA overview

The nOps Vertical Pod Autoscaler is based on the open-source Vertical Pod Autoscaler component developed by the Kubernetes project. It uses a mutating webhook driven by an admission controller to apply resource recommendations to the workloads in your cluster. Recommended resource settings are tracked in VerticalPodAutoscaler (VPA) custom resources. The VPA custom resources used by the nOps VPA are namespaced so that the nOps VPA can coexist with the upstream Kubernetes VPA in the same cluster.
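As a rough illustration of how a mutating admission webhook can apply resource recommendations, the sketch below builds a JSON Patch for a pod and wraps it in an `admission.k8s.io/v1` AdmissionReview response. The helper names and the shape of the `recommendations` dictionary are assumptions made for this example; the actual nOps VPA logic is not documented here and may differ.

```python
import base64
import json


def build_resource_patch(pod, recommendations):
    """Build JSON Patch operations that set container resource requests.

    `recommendations` maps container name -> requests (hypothetical shape).
    """
    ops = []
    for index, container in enumerate(pod["spec"]["containers"]):
        target = recommendations.get(container["name"])
        if target is None:
            continue  # No recommendation for this container.
        resources = container.get("resources")
        if not resources:
            # The container has no resources block yet, so add one.
            ops.append({
                "op": "add",
                "path": f"/spec/containers/{index}/resources",
                "value": {"requests": dict(target)},
            })
        else:
            # Keep existing request keys and override the recommended ones.
            merged = {**resources.get("requests", {}), **target}
            ops.append({
                "op": "add",
                "path": f"/spec/containers/{index}/resources/requests",
                "value": merged,
            })
    return ops


def admission_response(uid, ops):
    """Wrap the patch in an AdmissionReview response (patch is base64 JSON)."""
    response = {"uid": uid, "allowed": True}
    if ops:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(ops).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

The webhook always allows the pod and only attaches a patch when there is something to change, so workloads without a recommendation pass through unmodified.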

How the Karpenops Agent Works

Karpenops overview

The Karpenops agent enables nOps Compute Copilot to optimize the Karpenter NodeClass and NodePool settings in your cluster. It is a simple agent that uses HTTPS polling to report the status of your NodePools and NodeClasses to the nOps API, and to update your NodePools based on spot compute market scores and commitment utilization returned by the nOps API.
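The polling behavior described above can be sketched as a simple loop: report current state, fetch any recommended changes, and apply them. Everything in this example is hypothetical, including the function names, the injected callables, and the payload shapes; the real nOps API endpoints and the agent's internals are not described in this document.

```python
import time


def poll_once(read_nodepools, report_status, fetch_updates, apply_update):
    """Run one polling cycle and return the names of the NodePools updated.

    All four callables are placeholders for cluster and API access:
      read_nodepools() -> dict of NodePool name -> current spec
      report_status(nodepools) -> sends current state to the remote API
      fetch_updates() -> dict of NodePool name -> spec patch
      apply_update(name, patch) -> applies a patch in the cluster
    """
    nodepools = read_nodepools()
    report_status(nodepools)
    applied = []
    for name, spec_patch in fetch_updates().items():
        # Ignore updates for NodePools that no longer exist in the cluster.
        if name in nodepools:
            apply_update(name, spec_patch)
            applied.append(name)
    return applied


def run(poll, interval_seconds=60, max_cycles=None, sleep=time.sleep):
    """Call `poll` repeatedly, sleeping between cycles."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        poll()
        cycles += 1
        sleep(interval_seconds)
```

Because the agent initiates every request, a polling design like this needs only outbound HTTPS from the cluster and no inbound connectivity.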

How the Data Fetcher Agent Works

Stack overview

The Data Fetcher Agent uses a two-way messaging pattern, with Amazon SQS carrying real-time communication between the agent and nOps. This design lets Kubernetes data be fetched promptly and shown, up to date, in your Cluster Dashboard.

note

Encryption: The Data Fetcher Agent communicates securely with Amazon SQS using HTTPS, ensuring encryption in transit. At this time, Server-Side Encryption (SSE) for SQS queues is not enabled.
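A two-way messaging pattern typically pairs a request channel with a response channel and ties the two together with a correlation identifier. The sketch below illustrates that idea using in-memory `queue.Queue` objects as stand-ins for the SQS queues. The message fields (`request_id`, `resource`, `items`) and the function name are assumptions for illustration, not the agent's actual schema.

```python
import json
import queue


def handle_next_request(requests, responses, collect):
    """Process one pending request, if any, and publish the reply.

    `requests` and `responses` stand in for the inbound and outbound queues.
    `collect(resource)` is a placeholder for reading data from the cluster.
    Returns True if a request was handled, False if the queue was empty.
    """
    try:
        raw = requests.get_nowait()
    except queue.Empty:
        return False
    message = json.loads(raw)
    items = collect(message["resource"])
    # Echo the request ID so the receiver can match the reply to its request.
    reply = {"request_id": message["request_id"], "items": items}
    responses.put(json.dumps(reply))
    return True


# Usage with in-memory queues and a fake cluster reader.
inbound, outbound = queue.Queue(), queue.Queue()
inbound.put(json.dumps({"request_id": "r-1", "resource": "pods"}))
handled = handle_next_request(inbound, outbound, lambda kind: [f"{kind}-a"])
reply = json.loads(outbound.get_nowait())
```

Echoing the request identifier in the reply lets the requesting side match responses to requests even when several are in flight at once.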


How the Image Tag Updater Works

Image tag updater overview

The Image Tag Updater is a CronJob that runs nightly at midnight. On each run, it scans the nOps public ECR repos for updated images and, if any are found, uses the Kubernetes API to update the Deployments and DaemonSets of the nOps Kubernetes Agent components with the new image tags.
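The core of such an updater is a comparison between the tag a workload is running and the tags available in the registry. The sketch below picks the newest semantic-version tag and builds a patch body that changes only one container's image. The helper names, the example repository path, and the assumption that tags follow `MAJOR.MINOR.PATCH` are all illustrative; the actual updater's tag selection rules are not documented here.

```python
import re

# Accepts tags such as "1.4.2" or "v1.4.2"; anything else is ignored.
SEMVER_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Return (major, minor, patch) as integers, or None if not a semver tag."""
    match = SEMVER_RE.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def newest_tag(tags):
    """Return the tag with the highest version, or None if none parse."""
    candidates = [(parse_version(t), t) for t in tags if parse_version(t)]
    return max(candidates)[1] if candidates else None


def image_patch(container_name, image, available_tags):
    """Return a patch body for a Deployment or DaemonSet, or None if current."""
    repo, _, current = image.rpartition(":")
    latest = newest_tag(available_tags)
    if not repo or latest is None:
        return None
    current_version = parse_version(current)
    if current_version is not None and parse_version(latest) <= current_version:
        return None  # Already running the newest available version.
    return {"spec": {"template": {"spec": {"containers": [
        {"name": container_name, "image": f"{repo}:{latest}"},
    ]}}}}


# Hypothetical repository path and tag list.
patch = image_patch(
    "agent",
    "public.ecr.aws/example/agent:1.2.0",
    ["1.2.0", "1.10.1", "latest", "1.9.9"],
)
```

Versions are compared numerically rather than as strings, so `1.10.1` correctly ranks above `1.9.9`. A nightly-at-midnight schedule corresponds to the cron expression `0 0 * * *`.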

FAQ

  1. Where can I get information about Security and Compliance?
    You can find detailed reports on our stack and policies in this folder.

  2. Where can I find the CloudFormation template for inspection?
    You can find the template here.

  3. Do containers run as root?

    Most containers run as nobody, with two exceptions: the Datadog agent, which is non-root but privileged, and the DCGM exporter, which runs as root.

  4. Which images are used in the deployment, and are they digitally signed?
    Yes, the following images are used in our full deployment, and all of them have been digitally signed in our public ECR:

    These images are hosted in our public ECR to mitigate rate limit issues and are securely signed using AWS Signer. This ensures the integrity and authenticity of the images, supporting a reliable and trusted deployment pipeline.

  5. What is the nops-data-fetcher agent?

    The nOps Data Fetcher Agent provides real-time Kubernetes cluster insights, seamlessly integrating with your nOps dashboard. With this agent, you can visualize live data from your cluster directly within the Workloads tab, conveniently located next to the Nodes tab. To facilitate communication with nOps, the agent leverages an Amazon SQS queue for efficient and reliable data transfer.