# Balloons Policy ## Overview ### What Problems Does the Balloons Policy Solve? The balloons policy addresses CPU, device, and container affinity requirements by organizing workloads into **isolated CPU pools called "balloons."** This approach solves several key challenges: - **Flexible node resource partitioning**: Isolates and regroups containers from same or different pods, multi-pod applications and namespaces to share balloons. - **Realtime requirements and latency-sensitivity**: Minimizes latencies by isolating critical workloads, tuning cache and physical CPU core sharing, memory and device locality, CPU frequencies and powersaving states, and process scheduling parameters, including realtime scheduling policies and I/O priorities. - **Maximal server throughput**: Optimizes resource utilization by spreading memory and memory bandwidth hungry workloads across sockets, NUMA nodes and cache domains. Tunes the balance of memory accesses from lowest latency accesses to the closest memories only towards maximal bandwidth by using more memory channels through balanced cross-NUMA balloons within the same socket, or even more by crossing socket boundaries. Controls allowed memory types (DRAM, HBM, PMEM). - **Different workloads, different servers, different rules**: Allocates CPUs for different containers based on different CPU allocation preferences. Preferences per worker node or worker node group. Supports both pre-allocation of CPUs and on-demand allocations as containers are created, and both always static and dynamically growing/shrinking balloons. ### Organization of this document This document is organized so you can either read it top-to-bottom or jump directly to the sections below: - **[Integration with Kubernetes](#integration-with-kubernetes)** - **[Installation and Configuration](#installation-and-configuration)** - **[Configuration Options](#configuration-options)** - **[Container-to-Balloon Assignment](#container-to-balloon-assignment)** - **[CPUs-to-Balloon Selection](#cpus-to-balloon-selection)** - **[Memories-to-Balloon Selection](#memories-to-balloon-selection)** - **[Container Tuning](#container-tuning)** - **[CPU Tuning](#cpu-tuning)** - **[Built-in Balloon Types](#built-in-balloon-types)** - **[Toggle and Reset Pinning Memory, CPUs, and Containers](#toggle-and-reset-pinning-memory-cpus-and-containers)** - **[Visibility, Scheduling, Metrics, Logging, Debugging](#visibility-scheduling-metrics-logging-debugging)** - **[Cookbook](#cookbook)** - **[Latency-Critical Containers](#latency-critical-containers)** - **[Maximum Memory Bandwidth Containers](#maximum-memory-bandwidth-containers)** - **[Workload-Aware Hyperthread Sharing](#workload-aware-hyperthread-sharing)** - **[Troubleshooting](#troubleshooting)** ### Integration with Kubernetes The balloons policy integrates into the Kubernetes stack as an NRI (Node Resource Interface) plugin that works with container runtimes (containerd or CRI-O): 1. **Runtime Integration**: Runs as a DaemonSet on each node, intercepting container lifecycle events through NRI. 2. **Dynamic Configuration**: Uses Kubernetes Custom Resources (CRs) for configuration, supporting cluster-wide, node-group, and node-specific settings. 3. **Pod Annotations**: Allows fine-grained control through pod and container annotations. 4. **Topology Awareness**: Can expose balloon topology through NodeResourceTopology CRs for scheduler integration and for cluster-wide inspection of containers CPU affinity. The policy evaluates each container when it starts, assigns it to an appropriate balloon based on configuration rules, and sets its CPU and memory affinity accordingly. ## Installation and Configuration ### Prerequisites - Kubernetes 1.24+ - Helm 3.0.0+ - Container runtime with NRI support: - containerd 1.7.0+ or CRI-O 1.26.0+ - NRI feature enabled in the runtime (this is the default in up-to-date runtimes). ### Installing with Helm Add the NRI plugins Helm repository: ```sh helm repo add nri-plugins https://containers.github.io/nri-plugins helm repo update ``` Install the balloons policy with default configuration: ```sh helm install nri-resource-policy-balloons nri-plugins/nri-resource-policy-balloons --namespace kube-system ``` Install with a custom configuration from a values file: ```sh cat > balloons.values.helm.yaml <