# Balloons Policy ## Overview The balloons policy implements workload placement into "balloons" that are disjoint CPU pools. Size of a balloon can be fixed, or the balloon can be dynamically inflated and deflated, that is CPUs added and removed, based on the CPU resource requests of containers running in the balloon. Balloons can be static or dynamically created and destroyed. CPUs in balloons can be configured, for example, by setting min and max frequencies on CPU cores and uncore. ## How It Works 1. User configures balloon types from which the policy creates balloons. 2. A balloon has a set of CPUs and a set of containers that run on the CPUs. 3. Every container is assigned to exactly one balloon. A container is allowed to use all CPUs of its balloon and no other CPUs. 4. Every logical CPU belongs to at most one balloon. There can be CPUs that do not belong to any balloon. 5. The number of CPUs in a balloon can change during the lifetime of the balloon. If a balloon inflates, that is CPUs are added to it, all containers in the balloon are allowed to use more CPUs. If a balloon deflates, the opposite is true. 6. When a new container is created on a Kubernetes node, the policy first decides the type of the balloon that will run the container. The decision is based on annotations of the pod, or the namespace if annotations are not given. 7. Next the policy decides which balloon of the decided type will run the container. Options are: - an existing balloon that already has enough CPUs to run its current and new containers - an existing balloon that can be inflated to fit its current and new containers - new balloon. 9. When a CPU is added to a balloon or removed from it, the CPU is reconfigured based on balloon's CPU class attributes, or idle CPU class attributes. ## Deployment Deploy nri-resource-policy-balloons on each node as you would for any other policy. See [deployment](../../deployment/index.md) for more details. ## Configuration The balloons policy is configured using BalloonsPolicy Custom Resources. See [setup and usage](../setup.md#setting-up-nri-resource-policy) for more details on managing the configuration. ### Parameters Balloons policy parameters: - `reservedResources`: - `cpu` specifies cpuset or number of CPUs in the special `reserved` balloon. By default all containers in the `kube-system` namespace are assigned to the reserved balloon. Examples: `cpu: cpuset:0,48` uses two logical CPUs: cpu0 and cpu48. `cpu: 2000m` uses any two CPUs. If minCPUs are explicitly defined for the `reserved` balloon, that number of CPUs will be allocated from the `cpuset` and more later (up to `maxCpus`) as needed. - `pinCPU` controls pinning a container to CPUs of its balloon. The default is `true`: the container cannot use other CPUs. - `pinMemory` controls pinning a container to the memories that are closest to the CPUs of its balloon. The default is `true`: allow using memory only from the closest NUMA nodes. Warning: this may cause kernel to kill containers due to out-of-memory error when closest NUMA nodes do not have enough memory. In this situation consider switching this option `false`. - `idleCPUClass` specifies the CPU class of those CPUs that do not belong to any balloon. - `reservedPoolNamespaces` is a list of namespaces (wildcards allowed) that are assigned to the special reserved balloon, that is, will run on reserved CPUs. This always includes the `kube-system` namespace. - `allocatorTopologyBalancing` affects selecting CPUs for new balloons. If `true`, new balloons are created using CPUs on NUMA/die/package with most free CPUs, that is, balloons are spread across the hardware topology. This helps inflating balloons within the same NUMA/die/package and reduces interference between containers in balloons when system is not fully loaded. The default is `false`: pack new balloons tightly into the same NUMAs/dies/packages. This helps keeping large portions of hardware idle and entering into deep power saving states. - `preferSpreadOnPhysicalCores` prefers allocating logical CPUs (possibly hyperthreads) for a balloon from separate physical CPU cores. This prevents containers in the balloon from interfering with themselves as they do not compete on the resources of the same CPU cores. On the other hand, it allows more interference between containers in different balloons. The default is `false`: balloons are packed tightly to a minimum number of physical CPU cores. The value set here is the default for all balloon types, but it can be overridden with the balloon type specific setting with the same name. - `balloonTypes` is a list of balloon type definitions. The order of the types is significant in two cases. In the first case the policy pre-creates balloons and allocates their CPUs when it starts or is reconfigured, see `minBalloons` and `minCPUs` below. Balloon types with the highest `allocatorPriority` will get their CPUs in the listed order. Balloon types with a lower `allocatorPriority` will get theirs in the same order after them. In the second case the policy looks for a balloon type for a new container. If annotations do not specify it, the container will be be assignd to the first balloon type in the list with matching criteria, for instance based on `namespaces` below. Each balloon type can be configured with following parameters: - `name` of the balloon type. This is used in pod annotations to assign containers to balloons of this type. - `namespaces` is a list of namespaces (wildcards allowed) whose pods should be assigned to this balloon type, unless overridden by pod annotations. - `groupBy` groups containers into same balloon instances if their GroupBy expressions evaluate to the same group. Expressions are strings where key references like `${pod/labels/mylabel}` will be substituted with corresponding values. - `matchExpressions` is a list of container match expressions. These expressions are evaluated for all containers which have not been assigned otherwise to other balloons. If an expression matches, IOW it evaluates to true, the container gets assigned to this balloon type. Container mach expressions have the same syntax and semantics as the scope and match expressions in container affinity annotations for the topology-aware policy. See the [affinity documentation](./topology-aware.md#affinity-semantics) for a detailed description of expressions. - `minBalloons` is the minimum number of balloons of this type that is always present, even if the balloons would not have any containers. The default is 0: if a balloon has no containers, it can be destroyed. - `maxBalloons` is the maximum number of balloons of this type that is allowed to co-exist. The default is 0: creating new balloons is not limited by the number of existing balloons. - `maxCPUs` specifies the maximum number of CPUs in any balloon of this type. Balloons will not be inflated larger than this. 0 means unlimited. - `minCPUs` specifies the minimum number of CPUs in any balloon of this type. When a balloon is created or deflated, it will always have at least this many CPUs, even if containers in the balloon request less. - `cpuClass` specifies the name of the CPU class according to which CPUs of balloons are configured. Class properties are defined in separate `cpu.classes` objects, see below. - `preferCloseToDevices`: prefer creating new balloons close to listed devices. List of strings - `preferSpreadingPods`: if `true`, containers of the same pod should be spread to different balloons of this type. The default is `false`: prefer placing containers of the same pod to the same balloon(s). - `preferPerNamespaceBalloon`: if `true`, containers in the same namespace will be placed in the same balloon(s). On the other hand, containers in different namespaces are preferrably placed in different balloons. The default is `false`: namespace has no effect on choosing the balloon of this type. - `preferNewBalloons`: if `true`, prefer creating new balloons over placing containers to existing balloons. This results in preferring exclusive CPUs, as long as there are enough free CPUs. The default is `false`: prefer filling and inflating existing balloons over creating new ones. - `shareIdleCPUsInSame`: Whenever the number of or sizes of balloons change, idle CPUs (that do not belong to any balloon) are reshared as extra CPUs to containers in balloons with this option. The value sets locality of allowed extra CPUs that will be common to these containers. - `system`: containers are allowed to use idle CPUs available anywhere in the system. - `package`: ...allowed to use idle CPUs in the same package(s) (sockets) as the balloon. - `die`: ...in the same die(s) as the balloon. - `numa`: ...in the same numa node(s) as the balloon. - `core`: ...allowed to use idle CPU threads in the same cores with the balloon. - `preferSpreadOnPhysicalCores` overrides the policy level option with the same name in the scope of this balloon type. - `preferCloseToDevices` prefers creating new balloons close to listed devices. If all preferences cannot be fulfilled, preference to first devices in the list override preferences to devices after them. Adding this preference to any balloon type automatically adds corresponding anti-affinity to other balloon types that do not prefer to be close to the same device: they prefer being created away from the device. Example: ``` preferCloseToDevices: - /sys/class/net/eth0 - /sys/class/block/sda ``` - `allocatorPriority` (0: High, 1: Normal, 2: Low, 3: None). CPU allocator parameter, used when creating new or resizing existing balloons. If there are balloon types with pre-created balloons (`minBalloons` > 0), balloons of the type with the highest `allocatorPriority` are created first. - `control.cpu.classes`: defines CPU classes and their properties. Class names are keys followed by properties: - `minFreq` minimum frequency for CPUs in this class (kHz). - `maxFreq` maximum frequency for CPUs in this class (kHz). - `uncoreMinFreq` minimum uncore frequency for CPUs in this class (kHz). If there are differences in `uncoreMinFreq`s in CPUs within the same uncore frequency zone, the maximum value of all `uncoreMinFreq`s is used. - `uncoreMaxFreq` maximum uncore frequency for CPUs in this class (kHz). - `instrumentation`: configures interface for runtime instrumentation. - `httpEndpoint`: the address the HTTP server listens on. Example: `:8891`. - `prometheusExport`: if set to True, balloons with their CPUs and assigned containers are readable through `/metrics` from the httpEndpoint. - `reportPeriod`: `/metrics` aggregation interval. ### Example Example configuration that runs all pods in balloons of 1-4 CPUs. Instrumentation enables reading CPUs and containers in balloons from `http://localhost:8891/metrics`. ```yaml apiVersion: config.nri/v1alpha1 kind: BalloonsPolicy metadata: name: default namespace: kube-system spec: reservedResources: cpu: 1000m pinCPU: true pinMemory: true allocatorTopologyBalancing: true idleCPUClass: lowpower balloonTypes: - name: "quad" maxCPUs: 4 cpuClass: dynamic namespaces: - "*" control: cpu: classes: lowpower: minFreq: 800000 maxFreq: 800000 dynamic: minFreq: 800000 maxFreq: 3600000 turbo: minFreq: 3000000 maxFreq: 3600000 uncoreMinFreq: 2000000 uncoreMaxFreq: 2400000 instrumentation: httpEndpoint: :8891 prometheusExport: true ``` ## Assigning a Container to a Balloon The balloon type of a container can be defined in pod annotations. In the example below, the first annotation sets the balloon type (`BT`) of a single container (`CONTAINER_NAME`). The last two annotations set the balloon type for all containers in the pod. This will be used unless overridden with the container-specific balloon type. ```yaml balloon.balloons.resource-policy.nri.io/container.CONTAINER_NAME: BT balloon.balloons.resource-policy.nri.io/pod: BT balloon.balloons.resource-policy.nri.io: BT ``` If the pod does not have these annotations, the container is matched to `matchExpressions` and `namespaces` of each type in the `balloonType`s list. The first matching balloon type is used. If the container does not match any of the balloon types, it is assigned to the `default` balloon type. Parameters for this balloon type can be defined explicitly among other balloon types. If they are not defined, a built-in `default` balloon type is used. ## Disabling CPU or Memory Pinning of a Container Some containers may need to run on all CPUs or access all memories without restrictions. Annotate these pods and containers to prevent the resource policy from touching their CPU or memory pinning. ```yaml cpu.preserve.resource-policy.nri.io/container.CONTAINER_NAME: "true" cpu.preserve.resource-policy.nri.io/pod: "true" cpu.preserve.resource-policy.nri.io: "true" memory.preserve.resource-policy.nri.io/container.CONTAINER_NAME: "true" memory.preserve.resource-policy.nri.io/pod: "true" memory.preserve.resource-policy.nri.io: "true" ``` ## Metrics and Debugging In order to enable more verbose logging and metrics exporting from the balloons policy, enable instrumentation and policy debugging from the nri-resource-policy global config: ```yaml instrumentation: # The balloons policy exports containers running in each balloon, # and cpusets of balloons. Accessible in command line: # curl --silent http://localhost:8891/metrics HTTPEndpoint: :8891 PrometheusExport: true logger: Debug: policy ```