# Common Functionality

## Overview

There is some common functionality implemented by the generic resource
management infrastructure shared by all resource policy plugin
implementations. This functionality is available in all policies, unless
stated otherwise in the policy-specific documentation.

## Cache Allocation

Plugins can be configured to exercise class-based control over the L2 and
L3 cache allocated to containers' processes. In practice, containers are
assigned to classes, and each class has a corresponding cache allocation
configuration. This configuration is applied to all containers assigned to
the class and subsequently to all processes started in those containers.

To enable cache control, use the `control.rdt.enable` option, which
defaults to `false`.

Plugins can be configured to assign containers by default to a cache class
named after the Pod QoS class of the container: one of `BestEffort`,
`Burstable`, or `Guaranteed`. The configuration setting controlling this
behavior is `control.rdt.usePodQoSAsDefaultClass`, and it defaults to
`false`.

Additionally, containers can be explicitly annotated to be assigned to a
class. Use the `rdtclass.resource-policy.nri.io` annotation key for this.
For instance:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  annotations:
    rdtclass.resource-policy.nri.io/pod: poddefaultclass
    rdtclass.resource-policy.nri.io/container.special-container: specialclass
...
```

This will assign the container named `special-container` within the pod to
the `specialclass` RDT class and any other container within the pod to the
`poddefaultclass` RDT class. Effectively these containers' processes will
be assigned to the RDT CLOSes corresponding to those classes.

### Cache Class/Partitioning Configuration

RDT configuration is supplied as part of the `control.rdt` configuration
block. Here is a sample snippet as a Helm chart value which assigns 33%,
66% and 100% of cache lines to `BestEffort`, `Burstable` and `Guaranteed`
Pod QoS class containers, respectively:

```yaml
config:
  control:
    rdt:
      enable: true
      usePodQoSAsDefaultClass: true
      options:
        l2:
          optional: true
        l3:
          optional: true
        mb:
          optional: true
      partitions:
        fullCache:
          l2Allocation:
            all:
              unified: 100%
          l3Allocation:
            all:
              unified: 100%
          classes:
            BestEffort:
              l2Allocation:
                all:
                  unified: 33%
              l3Allocation:
                all:
                  unified: 33%
            Burstable:
              l2Allocation:
                all:
                  unified: 66%
              l3Allocation:
                all:
                  unified: 66%
            Guaranteed:
              l2Allocation:
                all:
                  unified: 100%
              l3Allocation:
                all:
                  unified: 100%
```

The actual library used to implement cache control is
[goresctrl](https://github.com/intel/goresctrl). Please refer to its
[documentation](https://github.com/intel/goresctrl/blob/main/doc/rdt.md)
for a more detailed description of the configuration semantics.

#### A Warning About Configuration Syntax Differences

Note that the configuration syntax used for cache partitioning and classes
is slightly different for
[goresctrl](https://github.com/intel/goresctrl/blob/main/doc/rdt.md) and
the NRI Reference Plugins. When directly using goresctrl, you can use a
shorthand notation like this

```yaml
...
classes:
  fullCache:
    l2Allocation:
      all: 100%
    l3Allocation:
      all: 100%
...
```

to actually mean

```yaml
...
classes:
  fullCache:
    l2Allocation:
      all:
        unified: 100%
    l3Allocation:
      all:
        unified: 100%
...
```

This is not possible with the NRI Reference Plugins configuration CR. Here
you must use the latter, full syntax.
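Instead of passing each value with a separate `--set` flag, a configuration
block like the one above can be supplied to Helm from a values file. A
minimal sketch, assuming the chart name used elsewhere in this document and
a hypothetical values file `rdt-values.yaml` holding the `config:` block
shown above:

```shell
# rdt-values.yaml is assumed to contain the config.control.rdt block above.
$ helm install test -n kube-system nri-plugins/nri-resource-policy-topology-aware \
    --values rdt-values.yaml
```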
## Cache Occupancy Monitoring Metrics

Plugins can be configured to export cache usage as Prometheus metrics. The
following configuration options must be specified:

- `control.rdt.enable` set to `true`,
- `instrumentation.prometheusExport` set to `true`,
- `instrumentation.httpEndpoint` set to a valid non-empty value, e.g. `:8891`, and
- `instrumentation.metrics.enabled` set to contain `policy/rdt`, `rdt`, or `policy`.

When deploying with Helm, the default configuration can be modified like this:

```shell
$ helm install test -n kube-system nri-plugins/nri-resource-policy-topology-aware \
    --set config.control.rdt.enable=true \
    --set config.instrumentation.prometheusExport=true \
    --set config.instrumentation.metrics.enabled='{buildinfo,rdt}' \
    --set config.log.debug='{goresctrl}'
```

Once enabled, you'll see RDT metrics similar to the following:

```shell
$ kubectl port-forward -n kube-system ds/nri-resource-policy-topology-aware 9000:8891 &
$ wget -q --no-proxy http://127.0.0.1:9000/metrics -O-
# HELP go_build_info Build information about the main Go module.
# TYPE go_build_info gauge
go_build_info{checksum="",path="github.com/containers/nri-plugins",version="v0.10.0"} 1
# HELP nri_l3_llc_occupancy L3 (LLC) occupancy
# TYPE nri_l3_llc_occupancy counter
nri_l3_llc_occupancy{cache_id="0",rdt_class="BestEffort",rdt_mon_group=""} 655360
nri_l3_llc_occupancy{cache_id="0",rdt_class="Burstable",rdt_mon_group=""} 409600
nri_l3_llc_occupancy{cache_id="0",rdt_class="Guaranteed",rdt_mon_group=""} 0
nri_l3_llc_occupancy{cache_id="0",rdt_class="system/default",rdt_mon_group=""} 2.752512e+07
nri_l3_llc_occupancy{cache_id="1",rdt_class="BestEffort",rdt_mon_group=""} 0
nri_l3_llc_occupancy{cache_id="1",rdt_class="Burstable",rdt_mon_group=""} 0
nri_l3_llc_occupancy{cache_id="1",rdt_class="Guaranteed",rdt_mon_group=""} 491520
nri_l3_llc_occupancy{cache_id="1",rdt_class="system/default",rdt_mon_group=""} 2.818048e+07
```

The RDT-specific set of metrics collected depends on your hardware and your
kernel configuration. If supported by your environment, you can currently
expect to get the following metrics related to cache occupancy:

- `l3_llc_occupancy`: L3 (LLC) occupancy

These are collected per cache ID for each RDT class/CLOS.
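These metrics are only exposed on the configured HTTP endpoint; a
Prometheus server still has to scrape them. A minimal static scrape
configuration sketch, assuming the `:8891` endpoint from above and a
hypothetical node address `node-1`; a real cluster would more likely use
Kubernetes service discovery instead:

```yaml
# Hypothetical Prometheus configuration snippet; 'node-1' is a placeholder
# for a node running the resource policy plugin. The port must match
# instrumentation.httpEndpoint (:8891 above).
scrape_configs:
  - job_name: nri-resource-policy
    static_configs:
      - targets: ["node-1:8891"]
```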
## Memory Bandwidth Allocation

If the hardware supports it, plugins can limit, per RDT class, how much
memory bandwidth the processes of all containers in a class can consume
altogether. You can enable this using a slightly modified class
configuration which specifies MBA limits for each class and for the
partition:

```yaml
config:
  control:
    rdt:
      enable: true
      usePodQoSAsDefaultClass: true
      options:
        l2:
          optional: true
        l3:
          optional: true
        mb:
          optional: true
      partitions:
        fullCache:
          l2Allocation:
            all:
              unified: 100%
          l3Allocation:
            all:
              unified: 100%
          mbAllocation:
            all: [ 100%, 1000MBps ]
          classes:
            BestEffort:
              l2Allocation:
                all:
                  unified: 33%
              l3Allocation:
                all:
                  unified: 33%
              mbAllocation:
                all: [ 33%, 330MBps ]
            Burstable:
              l2Allocation:
                all:
                  unified: 66%
              l3Allocation:
                all:
                  unified: 66%
              mbAllocation:
                all: [ 66%, 660MBps ]
            Guaranteed:
              l2Allocation:
                all:
                  unified: 100%
              l3Allocation:
                all:
                  unified: 100%
              mbAllocation:
                all: [ 100%, 1000MBps ]
```

## Memory Bandwidth Monitoring Metrics

If you have RDT-specific metrics collection enabled and your platform
supports memory bandwidth monitoring, you can expect these related metrics
to be exposed:

- `l3_mbm_local_bytes`: bytes transferred to/from local memory through LLC
- `l3_mbm_total_bytes`: total bytes transferred to/from memory through LLC

An example:

```shell
$ kubectl port-forward -n kube-system ds/nri-resource-policy-topology-aware 9000:8891 &
$ wget -q --no-proxy http://127.0.0.1:9000/metrics -O-
# HELP nri_l3_mbm_local_bytes bytes transferred to/from local memory through LLC
# TYPE nri_l3_mbm_local_bytes counter
nri_l3_mbm_local_bytes{cache_id="0",rdt_class="BestEffort",rdt_mon_group=""} 573440
nri_l3_mbm_local_bytes{cache_id="0",rdt_class="Burstable",rdt_mon_group=""} 1.253376e+07
nri_l3_mbm_local_bytes{cache_id="0",rdt_class="Guaranteed",rdt_mon_group=""} 0
nri_l3_mbm_local_bytes{cache_id="0",rdt_class="system/default",rdt_mon_group=""} 1.98836224e+09
nri_l3_mbm_local_bytes{cache_id="1",rdt_class="BestEffort",rdt_mon_group=""} 1.6384e+07
nri_l3_mbm_local_bytes{cache_id="1",rdt_class="Burstable",rdt_mon_group=""} 0
nri_l3_mbm_local_bytes{cache_id="1",rdt_class="Guaranteed",rdt_mon_group=""} 1.06496e+07
nri_l3_mbm_local_bytes{cache_id="1",rdt_class="system/default",rdt_mon_group=""} 1.63692544e+09
# HELP nri_l3_mbm_total_bytes total bytes transferred to/from memory through LLC
# TYPE nri_l3_mbm_total_bytes counter
nri_l3_mbm_total_bytes{cache_id="0",rdt_class="BestEffort",rdt_mon_group=""} 573440
nri_l3_mbm_total_bytes{cache_id="0",rdt_class="Burstable",rdt_mon_group=""} 1.59744e+07
nri_l3_mbm_total_bytes{cache_id="0",rdt_class="Guaranteed",rdt_mon_group=""} 0
nri_l3_mbm_total_bytes{cache_id="0",rdt_class="system/default",rdt_mon_group=""} 3.172352e+09
nri_l3_mbm_total_bytes{cache_id="1",rdt_class="BestEffort",rdt_mon_group=""} 2.236416e+07
nri_l3_mbm_total_bytes{cache_id="1",rdt_class="Burstable",rdt_mon_group=""} 0
nri_l3_mbm_total_bytes{cache_id="1",rdt_class="Guaranteed",rdt_mon_group=""} 1.318912e+07
nri_l3_mbm_total_bytes{cache_id="1",rdt_class="system/default",rdt_mon_group=""} 2.64511488e+09
```

## Metrics Specific to Monitoring Groups

If there are any monitoring groups present in the system, goresctrl
produces RDT metrics for those as well. You can differentiate between
group-specific and other metrics using the `rdt_mon_group` metrics label.
Metrics specific to a monitoring group have this label set to the name of
the monitoring group the metric corresponds to.

## Cache and Memory Bandwidth Allocation and Monitoring Prerequisites

Note that for cache and memory bandwidth allocation and monitoring to work,
you must have

- a hardware platform which supports these features,
- the resctrl pseudo-filesystem enabled in your kernel, and
- the resctrl filesystem mounted, possibly with extra options for your
  platform (see the sketch below).
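As a sketch of the last two prerequisites, on a supported platform you can
check for resctrl support and mount the filesystem like this; the
`mba_MBps` mount option is one example of a platform-specific extra option,
needed if you want to express memory bandwidth limits in MBps instead of
percentages:

```shell
# Check that the kernel knows about the resctrl filesystem type and
# whether it is already mounted somewhere.
$ grep resctrl /proc/filesystems
$ mount | grep resctrl

# Mount it at the standard location, optionally with extra options,
# e.g. -o mba_MBps for memory bandwidth limits in MBps.
$ sudo mount -t resctrl resctrl /sys/fs/resctrl
```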