Common Functionality
Overview
Some functionality is implemented by the generic resource management infrastructure and shared by all resource policy plugin implementations. This functionality is available in every policy, unless stated otherwise in the policy-specific documentation.
Cache Allocation
Plugins can be configured to exercise class-based control over the L2 and L3 cache allocated to containers’ processes: containers are assigned to classes, and each class has a corresponding cache allocation configuration. This configuration is applied to all containers in the class and, consequently, to all processes started in those containers.
To enable cache control, use the control.rdt.enable option, which defaults to false.
Plugins can be configured to assign containers by default to a cache class named after the Pod QoS class of the container: one of BestEffort, Burstable, or Guaranteed. The configuration setting controlling this behavior is control.rdt.usePodQoSAsDefaultClass and it defaults to false.
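For example, when deploying with Helm, both options could be turned on with --set overrides; a sketch, reusing the chart name and configuration paths shown elsewhere in this document:
$ helm install test -n kube-system nri-plugins/nri-resource-policy-topology-aware \
    --set config.control.rdt.enable=true \
    --set config.control.rdt.usePodQoSAsDefaultClass=true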
Additionally, containers can be explicitly annotated to be assigned to a class. Use the rdtclass.resource-policy.nri.io annotation key for this. For instance:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  annotations:
    rdtclass.resource-policy.nri.io/pod: poddefaultclass
    rdtclass.resource-policy.nri.io/container.special-container: specialclass
...
This will assign the container named special-container within the pod to the specialclass RDT class and any other container within the pod to the poddefaultclass RDT class. Effectively, these containers’ processes will be assigned to the RDT CLOSes corresponding to those classes.
Cache Class/Partitioning Configuration
RDT configuration is supplied as part of the control.rdt configuration block.
Here is a sample snippet, as a Helm chart value, which assigns 33%, 66%, and 100% of cache lines to BestEffort, Burstable, and Guaranteed Pod QoS class containers, respectively:
config:
  control:
    rdt:
      enable: true
      usePodQoSAsDefaultClass: true
      options:
        l2:
          optional: true
        l3:
          optional: true
        mb:
          optional: true
      partitions:
        fullCache:
          l2Allocation:
            all:
              unified: 100%
          l3Allocation:
            all:
              unified: 100%
          classes:
            BestEffort:
              l2Allocation:
                all:
                  unified: 33%
              l3Allocation:
                all:
                  unified: 33%
            Burstable:
              l2Allocation:
                all:
                  unified: 66%
              l3Allocation:
                all:
                  unified: 66%
            Guaranteed:
              l2Allocation:
                all:
                  unified: 100%
              l3Allocation:
                all:
                  unified: 100%
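Once this configuration is in effect, the classes show up as CLOS group directories in the resctrl filesystem. A quick way to inspect the result; a sketch assuming resctrl is mounted at /sys/fs/resctrl (depending on the goresctrl configuration, the group directory names may carry a prefix):
$ ls /sys/fs/resctrl
$ cat /sys/fs/resctrl/BestEffort/schemata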
The actual library used to implement cache control is goresctrl. Please refer to its documentation for a more detailed description of configuration semantics.
A Warning About Configuration Syntax Differences
Note that the configuration syntax used for cache partitioning and classes differs slightly between goresctrl and the NRI Reference Plugins. When using goresctrl directly, you can use a shorthand notation like this
...
classes:
  fullCache:
    l2Allocation:
      all: 100%
    l3Allocation:
      all: 100%
...
to actually mean
...
classes:
  fullCache:
    l2Allocation:
      all:
        unified: 100%
    l3Allocation:
      all:
        unified: 100%
...
This shorthand is not accepted by the NRI Reference Plugins configuration CR; there you must use the latter, full syntax.
Cache Occupancy Monitoring Metrics
Plugins can be configured to export cache usage as Prometheus metrics. The following configuration options must be specified:
control.rdt.enable set to true,
instrumentation.prometheusExport set to true,
instrumentation.httpEndpoint set to a valid non-empty value, e.g. :8891, and
instrumentation.metrics.enabled set to contain policy/rdt, rdt, or policy.
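Assembled into a single snippet, these options might look like the following sketch (the configuration paths come from the option names above; metrics.enabled takes a list, as in the Helm example below):
config:
  control:
    rdt:
      enable: true
  instrumentation:
    prometheusExport: true
    httpEndpoint: ":8891"
    metrics:
      enabled:
        - policy/rdt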
When deploying with Helm, the default configuration can be modified like this:
$ helm install test -n kube-system nri-plugins/nri-resource-policy-topology-aware \
--set config.control.rdt.enable=true \
--set config.instrumentation.prometheusExport=true \
--set config.instrumentation.metrics.enabled='{buildinfo,rdt}' \
--set config.log.debug='{goresctrl}'
Once enabled, you’ll see RDT metrics similar to the following:
$ kubectl port-forward -n kube-system ds/nri-resource-policy-topology-aware 9000:8891 &
$ wget -q --no-proxy http://127.0.0.1:9000/metrics -O-
# HELP go_build_info Build information about the main Go module.
# TYPE go_build_info gauge
go_build_info{checksum="",path="github.com/containers/nri-plugins",version="v0.10.0"} 1
# HELP nri_l3_llc_occupancy L3 (LLC) occupancy
# TYPE nri_l3_llc_occupancy counter
nri_l3_llc_occupancy{cache_id="0",rdt_class="BestEffort",rdt_mon_group=""} 655360
nri_l3_llc_occupancy{cache_id="0",rdt_class="Burstable",rdt_mon_group=""} 409600
nri_l3_llc_occupancy{cache_id="0",rdt_class="Guaranteed",rdt_mon_group=""} 0
nri_l3_llc_occupancy{cache_id="0",rdt_class="system/default",rdt_mon_group=""} 2.752512e+07
nri_l3_llc_occupancy{cache_id="1",rdt_class="BestEffort",rdt_mon_group=""} 0
nri_l3_llc_occupancy{cache_id="1",rdt_class="Burstable",rdt_mon_group=""} 0
nri_l3_llc_occupancy{cache_id="1",rdt_class="Guaranteed",rdt_mon_group=""} 491520
nri_l3_llc_occupancy{cache_id="1",rdt_class="system/default",rdt_mon_group=""} 2.818048e+07
The exact set of RDT-specific metrics collected depends on your hardware and kernel configuration. If supported by your environment, you can currently expect the following metrics related to cache occupancy:
l3_llc_occupancy: L3 (LLC) occupancy
These are collected per cache ID for each RDT class/CLOS.
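For instance, to pull just the occupancy series out of the exported metrics, reusing the port-forward from the example above:
$ wget -q --no-proxy http://127.0.0.1:9000/metrics -O- | grep '^nri_l3_llc_occupancy'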
Memory Bandwidth Allocation
If the hardware supports it, plugins can limit, per RDT class, the total memory bandwidth available to the processes of all containers in that class. You can enable this using a slightly modified class configuration which specifies MBA limits for each class and for the partition.
config:
  control:
    rdt:
      enable: true
      usePodQoSAsDefaultClass: true
      options:
        l2:
          optional: true
        l3:
          optional: true
        mb:
          optional: true
      partitions:
        fullCache:
          l2Allocation:
            all:
              unified: 100%
          l3Allocation:
            all:
              unified: 100%
          mbAllocation:
            all: [ 100%, 1000Mbps ]
          classes:
            BestEffort:
              l2Allocation:
                all:
                  unified: 33%
              l3Allocation:
                all:
                  unified: 33%
              mbAllocation:
                all: [ 33%, 330Mbps ]
            Burstable:
              l2Allocation:
                all:
                  unified: 66%
              l3Allocation:
                all:
                  unified: 66%
              mbAllocation:
                all: [ 66%, 660Mbps ]
            Guaranteed:
              l2Allocation:
                all:
                  unified: 100%
              l3Allocation:
                all:
                  unified: 100%
              mbAllocation:
                all: [ 100%, 1000Mbps ]
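To check that the bandwidth limits took effect, you can again inspect a class’s schemata file; a sketch assuming resctrl is mounted at /sys/fs/resctrl and an unprefixed group name:
$ cat /sys/fs/resctrl/BestEffort/schemata
An MB: line should show up next to the L2:/L3: lines, e.g. MB:0=33;1=33.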
Memory Bandwidth Monitoring Metrics
If you have RDT-specific metrics collection enabled and your platform supports memory bandwidth monitoring, you can expect these related metrics to be exposed:
l3_mbm_local_bytes: bytes transferred to/from local memory through LLC
l3_mbm_total_bytes: total bytes transferred to/from memory through LLC
An example:
$ kubectl port-forward -n kube-system ds/nri-resource-policy-topology-aware 9000:8891 &
$ wget -q --no-proxy http://127.0.0.1:9000/metrics -O-
# HELP nri_l3_mbm_local_bytes bytes transferred to/from local memory through LLC
# TYPE nri_l3_mbm_local_bytes counter
nri_l3_mbm_local_bytes{cache_id="0",rdt_class="BestEffort",rdt_mon_group=""} 573440
nri_l3_mbm_local_bytes{cache_id="0",rdt_class="Burstable",rdt_mon_group=""} 1.253376e+07
nri_l3_mbm_local_bytes{cache_id="0",rdt_class="Guaranteed",rdt_mon_group=""} 0
nri_l3_mbm_local_bytes{cache_id="0",rdt_class="system/default",rdt_mon_group=""} 1.98836224e+09
nri_l3_mbm_local_bytes{cache_id="1",rdt_class="BestEffort",rdt_mon_group=""} 1.6384e+07
nri_l3_mbm_local_bytes{cache_id="1",rdt_class="Burstable",rdt_mon_group=""} 0
nri_l3_mbm_local_bytes{cache_id="1",rdt_class="Guaranteed",rdt_mon_group=""} 1.06496e+07
nri_l3_mbm_local_bytes{cache_id="1",rdt_class="system/default",rdt_mon_group=""} 1.63692544e+09
# HELP nri_l3_mbm_total_bytes total bytes transferred to/from memory through LLC
# TYPE nri_l3_mbm_total_bytes counter
nri_l3_mbm_total_bytes{cache_id="0",rdt_class="BestEffort",rdt_mon_group=""} 573440
nri_l3_mbm_total_bytes{cache_id="0",rdt_class="Burstable",rdt_mon_group=""} 1.59744e+07
nri_l3_mbm_total_bytes{cache_id="0",rdt_class="Guaranteed",rdt_mon_group=""} 0
nri_l3_mbm_total_bytes{cache_id="0",rdt_class="system/default",rdt_mon_group=""} 3.172352e+09
nri_l3_mbm_total_bytes{cache_id="1",rdt_class="BestEffort",rdt_mon_group=""} 2.236416e+07
nri_l3_mbm_total_bytes{cache_id="1",rdt_class="Burstable",rdt_mon_group=""} 0
nri_l3_mbm_total_bytes{cache_id="1",rdt_class="Guaranteed",rdt_mon_group=""} 1.318912e+07
nri_l3_mbm_total_bytes{cache_id="1",rdt_class="system/default",rdt_mon_group=""} 2.64511488e+09
Metrics Specific to Monitoring Groups
If there are any monitoring groups present in the system, goresctrl produces RDT metrics for those as well. You can differentiate between group-specific and other metrics using the rdt_mon_group metrics label. Metrics specific to a monitoring group have this label set to the name of the monitoring group the metric corresponds to.
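As an illustration, a monitoring group can be created directly through the kernel’s resctrl interface; a sketch, assuming resctrl is mounted at /sys/fs/resctrl (the group name my-group is made up for this example):
$ sudo mkdir /sys/fs/resctrl/mon_groups/my-group
$ wget -q --no-proxy http://127.0.0.1:9000/metrics -O- | grep 'rdt_mon_group="my-group"'
Metrics for the new group should then show up with the rdt_mon_group label set to my-group.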
Cache and Memory Bandwidth Allocation and Monitoring Prerequisites
Note that for cache and memory bandwidth allocation and monitoring to work, you must have
a hardware platform which supports these features,
the resctrl pseudofilesystem enabled in your kernel, and
the resctrl filesystem mounted (possibly with extra options suitable for your platform).
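On x86, these prerequisites can be checked from the command line; a sketch using standard kernel interfaces:
# RDT-related CPU flags: cache allocation, MBA, and monitoring
$ grep -o 'cat_l3\|cat_l2\|mba\|cqm_occup_llc\|cqm_mbm_total\|cqm_mbm_local' /proc/cpuinfo | sort -u
# resctrl support in the kernel, and whether it is currently mounted
$ grep resctrl /proc/filesystems
$ mount | grep resctrl
# mount it if needed; some platforms take extra options here
$ sudo mount -t resctrl resctrl /sys/fs/resctrl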