cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

resctrl.rst (48057B)


      1.. SPDX-License-Identifier: GPL-2.0
      2.. include:: <isonum.txt>
      3
      4===========================================
      5User Interface for Resource Control feature
      6===========================================
      7
      8:Copyright: |copy| 2016 Intel Corporation
      9:Authors: - Fenghua Yu <fenghua.yu@intel.com>
     10          - Tony Luck <tony.luck@intel.com>
     11          - Vikas Shivappa <vikas.shivappa@intel.com>
     12
     13
     14Intel refers to this feature as Intel Resource Director Technology (Intel(R) RDT).
     15AMD refers to this feature as AMD Platform Quality of Service (AMD QoS).
     16
     17This feature is enabled by the CONFIG_X86_CPU_RESCTRL kernel configuration
     18option and is indicated by the following x86 /proc/cpuinfo flag bits:
     19
     20=============================================	================================
     21RDT (Resource Director Technology) Allocation	"rdt_a"
     22CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
     23CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
     24CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
     25MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
     26MBA (Memory Bandwidth Allocation)		"mba"
     27=============================================	================================
     28
     29To use the feature mount the file system::
     30
     31 # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
     32
     33mount options are:
     34
     35"cdp":
     36	Enable code/data prioritization in L3 cache allocations.
     37"cdpl2":
     38	Enable code/data prioritization in L2 cache allocations.
     39"mba_MBps":
     40	Enable the MBA Software Controller (mba_sc) to specify MBA
     41	bandwidth in MBps.
     42
     43L2 and L3 CDP are controlled separately.
     44
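For programs that set up resctrl themselves, the same mount can be performed
with mount(2). This is only an illustrative sketch (error handling trimmed to
a minimum); the option string carries the resctrl mount options listed above::

  #include <stdio.h>
  #include <sys/mount.h>

  int main(void)
  {
    /* Equivalent of: mount -t resctrl -o cdp resctrl /sys/fs/resctrl */
    if (mount("resctrl", "/sys/fs/resctrl", "resctrl", 0, "cdp")) {
      perror("mount");
      return 1;
    }
    return 0;
  }
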
     45RDT features are orthogonal. A particular system may support only
     46monitoring, only control, or both monitoring and control.  Cache
     47pseudo-locking is a unique way of using cache control to "pin" or
     48"lock" data in the cache. Details can be found in
     49"Cache Pseudo-Locking".
     50
     51
     52The mount succeeds if either allocation or monitoring is present, but
     53only those files and directories supported by the system will be created.
     54For more details on the behavior of the interface during monitoring
     55and allocation, see the "Resource alloc and monitor groups" section.
     56
     57Info directory
     58==============
     59
     60The 'info' directory contains information about the enabled
     61resources. Each resource has its own subdirectory. The subdirectory
     62names reflect the resource names.
     63
     64Each subdirectory contains the following files with respect to
     65allocation:
     66
     67Cache resource (L3/L2) subdirectory contains the following files
     68related to allocation:
     69
     70"num_closids":
     71		The number of CLOSIDs which are valid for this
     72		resource. The kernel uses the smallest number of
     73		CLOSIDs of all enabled resources as limit.
     74"cbm_mask":
     75		The bitmask which is valid for this resource.
     76		This mask is equivalent to 100%.
     77"min_cbm_bits":
     78		The minimum number of consecutive bits which
     79		must be set when writing a mask.
     80
     81"shareable_bits":
     82		Bitmask of shareable resource with other executing
     83		entities (e.g. I/O). User can use this when
     84		setting up exclusive cache partitions. Note that
     85		some platforms support devices that have their
     86		own settings for cache use which can over-ride
     87		these bits.
     88"bit_usage":
     89		Annotated capacity bitmasks showing how all
     90		instances of the resource are used. The legend is:
     91
     92			"0":
     93			      Corresponding region is unused. When the system's
     94			      resources have been allocated and a "0" is found
     95			      in "bit_usage" it is a sign that resources are
     96			      wasted.
     97
     98			"H":
     99			      Corresponding region is used by hardware only
    100			      but available for software use. If a resource
    101			      has bits set in "shareable_bits" but not all
    102			      of these bits appear in the resource groups'
    103			      schematas then the bits appearing in
    104			      "shareable_bits" but in no resource group will
    105			      be marked as "H".
    106			"X":
    107			      Corresponding region is available for sharing and
    108			      used by hardware and software. These are the
    109			      bits that appear in "shareable_bits" as
    110			      well as a resource group's allocation.
    111			"S":
    112			      Corresponding region is used by software
    113			      and available for sharing.
    114			"E":
    115			      Corresponding region is used exclusively by
    116			      one resource group. No sharing allowed.
    117			"P":
    118			      Corresponding region is pseudo-locked. No
    119			      sharing allowed.
    120
    121Memory bandwidth (MB) subdirectory contains the following files
    122with respect to allocation:
    123
    124"min_bandwidth":
    125		The minimum memory bandwidth percentage which
    126		user can request.
    127
    128"bandwidth_gran":
    129		The granularity in which the memory bandwidth
    130		percentage is allocated. The allocated
    131		b/w percentage is rounded off to the next
    132		control step available on the hardware. The
    133		available bandwidth control steps are:
    134		min_bandwidth + N * bandwidth_gran.
    135
    136"delay_linear":
    137		Indicates if the delay scale is linear or
    138		non-linear. This field is purely
    139		informational.
    140
    141"thread_throttle_mode":
    142		Indicator on Intel systems of how tasks running on threads
    143		of a physical core are throttled in cases where they
    144		request different memory bandwidth percentages:
    145
    146		"max":
    147			the smallest percentage is applied
    148			to all threads
    149		"per-thread":
    150			bandwidth percentages are directly applied to
    151			the threads running on the core
    152
    153If RDT monitoring is available there will be an "L3_MON" directory
    154with the following files:
    155
    156"num_rmids":
    157		The number of RMIDs available. This is the
    158		upper bound for how many "CTRL_MON" + "MON"
    159		groups can be created.
    160
    161"mon_features":
    162		Lists the monitoring events if
    163		monitoring is enabled for the resource.
    164
    165"max_threshold_occupancy":
    166		Read/write file provides the largest value (in
    167		bytes) at which a previously used LLC_occupancy
    168		counter can be considered for re-use.
    169
    170Finally, in the top level of the "info" directory there is a file
    171named "last_cmd_status". This is reset with every "command" issued
    172via the file system (making new directories or writing to any of the
    173control files). If the command was successful, it will read as "ok".
    174If the command failed, it will provide more information than can be
    175conveyed in the error returns from file operations. E.g.
    176::
    177
    178	# echo L3:0=f7 > schemata
    179	bash: echo: write error: Invalid argument
    180	# cat info/last_cmd_status
    181	mask f7 has non-consecutive 1-bits
    182
    183Resource alloc and monitor groups
    184=================================
    185
    186Resource groups are represented as directories in the resctrl file
    187system.  The default group is the root directory which, immediately
    188after mounting, owns all the tasks and cpus in the system and can make
    189full use of all resources.
    190
    191On a system with RDT control features additional directories can be
    192created in the root directory that specify different amounts of each
    193resource (see "schemata" below). The root and these additional top level
    194directories are referred to as "CTRL_MON" groups below.
    195
    196On a system with RDT monitoring the root directory and other top level
    197directories contain a directory named "mon_groups" in which additional
    198directories can be created to monitor subsets of tasks in the CTRL_MON
    199group that is their ancestor. These are called "MON" groups in the rest
    200of this document.
    201
    202Removing a directory will move all tasks and cpus owned by the group it
    203represents to the parent. Removing one of the created CTRL_MON groups
    204will automatically remove all MON groups below it.
    205
    206All groups contain the following files:
    207
    208"tasks":
    209	Reading this file shows the list of all tasks that belong to
    210	this group. Writing a task id to the file will add a task to the
    211	group. If the group is a CTRL_MON group the task is removed from
    212	whichever previous CTRL_MON group owned the task and also from
    213	any MON group that owned the task. If the group is a MON group,
    214	then the task must already belong to the CTRL_MON parent of this
    215	group. The task is removed from any previous MON group.
    216
    217
    218"cpus":
    219	Reading this file shows a bitmask of the logical CPUs owned by
    220	this group. Writing a mask to this file will add and remove
    221	CPUs to/from this group. As with the tasks file a hierarchy is
    222	maintained where MON groups may only include CPUs owned by the
    223	parent CTRL_MON group.
    224	When the resource group is in pseudo-locked mode this file will
    225	only be readable, reflecting the CPUs associated with the
    226	pseudo-locked region.
    227
    228
    229"cpus_list":
    230	Just like "cpus", only using ranges of CPUs instead of bitmasks.
    231
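From a program, assigning the calling process to a group is just a write of
its PID to that group's "tasks" file. A minimal sketch, assuming a group
directory named "p0" already exists::

  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
    /* Move the calling process into resource group "p0". */
    FILE *f = fopen("/sys/fs/resctrl/p0/tasks", "w");

    if (!f) {
      perror("fopen");
      return 1;
    }
    fprintf(f, "%d\n", (int)getpid());
    if (fclose(f)) {  /* write errors are reported when the file is closed */
      perror("fclose");
      return 1;
    }
    return 0;
  }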
    232
    233When control is enabled all CTRL_MON groups will also contain:
    234
    235"schemata":
    236	A list of all the resources available to this group.
    237	Each resource has its own line and format - see below for details.
    238
    239"size":
    240	Mirrors the display of the "schemata" file to display the size in
    241	bytes of each allocation instead of the bits representing the
    242	allocation.
    243
    244"mode":
    245	The "mode" of the resource group dictates the sharing of its
    246	allocations. A "shareable" resource group allows sharing of its
    247	allocations while an "exclusive" resource group does not. A
    248	cache pseudo-locked region is created by first writing
    249	"pseudo-locksetup" to the "mode" file before writing the cache
    250	pseudo-locked region's schemata to the resource group's "schemata"
    251	file. On successful pseudo-locked region creation the mode will
    252	automatically change to "pseudo-locked".
    253
    254When monitoring is enabled all MON groups will also contain:
    255
    256"mon_data":
    257	This contains a set of files organized by L3 domain and by
    258	RDT event. E.g. on a system with two L3 domains there will
    259	be subdirectories "mon_L3_00" and "mon_L3_01".	Each of these
    260directories has one file per event (e.g. "llc_occupancy",
    261	"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
    262	files provide a read out of the current value of the event for
    263	all tasks in the group. In CTRL_MON groups these files provide
    264	the sum for all tasks in the CTRL_MON group and all tasks in
    265	MON groups. Please see example section for more details on usage.
    266
    267Resource allocation rules
    268-------------------------
    269
    270When a task is running the following rules define which resources are
    271available to it:
    272
    2731) If the task is a member of a non-default group, then the schemata
    274   for that group is used.
    275
    2762) Else if the task belongs to the default group, but is running on a
    277   CPU that is assigned to some specific group, then the schemata for the
    278   CPU's group is used.
    279
    2803) Otherwise the schemata for the default group is used.
    281
    282Resource monitoring rules
    283-------------------------
    2841) If a task is a member of a MON group, or non-default CTRL_MON group
    285   then RDT events for the task will be reported in that group.
    286
    2872) If a task is a member of the default CTRL_MON group, but is running
    288   on a CPU that is assigned to some specific group, then the RDT events
    289   for the task will be reported in that group.
    290
    2913) Otherwise RDT events for the task will be reported in the root level
    292   "mon_data" group.
    293
    294
    295Notes on cache occupancy monitoring and control
    296===============================================
    297When moving a task from one group to another you should remember that
    298this only affects *new* cache allocations by the task. E.g. you may have
    299a task in a monitor group showing 3 MB of cache occupancy. If you move
    300it to a new group and immediately check the occupancy of the old and new
    301groups you will likely see that the old group is still showing 3 MB and
    302the new group zero. When the task accesses locations still in cache from
    303before the move, the h/w does not update any counters. On a busy system
    304you will likely see the occupancy in the old group go down as cache lines
    305are evicted and re-used while the occupancy in the new group rises as
    306the task accesses memory and loads into the cache are counted based on
    307membership in the new group.
    308
    309The same applies to cache allocation control. Moving a task to a group
    310with a smaller cache partition will not evict any cache lines. The
    311process may continue to use them from the old partition.
    312
    313Hardware uses a CLOSid (Class of service ID) and an RMID (Resource monitoring ID)
    314to identify a control group and a monitoring group respectively. Each
    315resource group is mapped to these IDs based on the kind of group. The
    316numbers of CLOSids and RMIDs are limited by the hardware, so creation of
    317a "CTRL_MON" directory may fail if we run out of either CLOSIDs or RMIDs,
    318and creation of a "MON" group may fail if we run out of RMIDs.
    319
    320max_threshold_occupancy - generic concepts
    321------------------------------------------
    322
    323Note that an RMID, once freed, may not be immediately available for use,
    324as the RMID is still tagged to the cache lines of its previous user.
    325Hence such RMIDs are placed on a limbo list and checked periodically to
    326see whether their cache occupancy has gone down. If at some point the
    327system has many limbo RMIDs, none of which are ready to be used, the user
    328may see an -EBUSY during mkdir.
    329
    330max_threshold_occupancy is a user configurable value to determine the
    331occupancy at which an RMID can be freed.
    332
    333Schemata files - general concepts
    334---------------------------------
    335Each line in the file describes one resource. The line starts with
    336the name of the resource, followed by specific values to be applied
    337in each of the instances of that resource on the system.
    338
    339Cache IDs
    340---------
    341On current generation systems there is one L3 cache per socket and L2
    342caches are generally just shared by the hyperthreads on a core, but this
    343isn't an architectural requirement. We could have multiple separate L3
    344caches on a socket, or multiple cores could share an L2 cache. So instead
    345of using "socket" or "core" to define the set of logical cpus sharing
    346a resource we use a "Cache ID". At a given cache level this will be a
    347unique number across the whole system (but it isn't guaranteed to be a
    348contiguous sequence, there may be gaps).  To find the ID for each logical
    349CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
    350
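As an illustration, a small helper (the function name below is made up for
this example) can read the cache ID that a given logical CPU sees at a given
cache index by parsing that sysfs file::

  #include <stdio.h>

  /* Hypothetical helper: return the cache ID for (cpu, index), -1 on error. */
  static int read_cache_id(int cpu, int index)
  {
    char path[128];
    FILE *f;
    int id = -1;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cache/index%d/id", cpu, index);
    f = fopen(path, "r");
    if (!f)
      return -1;
    if (fscanf(f, "%d", &id) != 1)
      id = -1;
    fclose(f);
    return id;
  }
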
    351Cache Bit Masks (CBM)
    352---------------------
    353For cache resources we describe the portion of the cache that is available
    354for allocation using a bitmask. The maximum value of the mask is defined
    355by each cpu model (and may be different for different cache levels). It
    356is found using CPUID, but is also provided in the "info" directory of
    357the resctrl file system in "info/{resource}/cbm_mask". Intel hardware
    358requires that these masks have all the '1' bits in a contiguous block. So
    3590x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
    360and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
    361of the capacity of the cache. You could partition the cache into four
    362equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
    363
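To make the contiguity rule concrete, here is a minimal sketch (using GCC
builtins; the helper name is invented for this example) that checks whether a
mask is a single contiguous block of '1' bits and what share of a cache with
a 20-bit cbm_mask it covers::

  #include <stdbool.h>
  #include <stdio.h>

  /* A valid CBM is non-zero and its set bits form one contiguous block. */
  static bool cbm_is_contiguous(unsigned int cbm)
  {
    if (cbm == 0)
      return false;
    cbm >>= __builtin_ctz(cbm);     /* drop trailing zero bits */
    return (cbm & (cbm + 1)) == 0;  /* remaining bits must all be ones */
  }

  int main(void)
  {
    unsigned int cbm = 0x1f;  /* lowest 5 bits */
    int cbm_len = 20;         /* e.g. info/L3/cbm_mask reads "fffff" */

    if (cbm_is_contiguous(cbm))
      printf("0x%x covers %d%% of the cache\n",
             cbm, 100 * __builtin_popcount(cbm) / cbm_len);
    return 0;
  }
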
    364Memory bandwidth Allocation and monitoring
    365==========================================
    366
    367For Memory bandwidth resource, by default the user controls the resource
    368by indicating the percentage of total memory bandwidth.
    369
    370The minimum bandwidth percentage value for each cpu model is predefined
    371and can be looked up through "info/MB/min_bandwidth". The bandwidth
    372granularity that is allocated is also dependent on the cpu model and can
    373be looked up at "info/MB/bandwidth_gran". The available bandwidth
    374control steps are: min_bw + N * bw_gran. Intermediate values are rounded
    375to the next control step available on the hardware.
    376
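As a sketch of that rounding rule (the exact rounding direction is up to the
implementation; this version rounds up, and min_bw/bw_gran would be read from
the info files), a requested percentage could be snapped to a control step
like this::

  /* Round a requested bandwidth percentage up to the next control step. */
  static int round_to_control_step(int request, int min_bw, int bw_gran)
  {
    int steps;

    if (request <= min_bw)
      return min_bw;
    steps = (request - min_bw + bw_gran - 1) / bw_gran;  /* round up */
    return min_bw + steps * bw_gran;
  }

For example, with min_bandwidth = 10 and bandwidth_gran = 10, a request of
35% would be treated as 40%.
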
    377The bandwidth throttling is a core specific mechanism on some Intel
    378SKUs. Using a high bandwidth and a low bandwidth setting on two threads
    379sharing a core may result in both threads being throttled to use the
    380low bandwidth (see "thread_throttle_mode").
    381
    382The fact that Memory bandwidth allocation (MBA) may be a core
    383specific mechanism whereas memory bandwidth monitoring (MBM) is done at
    384the package level may lead to confusion when users try to apply control
    385via the MBA and then monitor the bandwidth to see if the controls are
    386effective. Below are such scenarios:
    387
    3881. The user may *not* see an increase in actual bandwidth when percentage
    389   values are increased:
    390
    391This can occur when aggregate L2 external bandwidth is more than L3
    392external bandwidth. Consider an SKL SKU with 24 cores on a package and
    393where L2 external bandwidth is 10GBps (hence aggregate L2 external bandwidth is
    394240GBps) and L3 external bandwidth is 100GBps. Now a workload with '20
    395threads, having 50% bandwidth, each consuming 5GBps' consumes the max L3
    396bandwidth of 100GBps although the percentage value specified is only 50%
    397<< 100%. Hence increasing the bandwidth percentage will not yield any
    398more bandwidth. This is because although the L2 external bandwidth still
    399has capacity, the L3 external bandwidth is fully used. Also note that
    400this would be dependent on the number of cores the benchmark is run on.
    401
    4022. Same bandwidth percentage may mean different actual bandwidth
    403   depending on # of threads:
    404
    405For the same SKU as in #1, a 'single thread, with 10% bandwidth' and '4
    406threads, with 10% bandwidth' can consume up to 10GBps and 40GBps although
    407they have the same bandwidth percentage of 10%. This is simply because as
    408threads start using more cores in an rdtgroup, the actual bandwidth may
    409increase or vary although the user-specified bandwidth percentage is the same.
    410
    411In order to mitigate this and make the interface more user friendly,
    412resctrl added support for specifying the bandwidth in MBps as well.  The
    413kernel underneath would use a software feedback mechanism or a "Software
    414Controller (mba_sc)" which reads the actual bandwidth using MBM counters
    415and adjusts the memory bandwidth percentages to ensure::
    416
    417	"actual bandwidth < user specified bandwidth".
    418
    419By default, the schemata would take the bandwidth percentage values,
    420whereas the user can switch to the "MBA software controller" mode using
    421the mount option 'mba_MBps'. The schemata format is specified in the
    422sections below.
    423
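Conceptually, one step of that feedback could look like the sketch below.
This is only an illustration of the idea described above, not the kernel's
actual algorithm; the function name and policy are invented::

  /*
   * Nudge the hardware throttle percentage by one granularity step,
   * based on the bandwidth measured via MBM versus the user's MBps target.
   */
  static int mba_sc_adjust(int cur_percent, long measured_mbps,
                           long target_mbps, int min_bw, int bw_gran)
  {
    if (measured_mbps > target_mbps && cur_percent > min_bw)
      cur_percent -= bw_gran;
    else if (measured_mbps < target_mbps && cur_percent < 100)
      cur_percent += bw_gran;
    return cur_percent;
  }
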
    424L3 schemata file details (code and data prioritization disabled)
    425----------------------------------------------------------------
    426With CDP disabled the L3 schemata format is::
    427
    428	L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
    429
    430L3 schemata file details (CDP enabled via mount option to resctrl)
    431------------------------------------------------------------------
    432When CDP is enabled L3 control is split into two separate resources
    433so you can specify independent masks for code and data like this::
    434
    435	L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
    436	L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
    437
    438L2 schemata file details
    439------------------------
    440CDP is supported at L2 using the 'cdpl2' mount option. The schemata
    441format is either::
    442
    443	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
    444
    445or
    446
    447	L2DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
    448	L2CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
    449
    450
    451Memory bandwidth Allocation (default mode)
    452------------------------------------------
    453
    454Memory b/w domain is L3 cache.
    455::
    456
    457	MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
    458
    459Memory bandwidth Allocation specified in MBps
    460---------------------------------------------
    461
    462Memory bandwidth domain is L3 cache.
    463::
    464
    465	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
    466
    467Reading/writing the schemata file
    468---------------------------------
    469Reading the schemata file will show the state of all resources
    470on all domains. When writing you only need to specify those values
    471which you wish to change.  E.g.
    472::
    473
    474  # cat schemata
    475  L3DATA:0=fffff;1=fffff;2=fffff;3=fffff
    476  L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
    477  # echo "L3DATA:2=3c0;" > schemata
    478  # cat schemata
    479  L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
    480  L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
    481
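The same kind of partial update can be done from a program. On failure it is
worth reading "info/last_cmd_status" for the detailed reason; the sketch below
reuses the resource, domain and mask from the example above::

  #include <stdio.h>

  int main(void)
  {
    FILE *f = fopen("/sys/fs/resctrl/schemata", "w");
    char status[256];

    if (!f) {
      perror("fopen");
      return 1;
    }
    /* Shrink domain 2 of L3DATA; all other domains are left unchanged. */
    fprintf(f, "L3DATA:2=3c0\n");
    if (fclose(f)) {
      perror("write schemata");
      /* The detailed reason for the failure is reported here. */
      f = fopen("/sys/fs/resctrl/info/last_cmd_status", "r");
      if (f && fgets(status, sizeof(status), f))
        fprintf(stderr, "last_cmd_status: %s", status);
      if (f)
        fclose(f);
      return 1;
    }
    return 0;
  }
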
    482Cache Pseudo-Locking
    483====================
    484CAT enables a user to specify the amount of cache space that an
    485application can fill. Cache pseudo-locking builds on the fact that a
    486CPU can still read and write data pre-allocated outside its current
    487allocated area on a cache hit. With cache pseudo-locking, data can be
    488preloaded into a reserved portion of cache that no application can
    489fill, and from that point on will only serve cache hits. The cache
    490pseudo-locked memory is made accessible to user space where an
    491application can map it into its virtual address space and thus have
    492a region of memory with reduced average read latency.
    493
    494The creation of a cache pseudo-locked region is triggered by a request
    495from the user to do so that is accompanied by a schemata of the region
    496to be pseudo-locked. The cache pseudo-locked region is created as follows:
    497
    498- Create a CAT allocation CLOSNEW with a CBM matching the schemata
    499  from the user of the cache region that will contain the pseudo-locked
    500  memory. This region must not overlap with any current CAT allocation/CLOS
    501  on the system and no future overlap with this cache region is allowed
    502  while the pseudo-locked region exists.
    503- Create a contiguous region of memory of the same size as the cache
    504  region.
    505- Flush the cache, disable hardware prefetchers, disable preemption.
    506- Make CLOSNEW the active CLOS and touch the allocated memory to load
    507  it into the cache.
    508- Set the previous CLOS as active.
    509- At this point the closid CLOSNEW can be released - the cache
    510  pseudo-locked region is protected as long as its CBM does not appear in
    511  any CAT allocation. Even though the cache pseudo-locked region will from
    512  this point on not appear in any CBM of any CLOS an application running with
    513  any CLOS will be able to access the memory in the pseudo-locked region since
    514  the region continues to serve cache hits.
    515- The contiguous region of memory loaded into the cache is exposed to
    516  user-space as a character device.
    517
    518Cache pseudo-locking increases the probability that data will remain
    519in the cache via carefully configuring the CAT feature and controlling
    520application behavior. There is no guarantee that data is placed in
    521cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
    522“locked” data from cache. Power management C-states may shrink or
    523power off cache. Deeper C-states will automatically be restricted on
    524pseudo-locked region creation.
    525
    526It is required that an application using a pseudo-locked region runs
    527with affinity to the cores (or a subset of the cores) associated
    528with the cache on which the pseudo-locked region resides. A sanity check
    529within the code will not allow an application to map pseudo-locked memory
    530unless it runs with affinity to cores associated with the cache on which the
    531pseudo-locked region resides. The sanity check is only done during the
    532initial mmap() handling, there is no enforcement afterwards and the
    533application itself needs to ensure it remains affine to the correct cores.
    534
    535Pseudo-locking is accomplished in two stages:
    536
    5371) During the first stage the system administrator allocates a portion
    538   of cache that should be dedicated to pseudo-locking. At this time an
    539   equivalent portion of memory is allocated, loaded into allocated
    540   cache portion, and exposed as a character device.
    5412) During the second stage a user-space application maps (mmap()) the
    542   pseudo-locked memory into its address space.
    543
    544Cache Pseudo-Locking Interface
    545------------------------------
    546A pseudo-locked region is created using the resctrl interface as follows:
    547
    5481) Create a new resource group by creating a new directory in /sys/fs/resctrl.
    5492) Change the new resource group's mode to "pseudo-locksetup" by writing
    550   "pseudo-locksetup" to the "mode" file.
    5513) Write the schemata of the pseudo-locked region to the "schemata" file. All
    552   bits within the schemata should be "unused" according to the "bit_usage"
    553   file.
    554
    555On successful pseudo-locked region creation the "mode" file will contain
    556"pseudo-locked" and a new character device with the same name as the resource
    557group will exist in /dev/pseudo_lock. This character device can be mmap()'ed
    558by user space in order to obtain access to the pseudo-locked memory region.
    559
    560An example of cache pseudo-locked region creation and usage can be found below.
    561
    562Cache Pseudo-Locking Debugging Interface
    563----------------------------------------
    564The pseudo-locking debugging interface is enabled by default (if
    565CONFIG_DEBUG_FS is enabled) and can be found in /sys/kernel/debug/resctrl.
    566
    567There is no explicit way for the kernel to test if a provided memory
    568location is present in the cache. The pseudo-locking debugging interface uses
    569the tracing infrastructure to provide two ways to measure cache residency of
    570the pseudo-locked region:
    571
    5721) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
    573   from these measurements are best visualized using a hist trigger (see
    574   example below). In this test the pseudo-locked region is traversed at
    575   a stride of 32 bytes while hardware prefetchers and preemption
    576   are disabled. This also provides a substitute visualization of cache
    577   hits and misses.
    5782) Cache hit and miss measurements using model specific precision counters if
    579   available. Depending on the levels of cache on the system the pseudo_lock_l2
    580   and pseudo_lock_l3 tracepoints are available.
    581
    582When a pseudo-locked region is created a new debugfs directory is created for
    583it in debugfs as /sys/kernel/debug/resctrl/<newdir>. A single
    584write-only file, pseudo_lock_measure, is present in this directory. The
    585measurement of the pseudo-locked region depends on the number written to this
    586debugfs file:
    587
    5881:
    589     writing "1" to the pseudo_lock_measure file will trigger the latency
    590     measurement captured in the pseudo_lock_mem_latency tracepoint. See
    591     example below.
    5922:
    593     writing "2" to the pseudo_lock_measure file will trigger the L2 cache
    594     residency (cache hits and misses) measurement captured in the
    595     pseudo_lock_l2 tracepoint. See example below.
    5963:
    597     writing "3" to the pseudo_lock_measure file will trigger the L3 cache
    598     residency (cache hits and misses) measurement captured in the
    599     pseudo_lock_l3 tracepoint.
    600
    601All measurements are recorded with the tracing infrastructure. This requires
    602the relevant tracepoints to be enabled before the measurement is triggered.
    603
    604Example of latency debugging interface
    605~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    606In this example a pseudo-locked region named "newlock" was created. Here is
    607how we can measure the latency in cycles of reading from this region and
    608visualize this data with a histogram that is available if CONFIG_HIST_TRIGGERS
    609is set::
    610
    611  # :> /sys/kernel/debug/tracing/trace
    612  # echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/trigger
    613  # echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
    614  # echo 1 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
    615  # echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
    616  # cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist
    617
    618  # event histogram
    619  #
    620  # trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
    621  #
    622
    623  { latency:        456 } hitcount:          1
    624  { latency:         50 } hitcount:         83
    625  { latency:         36 } hitcount:         96
    626  { latency:         44 } hitcount:        174
    627  { latency:         48 } hitcount:        195
    628  { latency:         46 } hitcount:        262
    629  { latency:         42 } hitcount:        693
    630  { latency:         40 } hitcount:       3204
    631  { latency:         38 } hitcount:       3484
    632
    633  Totals:
    634      Hits: 8192
    635      Entries: 9
    636    Dropped: 0
    637
    638Example of cache hits/misses debugging
    639~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    640In this example a pseudo-locked region named "newlock" was created on the L2
    641cache of a platform. Here is how we can obtain details of the cache hits
    642and misses using the platform's precision counters.
    643::
    644
    645  # :> /sys/kernel/debug/tracing/trace
    646  # echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
    647  # echo 2 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
    648  # echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
    649  # cat /sys/kernel/debug/tracing/trace
    650
    651  # tracer: nop
    652  #
    653  #                              _-----=> irqs-off
    654  #                             / _----=> need-resched
    655  #                            | / _---=> hardirq/softirq
    656  #                            || / _--=> preempt-depth
    657  #                            ||| /     delay
    658  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    659  #              | |       |   ||||       |         |
    660  pseudo_lock_mea-1672  [002] ....  3132.860500: pseudo_lock_l2: hits=4097 miss=0
    661
    662
    663Examples for RDT allocation usage
    664~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    665
    6661) Example 1
    667
    668On a two socket machine (one L3 cache per socket) with just four bits
    669for cache bit masks, minimum b/w of 10% with a memory bandwidth
    670granularity of 10%.
    671::
    672
    673  # mount -t resctrl resctrl /sys/fs/resctrl
    674  # cd /sys/fs/resctrl
    675  # mkdir p0 p1
    676  # echo "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
    677  # echo "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata
    678
    679The default resource group is unmodified, so we have access to all parts
    680of all caches (its schemata file reads "L3:0=f;1=f").
    681
    682Tasks that are under the control of group "p0" may only allocate from the
    683"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
    684Tasks in group "p1" use the "lower" 50% of cache on both sockets.
    685
    686Similarly, tasks that are under the control of group "p0" may use a
    687maximum memory b/w of 50% on socket0 and 50% on socket 1.
    688Tasks in group "p1" may also use 50% memory b/w on both sockets.
    689Note that unlike cache masks, memory b/w cannot specify whether these
    690allocations can overlap or not. The allocation specifies the maximum
    691b/w that the group may be able to use and the system admin can configure
    692the b/w accordingly.
    693
    694If resctrl is using the software controller (mba_sc) then the user can
    695enter the max b/w in MBps rather than the percentage values.
    696::
    697
    698  # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
    699  # echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata
    700
    701In the above example the tasks in "p1" and "p0" on socket 0 would use a max
    702b/w of 1024MBps whereas on socket 1 they would use 500MBps.
    703
    7042) Example 2
    705
    706Again two sockets, but this time with a more realistic 20-bit mask.
    707
    708Two real-time tasks, pid=1234 running on processor 0 and pid=5678 running on
    709processor 1 of socket 0, on a two-socket dual-core machine. To avoid noisy
    710neighbors, each of the two real-time tasks exclusively occupies one quarter
    711of L3 cache on socket 0.
    712::
    713
    714  # mount -t resctrl resctrl /sys/fs/resctrl
    715  # cd /sys/fs/resctrl
    716
    717First we reset the schemata for the default group so that the "upper"
    71850% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by
    719ordinary tasks::
    720
    721  # echo "L3:0=3ff;1=fffff\nMB:0=50;1=100" > schemata
    722
    723Next we make a resource group for our first real time task and give
    724it access to the "top" 25% of the cache on socket 0.
    725::
    726
    727  # mkdir p0
    728  # echo "L3:0=f8000;1=fffff" > p0/schemata
    729
    730Finally we move our first real time task into this resource group. We
    731also use taskset(1) to ensure the task always runs on a dedicated CPU
    732on socket 0. Most uses of resource groups will also constrain which
    733processors tasks run on.
    734::
    735
    736  # echo 1234 > p0/tasks
    737  # taskset -cp 1 1234
    738
    739Ditto for the second real time task (with the remaining 25% of cache)::
    740
    741  # mkdir p1
    742  # echo "L3:0=7c00;1=fffff" > p1/schemata
    743  # echo 5678 > p1/tasks
    744  # taskset -cp 2 5678
    745
    746For the same 2 socket system with memory b/w resource and CAT L3 the
    747schemata would look like this (assume min_bandwidth is 10 and
    748bandwidth_gran is 10):
    749
    750For our first real time task this would request 20% memory b/w on socket 0.
    751::
    752
    753  # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
    754
    755For our second real time task this would request another 20% memory b/w
    756on socket 0.
    757::
    758
    759  # echo -e "L3:0=7c00;1=fffff\nMB:0=20;1=100" > p1/schemata
    760
    7613) Example 3
    762
    763A single socket system which has real-time tasks running on cores 4-7 and
    764a non real-time workload assigned to cores 0-3. The real-time tasks share text
    765and data, so a per-task association is not required and due to interaction
    766with the kernel it's desired that the kernel on these cores shares L3 with
    767the tasks.
    768::
    769
    770  # mount -t resctrl resctrl /sys/fs/resctrl
    771  # cd /sys/fs/resctrl
    772
    773First we reset the schemata for the default group so that the "upper"
    77450% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0
    775cannot be used by ordinary tasks::
    776
    777  # echo "L3:0=3ff\nMB:0=50" > schemata
    778
    779Next we make a resource group for our real time cores and give it access
    780to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on
    781socket 0.
    782::
    783
    784  # mkdir p0
    785  # echo "L3:0=ffc00\nMB:0=50" > p0/schemata
    786
    787Finally we move cores 4-7 over to the new group and make sure that the
    788kernel and the tasks running there get 50% of the cache. They should
    789also get 50% of memory bandwidth assuming that the cores 4-7 are SMT
    790siblings and only the real time threads are scheduled on the cores 4-7.
    791::
    792
    793  # echo F0 > p0/cpus
    794
    7954) Example 4
    796
    797The resource groups in previous examples were all in the default "shareable"
    798mode allowing sharing of their cache allocations. If one resource group
    799configures a cache allocation then nothing prevents another resource group
    800from overlapping with that allocation.
    801
    802In this example a new exclusive resource group will be created on an L2 CAT
    803system with two L2 cache instances that can be configured with an 8-bit
    804capacity bitmask. The new exclusive resource group will be configured to use
    80525% of each cache instance.
    806::
    807
    808  # mount -t resctrl resctrl /sys/fs/resctrl/
    809  # cd /sys/fs/resctrl
    810
    811First, we observe that the default group is configured to allocate to all L2
    812cache::
    813
    814  # cat schemata
    815  L2:0=ff;1=ff
    816
    817We could attempt to create the new resource group at this point, but it will
    818fail because of the overlap with the schemata of the default group::
    819
    820  # mkdir p0
    821  # echo 'L2:0=0x3;1=0x3' > p0/schemata
    822  # cat p0/mode
    823  shareable
    824  # echo exclusive > p0/mode
    825  -sh: echo: write error: Invalid argument
    826  # cat info/last_cmd_status
    827  schemata overlaps
    828
    829To ensure that there is no overlap with another resource group the default
    830resource group's schemata has to change, making it possible for the new
    831resource group to become exclusive.
    832::
    833
    834  # echo 'L2:0=0xfc;1=0xfc' > schemata
    835  # echo exclusive > p0/mode
    836  # grep . p0/*
    837  p0/cpus:0
    838  p0/mode:exclusive
    839  p0/schemata:L2:0=03;1=03
    840  p0/size:L2:0=262144;1=262144
    841
    842A newly created resource group will not overlap with an exclusive
    843resource group::
    844
    845  # mkdir p1
    846  # grep . p1/*
    847  p1/cpus:0
    848  p1/mode:shareable
    849  p1/schemata:L2:0=fc;1=fc
    850  p1/size:L2:0=786432;1=786432
    851
    852The bit_usage will reflect how the cache is used::
    853
    854  # cat info/L2/bit_usage
    855  0=SSSSSSEE;1=SSSSSSEE
    856
    857A resource group cannot be forced to overlap with an exclusive resource group::
    858
    859  # echo 'L2:0=0x1;1=0x1' > p1/schemata
    860  -sh: echo: write error: Invalid argument
    861  # cat info/last_cmd_status
    862  overlaps with exclusive group
    863
    864Example of Cache Pseudo-Locking
    865~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    866Lock a portion of the L2 cache of cache id 1 using CBM 0x3. The pseudo-locked
    867region is exposed at /dev/pseudo_lock/newlock and can be provided to an
    868application as the argument to mmap().
    869::
    870
    871  # mount -t resctrl resctrl /sys/fs/resctrl/
    872  # cd /sys/fs/resctrl
    873
    874Ensure that there are bits available that can be pseudo-locked. Since only
    875unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
    876removed from the default resource group's schemata::
    877
    878  # cat info/L2/bit_usage
    879  0=SSSSSSSS;1=SSSSSSSS
    880  # echo 'L2:1=0xfc' > schemata
    881  # cat info/L2/bit_usage
    882  0=SSSSSSSS;1=SSSSSS00
    883
    884Create a new resource group that will be associated with the pseudo-locked
    885region, indicate that it will be used for a pseudo-locked region, and
    886configure the requested pseudo-locked region capacity bitmask::
    887
    888  # mkdir newlock
    889  # echo pseudo-locksetup > newlock/mode
    890  # echo 'L2:1=0x3' > newlock/schemata
    891
    892On success the resource group's mode will change to pseudo-locked, the
    893bit_usage will reflect the pseudo-locked region, and the character device
    894exposing the pseudo-locked region will exist::
    895
    896  # cat newlock/mode
    897  pseudo-locked
    898  # cat info/L2/bit_usage
    899  0=SSSSSSSS;1=SSSSSSPP
    900  # ls -l /dev/pseudo_lock/newlock
    901  crw------- 1 root root 243, 0 Apr  3 05:01 /dev/pseudo_lock/newlock
    902
    903::
    904
    905  /*
    906  * Example code to access one page of pseudo-locked cache region
    907  * from user space.
    908  */
    909  #define _GNU_SOURCE
    910  #include <fcntl.h>
    911  #include <sched.h>
    912  #include <stdio.h>
    913  #include <stdlib.h>
    914  #include <unistd.h>
    915  #include <sys/mman.h>
    916
    917  /*
    918  * It is required that the application runs with affinity to only
    919  * cores associated with the pseudo-locked region. Here the cpu
    920  * is hardcoded for convenience of example.
    921  */
    922  static int cpuid = 2;
    923
    924  int main(int argc, char *argv[])
    925  {
    926    cpu_set_t cpuset;
    927    long page_size;
    928    void *mapping;
    929    int dev_fd;
    930    int ret;
    931
    932    page_size = sysconf(_SC_PAGESIZE);
    933
    934    CPU_ZERO(&cpuset);
    935    CPU_SET(cpuid, &cpuset);
    936    ret = sched_setaffinity(0, sizeof(cpuset), &cpuset);
    937    if (ret < 0) {
    938      perror("sched_setaffinity");
    939      exit(EXIT_FAILURE);
    940    }
    941
    942    dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR);
    943    if (dev_fd < 0) {
    944      perror("open");
    945      exit(EXIT_FAILURE);
    946    }
    947
    948    mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
    949            dev_fd, 0);
    950    if (mapping == MAP_FAILED) {
    951      perror("mmap");
    952      close(dev_fd);
    953      exit(EXIT_FAILURE);
    954    }
    955
    956    /* Application interacts with pseudo-locked memory @mapping */
    957
    958    ret = munmap(mapping, page_size);
    959    if (ret < 0) {
    960      perror("munmap");
    961      close(dev_fd);
    962      exit(EXIT_FAILURE);
    963    }
    964
    965    close(dev_fd);
    966    exit(EXIT_SUCCESS);
    967  }
    968
    969Locking between applications
    970----------------------------
    971
    972Certain operations on the resctrl filesystem, composed of read/writes
    973to/from multiple files, must be atomic.
    974
    975As an example, the allocation of an exclusive reservation of L3 cache
    976involves:
    977
    978  1. Read the cbmmasks from each directory or the per-resource "bit_usage"
    979  2. Find a contiguous set of bits in the global CBM bitmask that is clear
    980     in any of the directory cbmmasks
    981  3. Create a new directory
    982  4. Set the bits found in step 2 to the new directory "schemata" file
    983
    984If two applications attempt to allocate space concurrently then they can
    985end up allocating the same bits so the reservations are shared instead of
    986exclusive.
    987
    988To coordinate atomic operations on the resctrlfs and to avoid the problem
    989above, the following locking procedure is recommended:
    990
    991Locking is based on flock, which is available in libc and also as a shell
    992script command.
    993
    994Write lock:
    995
    996 A) Take flock(LOCK_EX) on /sys/fs/resctrl
    997 B) Read/write the directory structure.
    998 C) Release the lock with flock(LOCK_UN)
    999
   1000Read lock:
   1001
   1002 A) Take flock(LOCK_SH) on /sys/fs/resctrl
   1003 B) If successful, read the directory structure.
   1004 C) Release the lock with flock(LOCK_UN)
   1005
   1006Example with bash::
   1007
   1008  # Atomically read directory structure
   1009  $ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl
   1010
   1011  # Read directory contents and create new subdirectory
   1012
   1013  $ cat create-dir.sh
   1014  find /sys/fs/resctrl/ > output.txt
   1015  mask=$(function-of output.txt)  # placeholder for the mask computation
   1016  mkdir /sys/fs/resctrl/newres/
   1017  echo "$mask" > /sys/fs/resctrl/newres/schemata
   1018
   1019  $ flock /sys/fs/resctrl/ ./create-dir.sh
   1020
   1021Example with C::
   1022
   1023  /*
   1024  * Example code to take advisory locks
   1025  * before accessing resctrl filesystem
   1026  */
   1027  #include <sys/file.h>
   1028  #include <stdlib.h>
         #include <stdio.h>
         #include <fcntl.h>
   1029
   1030  void resctrl_take_shared_lock(int fd)
   1031  {
   1032    int ret;
   1033
   1034    /* take shared lock on resctrl filesystem */
   1035    ret = flock(fd, LOCK_SH);
   1036    if (ret) {
   1037      perror("flock");
   1038      exit(-1);
   1039    }
   1040  }
   1041
   1042  void resctrl_take_exclusive_lock(int fd)
   1043  {
   1044    int ret;
   1045
   1046    /* take exclusive lock on resctrl filesystem */
   1047    ret = flock(fd, LOCK_EX);
   1048    if (ret) {
   1049      perror("flock");
   1050      exit(-1);
   1051    }
   1052  }
   1053
   1054  void resctrl_release_lock(int fd)
   1055  {
   1056    int ret;
   1057
   1058    /* release lock on resctrl filesystem */
   1059    ret = flock(fd, LOCK_UN);
   1060    if (ret) {
   1061      perror("flock");
   1062      exit(-1);
   1063    }
   1064  }
   1065
   1066  int main(void)
   1067  {
   1068    int fd;
   1069
   1070    fd = open("/sys/fs/resctrl", O_DIRECTORY);
   1071    if (fd == -1) {
   1072      perror("open");
   1073      exit(-1);
   1074    }
   1075    resctrl_take_shared_lock(fd);
   1076    /* code to read directory contents */
   1077    resctrl_release_lock(fd);
   1078
   1079    resctrl_take_exclusive_lock(fd);
   1080    /* code to read and write directory contents */
   1081    resctrl_release_lock(fd);
   1082  }
   1083
   1084Examples for RDT Monitoring along with allocation usage
   1085=======================================================
   1086Reading monitored data
   1087----------------------
   1088Reading an event file (for example mon_data/mon_L3_00/llc_occupancy) would
   1089show the current snapshot of LLC occupancy of the corresponding MON
   1090group or CTRL_MON group.
   1091
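A monitoring tool can simply poll such a file. The sketch below reads the
current LLC occupancy, in bytes, of the default group on L3 domain 0,
assuming resctrl is mounted at /sys/fs/resctrl::

  #include <stdio.h>

  int main(void)
  {
    unsigned long long bytes;
    FILE *f = fopen("/sys/fs/resctrl/mon_data/mon_L3_00/llc_occupancy", "r");

    if (!f) {
      perror("fopen");
      return 1;
    }
    if (fscanf(f, "%llu", &bytes) == 1)
      printf("llc_occupancy: %llu bytes\n", bytes);
    fclose(f);
    return 0;
  }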
   1092
   1093Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group)
   1094------------------------------------------------------------------------
   1095On a two socket machine (one L3 cache per socket) with just four bits
   1096for cache bit masks::
   1097
   1098  # mount -t resctrl resctrl /sys/fs/resctrl
   1099  # cd /sys/fs/resctrl
   1100  # mkdir p0 p1
   1101  # echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
   1102  # echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
   1103  # echo 5678 > p1/tasks
   1104  # echo 5679 > p1/tasks
   1105
   1106The default resource group is unmodified, so we have access to all parts
   1107of all caches (its schemata file reads "L3:0=f;1=f").
   1108
   1109Tasks that are under the control of group "p0" may only allocate from the
   1110"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
   1111Tasks in group "p1" use the "lower" 50% of cache on both sockets.
   1112
   1113Create monitor groups and assign a subset of tasks to each monitor group.
   1114::
   1115
   1116  # cd /sys/fs/resctrl/p1/mon_groups
   1117  # mkdir m11 m12
   1118  # echo 5678 > m11/tasks
   1119  # echo 5679 > m12/tasks
   1120
   1121fetch data (data shown in bytes)
   1122::
   1123
   1124  # cat m11/mon_data/mon_L3_00/llc_occupancy
   1125  16234000
   1126  # cat m11/mon_data/mon_L3_01/llc_occupancy
   1127  14789000
   1128  # cat m12/mon_data/mon_L3_00/llc_occupancy
   1129  16789000
   1130
   1131The parent ctrl_mon group shows the aggregated data.
   1132::
   1133
   1134  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
   1135  31234000
   1136
   1137Example 2 (Monitor a task from its creation)
   1138--------------------------------------------
   1139On a two socket machine (one L3 cache per socket)::
   1140
   1141  # mount -t resctrl resctrl /sys/fs/resctrl
   1142  # cd /sys/fs/resctrl
   1143  # mkdir p0 p1
   1144
   1145An RMID is allocated to the group once it is created and hence the <cmd>
   1146below is monitored from its creation.
   1147::
   1148
   1149  # echo $$ > /sys/fs/resctrl/p1/tasks
   1150  # <cmd>
   1151
   1152Fetch the data::
   1153
   1154  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
   1155  31789000
   1156
   1157Example 3 (Monitor without CAT support or before creating CAT groups)
   1158---------------------------------------------------------------------
   1159
   1160Assume a system like HSW has only CQM and no CAT support. In this case
   1161resctrl will still mount but cannot create CTRL_MON directories. However,
   1162the user can create different MON groups within the root group and thereby
   1163monitor all tasks, including kernel threads.
   1164
   1165This can also be used to profile a job's cache size footprint before
   1166allocating it to different allocation groups.
   1167::
   1168
   1169  # mount -t resctrl resctrl /sys/fs/resctrl
   1170  # cd /sys/fs/resctrl
   1171  # mkdir mon_groups/m01
   1172  # mkdir mon_groups/m02
   1173
   1174  # echo 3478 > /sys/fs/resctrl/mon_groups/m01/tasks
   1175  # echo 2467 > /sys/fs/resctrl/mon_groups/m02/tasks
   1176
   1177Monitor the groups separately and also get per-domain data. From the
   1178output below it is apparent that the tasks are mostly doing work on
   1179domain (socket) 0.
   1180::
   1181
   1182  # cat /sys/fs/resctrl/mon_groups/m01/mon_L3_00/llc_occupancy
   1183  31234000
   1184  # cat /sys/fs/resctrl/mon_groups/m01/mon_L3_01/llc_occupancy
   1185  34555
   1186  # cat /sys/fs/resctrl/mon_groups/m02/mon_L3_00/llc_occupancy
   1187  31234000
   1188  # cat /sys/fs/resctrl/mon_groups/m02/mon_L3_01/llc_occupancy
   1189  32789
   1190
   1191
   1192Example 4 (Monitor real time tasks)
   1193-----------------------------------
   1194
   1195A single socket system which has real time tasks running on cores 4-7
   1196and non real time tasks on other cpus. We want to monitor the cache
   1197occupancy of the real time threads on these cores.
   1198::
   1199
   1200  # mount -t resctrl resctrl /sys/fs/resctrl
   1201  # cd /sys/fs/resctrl
   1202  # mkdir p1
   1203
   1204Move the cpus 4-7 over to p1::
   1205
   1206  # echo f0 > p1/cpus
   1207
   1208View the llc occupancy snapshot::
   1209
   1210  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
   1211  11234000
   1212
   1213Intel RDT Errata
   1214================
   1215
   1216Intel MBM Counters May Report System Memory Bandwidth Incorrectly
   1217-----------------------------------------------------------------
   1218
   1219Errata SKX99 for Skylake server and BDF102 for Broadwell server.
   1220
   1221Problem: Intel Memory Bandwidth Monitoring (MBM) counters track metrics
   1222according to the assigned Resource Monitor ID (RMID) for that logical
   1223core. The IA32_QM_CTR register (MSR 0xC8E), used to report these
   1224metrics, may report incorrect system bandwidth for certain RMID values.
   1225
   1226Implication: Due to the errata, system memory bandwidth may not match
   1227what is reported.
   1228
   1229Workaround: MBM total and local readings are corrected according to the
   1230following correction factor table:
   1231
   1232+---------------+---------------+---------------+-----------------+
   1233|core count	|rmid count	|rmid threshold	|correction factor|
   1234+---------------+---------------+---------------+-----------------+
   1235|1		|8		|0		|1.000000	  |
   1236+---------------+---------------+---------------+-----------------+
   1237|2		|16		|0		|1.000000	  |
   1238+---------------+---------------+---------------+-----------------+
   1239|3		|24		|15		|0.969650	  |
   1240+---------------+---------------+---------------+-----------------+
   1241|4		|32		|0		|1.000000	  |
   1242+---------------+---------------+---------------+-----------------+
   1243|6		|48		|31		|0.969650	  |
   1244+---------------+---------------+---------------+-----------------+
   1245|7		|56		|47		|1.142857	  |
   1246+---------------+---------------+---------------+-----------------+
   1247|8		|64		|0		|1.000000	  |
   1248+---------------+---------------+---------------+-----------------+
   1249|9		|72		|63		|1.185115	  |
   1250+---------------+---------------+---------------+-----------------+
   1251|10		|80		|63		|1.066553	  |
   1252+---------------+---------------+---------------+-----------------+
   1253|11		|88		|79		|1.454545	  |
   1254+---------------+---------------+---------------+-----------------+
   1255|12		|96		|0		|1.000000	  |
   1256+---------------+---------------+---------------+-----------------+
   1257|13		|104		|95		|1.230769	  |
   1258+---------------+---------------+---------------+-----------------+
   1259|14		|112		|95		|1.142857	  |
   1260+---------------+---------------+---------------+-----------------+
   1261|15		|120		|95		|1.066667	  |
   1262+---------------+---------------+---------------+-----------------+
   1263|16		|128		|0		|1.000000	  |
   1264+---------------+---------------+---------------+-----------------+
   1265|17		|136		|127		|1.254863	  |
   1266+---------------+---------------+---------------+-----------------+
   1267|18		|144		|127		|1.185255	  |
   1268+---------------+---------------+---------------+-----------------+
   1269|19		|152		|0		|1.000000	  |
   1270+---------------+---------------+---------------+-----------------+
   1271|20		|160		|127		|1.066667	  |
   1272+---------------+---------------+---------------+-----------------+
   1273|21		|168		|0		|1.000000	  |
   1274+---------------+---------------+---------------+-----------------+
   1275|22		|176		|159		|1.454334	  |
   1276+---------------+---------------+---------------+-----------------+
   1277|23		|184		|0		|1.000000	  |
   1278+---------------+---------------+---------------+-----------------+
   1279|24		|192		|127		|0.969744	  |
   1280+---------------+---------------+---------------+-----------------+
   1281|25		|200		|191		|1.280246	  |
   1282+---------------+---------------+---------------+-----------------+
   1283|26		|208		|191		|1.230921	  |
   1284+---------------+---------------+---------------+-----------------+
   1285|27		|216		|0		|1.000000	  |
   1286+---------------+---------------+---------------+-----------------+
   1287|28		|224		|191		|1.143118	  |
   1288+---------------+---------------+---------------+-----------------+
   1289
   1290If rmid > rmid threshold, MBM total and local values should be multiplied
   1291by the correction factor.
   1292
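In other words, the raw MBM byte counts are scaled only when the RMID in use
is above the threshold for that core count, roughly as in this sketch (the
correction factor would be looked up from the table above)::

  /* Apply the erratum correction factor to a raw MBM byte count. */
  static unsigned long long mbm_correct(unsigned long long bytes,
                                        unsigned int rmid,
                                        unsigned int rmid_threshold,
                                        double correction_factor)
  {
    if (rmid > rmid_threshold)
      return (unsigned long long)(bytes * correction_factor);
    return bytes;
  }
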
   1293See:
   1294
   12951. Erratum SKX99 in Intel Xeon Processor Scalable Family Specification Update:
   1296http://web.archive.org/web/20200716124958/https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html
   1297
   12982. Erratum BDF102 in Intel Xeon E5-2600 v4 Processor Product Family Specification Update:
   1299http://web.archive.org/web/20191125200531/https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v4-spec-update.pdf
   1300
   13013. The errata in Intel Resource Director Technology (Intel RDT) on 2nd Generation Intel Xeon Scalable Processors Reference Manual:
   1302https://software.intel.com/content/www/us/en/develop/articles/intel-resource-director-technology-rdt-reference-manual.html
   1303
   1304for further information.