.. _numaperf:

=============
NUMA Locality
=============

Some platforms may have multiple types of memory attached to a compute
node. These disparate memory ranges may share some characteristics, such
as CPU cache coherence, but may have different performance. For example,
different media types and buses affect bandwidth and latency.

A system supports such heterogeneous memory by grouping each memory type
under different domains, or "nodes", based on locality and performance
characteristics. Some memory may share the same node as a CPU, and others
are provided as memory-only nodes. While memory-only nodes do not provide
CPUs, they may still be local to one or more compute nodes relative to
other nodes. The following diagram shows one such example of two compute
nodes with local memory and a memory-only node for each compute node::

 +------------------+     +------------------+
 | Compute Node 0   +-----+ Compute Node 1   |
 | Local Node0 Mem  |     | Local Node1 Mem  |
 +--------+---------+     +--------+---------+
          |                        |
 +--------+---------+     +--------+---------+
 | Slower Node2 Mem |     | Slower Node3 Mem |
 +------------------+     +------------------+

A "memory initiator" is a node containing one or more devices such as
CPUs or separate memory I/O devices that can initiate memory requests.
A "memory target" is a node containing one or more physical address
ranges accessible from one or more memory initiators.

When multiple memory initiators exist, they may not all have the same
performance when accessing a given memory target. Each initiator-target
pair may be organized into different ranked access classes to represent
this relationship. The highest performing initiator to a given target
is considered to be one of that target's local initiators and is given
the highest-ranked access class, 0. Any given target may have one or more
local initiators, and any given initiator may have multiple local
memory targets.

To help applications match memory targets with their initiators, the
kernel provides symlinks between them. The following example lists the
relationship for the access class "0" memory initiators and targets::

	# symlinks -v /sys/devices/system/node/nodeX/access0/targets/
	relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY

	# symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
	relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX

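The same links can also be walked from a program. The following is a minimal Python sketch, assuming the ``/sys/devices/system/node`` layout shown above, that lists the node names linked under a node's ``access<N>/targets`` or ``access<N>/initiators`` directory. The sysfs root is a parameter so the sketch can also run against a copy of the tree; ``linked_nodes`` is a hypothetical helper, not a kernel or library interface.

```python
from pathlib import Path

# Sysfs directory holding the nodeN entries described in this document.
# Passed as a parameter below so the sketch can run against a copy of the tree.
NODE_ROOT = Path("/sys/devices/system/node")


def linked_nodes(node: str, access: int, direction: str,
                 root: Path = NODE_ROOT) -> list[str]:
    """List node names linked under <root>/<node>/access<access>/<direction>.

    Use direction="targets" when `node` is a memory initiator and
    direction="initiators" when `node` is a memory target. Returns an
    empty list if that access class directory is not present.
    """
    if direction not in ("targets", "initiators"):
        raise ValueError(f"direction must be 'targets' or 'initiators', not {direction!r}")
    link_dir = Path(root) / node / f"access{access}" / direction
    if not link_dir.is_dir():
        return []
    # Only the nodeY symlinks name peer nodes; an initiators/ directory may
    # also hold regular attribute files such as read_latency.
    names = [entry.name for entry in link_dir.iterdir()
             if entry.is_symlink() and entry.name.startswith("node")]
    # Compare length first so that node2 sorts before node10.
    return sorted(names, key=lambda name: (len(name), name))
```

For example, ``linked_nodes("node0", 0, "targets")`` would list the class 0 targets of node 0 on a system that exports them, and returns an empty list when the class directory is missing.
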
A memory initiator may have multiple memory targets in the same access
class. All initiators linked to a target in a given class have the same
performance as one another when accessing that target. The targets
within an initiator's access class, however, do not necessarily perform
the same as each other.

Access class "1" is used to differentiate initiators that are CPUs, and
hence suitable for generic task scheduling, from I/O initiators such as
GPUs and NICs. Unlike access class 0, only nodes containing CPUs are
considered.

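Because access class 1 considers only CPU nodes, comparing a target's class 0 and class 1 initiator lists is one way to spot local initiators that are not CPUs. The Python sketch below assumes the ranking described above: a CPU node that is among a target's best initiators overall is also among its best CPU initiators, so a node listed under ``access0/initiators`` but not under ``access1/initiators`` would be a non-CPU initiator. The helper names are hypothetical.

```python
from pathlib import Path


def _initiator_links(target: str, access: int, root: Path) -> set[str]:
    """Names of the nodeN symlinks under <root>/<target>/access<access>/initiators."""
    link_dir = Path(root) / target / f"access{access}" / "initiators"
    if not link_dir.is_dir():
        return set()
    return {entry.name for entry in link_dir.iterdir()
            if entry.is_symlink() and entry.name.startswith("node")}


def non_cpu_local_initiators(target: str,
                             root: Path = Path("/sys/devices/system/node")) -> list[str]:
    """Initiators local to `target` in access class 0 but not in class 1.

    Assumes class 1 ranks only CPU nodes, as described above, so the
    difference should contain I/O initiators such as GPU or NIC nodes.
    """
    class0 = _initiator_links(target, 0, root)
    class1 = _initiator_links(target, 1, root)
    # Compare length first so that node2 sorts before node10.
    return sorted(class0 - class1, key=lambda name: (len(name), name))
```

If a target exports no class 1 links at all, every class 0 initiator ends up in the result, so the output is best treated as a hint rather than a definitive classification.
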
================
NUMA Performance
================

Applications may wish to consider which node they want their memory to
be allocated from based on the node's performance characteristics. If
the system provides these attributes, the kernel exports them under the
node sysfs hierarchy by adding the attributes to the memory node's access
class 0 initiators directory as follows::

	/sys/devices/system/node/nodeY/access0/initiators/

These attributes apply only to accesses from the nodes that are linked
under this access class's initiators.

The performance characteristics the kernel provides for the local initiators
are exported as follows::

	# tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/initiators/
	/sys/devices/system/node/nodeY/access0/initiators/
	|-- read_bandwidth
	|-- read_latency
	|-- write_bandwidth
	`-- write_latency

The bandwidth attributes are provided in MiB/second.

The latency attributes are provided in nanoseconds.

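A minimal Python sketch for reading these attributes, assuming the sysfs layout shown above and that each file holds a single decimal integer. Attributes the platform does not report are skipped rather than treated as errors, and the units are noted in the comments: bandwidths in MiB/second, latencies in nanoseconds. ``read_access_perf`` and ``mib_per_s_to_bytes_per_s`` are hypothetical helper names.

```python
from pathlib import Path

PERF_ATTRS = (
    "read_bandwidth",   # MiB/second
    "read_latency",     # nanoseconds
    "write_bandwidth",  # MiB/second
    "write_latency",    # nanoseconds
)


def read_access_perf(target: str, access: int = 0,
                     root: Path = Path("/sys/devices/system/node")) -> dict[str, int]:
    """Read the performance attributes of a memory target for one access class.

    The values apply to accesses from the initiators linked under the same
    access<access>/initiators directory. Missing files are left out.
    """
    base = Path(root) / target / f"access{access}" / "initiators"
    perf: dict[str, int] = {}
    for attr in PERF_ATTRS:
        path = base / attr
        if path.is_file():
            perf[attr] = int(path.read_text().strip())
    return perf


def mib_per_s_to_bytes_per_s(mib_per_s: int) -> int:
    """Convert a bandwidth attribute (MiB/second) to bytes/second."""
    return mib_per_s * 1024 * 1024
```

For instance, a ``read_bandwidth`` value of 20480 MiB/second is 20480 × 1048576 = 21474836480 bytes per second.
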
The values reported here correspond to the rated latency and bandwidth
for the platform.

Access class 1 takes the same form but only includes values for
CPU-to-memory activity.

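A minimal Python sketch of how an application might rank candidate memory targets on one of these attributes. It takes a mapping from node name to attribute values, for example values read from the files listed above, and assumes lower latency and higher bandwidth are better. Since each target's values describe access from that target's own local initiators, such a comparison is only meaningful for a task running on those initiators. ``rank_targets`` and the example numbers are hypothetical.

```python
def rank_targets(perf_by_node: dict[str, dict[str, int]],
                 attr: str = "read_latency") -> list[str]:
    """Order memory target nodes from best to worst on a single attribute.

    Lower is better for *_latency (nanoseconds); higher is better for
    *_bandwidth (MiB/second). Nodes that do not report `attr` are skipped.
    """
    if attr.endswith("_latency"):
        lower_is_better = True
    elif attr.endswith("_bandwidth"):
        lower_is_better = False
    else:
        raise ValueError(f"not a latency or bandwidth attribute: {attr!r}")
    reporting = [node for node, attrs in perf_by_node.items() if attr in attrs]
    return sorted(reporting, key=lambda node: perf_by_node[node][attr],
                  reverse=not lower_is_better)


# Hypothetical values for two targets; node3 reports nothing.
example = {
    "node0": {"read_latency": 90, "read_bandwidth": 40960},
    "node2": {"read_latency": 240, "read_bandwidth": 51200},
    "node3": {},
}
print(rank_targets(example))                    # ['node0', 'node2']
print(rank_targets(example, "read_bandwidth"))  # ['node2', 'node0']
```

The two calls disagree on purpose: which node is "best" depends on the attribute an application cares about.
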
==========
NUMA Cache
==========

System memory may be constructed as a hierarchy of elements with various
performance characteristics, in order to provide a large address space of
slower-performing memory cached by a smaller, higher-performing memory. The
system physical addresses that memory initiators are aware of are provided
by the last memory level in the hierarchy. Meanwhile, the system uses
higher-performing memory to transparently cache accesses to progressively
slower levels.

The term "far memory" is used to denote the last-level memory in the
hierarchy. Each increasing cache level provides higher-performing
initiator access, and the term "near memory" represents the fastest
cache provided by the system.

This numbering differs from CPU caches, where the cache level (e.g.
L1, L2, L3) uses the CPU-side view and each higher level is lower
performing. In contrast, memory cache levels are numbered relative to the
last-level memory, so a higher-numbered cache level corresponds to memory
nearer to the CPU and further from far memory.

The memory-side caches are not directly addressable by software. When
software accesses a system address, the system will return it from the
near memory cache if it is present. If it is not present, the system
accesses the next level of memory until there is either a hit in that
cache level, or it reaches far memory.

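The lookup order can be illustrated with a toy Python model. It is not a kernel or hardware interface: memory-side cache contents are invisible to software, and the ``contents`` mapping here is purely hypothetical. It only shows the numbering convention described above, where the highest-numbered level is the near memory cache and is consulted first, and level 1 sits closest to far memory.

```python
def lookup_order(levels: int) -> list[str]:
    """Memory levels in the order a single access consults them.

    Memory-side cache levels are numbered outward from far memory, so the
    highest-numbered level (near memory) is tried first.
    """
    return [f"cache level {n}" for n in range(levels, 0, -1)] + ["far memory"]


def serving_level(line: int, contents: dict[int, set[int]]) -> str:
    """Return which level satisfies an access to cache line `line`.

    `contents` maps a cache level number to the set of lines it currently
    holds. A miss at every level falls through to far memory.
    """
    for level in sorted(contents, reverse=True):
        if line in contents[level]:
            return f"cache level {level}"
    return "far memory"


print(lookup_order(2))                       # ['cache level 2', 'cache level 1', 'far memory']
print(serving_level(7, {1: {7}, 2: set()}))  # cache level 1
```
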
An application does not need to know about caching attributes in order
to use the system. Software may optionally query the memory cache
attributes in order to get the best performance out of such a setup.
If the system provides a way for the kernel to discover this information,
for example with ACPI HMAT (Heterogeneous Memory Attribute Table),
the kernel will append these attributes to the NUMA node memory target.

When the kernel first registers a memory cache with a node, the kernel
will create the following directory::

	/sys/devices/system/node/nodeX/memory_side_cache/

If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.

The attributes for each level of cache are provided under its cache
level index::

	/sys/devices/system/node/nodeX/memory_side_cache/indexA/
	/sys/devices/system/node/nodeX/memory_side_cache/indexB/
	/sys/devices/system/node/nodeX/memory_side_cache/indexC/

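A short Python sketch, assuming the directory layout above and ``index<N>`` directory names with a decimal level number, that reports which cache levels a node exposes. An empty result covers both cases described earlier: no memory-side cache, or one the kernel cannot see. ``memory_side_cache_levels`` is a hypothetical helper name.

```python
from pathlib import Path


def memory_side_cache_levels(node: str,
                             root: Path = Path("/sys/devices/system/node")) -> list[int]:
    """Return the memory-side cache level numbers exposed for `node`, lowest first.

    An empty list means the memory_side_cache directory is absent: either the
    system has no memory-side cache or the kernel cannot see it.
    """
    cache_dir = Path(root) / node / "memory_side_cache"
    if not cache_dir.is_dir():
        return []
    levels = []
    for entry in cache_dir.iterdir():
        # Level directories are named index<N>; skip any other entries.
        suffix = entry.name[len("index"):]
        if entry.is_dir() and entry.name.startswith("index") and suffix.isdigit():
            levels.append(int(suffix))
    return sorted(levels)
```

For the single-level ``node0`` tree in the next example, this would return ``[1]``.
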
Each cache level's directory provides its attributes. For example, the
following shows a single cache level and the attributes available for
software to query::

	# tree /sys/devices/system/node/node0/memory_side_cache/
	/sys/devices/system/node/node0/memory_side_cache/
	|-- index1
	|   |-- indexing
	|   |-- line_size
	|   |-- size
	|   `-- write_policy

The "indexing" will be 0 if it is a direct-mapped cache, and non-zero
for any other index-based, multi-way associativity.

The "line_size" is the number of bytes accessed from the next cache
level on a miss.

The "size" is the number of bytes provided by this cache level.

The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.

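The four attributes can be decoded in one place. This Python sketch assumes each file holds a single decimal integer, as the descriptions above suggest; the ``MemSideCacheLevel`` class and ``read_cache_level`` function are hypothetical names. The ``num_lines`` property is simple arithmetic on ``size`` and ``line_size``.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MemSideCacheLevel:
    size: int            # bytes provided by this cache level
    line_size: int       # bytes accessed from the next level on a miss
    direct_mapped: bool  # True when indexing == 0
    write_back: bool     # True when write_policy == 0; otherwise write-through

    @property
    def num_lines(self) -> int:
        """Number of line_size-byte lines the level can hold (0 if line_size is 0)."""
        return self.size // self.line_size if self.line_size else 0


def read_cache_level(index_dir: Path) -> MemSideCacheLevel:
    """Read one memory_side_cache/index<N> directory."""
    def value(name: str) -> int:
        return int((Path(index_dir) / name).read_text().strip())

    return MemSideCacheLevel(
        size=value("size"),
        line_size=value("line_size"),
        direct_mapped=value("indexing") == 0,
        write_back=value("write_policy") == 0,
    )
```

With hypothetical values, a level whose ``size`` is 17179869184 (16 GiB) and whose ``line_size`` is 64 holds 17179869184 / 64 = 268435456 lines.
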
========
See Also
========

[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf, Section 5.2.27