.. SPDX-License-Identifier: GPL-2.0
.. _imc:

===================================
IMC (In-Memory Collection Counters)
===================================

Anju T Sudhakar, 10 May 2019

.. contents::
    :depth: 3


Basic overview
==============

IMC (In-Memory Collection counters) is a hardware monitoring facility that
collects large numbers of hardware performance events at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by Nest IMC microcode which runs in the OCC
(On-Chip Controller) complex. The microcode collects the counter data and moves
the nest IMC counter data to memory.

The Core and Thread IMC PMU counters are handled in the core. Core-level PMU
counters give us the IMC counters' data per core, and thread-level PMU counters
give us the IMC counters' data per CPU thread.

OPAL obtains the IMC PMU and supported events information from the IMC Catalog
and passes it on to the kernel via the device tree. The information for each
event contains:

- Event name
- Event Offset
- Event description

and possibly also:

- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their supported
events. In those cases, the scale and unit properties for those events must be
inherited from the PMU.

The event offset in the memory is where the counter data gets accumulated.

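
The inheritance of scale and unit from the PMU can be illustrated with a small
model. This is only a sketch and not the kernel's implementation: the class
names, the `resolve` helper and the example values are hypothetical, and it
assumes that a value set on the event itself takes precedence over the PMU's
common value.

.. code-block:: python

  from dataclasses import dataclass
  from typing import Optional


  @dataclass
  class Pmu:
      name: str
      scale: Optional[str] = None   # common scale, if the PMU defines one
      unit: Optional[str] = None    # common unit, if the PMU defines one


  @dataclass
  class Event:
      name: str
      offset: int                   # where the counter data accumulates
      description: str = ""
      scale: Optional[str] = None   # per-event scale, may be absent
      unit: Optional[str] = None    # per-event unit, may be absent


  def resolve(event: Event, pmu: Pmu) -> Event:
      """Return a copy of the event with missing scale/unit taken from the PMU."""
      return Event(
          name=event.name,
          offset=event.offset,
          description=event.description,
          scale=event.scale if event.scale is not None else pmu.scale,
          unit=event.unit if event.unit is not None else pmu.unit,
      )


  # Hypothetical PMU and event, for illustration only.
  pmu = Pmu(name="example_pmu", scale="1.0", unit="MiB")
  event = resolve(Event(name="EXAMPLE_EVENT", offset=0x118), pmu)
  print(event.scale, event.unit)

Here the event carries no scale or unit of its own, so both come from the PMU.
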
The IMC catalog is available at:
	https://github.com/open-power/ima-catalog

The kernel discovers the IMC counters information in the device tree at the
`imc-counters` device node, which has a compatible field
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
and their event information and registers each PMU and its attributes in the
kernel.

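
As an illustration of the compatible match, the sketch below checks the raw
bytes of a device tree `compatible` property, which is a list of
NUL-terminated strings. It is not how the kernel does it; the helper names are
hypothetical, and where the raw bytes come from is left to the caller.

.. code-block:: python

  from typing import List

  IMC_COMPATIBLE = "ibm,opal-in-memory-counters"


  def compatible_strings(raw: bytes) -> List[str]:
      """Split a device tree `compatible` property into its strings."""
      return [s.decode("ascii") for s in raw.split(b"\0") if s]


  def is_imc_node(raw: bytes) -> bool:
      """True if the property lists the IMC compatible string."""
      return IMC_COMPATIBLE in compatible_strings(raw)


  print(is_imc_node(b"ibm,opal-in-memory-counters\0"))

The comparison is done on whole strings, so a property that merely contains
the IMC string as a substring of another entry does not match.
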
IMC example usage
=================

.. code-block:: sh

  # perf list
  [...]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]
  [...]
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]
  core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]
  [...]
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]
  thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]

To see per chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:

.. code-block:: sh

  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket

To see non-idle instructions for core 0:

.. code-block:: sh

  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle cycles for a "make":

.. code-block:: sh

  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make


IMC Trace-mode
===============

POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In
Accumulation mode, event counts are accumulated in system memory. The
hypervisor then reads the posted counts periodically or when requested. In IMC
Trace mode, the 64-bit trace SCOM value is initialized with the event
information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the
event to be monitored and the sampling duration. On each overflow in the
CPMCxSEL, hardware snapshots the program counter along with event counts and
writes them into the memory pointed to by LDBAR.

LDBAR is a 64-bit special-purpose per-thread register; it has bits to indicate
whether hardware is configured for accumulation or trace mode.

LDBAR Register Layout
---------------------

  +-------+----------------------+
  | 0     | Enable/Disable       |
  +-------+----------------------+
  | 1     | 0: Accumulation Mode |
  |       +----------------------+
  |       | 1: Trace Mode        |
  +-------+----------------------+
  | 2:3   | Reserved             |
  +-------+----------------------+
  | 4:6   | PB scope             |
  +-------+----------------------+
  | 7     | Reserved             |
  +-------+----------------------+
  | 8:50  | Counter Address      |
  +-------+----------------------+
  | 51:63 | Reserved             |
  +-------+----------------------+

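
The layout above can be exercised with a small decoder. This is a minimal
sketch, not kernel code: it assumes the table uses IBM bit numbering, where
bit 0 is the most significant bit of the 64-bit register, and the names
`field` and `decode_ldbar` are hypothetical.

.. code-block:: python

  def field(value: int, first: int, last: int) -> int:
      """Extract bits first..last, counting bit 0 as the most significant bit."""
      width = last - first + 1
      shift = 63 - last
      return (value >> shift) & ((1 << width) - 1)


  def decode_ldbar(ldbar: int) -> dict:
      """Split a 64-bit LDBAR value into the fields listed in the table."""
      return {
          "enabled": bool(field(ldbar, 0, 0)),
          "mode": "trace" if field(ldbar, 1, 1) else "accumulation",
          "pb_scope": field(ldbar, 4, 6),
          "counter_address_field": field(ldbar, 8, 50),
      }


  # Example value with only the enable bit and the trace-mode bit set.
  example = (1 << 63) | (1 << 62)
  print(decode_ldbar(example))

The decoder returns the raw contents of bits 8:50; how that field maps to a
physical address is not covered by this sketch.
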
TRACE_IMC_SCOM bit representation
---------------------------------

  +-------+------------+
  | 0:1   | SAMPSEL    |
  +-------+------------+
  | 2:33  | CPMC_LOAD  |
  +-------+------------+
  | 34:40 | CPMC1SEL   |
  +-------+------------+
  | 41:47 | CPMC2SEL   |
  +-------+------------+
  | 48:50 | BUFFERSIZE |
  +-------+------------+
  | 51:63 | RESERVED   |
  +-------+------------+

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with event counts, updates the
memory, and reloads the CPMC_LOAD value for the next sampling duration. IMC
hardware does not support exceptions, so it quietly wraps around if the memory
buffer reaches the end.

*Currently the event monitored for trace-mode is fixed as cycle.*

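
The TRACE_IMC_SCOM bit positions can be turned into a pack/unpack helper. This
is an illustrative sketch only: it assumes bit 0 is the most significant bit
of the 64-bit value, the function names are hypothetical, and the field values
in the example are arbitrary rather than meaningful hardware settings.

.. code-block:: python

  # (first_bit, last_bit) per the TRACE_IMC_SCOM table; bit 0 is assumed
  # to be the most significant bit.
  FIELDS = {
      "SAMPSEL": (0, 1),
      "CPMC_LOAD": (2, 33),
      "CPMC1SEL": (34, 40),
      "CPMC2SEL": (41, 47),
      "BUFFERSIZE": (48, 50),
  }


  def encode_trace_scom(**values: int) -> int:
      """Pack the named fields into a 64-bit value; unnamed fields stay 0."""
      scom = 0
      for name, value in values.items():
          first, last = FIELDS[name]
          width = last - first + 1
          if not 0 <= value < (1 << width):
              raise ValueError(f"{name} does not fit in {width} bits")
          scom |= value << (63 - last)
      return scom


  def decode_trace_scom(scom: int) -> dict:
      """Unpack a 64-bit value into the fields of the table."""
      return {
          name: (scom >> (63 - last)) & ((1 << (last - first + 1)) - 1)
          for name, (first, last) in FIELDS.items()
      }


  # Arbitrary example values, not a recommended configuration.
  scom = encode_trace_scom(SAMPSEL=1, CPMC_LOAD=0x1000, BUFFERSIZE=2)
  print(hex(scom))
  print(decode_trace_scom(scom))

Range checking in the encoder keeps an oversized value from spilling into a
neighbouring field.
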
Trace IMC example usage
=======================

.. code-block:: sh

  # perf list
  [....]
  trace_imc/trace_cycles/                            [Kernel PMU event]

To record an application/process with the trace-imc event:

.. code-block:: sh

  # perf record -e trace_imc/trace_cycles/ yes > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]

The generated `perf.data` can be read using `perf report`.

Benefits of using IMC trace-mode
================================

PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace
mode snapshots the program counter and writes it to memory. This also provides
a way for the operating system to do instruction sampling in real time without
PMI processing overhead.

Performance data using `perf top` with and without the trace-imc event follows.

PMI counts when the `perf top` command is executed, first without and then
with the trace-imc event:

.. code-block:: sh

  # grep PMI /proc/interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  # ./perf top
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts
  # ./perf top -e trace_imc/trace_cycles/
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts


That is, the PMI counts do not increment when using the `trace_imc` event.
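
The comparison can also be scripted. The sketch below parses the `PMI` line
from two snapshots of `/proc/interrupts`-style text and reports how many
interrupts were taken in between. The function names are hypothetical, and the
snapshots are passed in as strings so that the example runs anywhere.

.. code-block:: python

  from typing import List


  def pmi_counts(interrupts_text: str) -> List[int]:
      """Return the per-CPU PMI counts from /proc/interrupts-style text."""
      for line in interrupts_text.splitlines():
          label, _, rest = line.strip().partition(":")
          if label == "PMI":
              counts = []
              for token in rest.split():
                  if not token.isdigit():
                      break          # the description text follows the counts
                  counts.append(int(token))
              return counts
      raise ValueError("no PMI line found")


  def pmi_delta(before: str, after: str) -> int:
      """Total PMIs taken on all CPUs between the two snapshots."""
      return sum(pmi_counts(after)) - sum(pmi_counts(before))


  # The two snapshots taken around the trace_imc run shown above.
  before = "PMI:      39735       8710      17338      17801   Performance monitoring interrupts"
  after = "PMI:      39735       8710      17338      17801   Performance monitoring interrupts"
  print(pmi_delta(before, after))  # prints 0

A delta of zero across a `perf top -e trace_imc/trace_cycles/` run matches the
observation above.
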