cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

intel-hybrid.txt (7301B)


Intel hybrid support
--------------------
Support for Intel hybrid events within perf tools.

Some Intel platforms, such as AlderLake, are hybrid platforms that consist
of atom cpus and core cpus. Each cpu type has a dedicated event list.
Some events are available only on core cpus, some only on atom cpus, and
some on both.

The kernel exports two new cpu pmus via sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

A 'cpus' file is created under each directory. For example,

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23

This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
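
The 'cpus' files can also be consumed programmatically. The following is a
minimal sketch, not part of perf itself: parse_cpu_list() and read_pmu_cpus()
are hypothetical helper names, and the comma-separated form ("0-3,8") is
assumed to follow the usual kernel cpu list format.

```python
def parse_cpu_list(text):
    """Expand a cpu list string such as "0-15" or "0-3,8" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty file or a stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_pmu_cpus(pmu, sysfs_root="/sys/devices"):
    """Read <sysfs_root>/<pmu>/cpus; the file only exists on a hybrid machine."""
    with open(f"{sysfs_root}/{pmu}/cpus") as f:
        return parse_cpu_list(f.read())
```

On the example system above, read_pmu_cpus("cpu_core") would yield cpu0 through
cpu15 and read_pmu_cpus("cpu_atom") would yield cpu16 through cpu23.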

Quickstart

List hybrid event
-----------------

As before, use perf-list to list the symbolic events.

perf list

inst_retired.any
	[Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
inst_retired.any
	[Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]

'Unit: xxx' is added to the brief description to indicate which pmu
the event belongs to. The same event name can be supported on
different pmus.

Enable hybrid event with a specific pmu
---------------------------------------

To enable a core-only or atom-only event, the following syntax is supported:

	cpu_core/<event name>/
or
	cpu_atom/<event name>/

For example, count the 'cycles' event on core cpus.

	perf stat -e cpu_core/cycles/

Create two events for one hardware event automatically
------------------------------------------------------

When an event is created and it is available on both atom and core,
two events are created automatically: one for atom and one for
core. Most hardware events and cache events are available on both
cpu_core and cpu_atom.

Hardware events have pre-defined configs (e.g. 0 for cycles).
But on a hybrid platform, the kernel needs to know where the event comes
from (atom or core). The original perf event type PERF_TYPE_HARDWARE
can't carry pmu information, so this type is extended to be PMU aware.
The PMU type ID is stored at attr.config[63:32].

The PMU type ID is retrieved from sysfs.
/sys/devices/cpu_atom/type
/sys/devices/cpu_core/type

The new attr.config layout for PERF_TYPE_HARDWARE:

PERF_TYPE_HARDWARE:                 0xEEEEEEEE000000AA
                                    AA: hardware event ID
                                    EEEEEEEE: PMU type ID
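
As an illustration of this layout, the sketch below packs a PMU type ID and a
hardware event ID into a config value. hw_config() and read_pmu_type() are
hypothetical helpers written for this example, not perf functions; the event
IDs are the generic PERF_COUNT_HW_* values from the perf_event_open ABI.

```python
PERF_COUNT_HW_CPU_CYCLES = 0     # 'cycles'
PERF_COUNT_HW_INSTRUCTIONS = 1   # 'instructions'


def read_pmu_type(pmu, sysfs_root="/sys/devices"):
    """Read the PMU type ID from <sysfs_root>/<pmu>/type (hybrid machines only)."""
    with open(f"{sysfs_root}/{pmu}/type") as f:
        return int(f.read().strip())


def hw_config(pmu_type, event_id):
    """Build attr.config for PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA."""
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in config[63:32]")
    if not 0 <= event_id <= 0xFF:
        raise ValueError("hardware event ID must fit in the low byte")
    return (pmu_type << 32) | event_id
```

With a PMU type ID of 4, hw_config(4, PERF_COUNT_HW_CPU_CYCLES) gives
0x400000000.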

Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
PMU aware. The PMU type ID is stored at attr.config[63:32].

The new attr.config layout for PERF_TYPE_HW_CACHE:

PERF_TYPE_HW_CACHE:                 0xEEEEEEEE00DDCCBB
                                    BB: hardware cache ID
                                    CC: hardware cache op ID
                                    DD: hardware cache op result ID
                                    EEEEEEEE: PMU type ID
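
The cache layout can be sketched the same way. hw_cache_config() is a
hypothetical helper for illustration; the constants are the generic
PERF_COUNT_HW_CACHE_* values from the perf_event_open ABI, and the PMU type
ID 4 used in the note below is only an example value.

```python
PERF_COUNT_HW_CACHE_L1D = 0            # cache ID (BB)
PERF_COUNT_HW_CACHE_L1I = 1
PERF_COUNT_HW_CACHE_OP_READ = 0        # cache op ID (CC)
PERF_COUNT_HW_CACHE_OP_WRITE = 1
PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0  # cache op result ID (DD)
PERF_COUNT_HW_CACHE_RESULT_MISS = 1


def hw_cache_config(pmu_type, cache_id, op_id, result_id):
    """Build attr.config for PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB."""
    for name, value in (("cache", cache_id), ("op", op_id), ("result", result_id)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} ID must fit in one byte")
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in config[63:32]")
    return (pmu_type << 32) | (result_id << 16) | (op_id << 8) | cache_id
```

For example, L1D read misses on a PMU with type ID 4 would be encoded as
0x400010000.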

When a hardware event is enabled without a specified pmu, such as
perf stat -e cycles -a (system-wide in this example), two events
are created automatically.

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

and

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x800000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

The event type is 0, i.e. PERF_TYPE_HARDWARE.
0x4 in 0x400000000 indicates the cpu_core pmu.
0x8 in 0x800000000 indicates the cpu_atom pmu (the atom pmu type ID is
not fixed and may differ between systems).

The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus),
and creates 'cycles' (0x800000000) on cpu16-cpu23 (atom cpus).
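
Reading such a dump goes in the opposite direction: split the config back into
its PMU type ID and event ID. This is a rough sketch; decode_hw_config() and
describe() are hypothetical names, and the PMU_NAMES table hard-codes the type
IDs of this particular example system rather than reading them from sysfs.

```python
# PMU type IDs of the example system only; on a real machine read them from
# /sys/devices/cpu_core/type and /sys/devices/cpu_atom/type.
PMU_NAMES = {4: "cpu_core", 8: "cpu_atom"}
HW_EVENT_NAMES = {0: "cycles", 1: "instructions"}


def decode_hw_config(config):
    """Split a PERF_TYPE_HARDWARE config into (PMU type ID, hardware event ID)."""
    return config >> 32, config & 0xFF


def describe(config):
    """Render a config in the pmu/event/ syntax, falling back to raw numbers."""
    pmu_type, event_id = decode_hw_config(config)
    pmu = PMU_NAMES.get(pmu_type, f"pmu{pmu_type}")
    event = HW_EVENT_NAMES.get(event_id, f"event{event_id}")
    return f"{pmu}/{event}/"
```

Applied to the two dumps above, describe(0x400000000) returns
"cpu_core/cycles/" and describe(0x800000000) returns "cpu_atom/cycles/".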

The perf-stat result displays two events:

 Performance counter stats for 'system wide':

           6,744,979      cpu_core/cycles/
           1,965,552      cpu_atom/cycles/

The first 'cycles' is the core event, the second 'cycles' is the atom event.

Thread mode example:
--------------------

perf-stat reports scaled counts for hybrid events, each displayed with a
percentage. The percentage is the event's running time divided by its
enabled time.

In this example, 'triad_loop' runs on cpu16 (an atom cpu), and the
scaled value for core cycles is 233,066,666 with a percentage of 0.43%.

perf stat -e cycles \-- taskset -c 16 ./triad_loop

As before, two events are created.

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x400000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------

and

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x800000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------

 Performance counter stats for 'taskset -c 16 ./triad_loop':

       233,066,666      cpu_core/cycles/                                              (0.43%)
       604,097,080      cpu_atom/cycles/                                              (99.57%)
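
The scaling arithmetic behind these numbers can be sketched as follows.
scale_count() is a hypothetical helper; it assumes the usual perf convention
that a count is extrapolated by time_enabled / time_running, using the two
values requested through the read_format shown in the dumps above. The numbers
in the usage note are made up for illustration and are not taken from the run
above.

```python
def scale_count(raw_count, time_enabled, time_running):
    """Return (scaled count, percentage of enabled time the event was running)."""
    if time_enabled == 0 or time_running == 0:
        # The event never ran, so there is nothing to extrapolate from.
        return 0, 0.0
    scaled = round(raw_count * time_enabled / time_running)
    percentage = 100.0 * time_running / time_enabled
    return scaled, percentage
```

For instance, a raw count of 1000 collected while the event ran for 50 out of
200 enabled time units is scaled to 4000 and reported with 25%. A small
percentage, such as the 0.43% for cpu_core/cycles/ above, means the scaled
value is extrapolated from a very short running window.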

perf-record:
------------

If no '-e' is specified in perf record, on a hybrid platform
it creates two default 'cycles' events and adds them to the event list.
One is for core, the other is for atom.

perf-stat:
----------

If no '-e' is specified in perf stat, on a hybrid platform,
besides the software events, the following events are created and
added to the event list in order.

cpu_core/cycles/,
cpu_atom/cycles/,
cpu_core/instructions/,
cpu_atom/instructions/,
cpu_core/branches/,
cpu_atom/branches/,
cpu_core/branch-misses/,
cpu_atom/branch-misses/
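
The ordering of this list (each event expanded for core first, then atom) can
be reproduced with a short sketch. default_hybrid_events() is a hypothetical
helper that only builds the event strings; it does not reflect how perf
constructs the list internally.

```python
def default_hybrid_events(
    pmus=("cpu_core", "cpu_atom"),
    events=("cycles", "instructions", "branches", "branch-misses"),
):
    """Expand each event for every pmu, keeping the per-event interleaving."""
    return [f"{pmu}/{event}/" for event in events for pmu in pmus]
```

Joining the result with commas gives a string that can be passed to
perf stat -e to request the same hardware events explicitly.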

Of course, both perf-stat and perf-record support enabling a
hybrid event with a specific pmu.

e.g.
perf stat -e cpu_core/cycles/
perf stat -e cpu_atom/cycles/
perf stat -e cpu_core/r1a/
perf stat -e cpu_atom/L1-icache-loads/
perf stat -e cpu_core/cycles/,cpu_atom/instructions/
perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

But '{cpu_core/cycles/,cpu_atom/instructions/}' will return a
warning and disable grouping, because the pmus in the group do
not match (cpu_core vs. cpu_atom).
    214not matched (cpu_core vs. cpu_atom).