cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

perf-list.txt (11997B)


perf-list(1)
============

NAME
----
perf-list - List all symbolic event types

SYNOPSIS
--------
[verse]
'perf list' [--no-desc] [--long-desc]
            [hw|sw|cache|tracepoint|pmu|sdt|metric|metricgroup|event_glob]

DESCRIPTION
-----------
This command displays the symbolic event types which can be selected in the
various perf commands with the -e option.

OPTIONS
-------
-d::
--desc::
Print extra event descriptions. (default)

--no-desc::
Don't print descriptions.

-v::
--long-desc::
Print longer event descriptions.

--debug::
Enable debugging output.

--details::
Print how named events are resolved internally into perf events, and also
any extra expressions computed by perf stat.

--deprecated::
Print deprecated events. By default the deprecated events are hidden.

--cputype::
Print only the events that apply to the given CPU type on hybrid platforms
(e.g. --cputype core or --cputype atom).

[[EVENT_MODIFIERS]]
EVENT MODIFIERS
---------------

Events can optionally be followed by a colon and one or more modifiers.
Modifiers allow the user to restrict the events to be counted. The
following modifiers exist:

 u - user-space counting
 k - kernel counting
 h - hypervisor counting
 I - non idle counting
 G - guest counting (in KVM guests)
 H - host counting (not in KVM guests)
 p - precise level
 P - use maximum detected precise level
 S - read sample value (PERF_SAMPLE_READ)
 D - pin the event to the PMU
 W - group is weak and will fall back to non-group if not schedulable
 e - group or event is exclusive and does not share the PMU

The 'p' modifier can be used for specifying how precise the instruction
address should be. The 'p' modifier can be specified multiple times; the
number of times it appears selects the precise level:

 0 - SAMPLE_IP can have arbitrary skid
 1 - SAMPLE_IP must have constant skid
 2 - SAMPLE_IP requested to have 0 skid
 3 - SAMPLE_IP must have 0 skid, or uses randomization to avoid
     sample shadowing effects.

On Intel systems precise event sampling is implemented with PEBS,
which supports up to precise level 2, and precise level 3 in
some special cases.

On AMD systems it is implemented using IBS (up to precise level 2).
The precise modifier works with event types 0x76 (cpu-cycles, CPU
clocks not halted) and 0xC1 (micro-ops retired). Both events map to
IBS execution sampling (IBS op) with the IBS Op Counter Control bit
(IbsOpCntCtl) set accordingly: cleared to count cycles, set to count
micro-ops (see the
Core Complex (CCX) -> Processor x86 Core -> Instruction Based Sampling (IBS)
section of the [AMD Processor Programming Reference (PPR)] relevant to the
family, model and stepping of the processor being used).

See also the AMD64 Architecture Programmer's Manual Volume 2: System
Programming, 13.3 Instruction-Based Sampling. Examples of using IBS:

 perf record -a -e cpu-cycles:p ...    # use ibs op counting cycles
 perf record -a -e r076:p ...          # same as -e cpu-cycles:p
 perf record -a -e r0C1:p ...          # use ibs op counting micro-ops
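
Since the precise level is simply the number of 'p' characters appended, an
event string for a given level can be generated mechanically. The following
is a minimal sketch; the helper name with_precise is hypothetical and not
part of perf.

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): append N 'p' modifiers to an event
# name, e.g. level 2 -> "cpu-cycles:pp". Level 0 leaves the event unmodified.
# The body runs in a subshell so its variables stay local.
with_precise() (
    event=$1
    level=$2
    mods=
    while [ "$level" -gt 0 ]; do
        mods="${mods}p"
        level=$(( level - 1 ))
    done
    if [ -n "$mods" ]; then
        printf '%s:%s\n' "$event" "$mods"
    else
        printf '%s\n' "$event"
    fi
)

with_precise cpu-cycles 2    # prints cpu-cycles:pp
```

The result can be passed straight to -e, for example
perf record -a -e "$(with_precise cpu-cycles 1)" ...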

RAW HARDWARE EVENT DESCRIPTOR
-----------------------------
Even when an event is not available in a symbolic form within perf right now,
it can be encoded in a processor-specific way as a raw event of the form rN.

For instance, on x86 CPUs N is a hexadecimal value that represents the raw
register encoding with the layout of the IA32_PERFEVTSELx MSRs (see
[Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B:
System Programming Guide] Figure 30-1 Layout of IA32_PERFEVTSELx MSRs) or
AMD's PERF_CTL MSRs (see the
Core Complex (CCX) -> Processor x86 Core -> MSR Registers section of the
[AMD Processor Programming Reference (PPR)] relevant to the family, model
and stepping of the processor being used).

Note: Only the following bit fields can be set in x86 counter
registers: event, umask, edge, inv, cmask. In particular, the guest/host
only and OS/user mode flags must be set up using <<EVENT_MODIFIERS, EVENT
MODIFIERS>>.

Example:

If the Intel docs for a QM720 Core i7 describe an event as:

  Event  Umask  Event Mask
  Num.   Value  Mnemonic    Description                        Comment

  A8H      01H  LSD.UOPS    Counts the number of micro-ops     Use cmask=1 and
                            delivered by loop stream detector  invert to count
                                                               cycles

a raw encoding of 0x1A8 can be used:

 perf stat -e r1a8 -a sleep 1
 perf record -e r1a8 ...
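
The value 0x1A8 is the unit mask shifted into bits 8-15 combined with the
event number in bits 0-7. The following sketch reproduces that arithmetic;
the helper name raw_event is hypothetical, and the optional cmask (bits
24-31) and inv (bit 23) positions follow the IA32_PERFEVTSELx layout.

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): build a raw event string from the
# fields of the x86 IA32_PERFEVTSELx layout.
#   $1 event select (bits 0-7)     $2 unit mask (bits 8-15)
#   $3 cmask, optional (bits 24-31) $4 inv, optional (bit 23)
raw_event() {
    printf 'r%x\n' $(( (${3:-0} << 24) | (${4:-0} << 23) | ($2 << 8) | $1 ))
}

raw_event 0xA8 0x01        # LSD.UOPS from the table above: r1a8
raw_event 0xA8 0x01 1 1    # with cmask=1 and invert:       r18001a8
```

The output can be used directly, for example
perf stat -e "$(raw_event 0xA8 0x01)" -a sleep 1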

It's also possible to use pmu syntax:

 perf record -e r1a8 -a sleep 1
 perf record -e cpu/r1a8/ ...
 perf record -e cpu/r0x1a8/ ...

Some processors, like those from AMD, support event codes and unit masks
larger than a byte. In such cases, the bits corresponding to the event
configuration parameters can be seen with:

  cat /sys/bus/event_source/devices/<pmu>/format/<config>

Example:

If the AMD docs for an EPYC 7713 processor describe an event as:

  Event  Umask  Event Mask
  Num.   Value  Mnemonic                        Description

  28FH     03H  op_cache_hit_miss.op_cache_hit  Counts Op Cache micro-tag
                                                hit events.

a raw encoding of 0x0328F cannot be used since the upper nibble of the
EventSelect bits has to be specified via bits 32-35, as can be seen with:

  cat /sys/bus/event_source/devices/cpu/format/event

A raw encoding of 0x20000038F should be used instead:

 perf stat -e r20000038f -a sleep 1
 perf record -e r20000038f ...
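
One way to arrive at such an encoding is to scatter each field's bits into
the config word according to the ranges reported by the format files. The
following is a sketch under assumptions: the helper name place_field is
hypothetical, and the format strings "config:0-7,32-35" (event) and
"config:8-15" (umask) are example values consistent with the bit ranges
described above. The real strings should be read from sysfs on the target
machine.

```sh
#!/bin/sh
# Hypothetical helper (not part of perf). Scatters the bits of a field value
# into a config word according to a sysfs format string. The lowest value
# bits fill the first listed range, the next bits fill the second, and so on.
# The body runs in a subshell so the IFS change stays local.
place_field() (
    value=$(( $1 ))
    spec=${2#*:}            # drop the "config:" prefix
    out=0
    IFS=,
    for range in $spec; do
        lo=${range%-*}
        hi=${range#*-}      # a single bit such as "18" gives lo = hi
        width=$(( hi - lo + 1 ))
        out=$(( out | ((value & ((1 << width) - 1)) << lo) ))
        value=$(( value >> width ))
    done
    printf '%d\n' "$out"
)

# Assumed format strings; on a real system read them from the format
# directory of the cpu PMU shown above.
event=$(place_field 0x28F 'config:0-7,32-35')
umask=$(place_field 0x03 'config:8-15')
printf 'r%x\n' $(( event | umask ))    # prints r20000038f
```

Reading the ranges from sysfs rather than hard-coding them keeps the same
helper usable on processors whose fields are laid out differently.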

It's also possible to use pmu syntax:

 perf record -e r20000038f -a sleep 1
 perf record -e cpu/r20000038f/ ...
 perf record -e cpu/r0x20000038f/ ...

You should refer to the processor specific documentation for getting these
details. Some of them are referenced in the SEE ALSO section below.

ARBITRARY PMUS
--------------

perf also supports an extended syntax for specifying raw parameters
to PMUs. Using this typically requires looking up the specific event
in the CPU vendor specific documentation.

The available PMUs and their raw parameters can be listed with:

  ls /sys/devices/*/format

For example, the raw "LSD.UOPS" core pmu event above could
be specified as:

  perf stat -e cpu/event=0xa8,umask=0x1,name=LSD.UOPS_CYCLES,cmask=0x1/ ...

or, using the extended name syntax:

  perf stat -e cpu/event=0xa8,umask=0x1,cmask=0x1,name=\'LSD.UOPS_CYCLES:cmask=0x1\'/ ...
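
To see which raw parameters each PMU accepts together with their bit
positions, the format files can be printed one per line. This is a minimal
sketch; the function name list_pmu_formats is hypothetical, and the optional
root directory argument exists only so the function can be tried on a copy
of the tree.

```sh
#!/bin/sh
# Print every PMU format field as "<pmu> <field> <bits>".
# $1 is an optional root directory; it defaults to the sysfs event source
# directory. The body runs in a subshell so its variables stay local.
list_pmu_formats() (
    root=${1:-/sys/bus/event_source/devices}
    for f in "$root"/*/format/*; do
        [ -r "$f" ] || continue     # also skips an unmatched glob
        pmu=${f%/format/*}
        printf '%s %s %s\n' "${pmu##*/}" "${f##*/}" "$(cat "$f")"
    done
)

list_pmu_formats
```

Each output line names a parameter that can be used inside the
pmu/param=value,.../ syntax shown above.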

PER SOCKET PMUS
---------------

Some PMUs are not associated with a core, but with a whole CPU socket.
Events on these PMUs generally cannot be sampled, but only counted globally
with perf stat -a. They can be bound to one logical CPU, but will measure
all the CPUs in the same socket.

This example measures memory bandwidth every second
on the first memory controller on socket 0 of an Intel Xeon system:

  perf stat -C 0 -a -e uncore_imc_0/cas_count_read/,uncore_imc_0/cas_count_write/ -I 1000 ...

Each memory controller has its own PMU.  Measuring the complete system
bandwidth would require specifying all imc PMUs (see perf list output),
and adding the values together. To simplify creation of multiple events,
prefix and glob matching is supported in the PMU name, and the prefix
'uncore_' is also ignored when performing the match. So the command above
can be expanded to all memory controllers by using the syntaxes:

  perf stat -C 0 -a -e imc/cas_count_read/,imc/cas_count_write/ -I 1000 ...
  perf stat -C 0 -a -e *imc*/cas_count_read/,*imc*/cas_count_write/ -I 1000 ...

This example measures the combined core power every second:

  perf stat -I 1000 -e power/energy-cores/  -a
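
The PMU name matching rule described above can be illustrated with shell
patterns. This is only an illustration of the rule as stated here, not
perf's actual implementation, and the function name pmu_matches is
hypothetical.

```sh
#!/bin/sh
# Illustration only, not perf's implementation: succeed if PMU name $1
# matches pattern $2 by prefix or glob, ignoring a leading 'uncore_'.
# The body runs in a subshell so its variables stay local.
pmu_matches() (
    name=${1#uncore_}
    pattern=${2#uncore_}
    case $name in
        $pattern*) true ;;     # unquoted so glob characters stay active
        *) false ;;
    esac
)

for pmu in uncore_imc_0 uncore_imc_1 uncore_cbox_0; do
    if pmu_matches "$pmu" imc; then
        echo "$pmu matches imc"
    fi
done
```

With the three names in the loop, only the two imc PMUs are reported, which
mirrors how imc/cas_count_read/ expands to every memory controller.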

ACCESS RESTRICTIONS
-------------------

For non-root users generally only context-switched PMU events are available.
These are normally only the events in the cpu PMU, the predefined events
like cycles and instructions, and some software events.

Other PMUs and global measurements are normally root only.
Some event qualifiers, such as "any", are also root only.

This can be overridden by setting the kernel.perf_event_paranoid
sysctl to -1, which allows non-root users to use these events.

To access tracepoint events, perf needs read access to
/sys/kernel/debug/tracing, even when perf_event_paranoid is set to a
relaxed value.
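
Both conditions can be checked before running perf as a non-root user. The
following is a minimal sketch; the function name check_perf_access is
hypothetical, and the two path arguments exist only so the function can be
tried against test files.

```sh
#!/bin/sh
# Report the two settings described above.
#   $1 file holding the perf_event_paranoid value (default: procfs path)
#   $2 tracing directory (default: debugfs path)
# The body runs in a subshell so its variables stay local.
check_perf_access() (
    paranoid_file=${1:-/proc/sys/kernel/perf_event_paranoid}
    tracing_dir=${2:-/sys/kernel/debug/tracing}

    level=$(cat "$paranoid_file" 2>/dev/null) || level=unknown
    if [ "$level" = "-1" ]; then
        echo "perf_event_paranoid=-1: non-root users may use all events"
    else
        echo "perf_event_paranoid=$level: non-root users are restricted"
    fi

    if [ -r "$tracing_dir" ]; then
        echo "tracepoints: $tracing_dir is readable"
    else
        echo "tracepoints: $tracing_dir is not readable"
    fi
)

check_perf_access
```

The sysctl itself can be changed as root with
sysctl -w kernel.perf_event_paranoid=-1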

TRACING
-------

Some PMUs control advanced hardware tracing capabilities, such as Intel PT,
that allow low overhead execution tracing.  These are described in a separate
intel-pt.txt document.

PARAMETERIZED EVENTS
--------------------

Some pmu events listed by 'perf list' will be displayed with '?' in them. For
example:

  hv_gpci/dtbp_ptitc,phys_processor_idx=?/

This means that when provided as an event, a value for '?' must
also be supplied. For example:

  perf stat -C 0 -e 'hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/' ...
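
When scripting around such events, the placeholder can be filled in from
the string that perf list prints. This is a minimal sketch; the helper name
fill_param is hypothetical and not part of perf.

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): replace the '?' placeholder of one
# named parameter in an event string with a concrete value.
#   $1 event string   $2 parameter name   $3 value
# The body runs in a subshell so its variables stay local.
fill_param() (
    event=$1
    name=$2
    value=$3
    # '?' is a literal character in a basic regular expression.
    printf '%s\n' "$event" | sed "s|${name}=?|${name}=${value}|"
)

fill_param 'hv_gpci/dtbp_ptitc,phys_processor_idx=?/' phys_processor_idx 0x2
# prints hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/
```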

EVENT QUALIFIERS:

It is also possible to add extra qualifiers to an event:

percore:

Sums up the event counts for all hardware threads in a core, e.g.:

  perf stat -e cpu/event=0,umask=0x3,percore=1/

EVENT GROUPS
------------

Perf supports time based multiplexing of events, when the number of events
active exceeds the number of hardware performance counters. Multiplexing
can cause measurement errors when the workload changes its execution
profile.

When metrics are computed using formulas from event counts, it is useful to
ensure some events are always measured together as a group to minimize
multiplexing errors. Event groups can be specified using { }:

  perf stat -e '{instructions,cycles}' ...

The number of available performance counters depends on the CPU. A group
cannot contain more events than available counters.
For example, Intel Core CPUs typically have four generic performance counters
for the core, plus three fixed counters for instructions, cycles and
ref-cycles. Some special events have restrictions on which counter they
can schedule, and may not support multiple instances in a single group.
When too many events are specified in the group some of them will not
be measured.

Globally pinned events can limit the number of counters available for
other groups. On x86 systems, the NMI watchdog pins a counter by default.
The NMI watchdog can be disabled as root with:

	echo 0 > /proc/sys/kernel/nmi_watchdog

Events from multiple different PMUs cannot be mixed in a group, with
some exceptions for software events.
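
To free the counter only for the duration of one measurement, the watchdog
setting can be saved, cleared and then restored. The following is a minimal
sketch; the function name without_nmi_watchdog is hypothetical, and the
demonstration runs against a scratch file because writing the real file
requires root. The setting is not restored if the script is interrupted.

```sh
#!/bin/sh
# Run a command with the watchdog file set to 0, then restore the previous
# value and return the command's exit status.
#   $1 watchdog file (on a real system /proc/sys/kernel/nmi_watchdog)
#   remaining arguments: the command to run
without_nmi_watchdog() {
    wd_file=$1
    shift
    wd_saved=$(cat "$wd_file") || return 1
    echo 0 > "$wd_file" || return 1
    "$@"
    wd_status=$?
    echo "$wd_saved" > "$wd_file"
    return "$wd_status"
}

# Demonstration on a scratch file standing in for the real setting.
scratch=$(mktemp)
echo 1 > "$scratch"
without_nmi_watchdog "$scratch" cat "$scratch"    # prints 0
cat "$scratch"                                    # prints 1
rm -f "$scratch"
```

As root, the same wrapper could be used as
without_nmi_watchdog /proc/sys/kernel/nmi_watchdog perf stat -e '{instructions,cycles}' ...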

LEADER SAMPLING
---------------

perf also supports group leader sampling using the :S specifier.

  perf record -e '{cycles,instructions}:S' ...
  perf report --group

Normally all events in an event group sample, but with :S only
the first event (the leader) samples, and it only reads the values of the
other events in the group.

However, in the case of AUX area events (e.g. Intel PT or CoreSight), the AUX
area event must be the leader, so then the second event samples, not the first.

OPTIONS
-------

Without options all known events will be listed.

To limit the list use:

. 'hw' or 'hardware' to list hardware events such as cache-misses, etc.

. 'sw' or 'software' to list software events such as context switches, etc.

. 'cache' or 'hwcache' to list hardware cache events such as L1-dcache-loads, etc.

. 'tracepoint' to list all tracepoint events, alternatively use
  'subsys_glob:event_glob' to filter by tracepoint subsystems such as sched,
  block, etc.

. 'pmu' to print the kernel supplied PMU events.

. 'sdt' to list all Statically Defined Tracepoint events.

. 'metric' to list metrics.

. 'metricgroup' to list metricgroups with metrics.

. If none of the above is matched, it will apply the supplied glob to all
  events, printing the ones that match.

. As a last resort, it will do a substring search in all event names.

One or more types can be used at the same time, listing the events for the
types specified.

Raw format is also supported:

. '--raw-dump', shows the raw-dump of all the events.
. '--raw-dump [hw|sw|cache|tracepoint|pmu|event_glob]', shows the raw-dump of
  a certain kind of events.

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-top[1],
linkperf:perf-record[1],
http://www.intel.com/sdm/[Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide],
https://bugzilla.kernel.org/show_bug.cgi?id=206537[AMD Processor Programming Reference (PPR)]