.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
``amd-pstate`` CPU Performance Scaling Driver
===============================================

:Copyright: |copy| 2021 Advanced Micro Devices, Inc.

:Author: Huang Rui <ray.huang@amd.com>


Introduction
===================

``amd-pstate`` is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism on modern AMD APU and CPU series in the
Linux kernel. The new mechanism is based on Collaborative Processor
Performance Control (CPPC), which provides finer-grained frequency management
than legacy ACPI hardware P-States. Current AMD CPU/APU platforms use
the ACPI P-states driver to manage CPU frequency and clocks, switching
among only 3 P-states. CPPC replaces the ACPI P-states controls and provides a
flexible, low-latency interface for the Linux kernel to communicate
performance hints directly to the hardware.

``amd-pstate`` leverages the Linux kernel governors such as ``schedutil``,
``ondemand``, etc. to manage the performance hints provided by the
CPPC hardware functionality, which internally follows the hardware
specification (for details, refer to the AMD64 Architecture Programmer's Manual
Volume 2: System Programming [1]_). Currently, ``amd-pstate`` supports basic
frequency control according to the kernel governors on some of the
Zen2 and Zen3 processors. More AMD-specific functions will be implemented
in the future, once they have been verified on the hardware and SBIOS.


AMD CPPC Overview
=======================

The Collaborative Processor Performance Control (CPPC) interface enumerates a
continuous, abstract, and unit-less performance value on a scale that is
not tied to a specific performance state / frequency. This is an ACPI
standard [2]_ through which software can specify application performance goals
and hints as a relative target to the infrastructure limits. AMD processors
provide a low-latency register model (MSR) instead of an AML code
interpreter for performance adjustments. ``amd-pstate`` initializes a
``struct cpufreq_driver`` instance, ``amd_pstate_driver``, with the callbacks
to manage each performance update behavior. ::

 Highest Perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |          Max Perf  ---->|                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
 Nominal Perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |      Desired Perf  ---->|                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
  Lowest non-        |                       |                         |                       |
  linear perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |       Lowest perf  ---->|                       |
                     |                       |                         |                       |
  Lowest perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
          0   ------>+-----------------------+                         +-----------------------+

                                     AMD P-States Performance Scale


.. _perf_cap:

AMD CPPC Performance Capability
--------------------------------

Highest Performance (RO)
.........................

This is the absolute maximum performance an individual processor may reach,
assuming ideal conditions. This performance level may not be sustainable
for long durations and may only be achievable if other platform components
are in a specific state; for example, it may require other processors to be in
an idle state. This would be equivalent to the highest frequencies
supported by the processor.

Nominal (Guaranteed) Performance (RO)
......................................

This is the maximum sustained performance level of the processor, assuming
ideal operating conditions. In the absence of an external constraint (power,
thermal, etc.), this is the performance level the processor is expected to
be able to maintain continuously. All cores/processors are expected to be
able to sustain their nominal performance state simultaneously.

Lowest non-linear Performance (RO)
...................................

This is the lowest performance level at which nonlinear power savings are
achieved, for example, due to the combined effects of voltage and frequency
scaling. Above this threshold, lower performance levels should generally be
more energy efficient than higher performance levels. This register
effectively conveys the most efficient performance level to ``amd-pstate``.

Lowest Performance (RO)
........................

This is the absolute lowest performance level of the processor. Selecting a
performance level lower than the lowest nonlinear performance level may
cause an efficiency penalty but should reduce the instantaneous power
consumption of the processor.

AMD CPPC Performance Control
------------------------------

``amd-pstate`` passes performance goals through these registers. The
registers drive the behavior of the desired performance target.

Minimum requested performance (RW)
...................................

``amd-pstate`` specifies the minimum allowed performance level.

Maximum requested performance (RW)
...................................

``amd-pstate`` specifies a limit on the maximum performance that is expected
to be supplied by the hardware.

Desired performance target (RW)
...................................

``amd-pstate`` specifies a desired target in the CPPC performance scale as
a relative number. This can be expressed as a percentage of nominal
performance (infrastructure max). Below the nominal sustained performance
level, desired performance expresses the average performance level of the
processor subject to hardware. Above the nominal performance level,
the processor must provide at least the nominal performance and may go higher
if current operating conditions allow.

Energy Performance Preference (EPP) (RW)
.........................................

This attribute provides a hint to the hardware as to whether software wants
to bias toward performance (0x0) or energy efficiency (0xff).

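
As an illustration of how the minimum, maximum, and desired requests relate,
the following Python sketch clamps a desired value into the requested window
and expresses it as a percentage of nominal performance. The function names
are hypothetical and the sample values are borrowed from the example outputs
in this document; this is a model of the description above, not the driver's
code.

```python
def clamp_desired_perf(desired, min_perf, max_perf):
    """Keep the desired target inside the requested [min_perf, max_perf] window."""
    if min_perf > max_perf:
        raise ValueError("min_perf must not exceed max_perf")
    return max(min_perf, min(desired, max_perf))


def percent_of_nominal(perf, nominal_perf):
    """Express an abstract CPPC performance value relative to nominal."""
    return 100.0 * perf / nominal_perf


if __name__ == "__main__":
    # Illustrative values only: min 85, max 166, nominal 117.
    des = clamp_desired_perf(desired=200, min_perf=85, max_perf=166)
    print(des, round(percent_of_nominal(des, nominal_perf=117), 1))
```

A request above the maximum is capped at the maximum, and a request below the
minimum is raised to the minimum.
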

Key Governors Support
=======================

``amd-pstate`` can be used with all the (generic) scaling governors listed
by the ``scaling_available_governors`` policy attribute in ``sysfs``. It
is responsible for the configuration of policy objects corresponding to
CPUs and provides the ``CPUFreq`` core (and the scaling governors attached
to the policy objects) with accurate information on the maximum and minimum
operating frequencies supported by the hardware. Users can check the
``scaling_cur_freq`` information, which comes from the ``CPUFreq`` core.

``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic
frequency control. The aim is to fine-tune the processor configuration on
``amd-pstate`` for ``schedutil`` with the CPU CFS scheduler. ``amd-pstate``
registers the ``adjust_perf`` callback to implement performance update behavior
similar to CPPC. It is initialized by ``sugov_start``, which then populates the
CPU's ``update_util_data`` pointer to assign ``sugov_update_single_perf`` as the
utilization update callback function in the CPU scheduler. The CPU scheduler
calls ``cpufreq_update_util`` and assigns the target performance according
to the ``struct sugov_cpu`` that the utilization update belongs to.
Then, ``amd-pstate`` updates the desired performance according to the value
assigned by the CPU scheduler.

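
The scaling step that an ``adjust_perf``-style callback performs can be
sketched as follows. This is a simplified Python model, not the kernel
implementation: it assumes the governor passes a minimum and a target
expressed against a CPU capacity (1024 is used only as an example), maps them
onto the CPPC scale, and raises the minimum to the lowest non-linear level.
All function and parameter names are hypothetical.

```python
def scale_to_perf(value, capacity, highest_perf):
    """Map a capacity-relative value onto the abstract CPPC scale, rounding up."""
    if value >= capacity:
        return highest_perf
    return -(-highest_perf * value // capacity)  # ceiling division


def adjust_perf(min_util, target_util, capacity, highest_perf,
                lowest_nonlinear_perf):
    """Return (min_perf, des_perf, max_perf) for one utilization update."""
    des_perf = scale_to_perf(target_util, capacity, highest_perf)
    min_perf = scale_to_perf(min_util, capacity, highest_perf)
    # Assumption: never request less than the most efficient level.
    min_perf = max(min_perf, lowest_nonlinear_perf)
    max_perf = max(highest_perf, min_perf)
    # Keep the desired target inside the [min_perf, max_perf] window.
    des_perf = max(min_perf, min(des_perf, max_perf))
    return min_perf, des_perf, max_perf


if __name__ == "__main__":
    print(adjust_perf(min_util=0, target_util=512, capacity=1024,
                      highest_perf=166, lowest_nonlinear_perf=39))
```

Under these assumptions a half-capacity target lands at half of the highest
performance, while very small targets are held at the lowest non-linear level.
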

Processor Support
=======================

The ``amd-pstate`` initialization will fail if the ``_CPC`` entry in the ACPI
SBIOS does not exist for the detected processor. It uses ``acpi_cpc_valid``
to check the existence of ``_CPC``. All Zen based processors support the legacy
ACPI hardware P-States function, so when ``amd-pstate`` fails initialization,
the kernel will fall back to initialize the ``acpi-cpufreq`` driver.

There are two types of hardware implementations for ``amd-pstate``: one is
`Full MSR Support <perf_cap_>`_ and another is `Shared Memory Support
<perf_cap_>`_. The :c:macro:`X86_FEATURE_CPPC` feature flag
indicates which type is present. (For details, refer to the Processor Programming
Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors [3]_.)
``amd-pstate`` registers different ``static_call`` instances for different
hardware implementations.

Currently, some of the Zen2 and Zen3 processors support ``amd-pstate``. In the
future, it will be supported on more AMD processors.

Full MSR Support
-----------------

Some new Zen3 processors such as Cezanne provide the MSR registers directly
while the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set.
``amd-pstate`` can handle the MSR register to implement the fast switch
function in ``CPUFreq`` that can reduce the latency of frequency control in
interrupt context. The functions with a ``pstate_xxx`` prefix represent the
operations on MSR registers.

Shared Memory Support
----------------------

If the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, the
processor supports the shared memory solution. In this case, ``amd-pstate``
uses the ``cppc_acpi`` helper methods to implement the callback functions
that are defined on ``static_call``. The functions with the ``cppc_xxx`` prefix
represent the operations of ACPI CPPC helpers for the shared memory solution.


AMD P-States and ACPI hardware P-States can always be supported in one
processor. However, AMD P-States has the higher priority: if it is enabled
with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, the hardware will
respond to the requests from AMD P-States.


User Space Interface in ``sysfs``
==================================

``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
control its functionality at the system level. They are located in the
``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. ::

 root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
 /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
 /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
 /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq


``amd_pstate_highest_perf / amd_pstate_max_freq``

Maximum CPPC performance and CPU frequency that the driver is allowed to
set, in percent of the maximum supported CPPC performance level (the highest
performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
In some ASICs, the highest CPPC performance is not the one in the ``_CPC``
table, so it is exposed in ``sysfs``. If boost is not active, but
still supported, this maximum frequency will be larger than the one in
``cpuinfo``.
This attribute is read-only.

``amd_pstate_lowest_nonlinear_freq``

The lowest non-linear CPPC CPU frequency that the driver is allowed to set,
in percent of the maximum supported CPPC performance level. (Please see the
lowest non-linear performance in `AMD CPPC Performance Capability
<perf_cap_>`_.)
This attribute is read-only.

Other performance and frequency values can be read back from
``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.

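
The attributes listed above can also be collected programmatically. The
following Python sketch reads every ``amd_pstate_*`` file in a policy
directory; it assumes that each attribute holds a single integer, and the
helper name ``read_amd_pstate_attrs`` is hypothetical.

```python
from pathlib import Path


def read_amd_pstate_attrs(policy_dir):
    """Return {attribute_name: int_value} for amd_pstate_* files in policy_dir."""
    attrs = {}
    for path in sorted(Path(policy_dir).glob("amd_pstate_*")):
        try:
            attrs[path.name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Skip attributes that cannot be read or are not plain integers.
            continue
    return attrs


if __name__ == "__main__":
    policy = "/sys/devices/system/cpu/cpufreq/policy0"
    for name, value in read_amd_pstate_attrs(policy).items():
        print(f"{name}: {value}")
```

On a system where the driver is not loaded, the directory contains no matching
files and the function returns an empty dictionary.
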

``amd-pstate`` vs ``acpi-cpufreq``
======================================

On the majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables
provided by the platform firmware are used for CPU performance scaling, but
they provide only 3 P-states on AMD processors.
However, on modern AMD APU and CPU series, the hardware provides Collaborative
Processor Performance Control according to the ACPI protocol and customizes it
for AMD platforms. That is, it offers a fine-grained and continuous frequency
range instead of the legacy hardware P-states. ``amd-pstate`` is the kernel
module which supports the new AMD P-States mechanism on most of the future AMD
platforms. The AMD P-States mechanism is the more performance- and
energy-efficient frequency management method on AMD processors.

Kernel Module Options for ``amd-pstate``
=========================================

``shared_mem``
Use the ``shared_mem`` module parameter to enable the related processors
manually with **amd_pstate.shared_mem=1**.
Due to a performance issue on the processors with `Shared Memory Support
<perf_cap_>`_, it is currently disabled and will be re-enabled by default
once the performance issue with this solution has been addressed.

To check whether the current processor is using `Full MSR Support <perf_cap_>`_
or `Shared Memory Support <perf_cap_>`_ : ::

  ray@hr-test1:~$ lscpu | grep cppc
  Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

If the CPU flags have ``cppc``, then this processor supports `Full MSR Support
<perf_cap_>`_. Otherwise, it supports `Shared Memory Support <perf_cap_>`_.

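
The same check can be scripted. The Python sketch below looks for ``cppc`` as
a whole flag in a flags line (so that a flag which merely contains the
substring does not match) and combines the result with the ``shared_mem``
setting described above. The function names are hypothetical, and the sketch
deliberately ignores the ``_CPC`` check that the driver also performs.

```python
def has_cppc_flag(flags_line):
    """Return True if 'cppc' is present as a whole flag in an lscpu flags line."""
    head, sep, tail = flags_line.partition(":")
    flags = tail if sep else head  # accept lines with or without "Flags:"
    return "cppc" in flags.split()


def amd_pstate_backend(flags_line, shared_mem=False):
    """Guess the applicable implementation; None if neither would be enabled."""
    if has_cppc_flag(flags_line):
        return "Full MSR Support"
    if shared_mem:
        # Corresponds to booting with amd_pstate.shared_mem=1.
        return "Shared Memory Support"
    return None


if __name__ == "__main__":
    sample = "Flags: fpu vme de pse wbnoinvd cppc arat npt"
    print(amd_pstate_backend(sample))
```
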

``cpupower`` tool support for ``amd-pstate``
===============================================

``amd-pstate`` is supported by the ``cpupower`` tool, which can be used to dump
frequency information. Development is in progress to support more and more
operations for the new ``amd-pstate`` module with this tool. ::

 root@hr-test1:/home/ray# cpupower frequency-info
 analyzing CPU 0:
   driver: amd-pstate
   CPUs which run at the same hardware frequency: 0
   CPUs which need to have their frequency coordinated by software: 0
   maximum transition latency: 131 us
   hardware limits: 400 MHz - 4.68 GHz
   available cpufreq governors: ondemand conservative powersave userspace performance schedutil
   current policy: frequency should be within 400 MHz and 4.68 GHz.
                   The governor "schedutil" may decide which speed to use
                   within this range.
   current CPU frequency: Unable to call hardware
   current CPU frequency: 4.02 GHz (asserted by call to kernel)
   boost state support:
     Supported: yes
     Active: yes
     AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz.
     AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz.
     AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz.
     AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz.

    331Diagnostics and Tuning
    332=======================
    333
    334Trace Events
    335--------------
    336
    337There are two static trace events that can be used for ``amd-pstate``
    338diagnostics. One of them is the ``cpu_frequency`` trace event generally used
    339by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event
    340specific to ``amd-pstate``.  The following sequence of shell commands can
    341be used to enable them and see their output (if the kernel is
    342configured to support event tracing). ::
    343
    344 root@hr-test1:/home/ray# cd /sys/kernel/tracing/
    345 root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable
    346 root@hr-test1:/sys/kernel/tracing# cat trace
    347 # tracer: nop
    348 #
    349 # entries-in-buffer/entries-written: 47827/42233061   #P:2
    350 #
    351 #                                _-----=> irqs-off
    352 #                               / _----=> need-resched
    353 #                              | / _---=> hardirq/softirq
    354 #                              || / _--=> preempt-depth
    355 #                              ||| /     delay
    356 #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
    357 #              | |         |   ||||      |         |
    358          <idle>-0       [015] dN...  4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true
    359          <idle>-0       [007] d.h..  4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
    360             cat-2161    [000] d....  4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true
    361            sshd-2125    [004] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true
    362          <idle>-0       [007] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
    363          <idle>-0       [003] d.s..  4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true
    364          <idle>-0       [011] d.s..  4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true
    365
    366The ``cpu_frequency`` trace event will be triggered either by the ``schedutil`` scaling
    367governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the
    368policies with other scaling governors).
    369
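
For quick ad-hoc analysis, the ``amd_pstate_perf`` lines shown above can be
parsed with a few lines of Python. The sketch below extracts the ``key=value``
pairs that follow the event name; the field names are taken from the sample
output, and ``parse_amd_pstate_perf`` is a hypothetical helper rather than
part of any kernel tool.

```python
import re

_FIELD = re.compile(r"(\w+)=(\w+)")


def parse_amd_pstate_perf(line):
    """Parse one amd_pstate_perf trace line into a dict, or return None."""
    marker = "amd_pstate_perf:"
    if marker not in line:
        return None
    payload = line.split(marker, 1)[1]
    event = {}
    for key, value in _FIELD.findall(payload):
        if value in ("true", "false"):
            event[key] = value == "true"
        elif value.isdigit():
            event[key] = int(value)
        else:
            event[key] = value
    return event


if __name__ == "__main__":
    sample = ("<idle>-0 [015] dN... 4995.979886: amd_pstate_perf: "
              "amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 "
              "cpu_id=15 changed=false fast_switch=true")
    print(parse_amd_pstate_perf(sample))
```

Header and comment lines of the trace file do not contain the event name, so
the function returns ``None`` for them and they can be filtered out.
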

Tracer Tool
-------------

``amd_pstate_trace.py`` can record and parse the ``amd-pstate`` trace log, then
generate performance plots. This utility can be used to debug and tune the
performance of the ``amd-pstate`` driver. The tracer tool needs to import the
intel pstate tracer.

The tracer tool is located in ``linux/tools/power/x86/amd_pstate_tracer``. It
can be used in two ways. If a trace file is available, parse the file directly
with the command ::

 ./amd_pstate_trace.py [-c cpus] -t <trace_file> -n <test_name>

Or generate a trace file with root privileges, then parse and plot it with the
command ::

 sudo ./amd_pstate_trace.py [-c cpus] -n <test_name> -i <interval> [-m kbytes]

The test result can be found in ``results/test_name``. The following is an
example of part of the output. ::

 common_cpu  common_secs  common_usecs  min_perf  des_perf  max_perf  freq    mperf   apef    tsc       load   duration_ms  sample_num  elapsed_time  common_comm
 CPU_005     712          116384        39        49        166       0.7565  9645075 2214891 38431470  25.1   11.646       469         2.496         kworker/5:0-40
 CPU_006     712          116408        39        49        166       0.6769  8950227 1839034 37192089  24.06  11.272       470         2.496         kworker/6:0-1264

    397Reference
    398===========
    399
    400.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
    401       https://www.amd.com/system/files/TechDocs/24593.pdf
    402
    403.. [2] Advanced Configuration and Power Interface Specification,
    404       https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf
    405
    406.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors
    407       https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip