XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture called XIVE, for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  The hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  The hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

Which interrupt modes can be used by the machine is negotiated with
the guest O/S during the Client Architecture Support negotiation
sequence. The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for the
layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in byte 23 of the device
tree property ``ibm,arch-vec-5-platform-support``, and the OS selection
for XIVE is indicated in byte 23 of the ``ibm,architecture-vec-5``
property.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) but also on the machine property
``ic-mode``, which can be set on the command line. It can take the
following values: ``xics``, ``xive``, and ``dual``, which is the
default mode. ``dual`` means that both modes, XICS **and** XIVE, are
supported and that XIVE will be selected if the guest OS supports it.

The chosen interrupt mode is activated after a reconfiguration done
in a machine reset.

KVM negotiation
---------------

When the guest starts under KVM, the capabilities of the host kernel
and QEMU are also negotiated. Depending on the version of the host
kernel, KVM will advertise the XIVE capability to QEMU or not.

Nevertheless, the available interrupt modes in the machine should not
depend on the XIVE KVM capability of the host. On older kernels
without XIVE KVM support, QEMU will use the emulated XIVE device as a
fallback; on newer kernels (>=5.2), it will use the KVM XIVE device.

XIVE native exploitation mode is not supported for KVM nested guests,
i.e. VMs running under an L1 hypervisor (KVM on pSeries). In that case,
the hypervisor will not advertise the KVM capability and QEMU will use
the emulated XIVE device, same as for older versions of KVM.

As a final refinement, the user can also switch the use of the KVM
device with the machine option ``kernel_irqchip``.
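
As an illustration of how the ``ic-mode`` and ``kernel_irqchip``
properties combine on the command line, the following Python sketch
assembles a ``-machine`` argument. The helper name ``machine_args`` is
hypothetical, and ``None`` is assumed here to stand for leaving
``kernel_irqchip`` unset.

```python
VALID_IC_MODES = ("xics", "xive", "dual")
VALID_IRQCHIP = (None, "on", "off")  # None: option left unset (assumption)


def machine_args(ic_mode="dual", kernel_irqchip=None):
    """Build the -machine arguments for a pseries machine (hypothetical helper)."""
    if ic_mode not in VALID_IC_MODES:
        raise ValueError(f"unknown ic-mode: {ic_mode!r}")
    if kernel_irqchip not in VALID_IRQCHIP:
        raise ValueError(f"unknown kernel_irqchip: {kernel_irqchip!r}")
    props = ["pseries", f"ic-mode={ic_mode}"]
    if kernel_irqchip is not None:
        props.append(f"kernel_irqchip={kernel_irqchip}")
    return ["-machine", ",".join(props)]


if __name__ == "__main__":
    # Request XIVE only, served by the emulated device.
    print(" ".join(machine_args("xive", "off")))
```

The returned list is meant to be appended to the rest of a QEMU
invocation; the sketch only validates the two property values and does
not start QEMU.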


XIVE support in KVM
~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels with XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                            kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE KVM       XIVE emul.     XIVE KVM
xive            XIVE KVM       XIVE emul.     XIVE KVM
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                            kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XICS KVM       XICS emul.     XICS KVM
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``


No XIVE support in KVM
~~~~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels without XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                            kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE emul.(1)  XIVE emul.     QEMU error (2)
xive            XIVE emul.(1)  XIVE emul.     QEMU error (2)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================


(1) QEMU warns with ``warning: kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``.
    In some cases (old host kernels or KVM nested guests), one may hit a
    QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
    with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``.
(2) QEMU fails with ``kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``.


For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                            kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  QEMU error(4)  XICS emul.     QEMU error(4)
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``
(4) QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
    with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``
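
The four tables above can be condensed into a few rules. The Python
sketch below is only a model of those tables, not QEMU's actual
implementation; the function name ``resolve_irq_backend`` and its
boolean parameters are assumptions made for illustration.

```python
# Shortened forms of the messages quoted in notes (2), (3) and (4).
ERR_NO_CAP = "kernel_irqchip requested but unavailable"
ERR_CAS = "Guest requested unavailable interrupt mode (XICS)"
ERR_DUAL = "KVM is incompatible with ic-mode=dual,kernel-irqchip=on"


def resolve_irq_backend(ic_mode, kernel_irqchip, guest_xive, kvm_xive):
    """Return the table cell for the given configuration, or raise ValueError.

    ic_mode:        "dual", "xive" or "xics"
    kernel_irqchip: "allowed", "off" or "on"
    guest_xive:     True if the guest OS supports XIVE
    kvm_xive:       True if the host kernel has XIVE KVM support
    """
    if ic_mode not in ("dual", "xive", "xics"):
        raise ValueError(f"unknown ic-mode: {ic_mode!r}")
    if kernel_irqchip not in ("allowed", "off", "on"):
        raise ValueError(f"unknown kernel_irqchip: {kernel_irqchip!r}")

    # Step 1: which interrupt mode comes out of CAS.
    if ic_mode == "xive" and not guest_xive:
        raise ValueError(ERR_CAS)  # note (3)
    mode = "XIVE" if (ic_mode != "xics" and guest_xive) else "XICS"

    # Step 2: emulated device or KVM device.
    if kernel_irqchip == "off":
        return f"{mode} emul."
    if mode == "XIVE":
        if kvm_xive:
            return "XIVE KVM"
        if kernel_irqchip == "on":
            raise ValueError(ERR_NO_CAP)  # note (2)
        return "XIVE emul."  # fallback, QEMU warns as in note (1)
    if ic_mode == "dual" and not kvm_xive:
        raise ValueError(ERR_DUAL)  # note (4)
    return "XICS KVM"


if __name__ == "__main__":
    print(resolve_irq_backend("dual", "allowed", guest_xive=True, kvm_xive=False))
```

Error cells are modelled as ``ValueError`` so that a caller can tell a
usable configuration from one where QEMU would refuse to continue.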


XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected should contain:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  management areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the sizes of the event queues. There is one cell per supported size,
  each containing the log2 of the size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the IRQ interrupt number ranges assigned to the guest for the IPIs.
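
To make the log2 encoding of ``ibm,xive-eq-sizes`` concrete, here is a
small Python sketch that turns a list of cells back into sizes. The
helper name ``decode_eq_sizes`` and the cell values used in the example
are made up for illustration and are not taken from a real device tree.

```python
def decode_eq_sizes(cells):
    """Convert log2-encoded cells into sizes (hypothetical helper)."""
    cells = list(cells)
    if cells != sorted(cells):
        raise ValueError("cells are expected in ascending order")
    # Each cell holds log2(size), so the size is 2 ** cell.
    return [1 << cell for cell in cells]


if __name__ == "__main__":
    # Illustrative values only.
    print(decode_eq_sizes([12, 16]))
```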

The root node also exports:

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.

IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide and is the
same for both interrupt modes. The different ranges are defined as
follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1002 .. 0x10FF`` unused
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32x4 LSIs for PHB devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs (dynamically allocated)
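
The layout above can be written down as a lookup table. The following
Python sketch classifies an IRQ number according to these ranges; the
names ``IRQ_RANGES`` and ``classify_irq`` are hypothetical and do not
correspond to identifiers in the QEMU sources.

```python
# (first, last, description), both bounds inclusive, copied from the list above.
IRQ_RANGES = [
    (0x0000, 0x0FFF, "CPU IPIs (only used under XIVE)"),
    (0x1000, 0x1000, "EPOW"),
    (0x1001, 0x1001, "HOTPLUG"),
    (0x1002, 0x10FF, "unused"),
    (0x1100, 0x11FF, "VIO devices"),
    (0x1200, 0x127F, "LSIs for PHB devices"),
    (0x1280, 0x12FF, "unused"),
    (0x1300, 0x1FFF, "PHB MSIs (dynamically allocated)"),
]


def classify_irq(irq):
    """Return the description of the range that contains ``irq``."""
    for first, last, label in IRQ_RANGES:
        if first <= irq <= last:
            return label
    raise ValueError(f"IRQ {irq:#x} is outside the 8K number space")


if __name__ == "__main__":
    for irq in (0x0002, 0x1001, 0x1203, 0x1300):
        print(f"{irq:#06x}: {classify_irq(irq)}")
```

The ranges are contiguous and add up to 0x2000 entries, which matches
the 8K width stated above.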

Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

   (qemu) info pic
   CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
   CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
   CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400
   CPU[0000]: POOL    00   00  00    00   00  00  00   00  00000000
   CPU[0000]: PHYS    00   00  00    00   00  00  00   ff  00000000
   ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
line which is set to the VP identifier.

Then comes the routing information which aggregates the EAS and the
END configuration:

::

   ...
   LISN         PQ    EISN     CPU/PRIO EQ
   00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00000001 MSI --    00000010   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00000002 MSI --    00000010   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
   00000003 MSI --    00000010   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]
   00000004 MSI -Q  M 00000000
   00000005 MSI -Q  M 00000000
   00000006 MSI -Q  M 00000000
   00000007 MSI -Q  M 00000000
   00001000 MSI --    00000012   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00001001 MSI --    00000013   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00001100 MSI --    00000100   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00001101 MSI -Q  M 00000000
   00001200 LSI -Q  M 00000000
   00001201 LSI -Q  M 00000000
   00001202 LSI -Q  M 00000000
   00001203 LSI -Q  M 00000000
   00001300 MSI --    00000102   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00001301 MSI --    00000103   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
   00001302 MSI --    00000104   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type: ``MSI`` or ``LSI``
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  An ``M`` indicates that the source is *MASKED* at the EAS level.

The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs:

  - the current index of the event queue / the maximum number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed in the event queue.
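
The column descriptions above can be turned into a small parser for one
line of the routing output. This Python sketch is based only on the
sample output shown in this document; the names ``ROUTE_RE``,
``PQ_STATES`` and ``parse_route`` are hypothetical, and it assumes that
the CPU, priority and queue index fields are decimal while the LISN,
EISN and queue address are hexadecimal.

```python
import re

# Meaning of the PQ column, as listed above.
PQ_STATES = {
    "--": "ready to take events",
    "P-": "event sent, EOI pending",
    "PQ": "event queued",
    "-Q": "off",
}

ROUTE_RE = re.compile(
    r"^\s*(?P<lisn>[0-9a-f]{8})\s+(?P<type>MSI|LSI)\s+(?P<pq>[P-][Q-])"
    r"\s+(?P<masked>M)?\s*(?P<eisn>[0-9a-f]{8})"
    # Targeting columns are absent for masked sources in the sample output.
    r"(?:\s+(?P<cpu>\d+)/(?P<prio>\d+)"
    r"\s+(?P<index>\d+)/(?P<entries>\d+)"
    r"\s+@(?P<addr>[0-9a-f]+)\s+\^(?P<toggle>[01]))?"
)


def parse_route(line):
    """Parse one routing line of ``info pic`` into a dict."""
    m = ROUTE_RE.match(line)
    if m is None:
        raise ValueError(f"not a routing line: {line!r}")
    row = {
        "lisn": int(m["lisn"], 16),
        "type": m["type"],
        "pq": PQ_STATES[m["pq"]],
        "masked": m["masked"] is not None,
        "eisn": int(m["eisn"], 16),
    }
    if m["cpu"] is not None:
        row.update(
            cpu=int(m["cpu"]),
            prio=int(m["prio"]),
            eq_index=int(m["index"]),
            eq_entries=int(m["entries"]),
            eq_addr=int(m["addr"], 16),
            toggle=int(m["toggle"]),
        )
    return row


if __name__ == "__main__":
    print(parse_route("00001101 MSI -Q  M 00000000"))
```

The trailing list of last pushed entries (``[ 80000010 ... ]``) is
deliberately ignored, since it is elided in the sample output.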