vcpu-requests.rst - cachepc-linux - Fork of AMDESE/linux with modifications for CachePC side-channel attack

.. SPDX-License-Identifier: GPL-2.0

=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity.  For example, a thread may request a VCPU to flush
its TLB with a VCPU request.  The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);

Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request.  This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.
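The make/check/pending semantics described above can be modelled in
userspace.  The following is only a sketch built on C11 atomics, not the
kernel implementation; ``struct sketch_vcpu``, the ``sketch_*`` functions
and the two request numbers are hypothetical names invented for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm_vcpu: only the request bitmap. */
struct sketch_vcpu {
    atomic_ulong requests;
};

/* Hypothetical request numbers, not real KVM_REQ_* values. */
enum { SKETCH_REQ_FLUSH = 0, SKETCH_REQ_RELOAD = 1 };

/* Set the request bit; release ordering publishes earlier writes first. */
static void sketch_make_request(int req, struct sketch_vcpu *vcpu)
{
    atomic_fetch_or_explicit(&vcpu->requests, 1UL << req,
                             memory_order_release);
}

/* True if any request bit is set. */
static bool sketch_request_pending(struct sketch_vcpu *vcpu)
{
    return atomic_load_explicit(&vcpu->requests, memory_order_relaxed) != 0;
}

/* Test-and-clear one request; acquire ordering pairs with the release. */
static bool sketch_check_request(int req, struct sketch_vcpu *vcpu)
{
    unsigned long bit = 1UL << req;
    unsigned long old;

    if (!(atomic_load_explicit(&vcpu->requests, memory_order_relaxed) & bit))
        return false;
    old = atomic_fetch_and_explicit(&vcpu->requests, ~bit,
                                    memory_order_acquire);
    return (old & bit) != 0;
}

/* Walk through one request being made and then consumed. */
void sketch_request_demo(void)
{
    struct sketch_vcpu vcpu;

    atomic_init(&vcpu.requests, 0);
    assert(!sketch_request_pending(&vcpu));

    sketch_make_request(SKETCH_REQ_FLUSH, &vcpu);
    assert(sketch_request_pending(&vcpu));

    /* Checking a request clears it, so a second check fails. */
    assert(sketch_check_request(SKETCH_REQ_FLUSH, &vcpu));
    assert(!sketch_check_request(SKETCH_REQ_FLUSH, &vcpu));
}
```

In this model a kick would follow ``sketch_make_request()``; the kick itself
is left out because it depends on the VCPU mode handling described in the
sections below.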

VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
a guest mode exit.  However, a VCPU thread may not be in guest mode at the
time of the kick.  Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take.  All three actions
are listed below:

1) Send an IPI.  This forces a guest mode exit.
2) Wake a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues.  Waking them removes the threads from
   the waitqueues, allowing the threads to run again.  This behavior
   may be suppressed; see KVM_REQUEST_NO_WAKEUP below.
3) Nothing.  When the VCPU is not in guest mode and the VCPU thread is not
   sleeping, then there is nothing to do.

VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
guest is running in guest mode or not, as well as some specific
outside guest mode states.  The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements").  The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.

VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction.  The first 8 bits are reserved for architecture
independent requests; all additional bits are available for architecture
dependent requests.

Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request.

KVM_REQ_VM_DEAD

  This request informs all VCPUs that the VM is dead and unusable, e.g. due to
  a fatal error or because the VM's state has been intentionally destroyed.

KVM_REQ_UNBLOCK

  This request tells the vCPU to exit kvm_vcpu_block().  It is used, for
  example, from timer handlers that run on the host on behalf of a vCPU,
  or in order to update the interrupt routing and ensure that assigned
  devices will wake up the vCPU.

KVM_REQ_UNHALT

  This request may be made from the KVM common function kvm_vcpu_block(),
  which is used to emulate an instruction that causes a CPU to halt until
  one of an architecture-specific set of events and/or interrupts is
  received (determined by checking kvm_arch_vcpu_runnable()).  When that
  event or interrupt arrives kvm_vcpu_block() makes the request.  This is
  in contrast to when kvm_vcpu_block() returns due to any other reason,
  such as a pending signal, which does not indicate the VCPU's halt
  emulation should stop, and therefore does not make the request.

KVM_REQ_OUTSIDE_GUEST_MODE

  This "request" ensures the target vCPU has exited guest mode prior to the
  sender of the request continuing on.  No action needs to be taken by the
  target, and so no request is actually logged for the target.  This request
  is similar to a "kick", but unlike a kick it guarantees the vCPU has
  actually exited guest mode.  A kick only guarantees the vCPU will exit at
  some point in the future, e.g. a previous kick may have started the
  process, but there's no guarantee the to-be-kicked vCPU has fully exited
  guest mode.

KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops.  This is because only the lower 8 bits are used to represent the
request's number.  The upper bits are used as flags.  Currently only two
flags are defined.

VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode.  That is, sleeping VCPUs do not need
  to be awakened for these requests.  Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  then the caller will wait for each VCPU to acknowledge its IPI before
  proceeding.  This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait.  This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP.  See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.
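The split between request number and flags can be illustrated with a few
macros.  This is a sketch only: the 8-bit mask follows the text above, but
the flag bit positions (bits 8 and 9) and the example request are
assumptions made for illustration, and every ``SKETCH_*`` name is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Lower 8 bits carry the request number, as described above. */
#define SKETCH_REQUEST_MASK       0xffu
/* Flag bit positions are assumptions chosen for this sketch. */
#define SKETCH_REQUEST_NO_WAKEUP  (1u << 8)
#define SKETCH_REQUEST_WAIT       (1u << 9)

/* Hypothetical request: number 3, carrying both flags. */
#define SKETCH_REQ_EXAMPLE \
    (3u | SKETCH_REQUEST_WAIT | SKETCH_REQUEST_NO_WAKEUP)

/* Bit index to use with bitops on the requests bitmap. */
static unsigned int sketch_req_bit(unsigned int req)
{
    return req & SKETCH_REQUEST_MASK;
}

/* Should a sleeping VCPU be woken for this request? */
static bool sketch_req_needs_wakeup(unsigned int req)
{
    return (req & SKETCH_REQUEST_NO_WAKEUP) == 0;
}

/* Should the requester wait for IPI acknowledgements? */
static bool sketch_req_waits(unsigned int req)
{
    return (req & SKETCH_REQUEST_WAIT) != 0;
}

/* Decode the example request and a flag-less request. */
void sketch_mask_demo(void)
{
    assert(sketch_req_bit(SKETCH_REQ_EXAMPLE) == 3);
    assert(!sketch_req_needs_wakeup(SKETCH_REQ_EXAMPLE));
    assert(sketch_req_waits(SKETCH_REQ_EXAMPLE));

    assert(sketch_req_bit(5u) == 5);
    assert(sketch_req_needs_wakeup(5u));
    assert(!sketch_req_waits(5u));
}
```

Without the mask, the flag bits would be interpreted as part of the bit
index, which is why the text above requires masking before any bitop.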

VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request.  This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit.  Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it.  See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.

The pair of functions, kvm_check_request() and kvm_make_request(), provide
the memory barriers, allowing this requirement to be handled internally by
the API.
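The ordering requirement above is the classic message-and-flag pattern.
The sketch below expresses it with C11 fences in userspace; it is not the
kernel code.  A release fence stands in for the write barrier and an
acquire fence for the read barrier (both are somewhat stronger than
strictly required), and ``struct sketch_vcpu`` with its two fields is a
hypothetical type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical VCPU with one piece of state and one request bit. */
struct sketch_vcpu {
    int new_state;          /* state associated with the request */
    atomic_bool request;    /* the request "flag" */
};

/* Requester: write state, barrier, then set the request bit. */
static void sketch_requester(struct sketch_vcpu *vcpu, int state)
{
    vcpu->new_state = state;
    atomic_thread_fence(memory_order_release);   /* write barrier */
    atomic_store_explicit(&vcpu->request, true, memory_order_relaxed);
}

/* Receiver: read the request bit, barrier, then read the state. */
static bool sketch_receiver(struct sketch_vcpu *vcpu, int *state)
{
    if (!atomic_load_explicit(&vcpu->request, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);   /* read barrier */
    atomic_store_explicit(&vcpu->request, false, memory_order_relaxed);
    *state = vcpu->new_state;
    return true;
}

/* Single-threaded walk through one request with associated state. */
void sketch_state_demo(void)
{
    struct sketch_vcpu vcpu = { .new_state = 0 };
    int seen = -1;

    atomic_init(&vcpu.request, false);
    assert(!sketch_receiver(&vcpu, &seen));   /* nothing requested yet */

    sketch_requester(&vcpu, 42);
    assert(sketch_receiver(&vcpu, &seen));
    assert(seen == 42);
    assert(!sketch_receiver(&vcpu, &seen));   /* request was consumed */
}
```

The demo runs on one thread, so it only shows the order of operations; the
fences matter when requester and receiver run on different CPUs.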

Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request.  We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode.  This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI.  One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU.  With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check.  This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_).  As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``.  Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts.  This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.
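The two columns of the pattern above can be written as a pair of userspace
functions.  This is a sketch using C11 atomics, with a sequentially
consistent fence standing in for smp_mb(); interrupt disabling has no
userspace equivalent and is only marked by a comment.  All ``sketch_*``
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { SKETCH_OUTSIDE_GUEST_MODE, SKETCH_IN_GUEST_MODE };

/* Hypothetical VCPU holding the two variables of the Dekker pattern. */
struct sketch_vcpu {
    atomic_int mode;
    atomic_ulong requests;
};

/* CPU1 column: returns true if guest entry may proceed. */
static bool sketch_vcpu_try_enter(struct sketch_vcpu *vcpu)
{
    /* The kernel disables interrupts here; omitted in this sketch. */
    atomic_store_explicit(&vcpu->mode, SKETCH_IN_GUEST_MODE,
                          memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* models smp_mb() */
    if (atomic_load_explicit(&vcpu->requests, memory_order_relaxed)) {
        /* Abort guest entry so the request gets handled first. */
        atomic_store_explicit(&vcpu->mode, SKETCH_OUTSIDE_GUEST_MODE,
                              memory_order_relaxed);
        return false;
    }
    return true;
}

/* CPU2 column: returns true if an IPI has to be sent. */
static bool sketch_make_request_needs_ipi(struct sketch_vcpu *vcpu, int req)
{
    atomic_fetch_or_explicit(&vcpu->requests, 1UL << req,
                             memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* models smp_mb() */
    return atomic_load_explicit(&vcpu->mode, memory_order_relaxed) ==
           SKETCH_IN_GUEST_MODE;
}

/* Two sequential interleavings: request first, then entry first. */
void sketch_dekker_demo(void)
{
    struct sketch_vcpu vcpu;

    atomic_init(&vcpu.mode, SKETCH_OUTSIDE_GUEST_MODE);
    atomic_init(&vcpu.requests, 0);

    /* Request made before entry: no IPI, but the entry is aborted. */
    assert(!sketch_make_request_needs_ipi(&vcpu, 0));
    assert(!sketch_vcpu_try_enter(&vcpu));

    /* Entry completed before the request: the requester sends an IPI. */
    atomic_store(&vcpu.requests, 0);
    assert(sketch_vcpu_try_enter(&vcpu));
    assert(sketch_make_request_needs_ipi(&vcpu, 0));
}
```

In both interleavings at least one side notices the other, which is the
property the full barriers are there to preserve when the two functions
run concurrently.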

IPI Reduction
-------------

As only one IPI is needed to get a VCPU to check for any/all requests,
IPIs may be coalesced.  This is easily done by having the first
IPI-sending kick also change the VCPU mode to something !IN_GUEST_MODE.
The transitional state, EXITING_GUEST_MODE, is used for this purpose.
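One way to realize this coalescing is an atomic compare-and-exchange on the
mode, so that only the kick which moves the VCPU out of IN_GUEST_MODE
sends the IPI.  The following userspace sketch assumes that approach; the
``sketch_*`` names are hypothetical and the code is a model, not the
kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum {
    SKETCH_OUTSIDE_GUEST_MODE,
    SKETCH_IN_GUEST_MODE,
    SKETCH_EXITING_GUEST_MODE
};

/* Hypothetical VCPU with only the mode field. */
struct sketch_vcpu {
    atomic_int mode;
};

/*
 * Returns true only for the kick that moves the VCPU from IN_GUEST_MODE
 * to EXITING_GUEST_MODE; later kicks see a different mode and skip the IPI.
 */
static bool sketch_kick_needs_ipi(struct sketch_vcpu *vcpu)
{
    int expected = SKETCH_IN_GUEST_MODE;

    return atomic_compare_exchange_strong(&vcpu->mode, &expected,
                                          SKETCH_EXITING_GUEST_MODE);
}

/* Two back-to-back kicks result in a single IPI. */
void sketch_coalesce_demo(void)
{
    struct sketch_vcpu vcpu;
    int ipis = 0;

    atomic_init(&vcpu.mode, SKETCH_IN_GUEST_MODE);
    if (sketch_kick_needs_ipi(&vcpu))
        ipis++;
    if (sketch_kick_needs_ipi(&vcpu))
        ipis++;

    assert(ipis == 1);
    assert(atomic_load(&vcpu.mode) == SKETCH_EXITING_GUEST_MODE);
}
```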

Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts.  To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.
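The change of condition described above fits in a single predicate.  The
sketch below is a simplified model with hypothetical names; it takes the
mode and the presence of the wait flag as plain arguments rather than
reading them from a real VCPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the four modes defined in "VCPU Mode". */
enum sketch_mode {
    SKETCH_OUTSIDE_GUEST_MODE,
    SKETCH_IN_GUEST_MODE,
    SKETCH_EXITING_GUEST_MODE,
    SKETCH_READING_SHADOW_PAGE_TABLES
};

/* Decide whether an IPI is sent, with and without the wait flag. */
static bool sketch_should_send_ipi(enum sketch_mode mode, bool wait)
{
    if (wait)
        return mode != SKETCH_OUTSIDE_GUEST_MODE;
    return mode == SKETCH_IN_GUEST_MODE;
}

/* The flag only changes the outcome for the intermediate modes. */
void sketch_wait_demo(void)
{
    assert(sketch_should_send_ipi(SKETCH_IN_GUEST_MODE, false));
    assert(sketch_should_send_ipi(SKETCH_IN_GUEST_MODE, true));

    assert(!sketch_should_send_ipi(SKETCH_READING_SHADOW_PAGE_TABLES, false));
    assert(sketch_should_send_ipi(SKETCH_READING_SHADOW_PAGE_TABLES, true));

    assert(!sketch_should_send_ipi(SKETCH_OUTSIDE_GUEST_MODE, false));
    assert(!sketch_should_send_ipi(SKETCH_OUTSIDE_GUEST_MODE, true));
}
```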

Request-less VCPU Kicks
-----------------------

As the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, it's clear that
request-less VCPU kicks are almost never correct.  Without the assurance
that a non-IPI generating kick will still result in an action by the
receiving VCPU, as the final kvm_request_pending() check does for
request-accompanying kicks, the kick may not do anything useful at
all.  If, for instance, a request-less kick was made to a VCPU that was
just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
the VCPU thread may continue its entry without actually having done
whatever it was the kick was meant to initiate.

One exception is x86's posted interrupt mechanism.  In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``.  When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.

Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken.  One reason to do so is to provide
architectures a function where requests may be checked if necessary.

Clearing Requests
-----------------

Generally it only makes sense for the receiving VCPU thread to clear a
request.  However, in some circumstances, namely when the requesting
thread and the receiving VCPU thread execute serially, e.g. because
they are the same thread or because some form of concurrency control
temporarily makes them execute synchronously, it's possible to know
that the request may be cleared immediately, rather than waiting for the
receiving VCPU thread to handle the request in VCPU RUN.  The only current
examples of this are kvm_vcpu_block() calls made by VCPUs to block
themselves.  A possible side-effect of that call is to make the
KVM_REQ_UNHALT request, which may then be cleared immediately when the
VCPU returns from the call.

References
==========

.. [atomic-ops] Documentation/atomic_bitops.txt and Documentation/atomic_t.txt
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/