Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
 - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS.

Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest with one
exception. The rationale behind this assumption is that the code construct
needed for exploiting MDS requires:

 - to control the load to trigger a fault or assist

 - to have a disclosure gadget which exposes the speculatively accessed
   data for consumption through a side channel

 - to control the pointer through which the disclosure gadget exposes the
   data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct.


Mitigation strategy
-------------------

All variants have the same mitigation strategy at least for the single CPU
thread case (SMT off): Force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer clearing:
either via the modified VERW instruction or via the L1D Flush command. The
latter is issued when L1TF mitigation is enabled so the extra VERW can be
avoided. If the CPU is not affected by L1TF then VERW needs to be issued.
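
As a sketch (assuming a KVM/VMX host; modeled on the VMENTER path in
arch/x86/kvm/vmx.c, where names and placement vary between kernel
versions), the decision looks roughly like this::

    /*
     * On VMENTER: an L1D flush also clears the CPU buffers, so the
     * extra VERW is only needed when no L1D flush is performed.
     */
    if (static_branch_unlikely(&vmx_l1d_should_flush))
            vmx_l1d_flush(vcpu);
    else if (static_branch_unlikely(&mds_user_clear))
            mds_clear_cpu_buffers();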

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update, there is no side effect
other than a small number of pointlessly wasted CPU cycles.

This does not protect against cross Hyper-Thread attacks except for MSBDS,
which is only exploitable cross Hyper-Thread when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing::

    mds_clear_cpu_buffers()

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.
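
A minimal sketch of mds_clear_cpu_buffers(), modeled on the x86
implementation in arch/x86/include/asm/nospec-branch.h (details differ
between kernel versions)::

    static inline void mds_clear_cpu_buffers(void)
    {
            static const u16 ds = __KERNEL_DS;

            /*
             * Has to be the memory-operand variant because only that
             * guarantees the buffer clearing side effect. Any valid
             * writable data segment selector works; __KERNEL_DS is
             * simply always available.
             *
             * "cc" is clobbered because VERW modifies ZF.
             */
            asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
    }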

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The state is reflected
accordingly.

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

 ======= ============================================================
 off     Mitigation is disabled. Either the CPU is not affected or
         mds=off is supplied on the kernel command line.

 full    Mitigation is enabled. CPU is affected and MD_CLEAR is
         advertised in CPUID.

 vmwerv  Mitigation is enabled. CPU is affected and MD_CLEAR is not
         advertised in CPUID. That is mainly for virtualization
         scenarios where the host has the updated microcode but the
         hypervisor does not expose MD_CLEAR in CPUID. It's a best
         effort approach without guarantee.
 ======= ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.
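
A condensed sketch of that selection, modeled on mds_select_mitigation()
in arch/x86/kernel/cpu/bugs.c (simplified; command line parsing is
omitted)::

    static void __init mds_select_mitigation(void)
    {
            /* mds=off handling on the command line is omitted here */
            if (!boot_cpu_has_bug(X86_BUG_MDS)) {
                    mds_mitigation = MDS_MITIGATION_OFF;
                    return;
            }

            if (mds_mitigation == MDS_MITIGATION_FULL) {
                    /* No MD_CLEAR in CPUID: fall back to best effort */
                    if (!boot_cpu_has(X86_FEATURE_MD_CLEAR))
                            mds_mitigation = MDS_MITIGATION_VMWERV;
                    static_branch_enable(&mds_user_clear);
            }
    }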

Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

   When transitioning from kernel to user space the CPU buffers are flushed
   on affected CPUs when the mitigation is not disabled on the kernel
   command line. The mitigation is enabled through the static key
   mds_user_clear.

   The mitigation is invoked in prepare_exit_to_usermode() which covers
   all but one of the kernel to user space transitions.  The exception
   is when we return from a Non Maskable Interrupt (NMI), which is
   handled directly in do_nmi().

   (The reason that NMI is special is that prepare_exit_to_usermode() can
    enable IRQs.  In NMI context, NMIs are blocked, and we don't want to
    enable IRQs with NMIs blocked.)
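
   The check behind the static key is cheap. A sketch of the helper used
   on the exit path, modeled on the actual implementation (details are
   version dependent)::

       static inline void mds_user_clear_cpu_buffers(void)
       {
               /* Patched to a NOP when the mitigation is off */
               if (static_branch_likely(&mds_user_clear))
                       mds_clear_cpu_buffers();
       }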


2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

   When a CPU goes idle and enters a C-State the CPU buffers need to be
   cleared on affected CPUs when SMT is active. This addresses the
   repartitioning of the store buffer when one of the Hyper-Threads enters
   a C-State.

   When SMT is inactive, i.e. either the CPU does not support it or all
   sibling threads are offline, CPU buffer clearing is not required.

   The idle clearing is enabled on CPUs which are only affected by MSBDS
   and not by any other MDS variant. The other MDS variants cannot be
   protected against cross Hyper-Thread attacks because the Fill Buffer and
   the Load Ports are shared. So on CPUs affected by other variants, the
   idle clearing would be a window dressing exercise and is therefore not
   activated.

   The invocation is controlled by the static key mds_idle_clear which is
   switched depending on the chosen mitigation mode and the SMT state of
   the system.
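
   A sketch of that switching, modeled on update_mds_branch_idle() in
   arch/x86/kernel/cpu/bugs.c::

       static void update_mds_branch_idle(void)
       {
               /* Idle clearing only helps CPUs affected by MSBDS alone */
               if (!boot_cpu_has_bug(X86_BUG_MSBDS_ONLY))
                       return;

               if (sched_smt_active())
                       static_branch_enable(&mds_idle_clear);
               else
                       static_branch_disable(&mds_idle_clear);
       }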
    175
    176   The buffer clear is only invoked before entering the C-State to prevent
    177   that stale data from the idling CPU from spilling to the Hyper-Thread
    178   sibling after the store buffer got repartitioned and all entries are
    179   available to the non idle sibling.
    180
    181   When coming out of idle the store buffer is partitioned again so each
    182   sibling has half of it available. The back from idle CPU could be then
    183   speculatively exposed to contents of the sibling. The buffers are
    184   flushed either on exit to user space or on VMENTER so malicious code
    185   in user space or the guest cannot speculatively access them.
    186
    187   The mitigation is hooked into all variants of halt()/mwait(), but does
    188   not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
    189   has been superseded by the intel_idle driver around 2010 and is
    190   preferred on all affected CPUs which are expected to gain the MD_CLEAR
    191   functionality in microcode. Aside of that the IO-Port mechanism is a
    192   legacy interface which is only used on older systems which are either
    193   not affected or do not receive microcode updates anymore.