cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

sgx.rst (12444B)


.. SPDX-License-Identifier: GPL-2.0

===============================
Software Guard eXtensions (SGX)
===============================

Overview
========

Software Guard eXtensions (SGX) hardware enables user space applications
to set aside private memory regions of code and data:

* Privileged (ring-0) ENCLS functions orchestrate the construction of the
  regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
  execute inside the regions.

These memory regions are called enclaves. An enclave can only be entered at a
fixed set of entry points. Each entry point can hold a single hardware thread
at a time.  While the enclave is loaded from a regular binary file by using
ENCLS functions, only the threads inside the enclave can access its memory. The
region is denied from outside access by the CPU, and encrypted before it leaves
the last-level cache (LLC).

Support can be determined with:

	``grep sgx /proc/cpuinfo``

SGX must both be supported in the processor and enabled by the BIOS.  If SGX
appears to be unsupported on a system which has hardware support, ensure
support is enabled in the BIOS.  If a BIOS presents a choice between "Enabled"
and "Software Enabled" modes for SGX, choose "Enabled".
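
Processor support can also be queried programmatically via CPUID: leaf 0x7,
sub-leaf 0, reports SGX in EBX bit 2.  The sketch below is illustrative only
(the helper name is made up for this example) and, unlike the ``/proc/cpuinfo``
flag, it does not tell whether the kernel was actually able to enable SGX::

  #include <stdbool.h>
  #include <stdio.h>
  #include <cpuid.h>

  /* CPUID.(EAX=07H, ECX=0):EBX[bit 2] indicates SGX support in the CPU. */
  static bool cpu_has_sgx(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                  return false;

          return ebx & (1u << 2);
  }

  int main(void)
  {
          printf("SGX %ssupported by this CPU\n", cpu_has_sgx() ? "" : "not ");
          return 0;
  }
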

Enclave Page Cache
==================

SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated
with an enclave. It is contained in a BIOS-reserved region of physical memory.
Unlike pages used for regular memory, EPC pages can only be accessed from
outside of the enclave during enclave construction, and then only with special,
limited SGX instructions.

Only a CPU executing inside an enclave can directly access enclave memory.
However, a CPU executing inside an enclave may access normal memory outside the
enclave.

The kernel manages enclave memory similarly to how it treats device memory.

Enclave Page Types
------------------

**SGX Enclave Control Structure (SECS)**
   Enclave's address range, attributes and other global data are defined
   by this structure.

**Regular (REG)**
   Regular EPC pages contain the code and data of an enclave.

**Thread Control Structure (TCS)**
   Thread Control Structure pages define the entry points to an enclave and
   track the execution state of an enclave thread.

**Version Array (VA)**
   Version Array pages contain 512 slots, each of which can contain a version
   number for a page evicted from the EPC.

Enclave Page Cache Map
----------------------

The processor tracks EPC pages in a hardware metadata structure called the
*Enclave Page Cache Map (EPCM)*.  The EPCM contains an entry for each EPC page
which describes the owning enclave, access rights and page type, among other
things.

EPCM permissions are separate from the normal page tables.  This prevents the
kernel from, for instance, allowing writes to data which an enclave wishes to
remain read-only.  EPCM permissions may only impose additional restrictions on
top of normal x86 page permissions.

For all intents and purposes, the SGX architecture allows the processor to
invalidate all EPCM entries at will.  This requires that software be prepared to
handle an EPCM fault at any time.  In practice, this can happen on events like
power transitions when the ephemeral key that encrypts enclave memory is lost.

Application interface
=====================

Enclave build functions
-----------------------

In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process.  Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device.  Since enclave memory is protected from direct
access, special privileged instructions are then used to copy data into enclave
pages and establish enclave page permissions.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_create
               sgx_ioc_enclave_add_pages
               sgx_ioc_enclave_init
               sgx_ioc_enclave_provision

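
A rough sketch of how these ioctls fit together is shown below.  It assumes the
definitions exported in ``asm/sgx.h``; the SECS, SECINFO and SIGSTRUCT buffers
are laid out per the Intel SDM and are assumed to have been prepared by the
caller, and the helper itself is hypothetical::

  #include <fcntl.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <asm/sgx.h>    /* SGX_IOC_ENCLAVE_* and the request structures */

  /*
   * Illustrative only: create an enclave from a caller-built SECS, add one
   * measured page and initialize the enclave with a caller-built SIGSTRUCT.
   */
  static int build_enclave(const void *secs, const void *page,
                           const void *secinfo, const void *sigstruct)
  {
          struct sgx_enclave_create create = {
                  .src = (uint64_t)(uintptr_t)secs,
          };
          struct sgx_enclave_add_pages add = {
                  .src = (uint64_t)(uintptr_t)page,
                  .offset = 0,                             /* offset inside the enclave */
                  .length = 4096,
                  .secinfo = (uint64_t)(uintptr_t)secinfo, /* EPCM permissions, page type */
                  .flags = SGX_PAGE_MEASURE,               /* extend the measurement */
          };
          struct sgx_enclave_init init = {
                  .sigstruct = (uint64_t)(uintptr_t)sigstruct,
          };
          int fd, ret;

          fd = open("/dev/sgx_enclave", O_RDWR);
          if (fd < 0)
                  return -1;

          ret = ioctl(fd, SGX_IOC_ENCLAVE_CREATE, &create);
          if (!ret)
                  ret = ioctl(fd, SGX_IOC_ENCLAVE_ADD_PAGES, &add);
          if (!ret)
                  ret = ioctl(fd, SGX_IOC_ENCLAVE_INIT, &init);

          if (ret) {
                  close(fd);
                  return -1;
          }

          /* The fd is subsequently mmap()ed and the enclave entered via the vDSO. */
          return fd;
  }
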
Enclave vDSO
------------

Entering an enclave can only be done through SGX-specific EENTER and ERESUME
functions, and is a non-trivial process.  Because of the complexity of
transitioning to and from an enclave, enclaves typically utilize a library to
handle the actual transitions.  This is roughly analogous to how glibc
implementations are used by most applications to wrap system calls.

Another crucial characteristic of enclaves is that they can generate exceptions
as part of their normal operation that need to be handled in the enclave or are
unique to SGX.

Instead of using the traditional signal mechanism to handle these exceptions,
SGX can leverage the special exception fixup provided by the vDSO.  The
kernel-provided vDSO function wraps low-level transitions to/from the enclave
like EENTER and ERESUME.  The vDSO function intercepts exceptions that would
otherwise generate a signal and returns the fault information directly to its
caller.  This avoids the need to juggle signal handlers.

.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
   :functions: vdso_sgx_enter_enclave_t

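
For illustration, the sketch below shows one way the vDSO function might be
called once its address has been resolved (for example by walking the vDSO
image found through ``getauxval(AT_SYSINFO_EHDR)``; the symbol lookup is
omitted here).  The structure and the typedef come from the uapi header;
everything else, including the ``EENTER`` constant and the wrapper itself, is
an assumption of the example::

  #include <stdint.h>
  #include <string.h>
  #include <asm/sgx.h>  /* struct sgx_enclave_run, vdso_sgx_enter_enclave_t */

  #define EENTER 2      /* ENCLU leaf for entering an enclave */

  /*
   * Illustrative wrapper: 'enter_enclave' is the resolved address of
   * __vdso_sgx_enter_enclave and 'tcs' is the address of a TCS page
   * mapped into the process.
   */
  static int run_enclave(vdso_sgx_enter_enclave_t enter_enclave,
                         uint64_t tcs, void *arg)
  {
          struct sgx_enclave_run run;
          int ret;

          memset(&run, 0, sizeof(run));
          run.tcs = tcs;

          /* The first three arguments land in RDI, RSI and RDX in the enclave. */
          ret = enter_enclave((uintptr_t)arg, 0, 0, EENTER, 0, 0, &run);

          /*
           * On an exception the vDSO fills in the exception fields of 'run'
           * instead of letting the process take a signal.
           */
          if (ret || run.exception_vector)
                  return -1;

          return 0;
  }
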
ksgxd
=====

SGX support includes a kernel thread called *ksgxd*.

EPC sanitization
----------------

ksgxd is started when SGX initializes.  Enclave memory is typically ready
for use when the processor powers on or resets.  However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state.  This might
occur after a crash and kexec() cycle, for instance.  At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.

The sanitization is done by going through the EPC address space and applying
the EREMOVE function to each physical page. Some enclave pages like SECS pages
have hardware dependencies on other pages which prevent EREMOVE from
functioning. Executing two EREMOVE passes removes the dependencies.

Page reclaimer
--------------

Similar to the core kswapd, ksgxd is responsible for managing the
overcommitment of enclave memory.  If the system runs out of enclave memory,
*ksgxd* “swaps” enclave memory to normal memory.

Launch Control
==============

SGX provides a launch control mechanism. After all enclave pages have been
copied, the kernel executes the EINIT function, which initializes the enclave.
Only after this can the CPU execute inside the enclave.

The EINIT function takes an RSA-3072 signature of the enclave measurement.  The
function checks that the measurement is correct and that the signature was
created with the key whose SHA256 hash is held in the four
**IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs.

Those MSRs can be configured by the BIOS to be either readable or writable.
Linux supports only the writable configuration in order to give the kernel full
control over the launch control policy. Before calling the EINIT function, the
driver sets the MSRs to match the enclave's signing key.

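
To illustrate the relationship between the signing key and the MSR contents,
the fragment below shows one plausible way the 256-bit hash maps onto the four
64-bit MSR values, assuming the digest is the SHA256 of the signer's public key
modulus computed elsewhere (the authoritative layout is defined by the SDM and
the driver's EINIT path)::

  #include <stdint.h>
  #include <string.h>

  /*
   * Illustrative only: split a 32-byte SHA256 digest into the four 64-bit
   * values written to IA32_SGXLEPUBKEYHASH0..3, hash word 0 holding the
   * first eight bytes of the digest.
   */
  static void lepubkeyhash_from_digest(const uint8_t digest[32], uint64_t msr[4])
  {
          for (int i = 0; i < 4; i++)
                  memcpy(&msr[i], digest + 8 * i, sizeof(msr[i]));
  }
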
Encryption engines
==================

In order to conceal the enclave data while it is out of the CPU package, the
memory controller has an encryption engine to transparently encrypt and decrypt
enclave memory.

In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to
encrypt pages leaving the CPU caches. MEE uses an n-ary Merkle tree with its
root in SRAM to maintain integrity of the encrypted data. This provides
integrity and anti-replay protection but does not scale to large memory sizes
because the time required to update the Merkle tree grows logarithmically in
relation to the memory size.

CPUs starting from Ice Lake use Total Memory Encryption (TME) in the place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay attacks are not mitigated.  However, TME includes
additional changes to prevent cipher text from being returned and SW memory
aliases from being created.

DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).

Usage Models
============

Shared Library
--------------

Sensitive data and the code that acts on it are partitioned from the application
into a separate library. The library is then linked as a DSO which can be loaded
into an enclave. The application can then make individual function calls into
the enclave through special SGX instructions. A run-time within the enclave is
configured to marshal function parameters into and out of the enclave and to
call the correct library function.

Application Container
---------------------

An application may be loaded into a container enclave which is specially
configured with a library OS and run-time which permits the application to run.
The enclave run-time and library OS work together to execute the application
when a thread enters the enclave.

Impact of Potential Kernel SGX Bugs
===================================

EPC leaks
---------

When EPC page leaks happen, a WARNING like this is shown in dmesg:

"EREMOVE returned ... and an EPC page was leaked.  SGX may become unusable..."

This is effectively a kernel use-after-free of an EPC page, and due
to the way SGX works, the bug is detected at freeing. Rather than
adding the page back to the pool of available EPC pages, the kernel
intentionally leaks the page to avoid additional errors in the future.

When this happens, the kernel will likely soon leak more EPC pages, and
SGX will likely become unusable because the memory available to SGX is
limited. However, while this may be fatal to SGX, the rest of the kernel
is unlikely to be impacted and should continue to work.

As a result, when this happens, the user should stop running any new
SGX workloads (or just any new workloads), and migrate all valuable
workloads. Although a machine reboot can recover all EPC memory, the bug
should be reported to Linux developers.


Virtual EPC
===========

The implementation also has a virtual EPC driver to support SGX enclaves
in guests. Unlike the SGX driver, an EPC page allocated by the virtual
EPC driver doesn't have a specific enclave associated with it. This is
because KVM doesn't track how a guest uses EPC pages.

As a result, the SGX core page reclaimer doesn't support reclaiming EPC
pages allocated to KVM guests through the virtual EPC driver. If the
user wants to deploy SGX applications both on the host and in guests
on the same machine, the user should reserve enough EPC (by subtracting
the total virtual EPC size of all SGX VMs from the physical EPC size) for
host SGX applications so they can run with acceptable performance.

The architectural behavior is to restore all EPC pages to an uninitialized
state also after a guest reboot.  Because this state can be reached only
through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
on all pages in the virtual EPC.

``EREMOVE`` can fail for three reasons.  Userspace must pay attention
to expected failures and handle them as follows:

1. Page removal will always fail when any thread is running in the
   enclave to which the page belongs.  In this case the ioctl will
   return ``EBUSY`` independent of whether it has successfully removed
   some pages; userspace can avoid these failures by preventing execution
   of any vcpu which maps the virtual EPC.

2. Page removal will cause a general protection fault if two calls to
   ``EREMOVE`` happen concurrently for pages that refer to the same
   "SECS" metadata pages.  This can happen if there are concurrent
   invocations to ``SGX_IOC_VEPC_REMOVE_ALL``, or if a ``/dev/sgx_vepc``
   file descriptor in the guest is closed at the same time as
   ``SGX_IOC_VEPC_REMOVE_ALL``; it will also be reported as ``EBUSY``.
   This can be avoided in userspace by serializing calls to the ioctl()
   and to close(), but in general it should not be a problem.

3. Finally, page removal will fail for SECS metadata pages which still
   have child pages.  Child pages can be removed by executing
   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
   mapped into the guest.  This means that the ioctl() must be called
   twice (see the sketch below): an initial set of calls to remove child
   pages and a subsequent set of calls to remove SECS pages.  The second
   set of calls is only required for those mappings that returned a nonzero
   value from the first call.  It indicates a bug in the kernel or the
   userspace client if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL``
   calls has a return code other than 0.
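
Putting these rules together, a VMM can reset guest EPC roughly as in the
sketch below.  It assumes all vcpus have been paused and that the VMM keeps
the ``/dev/sgx_vepc`` descriptors mapped into the guest in an array; the
helper name and the retry policy are illustrative::

  #include <sys/ioctl.h>
  #include <asm/sgx.h>    /* SGX_IOC_VEPC_REMOVE_ALL */

  /*
   * Illustrative two-pass removal: the first pass removes child pages, the
   * second pass removes the SECS pages whose children were freed by the
   * first pass.
   */
  static int vepc_remove_all(const int *fds, int nfds)
  {
          int first_ret[nfds];
          int i;

          for (i = 0; i < nfds; i++)
                  first_ret[i] = ioctl(fds[i], SGX_IOC_VEPC_REMOVE_ALL);

          for (i = 0; i < nfds; i++) {
                  if (!first_ret[i])
                          continue;       /* everything already removed */

                  /* A nonzero result here indicates a kernel or VMM bug. */
                  if (ioctl(fds[i], SGX_IOC_VEPC_REMOVE_ALL))
                          return -1;
          }

          return 0;
  }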