===============================================
Memory Tagging Extension (MTE) in AArch64 Linux
===============================================

Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
         Catalin Marinas <catalin.marinas@arm.com>

Date: 2020-02-25

This document describes the provision of the Memory Tagging Extension
functionality in AArch64 Linux.

Introduction
============

ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
(Top Byte Ignore) feature and allows software to access a 4-bit
allocation tag for each 16-byte granule in the physical address space.
Such a memory range must be mapped with the Normal-Tagged memory
attribute. A logical tag is derived from bits 59-56 of the virtual
address used for the memory access. A CPU with MTE enabled will compare
the logical tag against the allocation tag and potentially raise an
exception on mismatch, subject to system registers configuration.
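
The tag layout described above can be illustrated with plain integer
arithmetic. The following is a minimal sketch, not kernel code: the
macro and function names are hypothetical, and the comparison only
models what the hardware does on a checked access.

```c
#include <stdint.h>

/* Illustrative names (assumptions), not taken from a kernel header. */
#define MTE_TAG_SHIFT   56
#define MTE_TAG_MASK    (UINT64_C(0xf) << MTE_TAG_SHIFT)

/* Logical tag carried in bits 59-56 of a virtual address. */
static inline unsigned int mte_logical_tag(uint64_t addr)
{
        return (unsigned int)((addr & MTE_TAG_MASK) >> MTE_TAG_SHIFT);
}

/* Return addr with its logical tag replaced by the low 4 bits of tag. */
static inline uint64_t mte_with_tag(uint64_t addr, unsigned int tag)
{
        return (addr & ~MTE_TAG_MASK) |
               ((uint64_t)(tag & 0xf) << MTE_TAG_SHIFT);
}

/* Model of the check: does the logical tag equal the allocation tag? */
static inline int mte_tags_match(uint64_t addr, unsigned int alloc_tag)
{
        return mte_logical_tag(addr) == (alloc_tag & 0xf);
}
```

On real hardware the allocation tag is read and written with dedicated
instructions such as ``ldg`` and ``stg``; the helpers here only
manipulate the pointer bits.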

Userspace Support
=================

When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
supported by the hardware, the kernel advertises the feature to
userspace via ``HWCAP2_MTE``.

PROT_MTE
--------

To access the allocation tags, a user process must enable the Tagged
memory attribute on an address range using a new ``prot`` flag for
``mmap()`` and ``mprotect()``:

``PROT_MTE`` - Pages allow access to the MTE allocation tags.

The allocation tag is set to 0 when such pages are first mapped in the
user address space and preserved on copy-on-write. ``MAP_SHARED`` is
supported and the allocation tags can be shared between processes.

**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
types of mapping will result in ``-EINVAL`` returned by these system
calls.

**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
be cleared by ``mprotect()``.

**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
point after the system call.

Tag Check Faults
----------------

When ``PROT_MTE`` is enabled on an address range and a mismatch between
the logical and allocation tags occurs on access, there are four
configurable behaviours:

- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
  tag check fault.

- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
  by the offending thread, the containing process is terminated with a
  ``coredump``.

- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
  thread, asynchronously following one or multiple tag check faults,
  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
  address is unknown).

- *Asymmetric* - Reads are handled as for synchronous mode while writes
  are handled as for asynchronous mode.
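
A ``SIGSEGV`` handler can tell the synchronous and asynchronous cases
apart by inspecting ``si_code``. The sketch below shows only that
classification step. The enum and function names are hypothetical, and
the fallback ``SEGV_MTE*`` values are assumed to match
include/uapi/asm-generic/siginfo.h for C libraries whose headers do not
yet define them.

```c
#include <signal.h>

/* Fallback values, assumed to match the kernel uapi header. */
#ifndef SEGV_MTEAERR
#define SEGV_MTEAERR 8  /* asynchronous MTE error */
#endif
#ifndef SEGV_MTESERR
#define SEGV_MTESERR 9  /* synchronous MTE fault */
#endif

/* Hypothetical classification used for illustration only. */
enum mte_fault_kind {
        MTE_FAULT_NONE,         /* not an MTE tag check fault */
        MTE_FAULT_SYNC,         /* si_addr holds the faulting address */
        MTE_FAULT_ASYNC,        /* si_addr is 0, address unknown */
};

/* Classify a signal from its number and si_code. */
static inline enum mte_fault_kind mte_classify_fault(int signo, int si_code)
{
        if (signo != SIGSEGV)
                return MTE_FAULT_NONE;
        if (si_code == SEGV_MTESERR)
                return MTE_FAULT_SYNC;
        if (si_code == SEGV_MTEAERR)
                return MTE_FAULT_ASYNC;
        return MTE_FAULT_NONE;
}
```

A handler installed with ``SA_SIGINFO`` would typically call this with
``info->si_signo`` and ``info->si_code`` and consult ``info->si_addr``
only in the synchronous case.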

The user can select the above modes, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
contains any number of the following values in the ``PR_MTE_TCF_MASK``
bit-field:

- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
                         (ignored if combined with other options)
- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

If no modes are specified, tag check faults are ignored. If a single
mode is specified, the program will run in that mode. If multiple
modes are specified, the mode is selected as described in the "Per-CPU
preferred tag checking mode" section below.
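
As a rough illustration of how the ``flags`` word is laid out, the
sketch below composes and decodes it without issuing the system call.
The ``PR_*`` constants are the ones from include/uapi/linux/prctl.h;
the three helper functions are hypothetical names introduced for this
example.

```c
/* From include/uapi/linux/prctl.h */
#define PR_TAGGED_ADDR_ENABLE   (1UL << 0)
#define PR_MTE_TCF_SHIFT        1
#define PR_MTE_TCF_NONE         (0UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_SYNC         (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC        (2UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_MASK         (3UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TAG_SHIFT        3
#define PR_MTE_TAG_MASK         (0xffffUL << PR_MTE_TAG_SHIFT)

/* Compose a flags value: tagged address ABI on, chosen modes and tags. */
static inline unsigned long mte_ctrl_flags(int sync, int async,
                                           unsigned long include_mask)
{
        unsigned long flags = PR_TAGGED_ADDR_ENABLE;

        if (sync)
                flags |= PR_MTE_TCF_SYNC;
        if (async)
                flags |= PR_MTE_TCF_ASYNC;
        flags |= (include_mask << PR_MTE_TAG_SHIFT) & PR_MTE_TAG_MASK;
        return flags;
}

/* Requested tag check fault modes, still in PR_MTE_TCF_* encoding. */
static inline unsigned long mte_ctrl_tcf(unsigned long ctrl)
{
        return ctrl & PR_MTE_TCF_MASK;
}

/* Include mask of tags that IRG, ADDG and SUBG may generate. */
static inline unsigned long mte_ctrl_include_mask(unsigned long ctrl)
{
        return (ctrl & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT;
}
```

The value returned by ``mte_ctrl_flags()`` is what would be passed as
the second argument of ``prctl(PR_SET_TAGGED_ADDR_CTRL, ...)``.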

The current tag check fault configuration can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. If
multiple modes were requested then all will be reported.

Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.

**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
``sigreturn()``.

**Note**: There are no *match-all* logical tags available for user
applications.

**Note**: Kernel accesses to the user address space (e.g. ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses, however it cannot always guarantee it. Kernel accesses
to user addresses are always performed with an effective ``PSTATE.TCO``
value of zero, regardless of the user configuration.

Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
-----------------------------------------------------------------

The architecture allows certain tags to be excluded from random
generation via the ``GCR_EL1.Exclude`` register bit-field. By default,
Linux excludes all tags other than 0. A user thread can enable specific
tags in the randomly generated set using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field.

**Note**: The hardware uses an exclude mask but the ``prctl()``
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.
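
The relationship between the two masks is a 16-bit complement. The
following sketch makes that explicit; the function names are
hypothetical and the code does not touch ``GCR_EL1``, it only models
the conversion described in the note above.

```c
#include <stdint.h>

/* Exclude mask corresponding to a prctl() include mask (16 tag bits). */
static inline uint16_t mte_exclude_from_include(uint16_t include)
{
        return (uint16_t)~include;
}

/* Non-zero if the given tag (0-15) is in the include mask. */
static inline int mte_tag_included(uint16_t include, unsigned int tag)
{
        return (include >> (tag & 0xf)) & 1;
}
```

For example, the include mask ``0xfffe`` (all non-zero tags) maps to
the exclude mask ``0x0001``, so only tag 0 is kept out of the randomly
generated set.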

Per-CPU preferred tag checking mode
-----------------------------------

On some CPUs the performance of MTE in stricter tag checking modes
is similar to that of less strict tag checking modes. This makes it
worthwhile to enable stricter checks on those CPUs when a less strict
checking mode is requested, in order to gain the error detection
benefits of the stricter checks without the performance downsides. To
support this scenario, a privileged user may configure a stricter
tag checking mode as the CPU's preferred tag checking mode.

The preferred tag checking mode for each CPU is controlled by
``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
privileged user may write the value ``async``, ``sync`` or ``asymm``. The
default preferred mode for each CPU is ``async``.
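
A program that wants to inspect the current setting could read the
sysfs file. The sketch below builds the path from a CPU number and
reads one line from it. The function names are hypothetical, and the
assumption that the file yields the mode name followed by a newline is
not stated in this document.

```c
#include <stdio.h>
#include <string.h>

/* Build the sysfs path for a CPU. Returns 0, or -1 if buf is too small. */
static inline int mte_tcf_preferred_path(unsigned int cpu, char *buf,
                                         size_t len)
{
        int n = snprintf(buf, len,
                         "/sys/devices/system/cpu/cpu%u/mte_tcf_preferred",
                         cpu);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read the preferred mode of a CPU into mode. Returns 0 on success and
 * -1 if the file is missing (no MTE support) or cannot be read.
 */
static inline int mte_read_tcf_preferred(unsigned int cpu, char *mode,
                                         size_t len)
{
        char path[64];
        FILE *f;

        if (len == 0 || mte_tcf_preferred_path(cpu, path, sizeof(path)))
                return -1;
        f = fopen(path, "r");
        if (!f)
                return -1;
        if (!fgets(mode, (int)len, f)) {
                fclose(f);
                return -1;
        }
        fclose(f);
        mode[strcspn(mode, "\n")] = '\0';       /* drop trailing newline */
        return 0;
}
```

Writing a new value requires privileges and would use the same path
opened for writing.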

To allow a program to potentially run in the CPU's preferred tag
checking mode, the user program may set multiple tag check fault mode
bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
flags, 0, 0, 0)`` system call. If both synchronous and asynchronous
modes are requested then asymmetric mode may also be selected by the
kernel. If the CPU's preferred tag checking mode is in the task's set
of provided tag checking modes, that mode will be selected. Otherwise,
the kernel selects one of the modes in the task's mode set using the
preference order:

	1. Asynchronous
	2. Asymmetric
	3. Synchronous

Note that there is no way for userspace to request multiple modes and
also disable asymmetric mode.
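
The selection rule above can be modelled in a few lines. This is a
sketch of the rule as described in this section, not the kernel's
implementation; the ``MTE_MODE_*`` constants and the function name are
hypothetical, and ``cpu_preferred`` is assumed to be exactly one mode.

```c
/* Hypothetical mode bits for illustration. */
enum {
        MTE_MODE_NONE  = 0,
        MTE_MODE_SYNC  = 1 << 0,
        MTE_MODE_ASYNC = 1 << 1,
        MTE_MODE_ASYMM = 1 << 2,
};

/*
 * requested: any combination of MTE_MODE_SYNC and MTE_MODE_ASYNC.
 * cpu_preferred: the single mode preferred by the current CPU.
 */
static inline unsigned int mte_resolve_mode(unsigned int requested,
                                            unsigned int cpu_preferred)
{
        const unsigned int both = MTE_MODE_SYNC | MTE_MODE_ASYNC;
        unsigned int set = requested & both;

        /* Requesting sync and async also makes asymmetric eligible. */
        if (set == both)
                set |= MTE_MODE_ASYMM;
        if (!set)
                return MTE_MODE_NONE;
        if (set & cpu_preferred)
                return cpu_preferred;

        /* Fallback preference order: async, asymmetric, synchronous. */
        if (set & MTE_MODE_ASYNC)
                return MTE_MODE_ASYNC;
        if (set & MTE_MODE_ASYMM)
                return MTE_MODE_ASYMM;
        return MTE_MODE_SYNC;
}
```

For instance, a task requesting only synchronous mode keeps it even on
a CPU that prefers ``async``, because the preferred mode is not in the
task's set.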

Initial process state
---------------------

On ``execve()``, the new process has the following configuration:

- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
- No tag checking modes are selected (tag check faults ignored)
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps

On ``fork()``, the new process inherits the parent's configuration and
memory map attributes with the exception of the ``madvise()`` ranges
with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
to 0).

The ``ptrace()`` interface
--------------------------

``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
tags from, or write tags to, a tracee's address space. The
``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
data)`` where:

- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
- ``pid`` - the tracee's PID.
- ``addr`` - address in the tracee's address space.
- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
  a buffer of ``iov_len`` length in the tracer's address space.

The tags in the tracer's ``iov_base`` buffer are represented as one
4-bit tag per byte, each corresponding to a 16-byte MTE tag granule in
the tracee's address space.

**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
will use the corresponding aligned address.
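
Since each byte of the tracer's buffer describes one 16-byte granule, a
tracer has to size the buffer in granules rather than bytes. The sketch
below shows that arithmetic; the macro and function names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE 16     /* bytes covered by one allocation tag */

/* Round an address down to the start of its tag granule. */
static inline uint64_t mte_granule_align(uint64_t addr)
{
        return addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
}

/*
 * Number of granules touched by [addr, addr + len), which is also the
 * iov_len needed to hold one tag byte per granule.
 */
static inline size_t mte_tags_for_range(uint64_t addr, size_t len)
{
        uint64_t first, last;

        if (len == 0)
                return 0;
        first = mte_granule_align(addr);
        last = mte_granule_align(addr + len - 1);
        return (size_t)((last - first) / MTE_GRANULE_SIZE) + 1;
}
```

An unaligned 32-byte range therefore spans three granules, while a
page-aligned 4K page spans 256.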

``ptrace()`` return value:

- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
  number of tags transferred. This may be smaller than the requested
  ``iov_len`` if the requested address range in the tracee's or the
  tracer's space cannot be accessed or does not have valid tags.
- ``-EPERM`` - the specified process cannot be traced.
- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
  address) and no tags copied. ``iov_len`` not updated.
- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.

**Note**: There are no transient errors for the requests above, so user
programs should not retry in case of a non-zero system call return.
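
A tracer-side wrapper might look like the sketch below. It assumes the
request number 33 from arch/arm64/include/uapi/asm/ptrace.h when the C
library headers do not define ``PTRACE_PEEKMTETAGS``, and it assumes
the usual C library convention of returning -1 with ``errno`` set. The
function name is hypothetical, and the call is only meaningful on an
arm64 kernel with MTE support.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Assumed value from the arm64 uapi header, for older libc headers. */
#ifndef PTRACE_PEEKMTETAGS
#define PTRACE_PEEKMTETAGS 33
#endif

/*
 * Read up to count tags (one per byte) starting at addr in the tracee.
 * Returns the number of tags transferred, or a negative errno value.
 * Per the note above, callers should not retry on failure.
 */
static inline long mte_peek_tags(pid_t pid, void *addr,
                                 unsigned char *tags, size_t count)
{
        struct iovec iov = { .iov_base = tags, .iov_len = count };

        if (ptrace(PTRACE_PEEKMTETAGS, pid, addr, &iov) == -1)
                return -errno;
        return (long)iov.iov_len;       /* updated by the kernel */
}
```

The returned count may be smaller than ``count`` when only part of the
range has valid tags, so callers should use the return value rather
than the requested length.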

``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with
``addr == NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the
tagged address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).

Core dump support
-----------------

The allocation tags for user memory mapped with ``PROT_MTE`` are dumped
in the core file as additional ``PT_AARCH64_MEMTAG_MTE`` segments. The
program header for such a segment is defined as:

:``p_type``: ``PT_AARCH64_MEMTAG_MTE``
:``p_flags``: 0
:``p_offset``: segment file offset
:``p_vaddr``: segment virtual address, same as the corresponding
  ``PT_LOAD`` segment
:``p_paddr``: 0
:``p_filesz``: segment size in file, calculated as ``p_memsz / 32``
  (two 4-bit tags cover 32 bytes of memory)
:``p_memsz``: segment size in memory, same as the corresponding
  ``PT_LOAD`` segment
:``p_align``: 0

The tags are stored in the core file at ``p_offset`` as two 4-bit tags
in a byte. With the tag granule of 16 bytes, a 4K page requires 128
bytes in the core file.
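
A tool reading such a segment needs the size calculation and a way to
unpack the tags. The sketch below shows both. The function names are
hypothetical, and the nibble order (low nibble holding the tag of the
lower-addressed granule) is an assumption that this document does not
state.

```c
#include <stddef.h>
#include <stdint.h>

/* File size of a tag segment covering memsz bytes of memory. */
static inline size_t mte_core_filesz(size_t memsz)
{
        return memsz / 32;      /* one byte holds tags for 2 * 16 bytes */
}

/*
 * Expand nbytes of packed tags into 2 * nbytes tags, one per byte.
 * Assumption: the low nibble belongs to the lower-addressed granule.
 */
static inline void mte_unpack_tags(const uint8_t *packed, size_t nbytes,
                                   uint8_t *tags)
{
        size_t i;

        for (i = 0; i < nbytes; i++) {
                tags[2 * i] = packed[i] & 0xf;
                tags[2 * i + 1] = packed[i] >> 4;
        }
}
```

With these helpers, a 4K page yields 128 packed bytes that expand to
256 tags, one for each 16-byte granule.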

Example of correct usage
========================

*MTE Example code*

.. code-block:: c

    /*
     * To be compiled with -march=armv8.5-a+memtag
     */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    /*
     * From arch/arm64/include/uapi/asm/hwcap.h
     */
    #define HWCAP2_MTE              (1 << 18)

    /*
     * From arch/arm64/include/uapi/asm/mman.h
     */
    #define PROT_MTE                 0x20

    /*
     * From include/uapi/linux/prctl.h
     */
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_GET_TAGGED_ADDR_CTRL 56
    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
    # define PR_MTE_TCF_SHIFT       1
    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TAG_SHIFT       3
    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)

    /*
     * Insert a random logical tag into the given pointer.
     */
    #define insert_random_tag(ptr) ({                       \
            uint64_t __val;                                 \
            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
            __val;                                          \
    })

    /*
     * Set the allocation tag on the destination address.
     */
    #define set_tag(tagged_addr) do {                                      \
            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
    } while (0)

    int main(void)
    {
            unsigned char *a;
            unsigned long page_sz = sysconf(_SC_PAGESIZE);
            unsigned long hwcap2 = getauxval(AT_HWCAP2);

            /* check if MTE is present */
            if (!(hwcap2 & HWCAP2_MTE))
                    return EXIT_FAILURE;

            /*
             * Enable the tagged address ABI, synchronous or asynchronous MTE
             * tag check faults (based on per-CPU preference) and allow all
             * non-zero tags in the randomly generated set.
             */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC |
                      (0xfffe << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl() failed");
                    return EXIT_FAILURE;
            }

            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a == MAP_FAILED) {
                    perror("mmap() failed");
                    return EXIT_FAILURE;
            }

            /*
             * Enable MTE on the above anonymous mmap. The flag could be passed
             * directly to mmap() and skip this step.
             */
            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
                    perror("mprotect() failed");
                    return EXIT_FAILURE;
            }

            /* access with the default tag (0) */
            a[0] = 1;
            a[1] = 2;

            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /* set the logical and allocation tags */
            a = (unsigned char *)insert_random_tag(a);
            set_tag(a);

            printf("%p\n", a);

            /* non-zero tag access */
            a[0] = 3;
            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /*
             * If MTE is enabled correctly the next instruction will generate an
             * exception.
             */
            printf("Expecting SIGSEGV...\n");
            a[16] = 0xdd;

            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
            printf("...haven't got one\n");

            return EXIT_FAILURE;
    }