cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux
Log | Files | Refs | README | LICENSE | sfeed.txt

kernel_mode_neon.rst (5757B)


      1================
      2Kernel mode NEON
      3================
      4
      5TL;DR summary
      6-------------
      7* Use only NEON instructions, or VFP instructions that don't rely on support
      8  code
      9* Isolate your NEON code in a separate compilation unit, and compile it with
     10  '-march=armv7-a -mfpu=neon -mfloat-abi=softfp'
     11* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
     12  NEON code
     13* Don't sleep in your NEON code, and be aware that it will be executed with
     14  preemption disabled
     15
     16
     17Introduction
     18------------
     19It is possible to use NEON instructions (and in some cases, VFP instructions) in
     20code that runs in kernel mode. However, for performance reasons, the NEON/VFP
     21register file is not preserved and restored at every context switch or taken
     22exception like the normal register file is, so some manual intervention is
     23required. Furthermore, special care is required for code that may sleep [i.e.,
     24may call schedule()], as NEON or VFP instructions will be executed in a
     25non-preemptible section for reasons outlined below.
     26
     27
     28Lazy preserve and restore
     29-------------------------
     30The NEON/VFP register file is managed using lazy preserve (on UP systems) and
     31lazy restore (on both SMP and UP systems). This means that the register file is
     32kept 'live', and is only preserved and restored when multiple tasks are
     33contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
     34another core). Lazy restore is implemented by disabling the NEON/VFP unit after
     35every context switch, resulting in a trap when subsequently a NEON/VFP
     36instruction is issued, allowing the kernel to step in and perform the restore if
     37necessary.
     38
     39Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
     40it is required to do an 'eager' preserve of the NEON/VFP register file, and
     41enable the NEON/VFP unit explicitly so no exceptions are generated on first
     42subsequent use. This is handled by the function kernel_neon_begin(), which
     43should be called before any kernel mode NEON or VFP instructions are issued.
     44Likewise, the NEON/VFP unit should be disabled again after use to make sure user
     45mode will hit the lazy restore trap upon next use. This is handled by the
     46function kernel_neon_end().
     47
     48
     49Interruptions in kernel mode
     50----------------------------
     51For reasons of performance and simplicity, it was decided that there shall be no
     52preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
     53implies that interruptions of a kernel mode NEON section can only be allowed if
     54they are guaranteed not to touch the NEON/VFP registers. For this reason, the
     55following rules and restrictions apply in the kernel:
     56* NEON/VFP code is not allowed in interrupt context;
     57* NEON/VFP code is not allowed to sleep;
     58* NEON/VFP code is executed with preemption disabled.
     59
     60If latency is a concern, it is possible to put back to back calls to
     61kernel_neon_end() and kernel_neon_begin() in places in your code where none of
     62the NEON registers are live. (Additional calls to kernel_neon_begin() should be
     63reasonably cheap if no context switch occurred in the meantime)
     64
     65
     66VFP and support code
     67--------------------
     68Earlier versions of VFP (prior to version 3) rely on software support for things
     69like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
     70software assistance, it signals the kernel by raising an undefined instruction
     71exception. The kernel responds by inspecting the VFP control registers and the
     72current instruction and arguments, and emulates the instruction in software.
     73
     74Such software assistance is currently not implemented for VFP instructions
     75executed in kernel mode. If such a condition is encountered, the kernel will
     76fail and generate an OOPS.
     77
     78
     79Separating NEON code from ordinary code
     80---------------------------------------
     81The compiler is not aware of the special significance of kernel_neon_begin() and
     82kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
     83between calls to these respective functions. Furthermore, GCC may generate NEON
     84instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
     85kernel is currently compiled at -O2, future changes may result in NEON/VFP
     86instructions appearing in unexpected places if no special care is taken.
     87
     88Therefore, the recommended and only supported way of using NEON/VFP in the
     89kernel is by adhering to the following rules:
     90
     91* isolate the NEON code in a separate compilation unit and compile it with
     92  '-march=armv7-a -mfpu=neon -mfloat-abi=softfp';
     93* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
     94  into the unit containing the NEON code from a compilation unit which is *not*
     95  built with the GCC flag '-mfpu=neon' set.
     96
     97As the kernel is compiled with '-msoft-float', the above will guarantee that
     98both NEON and VFP instructions will only ever appear in designated compilation
     99units at any optimization level.
    100
    101
    102NEON assembler
    103--------------
    104NEON assembler is supported with no additional caveats as long as the rules
    105above are followed.
    106
    107
    108NEON code generated by GCC
    109--------------------------
    110The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
    111parallelism, and generates NEON code from ordinary C source code. This is fully
    112supported as long as the rules above are followed.
    113
    114
    115NEON intrinsics
    116---------------
    117NEON intrinsics are also supported. However, as code using NEON intrinsics
    118relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
    119observe the following in addition to the rules above:
    120
    121* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
    122  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
    123  does not supply);
    124* Include <arm_neon.h> last, or at least after <linux/types.h>