cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

reliable-stacktrace.rst (12747B)


===================
Reliable Stacktrace
===================

This document outlines basic information about reliable stacktracing.

.. Table of Contents:

.. contents:: :local:

1. Introduction
===============

The kernel livepatch consistency model relies on accurately identifying which
functions may have live state and therefore may not be safe to patch. One way
to identify which functions are live is to use a stacktrace.

Existing stacktrace code may not always give an accurate picture of all
functions with live state, and best-effort approaches which can be helpful for
debugging are unsound for livepatching. Livepatching depends on architectures
to provide a *reliable* stacktrace which ensures it never omits any live
functions from a trace.


2. Requirements
===============

Architectures must implement one of the reliable stacktrace functions.
Architectures using CONFIG_ARCH_STACKWALK must implement
'arch_stack_walk_reliable', and other architectures must implement
'save_stack_trace_tsk_reliable'.
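
For reference, these interfaces have roughly the following shape (abridged
from 'include/linux/stacktrace.h'; exact signatures may vary between kernel
versions):

.. code-block:: c

   /*
    * With CONFIG_ARCH_STACKWALK: walk the task's stack, passing each
    * return address to consume_entry(). Return 0 only if the resulting
    * trace is reliable.
    */
   int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
                                void *cookie, struct task_struct *task);

   /*
    * Without CONFIG_ARCH_STACKWALK: fill trace->entries for the given
    * task. Return 0 only if the resulting trace is reliable.
    */
   int save_stack_trace_tsk_reliable(struct task_struct *tsk,
                                     struct stack_trace *trace);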

Principally, the reliable stacktrace function must ensure that either:

* The trace includes all functions that the task may be returned to, and the
  return code is zero to indicate that the trace is reliable.

* The return code is non-zero to indicate that the trace is not reliable.

.. note::
   In some cases it is legitimate to omit specific functions from the trace,
   but all other functions must be reported. These cases are described in
   further detail below.

Secondly, the reliable stacktrace function must be robust to cases where
the stack or other unwind state is corrupt or otherwise unreliable. The
function should attempt to detect such cases and return a non-zero error
code, and should not get stuck in an infinite loop or access memory in
an unsafe way. Specific cases are described in further detail below.
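
To illustrate these requirements, a minimal and deliberately simplified
sketch of a CONFIG_ARCH_STACKWALK implementation is shown below. The
'struct unwind_state' type and the unwind_*() helpers stand in for whatever
the architecture's unwinder actually provides and are not a real interface;
the points being illustrated are the bounded walk, the validity checks, and
the zero/non-zero return convention:

.. code-block:: c

   int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
                                void *cookie, struct task_struct *task)
   {
           struct unwind_state state;
           unsigned long addr;

           for (unwind_start(&state, task); !unwind_done(&state);
                unwind_next_frame(&state)) {
                   /*
                    * Detect corrupt or unexpected unwind state and bail
                    * out rather than looping or reading unsafe memory.
                    */
                   if (unwind_error(&state))
                           return -EINVAL;

                   addr = unwind_get_return_address(&state);

                   /* Reject anything that is not known-unwindable code. */
                   if (!__kernel_text_address(addr))
                           return -EINVAL;

                   if (!consume_entry(cookie, addr))
                           return -EINVAL;
           }

           /* Only a trace which terminated as expected is reliable. */
           if (!unwind_reached_expected_end(&state))
                   return -EINVAL;

           return 0;
   }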


3. Compile-time analysis
========================

To ensure that kernel code can be correctly unwound in all cases,
architectures may need to verify that code has been compiled in a manner
expected by the unwinder. For example, an unwinder may expect that
functions manipulate the stack pointer in a limited way, or that all
functions use specific prologue and epilogue sequences. Architectures
with such requirements should verify the kernel compilation using
objtool.

In some cases, an unwinder may require metadata to correctly unwind.
Where necessary, this metadata should be generated at build time using
objtool.


4. Considerations
=================

The unwinding process varies across architectures, their respective procedure
call standards, and kernel configurations. This section describes common
details that architectures should consider.

4.1 Identifying successful termination
--------------------------------------

Unwinding may terminate early for a number of reasons, including:

* Stack or frame pointer corruption.

* Missing unwind support for an uncommon scenario, or a bug in the unwinder.

* Dynamically generated code (e.g. eBPF) or foreign code (e.g. EFI runtime
  services) not following the conventions expected by the unwinder.

To ensure that this does not result in functions being omitted from the trace,
even if not caught by other checks, it is strongly recommended that
architectures verify that a stacktrace ends at an expected location, e.g.

* Within a specific function that is an entry point to the kernel.

* At a specific location on a stack expected for a kernel entry point.

* On a specific stack expected for a kernel entry point (e.g. if the
  architecture has separate task and IRQ stacks).
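
A sketch of such a final-frame check is shown below; the helper name and the
expected-entry conditions are purely illustrative, since the correct checks
depend on the architecture's entry code and stack layout:

.. code-block:: c

   /* Hypothetical: the architecture's initial entry function. */
   extern void first_kernel_entry(void);

   /*
    * Illustrative only: returns true if the unwind terminated where a
    * kernel entry is expected.
    */
   static bool final_frame_ok(struct task_struct *task, unsigned long pc,
                              unsigned long sp)
   {
           /* The trace ended in the expected entry function... */
           if (pc == (unsigned long)first_kernel_entry)
                   return true;

           /* ...or at the expected location at the top of the task stack. */
           if (sp == (unsigned long)task_stack_page(task) + THREAD_SIZE)
                   return true;

           return false;
   }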

4.2 Identifying unwindable code
-------------------------------

Unwinding typically relies on code following specific conventions (e.g.
manipulating a frame pointer), but there can be code which may not follow these
conventions and may require special handling in the unwinder, e.g.

* Exception vectors and entry assembly.

* Procedure Linkage Table (PLT) entries and veneer functions.

* Trampoline assembly (e.g. ftrace, kprobes).

* Dynamically generated code (e.g. eBPF, optprobe trampolines).

* Foreign code (e.g. EFI runtime services).

To ensure that such cases do not result in functions being omitted from a
trace, it is strongly recommended that architectures positively identify code
which is known to be reliable to unwind from, and reject unwinding from all
other code.

Kernel code including modules and eBPF can be distinguished from foreign code
using '__kernel_text_address()'. Checking for this also helps to detect stack
corruption.

There are several ways an architecture may identify kernel code which is deemed
unreliable to unwind from, e.g.

* Placing such code into special linker sections, and rejecting unwinding from
  any code in these sections.

* Identifying specific portions of code using bounds information.
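
A sketch combining both approaches is shown below; '__kernel_text_address()'
is the real helper mentioned above, while the special section and its bounds
symbols are hypothetical examples of what an architecture might add:

.. code-block:: c

   /*
    * Hypothetical bounds, emitted by the architecture's linker script
    * around code which must not be unwound from.
    */
   extern char __unreliable_unwind_text_start[];
   extern char __unreliable_unwind_text_end[];

   static bool unwindable_pc(unsigned long pc)
   {
           /*
            * Reject foreign code (e.g. EFI runtime services) and obvious
            * stack corruption; kernel, module and eBPF text all pass.
            */
           if (!__kernel_text_address(pc))
                   return false;

           /*
            * Reject kernel code which is known-unreliable to unwind from,
            * e.g. entry assembly placed in a special section.
            */
           if (pc >= (unsigned long)__unreliable_unwind_text_start &&
               pc < (unsigned long)__unreliable_unwind_text_end)
                   return false;

           return true;
   }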

4.3 Unwinding across interrupts and exceptions
----------------------------------------------

At function call boundaries the stack and other unwind state is expected to be
in a consistent state suitable for reliable unwinding, but this may not be the
case part-way through a function. For example, during a function prologue or
epilogue a frame pointer may be transiently invalid, or during the function
body the return address may be held in an arbitrary general purpose register.
For some architectures this may change at runtime as a result of dynamic
instrumentation.

If an interrupt or other exception is taken while the stack or other unwind
state is in an inconsistent state, it may not be possible to reliably unwind,
and it may not be possible to identify whether such unwinding will be reliable.
See below for examples.

Architectures which cannot identify when it is reliable to unwind such cases
(or where it is never reliable) must reject unwinding across exception
boundaries. Note that it may be reliable to unwind across certain
exceptions (e.g. IRQ) but unreliable to unwind across other exceptions
(e.g. NMI).

Architectures which can identify when it is reliable to unwind such cases (or
have no such cases) should attempt to unwind across exception boundaries, as
doing so can prevent unnecessarily stalling livepatch consistency checks and
permits livepatch transitions to complete more quickly.
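
A sketch of the decision an unwinder has to make at such a boundary is shown
below; none of the helpers named here are an existing interface:

.. code-block:: c

   /*
    * Illustrative only. Called when the unwinder reaches a frame created
    * by exception entry. Returns 0 to continue unwinding past the saved
    * pt_regs, or -EINVAL if doing so would not be reliable.
    */
   static int unwind_exception_frame(struct unwind_state *state)
   {
           /*
            * Some exceptions (e.g. IRQ) may save enough state, at a
            * well-defined boundary, for unwinding to continue...
            */
           if (arch_exception_frame_is_reliable(state))
                   return unwind_through_regs(state);

           /*
            * ...others (e.g. NMI, or an exception taken mid-prologue)
            * cannot be unwound reliably.
            */
           return -EINVAL;
   }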

4.4 Rewriting of return addresses
---------------------------------

Some trampolines temporarily modify the return address of a function in order
to intercept when that function returns with a return trampoline, e.g.

* An ftrace trampoline may modify the return address so that function graph
  tracing can intercept returns.

* A kprobes (or optprobes) trampoline may modify the return address so that
  kretprobes can intercept returns.

When this happens, the original return address will not be in its usual
location. For trampolines which are not subject to live patching, where an
unwinder can reliably determine the original return address and no unwind state
is altered by the trampoline, the unwinder may report the original return
address in place of the trampoline and report this as reliable. Otherwise, an
unwinder must report these cases as unreliable.

Special care is required when identifying the original return address, as this
information is not in a consistent location for the duration of the entry
trampoline or return trampoline. For example, considering the x86_64
'return_to_handler' return trampoline:

.. code-block:: none

   SYM_CODE_START(return_to_handler)
           UNWIND_HINT_EMPTY
           subq  $24, %rsp

           /* Save the return values */
           movq %rax, (%rsp)
           movq %rdx, 8(%rsp)
           movq %rbp, %rdi

           call ftrace_return_to_handler

           movq %rax, %rdi
           movq 8(%rsp), %rdx
           movq (%rsp), %rax
           addq $24, %rsp
           JMP_NOSPEC rdi
   SYM_CODE_END(return_to_handler)

While the traced function runs, its return address on the stack points to
the start of return_to_handler, and the original return address is stored in
the task's cur_ret_stack. During this time the unwinder can find the return
address using ftrace_graph_ret_addr().
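
The sketch below shows how an unwinder might use this; it follows the pattern
used by several architectures, though the surrounding frame handling and the
exact ftrace_graph_ret_addr() signature vary between kernel versions:

.. code-block:: c

   extern void return_to_handler(void);   /* ftrace return trampoline */

   static unsigned long recover_return_address(struct task_struct *task,
                                               int *graph_idx,
                                               unsigned long pc,
                                               unsigned long *retp)
   {
           /*
            * If the on-stack return address has been rewritten to point
            * at return_to_handler, look up the original return address
            * on the task's fgraph return stack instead. If the lookup
            * fails, pc is returned unchanged and the caller should treat
            * the trace as unreliable.
            */
           if (pc == (unsigned long)return_to_handler)
                   pc = ftrace_graph_ret_addr(task, graph_idx, pc, retp);

           return pc;
   }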

When the traced function returns to return_to_handler, there is no longer a
return address on the stack, though the original return address is still stored
in the task's cur_ret_stack. Within ftrace_return_to_handler(), the original
return address is removed from cur_ret_stack and is transiently moved
arbitrarily by the compiler before being returned in rax. The return_to_handler
trampoline moves this into rdi before jumping to it.

Architectures might not always be able to unwind such sequences, such as when
ftrace_return_to_handler() has removed the address from cur_ret_stack, and the
location of the return address cannot be reliably determined.

It is recommended that architectures unwind cases where return_to_handler has
not yet been returned to, but architectures are not required to unwind from the
middle of return_to_handler and can report this as unreliable. Architectures
are not required to unwind from other trampolines which modify the return
address.
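
One way to implement this recommendation is sketched below: return addresses
which were rewritten to return_to_handler are recovered as described above,
while a task interrupted inside return_to_handler itself is simply reported
as unreliable. The size symbol is hypothetical:

.. code-block:: c

   extern void return_to_handler(void);
   extern unsigned long return_to_handler_size;   /* hypothetical */

   static bool in_return_to_handler(unsigned long pc)
   {
           return pc - (unsigned long)return_to_handler < return_to_handler_size;
   }

   /*
    * In the unwinder: if in_return_to_handler() is true for the task's
    * interrupted program counter, return -EINVAL rather than attempting
    * to unwind.
    */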

4.5 Obscuring of return addresses
---------------------------------

Some trampolines do not rewrite the return address in order to intercept
returns, but do transiently clobber the return address or other unwind state.

For example, the x86_64 implementation of optprobes patches the probed function
with a JMP instruction which targets the associated optprobe trampoline. When
the probe is hit, the CPU will branch to the optprobe trampoline, and the
address of the probed function is not held in any register or on the stack.

Similarly, the arm64 implementation of DYNAMIC_FTRACE_WITH_REGS patches traced
functions with the following:

.. code-block:: none

   MOV X9, X30
   BL <trampoline>

The MOV saves the link register (X30) into X9 to preserve the return address
before the BL clobbers the link register and branches to the trampoline. At the
start of the trampoline, the address of the traced function is in X9 rather
than the link register as would usually be the case.

Architectures must ensure that unwinders either reliably unwind such cases or
report the unwinding as unreliable.
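
A sketch of the "report as unreliable" option is shown below: the unwinder
refuses to unwind a task whose interrupted program counter lies within code
where the return address may be obscured. The helpers are hypothetical and
stand in for whatever range checks the architecture can actually perform:

.. code-block:: c

   /*
    * Illustrative only: returns true if a task interrupted at 'pc' may
    * have its return address held somewhere the unwinder cannot find it
    * (e.g. in X9 at the start of an arm64 ftrace trampoline, or nowhere
    * at all at the start of an x86_64 optprobe trampoline).
    */
   static bool pc_may_obscure_return_address(unsigned long pc)
   {
           return in_ftrace_trampoline(pc) || in_optprobe_trampoline(pc);
   }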

4.6 Link register unreliability
-------------------------------

On some other architectures, 'call' instructions place the return address into a
link register, and 'return' instructions consume the return address from the
link register without modifying the register. On these architectures software
must save the return address to the stack prior to making a function call. Over
the duration of a function call, the return address may be held in the link
register alone, on the stack alone, or in both locations.

Unwinders typically assume the link register is always live, but this
assumption can lead to unreliable stack traces. For example, consider the
following arm64 assembly for a simple function:

.. code-block:: none

   function:
           STP X29, X30, [SP, -16]!
           MOV X29, SP
           BL <other_function>
           LDP X29, X30, [SP], #16
           RET

At entry to the function, the link register (X30) points to the caller, and the
frame pointer (X29) points to the caller's frame including the caller's return
address. The first two instructions create a new stackframe and update the
frame pointer, and at this point the link register and the frame pointer both
describe this function's return address. A trace at this point may describe
this function twice, and if the function return is being traced, the unwinder
may consume two entries from the fgraph return stack rather than one entry.

The BL invokes 'other_function' with the link register pointing to this
function's LDP and the frame pointer pointing to this function's stackframe.
When 'other_function' returns, the link register is left pointing at the BL,
and so a trace at this point could result in 'function' appearing twice in the
backtrace.

Similarly, a function may deliberately clobber the LR, e.g.

.. code-block:: none

   caller:
           STP X29, X30, [SP, -16]!
           MOV X29, SP
           ADR LR, <callee>
           BLR LR
           LDP X29, X30, [SP], #16
           RET

The ADR places the address of 'callee' into the LR, before the BLR branches to
this address. If a trace is made immediately after the ADR, 'callee' will
appear to be the parent of 'caller', rather than the child.

Due to cases such as the above, it may only be possible to reliably consume a
link register value at a function call boundary. Architectures where this is
the case must reject unwinding across exception boundaries unless they can
reliably identify when the LR or stack value should be used (e.g. using
metadata generated by objtool).
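
As an entirely hypothetical sketch of the latter approach, per-instruction
metadata could record where the return address lives at any given point, and
the unwinder could consult it when an exception boundary is crossed; none of
the names below exist in current kernels:

.. code-block:: c

   static int reliable_ra_at_exception(struct pt_regs *regs, unsigned long *ra)
   {
           /*
            * Hypothetical objtool-generated lookup keyed by the
            * interrupted instruction address.
            */
           switch (ra_location_lookup(regs->pc)) {
           case RA_IN_LR:
                   *ra = regs->regs[30];   /* arm64 link register */
                   return 0;
           case RA_ON_STACK:
                   *ra = *(unsigned long *)(regs->sp + ra_offset_lookup(regs->pc));
                   return 0;
           default:
                   return -EINVAL;         /* cannot tell: unreliable */
           }
   }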