.. SPDX-License-Identifier: GPL-2.0

============
ORC unwinder
============

Overview
========

The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is
similar in concept to a DWARF unwinder.  The difference is that the
format of the ORC data is much simpler than DWARF, which in turn allows
the ORC unwinder to be much simpler and faster.

The ORC data consists of unwind tables which are generated by objtool.
They contain out-of-band data which is used by the in-kernel ORC
unwinder.  Objtool generates the ORC data by first doing compile-time
stack metadata validation (CONFIG_STACK_VALIDATION).  After analyzing
all the code paths of a .o file, it determines information about the
stack state at each instruction address in the file and outputs that
information to the .orc_unwind and .orc_unwind_ip sections.

The per-object ORC sections are combined at link time and are sorted and
post-processed at boot time.  The unwinder uses the resulting data to
correlate instruction addresses with their stack states at run time.
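
To make "stack state at an instruction address" more concrete, here is a
minimal sketch in C of a single unwind step.  It is an illustration
under stated assumptions, not the kernel's code: struct orc_sketch, its
two fields, struct unwind_regs and unwind_step() are hypothetical names
invented for this example, and the real ORC entry carries more
information than is shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base-register selector for this sketch. */
enum sketch_reg { SKETCH_REG_SP, SKETCH_REG_BP };

/* Hypothetical, simplified stack state for one instruction address. */
struct orc_sketch {
	int16_t sp_offset;      /* caller's SP = base register + sp_offset */
	enum sketch_reg sp_reg; /* register that sp_offset is relative to */
};

/* Register state the unwinder carries from frame to frame. */
struct unwind_regs {
	uintptr_t ip;
	uintptr_t sp;
	uintptr_t bp;
};

/*
 * Step from the current frame to its caller.  Assumes the return
 * address sits in the word just below the caller's stack pointer, as it
 * does after an x86 call instruction.  This sketch does not restore bp.
 */
static void unwind_step(struct unwind_regs *regs, const struct orc_sketch *orc)
{
	uintptr_t base = (orc->sp_reg == SKETCH_REG_SP) ? regs->sp : regs->bp;
	uintptr_t caller_sp = base + (uintptr_t)(intptr_t)orc->sp_offset;

	assert(caller_sp >= sizeof(uintptr_t));
	regs->ip = *(const uintptr_t *)(caller_sp - sizeof(uintptr_t));
	regs->sp = caller_sp;
}
```

Repeating this step, with a fresh table lookup for each new ip, would
walk the stack one frame at a time without executing any instrumentation
in the unwound functions themselves.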


ORC vs frame pointers
=====================

With frame pointers enabled, GCC adds instrumentation code to every
function in the kernel.  The kernel's .text size increases by about
3.2%, resulting in a broad kernel-wide slowdown.  Measurements by Mel
Gorman [1]_ have shown a slowdown of 5-10% for some workloads.

In contrast, the ORC unwinder has no effect on text size or runtime
performance, because the debuginfo is out of band.  So if you disable
frame pointers and enable the ORC unwinder, you get a nice performance
improvement across the board, and still have reliable stack traces.

Ingo Molnar says:

  "Note that it's not just a performance improvement, but also an
  instruction cache locality improvement: 3.2% .text savings almost
  directly transform into a similarly sized reduction in cache
  footprint. That can transform to even higher speedups for workloads
  whose cache locality is borderline."

Another benefit of ORC compared to frame pointers is that it can
reliably unwind across interrupts and exceptions.  Frame-pointer-based
unwinds can sometimes skip the caller of the interrupted function, if it
was a leaf function or if the interrupt hit before the frame pointer was
saved.

The main disadvantage of the ORC unwinder compared to frame pointers is
that it needs more memory to store the ORC unwind tables: roughly 2-4MB
depending on the kernel config.


ORC vs DWARF
============

ORC debuginfo's advantage over DWARF itself is that it's much simpler.
It gets rid of the complex DWARF CFI state machine and also gets rid of
the tracking of unnecessary registers.  This allows the unwinder to be
much simpler, meaning fewer bugs, which is especially important for
mission-critical oops code.

The simpler debuginfo format also enables the unwinder to be much faster
than DWARF, which is important for perf and lockdep.  In a basic
performance test by Jiri Slaby [2]_, the ORC unwinder was about 20x
faster than an out-of-tree DWARF unwinder.  (Note: That measurement was
taken before some performance tweaks were added, which doubled
performance, so the speedup over DWARF may be closer to 40x.)

The ORC data format does have a few downsides compared to DWARF.  ORC
unwind tables take up ~50% more RAM (+1.3MB on an x86 defconfig kernel)
than DWARF-based eh_frame tables.

Another potential downside is that, as GCC evolves, it's conceivable
that the ORC data may end up being *too* simple to describe the state of
the stack for certain optimizations.  But IMO this is unlikely because
GCC saves the frame pointer for any unusual stack adjustments it does,
so I suspect we'll really only ever need to keep track of the stack
pointer and the frame pointer between call frames.  But even if we do
end up having to track all the registers DWARF tracks, at least we will
still be able to control the format, e.g. no complex state machines.


ORC unwind table generation
===========================

The ORC data is generated by objtool.  With the existing compile-time
stack metadata validation feature, objtool already follows all code
paths, and so it already has all the information it needs to be able to
generate ORC data from scratch.  So it's an easy step to go from stack
validation to ORC data generation.

It should be possible to instead generate the ORC data with a simple
tool which converts DWARF to ORC data.  However, such a solution would
be incomplete due to the kernel's extensive use of asm, inline asm, and
special sections like exception tables.

That could be rectified by manually annotating those special code paths
using GNU assembler .cfi annotations in .S files, and homegrown
annotations for inline asm in .c files.  But asm annotations were tried
in the past and were found to be unmaintainable.  They were often
incorrect/incomplete and made the code harder to read and keep updated.
And based on looking at glibc code, annotating inline asm in .c files
might be even worse.

Objtool still needs a few annotations, but only in code which does
unusual things to the stack like entry code.  And even then, far fewer
annotations are needed than what DWARF would need, so they're much more
maintainable than DWARF CFI annotations.

So the advantages of using objtool to generate ORC data are that it
gives more accurate debuginfo, with very few annotations.  It also
insulates the kernel from toolchain bugs, which can be very painful to
deal with in the kernel since we often have to work around issues in
older versions of the toolchain for years.

The downside is that the unwinder now becomes dependent on objtool's
ability to reverse engineer GCC code flow.  If GCC optimizations become
too complicated for objtool to follow, the ORC data generation might
stop working or become incomplete.  (It's worth noting that livepatch
already has such a dependency on objtool's ability to follow GCC code
flow.)

If newer versions of GCC come up with some optimizations which break
objtool, we may need to revisit the current implementation.  Some
possible solutions would be asking GCC to make the optimizations more
palatable, or having objtool use DWARF as an additional input, or
creating a GCC plugin to assist objtool with its analysis.  But for now,
objtool follows GCC code quite well.


Unwinder implementation details
===============================

Objtool generates the ORC data by integrating with the compile-time
stack metadata validation feature, which is described in detail in
tools/objtool/Documentation/stack-validation.txt.  After analyzing all
the code paths of a .o file, it creates an array of orc_entry structs,
and a parallel array of instruction addresses associated with those
structs, and writes them to the .orc_unwind and .orc_unwind_ip sections
respectively.

The ORC data is split into the two arrays for performance reasons, to
make the searchable part of the data (.orc_unwind_ip) more compact.  The
arrays are sorted in parallel at boot time.
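
The two-array layout and the "sorted in parallel" step can be sketched
as follows.  This is a simplified illustration, not the kernel's
implementation: struct orc_sketch and sort_parallel() are hypothetical
names, the addresses are stored as absolute values purely for
readability, and an insertion sort stands in for whatever sort routine
the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry; the real orc_entry has more fields. */
struct orc_sketch {
	int16_t sp_offset;
	uint8_t sp_reg;
};

/*
 * Sort ips[] in ascending order and apply every swap to entries[] too,
 * so that entries[i] keeps describing the code that starts at ips[i].
 * Only the compact ips[] array is compared; entries[] just follows.
 */
static void sort_parallel(uintptr_t *ips, struct orc_sketch *entries, size_t n)
{
	assert(n == 0 || (ips && entries));

	for (size_t i = 1; i < n; i++) {
		for (size_t j = i; j > 0 && ips[j - 1] > ips[j]; j--) {
			uintptr_t tmp_ip = ips[j];
			struct orc_sketch tmp_entry = entries[j];

			ips[j] = ips[j - 1];
			entries[j] = entries[j - 1];
			ips[j - 1] = tmp_ip;
			entries[j - 1] = tmp_entry;
		}
	}
}
```

Keeping the keys in their own array means a search touches only
address-sized elements, and the matching entry is fetched once, by
index, after the search has finished.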

Performance is further improved by the use of a fast lookup table which
is created at runtime.  The fast lookup table associates a given address
with a range of indices for the .orc_unwind table, so that only a small
subset of the table needs to be searched.
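
One way such a fast lookup table could work is sketched below.  All
names here (BLOCK_SIZE, find_in_range(), build_lookup(),
orc_lookup_sketch()) and the 256-byte block granularity are assumptions
made for the example, not the kernel's definitions.  The idea is to
divide the text range into fixed-size blocks, record for each block
boundary the index of the last sorted address at or below it, and then
binary-search only between two neighbouring recorded indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 256u /* assumed granularity of the lookup table */

/*
 * Return the index of the last element in ips[lo..hi) that is <= ip.
 * Requires lo < hi and ips[lo] <= ip.
 */
static size_t find_in_range(const uintptr_t *ips, size_t lo, size_t hi,
			    uintptr_t ip)
{
	assert(lo < hi && ips[lo] <= ip);

	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (ips[mid] <= ip)
			lo = mid;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Fill lookup[0..nblocks] so that lookup[b] is the index of the last
 * address at or below the start of block b.  Requires that the first
 * sorted address is not above text_start.
 */
static void build_lookup(const uintptr_t *ips, size_t n, uintptr_t text_start,
			 size_t *lookup, size_t nblocks)
{
	assert(n > 0 && ips[0] <= text_start);

	for (size_t b = 0; b <= nblocks; b++)
		lookup[b] = find_in_range(ips, 0, n,
					  text_start + b * BLOCK_SIZE);
}

/* Find the table index for ip by searching only one block's index range. */
static size_t orc_lookup_sketch(const uintptr_t *ips, const size_t *lookup,
				size_t nblocks, uintptr_t text_start,
				uintptr_t ip)
{
	assert(ip >= text_start);
	size_t b = (ip - text_start) / BLOCK_SIZE;

	assert(b < nblocks);
	return find_in_range(ips, lookup[b], lookup[b + 1] + 1, ip);
}
```

The returned index would then be used to read the matching entry from
the parallel array.  The search cost depends on how many entries fall
within one block rather than on the size of the whole table.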


Etymology
=========

Orcs, fearsome creatures of medieval folklore, are the Dwarves' natural
enemies.  Similarly, the ORC unwinder was created in opposition to the
complexity and slowness of DWARF.

"Although Orcs rarely consider multiple solutions to a problem, they do
excel at getting things done because they are creatures of action, not
thought." [3]_  Similarly, unlike the esoteric DWARF unwinder, the
veracious ORC unwinder wastes no time or siliconic effort decoding
variable-length zero-extended unsigned-integer byte-coded
state-machine-based debug information entries.

Similar to how Orcs frequently unravel the well-intentioned plans of
their adversaries, the ORC unwinder frequently unravels stacks with
brutal, unyielding efficiency.

ORC stands for Oops Rewind Capability.


.. [1] https://lore.kernel.org/r/20170602104048.jkkzssljsompjdwy@suse.de
.. [2] https://lore.kernel.org/r/d2ca5435-6386-29b8-db87-7f227c2b713a@suse.cz
.. [3] http://dustin.wikidot.com/half-orcs-and-orcs