.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that with enough
total uptime KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.
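 
As a rough illustration of how the boot parameter overrides the build-time
default, the following Python sketch resolves the effective sample interval
from a kernel command line string. The helper names and the placeholder
default of 100 ms are assumptions made for this example and are not kernel
code; only decimal values are handled.

```python
def kfence_sample_interval(cmdline: str, kconfig_default: int) -> int:
    """Return the sample interval in ms selected by a boot command line."""
    interval = kconfig_default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "kfence.sample_interval":
            # Sketch assumption: a later occurrence overrides an earlier one.
            interval = int(value)
    return interval


def kfence_enabled(cmdline: str, kconfig_default: int) -> bool:
    """A sample interval of 0 disables KFENCE."""
    return kfence_sample_interval(cmdline, kconfig_default) != 0


# 100 is a placeholder for the build-time CONFIG_KFENCE_SAMPLE_INTERVAL.
print(kfence_sample_interval("ro quiet kfence.sample_interval=500", 100))  # 500
print(kfence_enabled("ro quiet kfence.sample_interval=0", 100))            # False
print(kfence_enabled("ro quiet", 0))                                       # False
```

The last call models a kernel built with ``CONFIG_KFENCE_SAMPLE_INTERVAL=0``
and booted without the parameter, which leaves KFENCE disabled.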
 
The sample interval controls a timer that sets up KFENCE allocations. By
default, to keep the real sample interval predictable, the normal timer also
causes CPU wake-ups when the system is completely idle. This may be undesirable
on power-constrained systems. The boot parameter ``kfence.deferrable=1``
instead switches to a "deferrable" timer which does not force CPU wake-ups on
idle systems, at the risk of unpredictable sample intervals. The default is
configurable via the Kconfig option ``CONFIG_KFENCE_DEFERRABLE``.

.. warning::
   The KUnit test suite is very likely to fail when using a deferrable timer
   since it currently causes very unpredictable sample intervals.

The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled. Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config, and assuming a page size of 4 KiB, results in
dedicating 2 MiB to the KFENCE memory pool.
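 
The arithmetic can be checked with a short Python sketch. The function name is
chosen for this example only; the defaults mirror the 255 objects and 4 KiB
pages assumed above.

```python
def kfence_pool_bytes(num_objects: int = 255, page_size: int = 4096) -> int:
    """Pool size: (#objects + 1) * 2 * PAGE_SIZE."""
    return (num_objects + 1) * 2 * page_size


pool = kfence_pool_bytes()
print(pool)                    # 2097152
print(pool // (1024 * 1024))   # 2 (MiB)
```

Doubling the object count to 511 under the same assumptions yields a 4 MiB
pool, since the pool grows by two pages per additional object.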
 
Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

    Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
     test_out_of_bounds_read+0xa6/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32

    allocated by task 484 on cpu 0 at 32.919330s:
     test_alloc+0xfe/0x738
     test_out_of_bounds_read+0x9b/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown when using the
kernel command line option ``no_hash_pointers``.
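 
To show how the access address relates to the object range printed in a
report, the following Python sketch parses a ``kfence-#N`` line and classifies
an address against it. The regular expression only covers the line format
shown in the examples of this document, and the function name is hypothetical.

```python
import re

OBJECT_LINE = re.compile(
    r"kfence-#(?P<index>\d+): 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+), "
    r"size=(?P<size>\d+), cache=(?P<cache>\S+)"
)


def classify_access(access_addr: int, object_line: str) -> tuple:
    """Return (position, signed offset from the object's first byte)."""
    m = OBJECT_LINE.search(object_line)
    if m is None:
        raise ValueError("not a KFENCE object line")
    start, end = int(m["start"], 16), int(m["end"], 16)  # end is inclusive
    if end - start + 1 != int(m["size"]):
        raise ValueError("address range and size disagree")
    if access_addr < start:
        position = "left"
    elif access_addr > end:
        position = "right"
    else:
        position = "inside"
    return position, access_addr - start


line = "kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32"
print(classify_access(0xFFFF8C3F2E291FFF, line))  # ('left', -1)
```

The result matches the "1B left of kfence-#72" annotation in the report: the
read touched the last byte of the guard page that precedes the object.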
 
Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32

    allocated by task 488 on cpu 2 at 33.871326s:
     test_alloc+0xfe/0x738
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 488 on cpu 2 at 33.871358s:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G    B             5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81):
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32

    allocated by task 490 on cpu 1 at 34.175321s:
     test_alloc+0xfe/0x738
     test_double_free+0x76/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 490 on cpu 1 at 34.175348s:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G    B             5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the other side of an object's guard
page, to detect out-of-bounds writes on the unprotected side of the object.
These are reported on frees::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96

    allocated by task 502 on cpu 7 at 42.159302s:
     test_alloc+0xfe/0x738
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G    B             5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes untouched bytes. In the example above ``0xac`` is
the value written to the invalid address at offset 0, and the remaining '.'
denote that no following bytes have been touched. Note that real values are
only shown if the kernel was booted with ``no_hash_pointers``; to avoid
information disclosure otherwise, '!' is used instead to denote invalidly
written bytes.
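 
The bracketed byte list can be modelled with a small Python sketch. The
address-dependent ``canary()`` pattern below is an assumption made for
illustration and is not taken from this document; the rendering rules ('.' for
untouched bytes, the value or '!' for overwritten ones) follow the description
above.

```python
def canary(addr: int) -> int:
    # Illustrative redzone pattern (an assumption of this sketch).
    return 0xAA ^ (addr & 0x7)


def render_corruption(start_addr: int, redzone: bytes,
                      show_values: bool = True) -> str:
    """Render redzone bytes the way a corruption report summarizes them."""
    cells = []
    for offset, value in enumerate(redzone):
        if value == canary(start_addr + offset):
            cells.append(".")                # byte still holds the pattern
        elif show_values:
            cells.append(f"0x{value:02x}")   # booted with no_hash_pointers
        else:
            cells.append("!")                # value hidden
    return "[ " + " ".join(cells) + " ]"


start = 0xFFFF8C3F2E33AFF9                   # first byte after kfence-#156
intact = bytes(canary(start + i) for i in range(7))
corrupted = bytes([0xAC]) + intact[1:]       # one-byte overflow
print(render_corruption(start, corrupted))          # [ 0xac . . . . . . ]
print(render_corruption(start, corrupted, False))   # [ ! . . . . . . ]
```

Seven bytes are rendered because, in the example report, that is the distance
from the first corrupted byte to the end of the object's page.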
 
And finally, KFENCE may also report on invalid accesses to any protected page
where it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval.
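 
The sampling behaviour described above can be approximated by a toy model. The
Python sketch below is not the kernel implementation: the class name, the
millisecond clock passed in by the caller, and the choice that an oversized
request leaves the timer untouched are all assumptions of this example.

```python
PAGE_SIZE = 4096  # assumed page size


class SampleGate:
    """Toy model: at most one guarded allocation per sample interval."""

    def __init__(self, interval_ms: int):
        self.interval_ms = interval_ms
        self.next_open_ms = interval_ms  # first interval starts at time 0

    def allocate(self, now_ms: int, size: int) -> str:
        gate_open = self.interval_ms != 0 and now_ms >= self.next_open_ms
        if not gate_open or size > PAGE_SIZE:
            return "slab"                # regular allocator path
        self.next_open_ms = now_ms + self.interval_ms  # reset the timer
        return "kfence"                  # guarded allocation from the pool


gate = SampleGate(interval_ms=100)
print([gate.allocate(t, 32) for t in (10, 99, 100, 101, 150, 205)])
# ['slab', 'slab', 'kfence', 'slab', 'slab', 'kfence']
```

Only the first allocation after each interval expires is redirected, which is
why the per-allocation cost stays low even under heavy allocation load.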
 
When using ``CONFIG_KFENCE_STATIC_KEYS=y``, KFENCE allocations are "gated"
through the main allocator's fast-path by relying on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. Depending on sample interval, target workloads, and
system architecture, this may perform better than the simple dynamic branch.
Careful benchmarking is recommended.

KFENCE objects each reside on a dedicated page, at either the left or right
page boundaries selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access. Such page faults are then
intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access, and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.
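 
A minimal Python sketch of the placement and redzone rules follows. The
function names, the 4 KiB page size, and the 8-byte alignment are assumptions
of this example; the alignment was chosen because it reproduces the addresses
of ``kfence-#156`` in the memory corruption report shown earlier.

```python
import random

PAGE_SIZE = 4096  # assumed page size


def place_object(page_addr: int, size: int, align: int, on_right: bool) -> int:
    """Return the object's start address inside its dedicated page."""
    if not on_right:
        # The page start satisfies any power-of-two alignment up to PAGE_SIZE.
        return page_addr
    addr = page_addr + PAGE_SIZE - size
    return addr - (addr % align)  # round down to the requested alignment


def redzones(page_addr: int, obj_addr: int, size: int) -> tuple:
    """Return (start, end) pairs, end exclusive, of all non-object memory."""
    left = (page_addr, obj_addr)
    right = (obj_addr + size, page_addr + PAGE_SIZE)
    return left, right


page = 0xFFFF8C3F2E33A000
obj = place_object(page, size=73, align=8, on_right=True)
left, right = redzones(page, obj, 73)
print(hex(obj))             # 0xffff8c3f2e33afb0
print(right[1] - right[0])  # 7

# The side itself would be picked at random for each allocation:
side = bool(random.getrandbits(1))
```

The 7-byte gap on the right is the kind of alignment-induced gap the text
describes: it lies between the object and the guard page and is only covered
by the redzone.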
 
The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chances of detecting use-after-frees of recently freed objects
are increased.
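 
The freelist ordering can be sketched with a double-ended queue. This Python
model is illustrative only; the class and method names are hypothetical, and
objects are represented by plain indices.

```python
from collections import deque


class Freelist:
    """Toy FIFO freelist: free at the tail, reuse from the head."""

    def __init__(self, num_objects: int):
        self.free = deque(range(num_objects))

    def alloc(self):
        # The least recently freed object is reused first.
        return self.free.popleft() if self.free else None

    def release(self, index: int) -> None:
        # A freed object goes to the tail and stays protected the longest.
        self.free.append(index)


fl = Freelist(3)
a = fl.alloc()      # 0
b = fl.alloc()      # 1
fl.release(a)       # object 0 is freed and queued behind object 2
print(fl.alloc())   # 2
print(fl.alloc())   # 0
print(fl.alloc())   # None, the pool is exhausted
```

Object 0 is handed out again only after every other free object has been used,
which keeps a just-freed page protected for as long as the pool allows.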
 
If pool utilization reaches 75% (default) or above, to reduce the risk of the
pool eventually being fully occupied by allocated objects yet ensure diverse
coverage of allocations, KFENCE limits currently covered allocations of the
same source from further filling up the pool. The "source" of an allocation is
based on its partial allocation stack trace. A side-effect is that this also
limits frequent long-lived allocations (e.g. pagecache) of the same source
filling up the pool permanently, which is the most common risk for the pool
becoming full and the sampled allocation rate dropping to zero. The threshold
at which to start limiting currently covered allocations can be configured via
the boot parameter ``kfence.skip_covered_thresh`` (pool usage%).
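 
The decision can be summarized in a few lines of Python. This is a sketch
under stated assumptions: the function names are hypothetical, a source is
modelled as the first few stack frames (the depth of 4 is arbitrary), and the
set of covered sources stands in for whatever bookkeeping the kernel uses.

```python
def allocation_source(stack: list, depth: int = 4) -> tuple:
    """Identify a source by a partial stack trace (depth is hypothetical)."""
    return tuple(stack[:depth])


def should_skip(used: int, total: int, covered: set, source: tuple,
                thresh_percent: int = 75) -> bool:
    """Decide whether a sampled allocation is left to the regular allocator."""
    if used * 100 < total * thresh_percent:
        return False           # below the threshold nothing is skipped
    return source in covered   # at or above it, skip already covered sources


covered = {allocation_source(["test_alloc", "test_double_free"])}
again = allocation_source(["test_alloc", "test_double_free"])
fresh = allocation_source(["test_alloc", "test_invalid_access"])
print(should_skip(100, 255, covered, again))  # False
print(should_skip(192, 255, covered, again))  # True
print(should_skip(192, 255, covered, fresh))  # False
```

With 255 objects, 192 objects in use is the first count at or above 75%, so
the second call skips the repeated source while the third still admits a new
one.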
 
Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach, that also inspired the name "KFENCE", can be
found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
is more precise, relying on compiler instrumentation, this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: due to the lower chance of detecting the
error, it would require more effort to debug using KFENCE. Deployments at scale
that cannot afford to enable KASAN, however, would benefit from using KFENCE to
discover bugs due to code paths not exercised by test cases or fuzzers.