.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that, with enough
total uptime, KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

The sample interval controls a timer that sets up KFENCE allocations. By
default, to keep the real sample interval predictable, the normal timer also
causes CPU wake-ups when the system is completely idle. This may be undesirable
on power-constrained systems. The boot parameter ``kfence.deferrable=1``
instead switches to a "deferrable" timer which does not force CPU wake-ups on
idle systems, at the risk of unpredictable sample intervals. The default is
configurable via the Kconfig option ``CONFIG_KFENCE_DEFERRABLE``.

.. warning::
   The KUnit test suite is very likely to fail when using a deferrable timer
   since it currently causes very unpredictable sample intervals.

The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled. Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config, and assuming a page size of 4 KiB, results in
dedicating 2 MiB to the KFENCE memory pool.

Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.
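
As a worked example of the pool-size formula above, the following Python
sketch evaluates it for the default object count and for one larger, purely
illustrative configuration. The helper name ``kfence_pool_bytes`` is
hypothetical and not part of the kernel; it only restates the arithmetic.

```python
def kfence_pool_bytes(num_objects: int = 255, page_size: int = 4096) -> int:
    """Evaluate (#objects + 1) * 2 * PAGE_SIZE from the text above.

    num_objects mirrors CONFIG_KFENCE_NUM_OBJECTS (default 255);
    page_size is assumed to be 4 KiB unless stated otherwise.
    """
    return (num_objects + 1) * 2 * page_size


MIB = 1024 * 1024

# Default configuration: (255 + 1) * 2 * 4096 bytes = 2 MiB.
print(kfence_pool_bytes() // MIB, "MiB")

# Illustrative larger pool (1023 objects is an assumed value, not a default).
print(kfence_pool_bytes(num_objects=1023) // MIB, "MiB")
```

Because every object page needs an accompanying guard page, the pool size
grows linearly with the object count, at two pages per additional object.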

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

    Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
     test_out_of_bounds_read+0xa6/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32

    allocated by task 484 on cpu 0 at 32.919330s:
     test_alloc+0xfe/0x738
     test_out_of_bounds_read+0x9b/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown when using the
kernel command line option ``no_hash_pointers``.
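
When triaging logs from many machines, it can help to extract the error type
and faulting function from the report header. The following Python sketch does
so with a regular expression. The pattern is inferred only from the sample
reports in this document and is an assumption, not a stable format guarantee;
the helper name ``parse_kfence_header`` is hypothetical.

```python
import re

# Assumed header shape, taken from the sample reports in this document:
#   BUG: KFENCE: <error type> in <function>+<offset>/<size>
HEADER_RE = re.compile(
    r"BUG: KFENCE: (?P<error>.+?) in "
    r"(?P<func>[^\s+]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_kfence_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    m = HEADER_RE.search(line)
    if m is None:
        return None
    return {
        "error": m.group("error"),
        "function": m.group("func"),
        "offset": int(m.group("offset"), 16),  # offset of the access in the function
        "size": int(m.group("size"), 16),      # total size of the function
    }


header = "BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234"
print(parse_kfence_header(header))
```

Grouping reports by the ``error`` and ``function`` fields is one simple way to
deduplicate hits of the same bug across a fleet.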

Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32

    allocated by task 488 on cpu 2 at 33.871326s:
     test_alloc+0xfe/0x738
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 488 on cpu 2 at 33.871358s:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81):
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32

    allocated by task 490 on cpu 1 at 34.175321s:
     test_alloc+0xfe/0x738
     test_double_free+0x76/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 490 on cpu 1 at 34.175348s:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the other side of an object's guard
page, to detect out-of-bounds writes on the unprotected side of the object.
These are reported on frees::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96

    allocated by task 502 on cpu 7 at 42.159302s:
     test_alloc+0xfe/0x738
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes an untouched byte. In the example above ``0xac`` is
the value written to the invalid address at offset 0, and the remaining '.'
denote that no following bytes have been touched.
Note that real values are only shown if the kernel was booted with
``no_hash_pointers``; to avoid information disclosure otherwise, '!' is used
instead to denote invalidly written bytes.

And finally, KFENCE may also report on invalid accesses to any protected page
where it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval.

When using ``CONFIG_KFENCE_STATIC_KEYS=y``, KFENCE allocations are "gated"
through the main allocator's fast-path by relying on static branches via the
static keys infrastructure.
The static branch is toggled to redirect the allocation to KFENCE. Depending
on sample interval, target workloads, and system architecture, this may
perform better than the simple dynamic branch. Careful benchmarking is
recommended.

KFENCE objects each reside on a dedicated page, at either the left or right
page boundary, selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access. Such page faults are then
intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access, and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed.
Any further access to the object causes a fault and KFENCE reports a
use-after-free access. Freed objects are inserted at the tail of KFENCE's
freelist, so that the least recently freed objects are reused first, and the
chances of detecting use-after-frees of recently freed objects are increased.

If pool utilization reaches 75% (default) or above, KFENCE limits currently
covered allocations of the same source from further filling up the pool. This
reduces the risk of the pool eventually being fully occupied by allocated
objects while still ensuring diverse coverage of allocations. The "source" of
an allocation is based on its partial allocation stack trace. A side-effect is
that this also limits frequent long-lived allocations (e.g. pagecache) of the
same source filling up the pool permanently, which is the most common risk for
the pool becoming full and the sampled allocation rate dropping to zero. The
threshold at which to start limiting currently covered allocations can be
configured via the boot parameter ``kfence.skip_covered_thresh`` (pool usage%).

Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling.
Another similar but non-sampling approach, which also inspired the name
"KFENCE", can be found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
is more precise, relying on compiler instrumentation, this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: due to the lower chance of detecting the
error, it would require more effort using KFENCE to debug. Deployments at scale
that cannot afford to enable KASAN, however, would benefit from using KFENCE to
discover bugs due to code paths not exercised by test cases or fuzzers.