cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

l1tf.rst


      1L1TF - L1 Terminal Fault
      2========================
      3
L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data in the Level 1 Data Cache when the page table
entry controlling the virtual address used for the access has the Present
bit cleared or other reserved bits set.
      8
      9Affected processors
     10-------------------
     11
     12This vulnerability affects a wide range of Intel processors. The
     13vulnerability is not present on:
     14
   - Processors from AMD, Centaur and other non-Intel vendors
     16
     17   - Older processor models, where the CPU family is < 6
     18
     19   - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
     20     Penwell, Pineview, Silvermont, Airmont, Merrifield)
     21
     22   - The Intel XEON PHI family
     23
   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR. If the bit is set, the CPU is not affected
     by the Meltdown vulnerability either. These CPUs should become
     available by the end of 2018.
     28
     29Whether a processor is affected or not can be read out from the L1TF
     30vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
     31
     32Related CVEs
     33------------
     34
     35The following CVE entries are related to the L1TF vulnerability:
     36
     37   =============  =================  ==============================
     38   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
     39   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
     40   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
     41   =============  =================  ==============================
     42
     43Problem
     44-------
     45
     46If an instruction accesses a virtual address for which the relevant page
     47table entry (PTE) has the Present bit cleared or other reserved bits set,
     48then speculative execution ignores the invalid PTE and loads the referenced
     49data if it is present in the Level 1 Data Cache, as if the page referenced
     50by the address bits in the PTE was still present and accessible.
     51
While this is a purely speculative mechanism and the instruction will
eventually raise a page fault when it is retired, the mere act of loading the
data and making it available to other speculative instructions opens up the
opportunity for side channel attacks by unprivileged malicious code,
     56similar to the Meltdown attack.
     57
     58While Meltdown breaks the user space to kernel space protection, L1TF
allows attacks on any physical memory address in the system, and the attack
works across all protection domains. It allows attacks against SGX and also
     61works from inside virtual machines because the speculation bypasses the
     62extended page table (EPT) protection mechanism.
     63
     64
     65Attack scenarios
     66----------------
     67
     681. Malicious user space
     69^^^^^^^^^^^^^^^^^^^^^^^
     70
     71   Operating Systems store arbitrary information in the address bits of a
   PTE which is marked not present. This allows a malicious user space
     73   application to attack the physical memory to which these PTEs resolve.
     74   In some cases user-space can maliciously influence the information
     75   encoded in the address bits of the PTE, thus making attacks more
     76   deterministic and more practical.
     77
     78   The Linux kernel contains a mitigation for this attack vector, PTE
     79   inversion, which is permanently enabled and has no performance
     80   impact. The kernel ensures that the address bits of PTEs, which are not
     81   marked present, never point to cacheable physical memory space.
     82
     83   A system with an up to date kernel is protected against attacks from
     84   malicious user space applications.
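
   As a conceptual sketch of the PTE inversion idea (simplified; the
   address width, mask and example PTE value below are illustrative
   assumptions, not the kernel's actual constants or code), inverting the
   address bits of a not-present PTE moves the speculatively referenced
   physical address out of the populated, cacheable range::

      # Conceptual sketch of PTE inversion -- not the kernel's actual code.
      PHYS_ADDR_BITS = 46                               # assumed address width
      PFN_MASK = ((1 << PHYS_ADDR_BITS) - 1) & ~0xFFF   # address bits in a PTE

      def invert_nonpresent_pte(pte: int) -> int:
          # Flipping the address bits makes a speculative dereference point
          # far above any cacheable physical memory present in the system.
          return pte ^ PFN_MASK

      # Hypothetical not-present PTE whose address bits alias real DRAM:
      pte = 0x123456000
      print(hex(invert_nonpresent_pte(pte)))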
     85
     862. Malicious guest in a virtual machine
     87^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     88
     89   The fact that L1TF breaks all domain protections allows malicious guest
     90   OSes, which can control the PTEs directly, and malicious guest user
     91   space applications, which run on an unprotected guest kernel lacking the
     92   PTE inversion mitigation for L1TF, to attack physical host memory.
     93
   A special aspect of L1TF in the context of virtualization is simultaneous
   multi-threading (SMT). The Intel implementation of SMT is called
   HyperThreading. The fact that Hyperthreads on the affected processors
   share the L1 Data Cache (L1D) is important for this. As the flaw only
   allows attacking data which is present in L1D, a malicious guest running
     99   on one Hyperthread can attack the data which is brought into the L1D by
    100   the context which runs on the sibling Hyperthread of the same physical
    101   core. This context can be host OS, host user space or a different guest.
    102
    103   If the processor does not support Extended Page Tables, the attack is
   only possible when the hypervisor does not sanitize the content of the
    105   effective (shadow) page tables.
    106
    107   While solutions exist to mitigate these attack vectors fully, these
    108   mitigations are not enabled by default in the Linux kernel because they
    109   can affect performance significantly. The kernel provides several
    110   mechanisms which can be utilized to address the problem depending on the
    111   deployment scenario. The mitigations, their protection scope and impact
    112   are described in the next sections.
    113
    114   The default mitigations and the rationale for choosing them are explained
    115   at the end of this document. See :ref:`default_mitigations`.
    116
    117.. _l1tf_sys_info:
    118
    119L1TF system information
    120-----------------------
    121
    122The Linux kernel provides a sysfs interface to enumerate the current L1TF
    123status of the system: whether the system is vulnerable, and which
    124mitigations are active. The relevant sysfs file is:
    125
    126/sys/devices/system/cpu/vulnerabilities/l1tf
    127
    128The possible values in this file are:
    129
    130  ===========================   ===============================
    131  'Not affected'		The processor is not vulnerable
    132  'Mitigation: PTE Inversion'	The host protection is active
    133  ===========================   ===============================
    134
    135If KVM/VMX is enabled and the processor is vulnerable then the following
    136information is appended to the 'Mitigation: PTE Inversion' part:
    137
    138  - SMT status:
    139
    140    =====================  ================
    141    'VMX: SMT vulnerable'  SMT is enabled
    142    'VMX: SMT disabled'    SMT is disabled
    143    =====================  ================
    144
    145  - L1D Flush mode:
    146
    147    ================================  ====================================
    148    'L1D vulnerable'		      L1D flushing is disabled
    149
    150    'L1D conditional cache flushes'   L1D flush is conditionally enabled
    151
    152    'L1D cache flushes'		      L1D flush is unconditionally enabled
    153    ================================  ====================================
    154
    155The resulting grade of protection is discussed in the following sections.
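
As a quick illustration, the status can also be read programmatically from
the sysfs file shown above; a minimal Python sketch (error handling kept to
a minimum)::

   #!/usr/bin/env python3
   # Minimal sketch: read the L1TF vulnerability status from sysfs.
   from pathlib import Path

   L1TF_PATH = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")

   def l1tf_status() -> str:
       # Very old kernels do not expose the vulnerabilities directory.
       if not L1TF_PATH.exists():
           return "unknown (sysfs file not present)"
       return L1TF_PATH.read_text().strip()

   if __name__ == "__main__":
       # Prints e.g. 'Not affected' or 'Mitigation: PTE Inversion; VMX: ...'
       print(f"L1TF status: {l1tf_status()}")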
    156
    157
    158Host mitigation mechanism
    159-------------------------
    160
    161The kernel is unconditionally protected against L1TF attacks from malicious
    162user space running on the host.
    163
    164
    165Guest mitigation mechanisms
    166---------------------------
    167
    168.. _l1d_flush:
    169
    1701. L1D flush on VMENTER
    171^^^^^^^^^^^^^^^^^^^^^^^
    172
   To make sure that a guest cannot attack data which is present in the
   L1D, the hypervisor flushes the L1D before entering the guest.
    175
    176   Flushing the L1D evicts not only the data which should not be accessed
   by a potentially malicious guest, it also flushes the guest's own
   data. Flushing the L1D has a performance impact as the processor has to
   bring the flushed guest data back into the L1D. Depending on the
   frequency of VMEXIT/VMENTER and the type of computations in the guest,
   performance degradation in the range of 1% to 50% has been observed. For
    182   scenarios where guest VMEXIT/VMENTER are rare the performance impact is
    183   minimal. Virtio and mechanisms like posted interrupts are designed to
    184   confine the VMEXITs to a bare minimum, but specific configurations and
    185   application scenarios might still suffer from a high VMEXIT rate.
    186
    187   The kernel provides two L1D flush modes:
    188    - conditional ('cond')
    189    - unconditional ('always')
    190
    191   The conditional mode avoids L1D flushing after VMEXITs which execute
    192   only audited code paths before the corresponding VMENTER. These code
   paths have been verified not to expose secrets or other
    194   interesting data to an attacker, but they can leak information about the
    195   address space layout of the hypervisor.
    196
    197   Unconditional mode flushes L1D on all VMENTER invocations and provides
    198   maximum protection. It has a higher overhead than the conditional
    199   mode. The overhead cannot be quantified correctly as it depends on the
    200   workload scenario and the resulting number of VMEXITs.
    201
    202   The general recommendation is to enable L1D flush on VMENTER. The kernel
    203   defaults to conditional mode on affected processors.
    204
   **Note** that L1D flush does not prevent the SMT problem because the
   sibling thread will also bring its data back into the L1D, which makes it
   attackable again.
    208
    209   L1D flush can be controlled by the administrator via the kernel command
    210   line and sysfs control files. See :ref:`mitigation_control_command_line`
    211   and :ref:`mitigation_control_kvm`.
    212
    213.. _guest_confinement:
    214
    2152. Guest VCPU confinement to dedicated physical cores
    216^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    217
    218   To address the SMT problem, it is possible to make a guest or a group of
    219   guests affine to one or more physical cores. The proper mechanism for
    220   that is to utilize exclusive cpusets to ensure that no other guest or
    221   host tasks can run on these cores.
    222
    223   If only a single guest or related guests run on sibling SMT threads on
    224   the same physical core then they can only attack their own memory and
    225   restricted parts of the host memory.
    226
   Host memory is attackable when one of the sibling SMT threads runs in
   host OS (hypervisor) context and the other in guest context. The amount
   of valuable information from the host OS context depends on what the
   host OS executes, i.e. interrupts, soft interrupts and kernel
    231   threads. The amount of valuable data from these contexts cannot be
    232   declared as non-interesting for an attacker without deep inspection of
    233   the code.
    234
   **Note** that assigning guests to a fixed set of physical cores affects
    236   the ability of the scheduler to do load balancing and might have
    237   negative effects on CPU utilization depending on the hosting
    238   scenario. Disabling SMT might be a viable alternative for particular
    239   scenarios.
    240
    241   For further information about confining guests to a single or to a group
    242   of cores consult the cpusets documentation:
    243
    244   https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst
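
   As a rough illustration of the idea, assuming a cgroup v1 cpuset
   hierarchy mounted at /sys/fs/cgroup/cpuset (the cpuset name, CPU list
   and PIDs below are hypothetical), an exclusive cpuset for a guest's VCPU
   threads could be set up like this::

      #!/usr/bin/env python3
      # Sketch: confine a guest's VCPU threads to an exclusive cpuset.
      from pathlib import Path

      CPUSET_ROOT = Path("/sys/fs/cgroup/cpuset")
      GUEST_SET = CPUSET_ROOT / "guest0"        # hypothetical cpuset name

      def make_exclusive_cpuset(cpus: str, mems: str, pids: list[int]) -> None:
          GUEST_SET.mkdir(exist_ok=True)
          (GUEST_SET / "cpuset.cpus").write_text(cpus)          # e.g. "2-3"
          (GUEST_SET / "cpuset.mems").write_text(mems)          # e.g. "0"
          (GUEST_SET / "cpuset.cpu_exclusive").write_text("1")  # no sharing
          for pid in pids:
              # One PID per write, as required by the cgroup v1 interface.
              (GUEST_SET / "tasks").write_text(str(pid))

      # Example: pin the VCPU threads of one guest to logical CPUs 2 and 3
      # (the two SMT siblings of one physical core on a typical topology).
      # make_exclusive_cpuset("2-3", "0", [12345, 12346])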
    245
    246.. _interrupt_isolation:
    247
    2483. Interrupt affinity
    249^^^^^^^^^^^^^^^^^^^^^
    250
    251   Interrupts can be made affine to logical CPUs. This is not universally
   possible because there are types of interrupts which are truly per-CPU
   interrupts, e.g. the local timer interrupt. Aside from that, multi-queue
    254   devices affine their interrupts to single CPUs or groups of CPUs per
    255   queue without allowing the administrator to control the affinities.
    256
   Moving the interrupts which can be affinity controlled away from CPUs
   which run untrusted guests reduces the attack vector space.
    259
   Whether the interrupts which are affine to CPUs running untrusted
   guests provide interesting data for an attacker depends on the system
    262   configuration and the scenarios which run on the system. While for some
    263   of the interrupts it can be assumed that they won't expose interesting
    264   information beyond exposing hints about the host OS memory layout, there
    265   is no way to make general assumptions.
    266
    267   Interrupt affinity can be controlled by the administrator via the
    268   /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
    269   available at:
    270
    271   https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
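
   As an illustration (the IRQ number and CPU lists are hypothetical),
   moving a controllable interrupt away from the CPUs which run untrusted
   guests boils down to writing the desired CPU list into the corresponding
   proc file::

      #!/usr/bin/env python3
      # Sketch: restrict an interrupt's affinity to a set of trusted CPUs.
      from pathlib import Path

      def set_irq_affinity(irq: int, cpu_list: str) -> None:
          # The write fails for interrupts whose affinity cannot be
          # controlled, e.g. per-CPU interrupts; callers should expect that.
          Path(f"/proc/irq/{irq}/smp_affinity_list").write_text(cpu_list)

      # Example: keep (hypothetical) IRQ 42 on CPUs 0-1, away from the
      # CPUs 2-3 which run the untrusted guests.
      # set_irq_affinity(42, "0-1")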
    272
    273.. _smt_control:
    274
    2754. SMT control
    276^^^^^^^^^^^^^^
    277
    278   To prevent the SMT issues of L1TF it might be necessary to disable SMT
    279   completely. Disabling SMT can have a significant performance impact, but
    280   the impact depends on the hosting scenario and the type of workloads.
   The impact of disabling SMT also needs to be weighed against the impact
    282   of other mitigation solutions like confining guests to dedicated cores.
    283
    284   The kernel provides a sysfs interface to retrieve the status of SMT and
    285   to control it. It also provides a kernel command line interface to
    286   control SMT.
    287
    288   The kernel command line interface consists of the following options:
    289
    290     =========== ==========================================================
    291     nosmt	 Affects the bring up of the secondary CPUs during boot. The
    292		 kernel tries to bring all present CPUs online during the
    293		 boot process. "nosmt" makes sure that from each physical
                 core only one - the so-called primary (hyper) thread - is
                 activated. Due to a design flaw of Intel processors related
                 to Machine Check Exceptions the non-primary siblings have
                 to be brought up at least partially and are then shut down
                 again.  "nosmt" can be undone via the sysfs interface.

     nosmt=force Has the same effect as "nosmt" but it does not allow
                 undoing the SMT disable via the sysfs interface.
    302     =========== ==========================================================
    303
    304   The sysfs interface provides two files:
    305
    306   - /sys/devices/system/cpu/smt/control
    307   - /sys/devices/system/cpu/smt/active
    308
    309   /sys/devices/system/cpu/smt/control:
    310
     This file allows reading out the SMT control state and provides the
    312     ability to disable or (re)enable SMT. The possible states are:
    313
    314	==============  ===================================================
    315	on		SMT is supported by the CPU and enabled. All
    316			logical CPUs can be onlined and offlined without
    317			restrictions.
    318
    319	off		SMT is supported by the CPU and disabled. Only
                        the so-called primary SMT threads can be onlined
                        and offlined without restrictions. An attempt to
                        online a non-primary sibling is rejected.
    323
    324	forceoff	Same as 'off' but the state cannot be controlled.
    325			Attempts to write to the control file are rejected.
    326
    327	notsupported	The processor does not support SMT. It's therefore
    328			not affected by the SMT implications of L1TF.
    329			Attempts to write to the control file are rejected.
    330	==============  ===================================================
    331
    332     The possible states which can be written into this file to control SMT
    333     state are:
    334
    335     - on
    336     - off
    337     - forceoff
    338
    339   /sys/devices/system/cpu/smt/active:
    340
    341     This file reports whether SMT is enabled and active, i.e. if on any
    342     physical core two or more sibling threads are online.
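
   As an illustration, the two files can be used directly from a small
   script (writes require root and are rejected in the 'forceoff' and
   'notsupported' states described above)::

      #!/usr/bin/env python3
      # Sketch: query and control SMT via the documented sysfs files.
      from pathlib import Path

      SMT_DIR = Path("/sys/devices/system/cpu/smt")

      def smt_state() -> tuple[str, bool]:
          control = (SMT_DIR / "control").read_text().strip()
          active = (SMT_DIR / "active").read_text().strip() == "1"
          return control, active

      def set_smt(state: str) -> None:
          # state must be one of 'on', 'off', 'forceoff'
          (SMT_DIR / "control").write_text(state)

      if __name__ == "__main__":
          control, active = smt_state()
          print(f"SMT control: {control}, active: {active}")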
    343
    344   SMT control is also possible at boot time via the l1tf kernel command
    345   line parameter in combination with L1D flush control. See
    346   :ref:`mitigation_control_command_line`.
    347
    3485. Disabling EPT
    349^^^^^^^^^^^^^^^^
    350
    351  Disabling EPT for virtual machines provides full mitigation for L1TF even
    352  with SMT enabled, because the effective page tables for guests are
  managed and sanitized by the hypervisor. However, disabling EPT has a
  significant performance impact, especially when the Meltdown mitigation
  KPTI is enabled.
    356
    357  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
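
  Whether EPT is currently in use can be checked from the parameter's
  read-only representation under /sys/module (a sketch; it assumes the
  kvm_intel module is loaded and exposes the parameter there, as module
  parameters usually are)::

     #!/usr/bin/env python3
     # Sketch: check whether KVM currently uses EPT.
     from pathlib import Path

     EPT_PARAM = Path("/sys/module/kvm_intel/parameters/ept")

     def ept_enabled() -> bool:
         # Reads 'Y' when EPT is in use and 'N' when it was disabled,
         # e.g. after booting with kvm-intel.ept=0.
         return EPT_PARAM.read_text().strip() == "Y"

     if __name__ == "__main__":
         print("EPT enabled:", ept_enabled())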
    358
    359There is ongoing research and development for new mitigation mechanisms to
    360address the performance impact of disabling SMT or EPT.
    361
    362.. _mitigation_control_command_line:
    363
    364Mitigation control on the kernel command line
    365---------------------------------------------
    366
The kernel command line allows controlling the L1TF mitigations at boot
    368time with the option "l1tf=". The valid arguments for this option are:
    369
    370  ============  =============================================================
  full          Provides all available mitigations for the L1TF
                vulnerability. Disables SMT and enables all mitigations in
                the hypervisors, i.e. unconditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                are still possible after boot.  Hypervisors will issue a
    377		warning when the first VM is started in a potentially
    378		insecure configuration, i.e. SMT enabled or L1D flush
    379		disabled.
    380
    381  full,force	Same as 'full', but disables SMT and L1D flush runtime
    382		control. Implies the 'nosmt=force' command line option.
    383		(i.e. sysfs control of SMT is disabled.)
    384
  flush         Leaves SMT enabled and enables the default hypervisor
                mitigation, i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                are still possible after boot.  Hypervisors will issue a
    390		warning when the first VM is started in a potentially
    391		insecure configuration, i.e. SMT enabled or L1D flush
    392		disabled.
    393
    394  flush,nosmt	Disables SMT and enables the default hypervisor mitigation,
    395		i.e. conditional L1D flushing.
    396
                SMT control and L1D flush control via the sysfs interface
                are still possible after boot.  Hypervisors will issue a
    399		warning when the first VM is started in a potentially
    400		insecure configuration, i.e. SMT enabled or L1D flush
    401		disabled.
    402
    403  flush,nowarn	Same as 'flush', but hypervisors will not warn when a VM is
    404		started in a potentially insecure configuration.
    405
    406  off		Disables hypervisor mitigations and doesn't emit any
    407		warnings.
    408		It also drops the swap size and available RAM limit restrictions
    409		on both hypervisor and bare metal.
    410
    411  ============  =============================================================
    412
    413The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
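
As an aside, which option is in effect can be double checked at runtime by
looking at /proc/cmdline, for example with a small sketch like this::

   #!/usr/bin/env python3
   # Sketch: report which l1tf= option, if any, was passed at boot.
   from pathlib import Path

   def l1tf_cmdline_option() -> str:
       for token in Path("/proc/cmdline").read_text().split():
           if token.startswith("l1tf="):
               return token.split("=", 1)[1]
       return "flush (default, no l1tf= option given)"

   if __name__ == "__main__":
       print(f"l1tf mitigation option: {l1tf_cmdline_option()}")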
    414
    415
    416.. _mitigation_control_kvm:
    417
Mitigation control for KVM - module parameter
---------------------------------------------
    420
    421The KVM hypervisor mitigation mechanism, flushing the L1D cache when
    422entering a guest, can be controlled with a module parameter.
    423
    424The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
    425following arguments:
    426
    427  ============  ==============================================================
    428  always	L1D cache flush on every VMENTER.
    429
  cond          Flush L1D on VMENTER only when the code between VMEXIT and
                VMENTER can leak host memory which is considered
                interesting to an attacker. This can still leak host memory
                which allows e.g. determining the host's address space layout.
    434
    435  never		Disables the mitigation
    436  ============  ==============================================================
    437
The parameter can be provided on the kernel command line, as a module
parameter when loading the modules, and modified at runtime via the sysfs
file:
    441
    442/sys/module/kvm_intel/parameters/vmentry_l1d_flush
    443
The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced, the kvm-intel.vmentry_l1d_flush module
parameter is ignored, and writes to the sysfs file are rejected.
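
As an illustration, the current mode can be read, and where permitted
changed, through the sysfs file shown above (a sketch; the file only exists
while the kvm_intel module is loaded)::

   #!/usr/bin/env python3
   # Sketch: read, and optionally change, the KVM L1D flush mode.
   from pathlib import Path

   FLUSH_PARAM = Path("/sys/module/kvm_intel/parameters/vmentry_l1d_flush")

   def get_flush_mode() -> str:
       return FLUSH_PARAM.read_text().strip()

   def set_flush_mode(mode: str) -> None:
       # mode is one of 'always', 'cond', 'never'; the write is rejected
       # when 'l1tf=full,force' was given on the kernel command line.
       FLUSH_PARAM.write_text(mode)

   if __name__ == "__main__":
       print(f"vmentry_l1d_flush: {get_flush_mode()}")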
    447
    448.. _mitigation_selection:
    449
    450Mitigation selection guide
    451--------------------------
    452
    4531. No virtualization in use
    454^^^^^^^^^^^^^^^^^^^^^^^^^^^
    455
    456   The system is protected by the kernel unconditionally and no further
    457   action is required.
    458
    4592. Virtualization with trusted guests
    460^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    461
    462   If the guest comes from a trusted source and the guest OS kernel is
   guaranteed to have the L1TF mitigations in place, the system is fully
    464   protected against L1TF and no further action is required.
    465
    466   To avoid the overhead of the default L1D flushing on VMENTER the
    467   administrator can disable the flushing via the kernel command line and
    468   sysfs control files. See :ref:`mitigation_control_command_line` and
    469   :ref:`mitigation_control_kvm`.
    470
    471
    4723. Virtualization with untrusted guests
    473^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    474
    4753.1. SMT not supported or disabled
    476""""""""""""""""""""""""""""""""""
    477
    478  If SMT is not supported by the processor or disabled in the BIOS or by
    479  the kernel, it's only required to enforce L1D flushing on VMENTER.
    480
    481  Conditional L1D flushing is the default behaviour and can be tuned. See
    482  :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
    483
    4843.2. EPT not supported or disabled
    485""""""""""""""""""""""""""""""""""
    486
    487  If EPT is not supported by the processor or disabled in the hypervisor,
    488  the system is fully protected. SMT can stay enabled and L1D flushing on
    489  VMENTER is not required.
    490
    491  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
    492
    4933.3. SMT and EPT supported and active
    494"""""""""""""""""""""""""""""""""""""
    495
  If SMT and EPT are supported and active, then various degrees of
  mitigations can be employed (a rough audit sketch follows the list
  below):
    498
    499  - L1D flushing on VMENTER:
    500
    501    L1D flushing on VMENTER is the minimal protection requirement, but it
    502    is only potent in combination with other mitigation methods.
    503
    504    Conditional L1D flushing is the default behaviour and can be tuned. See
    505    :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
    506
    507  - Guest confinement:
    508
    Confinement of guests to a single physical core or a group of cores
    which are not running any other processes can reduce the attack surface
    511    significantly, but interrupts, soft interrupts and kernel threads can
    512    still expose valuable data to a potential attacker. See
    513    :ref:`guest_confinement`.
    514
    515  - Interrupt isolation:
    516
    517    Isolating the guest CPUs from interrupts can reduce the attack surface
    518    further, but still allows a malicious guest to explore a limited amount
    519    of host physical memory. This can at least be used to gain knowledge
    520    about the host address space layout. The interrupts which have a fixed
    affinity to the CPUs which run the untrusted guests can, depending on
    the scenario, still trigger soft interrupts and schedule kernel threads
    523    which might expose valuable information. See
    524    :ref:`interrupt_isolation`.
    525
    526The above three mitigation methods combined can provide protection to a
    527certain degree, but the risk of the remaining attack surface has to be
    528carefully analyzed. For full protection the following methods are
    529available:
    530
    531  - Disabling SMT:
    532
    533    Disabling SMT and enforcing the L1D flushing provides the maximum
    amount of protection. This mitigation does not depend on any of the
    535    above mitigation methods.
    536
    537    SMT control and L1D flushing can be tuned by the command line
    538    parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
    539    time with the matching sysfs control files. See :ref:`smt_control`,
    540    :ref:`mitigation_control_command_line` and
    541    :ref:`mitigation_control_kvm`.
    542
    543  - Disabling EPT:
    544
    Disabling EPT provides the maximum amount of protection as well. It
    does not depend on any of the above mitigation methods. SMT can stay
    547    enabled and L1D flushing is not required, but the performance impact is
    548    significant.
    549
    550    EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
    551    parameter.
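
  The decision logic above can be summarized in a rough audit sketch which
  combines the interfaces described earlier (paths as documented above; the
  classification is a simplification of this section's guidance)::

     #!/usr/bin/env python3
     # Sketch: rough L1TF exposure check for hosts running untrusted guests.
     from pathlib import Path

     def read(path: str, default: str = "") -> str:
         p = Path(path)
         return p.read_text().strip() if p.exists() else default

     smt_active = read("/sys/devices/system/cpu/smt/active", "0") == "1"
     ept_enabled = read("/sys/module/kvm_intel/parameters/ept", "N") == "Y"
     l1tf = read("/sys/devices/system/cpu/vulnerabilities/l1tf", "unknown")

     if "Not affected" in l1tf:
         print("Processor is not affected by L1TF.")
     elif not ept_enabled:
         print("EPT disabled: fully mitigated, SMT may stay enabled.")
     elif not smt_active:
         print("SMT inactive: keep L1D flushing on VMENTER enabled.")
     else:
         print("SMT and EPT active: combine L1D flushing with guest "
               "confinement and interrupt isolation, or disable SMT or EPT.")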
    552
    5533.4. Nested virtual machines
    554""""""""""""""""""""""""""""
    555
    556When nested virtualization is in use, three operating systems are involved:
    557the bare metal hypervisor, the nested hypervisor and the nested virtual
    558machine.  VMENTER operations from the nested hypervisor into the nested
    559guest will always be processed by the bare metal hypervisor. If KVM is the
    560bare metal hypervisor it will:
    561
    562 - Flush the L1D cache on every switch from the nested hypervisor to the
    563   nested virtual machine, so that the nested hypervisor's secrets are not
    564   exposed to the nested virtual machine;
    565
 - Flush the L1D cache on every switch from the nested virtual machine to
   the nested hypervisor; this is a complex operation, and flushing the L1D
   cache prevents the bare metal hypervisor's secrets from being exposed to
   the nested virtual machine;
    570
    571 - Instruct the nested hypervisor to not perform any L1D cache flush. This
    572   is an optimization to avoid double L1D flushing.
    573
    574
    575.. _default_mitigations:
    576
    577Default mitigations
    578-------------------
    579
    580  The kernel default mitigations for vulnerable processors are:
    581
    582  - PTE inversion to protect against malicious user space. This is done
    583    unconditionally and cannot be controlled. The swap storage is limited
    584    to ~16TB.
    585
    586  - L1D conditional flushing on VMENTER when EPT is enabled for
    587    a guest.
    588
    589  The kernel does not by default enforce the disabling of SMT, which leaves
    590  SMT systems vulnerable when running untrusted guests with EPT enabled.
    591
    592  The rationale for this choice is:
    593
    594  - Force disabling SMT can break existing setups, especially with
    595    unattended updates.
    596
    597  - If regular users run untrusted guests on their machine, then L1TF is
    just an add-on to other malware which might be embedded in an untrusted
    599    guest, e.g. spam-bots or attacks on the local network.
    600
    601    There is no technical way to prevent a user from running untrusted code
    602    on their machines blindly.
    603
  - It's technically extremely unlikely and, from today's knowledge, even
    impossible that L1TF can be exploited via the most popular attack
    mechanisms like JavaScript, because these mechanisms have no way to
    control PTEs. If that were possible and no other mitigation were
    available, then the default might be different.
    609
    610  - The administrators of cloud and hosting setups have to carefully
    611    analyze the risk for their scenarios and make the appropriate
    612    mitigation choices, which might even vary across their deployed
    613    machines and also result in other changes of their overall setup.
    614    There is no way for the kernel to provide a sensible default for this
    kind of scenario.