.. SPDX-License-Identifier: GPL-2.0

.. _kernel_hacking_locktypes:

==========================
Lock types and their rules
==========================

Introduction
============

The kernel provides a variety of locking primitives which can be divided
into three categories:

 - Sleeping locks
 - CPU local locks
 - Spinning locks

This document conceptually describes these lock types and provides rules
for their nesting, including the rules for use under PREEMPT_RT.


Lock categories
===============

Sleeping locks
--------------

Sleeping locks can only be acquired in preemptible task context.

Although implementations allow try_lock() from other contexts, it is
necessary to carefully evaluate the safety of unlock() as well as of
try_lock().  Furthermore, it is also necessary to evaluate the debugging
versions of these primitives.  In short, don't acquire sleeping locks from
other contexts unless there is no other option.

Sleeping lock types:

 - mutex
 - rt_mutex
 - semaphore
 - rw_semaphore
 - ww_mutex
 - percpu_rw_semaphore

On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

 - local_lock
 - spinlock_t
 - rwlock_t
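
As a minimal illustration of the task-context rule, here is a sketch of
mutex usage in process context (the lock, variable and function names are
invented for this example)::

  #include <linux/mutex.h>

  static DEFINE_MUTEX(cfg_lock);
  static int cfg_value;

  /* Called from preemptible task context only, e.g. a syscall path. */
  void cfg_update(int new_value)
  {
    mutex_lock(&cfg_lock);      /* may sleep while waiting */
    cfg_value = new_value;
    mutex_unlock(&cfg_lock);    /* released by the acquiring task */
  }

Calling mutex_lock() from hard interrupt context or with preemption
disabled would be a bug in all kernel configurations.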


CPU local locks
---------------

 - local_lock

On non-PREEMPT_RT kernels, local_lock functions are wrappers around
preemption and interrupt disabling primitives. Contrary to other locking
mechanisms, disabling preemption or interrupts is a purely CPU-local
concurrency control mechanism and is not suited for inter-CPU concurrency
control.


Spinning locks
--------------

 - raw_spinlock_t
 - bit spinlocks

On non-PREEMPT_RT kernels, these lock types are also spinning locks:

 - spinlock_t
 - rwlock_t

Spinning locks implicitly disable preemption and the lock / unlock functions
can have suffixes which apply further protections:

 ===================  ====================================================
 _bh()                Disable / enable bottom halves (soft interrupts)
 _irq()               Disable / enable interrupts
 _irqsave/restore()   Save and disable / restore interrupt disabled state
 ===================  ====================================================
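
A sketch of how these suffixes are typically used, assuming a spinlock
shared between task context and a hard interrupt handler (the lock and
data names are invented for this example)::

  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(stats_lock);
  static unsigned long stats_count;

  /* Task context: save and disable interrupts around the critical section. */
  void stats_inc(void)
  {
    unsigned long flags;

    spin_lock_irqsave(&stats_lock, flags);
    stats_count++;
    spin_unlock_irqrestore(&stats_lock, flags);
  }

  /* Hard interrupt context: interrupts are already disabled here. */
  void stats_inc_from_irq(void)
  {
    spin_lock(&stats_lock);
    stats_count++;
    spin_unlock(&stats_lock);
  }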


Owner semantics
===============

The aforementioned lock types except semaphores have strict owner
semantics:

  The context (task) that acquired the lock must release it.

rw_semaphores have a special interface which allows non-owner release for
readers.


rtmutex
=======

RT-mutexes are mutexes with support for priority inheritance (PI).

PI has limitations on non-PREEMPT_RT kernels due to preemption and
interrupt disabled sections.

PI clearly cannot preempt preemption-disabled or interrupt-disabled
regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
execute most such regions of code in preemptible task context, especially
interrupt handlers and soft interrupts.  This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.
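
For completeness, the rt_mutex interface looks much like a regular mutex;
a hedged sketch, with identifiers invented for this example::

  #include <linux/rtmutex.h>

  static DEFINE_RT_MUTEX(boostable_lock);

  /* Task context only; a blocked higher-priority waiter boosts the owner. */
  void pi_protected_update(void)
  {
    rt_mutex_lock(&boostable_lock);
    /* ... critical section ... */
    rt_mutex_unlock(&boostable_lock);
  }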


semaphore
=========

semaphore is a counting semaphore implementation.

Semaphores are often used for both serialization and waiting, but new use
cases should instead use separate serialization and wait mechanisms, such
as mutexes and completions.
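
A hedged sketch of that recommendation, with the names invented for this
example: serialization handled by a mutex, waiting handled by a
completion::

  #include <linux/mutex.h>
  #include <linux/completion.h>

  static DEFINE_MUTEX(dev_lock);              /* serialization */
  static DECLARE_COMPLETION(dev_ready);       /* waiting */

  void producer(void)
  {
    mutex_lock(&dev_lock);
    /* ... prepare the shared data ... */
    mutex_unlock(&dev_lock);
    complete(&dev_ready);                /* wake up a waiter */
  }

  void consumer(void)
  {
    wait_for_completion(&dev_ready);     /* sleep until the producer signals */
    mutex_lock(&dev_lock);
    /* ... use the shared data ... */
    mutex_unlock(&dev_lock);
  }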

semaphores and PREEMPT_RT
-------------------------

PREEMPT_RT does not change the semaphore implementation because counting
semaphores have no concept of owners, thus preventing PREEMPT_RT from
providing priority inheritance for semaphores.  After all, an unknown
owner cannot be boosted. As a consequence, blocking on semaphores can
result in priority inversion.


rw_semaphore
============

rw_semaphore is a multiple readers and single writer lock mechanism.

On non-PREEMPT_RT kernels the implementation is fair, thus preventing
writer starvation.

rw_semaphore complies by default with the strict owner semantics, but there
exist special-purpose interfaces that allow non-owner release for readers.
These interfaces work independently of the kernel configuration.
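
A sketch of typical reader / writer usage, with identifiers invented for
this example::

  #include <linux/rwsem.h>

  static DECLARE_RWSEM(cfg_rwsem);
  static int cfg_table[16];

  int cfg_read(int idx)
  {
    int val;

    down_read(&cfg_rwsem);      /* multiple readers may hold this */
    val = cfg_table[idx];
    up_read(&cfg_rwsem);
    return val;
  }

  void cfg_write(int idx, int val)
  {
    down_write(&cfg_rwsem);     /* exclusive; waits for all readers */
    cfg_table[idx] = val;
    up_write(&cfg_rwsem);
  }

The special-purpose non-owner reader interfaces referred to above are
down_read_non_owner() and up_read_non_owner().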

rw_semaphore and PREEMPT_RT
---------------------------

PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
implementation, thus changing the fairness:

 Because an rw_semaphore writer cannot grant its priority to multiple
 readers, a preempted low-priority reader will continue holding its lock,
 thus starving even high-priority writers.  In contrast, because readers
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.


local_lock
==========

local_lock provides a named scope to critical sections which are protected
by disabling preemption or interrupts.

On non-PREEMPT_RT kernels local_lock operations map to the preemption and
interrupt disabling and enabling primitives:

 ===============================  ======================
 local_lock(&llock)               preempt_disable()
 local_unlock(&llock)             preempt_enable()
 local_lock_irq(&llock)           local_irq_disable()
 local_unlock_irq(&llock)         local_irq_enable()
 local_lock_irqsave(&llock)       local_irq_save()
 local_unlock_irqrestore(&llock)  local_irq_restore()
 ===============================  ======================

The named scope of local_lock has two advantages over the regular
primitives:

  - The lock name allows static analysis and also clearly documents the
    protection scope, while the regular primitives are scopeless and
    opaque.

  - If lockdep is enabled, the local_lock gains a lockmap which allows
    validation of the correctness of the protection. This can detect cases
    where e.g. a function using preempt_disable() as protection mechanism
    is invoked from interrupt or soft-interrupt context. Aside from that,
    lockdep_assert_held(&llock) works as with any other locking primitive.
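
A sketch of a named local_lock protecting per-CPU data; the per-CPU
structure and function below are invented for this example::

  #include <linux/local_lock.h>
  #include <linux/percpu.h>

  struct node_stats {
    local_lock_t lock;
    unsigned long count;
  };

  static DEFINE_PER_CPU(struct node_stats, node_stats) = {
    .lock = INIT_LOCAL_LOCK(lock),
  };

  void node_stats_inc(void)
  {
    /* Protects only this CPU's instance of node_stats. */
    local_lock(&node_stats.lock);
    this_cpu_inc(node_stats.count);
    local_unlock(&node_stats.lock);
  }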

local_lock and PREEMPT_RT
-------------------------

PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing
semantics:

  - All spinlock_t changes also apply to local_lock.

local_lock usage
----------------

local_lock should be used in situations where disabling preemption or
interrupts is the appropriate form of concurrency control to protect
per-CPU data structures on a non-PREEMPT_RT kernel.

local_lock is not suitable to protect against preemption or interrupts on a
PREEMPT_RT kernel due to the PREEMPT_RT-specific spinlock_t semantics.


raw_spinlock_t and spinlock_t
=============================

raw_spinlock_t
--------------

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels.  Use raw_spinlock_t only in truly critical
core code, low-level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state.  raw_spinlock_t can sometimes also be used when the
critical section is tiny, thus avoiding RT-mutex overhead.
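
A sketch of the hardware-access case; the register offsets and names are
invented for this example::

  #include <linux/spinlock.h>
  #include <linux/io.h>

  static DEFINE_RAW_SPINLOCK(hw_lock);

  /* Tiny critical section touching device registers; never sleeps. */
  void hw_set_mode(void __iomem *base, u32 mode)
  {
    unsigned long flags;

    raw_spin_lock_irqsave(&hw_lock, flags);
    writel(mode, base + 0x10);        /* hypothetical MODE register */
    writel(1, base + 0x14);           /* hypothetical APPLY register */
    raw_spin_unlock_irqrestore(&hw_lock, flags);
  }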

spinlock_t
----------

The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has
exactly the same semantics.

spinlock_t and PREEMPT_RT
-------------------------

On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation
based on rt_mutex which changes the semantics:

 - Preemption is not disabled.

 - The hard interrupt related suffixes for spin_lock / spin_unlock
   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
   interrupt disabled state.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.

   Non-PREEMPT_RT kernels disable preemption to get this effect.

   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
   preemption enabled. The lock disables softirq handlers and also
   prevents reentrancy due to task preemption.

PREEMPT_RT kernels preserve all other spinlock_t semantics:

 - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
   kernels leave task state untouched.  However, PREEMPT_RT must change
   task state if the task blocks during acquisition.  Therefore, it saves
   the current task state before blocking and the corresponding lock wakeup
   restores it, as shown below::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        lock wakeup
                                          task->state = task->saved_state

   Other types of wakeups would normally unconditionally set the task state
   to RUNNING, but that does not work here because the task must remain
   blocked until the lock becomes available.  Therefore, when a non-lock
   wakeup attempts to awaken a task blocked waiting for a spinlock, it
   instead sets the saved state to RUNNING.  Then, when the lock
   acquisition completes, the lock wakeup sets the task state to the saved
   state, in this case setting it to RUNNING::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        non lock wakeup
                                          task->saved_state = TASK_RUNNING

                                        lock wakeup
                                          task->state = task->saved_state

   This ensures that the real wakeup cannot be lost.

rwlock_t
========

rwlock_t is a multiple readers and single writer lock mechanism.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
suffix rules of spinlock_t apply accordingly. The implementation is fair,
thus preventing writer starvation.

rwlock_t and PREEMPT_RT
-----------------------

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its lock,
   thus starving even high-priority writers.  In contrast, because readers
   can grant their priority to a writer, a preempted low-priority writer
   will have its priority boosted until it releases the lock, thus
   preventing that writer from starving readers.


PREEMPT_RT caveats
==================

local_lock on RT
----------------

The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few
implications. For example, on a non-PREEMPT_RT kernel the following code
sequence works as expected::

  local_lock_irq(&local_lock);
  raw_spin_lock(&lock);

and is fully equivalent to::

   raw_spin_lock_irq(&lock);

On a PREEMPT_RT kernel this code sequence breaks because local_lock_irq()
is mapped to a per-CPU spinlock_t which disables neither interrupts nor
preemption. The following code sequence works correctly on both
PREEMPT_RT and non-PREEMPT_RT kernels::

  local_lock_irq(&local_lock);
  spin_lock(&lock);

Another caveat with local locks is that each local_lock has a specific
protection scope. So the following substitution is wrong::

  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_1, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_1, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_2, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_2, flags);
  }

  func3()
  {
    lockdep_assert_irqs_disabled();
    access_protected_data();
  }

On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel
local_lock_1 and local_lock_2 are distinct and cannot serialize the callers
of func3(). Also the lockdep assert will trigger on a PREEMPT_RT kernel
because local_lock_irqsave() does not disable interrupts due to the
PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is::

  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func3()
  {
    lockdep_assert_held(&local_lock);
    access_protected_data();
  }


spinlock_t and rwlock_t
-----------------------

The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications.  For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::

   local_irq_disable();
   spin_lock(&lock);

and is fully equivalent to::

   spin_lock_irq(&lock);

The same applies to rwlock_t and the _irqsave() suffix variants.

On a PREEMPT_RT kernel this code sequence breaks because RT-mutex requires
a fully preemptible context.  Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts.  In cases where the
interrupt disabling and locking must remain separate, PREEMPT_RT offers a
local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.

A typical scenario is protection of per-CPU variables in thread context::

  struct foo *p = get_cpu_ptr(&var1);

  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);

This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel
this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does
not allow acquiring p->lock because get_cpu_ptr() implicitly disables
preemption. The following substitution works on both kernels::

  struct foo *p;

  migrate_disable();
  p = this_cpu_ptr(&var1);
  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);

migrate_disable() ensures that the task is pinned on the current CPU, which
in turn guarantees that the per-CPU accesses to var1 and var2 stay on the
same CPU while the task remains preemptible.

The migrate_disable() substitution is not valid for the following
scenario::

  func()
  {
    struct foo *p;

    migrate_disable();
    p = this_cpu_ptr(&var1);
    p->val = func2();
    migrate_enable();
  }

This breaks because migrate_disable() does not protect against reentrancy from
a preempting task. A correct substitution for this case is::

  func()
  {
    struct foo *p;

    local_lock(&foo_lock);
    p = this_cpu_ptr(&var1);
    p->val = func2();
    local_unlock(&foo_lock);
  }

On a non-PREEMPT_RT kernel this protects against reentrancy by disabling
preemption. On a PREEMPT_RT kernel this is achieved by acquiring the
underlying per-CPU spinlock.


raw_spinlock_t on RT
--------------------

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t; for example, the critical section must avoid
allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
works perfectly::

  raw_spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);

But this code fails on PREEMPT_RT kernels because the memory allocator is
fully preemptible and therefore cannot be invoked from truly atomic
contexts.  However, it is perfectly fine to invoke the memory allocator
while holding normal non-raw spinlocks because they do not disable
preemption on PREEMPT_RT kernels::

  spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);


bit spinlocks
-------------

PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
small to accommodate an RT-mutex.  Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
usage-site changes are not needed for the spinlock_t substitution.
Instead, conditionals in header files and the core locking implementation
enable the compiler to do the substitution transparently.
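
A purely illustrative sketch of such a usage-site substitution; the
structure and helper names below are invented, not an in-tree example::

  #include <linux/bit_spinlock.h>
  #include <linux/spinlock.h>

  struct entry {
    unsigned long flags;          /* bit 0 used as lock bit on !PREEMPT_RT */
  #ifdef CONFIG_PREEMPT_RT
    spinlock_t lock;              /* real lock on PREEMPT_RT */
  #endif
  };

  static inline void entry_lock(struct entry *e)
  {
  #ifdef CONFIG_PREEMPT_RT
    spin_lock(&e->lock);
  #else
    bit_spin_lock(0, &e->flags);
  #endif
  }

  static inline void entry_unlock(struct entry *e)
  {
  #ifdef CONFIG_PREEMPT_RT
    spin_unlock(&e->lock);
  #else
    bit_spin_unlock(0, &e->flags);
  #endif
  }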


Lock type nesting rules
=======================

The most basic rules are:

  - Lock types of the same lock category (sleeping, CPU local, spinning)
    can nest arbitrarily as long as they respect the general lock ordering
    rules to prevent deadlocks.

  - Sleeping lock types cannot nest inside CPU local and spinning lock types.

  - CPU local and spinning lock types can nest inside sleeping lock types.

  - Spinning lock types can nest inside all lock types.

These constraints apply both in PREEMPT_RT and otherwise.

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping and substitutes local_lock with a
per-CPU spinlock_t means that they cannot be acquired while holding a raw
spinlock.  This results in the following nesting ordering:

  1) Sleeping locks
  2) spinlock_t, rwlock_t, local_lock
  3) raw_spinlock_t and bit spinlocks

Lockdep will complain if these constraints are violated, both in
PREEMPT_RT and otherwise.
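
A sketch of a nesting that respects this ordering on all kernel
configurations; all identifiers are invented for this example::

  #include <linux/mutex.h>
  #include <linux/spinlock.h>

  static DEFINE_MUTEX(big_lock);          /* 1) sleeping lock */
  static DEFINE_SPINLOCK(state_lock);     /* 2) spinlock_t */
  static DEFINE_RAW_SPINLOCK(hw_lock);    /* 3) raw_spinlock_t */

  void update_everything(void)
  {
    mutex_lock(&big_lock);
    spin_lock(&state_lock);       /* category 2 inside a sleeping lock: OK */
    raw_spin_lock(&hw_lock);      /* raw spinlock innermost: OK */
    /* ... */
    raw_spin_unlock(&hw_lock);
    spin_unlock(&state_lock);
    mutex_unlock(&big_lock);
  }

Acquiring them in the reverse order, e.g. a mutex inside a spinlock_t
critical section, or a spinlock_t inside a raw_spinlock_t critical section
on PREEMPT_RT, violates the rules above and lockdep will report it.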