cachepc-qemu

Fork of AMDESE/qemu with changes for cachepc side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-qemu
Log | Files | Refs | Submodules | LICENSE | sfeed.txt

lockcnt.txt (10015B)


      1DOCUMENTATION FOR LOCKED COUNTERS (aka QemuLockCnt)
      2===================================================
      3
      4QEMU often uses reference counts to track data structures that are being
      5accessed and should not be freed.  For example, a loop that invoke
      6callbacks like this is not safe:
      7
      8    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
      9        if (ioh->revents & G_IO_OUT) {
     10            ioh->fd_write(ioh->opaque);
     11        }
     12    }
     13
     14QLIST_FOREACH_SAFE protects against deletion of the current node (ioh)
     15by stashing away its "next" pointer.  However, ioh->fd_write could
     16actually delete the next node from the list.  The simplest way to
     17avoid this is to mark the node as deleted, and remove it from the
     18list in the above loop:
     19
     20    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
     21        if (ioh->deleted) {
     22            QLIST_REMOVE(ioh, next);
     23            g_free(ioh);
     24        } else {
     25            if (ioh->revents & G_IO_OUT) {
     26                ioh->fd_write(ioh->opaque);
     27            }
     28        }
     29    }
     30
     31If however this loop must also be reentrant, i.e. it is possible that
     32ioh->fd_write invokes the loop again, some kind of counting is needed:
     33
     34    walking_handlers++;
     35    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
     36        if (ioh->deleted) {
     37            if (walking_handlers == 1) {
     38                QLIST_REMOVE(ioh, next);
     39                g_free(ioh);
     40            }
     41        } else {
     42            if (ioh->revents & G_IO_OUT) {
     43                ioh->fd_write(ioh->opaque);
     44            }
     45        }
     46    }
     47    walking_handlers--;
     48
     49One may think of using the RCU primitives, rcu_read_lock() and
     50rcu_read_unlock(); effectively, the RCU nesting count would take
     51the place of the walking_handlers global variable.  Indeed,
     52reference counting and RCU have similar purposes, but their usage in
     53general is complementary:
     54
     55- reference counting is fine-grained and limited to a single data
     56  structure; RCU delays reclamation of *all* RCU-protected data
     57  structures;
     58
     59- reference counting works even in the presence of code that keeps
     60  a reference for a long time; RCU critical sections in principle
     61  should be kept short;
     62
     63- reference counting is often applied to code that is not thread-safe
     64  but is reentrant; in fact, usage of reference counting in QEMU predates
     65  the introduction of threads by many years.  RCU is generally used to
     66  protect readers from other threads freeing memory after concurrent
     67  modifications to a data structure.
     68
     69- reclaiming data can be done by a separate thread in the case of RCU;
     70  this can improve performance, but also delay reclamation undesirably.
     71  With reference counting, reclamation is deterministic.
     72
     73This file documents QemuLockCnt, an abstraction for using reference
     74counting in code that has to be both thread-safe and reentrant.
     75
     76
     77QemuLockCnt concepts
     78--------------------
     79
     80A QemuLockCnt comprises both a counter and a mutex; it has primitives
     81to increment and decrement the counter, and to take and release the
     82mutex.  The counter notes how many visits to the data structures are
     83taking place (the visits could be from different threads, or there could
     84be multiple reentrant visits from the same thread).  The basic rules
     85governing the counter/mutex pair then are the following:
     86
     87- Data protected by the QemuLockCnt must not be freed unless the
     88  counter is zero and the mutex is taken.
     89
     90- A new visit cannot be started while the counter is zero and the
     91  mutex is taken.
     92
     93Most of the time, the mutex protects all writes to the data structure,
     94not just frees, though there could be cases where this is not necessary.
     95
     96Reads, instead, can be done without taking the mutex, as long as the
     97readers and writers use the same macros that are used for RCU, for
     98example qatomic_rcu_read, qatomic_rcu_set, QLIST_FOREACH_RCU, etc.  This is
     99because the reads are done outside a lock and a set or QLIST_INSERT_HEAD
    100can happen concurrently with the read.  The RCU API ensures that the
    101processor and the compiler see all required memory barriers.
    102
    103This could be implemented simply by protecting the counter with the
    104mutex, for example:
    105
    106    // (1)
    107    qemu_mutex_lock(&walking_handlers_mutex);
    108    walking_handlers++;
    109    qemu_mutex_unlock(&walking_handlers_mutex);
    110
    111    ...
    112
    113    // (2)
    114    qemu_mutex_lock(&walking_handlers_mutex);
    115    if (--walking_handlers == 0) {
    116        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
    117            if (ioh->deleted) {
    118                QLIST_REMOVE(ioh, next);
    119                g_free(ioh);
    120            }
    121        }
    122    }
    123    qemu_mutex_unlock(&walking_handlers_mutex);
    124
    125Here, no frees can happen in the code represented by the ellipsis.
    126If another thread is executing critical section (2), that part of
    127the code cannot be entered, because the thread will not be able
    128to increment the walking_handlers variable.  And of course
    129during the visit any other thread will see a nonzero value for
    130walking_handlers, as in the single-threaded code.
    131
    132Note that it is possible for multiple concurrent accesses to delay
    133the cleanup arbitrarily; in other words, for the walking_handlers
    134counter to never become zero.  For this reason, this technique is
    135more easily applicable if concurrent access to the structure is rare.
    136
    137However, critical sections are easy to forget since you have to do
    138them for each modification of the counter.  QemuLockCnt ensures that
    139all modifications of the counter take the lock appropriately, and it
    140can also be more efficient in two ways:
    141
    142- it avoids taking the lock for many operations (for example
    143  incrementing the counter while it is non-zero);
    144
    145- on some platforms, one can implement QemuLockCnt to hold the lock
    146  and the mutex in a single word, making the fast path no more expensive
    147  than simply managing a counter using atomic operations (see
    148  docs/devel/atomics.rst).  This can be very helpful if concurrent access to
    149  the data structure is expected to be rare.
    150
    151
    152Using the same mutex for frees and writes can still incur some small
    153inefficiencies; for example, a visit can never start if the counter is
    154zero and the mutex is taken---even if the mutex is taken by a write,
    155which in principle need not block a visit of the data structure.
    156However, these are usually not a problem if any of the following
    157assumptions are valid:
    158
    159- concurrent access is possible but rare
    160
    161- writes are rare
    162
    163- writes are frequent, but this kind of write (e.g. appending to a
    164  list) has a very small critical section.
    165
    166For example, QEMU uses QemuLockCnt to manage an AioContext's list of
    167bottom halves and file descriptor handlers.  Modifications to the list
    168of file descriptor handlers are rare.  Creation of a new bottom half is
    169frequent and can happen on a fast path; however: 1) it is almost never
    170concurrent with a visit to the list of bottom halves; 2) it only has
    171three instructions in the critical path, two assignments and a smp_wmb().
    172
    173
    174QemuLockCnt API
    175---------------
    176
    177The QemuLockCnt API is described in include/qemu/thread.h.
    178
    179
    180QemuLockCnt usage
    181-----------------
    182
    183This section explains the typical usage patterns for QemuLockCnt functions.
    184
    185Setting a variable to a non-NULL value can be done between
    186qemu_lockcnt_lock and qemu_lockcnt_unlock:
    187
    188    qemu_lockcnt_lock(&xyz_lockcnt);
    189    if (!xyz) {
    190        new_xyz = g_new(XYZ, 1);
    191        ...
    192        qatomic_rcu_set(&xyz, new_xyz);
    193    }
    194    qemu_lockcnt_unlock(&xyz_lockcnt);
    195
    196Accessing the value can be done between qemu_lockcnt_inc and
    197qemu_lockcnt_dec:
    198
    199    qemu_lockcnt_inc(&xyz_lockcnt);
    200    if (xyz) {
    201        XYZ *p = qatomic_rcu_read(&xyz);
    202        ...
    203        /* Accesses can now be done through "p".  */
    204    }
    205    qemu_lockcnt_dec(&xyz_lockcnt);
    206
    207Freeing the object can similarly use qemu_lockcnt_lock and
    208qemu_lockcnt_unlock, but you also need to ensure that the count
    209is zero (i.e. there is no concurrent visit).  Because qemu_lockcnt_inc
    210takes the QemuLockCnt's lock, the count cannot become non-zero while
    211the object is being freed.  Freeing an object looks like this:
    212
    213    qemu_lockcnt_lock(&xyz_lockcnt);
    214    if (!qemu_lockcnt_count(&xyz_lockcnt)) {
    215        g_free(xyz);
    216        xyz = NULL;
    217    }
    218    qemu_lockcnt_unlock(&xyz_lockcnt);
    219
    220If an object has to be freed right after a visit, you can combine
    221the decrement, the locking and the check on count as follows:
    222
    223    qemu_lockcnt_inc(&xyz_lockcnt);
    224    if (xyz) {
    225        XYZ *p = qatomic_rcu_read(&xyz);
    226        ...
    227        /* Accesses can now be done through "p".  */
    228    }
    229    if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
    230        g_free(xyz);
    231        xyz = NULL;
    232        qemu_lockcnt_unlock(&xyz_lockcnt);
    233    }
    234
    235QemuLockCnt can also be used to access a list as follows:
    236
    237    qemu_lockcnt_inc(&io_handlers_lockcnt);
    238    QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) {
    239        if (ioh->revents & G_IO_OUT) {
    240            ioh->fd_write(ioh->opaque);
    241        }
    242    }
    243
    244    if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
    245        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
    246            if (ioh->deleted) {
    247                QLIST_REMOVE(ioh, next);
    248                g_free(ioh);
    249            }
    250        }
    251        qemu_lockcnt_unlock(&io_handlers_lockcnt);
    252    }
    253
    254Again, the RCU primitives are used because new items can be added to the
    255list during the walk.  QLIST_FOREACH_RCU ensures that the processor and
    256the compiler see the appropriate memory barriers.
    257
    258An alternative pattern uses qemu_lockcnt_dec_if_lock:
    259
    260    qemu_lockcnt_inc(&io_handlers_lockcnt);
    261    QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
    262        if (ioh->deleted) {
    263            if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
    264                QLIST_REMOVE(ioh, next);
    265                g_free(ioh);
    266                qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
    267            }
    268        } else {
    269            if (ioh->revents & G_IO_OUT) {
    270                ioh->fd_write(ioh->opaque);
    271            }
    272        }
    273    }
    274    qemu_lockcnt_dec(&io_handlers_lockcnt);
    275
    276Here you can use qemu_lockcnt_dec instead of qemu_lockcnt_dec_and_lock,
    277because there is no special task to do if the count goes from 1 to 0.