Copyright (c) 2014-2017 Red Hat Inc.

This work is licensed under the terms of the GNU GPL, version 2 or later.  See
the COPYING file in the top-level directory.


This document explains the IOThread feature and how to write code that runs
outside the QEMU global mutex.

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop.  The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.
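
The monitor-then-dispatch pattern described above can be sketched with plain
POSIX poll().  This is an illustration only, not QEMU's main loop code: the
names FdHandler, loop_run_once() and count_reads() are hypothetical, and the
fixed limit of 16 descriptors is an arbitrary simplification.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not QEMU's main-loop code. */
typedef void FdCallback(int fd, void *opaque);

typedef struct {
    int fd;                  /* file descriptor to monitor */
    FdCallback *on_readable; /* invoked when fd becomes readable */
    void *opaque;            /* passed through to the callback */
} FdHandler;

/*
 * Run one event loop iteration: wait up to timeout_ms for any monitored fd
 * to become readable, then invoke its callback.  Returns the number of
 * callbacks invoked, or -1 if poll() failed.
 */
static int loop_run_once(FdHandler *handlers, size_t n, int timeout_ms)
{
    struct pollfd pfds[16];
    int dispatched = 0;

    assert(n <= 16);
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = handlers[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, n, timeout_ms) < 0) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            handlers[i].on_readable(handlers[i].fd, handlers[i].opaque);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example callback: consume one byte and count how often it was called. */
static void count_reads(int fd, void *opaque)
{
    char byte;

    if (read(fd, &byte, 1) == 1) {
        (*(int *)opaque)++;
    }
}
```

A caller would register one FdHandler per connection and call loop_run_once()
repeatedly; every handler shares the same thread, which is why a slow callback
delays all the others.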

The default event loop is called the main loop (see main-loop.c).  It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: The main loop and IOThread are both event loops but their code is
not shared completely.  Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work.  The main loop is a
scalability bottleneck on hosts with many CPUs.  Work can be spread across
several IOThreads instead of just one main loop.  When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the QEMU global mutex, which is a
scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
global mutex to serialize execution of QEMU code.  This mutex is necessary
because a lot of QEMU's code historically was not thread-safe.

Because all I/O processing is done in a single main loop and the QEMU global
mutex is contended by all vCPU threads and the main loop, it is desirable to
place work into IOThreads.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h).  Code that only works in the main loop
implicitly uses the main loop's AioContext.  Code that supports running
in IOThreads must be aware of its AioContext.

AioContext supports the following services:
 * File descriptor monitoring (read/write/error on POSIX hosts)
 * Event notifiers (inter-thread signalling)
 * Timers
 * Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop AioContext:
 * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
 * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
 * LEGACY timer_new_ms() - create a timer
 * LEGACY qemu_bh_new() - create a BH
 * LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop they cannot be used in code that
runs in an IOThread.  They might cause a crash or deadlock if called from an
IOThread since the QEMU global mutex is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
 * aio_set_fd_handler() - monitor a file descriptor
 * aio_set_event_notifier() - monitor an event notifier
 * aio_timer_new() - create a timer
 * aio_bh_new() - create a BH
 * aio_poll() - run an event loop iteration

The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
Code that takes an AioContext argument works in both IOThreads and the main
loop, depending on which AioContext instance the caller passes in.
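
The difference between implicit and explicit use of an event loop object can
be shown with a toy model.  Nothing below is QEMU code: ToyContext and the
toy_*() functions are hypothetical stand-ins, and each context holds at most
one deferred callback to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an event loop object; NOT QEMU's AioContext. */
typedef void ToyCallback(void *opaque);

typedef struct {
    const char *name;
    ToyCallback *pending_cb;    /* at most one deferred callback */
    void *pending_opaque;
} ToyContext;

static ToyContext toy_main_context = { "main-loop", NULL, NULL };

/* Context-aware: the caller chooses which event loop runs the callback. */
static void toy_defer(ToyContext *ctx, ToyCallback *cb, void *opaque)
{
    assert(ctx->pending_cb == NULL);
    ctx->pending_cb = cb;
    ctx->pending_opaque = opaque;
}

/* Legacy style: silently tied to the main loop's context. */
static void toy_defer_legacy(ToyCallback *cb, void *opaque)
{
    toy_defer(&toy_main_context, cb, opaque);
}

/* One loop iteration: run the deferred callback, if any.  Returns 1 if run. */
static int toy_poll(ToyContext *ctx)
{
    ToyCallback *cb = ctx->pending_cb;

    if (!cb) {
        return 0;
    }
    ctx->pending_cb = NULL;
    cb(ctx->pending_opaque);
    return 1;
}

/* Example callback used to observe where and when work ran. */
static void mark_done(void *opaque)
{
    *(int *)opaque = 1;
}
```

Work queued with toy_defer_legacy() only ever runs when the main context is
polled, whereas toy_defer() runs in whichever context the caller names.  That
is the property that lets context-aware code be placed in an IOThread.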

How to synchronize with an IOThread
-----------------------------------
AioContext is not thread-safe so some rules must be followed when using file
descriptors, event notifiers, timers, or BHs across threads:

1. AioContext functions can always be called safely.  They handle their
own locking internally.

2. Other threads wishing to access the AioContext must use
aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
context is acquired no other thread can access it or run event loop iterations
in this AioContext.

Legacy code sometimes nests aio_context_acquire()/aio_context_release() calls.
Do not nest these calls in new code: nesting is incompatible with the
BDRV_POLL_WHILE() macro used in the block layer and can lead to hangs.

There is currently no lock ordering rule if a thread needs to acquire multiple
AioContexts simultaneously.  Therefore, it is only safe for code holding the
QEMU global mutex to acquire other AioContexts.

Side note: the best way to schedule a function call across threads is to call
aio_bh_schedule_oneshot().  No acquire/release or locking is needed.
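
To illustrate why a one-shot scheduling call needs no locking by the caller,
here is a toy queue that does its own locking internally.  It is a sketch
under stated assumptions, not QEMU's implementation: ToyLoop, ToyBH and the
toy_*() functions are hypothetical, callbacks run in last-in-first-out order,
and the wakeup of a sleeping loop thread is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef void ToyBHFunc(void *opaque);

typedef struct ToyBH {
    ToyBHFunc *cb;
    void *opaque;
    struct ToyBH *next;
} ToyBH;

typedef struct {
    pthread_mutex_t lock;   /* protects 'head' only */
    ToyBH *head;            /* pending one-shot callbacks */
} ToyLoop;

static void toy_loop_init(ToyLoop *loop)
{
    pthread_mutex_init(&loop->lock, NULL);
    loop->head = NULL;
}

/* Callable from any thread: queue cb(opaque) to run once in the loop thread. */
static int toy_schedule_oneshot(ToyLoop *loop, ToyBHFunc *cb, void *opaque)
{
    ToyBH *bh = malloc(sizeof(*bh));

    if (!bh) {
        return -1;
    }
    bh->cb = cb;
    bh->opaque = opaque;
    pthread_mutex_lock(&loop->lock);
    bh->next = loop->head;
    loop->head = bh;
    pthread_mutex_unlock(&loop->lock);
    return 0;
}

/*
 * Called only by the loop's own thread: detach the pending list under the
 * lock, then run the callbacks without holding it.  Returns how many ran.
 */
static int toy_loop_run_pending(ToyLoop *loop)
{
    ToyBH *list;
    int ran = 0;

    pthread_mutex_lock(&loop->lock);
    list = loop->head;
    loop->head = NULL;
    pthread_mutex_unlock(&loop->lock);

    while (list) {
        ToyBH *bh = list;

        list = bh->next;
        bh->cb(bh->opaque);
        free(bh);
        ran++;
    }
    return ran;
}

/* Example callback: increment a counter owned by the loop thread. */
static void add_one(void *opaque)
{
    (*(int *)opaque)++;
}
```

Because the callback always executes in the loop's own thread, the data it
touches needs no further protection as long as only that thread uses it.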

AioContext and the block layer
------------------------------
The AioContext originates from the QEMU block layer, even though nowadays
AioContext is a generic event loop that can be used by any QEMU subsystem.

The block layer has support for AioContext integrated.  Each BlockDriverState
is associated with an AioContext using bdrv_try_set_aio_context() and
bdrv_get_aio_context().  This allows block layer code to process I/O inside the
right AioContext.  Other subsystems may wish to follow a similar approach.

Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop.  See the "How to program for
IOThreads" section above for information on how to do that.

If main loop code such as a QMP function wishes to access a BlockDriverState
it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
that callbacks in the IOThread do not run in parallel.

Code running in the monitor typically needs to ensure that past
requests from the guest are completed.  When a block device is running
in an IOThread, the IOThread can also process requests from the guest
(via ioeventfd).  To achieve both objectives, that is, completing past
requests and keeping the IOThread from processing new ones, wrap the code
between bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
section".  The functions must be called between aio_context_acquire()
and aio_context_release().  You can freely release and re-acquire the
AioContext within a drained section.
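
The call ordering for a drained section can be sketched as follows.  The
function names are the ones this document mentions, but the definitions at the
top are stand-in stubs that only log each call so the fragment compiles outside
the QEMU tree; their bodies and the struct layouts are assumptions.
toy_monitor_command() is a hypothetical caller.

```c
#include <assert.h>
#include <string.h>

/* --- Stand-in stubs (assumptions); the real ones live in the QEMU tree --- */
typedef struct AioContext { int unused; } AioContext;
typedef struct BlockDriverState { AioContext *ctx; } BlockDriverState;

static char call_log[128];

static void log_call(const char *name)
{
    strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
}

static AioContext *bdrv_get_aio_context(BlockDriverState *bs)
{
    return bs->ctx;
}

static void aio_context_acquire(AioContext *ctx)
{
    (void)ctx;
    log_call("acquire;");
}

static void aio_context_release(AioContext *ctx)
{
    (void)ctx;
    log_call("release;");
}

static void bdrv_drained_begin(BlockDriverState *bs)
{
    (void)bs;
    log_call("drain_begin;");
}

static void bdrv_drained_end(BlockDriverState *bs)
{
    (void)bs;
    log_call("drain_end;");
}
/* --- End of stubs --- */

/* Hypothetical monitor command operating on bs from the main loop. */
static void toy_monitor_command(BlockDriverState *bs)
{
    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_context_acquire(ctx);   /* IOThread callbacks cannot run in parallel */
    bdrv_drained_begin(bs);     /* start of the drained section */

    log_call("work;");          /* placeholder for the real work on bs */

    bdrv_drained_end(bs);       /* end of the drained section */
    aio_context_release(ctx);
}
```

The drained section sits strictly inside the acquire/release pair, matching the
rule stated above.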

Long-running jobs (usually in the form of coroutines) are best scheduled in
the BlockDriverState's AioContext to avoid the need to acquire/release around
each bdrv_*() call.  The functions bdrv_add_aio_context_notifier() and
bdrv_remove_aio_context_notifier(), or alternatively
blk_add_aio_context_notifier() and blk_remove_aio_context_notifier() if you
use BlockBackends, can be used to get a notification whenever
bdrv_try_set_aio_context() moves a BlockDriverState to a different AioContext.
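
How a long-running job might follow its node to a new event loop can be
illustrated with a toy notifier.  This is not the QEMU API: ToyNode, ToyJob and
the toy_*() functions are hypothetical, the node supports a single notifier,
and the two-callback shape (one call before the move, one after) is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyContext { const char *name; } ToyContext;

/* Toy stand-in for a block node with one context-change notifier. */
typedef struct ToyNode {
    ToyContext *ctx;
    void (*attached)(ToyContext *new_ctx, void *opaque);
    void (*detach)(void *opaque);
    void *opaque;
} ToyNode;

static void toy_add_notifier(ToyNode *node,
                             void (*attached)(ToyContext *new_ctx, void *opaque),
                             void (*detach)(void *opaque),
                             void *opaque)
{
    node->attached = attached;
    node->detach = detach;
    node->opaque = opaque;
}

/* Move the node: notify before leaving the old context and after joining. */
static void toy_set_context(ToyNode *node, ToyContext *new_ctx)
{
    if (node->detach) {
        node->detach(node->opaque);
    }
    node->ctx = new_ctx;
    if (node->attached) {
        node->attached(new_ctx, node->opaque);
    }
}

/* A long-running job that remembers which context it should run in. */
typedef struct {
    ToyContext *run_in;
    int moves;
} ToyJob;

static void job_detach(void *opaque)
{
    ToyJob *job = opaque;

    job->run_in = NULL;         /* stop using the old context */
}

static void job_attached(ToyContext *new_ctx, void *opaque)
{
    ToyJob *job = opaque;

    job->run_in = new_ctx;      /* resume in the node's new context */
    job->moves++;
}
```

After toy_set_context() returns, the job's run_in field matches the node's
context again, so the job keeps running in the same event loop as its node.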