cachepc-qemu

Fork of AMDESE/qemu with changes for cachepc side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-qemu

virtiofsd.rst (11423B)


QEMU virtio-fs shared file system daemon
========================================

Synopsis
--------

**virtiofsd** [*OPTIONS*]

Description
-----------

Share a host directory tree with a guest through a virtio-fs device.  This
program is a vhost-user backend that implements the virtio-fs device.  Each
virtio-fs device instance requires its own virtiofsd process.

This program is designed to work with QEMU's ``--device vhost-user-fs-pci``
but should work with any virtual machine monitor (VMM) that supports
vhost-user.  See the Examples section below.

This program must be run as the root user.  The program drops privileges where
possible during startup, although it must be able to create and access files
with any uid/gid:

* The ability to invoke syscalls is limited using seccomp(2).
* Linux capabilities(7) are dropped.

In "namespace" sandbox mode the program switches into a new file system
namespace and invokes pivot_root(2) to make the shared directory tree its root.
New pid and net namespaces are also created to isolate the process.

In "chroot" sandbox mode the program invokes chroot(2) to make the shared
directory tree its root. This mode is intended for container environments where
the container runtime has already set up the namespaces and the program does
not have permission to create namespaces itself.

Both sandbox modes prevent "file system escapes" due to symlinks and other file
system objects that might lead to files outside the shared directory.

Options
-------

.. program:: virtiofsd

.. option:: -h, --help

  Print help.

.. option:: -V, --version

  Print version.

.. option:: -d

  Enable debug output.

.. option:: --syslog

  Print log messages to syslog instead of stderr.

.. option:: -o OPTION

  * debug -
    Enable debug output.

  * flock|no_flock -
    Enable/disable flock.  The default is ``no_flock``.

  * modcaps=CAPLIST -
    Modify the list of capabilities allowed; CAPLIST is a colon-separated
    list of capabilities, each preceded by either + or -, e.g.
    ``+sys_admin:-chown``.

  * log_level=LEVEL -
    Print only log messages matching LEVEL or more severe.  LEVEL is one of
    ``err``, ``warn``, ``info``, or ``debug``.  The default is ``info``.

  * posix_lock|no_posix_lock -
    Enable/disable remote POSIX locks.  The default is ``no_posix_lock``.

  * readdirplus|no_readdirplus -
    Enable/disable readdirplus.  The default is ``readdirplus``.

  * sandbox=namespace|chroot -
    Sandbox mode:

    - namespace: Create mount, pid, and net namespaces and pivot_root(2) into
      the shared directory.
    - chroot: chroot(2) into the shared directory (use in containers).

    The default is "namespace".

  * source=PATH -
    Share host directory tree located at PATH.  This option is required.

  * timeout=TIMEOUT -
    I/O timeout in seconds.  The default depends on the cache= option.

  * writeback|no_writeback -
    Enable/disable writeback cache. The cache allows the FUSE client to buffer
    and merge write requests.  The default is ``no_writeback``.

  * xattr|no_xattr -
    Enable/disable extended attributes (xattr) on files and directories.  The
    default is ``no_xattr``.

  * posix_acl|no_posix_acl -
    Enable/disable POSIX ACL support.  POSIX ACLs are disabled by default.

.. option:: --socket-path=PATH

  Listen on vhost-user UNIX domain socket at PATH.

.. option:: --socket-group=GROUP

  Set the vhost-user UNIX domain socket gid to GROUP.

.. option:: --fd=FDNUM

  Accept connections from vhost-user UNIX domain socket file descriptor FDNUM.
  The file descriptor must already be listening for connections.

.. option:: --thread-pool-size=NUM

  Restrict the number of worker threads per request queue to NUM.  The default
  is 64.

.. option:: --cache=none|auto|always

  Select the desired trade-off between coherency and performance.  ``none``
  forbids the FUSE client from caching to achieve best coherency at the cost of
  performance.  ``auto`` behaves similarly to NFS with a 1 second metadata cache
  timeout.  ``always`` sets a long cache lifetime at the expense of coherency.
  The default is ``auto``.

Extended attribute (xattr) mapping
----------------------------------

By default, the names of xattrs used by the client are passed through to the
server file system.  This can be a problem where either those xattr names are
used by something on the server (e.g. SELinux client/server confusion) or where
virtiofsd is running in a container with restricted privileges where it cannot
access some attributes.

Mapping syntax
~~~~~~~~~~~~~~

A mapping of xattr names can be made using ``-o xattrmap=mapping`` where the
``mapping`` string consists of a series of rules.

The first matching rule terminates the mapping.
The set of rules must include a terminating rule to match any remaining attributes
at the end.

Each rule consists of a number of fields separated with a separator that is the
first non-white space character in the rule.  This separator must then be used
for the whole rule.
White space may be added before and after each rule.

Using ':' as the separator a rule is of the form:

``:type:scope:key:prepend:``

**scope** is:

- 'client' - match 'key' against an xattr name from the client for
  setxattr/getxattr/removexattr
- 'server' - match 'prepend' against an xattr name from the server
  for listxattr
- 'all' - can be used to make a single rule where both the server
  and client matches are triggered.

**type** is one of:

- 'prefix' - is designed to prepend and strip a prefix;  the modified
  attributes are then passed on to the client/server.

- 'ok' - Causes the rule set to be terminated when a match is found
  while allowing matching xattrs through unchanged.
  It is intended both as a way of explicitly terminating
  the list of rules, and to allow some xattrs to skip following rules.

- 'bad' - If a client tries to use a name matching 'key' it's
  denied using EPERM; when the server passes an attribute
  name matching 'prepend' it's hidden.  In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of certain
  patterns.

**key** is a string tested as a prefix on an attribute name originating
on the client.  It may be empty, in which case a 'client' rule
will always match on client names.

**prepend** is a string tested as a prefix on an attribute name originating
on the server, and used as a new prefix.  It may be empty,
in which case a 'server' rule will always match on all names from
the server.

e.g.:

  ``:prefix:client:trusted.:user.virtiofs.:``

  will match 'trusted.' attributes in client calls and prefix them before
  passing them to the server.

  ``:prefix:server::user.virtiofs.:``

  will strip 'user.virtiofs.' from all server replies.

  ``:prefix:all:trusted.:user.virtiofs.:``

  combines the previous two cases into a single rule.

  ``:ok:client:user.::``

  will allow get/set xattr for 'user.' xattrs and ignore
  following rules.

  ``:ok:server::security.:``

  will pass 'security.' xattrs in listxattr from the server
  and ignore following rules.

  ``:ok:all:::``

  will terminate the rule search passing any remaining attributes
  in both directions.

  ``:bad:server::security.:``

  would hide 'security.' xattrs in listxattr from the server.
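
The rule syntax above can be illustrated with a small Python sketch that
splits a mapping string into its rules.  It is not the daemon's parser: the
names ``Rule``, ``_field`` and ``parse_xattrmap`` are hypothetical, and the
sketch only follows the description above (the first non-white space
character of each rule is its separator, and every field ends with that
separator).  It also accepts the shorter 'map' form, which has no scope field.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    kind: str     # 'prefix', 'ok', 'bad' or 'map'
    scope: str    # 'client', 'server' or 'all'; empty for 'map' rules
    key: str
    prepend: str


def _field(text: str, pos: int, sep: str) -> Tuple[str, int]:
    """Return the field starting at pos and the position after its separator."""
    end = text.find(sep, pos)
    if end < 0:
        raise ValueError(f"rule is not terminated by {sep!r}")
    return text[pos:end], end + 1


def parse_xattrmap(mapping: str) -> List[Rule]:
    """Split a mapping string into rules (illustrative model only)."""
    rules = []
    pos = 0
    while True:
        # White space, including new lines, may surround each rule.
        while pos < len(mapping) and mapping[pos].isspace():
            pos += 1
        if pos == len(mapping):
            return rules
        # The first non-white space character is this rule's separator.
        sep = mapping[pos]
        kind, pos = _field(mapping, pos + 1, sep)
        if kind == "map":
            scope = ""  # ':map:key:prepend:' has no scope field
        elif kind in ("prefix", "ok", "bad"):
            scope, pos = _field(mapping, pos, sep)
            if scope not in ("client", "server", "all"):
                raise ValueError(f"unknown scope {scope!r}")
        else:
            raise ValueError(f"unknown rule type {kind!r}")
        key, pos = _field(mapping, pos, sep)
        prepend, pos = _field(mapping, pos, sep)
        rules.append(Rule(kind, scope, key, prepend))


for rule in parse_xattrmap(":prefix:all::user.virtiofs.::bad:all:::"):
    print(rule)
```

Because each rule names its own separator, two rules written back to back
(as in the string parsed above) need no extra delimiter between them.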

A simpler 'map' type provides a shorter syntax for the common case:

``:map:key:prepend:``

The 'map' type adds a number of separate rules to add **prepend** as a prefix
to the matched **key** (or all attributes if **key** is empty).
There may be at most one 'map' rule and it must be the last rule in the set.
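
As a rough illustration of which rules a 'map' rule stands for, the following
Python sketch expands one into explicit rules.  The expansion is inferred from
the two equivalences given in the Mapping examples section below and is an
assumption beyond those two cases; ``expand_map`` and ``render`` are
hypothetical names, not part of virtiofsd.

```python
from typing import List, Tuple

RuleFields = Tuple[str, str, str, str]  # (type, scope, key, prepend)


def expand_map(key: str, prepend: str) -> List[RuleFields]:
    """Expand ':map:key:prepend:' into explicit rules (illustrative model)."""
    rules = [("prefix", "all", key, prepend)]
    if key == "":
        # Everything is prefixed, so hide whatever is not prefixed on the host.
        rules.append(("bad", "all", "", ""))
    else:
        # Hide unprefixed attributes matching 'key' that exist on the host.
        rules.append(("bad", "server", "", key))
        # Stop the client from naming the remapping target directly.
        rules.append(("bad", "client", prepend, ""))
        # Let all remaining attributes through.
        rules.append(("ok", "all", "", ""))
    return rules


def render(rule: RuleFields, sep: str = ":") -> str:
    """Write one rule in the 'sep type sep scope sep key sep prepend sep' form."""
    return sep + sep.join(rule) + sep


print("".join(render(rule) for rule in expand_map("", "user.virtiofs.")))
for rule in expand_map("trusted.", "user.virtiofs."):
    print(render(rule, sep="/"))
```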

Note: When the 'security.capability' xattr is remapped, the daemon has to do
extra work to remove it during many operations, which the host kernel normally
does itself.

Security considerations
~~~~~~~~~~~~~~~~~~~~~~~

Operating systems typically partition the xattr namespace using
well defined name prefixes. Each partition may have different
access controls applied. For example, on Linux there are multiple
partitions:

 * ``system.*`` - access varies depending on attribute & filesystem
 * ``security.*`` - only processes with CAP_SYS_ADMIN
 * ``trusted.*`` - only processes with CAP_SYS_ADMIN
 * ``user.*`` - any process granted by file permissions / ownership

Other operating systems, such as FreeBSD, have different name prefixes
and access control rules.

When remapping attributes on the host, it is important to
ensure that the remapping does not allow a guest user to
evade the guest access control rules.

Consider if ``trusted.*`` from the guest was remapped to
``user.virtiofs.trusted.*`` in the host. An unprivileged
user in a Linux guest has the ability to write to xattrs
under ``user.*``. Thus the user can evade the access
control restriction on ``trusted.*`` by instead writing
to ``user.virtiofs.trusted.*``.

As noted above, the partitions used and access controls
applied will vary across guest operating systems, so it is
not wise to try to predict what the guest OS will use.

The simplest way to avoid an insecure configuration is
to remap all xattrs at once, to a given fixed prefix.
This is shown in example (1) below.

If selectively mapping only a subset of xattr prefixes,
then rules must be added to explicitly block direct
access to the target of the remapping. This is shown
in example (2) below.

Mapping examples
~~~~~~~~~~~~~~~~

1) Prefix all attributes with 'user.virtiofs.'

::

 -o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"


This uses two rules, using : as the field separator;
the first rule prefixes and strips 'user.virtiofs.',
the second rule hides any non-prefixed attributes that
the host set.

This is equivalent to the 'map' rule:

::

 -o xattrmap=":map::user.virtiofs.:"

2) Prefix 'trusted.' attributes, allow others through

::

   "/prefix/all/trusted./user.virtiofs./
    /bad/server//trusted./
    /bad/client/user.virtiofs.//
    /ok/all///"


Here there are four rules, using / as the field
separator, and also demonstrating that new lines can
be included between rules.
The first rule is the prefixing of 'trusted.' and
stripping of 'user.virtiofs.'.
The second rule hides unprefixed 'trusted.' attributes
on the host.
The third rule stops a guest from explicitly setting
the 'user.virtiofs.' path directly to prevent access
control bypass on the target of the earlier prefix
remapping.
Finally, the fourth rule lets all remaining attributes
through.

This is equivalent to the 'map' rule:

::

 -o xattrmap="/map/trusted./user.virtiofs./"
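
To see why the third rule of example (2) matters, the following Python sketch
models how a name supplied by the client is mapped.  It is a simplified model
built from the descriptions in this document, not the daemon's code:
``map_client_name`` is a hypothetical name, the prefixing step assumes the
mapped name is ``prepend`` followed by the full client name, and denying names
that match no rule is an assumption of the sketch.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    kind: str     # 'prefix', 'ok' or 'bad'
    scope: str    # 'client', 'server' or 'all'
    key: str
    prepend: str


def map_client_name(rules: List[Rule], name: str) -> str:
    """Map a client-supplied xattr name; the first matching rule wins."""
    for rule in rules:
        if rule.scope in ("client", "all") and name.startswith(rule.key):
            if rule.kind == "bad":
                raise PermissionError(name)   # denied using EPERM
            if rule.kind == "ok":
                return name                   # passed through unchanged
            if rule.kind == "prefix":
                return rule.prepend + name    # prefixed before reaching the server
    # Assumption of this sketch: names that match no rule are denied.
    raise PermissionError(name)


# The four rules of example (2).
SELECTIVE = [
    Rule("prefix", "all", "trusted.", "user.virtiofs."),
    Rule("bad", "server", "", "trusted."),
    Rule("bad", "client", "user.virtiofs.", ""),
    Rule("ok", "all", "", ""),
]
# The same set without the third rule, which is insecure.
UNGUARDED = [rule for rule in SELECTIVE if rule.scope != "client"]

privileged = map_client_name(SELECTIVE, "trusted.overlay.opaque")
print(privileged)

# Without the guard, an unprivileged 'user.*' name reaches the same host xattr.
print(map_client_name(UNGUARDED, "user.virtiofs.trusted.overlay.opaque") == privileged)

try:
    map_client_name(SELECTIVE, "user.virtiofs.trusted.overlay.opaque")
except PermissionError:
    print("blocked by the third rule")
```

In the unguarded set the final 'ok' rule passes the ``user.virtiofs.`` name
through unchanged, so it lands on the same host attribute as the remapped
``trusted.`` name.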

3) Hide 'security.' attributes, and allow everything else

::

    "/bad/all/security./security./
     /ok/all///"

The first rule combines what could be separate client and server
rules into a single 'all' rule, matching 'security.' in either
client arguments or lists returned from the host.  This stops
the client seeing any 'security.' attributes on the server and
stops it setting any.
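
The server-side half of these rules can be sketched in the same way.  The
Python model below filters the names returned by listxattr, following the
descriptions in this document rather than the daemon's code:
``map_server_name`` and ``filter_listxattr`` are hypothetical names, and
hiding names that match no rule is an assumption of the sketch.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    kind: str     # 'prefix', 'ok' or 'bad'
    scope: str    # 'client', 'server' or 'all'
    key: str
    prepend: str


def map_server_name(rules: List[Rule], name: str) -> Optional[str]:
    """Map one name from the server; None means it is hidden from the client."""
    for rule in rules:
        if rule.scope in ("server", "all") and name.startswith(rule.prepend):
            if rule.kind == "bad":
                return None                       # hidden
            if rule.kind == "ok":
                return name                       # passed through unchanged
            if rule.kind == "prefix":
                return name[len(rule.prepend):]   # prefix stripped
    # Assumption of this sketch: names that match no rule are hidden.
    return None


def filter_listxattr(rules: List[Rule], names: List[str]) -> List[str]:
    mapped = (map_server_name(rules, name) for name in names)
    return [name for name in mapped if name is not None]


# Example (3): hide 'security.' attributes, allow everything else.
HIDE_SECURITY = [
    Rule("bad", "all", "security.", "security."),
    Rule("ok", "all", "", ""),
]
# Example (1): prefix all attributes with 'user.virtiofs.'.
PREFIX_ALL = [
    Rule("prefix", "all", "", "user.virtiofs."),
    Rule("bad", "all", "", ""),
]

print(filter_listxattr(HIDE_SECURITY, ["user.comment", "security.selinux"]))
print(filter_listxattr(PREFIX_ALL, ["user.virtiofs.user.comment", "security.selinux"]))
```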

Examples
--------

Export ``/var/lib/fs/vm001/`` on vhost-user UNIX domain socket
``/var/run/vm001-vhost-fs.sock``:

.. parsed-literal::

  host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
  host# |qemu_system| \\
        -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \\
        -device vhost-user-fs-pci,chardev=char0,tag=myfs \\
        -object memory-backend-memfd,id=mem,size=4G,share=on \\
        -numa node,memdev=mem \\
        ...
  guest# mount -t virtiofs myfs /mnt