QEMU virtio-fs shared file system daemon
========================================

Synopsis
--------

**virtiofsd** [*OPTIONS*]

Description
-----------

Share a host directory tree with a guest through a virtio-fs device. This
program is a vhost-user backend that implements the virtio-fs device. Each
virtio-fs device instance requires its own virtiofsd process.

This program is designed to work with QEMU's ``--device vhost-user-fs-pci``
but should work with any virtual machine monitor (VMM) that supports
vhost-user. See the Examples section below.

This program must be run as the root user. The program drops privileges where
possible during startup, although it must be able to create and access files
with any uid/gid:

* The ability to invoke syscalls is limited using seccomp(2).
* Linux capabilities(7) are dropped.

In "namespace" sandbox mode the program switches into a new file system
namespace and invokes pivot_root(2) to make the shared directory tree its root.
New pid and net namespaces are also created to isolate the process.

In "chroot" sandbox mode the program invokes chroot(2) to make the shared
directory tree its root. This mode is intended for container environments where
the container runtime has already set up the namespaces and the program does
not have permission to create namespaces itself.

Both sandbox modes prevent "file system escapes" due to symlinks and other file
system objects that might lead to files outside the shared directory.

Options
-------

.. program:: virtiofsd

.. option:: -h, --help

  Print help.

.. option:: -V, --version

  Print version.

.. option:: -d

  Enable debug output.

.. option:: --syslog

  Print log messages to syslog instead of stderr.

.. option:: -o OPTION

  * debug -
    Enable debug output.

  * flock|no_flock -
    Enable/disable flock. The default is ``no_flock``.

  * modcaps=CAPLIST -
    Modify the list of capabilities allowed; CAPLIST is a colon separated
    list of capabilities, each preceded by either + or -, e.g.
    ``+sys_admin:-chown``.

  * log_level=LEVEL -
    Print only log messages matching LEVEL or more severe. LEVEL is one of
    ``err``, ``warn``, ``info``, or ``debug``. The default is ``info``.

  * posix_lock|no_posix_lock -
    Enable/disable remote POSIX locks. The default is ``no_posix_lock``.

  * readdirplus|no_readdirplus -
    Enable/disable readdirplus. The default is ``readdirplus``.

  * sandbox=namespace|chroot -
    Sandbox mode:

    - namespace: Create mount, pid, and net namespaces and pivot_root(2) into
      the shared directory.
    - chroot: chroot(2) into the shared directory (use in containers).

    The default is "namespace".

  * source=PATH -
    Share host directory tree located at PATH. This option is required.

  * timeout=TIMEOUT -
    I/O timeout in seconds. The default depends on the ``--cache`` option.

  * writeback|no_writeback -
    Enable/disable writeback cache. The cache allows the FUSE client to buffer
    and merge write requests. The default is ``no_writeback``.

  * xattr|no_xattr -
    Enable/disable extended attributes (xattr) on files and directories. The
    default is ``no_xattr``.

  * posix_acl|no_posix_acl -
    Enable/disable POSIX ACL support. POSIX ACLs are disabled by default.

.. option:: --socket-path=PATH

  Listen on vhost-user UNIX domain socket at PATH.

.. option:: --socket-group=GROUP

  Set the vhost-user UNIX domain socket gid to GROUP.

.. option:: --fd=FDNUM

  Accept connections from vhost-user UNIX domain socket file descriptor FDNUM.
  The file descriptor must already be listening for connections.

.. option:: --thread-pool-size=NUM

  Restrict the number of worker threads per request queue to NUM. The default
  is 64.

.. option:: --cache=none|auto|always

  Select the desired trade-off between coherency and performance. ``none``
  forbids the FUSE client from caching to achieve best coherency at the cost of
  performance. ``auto`` behaves similarly to NFS with a 1 second metadata cache
  timeout. ``always`` sets a long cache lifetime at the expense of coherency.
  The default is ``auto``.

Extended attribute (xattr) mapping
----------------------------------

By default the names of xattrs used by the client are passed through to the
server file system. This can be a problem where either those xattr names are
used by something on the server (e.g. SELinux client/server confusion) or if
virtiofsd is running in a container with restricted privileges where it cannot
access some attributes.

Mapping syntax
~~~~~~~~~~~~~~

A mapping of xattr names can be made using ``-o xattrmap=mapping`` where the
``mapping`` string consists of a series of rules.

The first matching rule terminates the mapping.
The set of rules must include a terminating rule to match any remaining
attributes at the end.

Each rule consists of a number of fields separated with a separator that is the
first non-white space character in the rule. This separator must then be used
for the whole rule.
White space may be added before and after each rule.

Using ':' as the separator a rule is of the form:

``:type:scope:key:prepend:``

**scope** is:

- 'client' - match 'key' against an xattr name from the client for
  setxattr/getxattr/removexattr
- 'server' - match 'prepend' against an xattr name from the server
  for listxattr
- 'all' - can be used to make a single rule where both the server
  and client matches are triggered.

**type** is one of:

- 'prefix' - is designed to prepend and strip a prefix; the modified
  attributes are then passed on to the client/server.

- 'ok' - Causes the rule set to be terminated when a match is found
  while allowing matching xattrs through unchanged.
  It is intended both as a way of explicitly terminating
  the list of rules, and to allow some xattrs to skip following rules.

- 'bad' - If a client tries to use a name matching 'key' it is
  denied using EPERM; when the server passes an attribute
  name matching 'prepend' it is hidden. In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of certain
  patterns.

**key** is a string tested as a prefix on an attribute name originating
on the client. It may be empty, in which case a 'client' rule
will always match on client names.

**prepend** is a string tested as a prefix on an attribute name originating
on the server, and used as a new prefix. It may be empty,
in which case a 'server' rule will always match on all names from
the server.

e.g.:

  ``:prefix:client:trusted.:user.virtiofs.:``

  will match 'trusted.' attributes in client calls and prefix them before
  passing them to the server.

  ``:prefix:server::user.virtiofs.:``

  will strip 'user.virtiofs.' from all server replies.

  ``:prefix:all:trusted.:user.virtiofs.:``

  combines the previous two cases into a single rule.

  ``:ok:client:user.::``

  will allow get/set xattr for 'user.' xattrs and ignore
  following rules.

  ``:ok:server::security.:``

  will pass 'security.' xattrs in listxattr from the server
  and ignore following rules.

  ``:ok:all:::``

  will terminate the rule search passing any remaining attributes
  in both directions.

  ``:bad:server::security.:``

  would hide 'security.' xattrs in listxattr from the server.
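
The four-field rule syntax described above can be illustrated with a small
parser. This is only a sketch in Python, not virtiofsd's own code: the ``Rule``
type and the ``parse_rules`` helper are hypothetical names introduced for this
illustration, and the sketch accepts only the 'prefix', 'ok' and 'bad' types.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """One parsed rule (hypothetical type used only for this sketch)."""
    type: str     # 'prefix', 'ok' or 'bad'
    scope: str    # 'client', 'server' or 'all'
    key: str      # prefix tested against names coming from the client
    prepend: str  # prefix tested against names coming from the server


def parse_rules(mapping: str) -> List[Rule]:
    """Split a mapping string into four-field rules."""
    rules = []
    pos = 0
    while True:
        # White space may appear before and after each rule.
        while pos < len(mapping) and mapping[pos].isspace():
            pos += 1
        if pos == len(mapping):
            return rules
        # The first non-white space character is this rule's separator.
        sep = mapping[pos]
        pos += 1
        fields = []
        for _ in range(4):
            end = mapping.find(sep, pos)
            if end < 0:
                raise ValueError("rule is missing a closing separator")
            fields.append(mapping[pos:end])
            pos = end + 1
        rule = Rule(*fields)
        if rule.type not in ("prefix", "ok", "bad"):
            raise ValueError(f"unsupported rule type: {rule.type!r}")
        if rule.scope not in ("client", "server", "all"):
            raise ValueError(f"unknown scope: {rule.scope!r}")
        rules.append(rule)


# Two rules, each with its own separator, with white space between them.
rules = parse_rules(":prefix:all:trusted.:user.virtiofs.: /ok/all///")
for rule in rules:
    print(rule)
```

Because the separator is read afresh at the start of every rule, a mapping
string may mix separators between rules, as long as each rule uses one
separator throughout.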

A simpler 'map' type provides a shorter syntax for the common case:

``:map:key:prepend:``

The 'map' type adds a number of separate rules to add **prepend** as a prefix
to the matched **key** (or all attributes if **key** is empty).
There may be at most one 'map' rule and it must be the last rule in the set.

Note: When the 'security.capability' xattr is remapped, the daemon has to do
extra work to remove it during many operations, which the host kernel normally
does itself.

Security considerations
~~~~~~~~~~~~~~~~~~~~~~~

Operating systems typically partition the xattr namespace using
well defined name prefixes. Each partition may have different
access controls applied. For example, on Linux there are multiple
partitions:

 * ``system.*`` - access varies depending on attribute & filesystem
 * ``security.*`` - only processes with CAP_SYS_ADMIN
 * ``trusted.*`` - only processes with CAP_SYS_ADMIN
 * ``user.*`` - any process granted by file permissions / ownership

Other operating systems, such as FreeBSD, have different name prefixes
and access control rules.

When remapping attributes on the host, it is important to
ensure that the remapping does not allow a guest user to
evade the guest access control rules.

Consider if ``trusted.*`` from the guest was remapped to
``user.virtiofs.trusted.*`` in the host. An unprivileged
user in a Linux guest has the ability to write to xattrs
under ``user.*``. Thus the user can evade the access
control restriction on ``trusted.*`` by instead writing
to ``user.virtiofs.trusted.*``.

As noted above, the partitions used and access controls
applied will vary across guest OSes, so it is not wise to
try to predict what the guest OS will use.

The simplest way to avoid an insecure configuration is
to remap all xattrs at once, to a given fixed prefix.
This is shown in example (1) below.

If selectively mapping only a subset of xattr prefixes,
then rules must be added to explicitly block direct
access to the target of the remapping. This is shown
in example (2) below.

Mapping examples
~~~~~~~~~~~~~~~~

1) Prefix all attributes with 'user.virtiofs.'

::

 -o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"


This uses two rules, using : as the field separator;
the first rule prefixes and strips 'user.virtiofs.',
the second rule hides any non-prefixed attributes that
the host set.

This is equivalent to the 'map' rule:

::

 -o xattrmap=":map::user.virtiofs.:"

2) Prefix 'trusted.' attributes, allow others through

::

   "/prefix/all/trusted./user.virtiofs./
    /bad/server//trusted./
    /bad/client/user.virtiofs.//
    /ok/all///"


Here there are four rules, using / as the field
separator, and also demonstrating that new lines can
be included between rules.
The first rule is the prefixing of 'trusted.' and
stripping of 'user.virtiofs.'.
The second rule hides unprefixed 'trusted.' attributes
on the host.
The third rule stops a guest from explicitly setting
the 'user.virtiofs.' path directly to prevent access
control bypass on the target of the earlier prefix
remapping.
Finally, the fourth rule lets all remaining attributes
through.

This is equivalent to the 'map' rule:

::

 -o xattrmap="/map/trusted./user.virtiofs./"

3) Hide 'security.' attributes, and allow everything else

::

    "/bad/all/security./security./
     /ok/all///"

The first rule combines what could be separate client and server
rules into a single 'all' rule, matching 'security.' in either
client arguments or lists returned from the host. This stops
the client seeing any 'security.' attributes on the server and
stops it setting any.
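
To make the first-match behaviour of example (2) concrete, the following Python
sketch walks that rule set for names coming from each side. It models only what
this document describes and is not virtiofsd's implementation; the function
names ``map_client_name`` and ``map_server_name`` and the tuple layout of
``RULES`` are assumptions made for illustration.

```python
import errno

# (type, scope, key, prepend) tuples mirroring the four rules of example (2).
RULES = [
    ("prefix", "all", "trusted.", "user.virtiofs."),
    ("bad", "server", "", "trusted."),
    ("bad", "client", "user.virtiofs.", ""),
    ("ok", "all", "", ""),
]


def map_client_name(rules, name):
    """Map a name the guest passes to setxattr/getxattr/removexattr."""
    for rtype, scope, key, prepend in rules:
        # Client-side rules test 'key' as a prefix of the name.
        if scope in ("client", "all") and name.startswith(key):
            if rtype == "bad":
                raise PermissionError(errno.EPERM, "xattr name denied", name)
            if rtype == "prefix":
                return prepend + name
            return name  # 'ok': pass through unchanged
    raise ValueError("rule set has no terminating rule")


def map_server_name(rules, name):
    """Map a name the host returns from listxattr; None means hidden."""
    for rtype, scope, key, prepend in rules:
        # Server-side rules test 'prepend' as a prefix of the name.
        if scope in ("server", "all") and name.startswith(prepend):
            if rtype == "bad":
                return None
            if rtype == "prefix":
                return name[len(prepend):]
            return name  # 'ok': pass through unchanged
    raise ValueError("rule set has no terminating rule")


print(map_client_name(RULES, "trusted.foo"))
print(map_server_name(RULES, "user.virtiofs.trusted.foo"))
print(map_server_name(RULES, "trusted.bar"))
```

In this model a guest write to ``user.virtiofs.trusted.foo`` falls through the
first two rules and is rejected by the third, which is the access control
bypass that the security considerations warn about.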

Examples
--------

Export ``/var/lib/fs/vm001/`` on vhost-user UNIX domain socket
``/var/run/vm001-vhost-fs.sock``:

.. parsed-literal::

  host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
  host# |qemu_system| \\
        -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \\
        -device vhost-user-fs-pci,chardev=char0,tag=myfs \\
        -object memory-backend-memfd,id=mem,size=4G,share=on \\
        -numa node,memdev=mem \\
        ...
  guest# mount -t virtiofs myfs /mnt