.. _vhost_user_proto:

===================
Vhost-user Protocol
===================

..
  Copyright 2014 Virtual Open Systems Sarl.
  Copyright 2019 Intel Corporation
  Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines two sides of the communication, *master* and
*slave*. *Master* is the application that shares its virtqueues, in
our case QEMU. *Slave* is the consumer of the virtqueues.

In the current implementation QEMU is the *master*, and the *slave* is
the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device backend processing reads and writes to a virtual
disk. In order to facilitate interoperability between various backend
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

*Master* and *slave* can be either a client (i.e. connecting) or
server (listening) in the socket communication.

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of three header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - must be set on each reply from the slave
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------------+------+-----------+-----+
| index | flags | descriptor | used | available | log |
+-------+-------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

A region is:

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: a 64-bit offset where the region starts in the mapped memory

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+---------------+------+--------------+-------------+
| padding | guest address | size | user address | mmap offset |
+---------+---------------+------+--------------+-------------+

:padding: 64-bit

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: a 64-bit offset where the region starts in the mapped memory

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: a 64-bit size of the area used for logging

:log offset: a 64-bit offset from the start of the supplied file descriptor
  where logging starts (i.e. where guest address 0 would be logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:
  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:
  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:
  - 0: Vhost master messages used for writeable fields
  - 1: Vhost master messages used for live migration

:payload: an array of ``size`` bytes holding the contents of the virtio
  device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer containing the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
  supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of the area used to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
  of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl in
the kernel implementation.

The communication consists of the *master* sending message requests and
the *slave* sending message replies. Most of the requests don't require
replies. Here is a list of the ones that do:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.
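
The header flag layout described above can be illustrated with a short C sketch. It checks the version bits of a received header, tests the ``need_reply`` bit and builds the header of a slave reply. The sketch echoes the request code, sets the current version and the reply bit, and leaves ``need_reply`` clear. The ``vu_`` names and ``VU_*`` macros are hypothetical and not taken from QEMU or any other implementation. Only the bit positions come from this specification.

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* Bit layout of the 32-bit flags field (see "Header" above).
   * Macro names are hypothetical. */
  #define VU_VERSION_MASK    0x3u       /* bits 0-1: protocol version */
  #define VU_VERSION         0x1u       /* current version */
  #define VU_FLAG_REPLY      (1u << 2)  /* set on every slave reply */
  #define VU_FLAG_NEED_REPLY (1u << 3)  /* REPLY_ACK extension */

  /* The three 32-bit header fields; the payload follows on the wire. */
  typedef struct {
      uint32_t request;
      uint32_t flags;
      uint32_t size;    /* payload size in bytes, header not included */
  } vu_header;

  /* True if the header carries the protocol version this sketch supports. */
  static bool vu_header_version_ok(const vu_header *h)
  {
      return (h->flags & VU_VERSION_MASK) == VU_VERSION;
  }

  /* True if the master set the need_reply bit. */
  static bool vu_need_reply(const vu_header *h)
  {
      return (h->flags & VU_FLAG_NEED_REPLY) != 0;
  }

  /* Header for the slave's reply to 'req': the request code is echoed,
   * the version and reply bits are set, need_reply is left clear. */
  static vu_header vu_make_reply_header(const vu_header *req,
                                        uint32_t payload_size)
  {
      assert(vu_header_version_ok(req));   /* caller validated the request */
      vu_header r = {
          .request = req->request,
          .flags = VU_VERSION | VU_FLAG_REPLY,
          .size = payload_size,
      };
      return r;
  }

Because all three fields are declared as ``uint32_t``, the struct has no padding and matches the 12-byte wire header. QEMU gets the same effect for ``VhostUserMsg`` with ``QEMU_PACKED``. Whether a reply is sent at all still depends on the request type (see the list above) and on whether ``REPLY_ACK`` was negotiated.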

There are several messages that the master sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_SLAVE_REQ_FD``
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

If the *master* is unable to send the full message or receives a wrong
reply, it will close the connection. An optional reconnection mechanism
can be implemented.

If the *slave* detects an error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both master and slave. As
older slaves don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Starting and stopping rings
---------------------------

The client must only process each ring when it is started.

The client must only pass data between the ring and the backend when the
ring is enabled.

If a ring is started but disabled, the client must process the ring without
talking to the backend.

For example, for a networking device, in the disabled state the client
must not supply any new RX packets, but must process and discard any
TX packets.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring is initialized in an enabled state.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state. The client must not pass data to/from the
backend until the ring is enabled by ``VHOST_USER_SET_VRING_ENABLE`` with
parameter 1, or after it has been disabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 0.

Each ring is initialized in a stopped state; the client must not process
it until the ring is started, or after it has been stopped.

The client must start a ring upon receiving a kick (that is, detecting that
the file descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK``, or upon receiving the in-band message
``VHOST_USER_VRING_KICK`` if negotiated, and must stop the ring upon receiving
``VHOST_USER_GET_VRING_BASE``.

While processing the rings (whether they are enabled or not), the client
must support changing some configuration aspects on the fly.

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues. In this case the master
already knows the number of available virtqueues without communicating with the
slave.

Some devices do not have a fixed number of virtqueues. Instead the maximum
number of virtqueues is chosen by the slave. The number can depend on host
resource availability or slave implementation details. Such devices are called
multiple queue devices.

Multiple queue support allows the slave to advertise the maximum number of
queues. This is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The maximum number of queues the slave supports can be queried with the
message ``VHOST_USER_GET_QUEUE_NUM``. The master should stop if the number of
requested queues is larger than that.

As all queues share one connection, the master uses a unique index for each
queue in the sent message to identify a specified queue.
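
Because each queue is addressed by its own index, a slave typically keeps the started/enabled state from "Starting and stopping rings" separately for each queue. The C sketch below shows one possible way to encode those rules. The type and function names are hypothetical, and so is the fixed ``VU_MAX_QUEUES`` limit. A real slave advertises its limit through ``VHOST_USER_GET_QUEUE_NUM``.

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define VU_MAX_QUEUES 8   /* hypothetical limit for this sketch */

  /* What the slave may do with a ring in its current state. */
  typedef enum {
      VU_RING_IGNORE,   /* stopped: do not process the ring */
      VU_RING_DISCARD,  /* started, disabled: process without the backend */
      VU_RING_FORWARD,  /* started, enabled: pass data to/from the backend */
  } vu_ring_action;

  typedef struct {
      bool started;
      bool enabled;
  } vu_ring_state;

  typedef struct {
      vu_ring_state ring[VU_MAX_QUEUES];
  } vu_dev;

  /* Every ring starts stopped; it starts enabled only if
   * VHOST_USER_F_PROTOCOL_FEATURES was not negotiated. */
  static void vu_dev_init(vu_dev *d, bool protocol_features_negotiated)
  {
      for (unsigned i = 0; i < VU_MAX_QUEUES; i++) {
          d->ring[i].started = false;
          d->ring[i].enabled = !protocol_features_negotiated;
      }
  }

  static vu_ring_state *vu_ring(vu_dev *d, uint32_t index)
  {
      assert(index < VU_MAX_QUEUES);   /* index comes from the message */
      return &d->ring[index];
  }

  /* Kick on the kick fd, or in-band VHOST_USER_VRING_KICK: start the ring. */
  static void vu_on_kick(vu_dev *d, uint32_t index)
  {
      vu_ring(d, index)->started = true;
  }

  /* VHOST_USER_GET_VRING_BASE: stop the ring. */
  static void vu_on_get_vring_base(vu_dev *d, uint32_t index)
  {
      vu_ring(d, index)->started = false;
  }

  /* VHOST_USER_SET_VRING_ENABLE: num is 1 (enable) or 0 (disable). */
  static void vu_on_set_vring_enable(vu_dev *d, uint32_t index, uint32_t num)
  {
      vu_ring(d, index)->enabled = (num != 0);
  }

  static vu_ring_action vu_ring_action_for(vu_dev *d, uint32_t index)
  {
      const vu_ring_state *r = vu_ring(d, index);
      if (!r->started)
          return VU_RING_IGNORE;
      return r->enabled ? VU_RING_FORWARD : VU_RING_DISCARD;
  }

For a networking device, ``VU_RING_DISCARD`` corresponds to processing and dropping TX packets without supplying new RX packets. A production slave would reject an out-of-range index with an error rather than an ``assert``.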

The master enables queues by sending the message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Slaves should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Masters must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues. Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the master may need to track the modifications
the slave makes to the memory mapped regions. The client should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, the master may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in the ring's
flags set to 1/0, respectively.

All the modifications to memory pointed to by the vring "descriptor" should
be marked. Modifications to the "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of the ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of the
``VHOST_USER_SET_LOG_BASE`` message when the slave has the
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg``, which
should be large enough to cover all known guest addresses. The log starts
at the supplied offset in the supplied file descriptor. The log
covers guest addresses from 0 up to the end of the highest guest region.
In pseudo-code, to mark the page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)

Here, ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to the first byte of the
used ring is logged at this offset from the log start. Also note that this
value might be outside the legal guest physical address range
(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the master that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source. No further update must be done before rings are restarted.

In postcopy migration the slave is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received. The slave opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the master. The master services requests on the
userfaultfd for pages that are accessed, and when the page is available
it performs WAKE ioctls on the userfaultfd to wake the stalled
slave. The client indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

Memory access
-------------

The master sends a list of vhost memory regions to the slave using the
``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
addresses: a guest address and a user address.
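
The C fragment below is a minimal sketch of how a slave might resolve these addresses after mapping each region's file descriptor. It finds the region that contains an address and returns a pointer into the slave's own mapping. The mmap offset is used to skip the part of the mapping that precedes the region. The sketch assumes that ``VIRTIO_F_IOMMU_PLATFORM`` has not been negotiated, so guest addresses are looked up directly. It also assumes the slave stored the address returned by ``mmap()`` in a field called ``host_base``. That field and the ``vu_`` names are hypothetical and are not part of the wire format.

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* One region from VHOST_USER_SET_MEM_TABLE, plus local bookkeeping. */
  typedef struct {
      uint64_t guest_addr;
      uint64_t size;
      uint64_t user_addr;
      uint64_t mmap_offset;
      uint8_t *host_base;   /* hypothetical: start of the slave's mmap() of the fd */
  } vu_region;

  /* Returns a slave pointer for [addr, addr + len), or NULL if no single
   * region contains the whole range. by_user_addr selects whether addr is
   * a user address (true) or a guest address (false). */
  static void *vu_translate(const vu_region *regions, size_t nregions,
                            uint64_t addr, uint64_t len, bool by_user_addr)
  {
      for (size_t i = 0; i < nregions; i++) {
          const vu_region *r = &regions[i];
          uint64_t base = by_user_addr ? r->user_addr : r->guest_addr;

          assert(r->host_base != NULL);   /* region must already be mapped */

          /* Range check written to avoid overflow in addr + len. */
          if (addr >= base && len <= r->size && addr - base <= r->size - len)
              return r->host_base + r->mmap_offset + (addr - base);
      }
      return NULL;
  }

Ring addresses from ``VHOST_USER_SET_VRING_ADDR`` would be resolved with ``by_user_addr`` set to true, while buffer addresses found in descriptors are guest addresses. With ``VIRTIO_F_IOMMU_PLATFORM`` negotiated, an IOVA would first be translated to a user address through the IOTLB.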

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
master sends IOTLB entry updates and invalidations by sending
``VHOST_USER_IOTLB_MSG`` requests to the slave with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the slave is
expected to reply with a zero payload, non-zero otherwise.

The slave relies on the slave communication channel (see :ref:`Slave
communication <slave_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_SLAVE_IOTLB_MSG``
requests to the master with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags.
For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags. For synchronization purposes, the slave may
rely on the reply-ack feature, so the master may send a reply when the
operation is completed if the reply-ack feature is negotiated and the
slave requests a reply. For miss events, a completed operation means
either that the master sent an update message containing the IOTLB entry
for the requested address and permission, or that the master sent nothing
because the IOTLB miss message is invalid (invalid IOVA or permission).

The master isn't expected to take the initiative to send IOTLB update
messages, as the slave sends IOTLB miss messages for the guest virtual
memory areas it needs to access.

.. _slave_communication:

Slave communication
-------------------

An optional communication channel is provided if the slave declares
the ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` protocol feature, to allow the
slave to make requests to the master.

The fd is provided via ``VHOST_USER_SET_SLAVE_REQ_FD`` ancillary data.

A slave may then send ``VHOST_USER_SLAVE_*`` messages to the master
using this fd communication channel.

If the ``VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD`` protocol feature is
negotiated, the slave can send file descriptors (at most 8 descriptors in
each message) to the master via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after a restart or crash, a slave may need to
resubmit inflight I/Os. If the virtqueue is processed in order, this is
easily achieved by getting the inflight descriptors from the
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue).
However, this does not work when descriptors are processed
out of order, because some entries which store the information of
inflight descriptors in the available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, the slave needs to allocate an extra buffer to store this
information of inflight descriptors and share it with the master for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between master and slave. The format of this buffer is described
below:

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. The slave can get it from the
``num queues`` field of ``VhostUserInflight``.

For a split virtqueue, the queue region can be implemented as:

.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of list that tracks the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
  } QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from available ring, ``i``

#. Set ``desc[i].counter`` to the value of global counter

#. Increase global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor index, ``i``

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1, 2, 3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of used ring

When reconnecting:

#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (which means the inflight field of ``DescStateSplit`` entries
   in the last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get the last batch size of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in the
      last batch list, which starts from ``last_batch_head``

   #. Set ``used_idx`` to the ``idx`` value of used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
   counter value

For a packed virtqueue, the queue region can be implemented as:

.. code:: c

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,
   ``d``

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1, 2, 3, 4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (which means the
   ``inflight`` field of ``DescStatePacked`` entries in the last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #. If ``d.flags`` is not equal to the calculated flags value (which
      means the slave has submitted the buffer to the guest driver before
      the crash, so it has to commit the in-progress update), set
      ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter`` to
      ``free_head``, ``used_idx``, ``used_wrap_counter``

#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   (roll back any in-progress update)

#. Set the ``inflight`` field of each ``DescStatePacked`` entry in
   free list to 0

#. Resubmit inflight ``DescStatePacked`` entries in order of their
   counter value

In-band notifications
---------------------

In some limited situations (e.g. for simulation) it is desirable to
have the kick, call and error (if used) signals done via in-band
messages instead of asynchronous eventfd notifications. This can be
done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
protocol feature.

Because too many messages on the sockets can cause the sending
application(s) to block, it is not advised to use this feature unless
absolutely necessary.
It is also considered an error to negotiate this feature without also
negotiating ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` and
``VHOST_USER_PROTOCOL_F_REPLY_ACK``: the former is necessary for getting
a message channel from the slave to the master, while the latter needs
to be used with the in-band notification messages to block until they
are processed, both to avoid blocking later and for proper processing
(at least in the simulation use case). As it has no other way of
signalling this error, the slave should close the connection as a
response to a ``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the
in-band notifications feature flag without the other two.

Protocol features
-----------------

.. code:: c

  #define VHOST_USER_PROTOCOL_F_MQ                    0
  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
  #define VHOST_USER_PROTOCOL_F_RARP                  2
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_MTU                   4
  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
  #define VHOST_USER_PROTOCOL_F_CONFIG                9
  #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD        10
  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
  #define VHOST_USER_PROTOCOL_F_STATUS               16

Master message types
--------------------

``VHOST_USER_GET_FEATURES``
  :id: 1
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :master payload: N/A
  :slave payload: ``u64``

  Get the features bitmask from the underlying vhost implementation.
  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals slave support
  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_FEATURES``
  :id: 2
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :master payload: ``u64``

  Enable features in the underlying vhost implementation using a
  bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
  slave support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_GET_PROTOCOL_FEATURES``
  :id: 15
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :master payload: N/A
  :slave payload: ``u64``

  Get the protocol feature bitmask from the underlying vhost
  implementation. Only legal if feature bit
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.

.. Note::
   A slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must
   support this message even before ``VHOST_USER_SET_FEATURES`` has
   been called.

``VHOST_USER_SET_PROTOCOL_FEATURES``
  :id: 16
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :master payload: ``u64``

  Enable protocol features in the underlying vhost implementation.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.

.. Note::
   A slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
   this message even before ``VHOST_USER_SET_FEATURES`` has been called.

``VHOST_USER_SET_OWNER``
  :id: 3
  :equivalent ioctl: ``VHOST_SET_OWNER``
  :master payload: N/A

  Issued when a new connection is established. It sets the current
  *master* as an owner of the session. This can be used on the *slave*
  as a "session start" flag.

``VHOST_USER_RESET_OWNER``
  :id: 4
  :master payload: N/A

.. admonition:: Deprecated

   This is no longer used.
   Used to be sent to request disabling all rings, but some clients
   interpreted it to also discard connection state (this
   interpretation would lead to bugs). It is recommended that clients
   either ignore this message, or use it to disable all rings.

``VHOST_USER_SET_MEM_TABLE``
  :id: 5
  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
  :master payload: memory regions description
  :slave payload: (postcopy only) memory regions description

  Sets the memory map regions on the slave so it can translate the
  vring addresses. The ancillary data carries an array of file
  descriptors, one per memory mapped region; the size and ordering of
  the fd array match the number and ordering of the memory regions.

  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
  regions to the master. The slave must have mmap'd the regions but
  not yet accessed them and should not yet generate a userfault
  event.

.. Note::
   ``NEED_REPLY_MASK`` is not set in this case. QEMU will then
   reply back to the list of mappings with an empty
   ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
   reception of this message may the guest start accessing the memory
   and generating faults.

``VHOST_USER_SET_LOG_BASE``
  :id: 6
  :equivalent ioctl: ``VHOST_SET_LOG_BASE``
  :master payload: ``u64``
  :slave payload: N/A

  Sets logging shared memory space.

  When the slave has the ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol
  feature, the log memory fd is provided in the ancillary data of the
  ``VHOST_USER_SET_LOG_BASE`` message, and the size and offset of the
  shared memory area are provided in the message.

``VHOST_USER_SET_LOG_FD``
  :id: 7
  :equivalent ioctl: ``VHOST_SET_LOG_FD``
  :master payload: N/A

  Sets the logging file descriptor, which is passed as ancillary data.
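
The feature-negotiation rules stated earlier in this section can be
checked mechanically. The following is a minimal sketch, not QEMU's or
libvhost-user's code: the helper names are hypothetical, and the value
30 for ``VHOST_USER_F_PROTOCOL_FEATURES`` is an assumption (it is the
value QEMU uses, but it is not defined in this section). It gates the
protocol-feature messages on that bit and rejects a
``VHOST_USER_SET_PROTOCOL_FEATURES`` payload that enables in-band
notifications without ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` and
``VHOST_USER_PROTOCOL_F_REPLY_ACK``.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Protocol feature bit numbers, taken from the list above. */
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

  /* Assumption: QEMU's value for this virtio feature bit; it is not
   * defined in this section. */
  #define VHOST_USER_F_PROTOCOL_FEATURES 30

  #define VU_BIT(n) (UINT64_C(1) << (n))

  /* GET/SET_PROTOCOL_FEATURES are only legal if the slave reported
   * VHOST_USER_F_PROTOCOL_FEATURES in its GET_FEATURES reply. */
  static bool protocol_features_allowed(uint64_t slave_features)
  {
      return (slave_features & VU_BIT(VHOST_USER_F_PROTOCOL_FEATURES)) != 0;
  }

  /* Slave-side check of a SET_PROTOCOL_FEATURES payload: in-band
   * notifications require both SLAVE_REQ and REPLY_ACK. On false, the
   * slave closes the connection, having no other way to report it. */
  static bool protocol_features_valid(uint64_t proto)
  {
      const uint64_t needed = VU_BIT(VHOST_USER_PROTOCOL_F_SLAVE_REQ) |
                              VU_BIT(VHOST_USER_PROTOCOL_F_REPLY_ACK);

      if (!(proto & VU_BIT(VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS)))
          return true;
      return (proto & needed) == needed;
  }

A slave would call ``protocol_features_valid()`` on each
``VHOST_USER_SET_PROTOCOL_FEATURES`` payload and close the socket when
it returns false.
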

``VHOST_USER_SET_VRING_NUM``
  :id: 8
  :equivalent ioctl: ``VHOST_SET_VRING_NUM``
  :master payload: vring state description

  Set the size of the queue.

``VHOST_USER_SET_VRING_ADDR``
  :id: 9
  :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
  :master payload: vring address description
  :slave payload: N/A

  Sets the addresses of the different aspects of the vring.

``VHOST_USER_SET_VRING_BASE``
  :id: 10
  :equivalent ioctl: ``VHOST_SET_VRING_BASE``
  :master payload: vring state description

  Sets the base offset in the available vring.

``VHOST_USER_GET_VRING_BASE``
  :id: 11
  :equivalent ioctl: ``VHOST_GET_VRING_BASE``
  :master payload: vring state description
  :slave payload: vring state description

  Get the available vring base offset.

``VHOST_USER_SET_VRING_KICK``
  :id: 12
  :equivalent ioctl: ``VHOST_SET_VRING_KICK``
  :master payload: ``u64``

  Set the event file descriptor for adding buffers to the vring. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling should be used
  instead of waiting for the kick. Note that if the protocol feature
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated,
  this message isn't necessary, as the ring is also started on the
  ``VHOST_USER_VRING_KICK`` message. It may, however, still be used to
  set an event file descriptor (which will be preferred over the
  message) or to enable polling.

``VHOST_USER_SET_VRING_CALL``
  :id: 13
  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
  :master payload: ``u64``

  Set the event file descriptor to signal when buffers are used. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index.
  Bit 8 is the invalid FD flag. This flag is set when there is no file
  descriptor in the ancillary data. This signals that polling will be
  used instead of waiting for the call. Note that if the protocol
  features ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this
  message isn't necessary, as the ``VHOST_USER_SLAVE_VRING_CALL``
  message can be used. It may, however, still be used to set an event
  file descriptor or to enable polling.

``VHOST_USER_SET_VRING_ERR``
  :id: 14
  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
  :master payload: ``u64``

  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this
  message isn't necessary, as the ``VHOST_USER_SLAVE_VRING_ERR``
  message can be used. It may, however, still be used to set an event
  file descriptor (which will be preferred over the message).

``VHOST_USER_GET_QUEUE_NUM``
  :id: 17
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: ``u64``

  Query how many queues the backend supports.

  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
  is set in the protocol features returned by
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_VRING_ENABLE``
  :id: 18
  :equivalent ioctl: N/A
  :master payload: vring state description

  Signal the slave to enable or disable the corresponding vring.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.
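
``VHOST_USER_SET_VRING_KICK``, ``VHOST_USER_SET_VRING_CALL`` and
``VHOST_USER_SET_VRING_ERR`` share the same ``u64`` payload layout:
bits 0-7 hold the vring index and bit 8 is the invalid FD flag. A
minimal sketch of decoding it, assuming a slave that counts the file
descriptors received in the ancillary data of the message; the type
and helper names are hypothetical:

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define VRING_IDX_MASK  UINT64_C(0xff)       /* bits 0-7: vring index */
  #define VRING_NOFD_MASK (UINT64_C(1) << 8)   /* bit 8: invalid FD flag */

  struct vring_fd_payload {
      uint8_t index;   /* which vring the fd belongs to */
      bool no_fd;      /* true: no fd attached, use polling instead */
  };

  static struct vring_fd_payload decode_vring_fd_payload(uint64_t u64)
  {
      struct vring_fd_payload p;

      p.index = (uint8_t)(u64 & VRING_IDX_MASK);
      p.no_fd = (u64 & VRING_NOFD_MASK) != 0;
      return p;
  }

  static uint64_t encode_vring_fd_payload(uint8_t index, bool no_fd)
  {
      return (uint64_t)index | (no_fd ? VRING_NOFD_MASK : 0);
  }

  /* Returns 0 if the number of fds received agrees with the flag:
   * exactly one fd when it is clear, none when it is set. */
  static int check_vring_fd(uint64_t u64, int received_fds)
  {
      int expected = (u64 & VRING_NOFD_MASK) ? 0 : 1;

      return received_fds == expected ? 0 : -1;
  }

A slave would reject a message for which ``check_vring_fd()`` fails,
and otherwise poll the vring when ``no_fd`` is set.
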

``VHOST_USER_SEND_RARP``
  :id: 19
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Ask the vhost-user backend to broadcast a fake RARP to notify that
  the migration has terminated, for guests that do not support
  GUEST_ANNOUNCE.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
  payload contain the MAC address of the guest to allow the vhost-user
  backend to construct and broadcast the fake RARP.

``VHOST_USER_NET_SET_MTU``
  :id: 20
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Set the host MTU value exposed to the guest.

  This request should be sent only when the ``VIRTIO_NET_F_MTU``
  feature has been successfully negotiated,
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_MTU`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the slave must
  respond with zero if the specified MTU is valid, or non-zero
  otherwise.

``VHOST_USER_SET_SLAVE_REQ_FD``
  :id: 21
  :equivalent ioctl: N/A
  :master payload: N/A

  Set the socket file descriptor for slave initiated requests. It is
  passed in the ancillary data.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the slave must
  respond with zero for success, non-zero otherwise.
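
A slave handling ``VHOST_USER_SEND_RARP`` has to extract the MAC
address from the ``u64`` payload and build the announcement frame
itself. The sketch below is one way to do that, under stated
assumptions: the frame layout (a broadcast RFC 903 "request reverse"
packet with zero protocol addresses, padded to the 60-byte Ethernet
minimum) is a conventional choice rather than something this
specification mandates, and the function names are hypothetical.
Because all numbers are in native byte order, the "first 6 bytes" are
taken in memory order with ``memcpy``.

.. code:: c

  #include <stdint.h>
  #include <string.h>

  #define RARP_FRAME_LEN 60   /* minimum Ethernet frame, FCS excluded */

  /* The first 6 bytes of the payload, in memory order, are the MAC. */
  static void rarp_mac_from_payload(uint64_t payload, uint8_t mac[6])
  {
      memcpy(mac, &payload, 6);
  }

  /* Builds a broadcast RARP "request reverse" frame announcing `mac`.
   * Returns the frame length. */
  static size_t build_fake_rarp(const uint8_t mac[6],
                                uint8_t buf[RARP_FRAME_LEN])
  {
      memset(buf, 0, RARP_FRAME_LEN);   /* zero addresses and padding */
      memset(buf, 0xff, 6);             /* destination: broadcast */
      memcpy(buf + 6, mac, 6);          /* source MAC */
      buf[12] = 0x80; buf[13] = 0x35;   /* EtherType: RARP */
      buf[14] = 0x00; buf[15] = 0x01;   /* hardware type: Ethernet */
      buf[16] = 0x08; buf[17] = 0x00;   /* protocol type: IPv4 */
      buf[18] = 6;                      /* hardware address length */
      buf[19] = 4;                      /* protocol address length */
      buf[20] = 0x00; buf[21] = 0x03;   /* opcode 3: request reverse */
      memcpy(buf + 22, mac, 6);         /* sender hardware address */
      /* bytes 28-31: sender protocol address, left as 0.0.0.0 */
      memcpy(buf + 32, mac, 6);         /* target hardware address */
      /* bytes 38-41: target protocol address, left as 0.0.0.0 */
      return RARP_FRAME_LEN;
  }

The slave would then transmit the returned 60 bytes through its
network backend once per ``VHOST_USER_SEND_RARP`` request.
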

``VHOST_USER_IOTLB_MSG``
  :id: 22
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :master payload: ``struct vhost_iotlb_msg``
  :slave payload: ``u64``

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.

  The master sends such requests to update and invalidate entries in
  the device IOTLB. The slave has to acknowledge the request by sending
  zero as ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when the ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.

``VHOST_USER_SET_VRING_ENDIAN``
  :id: 23
  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
  :master payload: vring state description

  Set the endianness of a VQ for legacy devices. Little-endian is
  indicated with state.num set to 0 and big-endian is indicated with
  state.num set to 1. Other values are invalid.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Backends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the master starts the VQ).

``VHOST_USER_GET_CONFIG``
  :id: 24
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: virtio device config space

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master to fetch the contents of the
  virtio device configuration space. The vhost-user slave's payload
  size MUST match the master's request; the slave uses a zero-length
  payload to indicate an error to the master. The vhost-user master
  may cache the contents to avoid repeated ``VHOST_USER_GET_CONFIG``
  calls.
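
As a sketch of how a slave might keep the device IOTLB that
``VHOST_USER_IOTLB_MSG`` updates: the struct below mirrors the layout
of the Linux UAPI ``struct vhost_iotlb_msg`` (``iova``, ``size``,
``uaddr``, ``perm``, ``type``) and uses the type codes from that
header, while the fixed-size table and the helper names are
hypothetical. A failed translation is the point at which a slave would
report a miss with ``VHOST_USER_SLAVE_IOTLB_MSG``.

.. code:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Mirrors struct vhost_iotlb_msg from <linux/vhost_types.h>. */
  struct iotlb_msg {
      uint64_t iova;
      uint64_t size;
      uint64_t uaddr;
      uint8_t perm;   /* 1 = RO, 2 = WO, 3 = RW */
      uint8_t type;   /* 1 = MISS, 2 = UPDATE, 3 = INVALIDATE, 4 = ACCESS_FAIL */
  };

  #define IOTLB_UPDATE     2
  #define IOTLB_INVALIDATE 3
  #define IOTLB_MAX        16   /* hypothetical fixed capacity */

  struct iotlb {
      struct iotlb_msg e[IOTLB_MAX];
      size_t n;
  };

  /* Handles a master IOTLB_MSG; the return value is the u64 reply
   * (0 = success, non-zero = failure). */
  static uint64_t iotlb_handle(struct iotlb *t, const struct iotlb_msg *m)
  {
      switch (m->type) {
      case IOTLB_UPDATE:
          if (t->n == IOTLB_MAX || m->size == 0)
              return 1;
          t->e[t->n++] = *m;
          return 0;
      case IOTLB_INVALIDATE: {
          if (m->size == 0)
              return 0;
          uint64_t mlast = m->iova + (m->size - 1);
          /* Drop every entry overlapping [iova, iova + size). */
          for (size_t i = 0; i < t->n;) {
              uint64_t xlast = t->e[i].iova + (t->e[i].size - 1);

              if (t->e[i].iova <= mlast && m->iova <= xlast)
                  t->e[i] = t->e[--t->n];   /* swap-remove, recheck i */
              else
                  i++;
          }
          return 0;
      }
      default:
          return 1;   /* MISS/ACCESS_FAIL only travel slave -> master */
      }
  }

  /* Translates iova to a user address; false means an IOTLB miss. */
  static bool iotlb_translate(const struct iotlb *t, uint64_t iova,
                              uint64_t *uaddr)
  {
      for (size_t i = 0; i < t->n; i++) {
          const struct iotlb_msg *x = &t->e[i];

          if (iova >= x->iova && iova - x->iova < x->size) {
              *uaddr = x->uaddr + (iova - x->iova);
              return true;
          }
      }
      return false;
  }

The value returned by ``iotlb_handle()`` is what the slave sends back
as the ``u64`` reply payload.
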

``VHOST_USER_SET_CONFIG``
  :id: 25
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master when the guest changes the virtio
  device configuration space, and it can also be used for live
  migration on the destination host. The vhost-user slave must check
  the flags field, and slaves MUST NOT accept SET_CONFIG for read-only
  configuration space fields unless the live migration bit is set.

``VHOST_USER_CREATE_CRYPTO_SESSION``
  :id: 26
  :equivalent ioctl: N/A
  :master payload: crypto session description
  :slave payload: crypto session description

  Create a session for crypto operation. The server side must return
  the session id: 0 or positive for success, negative for failure.
  This request should be sent only when the
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_CLOSE_CRYPTO_SESSION``
  :id: 27
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Close a session for crypto operation which was previously
  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.

  This request should be sent only when the
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_POSTCOPY_ADVISE``
  :id: 28
  :master payload: N/A
  :slave payload: userfault fd

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the master
  advises the slave that a migration with postcopy enabled is underway;
  the slave must open a userfaultfd for later use. Note that at this
  stage the migration is still in precopy mode.
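
The read-only rule for ``VHOST_USER_SET_CONFIG`` amounts to an overlap
test between the written range and the device's read-only fields. A
minimal slave-side sketch, assuming the config space payload carries
an offset, a size and a flags word, and that ``0x1`` is the live
migration flag (check the payload definition elsewhere in this
document); the read-only range table and the function name are
hypothetical:

.. code:: c

  #include <stdint.h>

  /* Assumed value of the live migration flag in the config payload. */
  #define CONFIG_FLAG_MIGRATION 0x1u

  /* A read-only range of the device config space (hypothetical table). */
  struct ro_range {
      uint32_t offset;
      uint32_t size;
  };

  /* Returns 0 if a SET_CONFIG write of [offset, offset + size) may be
   * applied, -1 if it touches a read-only field outside migration. */
  static int set_config_allowed(uint32_t offset, uint32_t size,
                                uint32_t flags, const struct ro_range *ro,
                                unsigned n_ro)
  {
      if (flags & CONFIG_FLAG_MIGRATION)
          return 0;   /* migration may restore read-only fields */

      for (unsigned i = 0; i < n_ro; i++) {
          /* 64-bit sums avoid wrap-around near the 32-bit limit. */
          uint64_t w_end = (uint64_t)offset + size;
          uint64_t r_end = (uint64_t)ro[i].offset + ro[i].size;

          if (offset < r_end && ro[i].offset < w_end)
              return -1;
      }
      return 0;
  }
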

``VHOST_USER_POSTCOPY_LISTEN``
  :id: 29
  :master payload: N/A

  The master advises the slave that a transition to postcopy mode has
  happened. The slave must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.

  This is always sent some time after a ``VHOST_USER_POSTCOPY_ADVISE``,
  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.

``VHOST_USER_POSTCOPY_END``
  :id: 30
  :slave payload: ``u64``

  The master advises that postcopy migration has now completed. The
  slave must disable the userfaultfd. The response is an
  acknowledgement only.

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
  is sent at the end of the migration, after
  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.

  The value returned is an error indication; 0 is success.

``VHOST_USER_GET_INFLIGHT_FD``
  :id: 31
  :equivalent ioctl: N/A
  :master payload: inflight description

  When the ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the master
  to get a shared buffer from the slave. The shared buffer will be used
  to track inflight I/O by the slave. QEMU should retrieve a new one
  when the VM is reset.

``VHOST_USER_SET_INFLIGHT_FD``
  :id: 32
  :equivalent ioctl: N/A
  :master payload: inflight description

  When the ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the master
  to send the shared inflight buffer back to the slave, so that the
  slave can get back inflight I/O after a crash or restart.

``VHOST_USER_GPU_SET_SOCKET``
  :id: 33
  :equivalent ioctl: N/A
  :master payload: N/A

  Sets the GPU protocol socket file descriptor, which is passed as
  ancillary data.
  The GPU protocol is used to inform the master of rendering state and
  updates. See vhost-user-gpu.rst for details.

``VHOST_USER_RESET_DEVICE``
  :id: 34
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: N/A

  Ask the vhost-user backend to disable all rings and reset all
  internal device state to the initial state, ready to be
  reinitialized. The backend retains ownership of the device
  throughout the reset operation.

  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
  feature is set by the backend.

``VHOST_USER_VRING_KICK``
  :id: 35
  :equivalent ioctl: N/A
  :master payload: vring state description
  :slave payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the master to indicate that a buffer was added to
  the vring instead of signalling it using the vring's kick file
  descriptor or having the slave rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_GET_MAX_MEM_SLOTS``
  :id: 36
  :equivalent ioctl: N/A
  :slave payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The slave should reply with a ``u64``
  payload containing the maximum number of memory slots for QEMU to
  expose to the guest. The value returned by the backend will be
  capped at the maximum number of RAM slots which can be supported by
  the target platform.
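
With in-band notifications, a kick becomes an ordinary message on the
socket. This sketch serializes a ``VHOST_USER_VRING_KICK`` using the
header layout from the Message Specification (``request``, ``flags``
and ``size``, each 32-bit, in native byte order) followed by a vring
state description. It sets the ``need_reply`` bit because in-band
notifications are negotiated together with
``VHOST_USER_PROTOCOL_F_REPLY_ACK`` so that the master can block until
the message is processed. The function name is hypothetical and the
socket I/O is left out.

.. code:: c

  #include <stdint.h>
  #include <string.h>

  #define VHOST_USER_VRING_KICK 35
  #define VHOST_USER_VERSION    0x1u        /* flags bits 0-1 */
  #define VHOST_USER_NEED_REPLY (1u << 3)   /* flags bit 3 */

  /* Serializes an in-band kick for vring `index` into buf (at least 20
   * bytes) in native byte order; returns the number of bytes to send. */
  static size_t build_vring_kick(uint8_t *buf, uint32_t index)
  {
      const uint32_t hdr[3] = {
          VHOST_USER_VRING_KICK,                       /* request */
          VHOST_USER_VERSION | VHOST_USER_NEED_REPLY,  /* flags */
          8,                                           /* payload size */
      };
      const uint32_t state[2] = { index, 0 };          /* num: reserved */

      memcpy(buf, hdr, sizeof(hdr));
      memcpy(buf + sizeof(hdr), state, sizeof(state));
      return sizeof(hdr) + sizeof(state);
  }

After writing these bytes, the master would read the ``u64`` ack before
issuing further requests.
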

``VHOST_USER_ADD_MEM_REG``
  :id: 37
  :equivalent ioctl: N/A
  :master payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the slave device must map in. Together with the
  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
  update the memory tables of the slave device.

``VHOST_USER_REM_MEM_REG``
  :id: 38
  :equivalent ioctl: N/A
  :master payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the slave device must unmap. Together with the
  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
  update the memory tables of the slave device.

``VHOST_USER_SET_STATUS``
  :id: 39
  :equivalent ioctl: ``VHOST_VDPA_SET_STATUS``
  :slave payload: N/A
  :master payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the master to
  notify the backend of the updated device status, as defined in the
  Virtio specification.
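
``VHOST_USER_ADD_MEM_REG`` and ``VHOST_USER_REM_MEM_REG`` let the slave
maintain its memory table one region at a time instead of receiving
the whole table in ``VHOST_USER_SET_MEM_TABLE``. A minimal sketch of
that bookkeeping, with assumptions: the struct mirrors the fields of
the memory region description defined earlier in this document (guest
address, size, user address, mmap offset); a region to remove is
identified by its guest address and size; the slot limit and helper
names are hypothetical. A real slave would also mmap the fd from the
ancillary data on add and munmap the region on removal.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical mirror of the memory region description fields. */
  struct mem_region {
      uint64_t guest_addr;   /* guest physical address */
      uint64_t size;         /* region size in bytes */
      uint64_t user_addr;    /* master's user address */
      uint64_t mmap_offset;  /* offset into the passed fd */
  };

  #define MEM_MAX_SLOTS 8    /* hypothetical; cf. GET_MAX_MEM_SLOTS */

  struct mem_table {
      struct mem_region r[MEM_MAX_SLOTS];
      size_t n;
  };

  /* ADD_MEM_REG: returns 0 on success, -1 if every slot is in use. */
  static int mem_add(struct mem_table *t, const struct mem_region *m)
  {
      if (t->n == MEM_MAX_SLOTS)
          return -1;
      t->r[t->n++] = *m;        /* a real slave would mmap the fd here */
      return 0;
  }

  /* REM_MEM_REG: matches on guest address and size (an assumption);
   * returns 0 on success, -1 if no region matches. */
  static int mem_remove(struct mem_table *t, const struct mem_region *m)
  {
      for (size_t i = 0; i < t->n; i++) {
          if (t->r[i].guest_addr == m->guest_addr &&
              t->r[i].size == m->size) {
              t->r[i] = t->r[--t->n];   /* swap-remove; munmap here */
              return 0;
          }
      }
      return -1;
  }

  /* Finds the region containing guest physical address gpa, or NULL. */
  static const struct mem_region *mem_find_gpa(const struct mem_table *t,
                                               uint64_t gpa)
  {
      for (size_t i = 0; i < t->n; i++) {
          const struct mem_region *x = &t->r[i];

          if (gpa >= x->guest_addr && gpa - x->guest_addr < x->size)
              return x;
      }
      return NULL;
  }
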

``VHOST_USER_GET_STATUS``
  :id: 40
  :equivalent ioctl: ``VHOST_VDPA_GET_STATUS``
  :slave payload: ``u64``
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the master to
  query the backend for its device status as defined in the Virtio
  specification.


Slave message types
-------------------

``VHOST_USER_SLAVE_IOTLB_MSG``
  :id: 1
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :slave payload: ``struct vhost_iotlb_msg``
  :master payload: N/A

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
  The slave sends such requests to notify the master of an IOTLB miss
  or an IOTLB access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK``
  is negotiated and the slave sets the ``VHOST_USER_NEED_REPLY`` flag,
  the master must respond with zero when the operation is successfully
  completed, or non-zero otherwise. This request should be sent only
  when the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
  negotiated.

``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``
  :id: 2
  :equivalent ioctl: N/A
  :slave payload: N/A
  :master payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
  slave sends such messages to notify that the virtio device's
  configuration space has changed. For host devices that support this
  feature, the host driver can then send a ``VHOST_USER_GET_CONFIG``
  message to the slave to get the latest content. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated and the slave sets
  the ``VHOST_USER_NEED_REPLY`` flag, the master must respond with
  zero when the operation is successfully completed, or non-zero
  otherwise.
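
When ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the receiver of
a slave request that carries the ``need_reply`` flag owes a ``u64``
status reply. This sketch shows the master-side decision and the reply
framing, assuming the header layout from the Message Specification
(version in bits 0-1, reply flag in bit 2, need_reply in bit 3); the
type and function names are hypothetical:

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>
  #include <string.h>

  #define VHOST_USER_VERSION    0x1u        /* flags bits 0-1 */
  #define VHOST_USER_REPLY_MASK (1u << 2)   /* flags bit 2: a reply */
  #define VHOST_USER_NEED_REPLY (1u << 3)   /* flags bit 3: ack wanted */

  /* Hypothetical struct for the 12-byte message header. */
  struct vu_hdr {
      uint32_t request;
      uint32_t flags;
      uint32_t size;
  };

  /* A status reply is owed only if REPLY_ACK was negotiated and the
   * sender set need_reply. */
  static bool reply_expected(uint32_t flags, bool reply_ack_negotiated)
  {
      return reply_ack_negotiated && (flags & VHOST_USER_NEED_REPLY) != 0;
  }

  /* Writes the reply to request `req` into buf (20 bytes): the same
   * request id, version plus reply flag, and a u64 status that is 0 on
   * success and non-zero on failure. Returns the length. */
  static size_t build_ack(uint8_t *buf, uint32_t req, bool ok)
  {
      struct vu_hdr h = { req, VHOST_USER_VERSION | VHOST_USER_REPLY_MASK, 8 };
      uint64_t status = ok ? 0 : 1;

      memcpy(buf, &h, sizeof(h));
      memcpy(buf + sizeof(h), &status, sizeof(status));
      return sizeof(h) + sizeof(status);
  }

For instance, after processing a
``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG`` (id 2) that carried
``need_reply``, the master would call ``build_ack(buf, 2, ok)`` and
write the 20 bytes back on the slave channel.
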

``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``
  :id: 3
  :equivalent ioctl: N/A
  :slave payload: vring area description
  :master payload: N/A

  Sets the host notifier for a specified queue. The queue index is
  contained in the ``u64`` field of the vring area description. The
  host notifier is described by the file descriptor (typically a VFIO
  device fd), which is passed as ancillary data, and by the size (the
  mmap size, which should be the same as the host page size) and the
  offset (the mmap offset) carried in the vring area description. QEMU
  can mmap the file descriptor based on the size and offset to get a
  memory range. Registering a host notifier means mapping this memory
  range to the VM as the specified queue's notify MMIO region. The
  slave sends this request to tell QEMU to de-register the existing
  notifier, if any, and to register the new notifier if the request is
  sent with a file descriptor.

  This request should be sent only when the
  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
  successfully negotiated.

``VHOST_USER_SLAVE_VRING_CALL``
  :id: 4
  :equivalent ioctl: N/A
  :slave payload: vring state description
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the slave to indicate that a buffer was used from
  the vring instead of signalling this using the vring's call file
  descriptor or having the master rely on polling.

  The state.num field is currently reserved and must be set to 0.
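
The master's handling of ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``
comes down to validating the vring area and mmapping the passed fd. A
minimal POSIX sketch, with assumptions: the struct mirrors the three
``u64`` fields of the vring area description; the queue index is taken
from bits 0-7 of the ``u64`` field, mirroring
``VHOST_USER_SET_VRING_KICK`` (this section only says the index is
contained in that field); and the function name is hypothetical.
De-registering an old notifier and exposing the mapping to the guest
are left out.

.. code:: c

  #define _POSIX_C_SOURCE 200809L
  #include <stddef.h>
  #include <stdint.h>
  #include <sys/mman.h>
  #include <unistd.h>

  /* Hypothetical mirror of the vring area description. */
  struct vring_area {
      uint64_t u64;     /* queue index (see assumption below) */
      uint64_t size;    /* mmap size, expected to equal the page size */
      uint64_t offset;  /* mmap offset into the passed fd */
  };

  /* Validates the area and maps `fd`; returns NULL if size/offset are
   * unsuitable or mmap fails. The queue index is stored in *queue. */
  static void *map_host_notifier(int fd, const struct vring_area *a,
                                 unsigned *queue)
  {
      long page = sysconf(_SC_PAGESIZE);
      void *p;

      /* Assumption: the index lives in bits 0-7, as for SET_VRING_KICK. */
      *queue = (unsigned)(a->u64 & 0xff);

      if (page <= 0 || a->size != (uint64_t)page ||
          a->offset % (uint64_t)page != 0)
          return NULL;

      p = mmap(NULL, (size_t)a->size, PROT_READ | PROT_WRITE, MAP_SHARED,
               fd, (off_t)a->offset);
      return p == MAP_FAILED ? NULL : p;
  }
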

``VHOST_USER_SLAVE_VRING_ERR``
  :id: 5
  :equivalent ioctl: N/A
  :slave payload: vring state description
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the slave to indicate that an error occurred on the
  specific vring, instead of signalling the error file descriptor
  set by the master via ``VHOST_USER_SET_VRING_ERR``.

  The state.num field is currently reserved and must be set to 0.

.. _reply_ack:

VHOST_USER_PROTOCOL_F_REPLY_ACK
-------------------------------

The original vhost-user specification only demands replies for certain
commands. This differs from the vhost protocol implementation where
commands are sent over an ``ioctl()`` call and block until the client
has completed.

With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag on any command. This indicates that the
client MUST respond with a Payload ``VhostUserMsg`` indicating success
or failure. The payload should be set to zero on success or non-zero
on failure, unless the message already has an explicit reply body.

The response payload gives QEMU a deterministic indication of the result
of the command. Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In the future, QEMU could be taught to
be more resilient for selective requests.

For the message types that already solicit a reply from the client,
the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or the need_reply
bit being set brings no behavioural change. (See the Communication_
section for details.)

.. _backend_conventions:

Backend program conventions
===========================

vhost-user backends can provide various devices & services and may
need to be configured manually depending on the use case. However, it
is a good idea to follow the conventions listed here when
possible. Users, such as QEMU or libvirt, can then rely on some common
behaviour to avoid heterogeneous configuration and management of the
backend programs and facilitate interoperability.

Each backend installed on a host system should come with at least one
JSON file that conforms to the vhost-user.json schema. Each file
informs the management applications about the backend type and binary
location. In addition, it defines rules for management apps for
picking the highest priority backend when multiple match the search
criteria (see ``@VhostUserBackend`` documentation in the schema file).

If the backend is not capable of enabling a requested feature on the
host (such as 3D acceleration with virgl), or the initialization
failed, the backend should fail to start early and exit with a
non-zero status. It may also print a message to stderr for further
details.

The backend program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
access to the system.

File descriptors 0, 1 and 2 will exist, and have regular
stdin/stdout/stderr usage (they may have been redirected to /dev/null
by the management layer, or to a log handler).

The backend program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received. Failing that, it may receive SIGKILL
from the management layer after a few seconds.

The following command line options have an expected behaviour.
They are mandatory, unless explicitly stated otherwise:

--socket-path=PATH

  This option specifies the location of the vhost-user Unix domain
  socket. It is incompatible with --fd.

--fd=FDNUM

  When this argument is given, the backend program is started with the
  vhost-user socket as file descriptor FDNUM. It is incompatible with
  --socket-path.

--print-capabilities

  Output to stdout the backend capabilities in JSON format, and then
  exit successfully. Other options and arguments should be ignored, and
  the backend program should not perform its normal function. The
  capabilities can be reported dynamically depending on the host
  capabilities.

The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``. Example:

.. code:: json

  {
    "type": "foo",
    "features": [
      "feature-a",
      "feature-b"
    ]
  }

vhost-user-input
----------------

Command line options:

--evdev-path=PATH

  Specify the Linux input device.

  (optional)

--no-grab

  Do not request exclusive access to the input device.

  (optional)

vhost-user-gpu
--------------

Command line options:

--render-node=PATH

  Specify the GPU DRM render node.

  (optional)

--virgl

  Enable virgl rendering support.

  (optional)

vhost-user-blk
--------------

Command line options:

--blk-file=PATH

  Specify block device or file path.

  (optional)

--read-only

  Enable read-only.

  (optional)
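
The generic backend options from *Backend program conventions*
(``--socket-path``, ``--fd`` and ``--print-capabilities``) can be
handled before any device-specific parsing. A minimal sketch in C,
assuming a hand-rolled parser (``getopt_long`` would work too) and
hypothetical names; the capabilities printed reuse the placeholder
``"foo"`` example from the JSON above rather than a real backend type:

.. code:: c

  #include <limits.h>
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Hypothetical container for the conventional options. */
  struct backend_opts {
      const char *socket_path; /* --socket-path=PATH, NULL if absent */
      int fd;                  /* --fd=FDNUM, -1 if absent */
      bool print_caps;         /* --print-capabilities */
  };

  /* Returns 0 on success, -1 on a usage error. Unknown options are
   * skipped, leaving them to a device-specific parser. */
  static int parse_backend_opts(int argc, char **argv,
                                struct backend_opts *o)
  {
      bool bad_fd = false;

      o->socket_path = NULL;
      o->fd = -1;
      o->print_caps = false;

      for (int i = 1; i < argc; i++) {
          const char *a = argv[i];

          if (strncmp(a, "--socket-path=", 14) == 0) {
              o->socket_path = a + 14;
          } else if (strncmp(a, "--fd=", 5) == 0) {
              char *end;
              long v = strtol(a + 5, &end, 10);

              if (end == a + 5 || *end != '\0' || v < 0 || v > INT_MAX)
                  bad_fd = true;
              else
                  o->fd = (int)v;
          } else if (strcmp(a, "--print-capabilities") == 0) {
              o->print_caps = true;
          }
      }

      if (o->print_caps)
          return 0;   /* other options and arguments are ignored */
      if (bad_fd)
          return -1;
      /* Exactly one of --socket-path and --fd must be given. */
      if ((o->socket_path != NULL) == (o->fd >= 0))
          return -1;
      return 0;
  }

  /* Placeholder capabilities, in the shape of the JSON example above. */
  static void print_capabilities(FILE *out)
  {
      fputs("{\n  \"type\": \"foo\",\n  \"features\": [\n"
            "    \"feature-a\",\n    \"feature-b\"\n  ]\n}\n", out);
  }

A backend ``main()`` would call ``parse_backend_opts()`` first; if
``print_caps`` is set it would call ``print_capabilities(stdout)`` and
exit with status 0, and on a usage error it would exit early with a
non-zero status.
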