.. SPDX-License-Identifier: GPL-2.0+

======================================================
IBM Virtual Management Channel Kernel Driver (IBMVMC)
======================================================

:Authors:
    Dave Engebretsen <engebret@us.ibm.com>,
    Adam Reznechek <adreznec@linux.vnet.ibm.com>,
    Steven Royer <seroyer@linux.vnet.ibm.com>,
    Bryant G. Ly <bryantly@linux.vnet.ibm.com>

Introduction
============

Note: Knowledge of virtualization technology is required to understand
this document.

A good reference document would be:

https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf

The Virtual Management Channel (VMC) is a logical device which provides a
message-passing interface between the hypervisor and a management
partition. This management partition is intended to provide an
alternative to Hardware Management Console (HMC)-based system management.

The primary hardware management solution developed by IBM relies on an
appliance server named the Hardware Management Console (HMC), packaged as
an external tower or rack-mounted personal computer. In a Power Systems
environment, a single HMC can manage multiple POWER processor-based
systems.

Management Application
----------------------

In the management partition, a management application exists which
enables a system administrator to configure the system's partitioning
characteristics via a command-line interface (CLI) or Representational
State Transfer (REST) APIs.

The management application runs on a Linux logical partition on a POWER8
or newer processor-based server that is virtualized by PowerVM. System
configuration, maintenance, and control functions which traditionally
require an HMC can be implemented in the management application using a
combination of HMC-to-hypervisor interfaces and existing operating system
methods. This tool provides a subset of the functions implemented by the
HMC and enables basic partition configuration. The set of
HMC-to-hypervisor messages supported by the management application
component is passed to the hypervisor over a VMC interface, which is
defined below.

The VMC enables the management partition to provide basic partitioning
functions:

- Logical Partitioning Configuration
- Start and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management

Virtual Management Channel (VMC)
--------------------------------

A logical device, called the Virtual Management Channel (VMC), is defined
for communicating between the management application and the hypervisor.
In essence, it provides the pipes that enable virtualization management
software. This device is presented to a designated management partition
as a virtual device.

This communication device uses the Command/Response Queue (CRQ) and
Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is
defined that must take place to establish that both the hypervisor and
management partition sides of the channel are running prior to
sending/receiving any of the protocol messages.

This driver also utilizes Transport Event CRQs. These CRQ messages are
sent when the hypervisor detects that one of the peer partitions has
abnormally terminated, or that one side has called H_FREE_CRQ to close
its CRQ. Two new classes of CRQ messages are introduced for the VMC
device. VMC Administrative messages are used by each partition using the
VMC to communicate its capabilities to its partner. HMC Interface
messages are used for the actual flow of HMC messages between the
management partition and the hypervisor. As most HMC messages far exceed
the size of a CRQ buffer, a virtual DMA (RDMA) of the HMC message data is
done prior to each HMC Interface CRQ message. Only the management
partition drives RDMA operations; hypervisors never directly cause the
movement of message data.

Terminology
-----------

RDMA
    Remote Direct Memory Access is a DMA transfer from the server to its
    client or from the server to its partner partition. DMA refers both
    to physical I/O to and from memory operations and to memory-to-memory
    move operations.

CRQ
    The Command/Response Queue is a facility which is used to communicate
    between partner partitions. Transport events which are signaled from
    the hypervisor to the partition are also reported in this queue.

Example Management Partition VMC Driver Interface
=================================================

This section provides an example for the management application
implementation where a device driver is used to interface to the VMC
device. This driver consists of a new device, for example /dev/ibmvmc,
which provides interfaces to open, close, read, write, and perform
ioctls against the VMC device.

VMC Interface Initialization
----------------------------

The device driver is responsible for initializing the VMC when the driver
is loaded. It first creates and initializes the CRQ. Next, an exchange of
VMC capabilities is performed to indicate the code version and number of
resources available in both the management partition and the hypervisor.
Finally, the hypervisor requests that the management partition create an
initial pool of VMC buffers, one buffer for each possible HMC connection,
which will be used for management application session initialization.
Prior to completion of this initialization sequence, the device returns
EBUSY to open() calls. EIO is returned for all other open() failures.

::

   Management Partition                    Hypervisor
                    CRQ INIT
   ---------------------------------------->
               CRQ INIT COMPLETE
   <----------------------------------------
                  CAPABILITIES
   ---------------------------------------->
             CAPABILITIES RESPONSE
   <----------------------------------------
          ADD BUFFER (HMC IDX=0,1,..)         _
   <----------------------------------------   |
             ADD BUFFER RESPONSE               | - Perform # HMCs Iterations
   ---------------------------------------->   -

VMC Interface Open
------------------

After the basic VMC channel has been initialized, an HMC session-level
connection can be established. The application layer performs an open()
to the VMC device and executes an ioctl() against it, indicating the HMC
ID (32 bytes of data) for this session. If the VMC device is in an
invalid state, EIO will be returned for the ioctl(). The device driver
creates a new HMC session value (ranging from 1 to 255) and HMC index
value (starting at index 0 and ranging to 254) for this HMC ID. The
driver then RDMAs the HMC ID to the hypervisor and sends an Interface
Open message to the hypervisor to establish the session over the VMC.
After the hypervisor receives this information, it sends Add Buffer
messages to the management partition to seed an initial pool of buffers
for the new HMC connection. Finally, the hypervisor sends an Interface
Open Response message to indicate that it is ready for normal runtime
messaging. The following illustrates this VMC flow:

::

   Management Partition                    Hypervisor
                 RDMA HMC ID
   ---------------------------------------->
                Interface Open
   ---------------------------------------->
                  Add Buffer                  _
   <----------------------------------------   |
             Add Buffer Response               | - Perform N Iterations
   ---------------------------------------->   -
           Interface Open Response
   <----------------------------------------
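
From userspace, session establishment therefore reduces to an open() that
may need to be retried while the driver initializes, followed by a single
ioctl() carrying the 32-byte HMC ID. The following is a minimal sketch of
that pattern. Only the EBUSY/EIO semantics and the 32-byte ID size come
from this document; the request code VMC_IOCTL_SETHMCID and its value are
assumptions written for illustration (consult the ibmvmc driver header
for the authoritative definition), and the retry interval is arbitrary:

::

   /*
    * Sketch only: open the VMC device and bind an HMC session to it.
    * VMC_IOCTL_SETHMCID and its value are assumed for illustration.
    */
   #include <errno.h>
   #include <fcntl.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <unistd.h>

   #define VMC_HMC_ID_LEN     32   /* the HMC ID is 32 bytes of data */
   #define VMC_IOCTL_SETHMCID _IOW('v', 1, unsigned char *) /* assumed */

   static int vmc_open_session(const char *path, const char *hmc_id)
   {
           unsigned char id[VMC_HMC_ID_LEN] = { 0 };
           int fd;

           /* open() returns EBUSY until CRQ initialization completes. */
           while ((fd = open(path, O_RDWR)) < 0) {
                   if (errno != EBUSY)
                           return -1;      /* EIO or other hard failure */
                   sleep(1);               /* still initializing; retry */
           }

           /* Shorter IDs are zero-padded to the fixed 32-byte length. */
           memcpy(id, hmc_id, strnlen(hmc_id, sizeof(id)));

           /* EIO here indicates the VMC device is in an invalid state. */
           if (ioctl(fd, VMC_IOCTL_SETHMCID, id) < 0) {
                   close(fd);
                   return -1;
           }
           return fd;                      /* session established */
   }

A production application would bound the retry loop. On success, the
driver assigns the session and index values described above and performs
the RDMA and Interface Open exchange on the application's behalf.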

VMC Interface Runtime
---------------------

During normal runtime, the management application and the hypervisor
exchange HMC messages via the Signal VMC message and RDMA operations.
When sending data to the hypervisor, the management application performs
a write() to the VMC device, and the driver RDMAs the data to the
hypervisor and then sends a Signal Message. If a write() is attempted
before VMC device buffers have been made available by the hypervisor, or
no buffers are currently available, EBUSY is returned in response to the
write(). A write() will return EIO for all other errors, such as an
invalid device state. When the hypervisor sends a message to the
management partition, the data is put into a VMC buffer and a Signal
Message is sent to the VMC driver in the management partition. The driver
RDMAs the buffer into the partition and passes the data up to the
appropriate management application via a read() to the VMC device. The
read() request blocks if there is no buffer available to read. The
management application may use select() to wait for the VMC device to
become ready with data to read.

::

   Management Partition                    Hypervisor
                  MSG RDMA
   ---------------------------------------->
                 SIGNAL MSG
   ---------------------------------------->
                 SIGNAL MSG
   <----------------------------------------
                  MSG RDMA
   <----------------------------------------
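
In code, one round trip of this flow could look like the following
minimal sketch. Only the write()/read()/select() semantics and the
EBUSY/EIO cases described above come from this document; the back-off
interval is a placeholder:

::

   /*
    * Sketch only: send one HMC message and wait for the reply.
    */
   #include <errno.h>
   #include <sys/select.h>
   #include <unistd.h>

   static ssize_t vmc_exchange(int fd, const void *req, size_t req_len,
                               void *resp, size_t resp_len)
   {
           fd_set rfds;

           /* The driver RDMAs the data to the hypervisor and sends a
            * Signal Message.  EBUSY means no VMC buffer is available. */
           while (write(fd, req, req_len) < 0) {
                   if (errno != EBUSY)
                           return -1;      /* EIO: invalid state, etc. */
                   usleep(10000);          /* placeholder back-off */
           }

           /* read() blocks when no buffer is pending, so wait for the
            * device to become ready with data. */
           FD_ZERO(&rfds);
           FD_SET(fd, &rfds);
           if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
                   return -1;

           /* The driver has RDMAed the hypervisor's buffer into the
            * partition; read() hands the data to the application. */
           return read(fd, resp, resp_len);
   }

An application serving several HMC sessions would typically add every
session's descriptor to the select() set rather than blocking on a
single one.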

VMC Interface Close
-------------------

HMC session-level connections are closed by the management partition when
the application layer performs a close() against the device. This action
results in an Interface Close message flowing to the hypervisor, which
causes the session to be terminated. The device driver must free any
storage allocated for buffers for this HMC connection.

::

   Management Partition                    Hypervisor
               INTERFACE CLOSE
   ---------------------------------------->
           INTERFACE CLOSE RESPONSE
   <----------------------------------------

Additional Information
======================

For more information on CRQ messages, VMC messages, HMC interface
buffers, and Signal messages, please refer to the Linux on Power
Architecture Platform Reference, section F.
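
As a closing illustration, the pieces above can be combined into one
hypothetical session lifecycle. vmc_open_session() and vmc_exchange() are
the illustrative helpers sketched earlier in this document (not driver
APIs), and the message payload and buffer size are placeholders:

::

   /*
    * Sketch only: open a session, exchange one message, then close.
    * close() triggers the Interface Close flow documented above.
    */
   #include <stdio.h>
   #include <unistd.h>

   /* Illustrative helpers from the earlier sketches. */
   int vmc_open_session(const char *path, const char *hmc_id);
   ssize_t vmc_exchange(int fd, const void *req, size_t req_len,
                        void *resp, size_t resp_len);

   int main(void)
   {
           static const char msg[] = "placeholder-hmc-msg";
           char reply[4096];       /* placeholder buffer size */
           ssize_t n;
           int fd;

           fd = vmc_open_session("/dev/ibmvmc", "example-hmc-id");
           if (fd < 0) {
                   perror("vmc_open_session");
                   return 1;
           }

           n = vmc_exchange(fd, msg, sizeof(msg) - 1,
                            reply, sizeof(reply));
           if (n < 0)
                   perror("vmc_exchange");
           else
                   printf("received %zd bytes\n", n);

           close(fd);      /* sends Interface Close; buffers are freed */
           return 0;
   }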