cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

papr_hcalls.rst (14720B)


.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
=========

Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
specification [1]_ which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as supported guests (termed as Logical Partitions
  or LPARs). It supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host,
  though it only implements a subset of the PAPR specification called
  LoPAPR [2]_.

On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0) and must
issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or for other services managed by the
hypervisor.

Hence a Hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on behalf of the
guest. The guest issues a hcall with the necessary input operands. The
hypervisor, after performing the privileged operation, returns a status
code and output operands back to the guest.

HCALL ABI
=========
The ABI specification for a hcall between a pseries guest and the PAPR
hypervisor is covered in section 14.5.3 of ref [2]_. Switching to the
hypervisor context is done via the instruction **HVCS**, which expects the
opcode for the hcall to be set in *r3* and any in-arguments for the hcall
to be provided in registers *r4-r12*. If values have to be passed through a
memory buffer, the data stored in that buffer should be in Big-endian byte
order.

Once control returns back to the guest after the hypervisor has serviced the
**HVCS** instruction, the return value of the hcall is available in *r3* and
any out values are returned in registers *r4-r12*. Again, as with
in-arguments, any out values stored in a memory buffer will be in Big-endian
byte order.

Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
defined in an arch-specific header [4]_, to issue hcalls from the Linux
kernel running as a pseries guest.
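
A minimal sketch (not taken from the kernel sources) of issuing an hcall
through one of these wrappers; H_SCM_QUERY_BLOCK_MEM_BINDING and its
operands are described later in this document, and the function name here
is purely illustrative::

    #include <linux/types.h>
    #include <asm/hvcall.h>

    /*
     * plpar_hcall() places the opcode in r3 and the variadic arguments in
     * r4 onwards, then copies r4-r7 back into retbuf on return.
     */
    static long query_scm_block_binding(u32 drc_index, u64 scm_block_index,
                                        u64 *gpa)
    {
            unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
            long rc;

            rc = plpar_hcall(H_SCM_QUERY_BLOCK_MEM_BINDING, retbuf,
                             drc_index, scm_block_index);
            if (rc == H_SUCCESS)
                    *gpa = retbuf[0];       /* out value returned in r4 */

            return rc;
    }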

Register Conventions
====================

Any hcall should follow the same register convention as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_. The
table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
| Range    |(Y/N)     |                                           |
+==========+==========+===========================================+
|   r0     |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
|   r1     |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
|   r2     |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
|   r3     |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
|  r4-r10  |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
|   r11    |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
|   r12    |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
|   r13    |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
|  r14-r31 |    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
|    LR    |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
|   CTR    |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
|   XER    |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
|  CR0-1   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR2-4   |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR5-7   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  Others  |    N     |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs
etc. that are available for use by LPARs as Dynamic Resources (DR). When a DR
is allocated to an LPAR, PHYP creates a data-structure called Dynamic Resource
Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via an opaque
32-bit number called DRC-Index. The DRC-Index value is provided to the LPAR
via the device-tree, where it is present as an attribute in the device tree
node associated with the DR.
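
A hedged sketch of how a pseries guest could read the DRC-Index attribute
from a DR's device-tree node; the "ibm,my-drc-index" property name follows
the convention used by pseries platform code, and the helper name is
illustrative::

    #include <linux/of.h>
    #include <linux/types.h>

    /* Returns 0 and fills *drc_index on success, or a negative errno if
     * the node carries no DRC-Index attribute. */
    static int get_drc_index(struct device_node *np, u32 *drc_index)
    {
            return of_property_read_u32(np, "ibm,my-drc-index", drc_index);
    }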

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return value in *r3*
indicating success or failure of the hcall. In case of a failure an error
code indicates the cause of the error. These codes are defined and
documented in the arch-specific header [4]_.

In some cases a hcall can potentially take a long time and may need to be
issued multiple times in order to be completely serviced. Such hcalls will
usually accept an opaque value *continue-token* within their argument list,
and a return value of *H_CONTINUE* indicates that the hypervisor has not
yet finished servicing the hcall.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor-returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a non *H_CONTINUE*
return value, as sketched below.
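
A minimal sketch of this retry protocol, assuming a long-running hcall that
takes the token as its last input operand and returns an updated token in
*r4*; the opcode, argument and function name here are purely illustrative::

    #include <asm/hvcall.h>

    static long issue_long_running_hcall(unsigned long opcode,
                                         unsigned long arg)
    {
            unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
            unsigned long token = 0;        /* must be 0 on the initial call */
            long rc;

            do {
                    rc = plpar_hcall(opcode, retbuf, arg, token);
                    token = retbuf[0];      /* hypervisor-returned continue-token */
            } while (rc == H_CONTINUE);

            return rc;
    }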

HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP. For the
corresponding opcode values please look into the arch-specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N bytes from the metadata area
associated with it, at the specified offset, and copy them to the provided
buffer. The metadata area stores configuration information such as label
information, bad-blocks etc. The metadata area is located out-of-band of the
NVDIMM storage area, hence separate access semantics are provided.
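
A sketch of issuing this hcall as documented above; how the buffer address
is passed (here as a guest real address via __pa()) is an assumption, and
the function name is illustrative. Per the HCALL ABI section, the buffer
contents are in Big-endian byte order::

    #include <linux/types.h>
    #include <asm/hvcall.h>
    #include <asm/page.h>

    static long scm_read_metadata(u32 drc_index, u64 offset, void *buf,
                                  u64 len, u64 *bytes_read)
    {
            unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
            long rc;

            rc = plpar_hcall(H_SCM_READ_METADATA, retbuf, drc_index,
                             offset, __pa(buf), len);
            if (rc == H_SUCCESS)
                    *bytes_read = retbuf[0];        /* numBytesRead in r4 */

            return rc;
    }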

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N bytes to the metadata area
associated with it, at the specified offset and from the provided buffer.

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* into the
guest physical address space at *targetLogicalMemoryAddress*. In case
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF* the hypervisor assigns a
target address to the guest. The HCALL can fail if the Guest has an active
PTE entry to the SCM block being bound.
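
A hedged sketch, loosely modelled on the pseries papr_scm driver, of binding
all of an NVDIMM's SCM blocks while letting the hypervisor pick the target
address (*0xFFFFFFFF_FFFFFFFF*, i.e. ~0ul); retrying *H_Busy* with the
returned *continue-token* is an assumption here, and the function name is
illustrative::

    #include <linux/sched.h>
    #include <linux/types.h>
    #include <asm/hvcall.h>

    static long scm_bind_all_blocks(u32 drc_index, u64 nr_blocks,
                                    u64 *bound_addr)
    {
            unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
            u64 token = 0;
            long rc;

            do {
                    rc = plpar_hcall9(H_SCM_BIND_MEM, retbuf, drc_index, 0,
                                      nr_blocks, ~0ul, token);
                    token = retbuf[0];      /* continue-token for the retry */
                    cond_resched();         /* the hcall may take a while */
            } while (rc == H_BUSY);

            if (rc == H_SUCCESS)
                    *bound_addr = retbuf[1];        /* assigned guest address */

            return rc;
    }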

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks
starting at *startingScmLogicalMemoryAddress* from the guest physical address
space. The HCALL can fail if the Guest has an active PTE entry to the SCM
block being unbound.

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM Block index, return the guest physical address
to which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return which DRC Index and SCM block are
mapped to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap all SCM blocks belonging to all NVDIMMs,
or all SCM blocks belonging to a single NVDIMM identified by its drcIndex,
from the LPAR memory.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return info on predictive failure and the overall health
of the PMEM device. The asserted bits in the health-bitmap indicate one or
more states (described in the table below) of the PMEM device, and the
health-bit-valid-bitmap indicates which bits in the health-bitmap are valid.
The bits are reported in reverse bit ordering; for example, a value of
0xC400000000000000 indicates that bits 0, 1, and 5 are valid (see the sketch
after the table below).

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
|  Bit |               Definition                                              |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                              |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                              |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                               |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                       |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                      |
+------+-----------------------------------------------------------------------+
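
A hedged sketch of querying and interpreting these bitmaps; PPC_BIT()
numbers bits from the most significant end, matching the reverse bit
ordering described above, while the helper name and the choice of bit 04
are illustrative::

    #include <linux/types.h>
    #include <asm/bitops.h>         /* PPC_BIT(): MSB0 bit numbering */
    #include <asm/hvcall.h>

    static bool pmem_life_critically_low(u32 drc_index)
    {
            unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
            u64 health, valid;

            if (plpar_hcall(H_SCM_HEALTH, retbuf, drc_index) != H_SUCCESS)
                    return false;

            health = retbuf[0];     /* r4: health-bitmap */
            valid  = retbuf[1];     /* r5: health-bit-valid-bitmap */

            /* Bit 04: "PMEM device memory life remaining is critically low" */
            return (valid & PPC_BIT(4)) && (health & PPC_BIT(4));
    }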

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBufferAddr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

**H_SCM_FLUSH**

| Input: *drcIndex, continue-token*
| Out: *continue-token*
| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*

Given a DRC Index, flush the data to the backend NVDIMM device.

The hcall returns H_BUSY when the flush takes a long time and the hcall
needs to be issued multiple times in order to be completely serviced. The
*continue-token* from the output is to be passed in the argument list of
subsequent hcalls to the hypervisor until the hcall is completely serviced,
at which point H_SUCCESS or another error is returned by the hypervisor.
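
A minimal sketch of this busy-handling, re-issuing the hcall with the
returned *continue-token* until the hypervisor stops returning H_BUSY; the
function name is illustrative::

    #include <linux/sched.h>
    #include <linux/types.h>
    #include <asm/hvcall.h>

    static long scm_flush(u32 drc_index)
    {
            unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
            u64 token = 0;
            long rc;

            do {
                    rc = plpar_hcall(H_SCM_FLUSH, retbuf, drc_index, token);
                    token = retbuf[0];      /* continue-token for the next call */
                    cond_resched();         /* the flush may take a while */
            } while (rc == H_BUSY);

            return rc;
    }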

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture