.. SPDX-License-Identifier: GPL-2.0

=================================
The PPC KVM paravirtual interface
=================================

The basic execution principle by which KVM on PowerPC works is to run all kernel
space code in PR=1, which is user space. This way we trap all privileged
instructions and can emulate them accordingly.

Unfortunately, that is also its downfall. Quite a few privileged instructions
needlessly return us to the hypervisor even though they could be handled
differently.

This is what the PPC PV interface helps with. It takes privileged instructions
and transforms them into unprivileged ones with some help from the hypervisor.
This cuts down virtualization costs by about 50% on some of my benchmarks.

The code for that interface can be found in arch/powerpc/kernel/kvm*

Querying for existence
======================

To find out if we're running on KVM or not, we leverage the device tree. When
Linux is running on KVM, a node /hypervisor exists. That node contains a
compatible property with the value "linux,kvm".

Once you have determined that you're running under a PV capable KVM, you can
use hypercalls as described below.

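The device tree check described above can be sketched from user space as
follows. This is only an illustration in Python, not the kernel's own code: it
assumes that Linux exposes the device tree under /proc/device-tree, and the
helper names ``compatible_contains`` and ``running_on_kvm`` are hypothetical. A
``compatible`` property is a list of NUL-terminated strings, so the sketch
compares whole entries rather than substrings.

```python
from pathlib import Path

# Assumption: the device tree is visible under /proc/device-tree.
HYPERVISOR_COMPATIBLE = Path("/proc/device-tree/hypervisor/compatible")


def compatible_contains(raw: bytes, wanted: str) -> bool:
    """Return True if a raw 'compatible' property lists the wanted string.

    The property is a sequence of NUL-terminated strings.
    """
    entries = [e.decode("ascii", "replace") for e in raw.split(b"\0") if e]
    return wanted in entries


def running_on_kvm(path: Path = HYPERVISOR_COMPATIBLE) -> bool:
    """Return True if the /hypervisor node exists and names "linux,kvm"."""
    try:
        raw = path.read_bytes()
    except OSError:
        # No /hypervisor node (or no device tree at all): not a KVM guest.
        return False
    return compatible_contains(raw, "linux,kvm")
```
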
KVM hypercalls
==============

Inside the device tree's /hypervisor node there's a property called
'hypercall-instructions'. This property contains at most 4 opcodes that make
up the hypercall. To issue a hypercall, execute these instructions.

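As a rough sketch of how such a property could be decoded, the following Python
function splits the raw property bytes into 32-bit opcodes. It assumes the
usual device tree encoding of big-endian 32-bit cells; the function name
``parse_hypercall_instructions`` is hypothetical.

```python
import struct
from typing import List


def parse_hypercall_instructions(raw: bytes) -> List[int]:
    """Decode a 'hypercall-instructions' property into 32-bit opcodes.

    Assumes big-endian 32-bit cells and enforces the limit of 4 opcodes
    stated in the text.
    """
    if len(raw) == 0 or len(raw) % 4 != 0 or len(raw) > 16:
        raise ValueError("expected 1 to 4 big-endian 32-bit opcodes")
    count = len(raw) // 4
    return list(struct.unpack(f">{count}I", raw))
```

A guest would copy the decoded opcodes into its hypercall stub and execute them
in order. Which opcodes the host actually provides depends on the target.
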
The parameters are as follows:

        ========        ================        ================
        Register        IN                      OUT
        ========        ================        ================
        r0              -                       volatile
        r3              1st parameter           Return code
        r4              2nd parameter           1st output value
        r5              3rd parameter           2nd output value
        r6              4th parameter           3rd output value
        r7              5th parameter           4th output value
        r8              6th parameter           5th output value
        r9              7th parameter           6th output value
        r10             8th parameter           7th output value
        r11             hypercall number        8th output value
        r12             -                       volatile
        ========        ================        ================

Hypercall definitions are shared in generic code, so the same hypercall numbers
apply to x86 and powerpc alike, with the exception that on powerpc each KVM
hypercall number also needs to be ORed with the KVM vendor code, which is
(42 << 16).

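The arithmetic can be sketched as follows. The vendor code comes from the text
above; the function name ``kvm_hypercall_token`` and the restriction of the
generic number to the low 16 bits are assumptions made for this illustration.

```python
KVM_VENDOR_CODE = 42  # vendor code given in the text above


def kvm_hypercall_token(generic_number: int) -> int:
    """OR a generic KVM hypercall number with the KVM vendor code.

    Assumption: the generic number fits below the vendor field, i.e. in
    the low 16 bits.
    """
    if not 0 <= generic_number < (1 << 16):
        raise ValueError("hypercall number must fit in the low 16 bits")
    return (KVM_VENDOR_CODE << 16) | generic_number
```

For example, if a generic hypercall had the number 4 (an assumed value here),
the value placed in r11 would be ``kvm_hypercall_token(4)``, which is 0x2A0004.
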
Return codes can be as follows:

        ====            =========================
        Code            Meaning
        ====            =========================
        0               Success
        12              Hypercall not implemented
        <0              Error
        ====            =========================

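A caller has to interpret the raw register value in r3 as a signed number
before comparing it against this table. The sketch below shows one way to do
that; both helper names are hypothetical, and the 64-bit register width is an
assumption that would be 32 on 32 bit targets.

```python
def as_signed(value: int, bits: int = 64) -> int:
    """Interpret a raw register value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def describe_return_code(r3: int, bits: int = 64) -> str:
    """Map the value returned in r3 to the meanings in the table above."""
    code = as_signed(r3, bits)
    if code == 0:
        return "success"
    if code == 12:
        return "hypercall not implemented"
    if code < 0:
        return "error"
    # Positive codes other than 12 are not listed in the table.
    return "unspecified"
```
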
The magic page
==============

To enable communication between the hypervisor and guest, there is a new shared
page that contains parts of supervisor-visible register state. The guest can
map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.

Once this hypercall has been issued, the guest always has the magic page mapped
at the desired location. The first parameter indicates the effective address
when the MMU is enabled. The second parameter indicates the address in real
mode, if applicable to the target. For now, we always map the page to -4096.
This way we can access it using absolute load and store instructions. The
following instruction reads the first field of the magic page::

        ld      rX, -4096(0)

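The reason -4096 works is that a load with base register 0 uses the literal
value zero as its base, so the effective address is simply the sign-extended
16-bit displacement. The following sketch models that address arithmetic; the
helper names are hypothetical and the field offsets passed in are placeholders,
not the real layout of the magic page.

```python
MAGIC_PAGE_EA = -4096  # mapping address used in the text above


def magic_field_address(offset: int, bits: int = 64) -> int:
    """Effective address of a magic page field as an unsigned value."""
    if not 0 <= offset < 4096:
        raise ValueError("offset must lie inside the 4 KiB page")
    return (MAGIC_PAGE_EA + offset) & ((1 << bits) - 1)


def fits_displacement(offset: int) -> bool:
    """True if the field is reachable with a signed 16-bit displacement."""
    return -32768 <= MAGIC_PAGE_EA + offset <= 32767
```

Every offset inside the page stays within the displacement range, so any field
can be reached by a single load or store without setting up a base register.
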
The interface is designed to be extensible should there be a need later to add
more registers to the magic page. If you add fields to the magic page, also
define a new hypercall feature to indicate that the host can give you more
registers. Make use of the additional features only if the host supports them.

The magic page layout is described by struct kvm_vcpu_arch_shared
in arch/powerpc/include/asm/kvm_para.h.

Magic page features
===================

When mapping the magic page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE,
a second return value is passed to the guest. This second return value contains
a bitmap of available features inside the magic page.

The following enhancements to the magic page are currently available:

  ============================  =======================================
  KVM_MAGIC_FEAT_SR             Maps SR registers r/w in the magic page
  KVM_MAGIC_FEAT_MAS0_TO_SPRG7  Maps MASn, ESR, PIR and high SPRGs
  ============================  =======================================

For enhanced features in the magic page, please check that a feature exists
before using it!

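A feature check against the returned bitmap could look like the sketch below.
The bit positions are assumptions made for illustration, since the text names
the features but not their values; the authoritative definitions live in the
kernel headers. The helper names are hypothetical as well.

```python
# Assumed bit assignments; check the kernel's kvm_para.h for the real ones.
KVM_MAGIC_FEAT_SR = 1 << 0
KVM_MAGIC_FEAT_MAS0_TO_SPRG7 = 1 << 1

FEATURE_NAMES = {
    KVM_MAGIC_FEAT_SR: "KVM_MAGIC_FEAT_SR",
    KVM_MAGIC_FEAT_MAS0_TO_SPRG7: "KVM_MAGIC_FEAT_MAS0_TO_SPRG7",
}


def has_feature(bitmap: int, feature: int) -> bool:
    """True if every bit of 'feature' is set in the returned bitmap."""
    return (bitmap & feature) == feature


def known_features(bitmap: int) -> list:
    """Names of the known features that the host advertises."""
    return [name for bit, name in FEATURE_NAMES.items()
            if has_feature(bitmap, bit)]
```

A guest would guard each optional access, for example only touching the SR
fields when ``has_feature(bitmap, KVM_MAGIC_FEAT_SR)`` holds.
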
Magic page flags
================

In addition to features that indicate whether a host is capable of a particular
feature, we also have a channel for a guest to tell the host whether it's
capable of something. This is what we call "flags".

Flags are passed to the host in the low 12 bits of the Effective Address.

The following flags are currently available for a guest to expose:

  MAGIC_PAGE_FLAG_NOT_MAPPED_NX Guest handles NX bits correctly wrt magic page

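Because the magic page is 4 KiB aligned, the low 12 bits of its address are
always zero and can carry the flags. The following sketch shows the packing and
unpacking; the flag value and both helper names are assumptions for
illustration only.

```python
MASK64 = (1 << 64) - 1
FLAG_MASK = 0xFFF  # low 12 bits carry flags, per the text above
MAGIC_PAGE_FLAG_NOT_MAPPED_NX = 1 << 0  # assumed value


def pack_ea(page_ea: int, flags: int) -> int:
    """Combine a page-aligned effective address with guest flags."""
    if page_ea & FLAG_MASK:
        raise ValueError("page address must be 4 KiB aligned")
    if flags & ~FLAG_MASK:
        raise ValueError("flags must fit in the low 12 bits")
    return (page_ea | flags) & MASK64


def unpack_ea(value: int) -> tuple:
    """Split a received value into (page address, flags)."""
    return value & MASK64 & ~FLAG_MASK, value & FLAG_MASK
```
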
MSR bits
========

The MSR contains bits that require hypervisor intervention and bits that do
not require direct hypervisor intervention because they only get interpreted
when entering the guest or don't have any impact on the hypervisor's behavior.

The following bits are safe to be set inside the guest:

  - MSR_EE
  - MSR_RI

If any other bit changes in the MSR, please still use mtmsr(d).

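The rule above amounts to a mask test on the bits that differ between the old
and new MSR values. In this sketch the bit positions of MSR_EE and MSR_RI are
taken from the usual PowerPC definitions and should be treated as assumptions;
the function name ``needs_mtmsr`` is hypothetical.

```python
MSR_EE = 1 << 15  # external interrupt enable (assumed bit position)
MSR_RI = 1 << 1   # recoverable interrupt (assumed bit position)
SAFE_MSR_BITS = MSR_EE | MSR_RI


def needs_mtmsr(old_msr: int, new_msr: int) -> bool:
    """True if the change touches any bit outside the safe set."""
    changed = old_msr ^ new_msr
    return (changed & ~SAFE_MSR_BITS) != 0
```

When ``needs_mtmsr`` returns False, the guest can update the MSR copy in the
magic page directly; otherwise it falls back to the trapping instruction.
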
Patched instructions
====================

The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
respectively on 32 bit systems with an added offset of 4 to account for big
endianness.

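The offset of 4 follows from the byte order: in a big-endian 64-bit field the
low 32 bits occupy the last four bytes. The sketch below demonstrates this on a
byte buffer standing in for the magic page; the function name and the sample
value are purely illustrative.

```python
import struct


def read_low_word(page: bytes, offset: int) -> int:
    """Read the low 32 bits of a big-endian 64-bit field at 'offset'.

    This mirrors what a 32 bit guest does with "lwz" at offset + 4.
    """
    (value,) = struct.unpack_from(">I", page, offset + 4)
    return value


# A stand-in page whose first 64-bit field holds 0x12345678.
sample_page = struct.pack(">Q", 0x12345678)
```

Reading at the unadjusted offset would return the high word instead, which is
zero for any value that fits in 32 bits.
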
The following is a list of the mappings the Linux kernel performs when running
as a guest. Implementing any of these mappings is optional, as the instruction
traps also act on the shared page, so executing privileged instructions still
works as before.

======================= ================================
From                    To
======================= ================================
mfmsr   rX              ld      rX, magic_page->msr
mfsprg  rX, 0           ld      rX, magic_page->sprg0
mfsprg  rX, 1           ld      rX, magic_page->sprg1
mfsprg  rX, 2           ld      rX, magic_page->sprg2
mfsprg  rX, 3           ld      rX, magic_page->sprg3
mfsrr0  rX              ld      rX, magic_page->srr0
mfsrr1  rX              ld      rX, magic_page->srr1
mfdar   rX              ld      rX, magic_page->dar
mfdsisr rX              lwz     rX, magic_page->dsisr

mtmsr   rX              std     rX, magic_page->msr
mtsprg  0, rX           std     rX, magic_page->sprg0
mtsprg  1, rX           std     rX, magic_page->sprg1
mtsprg  2, rX           std     rX, magic_page->sprg2
mtsprg  3, rX           std     rX, magic_page->sprg3
mtsrr0  rX              std     rX, magic_page->srr0
mtsrr1  rX              std     rX, magic_page->srr1
mtdar   rX              std     rX, magic_page->dar
mtdsisr rX              stw     rX, magic_page->dsisr

tlbsync                 nop

mtmsrd  rX, 0           b       <special mtmsr section>
mtmsr   rX              b       <special mtmsr section>

mtmsrd  rX, 1           b       <special mtmsrd section>

[Book3S only]
mtsrin  rX, rY          b       <special mtsrin section>

[BookE only]
wrteei  [0|1]           b       <special wrteei section>
======================= ================================

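To make the first row of the table concrete, the following sketch rewrites an
"mfmsr rX" opcode into a load from the magic page while keeping the target
register. The opcode constants are the standard PowerPC encodings; the function
name ``patch_mfmsr`` is hypothetical, and ``msr_offset`` is a caller-supplied
placeholder because the real field offset comes from struct
kvm_vcpu_arch_shared.

```python
MAGIC_PAGE_EA = -4096      # mapping address used earlier in this document
MASK_RT = 0x03E00000       # target register field of the instruction
INST_MFMSR = 0x7C0000A6    # mfmsr r0
INST_LD = 0xE8000000       # ld r0, 0(0)
INST_LWZ = 0x80000000      # lwz r0, 0(0)


def patch_mfmsr(inst: int, msr_offset: int, is_64bit: bool = True) -> int:
    """Return the replacement opcode, or 'inst' unchanged if not mfmsr."""
    if (inst & ~MASK_RT) != INST_MFMSR:
        return inst
    rt = inst & MASK_RT
    addr = MAGIC_PAGE_EA + msr_offset
    if is_64bit:
        return INST_LD | rt | (addr & 0xFFFC)
    # 32 bit: word load from the low half of the big-endian field.
    return INST_LWZ | rt | ((addr + 4) & 0xFFFC)
```

The other load and store rows follow the same pattern with a different source
opcode and field offset.
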
Some instructions require more logic to determine what's going on than a load
or store instruction can deliver. To enable patching of those, we keep some
RAM around that we can live-translate instructions into. What happens is the
following:

        1) copy emulation code to memory
        2) patch that code to fit the emulated instruction
        3) patch that code to return to the original pc + 4
        4) patch the original instruction to branch to the new code

That way we can inject an arbitrary amount of code as replacement for a single
instruction. This allows us to check for pending interrupts when setting EE=1
for example.

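The four steps can be modelled on a small word-addressed memory, as in the
sketch below. This is a simplified illustration rather than the kernel's
implementation: memory is a Python dict, the template is an arbitrary list of
opcodes whose last word is a placeholder for the return branch, and all
function and parameter names are hypothetical. The branch encoding is the
standard relative "b" instruction with a 26-bit signed byte displacement.

```python
INST_B = 0x48000000       # relative branch, AA=0, LK=0
INST_B_MASK = 0x03FFFFFC  # displacement field


def encode_branch(from_addr: int, to_addr: int) -> int:
    """Encode "b to_addr" for an instruction located at from_addr."""
    disp = to_addr - from_addr
    if disp % 4 or not -(1 << 25) <= disp < (1 << 25):
        raise ValueError("branch target unaligned or out of range")
    return INST_B | (disp & INST_B_MASK)


def install_trampoline(mem, pc, scratch, template, patch_index, patched_word):
    """Replace the instruction at 'pc' with a branch to emulation code.

    'mem' maps byte addresses (multiples of 4) to 32-bit opcodes.
    """
    # 1) copy emulation code to memory
    for i, word in enumerate(template):
        mem[scratch + 4 * i] = word
    # 2) patch that code to fit the emulated instruction
    mem[scratch + 4 * patch_index] = patched_word
    # 3) patch that code to return to the original pc + 4
    ret_addr = scratch + 4 * (len(template) - 1)
    mem[ret_addr] = encode_branch(ret_addr, pc + 4)
    # 4) patch the original instruction to branch to the new code
    mem[pc] = encode_branch(pc, scratch)
```

Patching the original instruction last means the branch only becomes visible
once the emulation code it points to is complete.
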
Hypercall ABIs in KVM on PowerPC
================================

1) KVM hypercalls (ePAPR)

This is the ePAPR-compliant hypercall implementation (mentioned above). Even
generic hypercalls are implemented here, like the ePAPR idle hcall. These are
available on all targets.

2) PAPR hypercalls

PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in
QEMU). These are the same hypercalls that pHyp, the POWER hypervisor,
implements. Some of them are handled in the kernel, some are handled in user
space. This is only available on book3s_64.

3) OSI hypercalls

Mac-on-Linux is another user of KVM on PowerPC, which has had its own
hypercalls since long before KVM. These are supported to maintain
compatibility. All these hypercalls get forwarded to user space. This is only
useful on book3s_32, but can be used with book3s_64 as well.