cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

.. SPDX-License-Identifier: GPL-2.0

=================
KVM-specific MSRs
=================

:Author: Glauber Costa <glommer@redhat.com>, Red Hat Inc, 2010

KVM makes use of some custom MSRs to service some requests.

Custom MSRs have a range reserved for them, which goes from
0x4b564d00 to 0x4b564dff. There are MSRs outside this area,
but they are deprecated and their use is discouraged.

Custom MSR list
---------------

The currently supported custom MSRs are:

MSR_KVM_WALL_CLOCK_NEW:
	0x4b564d00

data:
	4-byte aligned physical address of a memory area which must be
	in guest RAM. This memory is expected to hold a copy of the following
	structure::

	 struct pvclock_wall_clock {
		u32   version;
		u32   sec;
		u32   nsec;
	  } __attribute__((__packed__));

	whose data will be filled in by the hypervisor. The hypervisor is only
	guaranteed to update this data at the moment of the MSR write.
	Users that want to reliably query this information more than once have
	to write to this MSR more than once. Fields have the following meanings:

	version:
		guest has to check version before and after grabbing
		time information and check that they are both equal and even.
		An odd version indicates an in-progress update.

	sec:
		 number of seconds for wallclock at time of boot.

	nsec:
		 number of nanoseconds for wallclock at time of boot.

	In order to get the current wallclock time, the system_time from
	MSR_KVM_SYSTEM_TIME_NEW needs to be added to this value.

	Note that although MSRs are per-CPU entities, the effect of this
	particular MSR is global.

	Availability of this MSR must be checked via bit 3 in 0x40000001 cpuid
	leaf prior to usage.
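
The version protocol above can be sketched in C. This is a minimal illustration under stated assumptions, not the kernel's implementation: it assumes a GCC or Clang compiler (for `__atomic_thread_fence`), assumes the guest has already written the structure's address to MSR_KVM_WALL_CLOCK_NEW, and takes `system_time` (from MSR_KVM_SYSTEM_TIME_NEW) as a parameter. The helper names `read_wall_clock()` and `wall_clock_now_ns()` are hypothetical.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

struct pvclock_wall_clock {
	u32 version;
	u32 sec;
	u32 nsec;
} __attribute__((__packed__));

/* Hypothetical helper: copy sec/nsec out of the shared structure, retrying
 * until the version is even and identical before and after the copy. */
static void read_wall_clock(const volatile struct pvclock_wall_clock *wc,
			    u32 *sec, u32 *nsec)
{
	u32 v1, v2;

	do {
		v1 = wc->version;
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		*sec = wc->sec;
		*nsec = wc->nsec;
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		v2 = wc->version;
	} while ((v1 & 1) || v1 != v2);	/* odd: update in progress */
}

/* Hypothetical helper: boot-time wallclock plus kvmclock system_time. */
static u64 wall_clock_now_ns(const volatile struct pvclock_wall_clock *wc,
			     u64 system_time_ns)
{
	u32 sec, nsec;

	read_wall_clock(wc, &sec, &nsec);
	return (u64)sec * 1000000000ull + nsec + system_time_ns;
}
```

Because the hypervisor only refreshes this structure on an MSR write, a caller that needs a fresh value would write the MSR again before calling `read_wall_clock()`.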

MSR_KVM_SYSTEM_TIME_NEW:
	0x4b564d01

data:
	4-byte aligned physical address of a memory area which must be in
	guest RAM, plus an enable bit in bit 0. This memory is expected to hold
	a copy of the following structure::

	  struct pvclock_vcpu_time_info {
		u32   version;
		u32   pad0;
		u64   tsc_timestamp;
		u64   system_time;
		u32   tsc_to_system_mul;
		s8    tsc_shift;
		u8    flags;
		u8    pad[2];
	  } __attribute__((__packed__)); /* 32 bytes */

	whose data will be filled in by the hypervisor periodically. Only one
	write, or registration, is needed for each VCPU. The interval between
	updates of this structure is arbitrary and implementation-dependent.
	The hypervisor may update this structure at any time it sees fit until
	anything with bit0 == 0 is written to it.

	Fields have the following meanings:

	version:
		guest has to check version before and after grabbing
		time information and check that they are both equal and even.
		An odd version indicates an in-progress update.

	tsc_timestamp:
		the tsc value at the current VCPU at the time
		of the update of this structure. Guests can subtract this value
		from the current tsc to derive a notion of elapsed time since the
		structure update.

	system_time:
		a host notion of monotonic time, including sleep
		time at the time this structure was last updated. Unit is
		nanoseconds.

	tsc_to_system_mul:
		multiplier to be used when converting
		tsc-related quantity to nanoseconds.

	tsc_shift:
		shift to be used when converting tsc-related
		quantity to nanoseconds. This shift will ensure that
		multiplication with tsc_to_system_mul does not overflow.
		A positive value denotes a left shift, a negative value
		a right shift.

		The conversion from tsc to nanoseconds involves an additional
		right shift by 32 bits. With this information, guests can
		derive per-CPU time by doing::

			time = (current_tsc - tsc_timestamp);
			if (tsc_shift >= 0)
				time <<= tsc_shift;
			else
				time >>= -tsc_shift;
			time = (time * tsc_to_system_mul) >> 32;
			time = time + system_time;

	flags:
		bits in this field indicate extended capabilities
		coordinated between the guest and the hypervisor. Availability
		of specific flags has to be checked in 0x40000001 cpuid leaf.
		Current flags are:

		+-----------+--------------+----------------------------------+
		| flag bit  | cpuid bit    | meaning                          |
		+-----------+--------------+----------------------------------+
		|           |              | time measures taken across       |
		|    0      |      24      | multiple cpus are guaranteed to  |
		|           |              | be monotonic                     |
		+-----------+--------------+----------------------------------+
		|           |              | guest vcpu has been paused by    |
		|    1      |     N/A      | the host                         |
		|           |              | See 4.70 in api.txt              |
		+-----------+--------------+----------------------------------+

	Availability of this MSR must be checked via bit 3 in 0x40000001 cpuid
	leaf prior to usage.
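
The conversion and version check described above can be sketched in C. This is a minimal illustration, not the kernel's implementation: it assumes a GCC or Clang compiler (for `unsigned __int128` and `__atomic_thread_fence`), takes the current TSC as a parameter instead of executing RDTSC, and the function names `scale_tsc_delta()` and `pvclock_read_ns()` are hypothetical. The product is widened to 128 bits as a precaution, since a 64-bit value times a 32-bit multiplier does not fit in 64 bits in general.

```c
#include <stdint.h>

typedef uint8_t u8;
typedef int8_t s8;
typedef uint32_t u32;
typedef uint64_t u64;

struct pvclock_vcpu_time_info {
	u32 version;
	u32 pad0;
	u64 tsc_timestamp;
	u64 system_time;
	u32 tsc_to_system_mul;
	s8 tsc_shift;
	u8 flags;
	u8 pad[2];
} __attribute__((__packed__));

/* Apply tsc_shift, then multiply by tsc_to_system_mul and drop the low
 * 32 bits (32.32 fixed point). The product is formed in 128 bits. */
static u64 scale_tsc_delta(u64 delta, u32 mul, s8 shift)
{
	if (shift >= 0)
		delta <<= shift;
	else
		delta >>= -shift;
	return (u64)(((unsigned __int128)delta * mul) >> 32);
}

/* Compute system time in ns from a consistent snapshot of the structure.
 * 'current_tsc' would normally come from RDTSC. */
static u64 pvclock_read_ns(const volatile struct pvclock_vcpu_time_info *ti,
			   u64 current_tsc)
{
	u32 v1, v2;
	u64 ns;

	do {
		v1 = ti->version;
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		ns = ti->system_time +
		     scale_tsc_delta(current_tsc - ti->tsc_timestamp,
				     ti->tsc_to_system_mul, ti->tsc_shift);
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		v2 = ti->version;
	} while ((v1 & 1) || v1 != v2);	/* retry while an update is in progress */
	return ns;
}
```

For example, with a multiplier of 0x80000000 (0.5 in 32.32 fixed point) and a shift of 0, a TSC delta of 1000 cycles scales to 500 ns.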

MSR_KVM_WALL_CLOCK:
	0x11

data and functioning:
	same as MSR_KVM_WALL_CLOCK_NEW. Use that instead.

	This MSR falls outside the reserved KVM range and may be removed in the
	future. Its usage is deprecated.

	Availability of this MSR must be checked via bit 0 in 0x40000001 cpuid
	leaf prior to usage.

MSR_KVM_SYSTEM_TIME:
	0x12

data and functioning:
	same as MSR_KVM_SYSTEM_TIME_NEW. Use that instead.

	This MSR falls outside the reserved KVM range and may be removed in the
	future. Its usage is deprecated.

	Availability of this MSR must be checked via bit 0 in 0x40000001 cpuid
	leaf prior to usage.

	The suggested algorithm for detecting kvmclock presence is then::

		if (!kvm_para_available())    /* refer to cpuid.txt */
			return NON_PRESENT;

		flags = cpuid_eax(0x40000001);
		if (flags & (1 << KVM_FEATURE_CLOCKSOURCE2)) {      /* bit 3 */
			msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
			msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
			return PRESENT;
		} else if (flags & (1 << KVM_FEATURE_CLOCKSOURCE)) { /* bit 0 */
			msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
			msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
			return PRESENT;
		} else
			return NON_PRESENT;

MSR_KVM_ASYNC_PF_EN:
	0x4b564d02

data:
	Asynchronous page fault (APF) control MSR.

	Bits 63-6 hold the 64-byte aligned physical address of a 64 byte memory
	area which must be in guest RAM and must be zeroed. This memory is
	expected to hold a copy of the following structure::

	  struct kvm_vcpu_pv_apf_data {
		/* Used for 'page not present' events delivered via #PF */
		__u32 flags;

		/* Used for 'page ready' events delivered via interrupt notification */
		__u32 token;

		__u8 pad[56];
		__u32 enabled;
	  };

	Bits 5-4 of the MSR are reserved and should be zero. Bit 0 is set to 1
	when asynchronous page faults are enabled on the vcpu, 0 when disabled.
	Bit 1 is 1 if asynchronous page faults can be injected when the vcpu is
	in cpl == 0. Bit 2 is 1 if asynchronous page faults are delivered to L1
	as #PF vmexits.  Bit 2 can be set only if KVM_FEATURE_ASYNC_PF_VMEXIT is
	present in CPUID. Bit 3 enables interrupt based delivery of 'page ready'
	events. Bit 3 can only be set if KVM_FEATURE_ASYNC_PF_INT is present in
	CPUID.

	'Page not present' events are currently always delivered as a synthetic
	#PF exception. During delivery of these events, CR2 contains a token
	that will be used to notify the guest when the missing page becomes
	available. Also, to make it possible to distinguish between a real #PF
	and an APF, the first 4 bytes of the 64 byte memory location ('flags')
	will be written to by the hypervisor at the time of injection. Only the
	first bit of 'flags' is currently supported; when set, it indicates that
	the guest is dealing with an asynchronous 'page not present' event. If
	'flags' is '0' during a page fault, it is a regular page fault. The
	guest is supposed to clear 'flags' when it is done handling the #PF
	exception so the next event can be delivered.

	Note, since APF 'page not present' events use the same exception vector
	as a regular page fault, the guest must reset 'flags' to '0' before it
	does something that can generate a normal page fault.

	Bytes 4-7 of the 64 byte memory location ('token') will be written to by
	the hypervisor at the time of APF 'page ready' event injection. The
	content of these bytes is a token which was previously delivered as a
	'page not present' event. The event indicates the page is now available.
	The guest is supposed to write '0' to 'token' when it is done handling
	the 'page ready' event and to write '1' to MSR_KVM_ASYNC_PF_ACK after
	clearing the location; writing to the MSR forces KVM to re-scan its
	queue and deliver the next pending notification.

	Note, MSR_KVM_ASYNC_PF_INT, which specifies the interrupt vector for
	'page ready' APF delivery, needs to be written to before enabling the
	APF mechanism in MSR_KVM_ASYNC_PF_EN, or interrupt #0 can get injected.
	The MSR is available if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.

	Note, previously, 'page ready' events were delivered via the same #PF
	exception as 'page not present' events but this is now deprecated. If
	bit 3 (interrupt based delivery) is not set, APF events are not delivered.

	If APF is disabled while there are outstanding APFs, they will
	not be delivered.

	Currently 'page ready' APF events will always be delivered on the
	same vcpu as the 'page not present' event was, but the guest should not
	rely on that.
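
The bit layout of MSR_KVM_ASYNC_PF_EN can be illustrated with a small C helper that composes the value a guest would write. This is a sketch under stated assumptions: the macro names and `apf_en_value()` are hypothetical (chosen for this example rather than taken from a kernel header), and the WRMSR itself is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR_KVM_ASYNC_PF_EN control bits as listed above (hypothetical names). */
#define APF_EN_ENABLED		(1ull << 0)	/* APF enabled on this vcpu */
#define APF_EN_SEND_ALWAYS	(1ull << 1)	/* may inject when cpl == 0 */
#define APF_EN_PF_VMEXIT	(1ull << 2)	/* needs KVM_FEATURE_ASYNC_PF_VMEXIT */
#define APF_EN_DELIVERY_INT	(1ull << 3)	/* needs KVM_FEATURE_ASYNC_PF_INT */

/* Compose the MSR value from a guest physical address and control bits.
 * Returns false if the address is not 64-byte aligned (bits 63-6 hold the
 * address) or if any bit other than 3-0 is requested, which also rejects
 * the reserved bits 5-4. */
static bool apf_en_value(uint64_t gpa, uint64_t ctrl, uint64_t *val)
{
	if (gpa & 63)
		return false;
	if (ctrl & ~0xfull)
		return false;
	*val = gpa | ctrl;
	return true;
}
```

Following the notes above, a guest would first write its chosen vector (bits 7-0) to MSR_KVM_ASYNC_PF_INT, and only then write a value such as `gpa | APF_EN_ENABLED | APF_EN_DELIVERY_INT` to MSR_KVM_ASYNC_PF_EN; without bit 3, no APF events are delivered.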

MSR_KVM_STEAL_TIME:
	0x4b564d03

data:
	64-byte aligned physical address of a memory area which must be
	in guest RAM, plus an enable bit in bit 0. This memory is expected to
	hold a copy of the following structure::

	  struct kvm_steal_time {
		__u64 steal;
		__u32 version;
		__u32 flags;
		__u8  preempted;
		__u8  u8_pad[3];
		__u32 pad[11];
	  };

	whose data will be filled in by the hypervisor periodically. Only one
	write, or registration, is needed for each VCPU. The interval between
	updates of this structure is arbitrary and implementation-dependent.
	The hypervisor may update this structure at any time it sees fit until
	anything with bit0 == 0 is written to it. The guest is required to make
	sure this structure is initialized to zero.

	Fields have the following meanings:

	version:
		a sequence counter. In other words, guest has to check
		this field before and after grabbing time information and make
		sure they are both equal and even. An odd version indicates an
		in-progress update.

	flags:
		At this point, always zero. May be used to indicate
		changes in this structure in the future.

	steal:
		the amount of time in which this vCPU did not run, in
		nanoseconds. Time during which the vcpu is idle will not be
		reported as steal time.

	preempted:
		indicates whether the vCPU that owns this struct is running or
		not. Non-zero values mean the vCPU has been preempted. Zero
		means the vCPU is not preempted. NOTE, it is always zero if the
		hypervisor doesn't support this field.
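
A sketch of the guest side of MSR_KVM_STEAL_TIME in C follows: composing the MSR value and reading `steal` under the version sequence counter. It assumes a GCC or Clang compiler (for `__atomic_thread_fence`) and a C11 `_Static_assert`; `steal_time_msr_value()` and `read_steal_ns()` are hypothetical names, and the WRMSR is left out.

```c
#include <stdbool.h>
#include <stdint.h>

struct kvm_steal_time {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint8_t  preempted;
	uint8_t  u8_pad[3];
	uint32_t pad[11];
};

/* The layout above fills exactly one 64-byte area. */
_Static_assert(sizeof(struct kvm_steal_time) == 64, "64-byte area");

/* MSR value: 64-byte aligned guest physical address plus enable bit 0. */
static bool steal_time_msr_value(uint64_t gpa, uint64_t *val)
{
	if (gpa & 63)
		return false;
	*val = gpa | 1;
	return true;
}

/* Read 'steal' consistently, retrying while the version is odd or changes. */
static uint64_t read_steal_ns(const volatile struct kvm_steal_time *st)
{
	uint32_t v1, v2;
	uint64_t steal;

	do {
		v1 = st->version;
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		steal = st->steal;
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		v2 = st->version;
	} while ((v1 & 1) || v1 != v2);
	return steal;
}
```

Steal time accumulated over an interval is then the difference between two `read_steal_ns()` results.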

MSR_KVM_EOI_EN:
	0x4b564d04

data:
	Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
	when disabled.  Bit 1 is reserved and must be zero.  When PV end of
	interrupt is enabled (bit 0 set), bits 63-2 hold a 4-byte aligned
	physical address of a 4 byte memory area which must be in guest RAM and
	must be zeroed.

	The first, least significant bit of the 4 byte memory location will be
	written to by the hypervisor, typically at the time of interrupt
	injection.  A value of 1 means that the guest can skip writing EOI to
	the APIC (using MSR or MMIO write); instead, it is sufficient to signal
	EOI by clearing the bit in guest memory, as this location will
	later be polled by the hypervisor.
	A value of 0 means that the EOI write is required.

	It is always safe for the guest to ignore the optimization and perform
	the APIC EOI write anyway.

	The hypervisor is guaranteed to modify this least significant bit only
	while in the current VCPU context; this means that the guest does not
	need to use either a lock prefix or memory ordering primitives to
	synchronise with the hypervisor.

	However, the hypervisor can set and clear this bit at any time. To make
	sure the hypervisor does not interrupt the guest and clear the bit in
	the window between the guest testing it (to detect whether it can skip
	the APIC EOI write) and the guest clearing it (to signal EOI to the
	hypervisor), the guest must both read and clear the least significant
	bit in the memory area using a single CPU instruction, such as test and
	clear, or compare and exchange.
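
The single-instruction read-and-clear requirement can be illustrated with the compare-and-exchange option mentioned above. This sketch assumes GCC or Clang `__atomic` builtins; the function name `pv_eoi_test_and_clear()` is hypothetical, and a guest would fall back to the regular APIC EOI write whenever it returns false. On x86 the builtin typically emits a locked `cmpxchg`; since the text above says no lock prefix is needed, this is a conservative choice rather than the only correct one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the guest may skip the APIC EOI write. The final test of
 * bit 0 and its clearing happen in one compare-and-exchange instruction,
 * so the hypervisor cannot clear the bit in between. */
static bool pv_eoi_test_and_clear(uint32_t *pv_eoi_word)
{
	uint32_t old = __atomic_load_n(pv_eoi_word, __ATOMIC_RELAXED);

	if (!(old & 1))
		return false;	/* bit clear: a normal EOI write is required */

	/* Fails, and returns false, if the word changed after it was read,
	 * e.g. because the hypervisor cleared the bit in the meantime. */
	return __atomic_compare_exchange_n(pv_eoi_word, &old, old & ~1u, false,
					   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```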

MSR_KVM_POLL_CONTROL:
	0x4b564d05

	Control host-side polling.

data:
	Bit 0 enables (1) or disables (0) host-side HLT polling logic.

	KVM guests can request the host not to poll on HLT, for example if
	they are performing polling themselves.

MSR_KVM_ASYNC_PF_INT:
	0x4b564d06

data:
	Second asynchronous page fault (APF) control MSR.

	Bits 0-7: APIC vector for delivery of 'page ready' APF events.
	Bits 8-63: Reserved

	Interrupt vector for asynchronous 'page ready' notification delivery.
	The vector has to be set up before the asynchronous page fault mechanism
	is enabled in MSR_KVM_ASYNC_PF_EN.  The MSR is only available if
	KVM_FEATURE_ASYNC_PF_INT is present in CPUID.

MSR_KVM_ASYNC_PF_ACK:
	0x4b564d07

data:
	Asynchronous page fault (APF) acknowledgment.

	When the guest is done processing a 'page ready' APF event and has
	cleared the 'token' field in 'struct kvm_vcpu_pv_apf_data', it is
	supposed to write '1' to bit 0 of the MSR; this causes the host to
	re-scan its queue and check if there are more notifications pending.
	The MSR is available if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
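
The 'page ready' sequence (read the token, clear it, then acknowledge) can be sketched in C. To keep the example runnable outside a guest, the MSR write is passed in as a function pointer; `apf_page_ready()`, `wrmsr_fn` and the `record_wrmsr()` stand-in are hypothetical names, and a real guest would execute WRMSR instead.

```c
#include <stdint.h>

struct kvm_vcpu_pv_apf_data {
	uint32_t flags;
	uint32_t token;
	uint8_t  pad[56];
	uint32_t enabled;
};

#define MSR_KVM_ASYNC_PF_ACK 0x4b564d07

/* Hypothetical MSR writer; in a guest this would wrap WRMSR. */
typedef void (*wrmsr_fn)(uint32_t msr, uint64_t val);

/* Handle one 'page ready' notification: fetch the token, clear the field,
 * then write '1' to MSR_KVM_ASYNC_PF_ACK so the host re-scans its queue.
 * Returns the token from the earlier 'page not present' event. */
static uint32_t apf_page_ready(volatile struct kvm_vcpu_pv_apf_data *apf,
			       wrmsr_fn wrmsr)
{
	uint32_t token = apf->token;

	apf->token = 0;			/* clear before acknowledging */
	wrmsr(MSR_KVM_ASYNC_PF_ACK, 1);	/* bit 0 = 1 */
	return token;
}

/* Stand-in for WRMSR that only records its arguments, for testing. */
static uint32_t last_msr;
static uint64_t last_val;

static void record_wrmsr(uint32_t msr, uint64_t val)
{
	last_msr = msr;
	last_val = val;
}
```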

MSR_KVM_MIGRATION_CONTROL:
        0x4b564d08

data:
        This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
        CPUID.  Bit 0 represents whether live migration of the guest is allowed.

        When a guest is started, bit 0 will be 0 if the guest has encrypted
        memory and 1 if the guest does not have encrypted memory.  If the
        guest is communicating page encryption status to the host using the
        ``KVM_HC_MAP_GPA_RANGE`` hypercall, it can set bit 0 in this MSR to
        allow live migration of the guest.