summaryrefslogtreecommitdiffstats
path: root/arch/x86/include/asm
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'sev-snp-iommu-avic_5.19-rc6_v4' of github.com:AMDESE/linuxLouis Burda2023-01-312-2/+0
|\
| * Revert "KVM: SEV: add cache flush to solve SEV cache incoherency issues"Ashish Kalra2022-09-282-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 683412ccf61294d727ead4a73d97397396e69a6b. Need to revert this commit to fix soft-lockup and RCU stall issues on both SNP host and guest. The wbinvd_on_all_cpus() invoked from the MMU invalidation notifiers as part of this patch adds a lot of additional overhead and latencies on SNP host kernel especially with large physical count cpus during NUMA autobalancing and host RMP page fault handling. Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
* | Minimize diff to 0aaa1e5 and small restructureLouis Burda2023-01-111-3/+8
| |
* | Refactor more code out into repo filesLouis Burda2022-10-201-1/+1
| |
* | Move kvm_page_track_mode to sevstep uapiLouis Burda2022-10-051-8/+1
| |
* | Refactor out sevstep into cachepc repositoryLouis Burda2022-10-052-250/+0
| |
* | Add page trackingLouis Burda2022-10-042-0/+256
|/
* KVM: x86: Tag kvm_mmu_x86_module_init() with __initSean Christopherson2022-08-051-1/+1
| | | | | | | | | | | | | Mark kvm_mmu_x86_module_init() with __init, the entire reason it exists is to initialize variables when kvm.ko is loaded, i.e. it must never be called after module initialization. Fixes: 1d0e84806047 ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded") Cc: stable@vger.kernel.org Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michael Roth <michael.roth@amd.com>
* KVM: SVM: Do not activate AVIC for SNP-enabled systemSuravee Suthikulpanit2022-07-131-0/+5
| | | | | | | Since current AVIC implementation cannot support AMD Secure Nested Paging (SNP), inhibit AVIC for SNP-enabled system. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
* KVM: SVM: Support SEV-SNP AP Creation NAE eventTom Lendacky2022-07-133-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP guests to alter the register state of the APs on their own. This allows the guest a way of simulating INIT-SIPI. A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used so as to avoid updating the VMSA pointer while the vCPU is running. For CREATE The guest supplies the GPA of the VMSA to be used for the vCPU with the specified APIC ID. The GPA is saved in the svm struct of the target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to the vCPU and then the vCPU is kicked. For CREATE_ON_INIT: The guest supplies the GPA of the VMSA to be used for the vCPU with the specified APIC ID the next time an INIT is performed. The GPA is saved in the svm struct of the target vCPU. For DESTROY: The guest indicates it wishes to stop the vCPU. The GPA is cleared from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to vCPU and then the vCPU is kicked. The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked as a result of the event or as a result of an INIT. The handler sets the vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will leave the vCPU as not runnable. Any previous VMSA pages that were installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If a new VMSA is to be installed, the VMSA guest page is pinned and set as the VMSA in the vCPU VMCB and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is cleared in the vCPU VMCB and the vCPU state is left as KVM_MP_STATE_UNINITIALIZED to prevent it from being run. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: x86: Export the kvm_zap_gfn_range() for the SNP useBrijesh Singh2022-07-131-0/+2
| | | | | | | | | | | | | While resolving the RMP page fault, we may run into cases where the page level between the RMP entry and TDP does not match and the 2M RMP entry must be split into 4K RMP entries. Or a 2M TDP page need to be broken into multiple of 4K pages. To keep the RMP and TDP page level in sync, we will zap the gfn range after splitting the pages in the RMP entry. The zap should force the TDP to gets rebuilt with the new page level. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: SVM: Introduce ops for the post gfn map and unmapBrijesh Singh2022-07-132-0/+4
| | | | | | | | | | | | | | | | | | | | | When SEV-SNP is enabled in the guest VM, the guest memory pages can either be a private or shared. A write from the hypervisor goes through the RMP checks. If hardware sees that hypervisor is attempting to write to a guest private page, then it triggers an RMP violation #PF. To avoid the RMP violation with GHCB pages, added new post_{map,unmap}_gfn functions to verify if its safe to map GHCB pages. Uses a spinlock to protect against the page state change for existing mapped pages. Need to add generic post_{map,unmap}_gfn() ops that can be used to verify that its safe to map a given guest page in the hypervisor. This patch will need to be revisited later after consensus is reached on how to manage guest private memory as probably UPM private memslots will be able to handle this page state change more gracefully. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off by: Ashish Kalra <ashish.kalra@amd.com>
* KVM: SVM: Add support to handle Page State Change VMGEXITBrijesh Singh2022-07-131-0/+7
| | | | | | | | SEV-SNP VMs can ask the hypervisor to change the page state in the RMP table to be private or shared using the Page State Change NAE event as defined in the GHCB specification version 2. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: SVM: Add support to handle MSR based Page State Change VMGEXITBrijesh Singh2022-07-131-0/+9
| | | | | | | | | | | | | | SEV-SNP VMs can ask the hypervisor to change the page state in the RMP table to be private or shared using the Page State Change MSR protocol as defined in the GHCB specification. Before changing the page state in the RMP entry, lookup the page in the NPT to make sure that there is a valid mapping for it. If the mapping exist then try to find a workable page level between the NPT and RMP for the page. If the page is not mapped in the NPT, then create a fault such that it gets mapped before we change the page state in the RMP entry. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: SVM: Add support to handle GHCB GPA register VMGEXITBrijesh Singh2022-07-131-0/+8
| | | | | | | | | | | | SEV-SNP guests are required to perform a GHCB GPA registration. Before using a GHCB GPA for a vCPU the first time, a guest must register the vCPU GHCB GPA. If hypervisor can work with the guest requested GPA then it must respond back with the same GPA otherwise return -1. On VMEXIT, Verify that GHCB GPA matches with the registered value. If a mismatch is detected then abort the guest. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: x86: Define RMP page fault error bits for #NPFBrijesh Singh2022-07-131-0/+8
| | | | | | | | | | | When SEV-SNP is enabled globally, the hardware places restrictions on all memory accesses based on the RMP entry, whether the hypervisor or a VM, performs the accesses. When hardware encounters an RMP access violation during a guest access, it will cause a #VMEXIT(NPF). See APM2 section 16.36.10 for more details. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: X86: Keep the NPT and RMP page level in syncBrijesh Singh2022-07-132-0/+2
| | | | | | | | | | When running an SEV-SNP VM, the sPA used to index the RMP entry is obtained through the NPT translation (gva->gpa->spa). The NPT page level is checked against the page level programmed in the RMP entry. If the page level does not match, then it will cause a nested page fault with the RMP bit set to indicate the RMP violation. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: SVM: Add KVM_SNP_INIT commandBrijesh Singh2022-07-131-0/+1
| | | | | | | | | | | | | | The KVM_SNP_INIT command is used by the hypervisor to initialize the SEV-SNP platform context. In a typical workflow, this command should be the first command issued. When creating SEV-SNP guest, the VMM must use this command instead of the KVM_SEV_INIT or KVM_SEV_ES_INIT. The flags value must be zero, it will be extended in future SNP support to communicate the optional features (such as restricted INT injection etc). Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com> Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safeBrijesh Singh2022-07-132-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Implement a workaround for an SNP erratum where the CPU will incorrectly signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the RMP entry of a VMCB, VMSA or AVIC backing page. When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC backing pages as "in-use" in the RMP after a successful VMRUN. This is done for _all_ VMs, not just SNP-Active VMs. If the hypervisor accesses an in-use page through a writable translation, the CPU will throw an RMP violation #PF. On early SNP hardware, if an in-use page is 2mb aligned and software accesses any part of the associated 2mb region with a hupage, the CPU will incorrectly treat the entire 2mb region as in-use and signal a spurious RMP violation #PF. The recommended is to not use the hugepage for the VMCB, VMSA or AVIC backing page. Add a generic allocator that will ensure that the page returns is not hugepage (2mb or 1gb) and is safe to be used when SEV-SNP is enabled. Co-developed-by: Marc Orr <marcorr@google.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: SVM: Provide the Hypervisor Feature support VMGEXITBrijesh Singh2022-07-131-0/+2
| | | | | | | | | | Version 2 of the GHCB specification introduced advertisement of features that are supported by the Hypervisor. Now that KVM supports version 2 of the GHCB specification, bump the maximum supported protocol version. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* KVM: SVM: Add support to handle AP reset MSR protocolTom Lendacky2022-07-131-0/+2
| | | | | | | | Add support for AP Reset Hold being invoked using the GHCB MSR protocol, available in version 2 of the GHCB specification. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* x86/fault: Add support to dump RMP entry on faultBrijesh Singh2022-07-131-0/+5
| | | | | | | | | When SEV-SNP is enabled globally, a write from the host goes through the RMP check. If the hardware encounters the check failure, then it raises the #PF (with RMP set). Dump the RMP entry at the faulting pfn to help the debug. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* x86/traps: Define RMP violation #PF error codeBrijesh Singh2022-07-131-7/+11
| | | | | | | | | Bit 31 in the page fault-error bit will be set when processor encounters an RMP violation. While at it, use the BIT_ULL() macro. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* x86/sev: Add helper functions for RMPUPDATE and PSMASH instructionBrijesh Singh2022-07-131-0/+11
| | | | | | | | | | | | The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The hypervisor will use the instruction to add pages to the RMP table. See APM3 for details on the instruction operations. The PSMASH instruction expands a 2MB RMP entry into a corresponding set of contiguous 4KB-Page RMP entries. The hypervisor will use this instruction to adjust the RMP entry without invalidating the previous RMP entry. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* x86/sev: Add RMP entry lookup helpersBrijesh Singh2022-07-131-0/+27
| | | | | | | | The snp_lookup_page_in_rmptable() can be used by the host to read the RMP entry for a given page. The RMP entry format is documented in AMD PPR, see https://bugzilla.kernel.org/attachment.cgi?id=296015. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* x86/sev: set SYSCFG.MFMDBrijesh Singh2022-07-131-0/+3
| | | | | | | | | SEV-SNP FW >= 1.51 requires that SYSCFG.MFMD must be set. Subsequent CCP patches while require 1.51 as the minimum SEV-SNP firmware version. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* x86/sev: Add the host SEV-SNP initialization supportBrijesh Singh2022-07-132-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The memory integrity guarantees of SEV-SNP are enforced through a new structure called the Reverse Map Table (RMP). The RMP is a single data structure shared across the system that contains one entry for every 4K page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to track the owner of each page of memory. Pages of memory can be owned by the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2 section 15.36.3 for more detail on RMP. The RMP table is used to enforce access control to memory. The table itself is not directly writable by the software. New CPU instructions (RMPUPDATE, PVALIDATE, RMPADJUST) are used to manipulate the RMP entries. Based on the platform configuration, the BIOS reserves the memory used for the RMP table. The start and end address of the RMP table must be queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and RMP_END are not set then disable the SEV-SNP feature. The SEV-SNP feature is enabled only after the RMP table is successfully initialized. RMP table entry format is non-architectural and it can vary by processor and is defined by the PPR. Restrict SNP support on the known CPU model and family for which the RMP table entry format is currently defined for. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-b: Ashish Kalra <ashish.kalra@amd.com>
* x86/cpufeatures: Add SEV-SNP CPU featureBrijesh Singh2022-07-131-0/+1
| | | | | | | | | Add CPU feature detection for Secure Encrypted Virtualization with Secure Nested Paging. This feature adds a strong memory integrity protection to help prevent malicious hypervisor-based attacks like data replay, memory re-mapping, and more. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
* x86/xen: Use clear_bss() for Xen PV guestsJuergen Gross2022-07-011-0/+3
| | | | | | | | | | | | | | | Instead of clearing the bss area in assembly code, use the clear_bss() function. This requires to pass the start_info address as parameter to xen_start_kernel() in order to avoid the xen_start_info being zeroed again. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220630071441.28576-2-jgross@suse.com
* Merge tag 'efi-urgent-for-v5.19-1' of ↵Linus Torvalds2022-06-211-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - remove pointless include of asm/efi.h, which does not exist on ia64 - fix DXE service marshalling prototype for mixed mode * tag 'efi-urgent-for-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/x86: libstub: Fix typo in __efi64_argmap* name efi: sysfb_efi: remove unnecessary <asm/efi.h> include
| * efi/x86: libstub: Fix typo in __efi64_argmap* nameEvgeniy Baskov2022-06-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | The actual name of the DXE services function used is set_memory_space_attributes(), not set_memory_space_descriptor(). Change EFI mixed mode helper macro name to match the function name. Fixes: 31f1a0edff78 ("efi/x86: libstub: Make DXE calls mixed mode safe") Signed-off-by: Evgeniy Baskov <baskov@ispras.ru> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
* | Merge tag 'x86-urgent-2022-06-19' of ↵Linus Torvalds2022-06-191-17/+21
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Make RESERVE_BRK() work again with older binutils. The recent 'simplification' broke that. - Make early #VE handling increment RIP when successful. - Make the #VE code consistent vs. the RIP adjustments and add comments. - Handle load_unaligned_zeropad() across page boundaries correctly in #VE when the second page is shared. * tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page x86/tdx: Clarify RIP adjustments in #VE handler x86/tdx: Fix early #VE handling x86/mm: Fix RESERVE_BRK() for older binutils
| * | x86/mm: Fix RESERVE_BRK() for older binutilsJosh Poimboeuf2022-06-131-17/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With binutils 2.26, RESERVE_BRK() causes a build failure: /tmp/ccnGOKZ5.s: Assembler messages: /tmp/ccnGOKZ5.s:98: Error: missing ')' /tmp/ccnGOKZ5.s:98: Error: missing ')' /tmp/ccnGOKZ5.s:98: Error: missing ')' /tmp/ccnGOKZ5.s:98: Error: junk at end of line, first unrecognized character is `U' The problem is this line: RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE) Specifically, the INIT_PGT_BUF_SIZE macro which (via PAGE_SIZE's use _AC()) has a "1UL", which makes older versions of the assembler unhappy. Unfortunately the _AC() macro doesn't work for inline asm. Inline asm was only needed here to convince the toolchain to add the STT_NOBITS flag. However, if a C variable is placed in a section whose name is prefixed with ".bss", GCC and Clang automatically set STT_NOBITS. In fact, ".bss..page_aligned" already relies on this trick. So fix the build failure (and simplify the macro) by allocating the variable in C. Also, add NOLOAD to the ".brk" output section clause in the linker script. This is a failsafe in case the ".bss" prefix magic trick ever stops working somehow. If there's a section type mismatch, the GNU linker will force the ".brk" output section to be STT_NOBITS. The LLVM linker will fail with a "section type mismatch" error. Note this also changes the name of the variable from .brk.##name to __brk_##name. The variable names aren't actually used anywhere, so it's harmless. Fixes: a1e2c031ec39 ("x86/mm: Simplify RESERVE_BRK()") Reported-by: Joe Damato <jdamato@fastly.com> Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Joe Damato <jdamato@fastly.com> Link: https://lore.kernel.org/r/22d07a44c80d8e8e1e82b9a806ddc8c6bbb2606e.1654759036.git.jpoimboe@kernel.org
* | | Merge tag 'pci-v5.19-fixes-2' of ↵Linus Torvalds2022-06-172-5/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fix from Bjorn Helgaas: "Revert clipping of PCI host bridge windows to avoid E820 regions, which broke several machines by forcing unnecessary BAR reassignments (Hans de Goede)" * tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
| * | | x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"Hans de Goede2022-06-172-5/+8
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 4c5e242d3e93. Prior to 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions"), E820 regions did not affect PCI host bridge windows. We only looked at E820 regions and avoided them when allocating new MMIO space. If firmware PCI bridge window and BAR assignments used E820 regions, we left them alone. After 4c5e242d3e93, we removed E820 regions from the PCI host bridge windows before looking at BARs, so firmware assignments in E820 regions looked like errors, and we moved things around to fit in the space left (if any) after removing the E820 regions. This unnecessary BAR reassignment broke several machines. Guilherme reported that Steam Deck fails to boot after 4c5e242d3e93. We clipped the window that contained most 32-bit BARs: BIOS-e820: [mem 0x00000000a0000000-0x00000000a00fffff] reserved acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] for e820 entry [mem 0xa0000000-0xa00fffff] which forced us to reassign all those BARs, for example, this NVMe BAR: pci 0000:00:01.2: PCI bridge to [bus 01] pci 0000:00:01.2: bridge window [mem 0x80600000-0x806fffff] pci 0000:01:00.0: BAR 0: [mem 0x80600000-0x80603fff 64bit] pci 0000:00:01.2: can't claim window [mem 0x80600000-0x806fffff]: no compatible bridge window pci 0000:01:00.0: can't claim BAR 0 [mem 0x80600000-0x80603fff 64bit]: no compatible bridge window pci 0000:00:01.2: bridge window: assigned [mem 0xa0100000-0xa01fffff] pci 0000:01:00.0: BAR 0: assigned [mem 0xa0100000-0xa0103fff 64bit] All the reassignments were successful, so the devices should have been functional at the new addresses, but some were not. Andy reported a similar failure on an Intel MID platform. Benjamin reported a similar failure on a VMWare Fusion VM. Note: this is not a clean revert; this revert keeps the later change to make the clipping dependent on a new pci_use_e820 bool, moving the checking of this bool to arch_remove_reservations(). [bhelgaas: commit log, add more reporters and testers] BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109 Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reported-by: Benjamin Coddington <bcodding@redhat.com> Reported-by: Jongman Heo <jongman.heo@gmail.com> Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions") Link: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | | Merge tag 'hyperv-fixes-signed-20220617' of ↵Linus Torvalds2022-06-171-0/+4
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix hv_init_clocksource annotation (Masahiro Yamada) - Two bug fixes for vmbus driver (Saurabh Sengar) - Fix SEV negotiation (Tianyu Lan) - Fix comments in code (Xiang Wang) - One minor fix to HID driver (Michael Kelley) * tag 'hyperv-fixes-signed-20220617' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM Drivers: hv: vmbus: Release cpu lock in error case HID: hyperv: Correctly access fields declared as __le16 clocksource: hyper-v: unexport __init-annotated hv_init_clocksource() Drivers: hv: Fix syntax errors in comments Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
| * | | x86/Hyper-V: Add SEV negotiate protocol support in Isolation VMTianyu Lan2022-06-151-0/+4
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hyper-V Isolation VM current code uses sev_es_ghcb_hv_call() to read/write MSR via GHCB page and depends on the sev code. This may cause regression when sev code changes interface design. The latest SEV-ES code requires to negotiate GHCB version before reading/writing MSR via GHCB page and sev_es_ghcb_hv_call() doesn't work for Hyper-V Isolation VM. Add Hyper-V ghcb related implementation to decouple SEV and Hyper-V code. Negotiate GHCB version in the hyperv_init() and use the version to communicate with Hyper-V in the ghcb hv call function. Fixes: 2ea29c5abbc2 ("x86/sev: Save the negotiated GHCB version") Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
* | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2022-06-141-2/+65
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Paolo Bonzini: "While last week's pull request contained miscellaneous fixes for x86, this one covers other architectures, selftests changes, and a bigger series for APIC virtualization bugs that were discovered during 5.20 development. The idea is to base 5.20 development for KVM on top of this tag. ARM64: - Properly reset the SVE/SME flags on vcpu load - Fix a vgic-v2 regression regarding accessing the pending state of a HW interrupt from userspace (and make the code common with vgic-v3) - Fix access to the idreg range for protected guests - Ignore 'kvm-arm.mode=protected' when using VHE - Return an error from kvm_arch_init_vm() on allocation failure - A bunch of small cleanups (comments, annotations, indentation) RISC-V: - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry x86-64: - Fix error in page tables with MKTME enabled - Dirty page tracking performance test extended to running a nested guest - Disable APICv/AVIC in cases that it cannot implement correctly" [ This merge also fixes a misplaced end parenthesis bug introduced in commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") pointed out by Sean Christopherson ] Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/ * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits) KVM: selftests: Restrict test region to 48-bit physical addresses when using nested KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 KVM: selftests: Clean up LIBKVM files in Makefile KVM: selftests: Link selftests directly with lib object files KVM: selftests: Drop unnecessary rule for STATIC_LIBS KVM: selftests: Add a helper to check EPT/VPID capabilities KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h KVM: selftests: Refactor nested_map() to specify target level KVM: selftests: Drop stale function parameter comment for nested_map() KVM: selftests: Add option to create 2M and 1G EPT mappings KVM: selftests: Replace x86_page_size with PG_LEVEL_XX KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking KVM: x86: disable preemption while updating apicv inhibition KVM: x86: SVM: fix avic_kick_target_vcpus_fast KVM: x86: SVM: remove avic's broken code that updated APIC ID KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base KVM: x86: document AVIC/APICv inhibit reasons KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs ...
| * | | KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC baseMaxim Levitsky2022-06-091-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Neither of these settings should be changed by the guest and it is a burden to support it in the acceleration code, so just inhibit this code instead. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: document AVIC/APICv inhibit reasonsMaxim Levitsky2022-06-091-2/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These days there are too many AVIC/APICv inhibit reasons, and it doesn't hurt to have some documentation for them. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | Merge tag 'kvm-riscv-fixes-5.19-1' of https://github.com/kvm-riscv/linux ↵Paolo Bonzini2022-06-0982-595/+1079
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into HEAD KVM/riscv fixes for 5.19, take #1 - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry
* | | | Merge tag 'x86-bugs-2022-06-01' of ↵Linus Torvalds2022-06-143-0/+28
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MMIO stale data fixes from Thomas Gleixner: "Yet another hw vulnerability with a software mitigation: Processor MMIO Stale Data. They are a class of MMIO-related weaknesses which can expose stale data by propagating it into core fill buffers. Data which can then be leaked using the usual speculative execution methods. Mitigations include this set along with microcode updates and are similar to MDS and TAA vulnerabilities: VERW now clears those buffers too" * tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/mmio: Print SMT warning KVM: x86/speculation: Disable Fill buffer clear within guests x86/speculation/mmio: Reuse SRBDS mitigation for SBDS x86/speculation/srbds: Update SRBDS mitigation selection x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data x86/speculation/mmio: Enable CPU Fill buffer clearing on idle x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data x86/speculation: Add a common function for MD_CLEAR mitigation update x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug Documentation: Add documentation for Processor MMIO Stale Data
| * | | KVM: x86/speculation: Disable Fill buffer clear within guestsPawan Gupta2022-05-211-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The enumeration of MD_CLEAR in CPUID(EAX=7,ECX=0).EDX{bit 10} is not an accurate indicator on all CPUs of whether the VERW instruction will overwrite fill buffers. FB_CLEAR enumeration in IA32_ARCH_CAPABILITIES{bit 17} covers the case of CPUs that are not vulnerable to MDS/TAA, indicating that microcode does overwrite fill buffers. Guests running in VMM environments may not be aware of all the capabilities/vulnerabilities of the host CPU. Specifically, a guest may apply MDS/TAA mitigations when a virtual CPU is enumerated as vulnerable to MDS/TAA even when the physical CPU is not. On CPUs that enumerate FB_CLEAR_CTRL the VMM may set FB_CLEAR_DIS to skip overwriting of fill buffers by the VERW instruction. This is done by setting FB_CLEAR_DIS during VMENTER and resetting on VMEXIT. For guests that enumerate FB_CLEAR (explicitly asking for fill buffer clear capability) the VMM will not use FB_CLEAR_DIS. Irrespective of guest state, host overwrites CPU buffers before VMENTER to protect itself from an MMIO capable guest, as part of mitigation for MMIO Stale Data vulnerabilities. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/speculation/mmio: Add mitigation for Processor MMIO Stale DataPawan Gupta2022-05-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Processor MMIO Stale Data is a class of vulnerabilities that may expose data after an MMIO operation. For details please refer to Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst. These vulnerabilities are broadly categorized as: Device Register Partial Write (DRPW): Some endpoint MMIO registers incorrectly handle writes that are smaller than the register size. Instead of aborting the write or only copying the correct subset of bytes (for example, 2 bytes for a 2-byte write), more bytes than specified by the write transaction may be written to the register. On some processors, this may expose stale data from the fill buffers of the core that created the write transaction. Shared Buffers Data Sampling (SBDS): After propagators may have moved data around the uncore and copied stale data into client core fill buffers, processors affected by MFBDS can leak data from the fill buffer. Shared Buffers Data Read (SBDR): It is similar to Shared Buffer Data Sampling (SBDS) except that the data is directly read into the architectural software-visible state. An attacker can use these vulnerabilities to extract data from CPU fill buffers using MDS and TAA methods. Mitigate it by clearing the CPU fill buffers using the VERW instruction before returning to a user or a guest. On CPUs not affected by MDS and TAA, user application cannot sample data from CPU fill buffers using MDS or TAA. A guest with MMIO access can still use DRPW or SBDR to extract data architecturally. Mitigate it with VERW instruction to clear fill buffers before VMENTER for MMIO capable guests. Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control the mitigation. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/speculation/mmio: Enumerate Processor MMIO Stale Data bugPawan Gupta2022-05-212-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Processor MMIO Stale Data is a class of vulnerabilities that may expose data after an MMIO operation. For more details please refer to Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst Add the Processor MMIO Stale Data bug enumeration. A microcode update adds new bits to the MSR IA32_ARCH_CAPABILITIES, define them. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
* | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2022-06-082-1/+4
|\ \ \ \ | | |/ / | |/| / | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: - syzkaller NULL pointer dereference - TDP MMU performance issue with disabling dirty logging - 5.14 regression with SVM TSC scaling - indefinite stall on applying live patches - unstable selftest - memory leak from wrong copy-and-paste - missed PV TLB flush when racing with emulation * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: do not report a vCPU as preempted outside instruction boundaries KVM: x86: do not set st->preempted when going back to user space KVM: SVM: fix tsc scaling cache logic KVM: selftests: Make hyperv_clock selftest more stable KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm() KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots() entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set KVM: Don't null dereference ops->destroy
| * | KVM: x86: do not report a vCPU as preempted outside instruction boundariesPaolo Bonzini2022-06-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry. To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit *not* instruction boundary except for external interrupts. It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()Jan Beulich2022-06-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs and clobbering of "cc" don't work well together. The compiler appears to mean to reject such, but doesn't - in its upstream form - quite manage to yet for "cc". Furthermore two similar macros don't clobber "cc", and clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler always assumes status flags to be clobbered there. Fixes: 989b5db215a2 ("x86/uaccess: Implement macros for CMPXCHG on user addresses") Signed-off-by: Jan Beulich <jbeulich@suse.com> Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Merge tag 'objtool-urgent-2022-06-05' of ↵Linus Torvalds2022-06-053-5/+9
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Thomas Gleixner: - Handle __ubsan_handle_builtin_unreachable() correctly and treat it as noreturn - Allow architectures to select uaccess validation - Use the non-instrumented bit test for test_cpu_has() to prevent escape from non-instrumentable regions - Use arch_ prefixed atomics for JUMP_LABEL=n builds to prevent escape from non-instrumentable regions - Mark a few tiny inline as __always_inline to prevent GCC from bringing them out of line and instrumenting them - Mark the empty stub context_tracking_enabled() as always inline as GCC brings them out of line and instruments the empty shell - Annotate ex_handler_msr_mce() as dead end * tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/extable: Annotate ex_handler_msr_mce() as a dead end context_tracking: Always inline empty stubs x86: Always inline on_thread_stack() and current_top_of_stack() jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n builds x86/cpu: Elide KCSAN for cpu_has() and friends objtool: Mark __ubsan_handle_builtin_unreachable() as noreturn objtool: Add CONFIG_HAVE_UACCESS_VALIDATION
| * | | x86/extable: Annotate ex_handler_msr_mce() as a dead endBorislav Petkov2022-05-271-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix vmlinux.o: warning: objtool: fixup_exception+0x2d6: unreachable instruction Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220520192729.23969-1-bp@alien8.de