APEI tables generating and CPER record
======================================

..
   Copyright (c) 2020 HUAWEI TECHNOLOGIES CO., LTD.

   This work is licensed under the terms of the GNU GPL, version 2 or later.
   See the COPYING file in the top-level directory.

Design Details
--------------

::

          etc/acpi/tables                        etc/hardware_errors
        ====================                ===============================
  + +--------------------------+            +----------------------------+
  | | HEST                     | +--------->|    error_block_address1    |------+
  | +--------------------------+ |          +----------------------------+      |
  | | GHES1                    | | +------->|    error_block_address2    |------+-+
  | +--------------------------+ | |        +----------------------------+      | |
  | | .................        | | |        |      ..............        |      | |
  | | error_status_address-----+-+ |        +----------------------------+      | |
  | | .................        |   |   +--->|    error_block_addressN    |------+-+---+
  | | read_ack_register--------+-+ |   |    +----------------------------+      | |   |
  | | read_ack_preserve        | +-+---+--->|     read_ack_register1     |      | |   |
  | | read_ack_write           |   |   |    +----------------------------+      | |   |
  + +--------------------------+   | +-+--->|     read_ack_register2     |      | |   |
  | | GHES2                    |   | | |    +----------------------------+      | |   |
  + +--------------------------+   | | |    |       .............        |      | |   |
  | | .................        |   | | |    +----------------------------+      | |   |
  | | error_status_address-----+---+ | | +->|     read_ack_registerN     |      | |   |
  | | .................        |     | | |  +----------------------------+      | |   |
  | | read_ack_register--------+-----+ | |  |Generic Error Status Block 1|<-----+ |   |
  | | read_ack_preserve        |       | |  |-+------------------------+-+        |   |
  | | read_ack_write           |       | |  | |          CPER          | |        |   |
  + +--------------------------+       | |  | |          CPER          | |        |   |
  | | ...............          |       | |  | |          ....          | |        |   |
  + +--------------------------+       | |  | |          CPER          | |        |   |
  | | GHESN                    |       | |  |-+------------------------+-+        |   |
  + +--------------------------+       | |  |Generic Error Status Block 2|<-------+   |
  | | .................        |       | |  |-+------------------------+-+            |
  | | error_status_address-----+-------+ |  | |          CPER          | |            |
  | | .................        |         |  | |          CPER          | |            |
  | | read_ack_register--------+---------+  | |          ....          | |            |
  | | read_ack_preserve        |            | |          CPER          | |            |
  | | read_ack_write           |            +-+------------------------+-+            |
  + +--------------------------+            |         ..........         |            |
                                            +----------------------------+            |
                                            |Generic Error Status Block N|<-----------+
                                            |-+------------------------+-+
                                            | |          CPER          | |
                                            | |          CPER          | |
                                            | |          ....          | |
                                            | |          CPER          | |
                                            +-+------------------------+-+


(1) QEMU generates the ACPI HEST table. This table goes in the current
    "etc/acpi/tables" fw_cfg blob. Each error source has different
    notification types.

(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
    also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob
    contains an address registers table and an Error Status Data Block table.

(3) The address registers table contains N Error Block Address entries
    and N Read Ack Register entries. Each entry is 8 bytes in size.
    The Error Status Data Block table contains N Error Status Data Block
    entries. Each entry is 4096 (0x1000) bytes in size. The total size
    of the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
    N is the number of kinds of hardware error sources.

(4) QEMU generates the ACPI linker/loader script for the firmware. The
    firmware pre-allocates memory for the "etc/acpi/tables" and
    "etc/hardware_errors" blobs and copies the blob contents there.

(5) QEMU generates N ADD_POINTER commands, which patch addresses in the
    "error_status_address" fields of the HEST table with a pointer to the
    corresponding "address registers" in the "etc/hardware_errors" blob.

(6) QEMU generates N ADD_POINTER commands, which patch addresses in the
    "read_ack_register" fields of the HEST table with a pointer to the
    corresponding "read_ack_register" within the "etc/hardware_errors" blob.

(7) QEMU generates N ADD_POINTER commands for the firmware, which patch
    addresses in the "error_block_address" fields with a pointer to the
    respective "Error Status Data Block" in the "etc/hardware_errors" blob.

(8) QEMU defines a third, write-only fw_cfg blob called
    "etc/hardware_errors_addr". Through that blob, the firmware can send back
    the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
    blob contains an 8-byte entry. QEMU generates a single WRITE_POINTER command
    for the firmware. The firmware will write back the start address of the
    "etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr".

(9) When QEMU gets a SIGBUS from the kernel, QEMU writes the CPER into the
    corresponding "Error Status Data Block" in guest memory, and then injects a
    platform-specific interrupt (on the arm/virt machine, a Synchronous
    External Abort) to notify the guest.

(10) This notification (in virtual hardware) is handled by the guest kernel.
     On receiving it, the guest APEI driver can read the CPER error record
     and take appropriate action.

(11) kvm_arch_on_sigbus_vcpu() uses source_id as an index into
     "etc/hardware_errors" to find the "Error Status Data Block" entry
     corresponding to the error source. Supported source_id values should
     therefore be assigned here and not changed afterwards, so that the error
     is written into the expected "Error Status Data Block" even if the guest
     was migrated to a newer QEMU.
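
The blob layout from step (3) and the diagram can be sketched as a few offset
helpers. This is a minimal illustration, not QEMU's actual code: the macro and
function names are hypothetical, and it assumes the entries are packed in the
order the diagram shows (all Error Block Address entries, then all Read Ack
Register entries, then the Error Status Data Blocks), indexed from 0 as a
source_id would index them in step (11).

```c
#include <assert.h>
#include <stdint.h>

/* Sizes from step (3). All names below are illustrative assumptions. */
#define ADDRESS_ENTRY_SIZE 8u     /* one error_block_address or read_ack_register */
#define STATUS_BLOCK_SIZE  4096u  /* one Error Status Data Block (0x1000) */

/* Offset of the i-th error_block_address entry; these entries come first. */
static uint64_t error_block_address_offset(uint32_t i)
{
    return (uint64_t)i * ADDRESS_ENTRY_SIZE;
}

/* Offset of the i-th read_ack_register entry, after all n address entries. */
static uint64_t read_ack_register_offset(uint32_t n, uint32_t i)
{
    return (uint64_t)n * ADDRESS_ENTRY_SIZE + (uint64_t)i * ADDRESS_ENTRY_SIZE;
}

/* Offset of the i-th Error Status Data Block, after both register tables. */
static uint64_t status_block_offset(uint32_t n, uint32_t i)
{
    return 2 * (uint64_t)n * ADDRESS_ENTRY_SIZE + (uint64_t)i * STATUS_BLOCK_SIZE;
}

/* Total blob size: N * 8 * 2 + N * 4096 bytes, as stated in step (3). */
static uint64_t hardware_errors_blob_size(uint32_t n)
{
    return (uint64_t)n * ADDRESS_ENTRY_SIZE * 2 + (uint64_t)n * STATUS_BLOCK_SIZE;
}
```

For example, with N = 2 error sources the blob is 2 * 8 * 2 + 2 * 4096 = 8224
bytes, and under the assumed packing the second Error Status Data Block starts
at offset 32 + 4096 = 4128.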