Now that we have helpers that are aware of the pseudo-NMI
feature, introduce them to cpu_do_idle(). This allows for
some nice cleanup.
No functional change intended.
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210615111227.2454465-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
Additional CPU sanity checks for MTE and preparatory changes for systems
where not all of the CPUs support 32-bit EL0.
* for-next/cpufeature:
arm64: Restrict undef hook for cpufeature registers
arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs
KVM: arm64: Kill 32-bit vCPUs on systems with mismatched EL0 support
arm64: Allow mismatched 32-bit EL0 support
arm64: cpuinfo: Split AArch32 registers out into a separate struct
arm64: Check if GMID_EL1.BS is the same on all CPUs
arm64: Change the cpuinfo_arm64 member type for some sysregs to u64
|
This commit modifies the mask of the mrs_hook declared in
arch/arm64/kernel/cpufeatures.c, which emulates feature register
access only. This is necessary because the hook's mask was too broad
and thus matched any mrs instruction, even those unrelated to the
emulated registers, which made the PMU emulation inefficient.
Signed-off-by: Raphael Gault <raphael.gault@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210517180256.2881891-1-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
Scheduling a 32-bit application on a 64-bit-only CPU is a bad idea.
Ensure that 32-bit applications always take the slow-path when returning
to userspace on a system with mismatched support at EL0, so that we can
avoid trying to run on a 64-bit-only CPU and force a SIGKILL instead.
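A minimal sketch of the slow-path check described above, assuming the
system_32bit_el0_cpumask() helper from the companion "Allow mismatched
32-bit EL0 support" patch; names and placement are illustrative:

  /* Sketch: kill a 32-bit task that resumed on a 64-bit-only CPU. */
  static bool cpu_affinity_invalid(struct pt_regs *regs)
  {
          if (!compat_user_mode(regs))
                  return false;
          return !cpumask_test_cpu(raw_smp_processor_id(),
                                   system_32bit_el0_cpumask());
  }

  /* ... on the return-to-userspace slow path: */
  if (cpu_affinity_invalid(regs))
          force_sig(SIGKILL);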
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210608180313.11502-5-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
When confronted with a mixture of CPUs, some of which support 32-bit
applications and others which don't, we quite sensibly treat the system
as 64-bit only for userspace and prevent execve() of 32-bit binaries.
Unfortunately, some crazy folks have decided to build systems like this
with the intention of running 32-bit applications, so relax our
sanitisation logic to continue to advertise 32-bit support to userspace
on these systems and track the real 32-bit capable cores in a cpumask
instead. For now, the default behaviour remains but will be tied to
a command-line option in a later patch.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210608180313.11502-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
In preparation for late initialisation of the "sanitised" AArch32 register
state, move the AArch32 registers out of 'struct cpuinfo' and into their
own struct definition.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210608180313.11502-2-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
The GMID_EL1.BS field determines the number of tags accessed by the
LDGM/STGM instructions (EL1 and up), used by the kernel for copying or
zeroing page tags.
Taint the kernel if GMID_EL1.BS differs between CPUs, but only if
CONFIG_ARM64_MTE is enabled.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Link: https://lore.kernel.org/r/20210526193621.21559-3-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
The architecture has been updated and the CTR_EL0, CNTFRQ_EL0,
DCZID_EL0, MIDR_EL1 and REVIDR_EL1 registers are all 64-bit, even if
most of them have RES0 top 32 bits.
Change their type to u64 in struct cpuinfo_arm64.
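An illustrative subset of the resulting structure; field names follow
the reg_* convention of cpuinfo_arm64, and the comments note the old
width:

  struct cpuinfo_arm64 {
          /* ... */
          u64     reg_ctr;        /* CTR_EL0,    previously u32 */
          u64     reg_cntfrq;     /* CNTFRQ_EL0, previously u32 */
          u64     reg_dczid;      /* DCZID_EL0,  previously u32 */
          u64     reg_midr;       /* MIDR_EL1,   previously u32 */
          u64     reg_revidr;     /* REVIDR_EL1, previously u32 */
          /* ... */
  };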
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Link: https://lore.kernel.org/r/20210526193621.21559-2-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
Big cleanup of our cache maintenance routines, which were confusingly
named and inconsistent in their implementations.
* for-next/caches:
arm64: Rename arm64-internal cache maintenance functions
arm64: Fix cache maintenance function comments
arm64: sync_icache_aliases to take end parameter instead of size
arm64: __clean_dcache_area_pou to take end parameter instead of size
arm64: __clean_dcache_area_pop to take end parameter instead of size
arm64: __clean_dcache_area_poc to take end parameter instead of size
arm64: __flush_dcache_area to take end parameter instead of size
arm64: dcache_by_line_op to take end parameter instead of size
arm64: __inval_dcache_area to take end parameter instead of size
arm64: Fix comments to refer to correct function __flush_icache_range
arm64: Move documentation of dcache_by_line_op
arm64: assembler: remove user_alt
arm64: Downgrade flush_icache_range to invalidate
arm64: Do not enable uaccess for invalidate_icache_range
arm64: Do not enable uaccess for flush_icache_range
arm64: Apply errata to swsusp_arch_suspend_exit
arm64: assembler: add conditional cache fixups
arm64: assembler: replace `kaddr` with `addr`
|
Although naming across the codebase isn't that consistent, it
tends to follow certain patterns. Moreover, the term "flush"
isn't defined in the Arm Architecture reference manual, and might
be interpreted to mean clean, invalidate, or both for a cache.
Rename arm64-internal functions to make the naming internally
consistent, as well as consistent with the Arm ARM, by specifying
whether an operation applies to the instruction cache, the data
cache, or both, and whether it is a clean, an invalidate, or both.
Also specify which point the operation applies to, i.e., to the
point of unification (PoU), coherency (PoC), or persistence
(PoP).
This commit applies the following sed transformation to all files
under arch/arm64:
"s/\b__flush_cache_range\b/caches_clean_inval_pou_macro/g;"\
"s/\b__flush_icache_range\b/caches_clean_inval_pou/g;"\
"s/\binvalidate_icache_range\b/icache_inval_pou/g;"\
"s/\b__flush_dcache_area\b/dcache_clean_inval_poc/g;"\
"s/\b__inval_dcache_area\b/dcache_inval_poc/g;"\
"s/__clean_dcache_area_poc\b/dcache_clean_poc/g;"\
"s/\b__clean_dcache_area_pop\b/dcache_clean_pop/g;"\
"s/\b__clean_dcache_area_pou\b/dcache_clean_pou/g;"\
"s/\b__flush_cache_user_range\b/caches_clean_inval_user_pou/g;"\
"s/\b__flush_icache_all\b/icache_inval_all_pou/g;"
Note that __clean_dcache_area_poc is deliberately missing a word
boundary check at the beginning in order to match the efistub
symbols in image-vars.h.
Also note that, despite its name, __flush_icache_range operates
on both instruction and data caches. The name change here
reflects that.
No functional change intended.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-19-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
To be consistent with other functions with similar names and
functionality in cacheflush.h, cache.S, and cachetlb.rst, change
to specify the range in terms of start and end, as opposed to
start and size.
No functional change intended.
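A hedged before/after sketch of the interface change; the call sites
are illustrative, and the same pattern applies to the other
conversions in this series:

  /* before: range given as (start, size) */
  void sync_icache_aliases(void *kaddr, unsigned long len);
  sync_icache_aliases(kaddr, len);

  /* after: range given as (start, end) */
  void sync_icache_aliases(unsigned long start, unsigned long end);
  sync_icache_aliases(start, start + len);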
Reported-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-17-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
To be consistent with other functions with similar names and
functionality in cacheflush.h, cache.S, and cachetlb.rst, change
to specify the range in terms of start and end, as opposed to
start and size.
Because the code is shared with __dma_clean_area, it changes the
parameters for that as well. However, __dma_clean_area is local to
cache.S, so no other users are affected.
No functional change intended.
Reported-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-14-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
To be consistent with other functions with similar names and
functionality in cacheflush.h, cache.S, and cachetlb.rst, change
to specify the range in terms of start and end, as opposed to
start and size.
No functional change intended.
Reported-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-13-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
To be consistent with other functions with similar names and
functionality in cacheflush.h, cache.S, and cachetlb.rst, change
to specify the range in terms of start and end, as opposed to
start and size.
Because the code is shared with __dma_inv_area, it changes the
parameters for that as well. However, __dma_inv_area is local to
cache.S, so no other users are affected.
No functional change intended.
Reported-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-11-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
Many comments refer to the function flush_icache_range, where the
intent is in fact __flush_icache_range. Fix these comments to
refer to the intended function.
That's probably due to commit 3b8c9f1cdfc506e9 ("arm64: IPI each
CPU after invalidating the I-cache for kernel mappings"), which
renamed flush_icache_range() to __flush_icache_range() and added
a wrapper.
No functional change intended.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-10-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
Since __flush_dcache_area is called right before,
invalidate_icache_range is sufficient in this case.
Rewrite the comment to better explain the rationale behind the
cache maintenance operations used here.
No functional change intended.
Possible performance impact due to invalidating only the icache
rather than invalidating and cleaning both caches.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/linux-arch/20200511110014.lb9PEahJ4hVOYrbwIb_qUHXyNy9KQzNFdb_I3YlzY6A@z/
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-7-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
The Arm errata covered by ARM64_WORKAROUND_CLEAN_CACHE require
that "dc cvau" instructions get promoted to "dc civac".
Reported-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-4-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
The scs_load and scs_save asm macros don't make use of the mandatory
'tmp' register argument, so drop it and fix up the callers.
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20210527105529.21967-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
Now that we have a consistent place to initialize CPU context registers
early in the boot path, let's also initialize the per-cpu offset here.
This makes the primary and secondary boot paths more consistent, and
allows for the use of per-cpu operations earlier, which will be
necessary for instrumentation with KCSAN.
Note that smp_prepare_boot_cpu() still needs to re-initialize CPU0's
offset as immediately prior to this the per-cpu areas may be
reallocated, and hence the boot-time offset may be stale. A comment is
added to make this clear.
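A sketch of the re-initialisation retained in smp_prepare_boot_cpu(),
assuming the standard arm64 helpers:

  /* The boot-time offset may be stale once the per-cpu areas have
   * been reallocated, so recompute it for the boot CPU. */
  set_my_cpu_offset(per_cpu_offset(smp_processor_id()));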
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210520115031.18509-7-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
Once we enable the MMU, we have to initialize:
* SP_EL0 to point at the active task
* SP to point at the active task's stack
* SCS_SP to point at the active task's shadow stack
For all tasks (including init_task), this information can be derived
from the task's task_struct.
Let's unify __primary_switched and __secondary_switched to consistently
acquire this information from the relevant task_struct. At the same
time, let's fold this together with initializing a task's final frame.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210520115031.18509-6-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
When we boot a secondary CPU, we pass it a task and a stack to use. As
the stack is always the task's stack, which can be derived from the
task, let's have the secondary CPU derive this itself and avoid passing
redundant information.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210520115031.18509-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
All reads and writes of secondary_data occur with the MMU on, using
coherent attributes, so there's no need to perform any cache maintenance
for this.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210520115031.18509-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
The AAPCS places no requirements on the alignment of the frame
record. In theory it could be placed anywhere, although it seems
sensible to require it to be aligned to 8 bytes. With an upcoming
enhancement to tag-based KASAN Clang will begin creating frame records
located at an address that is only aligned to 8 bytes. Accommodate
such frame records in the stack unwinding code.
As pointed out by Mark Rutland, the userspace stack unwinding code
has the same problem, so fix it there as well.
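A sketch of the relaxed check in the unwinder; illustrative, and the
in-tree test may differ in detail:

  /* Frame records now need only 8-byte alignment, not 16. */
  if (fp & 0x7)           /* was: fp & 0xf */
          return -EINVAL;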
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Ia22c375230e67ca055e9e4bb639383567f7ad268
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210526174927.2477847-2-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
unwind_frame() was previously implicitly checking that the frame
record is in bounds of the stack by enforcing that FP is both aligned
to 16 and in bounds of the stack. Once the FP alignment requirement
is relaxed to 8 this will not be sufficient because it does not
account for the case where FP points to 8 bytes before the end of the
stack.
Make the check explicit by changing the on_*stack functions to take a
size argument and adjusting the callers to pass the appropriate sizes.
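A sketch of the explicit bounds check with a size argument; names are
illustrative, and the middle test guards against sp + size wrapping:

  static inline bool on_stack(unsigned long sp, unsigned long size,
                              unsigned long low, unsigned long high)
  {
          if (!low)
                  return false;
          if (sp < low || sp + size < sp || sp + size > high)
                  return false;
          return true;
  }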
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Ib7a3eb3eea41b0687ffaba045ceb2012d077d8b4
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210526174927.2477847-1-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.
We'd like to use task_pt_regs(task)->stackframe as the final frame
record, as this is already setup upon exception entry from EL0. For
kernel tasks we need to consistently reserve the pt_regs and point x29
at this, which we can do with small changes to __primary_switched,
__secondary_switched, and copy_process().
Since the final frame record must be at a specific location, we must
create the final frame record in __primary_switched and
__secondary_switched rather than leaving this to start_kernel and
secondary_start_kernel. Thus, __primary_switched and
__secondary_switched will now show up in stacktraces for the idle tasks.
Since the final frame record is now identified by its location rather
than by its contents, we identify it at the start of unwind_frame(),
before we read any values from it.
External debuggers may terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel. That said, gdb
does not show the extra record probably because it uses DWARF and not
frame pointers for stack traces.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
[Mark: rebase, use ASM_BUG(), update comments, update commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210510110026.18061-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow the
flexible utilization of SMT siblings, without exposing untrusted
domains to information leaks & side channels, plus to ensure more
deterministic computing performance on SMT systems used by
heterogeneous workloads.
There are new prctls to set core scheduling groups, which allows
more flexible management of workloads that can share siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve 'memcache'-like
workloads.
- "Age" (decay) average idle time, to better track & improve
workloads such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes.
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked via
/sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework asymmetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable it at
runtime if tooling needs it. Use static keys and other
optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/doc: Update the CPU capacity asymmetry bits
sched/topology: Rework CPU capacity asymmetry detection
sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
psi: Fix race between psi_trigger_create/destroy
sched/fair: Introduce the burstable CFS controller
sched/uclamp: Fix uclamp_tg_restrict()
sched/rt: Fix Deadline utilization tracking during policy change
sched/rt: Fix RT utilization tracking during policy change
sched: Change task_struct::state
sched,arch: Remove unused TASK_STATE offsets
sched,timer: Use __set_current_state()
sched: Add get_current_state()
sched,perf,kvm: Fix preemption condition
sched: Introduce task_is_running()
sched: Unbreak wakeups
sched/fair: Age the average idle time
sched/cpufreq: Consider reduced CPU capacity in energy calculation
sched/fair: Take thermal pressure into account while estimating energy
thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
sched/fair: Return early from update_tg_cfs_load() if delta == 0
...
|
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).
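A sketch of the helper, assuming the ->state to ->__state rename from
the same series; the in-tree definition may differ in detail:

  #define task_is_running(task) \
          (READ_ONCE((task)->__state) == TASK_RUNNING)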
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
|
As pointed out by commit
de9b8f5dcbd9 ("sched: Fix crash trying to dequeue/enqueue the idle thread")
init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.
As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().
Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -> init_idle().
Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().
Secondary startups were patched via coccinelle:
@begone@
@@
-preempt_disable();
...
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
|
Kprobes has a counter 'nmissed', that is used to count the number of
times a probe handler was not called. This generally happens when we hit
a kprobe while handling another kprobe.
However, if one of the probe handlers causes a fault, we are currently
incrementing 'nmissed'. The comment in fault handler indicates that this
can be used to account faults taken by the probe handlers. But, this has
never been the intention as is evident from the comment above 'nmissed'
in 'struct kprobe':
/*count the number of times this probe was temporarily disarmed */
unsigned long nmissed;
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210601120150.672652-1-naveen.n.rao@linux.vnet.ibm.com
|
The reason for kprobe::fault_handler(), as given by their comment:
* We come here because instructions in the pre/post
* handler caused the page_fault, this could happen
* if handler tries to access user space by
* copy_from_user(), get_user() etc. Let the
* user-specified handler try to fix it first.
is just plain bad. Those other handlers are run from non-preemptible
context and had better use _nofault() functions. Also, there is no
upstream usage of this.
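An illustrative sketch of the _nofault() pattern a probe handler
should use instead of relying on a fault handler:

  unsigned long val;

  /* Runs in non-preemptible context: probe the address without
   * taking a fault. */
  if (copy_from_kernel_nofault(&val, (void *)addr, sizeof(val)))
          return;         /* address not accessible */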
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
|
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"A mix of fixes and clean-ups that turned up too late for the first
pull request:
- Restore terminal stack frame records. Their previous removal caused
traces which cross secondary_start_kernel to terminate one entry
too late, with a spurious "0" entry.
- Fix boot warning with pseudo-NMI due to the way we manipulate the
PMR register.
- ACPI fixes: avoid corruption of interrupt mappings on watchdog
probe failure (GTDT), prevent unregistering of GIC SGIs.
- Force SPARSEMEM_VMEMMAP as the only memory model; it saves us from
having to test all the other combinations.
- Documentation fixes and updates: tagged address ABI exceptions on
brk/mmap/mremap(), event stream frequency, update booting
requirements on the configuration of traps"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kernel: Update the stale comment
arm64: Fix the documented event stream frequency
arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
arm64: Explicitly document boot requirements for SVE
arm64: Explicitly require that FPSIMD instructions do not trap
arm64: Relax booting requirements for configuration of traps
arm64: cpufeatures: use min and max
arm64: stacktrace: restore terminal records
arm64/vdso: Discard .note.gnu.property sections in vDSO
arm64: doc: Add brk/mmap/mremap() to the Tagged Address ABI Exceptions
psci: Remove unneeded semicolon
ACPI: irq: Prevent unregistering of GIC SGIs
ACPI: GTDT: Don't corrupt interrupt mappings on watchdog probe failure
arm64: Show three registers per line
arm64: remove HAVE_DEBUG_BUGVERBOSE
arm64: alternative: simplify passing alt_region
arm64: Force SPARSEMEM_VMEMMAP as the only memory management model
arm64: vdso32: drop -no-integrated-as flag
|
Commit af391b15f7b5 ("arm64: kernel: rename __cpu_suspend to keep it aligned with arm")
used @index instead of @arg, but the comment is stale; update it.
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/1620280462-21937-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
Zenghui reports that booting a kernel with "irqchip.gicv3_pseudo_nmi=1"
on the command line hits a warning during kernel entry, due to the way
we manipulate the PMR.
Early in the entry sequence, we call lockdep_hardirqs_off() to inform
lockdep that interrupts have been masked (as the HW sets DAIF when
entering an exception). Architecturally PMR_EL1 is not affected by
exception entry, and we don't set GIC_PRIO_PSR_I_SET in the PMR early in
the exception entry sequence, so early in exception entry the PMR can
indicate that interrupts are unmasked even though they are masked by
DAIF.
If DEBUG_LOCKDEP is selected, lockdep_hardirqs_off() will check that
interrupts are masked, before we set GIC_PRIO_PSR_I_SET in any of the
exception entry paths, and hence lockdep_hardirqs_off() will WARN() that
something is amiss.
We can avoid this by consistently setting GIC_PRIO_PSR_I_SET during
exception entry so that kernel code sees a consistent environment. We
must also update local_daif_inherit() to undo this, as it currently only
touches DAIF. For other paths, local_daif_restore() will update both
DAIF and the PMR. With this done, we can remove the existing special
cases which set this later in the entry code.
We always use (GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET) for consistency with
local_daif_save(), as this will warn if it ever encounters
(GIC_PRIO_IRQOFF | GIC_PRIO_PSR_I_SET), and never sets this itself. This
matches the gic_prio_kentry_setup that we have to retain for
ret_to_user.
The original splat from Zenghui's report was:
| DEBUG_LOCKS_WARN_ON(!irqs_disabled())
| WARNING: CPU: 3 PID: 125 at kernel/locking/lockdep.c:4258 lockdep_hardirqs_off+0xd4/0xe8
| Modules linked in:
| CPU: 3 PID: 125 Comm: modprobe Tainted: G W 5.12.0-rc8+ #463
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : lockdep_hardirqs_off+0xd4/0xe8
| lr : lockdep_hardirqs_off+0xd4/0xe8
| sp : ffff80002a39bad0
| pmr_save: 000000e0
| x29: ffff80002a39bad0 x28: ffff0000de214bc0
| x27: ffff0000de1c0400 x26: 000000000049b328
| x25: 0000000000406f30 x24: ffff0000de1c00a0
| x23: 0000000020400005 x22: ffff8000105f747c
| x21: 0000000096000044 x20: 0000000000498ef9
| x19: ffff80002a39bc88 x18: ffffffffffffffff
| x17: 0000000000000000 x16: ffff800011c61eb0
| x15: ffff800011700a88 x14: 0720072007200720
| x13: 0720072007200720 x12: 0720072007200720
| x11: 0720072007200720 x10: 0720072007200720
| x9 : ffff80002a39bad0 x8 : ffff80002a39bad0
| x7 : ffff8000119f0800 x6 : c0000000ffff7fff
| x5 : ffff8000119f07a8 x4 : 0000000000000001
| x3 : 9bcdab23f2432800 x2 : ffff800011730538
| x1 : 9bcdab23f2432800 x0 : 0000000000000000
| Call trace:
| lockdep_hardirqs_off+0xd4/0xe8
| enter_from_kernel_mode.isra.5+0x7c/0xa8
| el1_abort+0x24/0x100
| el1_sync_handler+0x80/0xd0
| el1_sync+0x6c/0x100
| __arch_clear_user+0xc/0x90
| load_elf_binary+0x9fc/0x1450
| bprm_execve+0x404/0x880
| kernel_execve+0x180/0x188
| call_usermodehelper_exec_async+0xdc/0x158
| ret_from_fork+0x10/0x18
Fixes: 23529049c684 ("arm64: entry: fix non-NMI user<->kernel transitions")
Fixes: 7cd1ea1010ac ("arm64: entry: fix non-NMI kernel<->kernel transitions")
Fixes: f0cd5ac1e4c5 ("arm64: entry: fix NMI {user, kernel}->kernel transitions")
Fixes: 2a9b3e6ac69a ("arm64: entry: fix EL1 debug transitions")
Link: https://lore.kernel.org/r/f4012761-026f-4e51-3a0c-7524e434e8b3@huawei.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210428111555.50880-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
Use min and max to make the effect more clear.
Generated by: scripts/coccinelle/misc/minmax.cocci
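An illustrative before/after of the minmax.cocci transformation:

  /* before */
  len = (len < max_len) ? len : max_len;

  /* after */
  len = min(len, max_len);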
CC: Denis Efremov <efremov@linux.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2104292246300.16899@hadrien
[catalin.marinas@arm.com: include <linux/minmax.h> explicitly]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
We removed the terminal frame records in commit:
6106e1112cc69a36 ("arm64: remove EL0 exception frame record")
... on the assumption that as we no longer used them to find the pt_regs
at exception boundaries, they were no longer necessary.
However, Leo reports that as an unintended side-effect, this causes
traces which cross secondary_start_kernel to terminate one entry too
late, with a spurious "0" entry.
There are a few ways we could solve this, but as we're planning to use
terminal records for RELIABLE_STACKTRACE, let's revert the logic change
for now, keeping the updated comments and accounting for the changes in
commit:
3c02600144bdb0a1 ("arm64: stacktrace: Report when we reach the end of the stack")
This is effectively a partial revert of commit:
6106e1112cc69a36 ("arm64: remove EL0 exception frame record")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 6106e1112cc6 ("arm64: remove EL0 exception frame record")
Reported-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210429104813.GA33550@C02TD0UTHF1T.local
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
The arm64 assembler in binutils 2.32 and above generates a program
property note in a note section, .note.gnu.property, to encode used
ISAs and features. But the kernel linker script only contains a single
NOTE segment:
PHDRS
{
text PT_LOAD FLAGS(5) FILEHDR PHDRS; /* PF_R|PF_X */
dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
note PT_NOTE FLAGS(4); /* PF_R */
}
The NOTE segment generated by the vDSO linker script is aligned to 4 bytes.
But the .note.gnu.property section must be aligned to 8 bytes on arm64.
$ readelf -n vdso64.so
Displaying notes found in: .note
Owner Data size Description
Linux 0x00000004 Unknown note type: (0x00000000)
description data: 06 00 00 00
readelf: Warning: note with invalid namesz and/or descsz found at offset 0x20
readelf: Warning: type: 0x78, namesize: 0x00000100, descsize: 0x756e694c, alignment: 8
Since the .note.gnu.property section in the vDSO is not checked by the
dynamic linker, discard the .note.gnu.property sections in the vDSO.
Similar to commit 4caffe6a28d31 ("x86/vdso: Discard .note.gnu.property
sections in vDSO"), but for arm64.
Signed-off-by: Bill Wendling <morbo@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210423205159.830854-1-morbo@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
Displaying two registers per line takes 15 lines. That improves to just
10 lines if we display three registers per line, which reduces the amount
of information lost when oopses are cut off. It stays within 80 columns
and matches x86-64.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210420172245.3679077-1-willy@infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
In __apply_alternatives() we take a pointer to void which we later
assign to a pointer to struct alt_region. As all callers are passing a
pointer to struct alt_region to begin with, it's simpler and more robust
to take a pointer to struct alt region, so let's do so and avoid the
need for a temporary variable.
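A hedged sketch of the signature change; parameter lists are
abbreviated and not the exact in-tree prototypes:

  /* before: the region arrives as void * and needs a local variable */
  static void __apply_alternatives(void *alt_region, bool is_module)
  {
          struct alt_region *region = alt_region;
          /* ... */
  }

  /* after: take the pointer type directly */
  static void __apply_alternatives(struct alt_region *region, bool is_module)
  {
          /* ... */
  }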
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210416163032.10857-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
Clang can assemble these files just fine; this is a relic from the top
level Makefile conditionally adding this. We no longer need --prefix,
--gcc-toolchain, or -Qunused-arguments flags either with this change, so
remove those too.
To test building:
$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
CROSS_COMPILE_COMPAT=arm-linux-gnueabi- make LLVM=1 LLVM_IAS=1 \
defconfig arch/arm64/kernel/vdso32/
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210420174427.230228-1-ndesaulniers@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
Pull kvm updates from Paolo Bonzini:
"This is a large update by KVM standards, including AMD PSP (Platform
Security Processor, aka "AMD Secure Technology") and ARM CoreSight
(debug and trace) changes.
ARM:
- CoreSight: Add support for ETE and TRBE
- Stage-2 isolation for the host kernel when running in protected
mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- AMD PSP driver changes
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation, zap under
read lock, enable/disably dirty page logging under read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing the
architecture-specific code
- a handful of "Get rid of oprofile leftovers" patches
- Some selftests improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
KVM: selftests: Speed up set_memory_region_test
selftests: kvm: Fix the check of return value
KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
KVM: SVM: Skip SEV cache flush if no ASIDs have been used
KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
KVM: SVM: Drop redundant svm_sev_enabled() helper
KVM: SVM: Move SEV VMCB tracking allocation to sev.c
KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
KVM: SVM: Unconditionally invoke sev_hardware_teardown()
KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
KVM: SVM: Move SEV module params/variables to sev.c
KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
KVM: SVM: Zero out the VMCB array used to track SEV ASID association
x86/sev: Drop redundant and potentially misleading 'sev_enabled'
KVM: x86: Move reverse CPUID helpers to separate header file
KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
...
|
Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.13
New features:
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)
Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
|
Instead of doing a RMW on SCTLR_EL1 to disable the MMU, use the
existing define that loads the right set of bits.
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
To aid with debugging, add details of the source of a panic from nVHE
hyp. This is done by having nVHE hyp exit to nvhe_hyp_panic_handler()
rather than directly to panic(). The handler will then add the extra
details for debugging before panicking the kernel.
If the panic was due to a BUG(), look up the metadata to log the file
and line, if available, otherwise log an address that can be looked up
in vmlinux. The hyp offset is also logged to allow other hyp VAs to be
converted, similar to how the kernel offset is logged during a panic.
__hyp_panic_string is now inlined since it no longer needs to be
referenced as a symbol and the message is free to diverge between VHE
and nVHE.
The following is an example of the logs generated by a BUG in nVHE hyp.
[ 46.754840] kvm [307]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/switch.c:242!
[ 46.755357] kvm [307]: Hyp Offset: 0xfffea6c58e1e0000
[ 46.755824] Kernel panic - not syncing: HYP panic:
[ 46.755824] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.755824] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.755824] VCPU:0000d93a880d0000
[ 46.756960] CPU: 3 PID: 307 Comm: kvm-vcpu-0 Not tainted 5.12.0-rc3-00005-gc572b99cf65b-dirty #133
[ 46.757459] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 46.758366] Call trace:
[ 46.758601] dump_backtrace+0x0/0x1b0
[ 46.758856] show_stack+0x18/0x70
[ 46.759057] dump_stack+0xd0/0x12c
[ 46.759236] panic+0x16c/0x334
[ 46.759426] arm64_kernel_unmapped_at_el0+0x0/0x30
[ 46.759661] kvm_arch_vcpu_ioctl_run+0x134/0x750
[ 46.759936] kvm_vcpu_ioctl+0x2f0/0x970
[ 46.760156] __arm64_sys_ioctl+0xa8/0xec
[ 46.760379] el0_svc_common.constprop.0+0x60/0x120
[ 46.760627] do_el0_svc+0x24/0x90
[ 46.760766] el0_svc+0x2c/0x54
[ 46.760915] el0_sync_handler+0x1a4/0x1b0
[ 46.761146] el0_sync+0x170/0x180
[ 46.761889] SMP: stopping secondary CPUs
[ 46.762786] Kernel Offset: 0x3e1cd2820000 from 0xffff800010000000
[ 46.763142] PHYS_OFFSET: 0xffffa9f680000000
[ 46.763359] CPU features: 0x00240022,61806008
[ 46.763651] Memory Limit: none
[ 46.813867] ---[ end Kernel panic - not syncing: HYP panic:
[ 46.813867] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.813867] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.813867] VCPU:0000d93a880d0000 ]---
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-6-ascull@google.com
|
Now that the read_ctr macro has been specialised for nVHE,
the whole CPU_FTR_REG_HYP_COPY infrastructure looks completely
overengineered.
Simplify it by populating the two u64 quantities (MMFR0 and 1)
that the hypervisor needs.
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
In protected mode, late CPUs are not allowed to boot (enforced by
the PSCI relay). We can thus specialise the read_ctr macro to
always return a pre-computed, sanitised value. Special care is
taken to prevent the use of this custom version outside of
protected mode.
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|