| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Andrii suggested we can simply jump to again label
instead of making recursion call.
Suggested-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200825192124.710397-7-jolsa@kernel.org
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Adding type_id pointer as argument to __btf_resolve_size
to return also BTF ID of the resolved type. It will be
used in following changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200825192124.710397-6-jolsa@kernel.org
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
If the resolved type is array, make btf_resolve_size return also
ID of the elem type. It will be needed in following changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200825192124.710397-5-jolsa@kernel.org
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Moving btf_resolve_size into __btf_resolve_size and
keeping btf_resolve_size public with just first 3
arguments, because the rest of the arguments are not
used by outside callers.
Following changes are adding more arguments, which
are not useful to outside callers. They will be added
to the __btf_resolve_size function.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200825192124.710397-4-jolsa@kernel.org
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The CC_CAN_LINK checks that the host compiler can link, but bpf_preload
relies on libbpf which in turn needs libelf to be present during linking.
allmodconfig runs in odd setups with cross compilers and missing host
libraries like libelf. Instead of extending kconfig with every possible
library that bpf_preload might need disallow building BPF_PRELOAD in
such build-only configurations.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used
in LSM programs. These helpers are not used for tracing programs
(currently) as their usage is tied to the life-cycle of the object and
should only be used where the owning object won't be freed (when the
owning object is passed as an argument to the LSM hook). Thus, they
are safer to use in LSM hooks than tracing. Usage of local storage in
tracing programs will probably follow a per function based whitelist
approach.
Since the UAPI helper signature for bpf_sk_storage expect a bpf_sock,
it, leads to a compilation warning for LSM programs, it's also updated
to accept a void * pointer instead.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200825182919.1118197-7-kpsingh@chromium.org
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Similar to bpf_local_storage for sockets, add local storage for inodes.
The life-cycle of storage is managed with the life-cycle of the inode.
i.e. the storage is destroyed along with the owning inode.
The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the
security blob which are now stackable and can co-exist with other LSMs.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200825182919.1118197-6-kpsingh@chromium.org
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
A purely mechanical change:
bpf_sk_storage.c = bpf_sk_storage.c + bpf_local_storage.c
bpf_sk_storage.h = bpf_sk_storage.h + bpf_local_storage.h
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200825182919.1118197-5-kpsingh@chromium.org
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Allow calling bpf_map_update_elem on sockmap and sockhash from a BPF
context. The synchronization required for this is a bit fiddly: we
need to prevent the socket from changing its state while we add it
to the sockmap, since we rely on getting a callback via
sk_prot->unhash. However, we can't just lock_sock like in
sock_map_sk_acquire because that might sleep. So instead we disable
softirq processing and use bh_lock_sock to prevent further
modification.
Yet, this is still not enough. BPF can be called in contexts where
the current CPU might have locked a socket. If the BPF can get
a hold of such a socket, inserting it into a sockmap would lead to
a deadlock. One straight forward example are sock_ops programs that
have ctx->sk, but the same problem exists for kprobes, etc.
We deal with this by allowing sockmap updates only from known safe
contexts. Improper usage is rejected by the verifier.
I've audited the enabled contexts to make sure they can't run in
a locked context. It's possible that CGROUP_SKB and others are
safe as well, but the auditing here is much more difficult. In
any case, we can extend the safe contexts when the need arises.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200821102948.21918-6-lmb@cloudflare.com
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The verifier assumes that map values are simple blobs of memory, and
therefore treats ARG_PTR_TO_MAP_VALUE, etc. as such. However, there are
map types where this isn't true. For example, sockmap and sockhash store
sockets. In general this isn't a big problem: we can just
write helpers that explicitly requests PTR_TO_SOCKET instead of
ARG_PTR_TO_MAP_VALUE.
The one exception are the standard map helpers like map_update_elem,
map_lookup_elem, etc. Here it would be nice we could overload the
function prototype for different kinds of maps. Unfortunately, this
isn't entirely straight forward:
We only know the type of the map once we have resolved meta->map_ptr
in check_func_arg. This means we can't swap out the prototype
in check_helper_call until we're half way through the function.
Instead, modify check_func_arg to treat ARG_PTR_TO_MAP_VALUE to
mean "the native type for the map" instead of "pointer to memory"
for sockmap and sockhash. This means we don't have to modify the
function prototype at all
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200821102948.21918-5-lmb@cloudflare.com
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Don't go via map->ops to call sock_map_update_elem, since we know
what function to call in bpf_map_update_value. Since we currently
don't allow calling map_update_elem from BPF context, we can remove
ops->map_update_elem and rename the function to sock_map_update_elem_sys.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200821102948.21918-4-lmb@cloudflare.com
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
For bpf_map_elem and bpf_sk_local_storage bpf iterators,
additional map_id should be shown for fdinfo and
userspace query. For example, the following is for
a bpf_map_elem iterator.
$ cat /proc/1753/fdinfo/9
pos: 0
flags: 02000000
mnt_id: 14
link_type: iter
link_id: 34
prog_tag: 104be6d3fe45e6aa
prog_id: 173
target_name: bpf_map_elem
map_id: 127
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200821184419.574240-1-yhs@fb.com
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
This patch implemented bpf_link callback functions
show_fdinfo and fill_link_info to support link_query
interface.
The general interface for show_fdinfo and fill_link_info
will print/fill the target_name. Each targets can
register show_fdinfo and fill_link_info callbacks
to print/fill more target specific information.
For example, the below is a fdinfo result for a bpf
task iterator.
$ cat /proc/1749/fdinfo/7
pos: 0
flags: 02000000
mnt_id: 14
link_type: iter
link_id: 11
prog_tag: 990e1f8152f7e54f
prog_id: 59
target_name: task
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200821184418.574122-1-yhs@fb.com
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Add kernel module with user mode driver that populates bpffs with
BPF iterators.
$ mount bpffs /my/bpffs/ -t bpf
$ ls -la /my/bpffs/
total 4
drwxrwxrwt 2 root root 0 Jul 2 00:27 .
drwxr-xr-x 19 root root 4096 Jul 2 00:09 ..
-rw------- 1 root root 0 Jul 2 00:27 maps.debug
-rw------- 1 root root 0 Jul 2 00:27 progs.debug
The user mode driver will load BPF Type Formats, create BPF maps, populate BPF
maps, load two BPF programs, attach them to BPF iterators, and finally send two
bpf_link IDs back to the kernel.
The kernel will pin two bpf_links into newly mounted bpffs instance under
names "progs.debug" and "maps.debug". These two files become human readable.
$ cat /my/bpffs/progs.debug
id name attached
11 dump_bpf_map bpf_iter_bpf_map
12 dump_bpf_prog bpf_iter_bpf_prog
27 test_pkt_access
32 test_main test_pkt_access test_pkt_access
33 test_subprog1 test_pkt_access_subprog1 test_pkt_access
34 test_subprog2 test_pkt_access_subprog2 test_pkt_access
35 test_subprog3 test_pkt_access_subprog3 test_pkt_access
36 new_get_skb_len get_skb_len test_pkt_access
37 new_get_skb_ifindex get_skb_ifindex test_pkt_access
38 new_get_constant get_constant test_pkt_access
The BPF program dump_bpf_prog() in iterators.bpf.c is printing this data about
all BPF programs currently loaded in the system. This information is unstable
and will change from kernel to kernel as ".debug" suffix conveys.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200819042759.51280-4-alexei.starovoitov@gmail.com
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The program and map iterators work similar to seq_file-s.
Once the program is pinned in bpffs it can be read with "cat" tool
to print human readable output. In this case about BPF programs and maps.
For example:
$ cat /sys/fs/bpf/progs.debug
id name attached
5 dump_bpf_map bpf_iter_bpf_map
6 dump_bpf_prog bpf_iter_bpf_prog
$ cat /sys/fs/bpf/maps.debug
id name max_entries
3 iterator.rodata 1
To avoid kernel build dependency on clang 10 separate bpf skeleton generation
into manual "make" step and instead check-in generated .skel.h into git.
Unlike 'bpftool prog show' in-kernel BTF name is used (when available)
to print full name of BPF program instead of 16-byte truncated name.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200819042759.51280-3-alexei.starovoitov@gmail.com
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Refactor the code a bit to extract bpf_link_by_id() helper.
It's similar to existing bpf_prog_by_id().
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200819042759.51280-2-alexei.starovoitov@gmail.com
|
| |\ \ \ \ \ \ \ \
| |_|_|_|_|_|/ /
|/| | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
"Most of the changes are cleanups and reorganization to make the
objtool code more arch-agnostic. This is in preparation for non-x86
support.
Other changes:
- KASAN fixes
- Handle unreachable trap after call to noreturn functions better
- Ignore unreachable fake jumps
- Misc smaller fixes & cleanups"
* tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
perf build: Allow nested externs to enable BUILD_BUG() usage
objtool: Allow nested externs to enable BUILD_BUG()
objtool: Permit __kasan_check_{read,write} under UACCESS
objtool: Ignore unreachable trap after call to noreturn functions
objtool: Handle calling non-function symbols in other sections
objtool: Ignore unreachable fake jumps
objtool: Remove useless tests before save_reg()
objtool: Decode unwind hint register depending on architecture
objtool: Make unwind hint definitions available to other architectures
objtool: Only include valid definitions depending on source file type
objtool: Rename frame.h -> objtool.h
objtool: Refactor jump table code to support other architectures
objtool: Make relocation in alternative handling arch dependent
objtool: Abstract alternative special case handling
objtool: Move macros describing structures to arch-dependent code
objtool: Make sync-check consider the target architecture
objtool: Group headers to check in a single list
objtool: Define 'struct orc_entry' only when needed
objtool: Skip ORC entry creation for non-text sections
objtool: Move ORC logic out of check()
...
|
| | | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Header frame.h is getting more code annotations to help objtool analyze
object files.
Rename the file to objtool.h.
[ jpoimboe: add objtool.h to MAINTAINERS ]
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
| | |_|_|_|_|/ /
|/| | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Simon reported an issue with the current scalar32_min_max_or() implementation.
That is, compared to the other 32 bit subreg tracking functions, the code in
scalar32_min_max_or() stands out that it's using the 64 bit registers instead
of 32 bit ones. This leads to bounds tracking issues, for example:
[...]
8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
8: (79) r1 = *(u64 *)(r0 +0)
R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
9: (b7) r0 = 1
10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
10: (18) r2 = 0x600000002
12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
12: (ad) if r1 < r2 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: (95) exit
14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
14: (25) if r1 > 0x0 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: (95) exit
16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
16: (47) r1 |= 0
17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x1; 0x700000000),s32_max_value=1,u32_max_value=1) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
[...]
The bound tests on the map value force the upper unsigned bound to be 25769803777
in 64 bit (0b11000000000000000000000000000000001) and then lower one to be 1. By
using OR they are truncated and thus result in the range [1,1] for the 32 bit reg
tracker. This is incorrect given the only thing we know is that the value must be
positive and thus 2147483647 (0b1111111111111111111111111111111) at max for the
subregs. Fix it by using the {u,s}32_{min,max}_value vars instead. This also makes
sense, for example, for the case where we update dst_reg->s32_{min,max}_value in
the else branch we need to use the newly computed dst_reg->u32_{min,max}_value as
we know that these are positive. Previously, in the else branch the 64 bit values
of umin_value=1 and umax_value=32212254719 were used and latter got truncated to
be 1 as upper bound there. After the fix the subreg range is now correct:
[...]
8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
8: (79) r1 = *(u64 *)(r0 +0)
R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
9: (b7) r0 = 1
10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
10: (18) r2 = 0x600000002
12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
12: (ad) if r1 < r2 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: (95) exit
14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
14: (25) if r1 > 0x0 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: (95) exit
16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
16: (47) r1 |= 0
17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
[...]
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Simon Scannell <scannell.smn@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
| | |_|_|_|/ /
|/| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
If BTF data is missing or removed from the ELF section it is still exported
via sysfs as a zero-length file:
root@OpenWrt:/# ls -l /sys/kernel/btf/vmlinux
-r--r--r-- 1 root root 0 Jul 18 02:59 /sys/kernel/btf/vmlinux
Moreover, reads from this file succeed and leak kernel data:
root@OpenWrt:/# hexdump -C /sys/kernel/btf/vmlinux|head -10
000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000cc0 00 00 00 00 00 00 00 00 00 00 00 00 80 83 b0 80 |................|
000cd0 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ce0 00 00 00 00 00 00 00 00 00 00 00 00 57 ac 6e 9d |............W.n.|
000cf0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
002650 00 00 00 00 00 00 00 10 00 00 00 01 00 00 00 01 |................|
002660 80 82 9a c4 80 85 97 80 81 a9 51 68 00 00 00 02 |..........Qh....|
002670 80 25 44 dc 80 85 97 80 81 a9 50 24 81 ab c4 60 |.%D.......P$...`|
This situation was first observed with kernel 5.4.x, cross-compiled for a
MIPS target system. Fix by adding a sanity-check for export of zero-length
data sections.
Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/b38db205a66238f70823039a8c531535864eaac5.1600417359.git.Tony.Ambardar@gmail.com
|
| |\ \ \ \ \ \
| |_|_|_|/ /
|/| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Alexei Starovoitov says:
====================
pull-request: bpf 2020-09-15
The following pull-request contains BPF updates for your *net* tree.
We've added 12 non-merge commits during the last 19 day(s) which contain
a total of 10 files changed, 47 insertions(+), 38 deletions(-).
The main changes are:
1) docs/bpf fixes, from Andrii.
2) ld_abs fix, from Daniel.
3) socket casting helpers fix, from Martin.
4) hash iterator fixes, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Running selftest
./btf_btf -p
the kernel had the following warning:
[ 51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300
[ 51.529217] Modules linked in:
[ 51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878
[ 51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
[ 51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300
...
[ 51.542826] Call Trace:
[ 51.543119] map_seq_next+0x53/0x80
[ 51.543528] seq_read+0x263/0x400
[ 51.543932] vfs_read+0xad/0x1c0
[ 51.544311] ksys_read+0x5f/0xe0
[ 51.544689] do_syscall_64+0x33/0x40
[ 51.545116] entry_SYSCALL_64_after_hwframe+0x44/0xa9
The related source code in kernel/bpf/hashtab.c:
709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
710 {
711 struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
712 struct hlist_nulls_head *head;
713 struct htab_elem *l, *next_l;
714 u32 hash, key_size;
715 int i = 0;
716
717 WARN_ON_ONCE(!rcu_read_lock_held());
In kernel/bpf/inode.c, bpffs map pretty print calls map->ops->map_get_next_key()
without holding a rcu_read_lock(), hence causing the above warning.
To fix the issue, just surrounding map->ops->map_get_next_key() with rcu read lock.
Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200916004401.146277-1-yhs@fb.com
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Currently, for hashmap, the bpf iterator will grab a bucket lock, a
spinlock, before traversing the elements in the bucket. This can ensure
all bpf visted elements are valid. But this mechanism may cause
deadlock if update/deletion happens to the same bucket of the
visited map in the program. For example, if we added bpf_map_update_elem()
call to the same visited element in selftests bpf_iter_bpf_hash_map.c,
we will have the following deadlock:
============================================
WARNING: possible recursive locking detected
5.9.0-rc1+ #841 Not tainted
--------------------------------------------
test_progs/1750 is trying to acquire lock:
ffff9a5bb73c5e70 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x1cf/0x410
but task is already holding lock:
ffff9a5bb73c5e20 (&htab->buckets[i].raw_lock){....}-{2:2}, at: bpf_hash_map_seq_find_next+0x94/0x120
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&htab->buckets[i].raw_lock);
lock(&htab->buckets[i].raw_lock);
*** DEADLOCK ***
...
Call Trace:
dump_stack+0x78/0xa0
__lock_acquire.cold.74+0x209/0x2e3
lock_acquire+0xba/0x380
? htab_map_update_elem+0x1cf/0x410
? __lock_acquire+0x639/0x20c0
_raw_spin_lock_irqsave+0x3b/0x80
? htab_map_update_elem+0x1cf/0x410
htab_map_update_elem+0x1cf/0x410
? lock_acquire+0xba/0x380
bpf_prog_ad6dab10433b135d_dump_bpf_hash_map+0x88/0xa9c
? find_held_lock+0x34/0xa0
bpf_iter_run_prog+0x81/0x16e
__bpf_hash_map_seq_show+0x145/0x180
bpf_seq_read+0xff/0x3d0
vfs_read+0xad/0x1c0
ksys_read+0x5f/0xe0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
The bucket_lock first grabbed in seq_ops->next() called by bpf_seq_read(),
and then grabbed again in htab_map_update_elem() in the bpf program, causing
deadlocks.
Actually, we do not need bucket_lock here, we can just use rcu_read_lock()
similar to netlink iterator where the rcu_read_{lock,unlock} likes below:
seq_ops->start():
rcu_read_lock();
seq_ops->next():
rcu_read_unlock();
/* next element */
rcu_read_lock();
seq_ops->stop();
rcu_read_unlock();
Compared to old bucket_lock mechanism, if concurrent updata/delete happens,
we may visit stale elements, miss some elements, or repeat some elements.
I think this is a reasonable compromise. For users wanting to avoid
stale, missing/repeated accesses, bpf_map batch access syscall interface
can be used.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200902235340.2001375-1-yhs@fb.com
|
| |\| | | | |
| |/ / / /
|/| | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull networking fixes from David Miller:
1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
Kivilinna.
2) Fix loss of RTT samples in rxrpc, from David Howells.
3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu.
4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.
5) We disable BH for too lokng in sctp_get_port_local(), add a
cond_resched() here as well, from Xin Long.
6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.
7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
Yonghong Song.
8) Missing of_node_put() in mt7530 DSA driver, from Sumera
Priyadarsini.
9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.
10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.
11) Memory leak in rxkad_verify_response, from Dinghao Liu.
12) In tipc, don't use smp_processor_id() in preemptible context. From
Tuong Lien.
13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.
14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter.
15) Fix ABI mismatch between driver and firmware in nfp, from Louis
Peens.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
net/smc: fix sock refcounting in case of termination
net/smc: reset sndbuf_desc if freed
net/smc: set rx_off for SMCR explicitly
net/smc: fix toleration of fake add_link messages
tg3: Fix soft lockup when tg3_reset_task() fails.
doc: net: dsa: Fix typo in config code sample
net: dp83867: Fix WoL SecureOn password
nfp: flower: fix ABI mismatch between driver and firmware
tipc: fix shutdown() of connectionless socket
ipv6: Fix sysctl max for fib_multipath_hash_policy
drivers/net/wan/hdlc: Change the default of hard_header_len to 0
net: gemini: Fix another missing clk_disable_unprepare() in probe
net: bcmgenet: fix mask check in bcmgenet_validate_flow()
amd-xgbe: Add support for new port mode
net: usb: dm9601: Add USB ID of Keenetic Plus DSL
vhost: fix typo in error message
net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
pktgen: fix error message with wrong function name
net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
cxgb4: fix thermal zone device registration
...
|
| | |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit f2e10bff16a0 ("bpf: Add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link")
added link query for raw_tp. One of fields in link_info is to
fill a user buffer with tp_name. The Scurrent checking only
declares "ulen && !ubuf" as invalid. So "!ulen && ubuf" will be
valid. Later on, we do "copy_to_user(ubuf, tp_name, ulen - 1)" which
may overwrite user memory incorrectly.
This patch fixed the problem by disallowing "!ulen && ubuf" case as well.
Fixes: f2e10bff16a0 ("bpf: Add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200821191054.714731-1-yhs@fb.com
|
| |/ / /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently when traversing all tasks, the next tid
is always increased by one. This may result in
visiting the same task multiple times in a
pid namespace.
This patch fixed the issue by seting the next
tid as pid_nr_ns(pid, ns) + 1, similar to
funciton next_tgid().
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/bpf/20200818222310.2181500-1-yhs@fb.com
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In our production system, we observed rcu stalls when
'bpftool prog` is running.
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: \x097-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
\x09(t=21031 jiffies g=2534773 q=179750)
NMI backtrace for cpu 7
CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G W 5.8.0-00004-g68bfc7f8c1b4 #6
Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
Call Trace:
<IRQ>
dump_stack+0x57/0x70
nmi_cpu_backtrace.cold+0x14/0x53
? lapic_can_unplug_cpu.cold+0x39/0x39
nmi_trigger_cpumask_backtrace+0xb7/0xc7
rcu_dump_cpu_stacks+0xa2/0xd0
rcu_sched_clock_irq.cold+0x1ff/0x3d9
? tick_nohz_handler+0x100/0x100
update_process_times+0x5b/0x90
tick_sched_timer+0x5e/0xf0
__hrtimer_run_queues+0x12a/0x2a0
hrtimer_interrupt+0x10e/0x280
__sysvec_apic_timer_interrupt+0x51/0xe0
asm_call_on_stack+0xf/0x20
</IRQ>
sysvec_apic_timer_interrupt+0x6f/0x80
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0010:task_file_seq_get_next+0x71/0x220
Code: 00 00 8b 53 1c 49 8b 7d 00 89 d6 48 8b 47 20 44 8b 18 41 39 d3 76 75 48 8b 4f 20 8b 01 39 d0 76 61 41 89 d1 49 39 c1 48 19 c0 <48> 8b 49 08 21 d0 48 8d 04 c1 4c 8b 08 4d 85 c9 74 46 49 8b 41 38
RSP: 0018:ffffc90006223e10 EFLAGS: 00000297
RAX: ffffffffffffffff RBX: ffff888f0d172388 RCX: ffff888c8c07c1c0
RDX: 00000000000f017b RSI: 00000000000f017b RDI: ffff888c254702c0
RBP: ffffc90006223e68 R08: ffff888be2a1c140 R09: 00000000000f017b
R10: 0000000000000002 R11: 0000000000100000 R12: ffff888f23c24118
R13: ffffc90006223e60 R14: ffffffff828509a0 R15: 00000000ffffffff
task_file_seq_next+0x52/0xa0
bpf_seq_read+0xb9/0x320
vfs_read+0x9d/0x180
ksys_read+0x5f/0xe0
do_syscall_64+0x38/0x60
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f8815f4f76e
Code: c0 e9 f6 fe ff ff 55 48 8d 3d 76 70 0a 00 48 89 e5 e8 36 06 02 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 52 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
RSP: 002b:00007fff8f9df578 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000000000170b9c0 RCX: 00007f8815f4f76e
RDX: 0000000000001000 RSI: 00007fff8f9df5b0 RDI: 0000000000000007
RBP: 00007fff8f9e05f0 R08: 0000000000000049 R09: 0000000000000010
R10: 00007f881601fa40 R11: 0000000000000246 R12: 00007fff8f9e05a8
R13: 00007fff8f9e05a8 R14: 0000000001917f90 R15: 000000000000e22e
Note that `bpftool prog` actually calls a task_file bpf iterator
program to establish an association between prog/map/link/btf anon
files and processes.
In the case where the above rcu stall occured, we had a process
having 1587 tasks and each task having roughly 81305 files.
This implied 129 million bpf prog invocations. Unfortunwtely none of
these files are prog/map/link/btf files so bpf iterator/prog needs
to traverse all these files and not able to return to user space
since there are no seq_file buffer overflow.
This patch fixed the issue in bpf_seq_read() to limit the number
of visited objects. If the maximum number of visited objects is
reached, no more objects will be visited in the current syscall.
If there is nothing written in the seq_file buffer, -EAGAIN will
return to the user so user can try again.
The maximum number of visited objects is set at 1 million.
In our Intel Xeon D-2191 2.3GHZ 18-core server, bpf_seq_read()
visiting 1 million files takes around 0.18 seconds.
We did not use cond_resched() since for some iterators, e.g.,
netlink iterator, where rcu read_lock critical section spans between
consecutive seq_ops->next(), which makes impossible to do cond_resched()
in the key while loop of function bpf_seq_read().
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/bpf/20200818222309.2181348-1-yhs@fb.com
|
| |/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With latest `bpftool prog` command, we observed the following kernel
panic.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD dfe894067 P4D dfe894067 PUD deb663067 PMD 0
Oops: 0010 [#1] SMP
CPU: 9 PID: 6023 ...
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0000:ffffc900002b8f18 EFLAGS: 00010286
RAX: ffff8883a405f400 RBX: ffff888e46a6bf00 RCX: 000000008020000c
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883a405f400
RBP: ffff888e46a6bf50 R08: 0000000000000000 R09: ffffffff81129600
R10: ffff8883a405f300 R11: 0000160000000000 R12: 0000000000002710
R13: 000000e9494b690c R14: 0000000000000202 R15: 0000000000000009
FS: 00007fd9187fe700(0000) GS:ffff888e46a40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000de5d33002 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
rcu_core+0x1a4/0x440
__do_softirq+0xd3/0x2c8
irq_exit+0x9d/0xa0
smp_apic_timer_interrupt+0x68/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0033:0x47ce80
Code: Bad RIP value.
RSP: 002b:00007fd9187fba40 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000002 RBX: 00007fd931789160 RCX: 000000000000010c
RDX: 00007fd9308cdfb4 RSI: 00007fd9308cdfb4 RDI: 00007ffedd1ea0a8
RBP: 00007fd9187fbab0 R08: 000000000000000e R09: 000000000000002a
R10: 0000000000480210 R11: 00007fd9187fc570 R12: 00007fd9316cc400
R13: 0000000000000118 R14: 00007fd9308cdfb4 R15: 00007fd9317a9380
After further analysis, the bug is triggered by
Commit eaaacd23910f ("bpf: Add task and task/file iterator targets")
which introduced task_file bpf iterator, which traverses all open file
descriptors for all tasks in the current namespace.
The latest `bpftool prog` calls a task_file bpf program to traverse
all files in the system in order to associate processes with progs/maps, etc.
When traversing files for a given task, rcu read_lock is taken to
access all files in a file_struct. But it used get_file() to grab
a file, which is not right. It is possible file->f_count is 0 and
get_file() will unconditionally increase it.
Later put_file() may cause all kind of issues with the above
as one of sympotoms.
The failure can be reproduced with the following steps in a few seconds:
$ cat t.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define N 10000
int fd[N];
int main() {
int i;
for (i = 0; i < N; i++) {
fd[i] = open("./note.txt", 'r');
if (fd[i] < 0) {
fprintf(stderr, "failed\n");
return -1;
}
}
for (i = 0; i < N; i++)
close(fd[i]);
return 0;
}
$ gcc -O2 t.c
$ cat run.sh
#/bin/bash
for i in {1..100}
do
while true; do ./a.out; done &
done
$ ./run.sh
$ while true; do bpftool prog >& /dev/null; done
This patch used get_file_rcu() which only grabs a file if the
file->f_count is not zero. This is to ensure the file pointer
is always valid. The above reproducer did not fail for more
than 30 minutes.
Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/bpf/20200817174214.252601-1-yhs@fb.com
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently when we look for build id within bpf_get_stackid helper
call, we check the first NOTE section and we fail if build id is
not there.
However on some system (Fedora) there can be multiple NOTE sections
in binaries and build id data is not always the first one, like:
$ readelf -a /usr/bin/ls
...
[ 2] .note.gnu.propert NOTE 0000000000000338 00000338
0000000000000020 0000000000000000 A 0 0 8358
[ 3] .note.gnu.build-i NOTE 0000000000000358 00000358
0000000000000024 0000000000000000 A 0 0 437c
[ 4] .note.ABI-tag NOTE 000000000000037c 0000037c
...
So the stack_map_get_build_id function will fail on build id retrieval
and fallback to BPF_STACK_BUILD_ID_IP.
This patch is changing the stack_map_get_build_id code to iterate
through all the NOTE sections and try to get build id data from
each of them.
When tracing on sched_switch tracepoint that does bpf_get_stackid
helper call kernel build, I can see about 60% increase of successful
build id retrieval. The rest seems fails on -EFAULT.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200812123102.20032-1-jolsa@kernel.org
|
| |
|
|
|
|
|
|
| |
Drop repeated words in kernel/bpf/: {has, the}
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200807033141.10437-1-rdunlap@infradead.org
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit a5cbe05a6673 ("bpf: Implement bpf iterator for
map elements") added bpf iterator support for
map elements. The map element bpf iterator requires
info to identify a particular map. In the above
commit, the attr->link_create.target_fd is used
to carry map_fd and an enum bpf_iter_link_info
is added to uapi to specify the target_fd actually
representing a map_fd:
enum bpf_iter_link_info {
BPF_ITER_LINK_UNSPEC = 0,
BPF_ITER_LINK_MAP_FD = 1,
MAX_BPF_ITER_LINK_INFO,
};
This is an extensible approach as we can grow
enumerator for pid, cgroup_id, etc. and we can
unionize target_fd for pid, cgroup_id, etc.
But in the future, there are chances that
more complex customization may happen, e.g.,
for tasks, it could be filtered based on
both cgroup_id and user_id.
This patch changed the uapi to have fields
__aligned_u64 iter_info;
__u32 iter_info_len;
for additional iter_info for link_create.
The iter_info is defined as
union bpf_iter_link_info {
struct {
__u32 map_fd;
} map;
};
So future extension for additional customization
will be easier. The bpf_iter_link_info will be
passed to target callback to validate and generic
bpf_iter framework does not need to deal it any
more.
Note that map_fd = 0 will be considered invalid
and -EBADF will be returned to user space.
Fixes: a5cbe05a6673 ("bpf: Implement bpf iterator for map elements")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com
|
| |\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-08-04
The following pull-request contains BPF updates for your *net-next* tree.
We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).
The main changes are:
1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.
2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
in-kernel inspection, from Yonghong Song and Alexei Starovoitov.
3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
unwinder errors, from Song Liu.
4) Allow cgroup local storage map to be shared between programs on the same
cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.
5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
load instructions, from Jean-Philippe Brucker.
6) Follow-up fixes on BPF socket lookup in combination with reuseport group
handling. Also add related BPF selftests, from Jakub Sitnicki.
7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
socket create/release as well as bind functions, from Stanislav Fomichev.
8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
xdp_statistics, from Peilin Ye.
9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.
10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.
11) Fix a bpftool segfault due to missing program type name and make it more robust
to prevent them in future gaps, from Quentin Monnet.
12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
resolver issue, from John Fastabend.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add LINK_DETACH command to force-detach bpf_link without destroying it. It has
the same behavior as auto-detaching of bpf_link due to cgroup dying for
bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case,
bpf_link is still a valid kernel object, but is defuncts and doesn't hold BPF
program attached to corresponding BPF hook. This functionality allows users
with enough access rights to manually force-detach attached bpf_link without
killing respective owner process.
This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly
re-using existing link release handling code.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-2-andriin@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Newline characters are added in two verifier error messages,
refactored in Commit afbf21dce668 ("bpf: Support readonly/readwrite
buffers in verifier"). This way, they do not mix with
messages afterwards.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728221801.1090349-1-yhs@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Entire net/core subsystem is not built without CONFIG_NET. linux/netdevice.h
just assumes that it's always there, so the easiest way to fix this is to
conditionally compile out bpf_xdp_link_attach() use in bpf/syscall.c.
Fixes: aa8d3a716b59 ("bpf, xdp: Add bpf_link-based XDP attachment API")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728190527.110830-1-andriin@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are a couple of arguments of the boolean flag zero_size_allowed and
the char pointer buf_info when calling to function check_buffer_access that
are swapped by mistake. Fix these by swapping them to correct the argument
ordering.
Fixes: afbf21dce668 ("bpf: Support readonly/readwrite buffers in verifier")
Addresses-Coverity: ("Array compared to 0")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200727175411.155179-1-colin.king@canonical.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through
BPF_LINK_CREATE command.
bpf_xdp_link is mutually exclusive with direct BPF program attachment,
previous BPF program should be detached prior to attempting to create a new
bpf_xdp_link attachment (for a given XDP mode). Once BPF link is attached, it
can't be replaced by other BPF program attachment or link attachment. It will
be detached only when the last BPF link FD is closed.
bpf_xdp_link will be auto-detached when net_device is shutdown, similarly to
how other BPF links behave (cgroup, flow_dissector). At that point bpf_link
will become defunct, but won't be destroyed until last FD is closed.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Architectures like s390, powerpc, arm64, riscv have speical definition of
bpf_user_pt_regs_t. So we need to cast the pointer before passing it to
bpf_get_stack(). This is similar to bpf_get_stack_tp().
Fixes: 03d42fd2d83f ("bpf: Separate bpf_get_[stack|stackid] for perf events BPF")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200724200503.3629591-1-songliubraving@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
local_storage.o has its compile guard as CONFIG_BPF_SYSCALL, which
does not imply that CONFIG_CGROUP is on. Including cgroup-internal.h
when CONFIG_CGROUP is off cause a compilation failure.
Fixes: f67cfc233706 ("bpf: Make cgroup storages shared between programs on the same cgroup")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200724211753.902969-1-zhuyifei1999@gmail.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This change comes in several parts:
One, the restriction that the CGROUP_STORAGE map can only be used
by one program is removed. This results in the removal of the field
'aux' in struct bpf_cgroup_storage_map, and removal of relevant
code associated with the field, and removal of now-noop functions
bpf_free_cgroup_storage and bpf_cgroup_storage_release.
Second, we permit a key of type u64 as the key to the map.
Providing such a key type indicates that the map should ignore
attach type when comparing map keys. However, for simplicity newly
linked storage will still have the attach type at link time in
its key struct. cgroup_storage_check_btf is adapted to accept
u64 as the type of the key.
Third, because the storages are now shared, the storages cannot
be unconditionally freed on program detach. There could be two
ways to solve this issue:
* A. Reference count the usage of the storages, and free when the
last program is detached.
* B. Free only when the storage is impossible to be referred to
again, i.e. when either the cgroup_bpf it is attached to, or
the map itself, is freed.
Option A has the side effect that, when the user detach and
reattach a program, whether the program gets a fresh storage
depends on whether there is another program attached using that
storage. This could trigger races if the user is multi-threaded,
and since nondeterminism in data races is evil, go with option B.
The both the map and the cgroup_bpf now tracks their associated
storages, and the storage unlink and free are removed from
cgroup_bpf_detach and added to cgroup_bpf_release and
cgroup_storage_map_free. The latter also new holds the cgroup_mutex
to prevent any races with the former.
Fourth, on attach, we reuse the old storage if the key already
exists in the map, via cgroup_storage_lookup. If the storage
does not exist yet, we create a new one, and publish it at the
last step in the attach process. This does not create a race
condition because for the whole attach the cgroup_mutex is held.
We keep track of an array of new storages that was allocated
and if the process fails only the new storages would get freed.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/d5401c6106728a00890401190db40020a1f84ff1.1595565795.git.zhuyifei@google.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
bpf_get_[stack|stackid] on perf_events with precise_ip uses callchain
attached to perf_sample_data. If this callchain is not presented, do not
allow attaching BPF program that calls bpf_get_[stack|stackid] to this
event.
In the error case, -EPROTO is returned so that libbpf can identify this
error and print proper hint message.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-3-songliubraving@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the callchain is fetched early. Such
perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
also cause unwinder errors. To fix this, add separate version of these
two helpers, bpf_get_[stack|stackid]_pe. These two hepers use callchain in
bpf_perf_event_data_kern->data->callchain.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-2-songliubraving@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The bpf iterators for array and percpu array
are implemented. Similar to hash maps, for percpu
array map, bpf program will receive values
from all cpus.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184115.590532-1-yhs@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The bpf iterators for hash, percpu hash, lru hash
and lru percpu hash are implemented. During link time,
bpf_iter_reg->check_target() will check map type
and ensure the program access key/value region is
within the map defined key/value size limit.
For percpu hash and lru hash maps, the bpf program
will receive values for all cpus. The map element
bpf iterator infrastructure will prepare value
properly before passing the value pointer to the
bpf program.
This patch set supports readonly map keys and
read/write map values. It does not support deleting
map elements, e.g., from hash tables. If there is
a user case for this, the following mechanism can
be used to support map deletion for hashtab, etc.
- permit a new bpf program return value, e.g., 2,
to let bpf iterator know the map element should
be removed.
- since bucket lock is taken, the map element will be
queued.
- once bucket lock is released after all elements under
this bucket are traversed, all to-be-deleted map
elements can be deleted.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184114.590470-1-yhs@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The bpf iterator for map elements are implemented.
The bpf program will receive four parameters:
bpf_iter_meta *meta: the meta data
bpf_map *map: the bpf_map whose elements are traversed
void *key: the key of one element
void *value: the value of the same element
Here, meta and map pointers are always valid, and
key has register type PTR_TO_RDONLY_BUF_OR_NULL and
value has register type PTR_TO_RDWR_BUF_OR_NULL.
The kernel will track the access range of key and value
during verification time. Later, these values will be compared
against the values in the actual map to ensure all accesses
are within range.
A new field iter_seq_info is added to bpf_map_ops which
is used to add map type specific information, i.e., seq_ops,
init/fini seq_file func and seq_file private data size.
Subsequent patches will have actual implementation
for bpf_map_ops->iter_seq_info.
In user space, BPF_ITER_LINK_MAP_FD needs to be
specified in prog attr->link_create.flags, which indicates
that attr->link_create.target_fd is a map_fd.
The reason for such an explicit flag is for possible
future cases where one bpf iterator may allow more than
one possible customization, e.g., pid and cgroup id for
task_file.
Current kernel internal implementation only allows
the target to register at most one required bpf_iter_link_info.
To support the above case, optional bpf_iter_link_info's
are needed, the target can be extended to register such link
infos, and user provided link_info needs to match one of
target supported ones.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184112.590360-1-yhs@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Readonly and readwrite buffer register states
are introduced. Totally four states,
PTR_TO_RDONLY_BUF[_OR_NULL] and PTR_TO_RDWR_BUF[_OR_NULL]
are supported. As suggested by their respective
names, PTR_TO_RDONLY_BUF[_OR_NULL] are for
readonly buffers and PTR_TO_RDWR_BUF[_OR_NULL]
for read/write buffers.
These new register states will be used
by later bpf map element iterator.
New register states share some similarity to
PTR_TO_TP_BUFFER as it will calculate accessed buffer
size during verification time. The accessed buffer
size will be later compared to other metrics during
later attach/link_create time.
Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
iterator programs, PTR_TO_RDONLY_BUF_OR_NULL or
PTR_TO_RDWR_BUF_OR_NULL reg_types can be set at
prog->aux->bpf_ctx_arg_aux, and bpf verifier will
retrieve the values during btf_ctx_access().
Later bpf map element iterator implementation
will show how such information will be assigned
during target registeration time.
The verifier is also enhanced such that PTR_TO_RDONLY_BUF
can be passed to ARG_PTR_TO_MEM[_OR_NULL] helper argument, and
PTR_TO_RDWR_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] or
ARG_PTR_TO_UNINIT_MEM.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184111.590274-1-yhs@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch refactored target bpf_iter_init_seq_priv_t callback
function to accept additional information. This will be needed
in later patches for map element targets since a particular
map should be passed to traverse elements for that particular
map. In the future, other information may be passed to target
as well, e.g., pid, cgroup id, etc. to customize the iterator.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There is no functionality change for this patch.
Struct bpf_iter_reg is used to register a bpf_iter target,
which includes information for both prog_load, link_create
and seq_file creation.
This patch puts fields related seq_file creation into
a different structure. This will be useful for map
elements iterator where one iterator covers different
map types and different map types may have different
seq_ops, init/fini private_data function and
private_data size.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It's mostly a copy paste of commit 6086d29def80 ("bpf: Add bpf_map iterator")
that is use to implement bpf_seq_file opreations to traverse all bpf programs.
v1->v2: Tweak to use build time btf_id
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
|