Merge branch 'bpf-reuseport-map'

Martin KaFai Lau says: ==================== This series introduces a new map type "BPF_MAP_TYPE_REUSEPORT_SOCKARRAY" and a new prog type BPF_PROG_TYPE_SK_REUSEPORT. Here is a snippet from a commit message: "To unleash the full potential of a bpf prog, it is essential for the userspace to be capable of directly setting up a bpf map which can then be consumed by the bpf prog to make decision. In this case, decide which SO_REUSEPORT sk to serve the incoming request. By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control and visibility on where a SO_REUSEPORT sk should be located in a bpf map. The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that the bpf prog can directly select a sk from the bpf map. That will raise the programmability of the bpf prog attached to a reuseport group (a group of sk serving the same IP:PORT). For example, in UDP, the bpf prog can peek into the payload (e.g. through the "data" pointer introduced in the later patch) to learn the application level's connection information and then decide which sk to pick from a bpf map. The userspace can tightly couple the sk's location in a bpf map with the application logic in generating the UDP payload's connection information. This connection info contact/API stays within the userspace. Also, when used with map-in-map, the userspace can switch the old-server-process's inner map to a new-server-process's inner map in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)". The bpf prog will then direct incoming requests to the new process instead of the old process. The old process can finish draining the pending requests (e.g. by "accept()") before closing the old-fds. [Note that deleting a fd from a bpf map does not necessary mean the fd is closed]" ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
author: Daniel Borkmann <daniel@iogearbox.net> 2018-08-11 01:58:47 +0200
committer: Daniel Borkmann <daniel@iogearbox.net> 2018-08-11 01:58:48 +0200
commit: 9d6f417714c3aaf67b23ffdc1d2b036cce3ecc1c (patch)
tree: 4bc949c2a15c2c0d1cdbb390740f508fe7644623 /include/uapi/linux
parent: 74b247f4c36315e5c08580700a68e0eb3b72de03 (diff)
parent: 91134d849a0e8fbc70b8607d280e0d325dcaf7bb (diff)
download: cachepc-linux-9d6f417714c3aaf67b23ffdc1d2b036cce3ecc1c.tar.gz
cachepc-linux-9d6f417714c3aaf67b23ffdc1d2b036cce3ecc1c.zip
1 files changed, 36 insertions, 1 deletions
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dd5758dc35d3..3102a2a23c31 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -126,6 +126,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_XSKMAP,
 	BPF_MAP_TYPE_SOCKHASH,
 	BPF_MAP_TYPE_CGROUP_STORAGE,
+	BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
 };
 
 enum bpf_prog_type {
@@ -150,6 +151,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 	BPF_PROG_TYPE_LWT_SEG6LOCAL,
 	BPF_PROG_TYPE_LIRC_MODE2,
+	BPF_PROG_TYPE_SK_REUSEPORT,
 };
 
 enum bpf_attach_type {
@@ -2113,6 +2115,14 @@ union bpf_attr {
  *		the shared data.
  *	Return
  *		Pointer to the local storage area.
+ *
+ * int bpf_sk_select_reuseport(struct sk_reuseport_md *reuse, struct bpf_map *map, void *key, u64 flags)
+ *	Description
+ *		Select a SO_REUSEPORT sk from a	BPF_MAP_TYPE_REUSEPORT_ARRAY map
+ *		It checks the selected sk is matching the incoming
+ *		request in the skb.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2196,7 +2206,8 @@ union bpf_attr {
 	FN(rc_keydown),			\
 	FN(skb_cgroup_id),		\
 	FN(get_current_cgroup_id),	\
-	FN(get_local_storage),
+	FN(get_local_storage),		\
+	FN(sk_select_reuseport),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2413,6 +2424,30 @@ struct sk_msg_md {
 	__u32 local_port;	/* stored in host byte order */
 };
 
+struct sk_reuseport_md {
+	/*
+	 * Start of directly accessible data. It begins from
+	 * the tcp/udp header.
+	 */
+	void *data;
+	void *data_end;		/* End of directly accessible data */
+	/*
+	 * Total length of packet (starting from the tcp/udp header).
+	 * Note that the directly accessible bytes (data_end - data)
+	 * could be less than this "len".  Those bytes could be
+	 * indirectly read by a helper "bpf_skb_load_bytes()".
+	 */
+	__u32 len;
+	/*
+	 * Eth protocol in the mac header (network byte order). e.g.
+	 * ETH_P_IP(0x0800) and ETH_P_IPV6(0x86DD)
+	 */
+	__u32 eth_protocol;
+	__u32 ip_protocol;	/* IP protocol. e.g. IPPROTO_TCP, IPPROTO_UDP */
+	__u32 bind_inany;	/* Is sock bound to an INANY address? */
+	__u32 hash;		/* A hash of the packet 4 tuples */
+};
+
 #define BPF_TAG_SIZE	8
 
 struct bpf_prog_info {
author	Daniel Borkmann <daniel@iogearbox.net>	2018-08-11 01:58:47 +0200
committer	Daniel Borkmann <daniel@iogearbox.net>	2018-08-11 01:58:48 +0200
commit	9d6f417714c3aaf67b23ffdc1d2b036cce3ecc1c (patch)
tree	4bc949c2a15c2c0d1cdbb390740f508fe7644623 /include/uapi/linux
parent	74b247f4c36315e5c08580700a68e0eb3b72de03 (diff)
parent	91134d849a0e8fbc70b8607d280e0d325dcaf7bb (diff)
download	cachepc-linux-9d6f417714c3aaf67b23ffdc1d2b036cce3ecc1c.tar.gz cachepc-linux-9d6f417714c3aaf67b23ffdc1d2b036cce3ecc1c.zip