cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

nf_flowtable.rst (9796B)


.. SPDX-License-Identifier: GPL-2.0

====================================
Netfilter's flowtable infrastructure
====================================

This documentation describes the Netfilter flowtable infrastructure, which allows
you to define a fastpath through the flowtable datapath. This infrastructure
also provides hardware offload support. The flowtable supports the layer 3
IPv4 and IPv6 protocols and the layer 4 TCP and UDP protocols.

Overview
--------

Once the first packet of the flow successfully goes through the IP forwarding
path, from the second packet on, you might decide to offload the flow to the
flowtable through your ruleset. The flowtable infrastructure provides a rule
action that allows you to specify when to add a flow to the flowtable.

A packet that finds a matching entry in the flowtable (i.e. a flowtable hit) is
transmitted to the output netdevice via neigh_xmit(); hence, packets bypass the
classic IP forwarding path (the visible effect is that you do not see these
packets from any of the Netfilter hooks coming after ingress). If there is no
matching entry in the flowtable (i.e. a flowtable miss), the packet follows the
classic IP forwarding path.

The flowtable uses a resizable hashtable. Lookups are based on the following
n-tuple selectors: layer 2 protocol encapsulation (VLAN and PPPoE), layer 3
source and destination addresses, layer 4 source and destination ports, and the
input interface (useful in case there are several conntrack zones in place).

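The hit/miss behaviour and the n-tuple key can be pictured with a small Python
model. This is only a sketch under stated assumptions: ``FlowKey``,
``FlowTable`` and ``forward()`` are hypothetical names chosen for illustration,
a plain dict stands in for the resizable hashtable, and nothing here mirrors
the kernel's actual data structures.

```python
from typing import NamedTuple, Optional


class FlowKey(NamedTuple):
    """Hypothetical n-tuple used as the lookup key (illustrative only)."""
    l2_encap: tuple   # e.g. (("vlan", 100),) or () when there is no encapsulation
    l3_src: str       # layer 3 source address
    l3_dst: str       # layer 3 destination address
    l4_proto: str     # "tcp" or "udp"
    sport: int        # layer 4 source port
    dport: int        # layer 4 destination port
    iif: str          # input interface name


class FlowTable:
    """Toy flowtable: a dict stands in for the resizable hashtable."""

    def __init__(self) -> None:
        self._entries = {}  # maps FlowKey -> output device name

    def add(self, key: FlowKey, out_dev: str) -> None:
        self._entries[key] = out_dev

    def lookup(self, key: FlowKey) -> Optional[str]:
        # Hit: the output device is returned. Miss: None.
        return self._entries.get(key)


def forward(table: FlowTable, key: FlowKey) -> str:
    """Describe which path a packet with this key would take."""
    out_dev = table.lookup(key)
    if out_dev is None:
        return "classic IP forwarding path"
    return f"fastpath via {out_dev}"
```

Because the input interface is part of the key, the same addresses and ports
arriving on a different interface produce a miss in this model.
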
The 'flow add' action allows you to populate the flowtable; the user selectively
specifies which flows are placed into the flowtable. Hence, packets follow the
classic IP forwarding path unless the user explicitly instructs flows to use this
new alternative forwarding path via policy.

The flowtable datapath is represented in Fig.1, which describes the classic IP
forwarding path including the Netfilter hooks and the flowtable fastpath bypass.

::

                                         userspace process
                                          ^              |
                                          |              |
                                     _____|____     ____\/___
                                    /          \   /         \
                                    |   input   |  |  output  |
                                    \__________/   \_________/
                                         ^               |
                                         |               |
      _________      __________      ---------     _____\/_____
     /         \    /          \     |Routing |   /            \
  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
     \_________/    \__________/     ----------   \____________/          ^
       |      ^                          |               ^                |
   flowtable  |                     ____\/___            |                |
       |      |                    /         \           |                |
    __\/___   |                    | forward |------------                |
    |-----|   |                    \_________/                            |
    |-----|   |                 'flow offload' rule                       |
    |-----|   |                   adds entry to                           |
    |_____|   |                     flowtable                             |
       |      |                                                           |
      / \     |                                                           |
     /hit\_no_|                                                           |
     \ ? /                                                                |
      \ /                                                                 |
       |__yes_________________fastpath bypass ____________________________|

               Fig.1 Netfilter hooks and flowtable interactions

The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that is specified from the classic IP
forwarding path. The TTL is decremented before calling neigh_xmit(). Fragmented
traffic is passed up to follow the classic IP forwarding path because the
transport header is missing, so flowtable lookups are not possible in this case.
TCP RST and FIN packets are also passed up to the classic IP forwarding path to
release the flow gracefully. Packets that exceed the MTU are also passed up to
the classic forwarding path to report packet-too-big ICMP errors to the sender.

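The pass-up rules in the paragraph above can be summarised as a short decision
function. The following Python sketch is illustrative only: the ``Packet``
fields and the ``must_pass_up()`` and ``fastpath()`` helpers are hypothetical,
and the order of the checks is an assumption rather than a description of the
kernel code.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical packet summary; field names are illustrative."""
    length: int
    ttl: int
    is_fragment: bool = False
    tcp_rst: bool = False
    tcp_fin: bool = False


def must_pass_up(pkt: Packet, mtu: int) -> bool:
    """Return True when the packet must take the classic forwarding path."""
    if pkt.is_fragment:              # transport header missing, no lookup possible
        return True
    if pkt.tcp_rst or pkt.tcp_fin:   # let the flow be released gracefully
        return True
    if pkt.length > mtu:             # classic path reports packet-too-big
        return True
    return False


def fastpath(pkt: Packet, mtu: int) -> str:
    """Model the fate of a packet that already matched a flowtable entry."""
    if must_pass_up(pkt, mtu):
        return "classic"
    pkt.ttl -= 1                     # TTL is decremented before transmission
    return "neigh_xmit"
```

For instance, ``fastpath(Packet(length=100, ttl=64), mtu=1500)`` takes the
bypass and leaves the packet with a TTL of 63, while the same packet marked as
a fragment is handed to the classic path untouched.
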
Example configuration
---------------------

Enabling the flowtable bypass is relatively easy: you only need to create a
flowtable and add one rule to your forward chain::

	table inet x {
		flowtable f {
			hook ingress priority 0; devices = { eth0, eth1 };
		}
		chain y {
			type filter hook forward priority 0; policy accept;
			ip protocol tcp flow add @f
			counter packets 0 bytes 0
		}
	}

This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
perform resource partitioning. The flowtable priority defines the order in which
hooks are run in the pipeline; this is convenient in case you already have an
nftables ingress chain (make sure the flowtable priority is smaller than that of
the nftables ingress chain so that the flowtable runs first in the pipeline).

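The priority rule can be illustrated with a tiny Python model. It is a sketch
only: the hook names and the priority value 10 for the ingress chain are
made-up examples, and ``run_order()`` is a hypothetical helper, not a kernel or
nftables interface.

```python
# Hypothetical hooks registered at ingress, as (priority, name) pairs.
hooks = [(10, "nftables ingress chain"), (0, "flowtable f")]


def run_order(registered):
    """Hooks run in ascending priority order: smaller values run first."""
    return [name for _priority, name in sorted(registered)]
```

With these values the flowtable (priority 0) is listed before the ingress
chain (priority 10), which is the ordering the paragraph above asks for.
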
The 'flow add' action (labelled 'flow offload' in Fig.1) from the forward chain
'y' adds an entry to the flowtable for the TCP syn-ack packet coming in the
reply direction. Once the flow is offloaded, you will observe that the counter
rule in the example above does not get updated for the packets that are being
forwarded through the forwarding bypass.

You can identify offloaded flows through the [OFFLOAD] tag when listing your
connection tracking table.

::

	# conntrack -L
	tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2

Layer 2 encapsulation
---------------------

Since Linux kernel 5.13, the flowtable infrastructure discovers the real
netdevice behind VLAN and PPPoE netdevices. The flowtable software datapath
parses the VLAN and PPPoE layer 2 headers to extract the ethertype and the
VLAN ID / PPPoE session ID, which are used for the flowtable lookups. The
flowtable datapath also deals with layer 2 decapsulation.

You do not need to add the PPPoE and the VLAN devices to your flowtable;
the real device is sufficient for the flowtable to track your flows.

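To make the VLAN part of this parsing concrete, the Python sketch below pulls
the VLAN ID and the encapsulated ethertype out of a raw Ethernet frame. It is
an illustration, not the kernel's parser: ``parse_l2()`` is a hypothetical
helper, and it handles only a single 802.1Q tag (no PPPoE, no stacked tags).

```python
import struct

ETH_P_8021Q = 0x8100  # 802.1Q VLAN tag protocol identifier


def parse_l2(frame: bytes):
    """Return (vlan_id or None, ethertype) for an Ethernet frame."""
    # The ethertype follows the 6-byte destination and source MAC addresses.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_8021Q:
        return None, ethertype
    # A VLAN tag carries a 16-bit TCI followed by the inner ethertype.
    tci, inner_ethertype = struct.unpack_from("!HH", frame, 14)
    return tci & 0x0FFF, inner_ethertype  # low 12 bits of the TCI are the VLAN ID
```

In this model, the returned VLAN ID and ethertype are the values that would
take part in the flowtable lookup key.
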
Bridge and IP forwarding
------------------------

Since Linux kernel 5.13, you can add bridge ports to the flowtable. The
flowtable infrastructure discovers the topology behind the bridge device. This
allows the flowtable to define a fastpath bypass between the bridge ports
(represented as eth1 and eth2 in the example figure below) and the gateway
device (represented as eth0) in your switch/router.

::

                      fastpath bypass
               .-------------------------.
              /                           \
              |           IP forwarding   |
              |          /             \ \/
              |       br0               eth0 ..... eth0
              .       / \                          *host B*
               -> eth1  eth2
                   .           *switch/router*
                   .
                   .
                 eth0
               *host A*

The flowtable infrastructure also supports bridge VLAN filtering actions
such as PVID and untagged. You can also stack a classic VLAN device on top of
your bridge port.

If you would like your flowtable to define a fastpath between your bridge
ports and your IP forwarding path, you have to add your bridge ports (as
represented by the real netdevice) to your flowtable definition.

Counters
--------

The flowtable can synchronize packet and byte counters with the existing
connection tracking entry by specifying the counter statement in your flowtable
definition, e.g.

::

	table inet x {
		flowtable f {
			hook ingress priority 0; devices = { eth0, eth1 };
			counter
		}
	}

Counter support is available since Linux kernel 5.7.

Hardware offload
----------------

If your network device provides hardware offload support, you can turn it on by
means of the 'offload' flag in your flowtable definition, e.g.

::

	table inet x {
		flowtable f {
			hook ingress priority 0; devices = { eth0, eth1 };
			flags offload;
		}
	}

There is a workqueue that adds the flows to the hardware. Note that a few
packets might still run over the flowtable software path until the workqueue has
a chance to offload the flow to the network device.

You can identify hardware offloaded flows through the [HW_OFFLOAD] tag when
listing your connection tracking table. Note that the [OFFLOAD] tag refers to
the software flowtable fastpath, whereas [HW_OFFLOAD] indicates that the flow
is using the hardware offload datapath.

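If you want to tell the two tags apart in a script, a minimal Python sketch
could look like the following. The ``offload_state()`` helper is hypothetical,
and it assumes that the tag appears as a separate whitespace-delimited field,
as it does in the sample ``conntrack -L`` line shown in this document.

```python
def offload_state(conntrack_line: str) -> str:
    """Classify one line of `conntrack -L` output by its offload tag."""
    fields = conntrack_line.split()
    if "[HW_OFFLOAD]" in fields:
        return "hardware"   # flow uses the hardware offload datapath
    if "[OFFLOAD]" in fields:
        return "software"   # flow uses the software flowtable fastpath
    return "none"           # flow follows the classic forwarding path
```

Applied to the sample line from the example configuration section, this
helper reports a software-offloaded flow.
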
The flowtable hardware offload infrastructure also supports DSA
(Distributed Switch Architecture).

Limitations
-----------

The flowtable behaves like a cache. The flowtable entries might get stale if
either the destination MAC address or the egress netdevice that is used for
transmission changes.

This might be a problem if:

- You run the flowtable in software mode and you combine bridge and IP
  forwarding in your setup.
- Hardware offload is enabled.

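The cache-like behaviour can be shown with a toy Python model. Everything in
it is an assumption made for illustration: the ``neigh`` and ``flow_cache``
dicts, the helper names, and the addresses and MAC values do not correspond to
kernel structures or to a real setup.

```python
# Toy model: the classic path resolves the next hop for every packet, while a
# flowtable entry keeps the (device, MAC) pair captured when the flow was added.
neigh = {"192.168.10.2": ("eth1", "52:54:00:00:00:01")}  # hypothetical values

flow_cache = {}


def offload(dst: str) -> None:
    """Snapshot the current next hop once, when the flow is offloaded."""
    flow_cache[dst] = neigh[dst]


def fastpath_next_hop(dst: str):
    """Return the cached next hop without resolving it again."""
    return flow_cache[dst]


offload("192.168.10.2")
# The egress device and destination MAC change after the flow was offloaded.
neigh["192.168.10.2"] = ("eth2", "52:54:00:00:00:02")
```

After the change, ``fastpath_next_hop("192.168.10.2")`` still returns the old
device and MAC address, which is the stale-entry situation described above.
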
More reading
------------

This documentation is based on the LWN.net articles [1]_\ [2]_. Rafal Milecki
also made a comprehensive summary called "A state of network acceleration"
that describes how things were before this infrastructure was mainlined [3]_
and also gives a rough summary of this work [4]_.

.. [1] https://lwn.net/Articles/738214/
.. [2] https://lwn.net/Articles/742164/
.. [3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
.. [4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html