.. SPDX-License-Identifier: GPL-2.0

=============================================
Open vSwitch datapath developer documentation
=============================================

The Open vSwitch kernel module allows flexible userspace control over
flow-level packet processing on selected network devices.  It can be
used to implement a plain Ethernet switch, network device bonding,
VLAN processing, network access control, flow-based network control,
and so on.

The kernel module implements multiple "datapaths" (analogous to
bridges), each of which can have multiple "vports" (analogous to ports
within a bridge).  Each datapath also has associated with it a "flow
table" that userspace populates with "flows" that map from keys based
on packet headers and metadata to sets of actions.  The most common
action forwards the packet to another vport; other actions are also
implemented.

When a packet arrives on a vport, the kernel module processes it by
extracting its flow key and looking it up in the flow table.  If there
is a matching flow, it executes the associated actions.  If there is
no match, it queues the packet to userspace for processing (as part of
its processing, userspace will likely set up a flow to handle further
packets of the same type entirely in-kernel).
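
The lookup-or-upcall behavior described above can be sketched as a toy
model.  This is an illustration only, not the kernel implementation: the
names ``Datapath``, ``extract_key`` and ``upcalls``, and the dict-based
packet, are assumptions made for the example.

```python
from dataclasses import dataclass, field


def extract_key(in_port, packet):
    """Toy flow key: the receiving vport (metadata) plus a few header fields."""
    return (in_port, packet["eth_src"], packet["eth_dst"], packet["eth_type"])


@dataclass
class Datapath:
    flow_table: dict = field(default_factory=dict)  # flow key -> list of actions
    upcalls: list = field(default_factory=list)     # (key, packet) queued to userspace

    def receive(self, in_port, packet):
        key = extract_key(in_port, packet)
        actions = self.flow_table.get(key)
        if actions is None:
            # No matching flow: hand the packet and its parsed key to userspace.
            self.upcalls.append((key, packet))
            return None
        # Matching flow: the associated actions would be executed here.
        return actions


dp = Datapath()
pkt = {"eth_src": "e0:91:f5:21:d0:b2", "eth_dst": "00:02:e3:0f:80:a4",
       "eth_type": 0x0800}

first = dp.receive(1, pkt)               # miss: queued to userspace
miss_key, _ = dp.upcalls[0]
dp.flow_table[miss_key] = ["output:2"]   # userspace installs a flow
second = dp.receive(1, pkt)              # hit: handled without an upcall
```

Only the first packet reaches userspace; once the flow is installed, later
packets with the same key are handled from the flow table.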


Flow key compatibility
----------------------

Network protocols evolve over time.  New protocols become important
and existing protocols lose their prominence.  For the Open vSwitch
kernel module to remain relevant, it must be possible for newer
versions to parse additional protocols as part of the flow key.  It
might even be desirable, someday, to drop support for parsing
protocols that have become obsolete.  Therefore, the Netlink interface
to Open vSwitch is designed to allow carefully written userspace
applications to work with any version of the flow key, past or future.

To support this forward and backward compatibility, whenever the
kernel module passes a packet to userspace, it also passes along the
flow key that it parsed from the packet.  Userspace then extracts its
own notion of a flow key from the packet and compares it against the
kernel-provided version:

    - If userspace's notion of the flow key for the packet matches the
      kernel's, then nothing special is necessary.

    - If the kernel's flow key includes more fields than the userspace
      version of the flow key, for example if the kernel decoded IPv6
      headers but userspace stopped at the Ethernet type (because it
      does not understand IPv6), then again nothing special is
      necessary.  Userspace can still set up a flow in the usual way,
      as long as it uses the kernel-provided flow key to do it.

    - If the userspace flow key includes more fields than the
      kernel's, for example if userspace decoded an IPv6 header but
      the kernel stopped at the Ethernet type, then userspace can
      forward the packet manually, without setting up a flow in the
      kernel.  This case is bad for performance because every packet
      that the kernel considers part of the flow must go to userspace,
      but the forwarding behavior is correct.  (If userspace can
      determine that the values of the extra fields would not affect
      forwarding behavior, then it could set up a flow anyway.)
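
The three cases above amount to a small decision procedure.  The sketch
below is one possible reading of it, with flow keys modelled as dicts from
attribute name to value; ``plan_flow_setup`` and its return values are
hypothetical names, not part of any Open vSwitch API.

```python
def plan_flow_setup(kernel_key, user_key, extra_fields_matter=True):
    """Decide how userspace reacts to a packet passed up by the kernel.

    Both keys are dicts mapping attribute name to value (an assumption of
    this sketch).  Returns ("install", key) or ("forward_manually", None).
    """
    kernel_attrs, user_attrs = set(kernel_key), set(user_key)

    if user_attrs <= kernel_attrs:
        # Keys match, or the kernel parsed more than userspace understands:
        # set up a flow in the usual way, using the kernel-provided key.
        return ("install", kernel_key)

    # Userspace parsed fields that the kernel did not.
    if not extra_fields_matter:
        # The extra fields would not change forwarding, so a flow is safe.
        return ("install", kernel_key)
    return ("forward_manually", None)


short_key = {"in_port": 1, "eth": "...", "eth_type": 0x86DD}
long_key = dict(short_key, ipv6="...")

kernel_knows_more = plan_flow_setup(kernel_key=long_key, user_key=short_key)
user_knows_more = plan_flow_setup(kernel_key=short_key, user_key=long_key)
user_knows_more_but_irrelevant = plan_flow_setup(
    kernel_key=short_key, user_key=long_key, extra_fields_matter=False)
```

In every "install" outcome the key handed back is the kernel's, never the
one userspace parsed for itself.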

How flow keys evolve over time is important to making this work, so
the following sections go into detail.


Flow key format
---------------

A flow key is passed over a Netlink socket as a sequence of Netlink
attributes.  Some attributes represent packet metadata, defined as any
information about a packet that cannot be extracted from the packet
itself, e.g. the vport on which the packet was received.  Most
attributes, however, are extracted from headers within the packet,
e.g. source and destination addresses from Ethernet, IP, or TCP
headers.

The <linux/openvswitch.h> header file defines the exact format of the
flow key attributes.  For informal explanatory purposes here, we write
them as comma-separated strings, with parentheses indicating arguments
and nesting.  For example, the following could represent a flow key
corresponding to a TCP packet that arrived on vport 1::

    in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
    eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
    frag=no), tcp(src=49163, dst=80)

Often we ellipsize arguments not important to the discussion, e.g.::

    in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
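
For readers who want to produce this informal notation from structured
data, a minimal formatter might look as follows.  The list-of-pairs
representation and the name ``format_key`` are assumptions of this sketch;
the real attribute layout is the one defined in <linux/openvswitch.h>.

```python
def format_key(attrs):
    """Render (name, value) attributes in the informal notation.

    A value may be a scalar, a dict of named arguments, or a nested list
    of attributes.
    """
    parts = []
    for name, value in attrs:
        if isinstance(value, dict):
            inner = ", ".join(f"{k}={v}" for k, v in value.items())
        elif isinstance(value, list):
            inner = format_key(value)   # nesting uses the same notation
        else:
            inner = str(value)
        parts.append(f"{name}({inner})")
    return ", ".join(parts)


key = [
    ("in_port", 1),
    ("eth", {"src": "e0:91:f5:21:d0:b2", "dst": "00:02:e3:0f:80:a4"}),
    ("eth_type", "0x0800"),
    ("tcp", {"src": 49163, "dst": 80}),
]
rendered = format_key(key)
print(rendered)
```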


Wildcarded flow key format
--------------------------

A wildcarded flow is described with two sequences of Netlink attributes
passed over the Netlink socket: a flow key, exactly as described above, and an
optional corresponding flow mask.

A wildcarded flow can represent a group of exact match flows. Each '1' bit
in the mask specifies an exact match with the corresponding bit in the flow key.
A '0' bit specifies a don't care bit, which will match either a '1' or '0' bit
of an incoming packet. Using wildcarded flows can improve the flow setup rate
by reducing the number of new flows that need to be processed by the user space
program.
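
The mask semantics can be illustrated on a single field.  The sketch below
treats an IPv4 destination address as the only key field; ``masked_match``
and ``ip`` are hypothetical helper names, and a real flow key covers many
more fields.

```python
from ipaddress import IPv4Address


def masked_match(packet_bits, key_bits, mask_bits):
    """True if the packet equals the key on every bit that is '1' in the mask."""
    return ((packet_bits ^ key_bits) & mask_bits) == 0


def ip(addr):
    """Convert dotted-quad text to a 32-bit integer."""
    return int(IPv4Address(addr))


key = ip("172.18.0.52")
mask = ip("255.255.0.0")       # low 16 bits are don't care

in_subnet = masked_match(ip("172.18.9.9"), key, mask)
other_net = masked_match(ip("172.19.0.52"), key, mask)
exact_only = masked_match(ip("172.18.9.9"), key, ip("255.255.255.255"))
```

With the 16-bit mask a single flow covers every destination in
172.18.0.0/16, whereas an all-ones mask degenerates to an exact match flow.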

Support for the mask Netlink attribute is optional for both the kernel and user
space program. The kernel can ignore the mask attribute, installing an exact
match flow, or reduce the number of don't care bits in the kernel to less than
what was specified by the user space program. In this case, variations in bits
that the kernel does not implement will simply result in additional flow setups.
The kernel module will also work with user space programs that neither support
nor supply flow mask attributes.

Since the kernel may ignore or modify wildcard bits, it can be difficult for
the userspace program to know exactly what matches are installed. There are
two possible approaches: reactively install flows as they miss the kernel
flow table (and therefore not attempt to determine wildcard changes at all)
or use the kernel's response messages to determine the installed wildcards.

When interacting with userspace, the kernel should maintain the match portion
of the key exactly as originally installed. This provides a handle to
identify the flow for all future operations. However, when reporting the
mask of an installed flow, the mask should include any restrictions imposed
by the kernel.

The behavior when using overlapping wildcarded flows is undefined. It is the
responsibility of the user space program to ensure that any incoming packet
can match at most one flow, wildcarded or not. The current implementation
performs best-effort detection of overlapping wildcarded flows and may reject
some but not all of them. However, this behavior may change in future versions.
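
A user space program can check for overlap itself before installing a
flow.  The test below is a sketch of one way to do so, with keys and masks
modelled as plain integers; it is not a description of the kernel's
best-effort detection, and ``flows_overlap`` is a hypothetical name.

```python
def flows_overlap(key_a, mask_a, key_b, mask_b):
    """True if at least one packet could match both wildcarded flows.

    Two flows overlap exactly when their keys agree on every bit that
    both masks require to match.
    """
    common = mask_a & mask_b
    return ((key_a ^ key_b) & common) == 0


# 4-bit toy keys: flow A matches 10xx, flow B matches xx01, flow C matches 01xx.
a = (0b1010, 0b1100)
b = (0b1001, 0b0011)
c = (0b0100, 0b1100)

a_b = flows_overlap(*a, *b)   # packet 1001 matches both A and B
a_c = flows_overlap(*a, *c)   # top bits differ, no packet matches both
b_c = flows_overlap(*b, *c)   # packet 0101 matches both B and C
```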


Unique flow identifiers
-----------------------

An alternative to using the original match portion of a key as the handle for
flow identification is a unique flow identifier, or "UFID". UFIDs are optional
for both the kernel and user space program.

User space programs that support UFID are expected to provide it during flow
setup in addition to the flow, then refer to the flow using the UFID for all
future operations. The kernel is not required to index flows by the original
flow key if a UFID is specified.
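
The indexing consequence can be pictured with a toy table.  Everything in
this sketch is hypothetical: the ``FlowTable`` class, its methods, and the
use of a random UUID as a stand-in for a UFID are illustration only.

```python
import uuid


class FlowTable:
    """Toy table that indexes a flow by UFID when one is supplied."""

    def __init__(self):
        self.by_ufid = {}   # ufid -> (key, actions)
        self.by_key = {}    # key  -> actions

    def add(self, key, actions, ufid=None):
        if ufid is not None:
            # With a UFID, the flow need not be indexed by its original key.
            self.by_ufid[ufid] = (key, actions)
        else:
            self.by_key[key] = actions

    def delete(self, key=None, ufid=None):
        """Remove a flow by its handle; return True if it existed."""
        if ufid is not None:
            return self.by_ufid.pop(ufid, None) is not None
        return self.by_key.pop(key, None) is not None


table = FlowTable()
ufid = uuid.uuid4().bytes                  # placeholder identifier
flow_key = ("in_port(1)", "eth_type(0x0800)")

table.add(flow_key, ["output:2"], ufid=ufid)
removed = table.delete(ufid=ufid)          # later operations use the UFID
removed_again = table.delete(ufid=ufid)
```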


Basic rule for evolving flow keys
---------------------------------

Some care is needed to really maintain forward and backward
compatibility for applications that follow the rules listed under
"Flow key compatibility" above.

The basic rule is obvious::

    ==================================================================
    New network protocol support must only supplement existing flow
    key attributes.  It must not change the meaning of already defined
    flow key attributes.
    ==================================================================

This rule does have less-obvious consequences so it is worth working
through a few examples.  Suppose, for example, that the kernel module
did not already implement VLAN parsing.  Instead, it just interpreted
the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
packet.  The flow key for any packet with an 802.1Q header would look
essentially like this, ignoring metadata::

    eth(...), eth_type(0x8100)

Naively, to add VLAN support, it makes sense to add a new "vlan" flow
key attribute to contain the VLAN tag, then continue to decode the
encapsulated headers beyond the VLAN tag using the existing field
definitions.  With this change, a TCP packet in VLAN 10 would have a
flow key much like this::

    eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)

But this change would negatively affect a userspace application that
has not been updated to understand the new "vlan" flow key attribute.
The application could, following the flow compatibility rules above,
ignore the "vlan" attribute that it does not understand and therefore
assume that the flow contained IP packets.  This is a bad assumption
(the flow only contains IP packets if one parses and skips over the
802.1Q header) and it could cause the application's behavior to change
across kernel versions even though it follows the compatibility rules.

The solution is to use a set of nested attributes.  This is, for
example, why 802.1Q support uses nested attributes.  A TCP packet in
VLAN 10 is actually expressed as::

    eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
    ip(proto=6, ...), tcp(...))

Notice how the "eth_type", "ip", and "tcp" flow key attributes are
nested inside the "encap" attribute.  Thus, an application that does
not understand the "vlan" key will not see any of those attributes
and therefore will not misinterpret them.  (Also, the outer eth_type
is still 0x8100, not changed to 0x0800.)
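
The difference between the naive and the nested encoding can be seen from
the point of view of an application that predates VLAN support.  In this
sketch a flow key is a list of (name, value) pairs and the application
simply skips attributes it does not know; ``KNOWN`` and ``visible_attrs``
are hypothetical names.

```python
# Attributes understood by a hypothetical application written before
# "vlan" and "encap" existed.
KNOWN = {"in_port", "eth", "eth_type", "ip", "tcp"}


def visible_attrs(key, known=KNOWN):
    """Return the top-level attributes the application understands.

    Unknown attributes are skipped whole, including anything nested
    inside them.
    """
    return {name: value for name, value in key if name in known}


naive = [("eth", "..."), ("vlan", {"vid": 10, "pcp": 0}),
         ("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")]

nested = [("eth", "..."), ("eth_type", 0x8100), ("vlan", {"vid": 10, "pcp": 0}),
          ("encap", [("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")])]

seen_naive = visible_attrs(naive)     # looks like an untagged IP/TCP flow
seen_nested = visible_attrs(nested)   # looks like an opaque 0x8100 flow
```

With the naive encoding the old application wrongly concludes that it is
looking at plain IP traffic.  With the nested encoding it sees only
``eth`` and ``eth_type(0x8100)``, which is what an older kernel would have
reported for the same packet.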

Handling malformed packets
--------------------------

Don't drop packets in the kernel for malformed protocol headers, bad
checksums, etc.  This would prevent userspace from implementing a
simple Ethernet switch that forwards every packet.

Instead, in such a case, include an attribute with "empty" content.
It doesn't matter if the empty content could be valid protocol values,
as long as those values are rarely seen in practice, because userspace
can always arrange for all packets with those values to be sent to
userspace and handle them individually.

For example, consider a packet that contains an IP header that
indicates protocol 6 for TCP, but which is truncated just after the IP
header, so that the TCP header is missing.  The flow key for this
packet would include a tcp attribute with all-zero src and dst, like
this::

    eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)

As another example, consider a packet with an Ethernet type of 0x8100,
indicating that a VLAN TCI should follow, but which is truncated just
after the Ethernet type.  The flow key for this packet would include
an all-zero-bits vlan and an empty encap attribute, like this::

    eth(...), eth_type(0x8100), vlan(0), encap()

Unlike a TCP packet with source and destination ports 0, an
all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
attribute expressly to allow this situation to be distinguished.
Thus, the flow key in this second example unambiguously indicates a
missing or malformed VLAN TCI.
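
The role of the CFI bit can be sketched as a small decoder for the 16-bit
TCI carried in a vlan attribute.  The bit positions follow the 802.1Q TCI
layout (3-bit PCP, 1-bit CFI, 12-bit VID); the function name
``decode_vlan_attr`` and its return format are assumptions of this
example.

```python
VLAN_TAG_PRESENT = 0x1000   # CFI bit of the TCI, set for a valid vlan attribute
VLAN_VID_MASK = 0x0FFF      # low 12 bits: VLAN ID
VLAN_PCP_SHIFT = 13         # top 3 bits: priority code point


def decode_vlan_attr(tci):
    """Interpret the TCI value found in a vlan flow key attribute.

    Returns None when the CFI bit is clear, i.e. the VLAN TCI was missing
    or malformed, and a dict with vid and pcp otherwise.
    """
    if not tci & VLAN_TAG_PRESENT:
        return None
    return {"vid": tci & VLAN_VID_MASK, "pcp": (tci >> VLAN_PCP_SHIFT) & 0x7}


truncated = decode_vlan_attr(0x0000)   # vlan(0) from the truncated packet
vlan_zero = decode_vlan_attr(0x1000)   # a real tag with vid=0, pcp=0
vlan_ten = decode_vlan_attr(0x100A)    # a real tag with vid=10, pcp=0
```

A genuine tag with VID 0 and PCP 0 therefore appears as 0x1000 rather than
0, so it cannot be confused with the truncated case.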
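
Other rules
-----------

The other rules for flow keys are much less subtle:

    - Duplicate attributes are not allowed at a given nesting level.

    - Ordering of attributes is not significant.

    - When the kernel sends a given flow key to userspace, it always
      composes it the same way.  This allows userspace to hash and
      compare entire flow keys that it may not be able to fully
      interpret.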
    243
    244    - Duplicate attributes are not allowed at a given nesting level.
    245
    246    - Ordering of attributes is not significant.
    247
    248    - When the kernel sends a given flow key to userspace, it always
    249      composes it the same way.  This allows userspace to hash and
    250      compare entire flow keys that it may not be able to fully
    251      interpret.