cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack



.. SPDX-License-Identifier: GPL-2.0

====================================
Virtual Routing and Forwarding (VRF)
====================================

The VRF Device
==============

The VRF device, combined with ip rules, provides the ability to create virtual
routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the
Linux network stack. One use case is the multi-tenancy problem, where each
tenant has their own unique routing tables and, at the very least, needs
different default gateways.

Processes can be "VRF aware" by binding a socket to the VRF device. Packets
through the socket then use the routing table associated with the VRF
device. An important feature of the VRF device implementation is that it
impacts only Layer 3 and above, so L2 tools (e.g., LLDP) are not affected
(i.e., they do not need to be run in each VRF). The design also allows
higher priority ip rules (Policy Based Routing, PBR) to take precedence
over the VRF device rules, directing specific traffic as desired.

In addition, VRF devices allow VRFs to be nested within namespaces. For
example, network namespaces provide separation of network interfaces at the
device layer, VLANs on the interfaces within a namespace provide L2 separation,
and VRF devices then provide L3 separation.

Design
------
A VRF device is created with an associated route table. Network interfaces
are then enslaved to a VRF device::

         +-----------------------------+
         |           vrf-blue          |  ===> route table 10
         +-----------------------------+
            |        |            |
         +------+ +------+     +-------------+
         | eth1 | | eth2 | ... |    bond1    |
         +------+ +------+     +-------------+
                                  |       |
                              +------+ +------+
                              | eth8 | | eth9 |
                              +------+ +------+

Packets received on an enslaved device are switched to the VRF device
in the IPv4 and IPv6 processing stacks, giving the impression that packets
flow through the VRF device. Similarly, on egress, routing rules are used to
send packets to the VRF device driver before they are sent out the actual
interface. This allows tcpdump on a VRF device to capture all packets into
and out of the VRF as a whole\ [1]_. Similarly, netfilter\ [2]_ and tc rules
can be applied using the VRF device to specify rules that apply to the VRF
domain as a whole.

.. [1] Packets in the forwarded state do not flow through the device, so those
       packets are not seen by tcpdump. This limitation will be revisited in a
       future release.

.. [2] Iptables on ingress supports PREROUTING with skb->dev set to the real
       ingress device and both INPUT and PREROUTING rules with skb->dev set to
       the VRF device. For egress, POSTROUTING and OUTPUT rules can be written
       using either the VRF device or the real egress device.

Setup
-----
1. A VRF device is created with an association to a FIB table,
   for example::

        ip link add vrf-blue type vrf table 10
        ip link set dev vrf-blue up

2. An l3mdev FIB rule directs lookups to the table associated with the device.
   A single l3mdev rule is sufficient for all VRFs. The VRF device adds the
   l3mdev rule for IPv4 and IPv6 when the first device is created, with a
   default preference of 1000. Users may delete the rule if desired and add
   it back with a different priority, or install per-VRF rules.

   Prior to the v4.8 kernel, iif and oif rules are needed for each VRF device::

       ip ru add oif vrf-blue table 10
       ip ru add iif vrf-blue table 10

3. Set the default route for the table (and hence the default route for the VRF)::

       ip route add table 10 unreachable default metric 4278198272

   This high metric value ensures that the default unreachable route can
   be overridden by a routing protocol suite. FRRouting interprets
   kernel metrics as a combined admin distance (upper byte) and priority
   (lower 3 bytes). Thus the above metric translates to [255/8192].

4. Enslave L3 interfaces to a VRF device::

       ip link set dev eth1 master vrf-blue

   Local and connected routes for enslaved devices are automatically moved to
   the table associated with the VRF device. Any additional routes depending on
   the enslaved device are dropped and will need to be reinserted into the VRF
   FIB table following the enslavement.

   The IPv6 sysctl option keep_addr_on_down can be enabled to keep IPv6 global
   addresses as VRF enslavement changes::

       sysctl -w net.ipv6.conf.all.keep_addr_on_down=1

5. Additional VRF routes are added to the associated table::

       ip route add table 10 ...
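
Steps 1 and 4 can also be performed programmatically over rtnetlink, the interface ip(8) itself uses. The sketch below only *builds* the two ``RTM_NEWLINK`` requests and opens no socket: one creates a VRF device by nesting ``IFLA_INFO_KIND`` "vrf" and ``IFLA_VRF_TABLE`` under ``IFLA_LINKINFO``, the other sets ``IFLA_MASTER`` on an existing interface. All helper names (``add_attr``, ``end_nest``, ``start_newlink``, ``build_vrf_newlink``, ``build_enslave``) are hypothetical, and the layout is an illustration under those assumptions rather than a copy of iproute2's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Append one attribute at the aligned tail of the message.
 * Returns the attribute, or NULL if it would not fit in cap bytes. */
static struct rtattr *add_attr(struct nlmsghdr *nlh, size_t cap,
                               unsigned short type, const void *data, size_t len)
{
    size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
    size_t need = RTA_LENGTH(len);
    struct rtattr *rta;

    if (off + RTA_ALIGN(need) > cap)
        return NULL;
    rta = (struct rtattr *)((char *)nlh + off);
    rta->rta_type = type;
    rta->rta_len = (unsigned short)need;
    if (len)
        memcpy(RTA_DATA(rta), data, len);
    nlh->nlmsg_len = (uint32_t)(off + RTA_ALIGN(need));
    return rta;
}

/* A nest is opened with add_attr(..., NULL, 0); closing it stretches its
 * length over every attribute appended since. */
static void end_nest(struct nlmsghdr *nlh, struct rtattr *nest)
{
    nest->rta_len = (unsigned short)((char *)nlh + nlh->nlmsg_len - (char *)nest);
}

/* Zero buf and start an RTM_NEWLINK request for ifindex (0 = new device). */
static struct nlmsghdr *start_newlink(void *buf, size_t cap, uint16_t flags, int ifindex)
{
    struct nlmsghdr *nlh = buf;
    struct ifinfomsg *ifi;

    if (cap < NLMSG_SPACE(sizeof(struct ifinfomsg)))
        return NULL;
    memset(buf, 0, cap);                  /* also zeroes alignment padding */
    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    nlh->nlmsg_type = RTM_NEWLINK;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
    ifi = NLMSG_DATA(nlh);
    ifi->ifi_family = AF_UNSPEC;
    ifi->ifi_index = ifindex;
    return nlh;
}

/* Counterpart of "ip link add NAME type vrf table ID".
 * Returns the message length, or 0 if cap is too small. */
size_t build_vrf_newlink(void *buf, size_t cap, const char *name, uint32_t table)
{
    struct nlmsghdr *nlh = start_newlink(buf, cap, NLM_F_CREATE | NLM_F_EXCL, 0);
    struct rtattr *linkinfo = NULL, *data = NULL;

    if (!nlh ||
        !add_attr(nlh, cap, IFLA_IFNAME, name, strlen(name) + 1) ||
        !(linkinfo = add_attr(nlh, cap, IFLA_LINKINFO, NULL, 0)) ||
        !add_attr(nlh, cap, IFLA_INFO_KIND, "vrf", sizeof "vrf") ||
        !(data = add_attr(nlh, cap, IFLA_INFO_DATA, NULL, 0)) ||
        !add_attr(nlh, cap, IFLA_VRF_TABLE, &table, sizeof table))
        return 0;
    end_nest(nlh, data);
    end_nest(nlh, linkinfo);
    return nlh->nlmsg_len;
}

/* Counterpart of "ip link set dev SLAVE master VRF", using ifindexes. */
size_t build_enslave(void *buf, size_t cap, int slave_ifindex, uint32_t vrf_ifindex)
{
    struct nlmsghdr *nlh = start_newlink(buf, cap, 0, slave_ifindex);

    if (!nlh || !add_attr(nlh, cap, IFLA_MASTER, &vrf_ifindex, sizeof vrf_ifindex))
        return 0;
    return nlh->nlmsg_len;
}
```

To apply a request, a program would send the buffer on a ``socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)`` socket (which needs CAP_NET_ADMIN) and check the acknowledgement; the ifindex values for ``build_enslave()`` could come from ``if_nametoindex()``.
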
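
The metric arithmetic in step 3 can be checked with a few lines of C. This is a sketch of the split described there (admin distance in the upper byte, priority in the lower three bytes); the struct and function names are hypothetical and not part of FRRouting.

```c
#include <stdint.h>

/* Hypothetical representation of a kernel route metric as FRRouting
 * interprets it: [admin distance / priority]. */
struct frr_metric {
    uint8_t distance;    /* upper byte */
    uint32_t priority;   /* lower 3 bytes; only the low 24 bits are used */
};

/* Split a 32-bit kernel metric into distance and priority. */
struct frr_metric metric_decode(uint32_t metric)
{
    struct frr_metric m;

    m.distance = (uint8_t)(metric >> 24);
    m.priority = metric & 0x00FFFFFFu;
    return m;
}

/* Combine distance and priority back into a kernel metric. */
uint32_t metric_encode(uint8_t distance, uint32_t priority)
{
    return ((uint32_t)distance << 24) | (priority & 0x00FFFFFFu);
}
```

With these helpers, ``metric_decode(4278198272u)`` yields distance 255 and priority 8192 (the metric is 0xFF002000), and ``metric_encode(255, 8192)`` returns 4278198272 again.
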

Applications
------------
Applications that need to work within a VRF must bind their socket to the
VRF device::

    setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);

or specify the output device using cmsg and IP_PKTINFO.
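
The cmsg alternative can be sketched as follows, assuming an IPv4 socket: an ``IP_PKTINFO`` control message whose ``ipi_ifindex`` names the output device is attached to the ``msghdr`` later passed to ``sendmsg()``. The helper ``set_pktinfo_oif`` is hypothetical; the ifindex would typically come from ``if_nametoindex()``.

```c
#define _GNU_SOURCE            /* struct in_pktinfo in glibc */
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: attach an IP_PKTINFO control message to msg so that
 * sendmsg() uses ifindex as the output device. ctrl must hold at least
 * CMSG_SPACE(sizeof(struct in_pktinfo)) bytes and be suitably aligned.
 * Returns 0 on success, -1 if ctrl is too small. */
int set_pktinfo_oif(struct msghdr *msg, void *ctrl, size_t ctrl_len, int ifindex)
{
    struct in_pktinfo pi;
    struct cmsghdr *cm;

    if (ctrl_len < CMSG_SPACE(sizeof pi))
        return -1;
    memset(ctrl, 0, ctrl_len);
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof pi);

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_IP;
    cm->cmsg_type = IP_PKTINFO;
    cm->cmsg_len = CMSG_LEN(sizeof pi);

    memset(&pi, 0, sizeof pi);     /* zero source: let routing choose it */
    pi.ipi_ifindex = ifindex;
    memcpy(CMSG_DATA(cm), &pi, sizeof pi);
    return 0;
}
```

A real sender would also fill in ``msg_name`` and ``msg_iov`` before calling ``sendmsg()``.
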
By default, the scope of the port bindings for unbound sockets is
limited to the default VRF. That is, they will not be matched by packets
arriving on interfaces enslaved to an l3mdev, and processes may bind to
the same port if they bind to an l3mdev.

TCP & UDP services running in the default VRF context (i.e., not bound
to any VRF device) can work across all VRF domains by enabling the
tcp_l3mdev_accept and udp_l3mdev_accept sysctl options::

    sysctl -w net.ipv4.tcp_l3mdev_accept=1
    sysctl -w net.ipv4.udp_l3mdev_accept=1

These options are disabled by default so that a socket in a VRF is only
selected for packets in that VRF. There is a similar option for RAW
sockets, which is enabled by default for reasons of backwards compatibility:
it allows the output device to be specified with cmsg and IP_PKTINFO while
using a socket that is not bound to the corresponding VRF. This allows, for
example, older ping implementations to be run with the device specified but
without executing them in the VRF. This option can be disabled so that packets
received in a VRF context are only handled by a raw socket bound to the VRF,
and packets in the default VRF are only handled by a socket not bound to any
VRF::

    sysctl -w net.ipv4.raw_l3mdev_accept=0

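
The ``*_l3mdev_accept`` settings above can also be checked from a program. In procfs each dotted sysctl name maps to a file under ``/proc/sys``; the sketch below assumes that layout, and the helper names are hypothetical. The l3mdev entries are typically present only when the kernel is built with ``CONFIG_NET_L3_MASTER_DEV``, so a missing file is reported as a failure rather than as a value.

```c
#include <stdio.h>
#include <string.h>

/* Map a dotted sysctl name such as "net.ipv4.tcp_l3mdev_accept" to its
 * procfs path. This simple version assumes no component itself contains
 * a dot (interface names like "eth0.10" would break it). */
int sysctl_path(const char *name, char *buf, size_t len)
{
    const char *prefix = "/proc/sys/";
    int n = snprintf(buf, len, "%s%s", prefix, name);

    if (n < 0 || (size_t)n >= len)
        return -1;                      /* error or truncated */
    for (char *p = buf + strlen(prefix); *p; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}

/* Read an integer value from a sysctl file. Returns 0 and stores it in
 * *value, or -1 if the file is missing or does not start with a number. */
int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, resolving ``net.ipv4.raw_l3mdev_accept`` with ``sysctl_path()`` and reading it with ``read_sysctl_int()`` should report 1 while raw sockets keep the backwards-compatible default described above.
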
Netfilter rules on the VRF device can be used to limit access to services
running in the default VRF context as well.

Using VRF-aware applications (applications which simultaneously create sockets
outside and inside VRFs) in conjunction with ``net.ipv4.tcp_l3mdev_accept=1``
is possible but may lead to problems in some situations. With that sysctl
value, it is unspecified which listening socket will be selected to handle
connections for VRF traffic; i.e., either a socket bound to the VRF or an
unbound socket may be used to accept new connections from a VRF. This somewhat
unexpected behavior can lead to problems if sockets are configured with extra
options (e.g., TCP MD5 keys) with the expectation that VRF traffic will
exclusively be handled by sockets bound to VRFs, as would be the case with
``net.ipv4.tcp_l3mdev_accept=0``. Finally, as a reminder, regardless of
which listening socket is selected, established sockets will be created in the
VRF based on the ingress interface, as documented earlier.

--------------------------------------------------------------------------------

Using iproute2 for VRFs
=======================
iproute2 supports the vrf keyword as of v4.7. For backwards compatibility, this
section lists both commands where appropriate: with the vrf keyword and the
older form without it.

1. Create a VRF

   To instantiate a VRF device and associate it with a table::

       $ ip link add dev NAME type vrf table ID

   As of v4.8 the kernel supports the l3mdev FIB rule, where a single rule
   covers all VRFs. The l3mdev rule is created for IPv4 and IPv6 when the
   first device is created.

2. List VRFs

   To list VRFs that have been created::

       $ ip [-d] link show type vrf
         NOTE: The -d option is needed to show the table id

   For example::

       $ ip -d link show type vrf
       11: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
           link/ether 72:b3:ba:91:e2:24 brd ff:ff:ff:ff:ff:ff promiscuity 0
           vrf table 1 addrgenmode eui64
       12: red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
           link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0
           vrf table 10 addrgenmode eui64
       13: blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
           link/ether 36:62:e8:7d:bb:8c brd ff:ff:ff:ff:ff:ff promiscuity 0
           vrf table 66 addrgenmode eui64
       14: green: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
           link/ether e6:28:b8:63:70:bb brd ff:ff:ff:ff:ff:ff promiscuity 0
           vrf table 81 addrgenmode eui64


   Or in brief output::

       $ ip -br link show type vrf
       mgmt         UP             72:b3:ba:91:e2:24 <NOARP,MASTER,UP,LOWER_UP>
       red          UP             b6:6f:6e:f6:da:73 <NOARP,MASTER,UP,LOWER_UP>
       blue         UP             36:62:e8:7d:bb:8c <NOARP,MASTER,UP,LOWER_UP>
       green        UP             e6:28:b8:63:70:bb <NOARP,MASTER,UP,LOWER_UP>


3. Assign a Network Interface to a VRF

   Network interfaces are assigned to a VRF by enslaving the netdevice to a
   VRF device (the first NAME is the interface, the second is the VRF)::

       $ ip link set dev NAME master NAME

   On enslavement, connected and local routes are automatically moved to the
   table associated with the VRF device.

   For example::

       $ ip link set dev eth0 master mgmt


4. Show Devices Assigned to a VRF

   To show devices that have been assigned to a specific VRF, add the vrf (or
   master) option to the ip command::

       $ ip link show vrf NAME
       $ ip link show master NAME

   For example::

       $ ip link show vrf red
       3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000
           link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff
       4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000
           link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff
       7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN mode DEFAULT group default qlen 1000
           link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff


   Or using the brief output::

       $ ip -br link show vrf red
       eth1             UP             02:00:00:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
       eth2             UP             02:00:00:00:02:03 <BROADCAST,MULTICAST,UP,LOWER_UP>
       eth5             DOWN           02:00:00:00:02:06 <BROADCAST,MULTICAST>


5. Show Neighbor Entries for a VRF

   To list neighbor entries associated with devices enslaved to a VRF device,
   add the vrf (or master) option to the ip command::

       $ ip [-6] neigh show vrf NAME
       $ ip [-6] neigh show master NAME

   For example::

       $ ip neigh show vrf red
       10.2.1.254 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE
       10.2.2.254 dev eth2 lladdr 5e:54:01:6a:ee:80 REACHABLE

       $ ip -6 neigh show vrf red
       2002:1::64 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE


6. Show Addresses for a VRF

   To show addresses for interfaces associated with a VRF, add the vrf (or
   master) option to the ip command::

       $ ip addr show vrf NAME
       $ ip addr show master NAME

   For example::

       $ ip addr show vrf red
       3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
           link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff
           inet 10.2.1.2/24 brd 10.2.1.255 scope global eth1
              valid_lft forever preferred_lft forever
           inet6 2002:1::2/120 scope global
              valid_lft forever preferred_lft forever
           inet6 fe80::ff:fe00:202/64 scope link
              valid_lft forever preferred_lft forever
       4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
           link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff
           inet 10.2.2.2/24 brd 10.2.2.255 scope global eth2
              valid_lft forever preferred_lft forever
           inet6 2002:2::2/120 scope global
              valid_lft forever preferred_lft forever
           inet6 fe80::ff:fe00:203/64 scope link
              valid_lft forever preferred_lft forever
       7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN group default qlen 1000
           link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff

   Or in brief format::

       $ ip -br addr show vrf red
       eth1             UP             10.2.1.2/24 2002:1::2/120 fe80::ff:fe00:202/64
       eth2             UP             10.2.2.2/24 2002:2::2/120 fe80::ff:fe00:203/64
       eth5             DOWN


7. Show Routes for a VRF

   To show routes for a VRF, use the ip command to display the table associated
   with the VRF device::

       $ ip [-6] route show vrf NAME
       $ ip [-6] route show table ID

   For example::

       $ ip route show vrf red
       unreachable default  metric 4278198272
       broadcast 10.2.1.0 dev eth1  proto kernel  scope link  src 10.2.1.2
       10.2.1.0/24 dev eth1  proto kernel  scope link  src 10.2.1.2
       local 10.2.1.2 dev eth1  proto kernel  scope host  src 10.2.1.2
       broadcast 10.2.1.255 dev eth1  proto kernel  scope link  src 10.2.1.2
       broadcast 10.2.2.0 dev eth2  proto kernel  scope link  src 10.2.2.2
       10.2.2.0/24 dev eth2  proto kernel  scope link  src 10.2.2.2
       local 10.2.2.2 dev eth2  proto kernel  scope host  src 10.2.2.2
       broadcast 10.2.2.255 dev eth2  proto kernel  scope link  src 10.2.2.2

       $ ip -6 route show vrf red
       local 2002:1:: dev lo  proto none  metric 0  pref medium
       local 2002:1::2 dev lo  proto none  metric 0  pref medium
       2002:1::/120 dev eth1  proto kernel  metric 256  pref medium
       local 2002:2:: dev lo  proto none  metric 0  pref medium
       local 2002:2::2 dev lo  proto none  metric 0  pref medium
       2002:2::/120 dev eth2  proto kernel  metric 256  pref medium
       local fe80:: dev lo  proto none  metric 0  pref medium
       local fe80:: dev lo  proto none  metric 0  pref medium
       local fe80::ff:fe00:202 dev lo  proto none  metric 0  pref medium
       local fe80::ff:fe00:203 dev lo  proto none  metric 0  pref medium
       fe80::/64 dev eth1  proto kernel  metric 256  pref medium
       fe80::/64 dev eth2  proto kernel  metric 256  pref medium
       ff00::/8 dev red  metric 256  pref medium
       ff00::/8 dev eth1  metric 256  pref medium
       ff00::/8 dev eth2  metric 256  pref medium
       unreachable default dev lo  metric 4278198272  error -101 pref medium

8. Route Lookup for a VRF

   A test route lookup can be done for a VRF::

       $ ip [-6] route get vrf NAME ADDRESS
       $ ip [-6] route get oif NAME ADDRESS

   For example::

       $ ip route get 10.2.1.40 vrf red
       10.2.1.40 dev eth1  table red  src 10.2.1.2
           cache

       $ ip -6 route get 2002:1::32 vrf red
       2002:1::32 from :: dev eth1  table red  proto kernel  src 2002:1::2  metric 256  pref medium


9. Remove a Network Interface from a VRF

   Network interfaces are removed from a VRF by breaking the enslavement to
   the VRF device::

       $ ip link set dev NAME nomaster

   Connected routes are moved back to the default table and local entries are
   moved to the local table.

   For example::

       $ ip link set dev eth0 nomaster

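
Besides ``ip link show master NAME``, a program can find the master an interface is enslaved to through sysfs. This sketch assumes sysfs is mounted at ``/sys`` and relies on the ``master`` symlink the kernel creates in the enslaved device's ``/sys/class/net/<dev>/`` directory; the master could also be a bridge or bond rather than a VRF. The helper name ``get_master`` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* readlink() */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the name of dev's master device into out.
 * Returns 0 on success, or -1 if dev does not exist, has no master, or
 * out is too small. */
int get_master(const char *dev, char *out, size_t outlen)
{
    char path[256], target[512];
    const char *base;
    ssize_t len;
    int n = snprintf(path, sizeof path, "/sys/class/net/%s/master", dev);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    len = readlink(path, target, sizeof target - 1);
    if (len < 0)
        return -1;                        /* no master, or no such device */
    target[len] = '\0';                   /* readlink() does not terminate */

    /* The link points at the master's sysfs directory; its last path
     * component is the master's device name. */
    base = strrchr(target, '/');
    base = base ? base + 1 : target;
    if (strlen(base) >= outlen)
        return -1;
    memcpy(out, base, strlen(base) + 1);
    return 0;
}
```

With the example setup, ``get_master("eth1", buf, sizeof buf)`` would be expected to store "red", while a device that is not enslaved, such as ``lo``, returns -1.
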
--------------------------------------------------------------------------------

Commands used in this example::

     cat >> /etc/iproute2/rt_tables.d/vrf.conf <<EOF
     1  mgmt
     10 red
     66 blue
     81 green
     EOF

     function vrf_create
     {
         local VRF=$1
         local TBID=$2

         # create VRF device
         ip link add "${VRF}" type vrf table "${TBID}"

         if [ "${VRF}" != "mgmt" ]; then
             ip route add table "${TBID}" unreachable default metric 4278198272
         fi
         ip link set dev "${VRF}" up
     }

     vrf_create mgmt 1
     ip link set dev eth0 master mgmt

     vrf_create red 10
     ip link set dev eth1 master red
     ip link set dev eth2 master red
     ip link set dev eth5 master red

     vrf_create blue 66
     ip link set dev eth3 master blue

     vrf_create green 81
     ip link set dev eth4 master green


     Interface addresses from /etc/network/interfaces:
     auto eth0
     iface eth0 inet static
           address 10.0.0.2
           netmask 255.255.255.0
           gateway 10.0.0.254

     iface eth0 inet6 static
           address 2000:1::2
           netmask 120

     auto eth1
     iface eth1 inet static
           address 10.2.1.2
           netmask 255.255.255.0

     iface eth1 inet6 static
           address 2002:1::2
           netmask 120

     auto eth2
     iface eth2 inet static
           address 10.2.2.2
           netmask 255.255.255.0

     iface eth2 inet6 static
           address 2002:2::2
           netmask 120

     auto eth3
     iface eth3 inet static
           address 10.2.3.2
           netmask 255.255.255.0

     iface eth3 inet6 static
           address 2002:3::2
           netmask 120

     auto eth4
     iface eth4 inet static
           address 10.2.4.2
           netmask 255.255.255.0

     iface eth4 inet6 static
           address 2002:4::2
           netmask 120
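
Since the example gives the table IDs names in ``/etc/iproute2/rt_tables.d/vrf.conf``, a tool may want to resolve a VRF name to its table ID the same way. A minimal sketch that handles only the decimal "ID NAME" lines and ``#`` comments used above (iproute2's own parser is more permissive); ``lookup_table_id`` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the table ID that the rt_tables-style file f
 * maps to name, or -1 if there is none. Each line is "ID NAME"; text after
 * '#' is ignored. Reads from the current file position to EOF. */
long lookup_table_id(FILE *f, const char *name)
{
    char line[256];

    while (fgets(line, sizeof line, f)) {
        char tname[64];
        long id;
        char *hash = strchr(line, '#');

        if (hash)
            *hash = '\0';                 /* drop comments */
        if (sscanf(line, "%ld %63s", &id, tname) == 2 && strcmp(tname, name) == 0)
            return id;
    }
    return -1;
}
```

Against the file written by the ``cat`` command above, looking up "blue" would return 66.
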