cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

ipoib.rst (4524B)


==================
IP over InfiniBand
==================

  The ib_ipoib driver is an implementation of the IP over InfiniBand
  protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib
  working group.  It is a "native" implementation in the sense of
  setting the interface type to ARPHRD_INFINIBAND and the hardware
  address length to 20 (earlier proprietary implementations
  masqueraded to the kernel as ethernet interfaces).

Partitions and P_Keys
=====================

  When the IPoIB driver is loaded, it creates one interface for each
  port using the P_Key at index 0.  To create an interface with a
  different P_Key, write the desired P_Key into the main interface's
  /sys/class/net/<intf name>/create_child file.  For example::

    echo 0x8001 > /sys/class/net/ib0/create_child

  This will create an interface named ib0.8001 with P_Key 0x8001.  To
  remove a subinterface, use the "delete_child" file::

    echo 0x8001 > /sys/class/net/ib0/delete_child

  The P_Key for any interface is given by the "pkey" file, and the
  main interface for a subinterface is in "parent".

  Child interfaces can also be created and deleted using IPoIB's
  rtnl_link_ops; children created by either method behave the same.
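
  As a rough sketch of the two creation paths, the shell helpers below
  derive a child interface name from a P_Key and print, without
  running, the commands that would create it.  The helper names are
  hypothetical, the <parent>.<4 hex digits> naming is inferred from the
  ib0.8001 example, and the ``ip link`` form assumes an iproute2 build
  that supports ``type ipoib``.

```shell
# Hypothetical helper: name of the child interface for a parent and P_Key.
# Assumes the <parent>.<4 hex digits> pattern seen in the ib0.8001 example.
ipoib_child_name() {
    printf '%s.%04x\n' "$1" "$(( $2 ))"
}

# Print the sysfs command that would create the child (to be run as root).
ipoib_sysfs_create_cmd() {
    printf 'echo 0x%04x > /sys/class/net/%s/create_child\n' "$(( $2 ))" "$1"
}

# Print an rtnl_link_ops request expressed as an iproute2 command.
# Assumption: the installed iproute2 understands "type ipoib pkey".
ipoib_rtnl_create_cmd() {
    printf 'ip link add link %s name %s type ipoib pkey 0x%04x\n' \
        "$1" "$(ipoib_child_name "$1" "$2")" "$(( $2 ))"
}
```

  Printing the command rather than executing it keeps the sketch safe
  to try on a machine without InfiniBand hardware; the printed line can
  then be reviewed and run as root.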

Datagram vs Connected modes
===========================

  The IPoIB driver supports two modes of operation: datagram and
  connected.  The mode is set and read through an interface's
  /sys/class/net/<intf name>/mode file.
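
  A minimal sketch of reading and setting the mode file follows.  The
  helper names are hypothetical, and the optional last argument (a
  replacement for /sys/class/net) is an assumption added only so the
  helpers can be tried against a scratch directory.

```shell
# Hypothetical helper: print the current mode of interface $1.
# $2 optionally overrides the sysfs root (default /sys/class/net).
ipoib_get_mode() {
    cat "${2:-/sys/class/net}/$1/mode"
}

# Hypothetical helper: set interface $1 to mode $2.
# Only the two modes named in this document are accepted.
# $3 optionally overrides the sysfs root (default /sys/class/net).
ipoib_set_mode() {
    case $2 in
        datagram|connected) ;;
        *) echo "unknown mode: $2" >&2; return 1 ;;
    esac
    echo "$2" > "${3:-/sys/class/net}/$1/mode"
}
```

  On a live system, ``ipoib_set_mode ib0 connected`` followed by
  ``ipoib_get_mode ib0`` would be run as root.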

  In datagram mode, the IB UD (Unreliable Datagram) transport is used
  and so the interface MTU is equal to the IB L2 MTU minus the
  IPoIB encapsulation header (4 bytes).  For example, in a typical IB
  fabric with a 2K MTU, the IPoIB MTU will be 2048 - 4 = 2044 bytes.
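
  The subtraction above can be written as a small shell helper.  The
  function and variable names are hypothetical; the 4-byte header size
  is the value given in the paragraph above.

```shell
# IPoIB encapsulation header size in bytes, as stated in the text.
IPOIB_ENCAP_LEN=4

# Hypothetical helper: datagram-mode IPoIB MTU for the IB L2 MTU in $1.
ipoib_ud_mtu() {
    echo "$(( $1 - $IPOIB_ENCAP_LEN ))"
}
```

  For instance, ``ipoib_ud_mtu 2048`` prints 2044, matching the example
  in the text.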

  In connected mode, the IB RC (Reliable Connected) transport is used.
  Connected mode takes advantage of the connected nature of the IB
  transport and allows an MTU up to the maximal IP packet size of 64K,
  which reduces the number of IP packets needed for handling large UDP
  datagrams, TCP segments, etc., and increases performance for large
  messages.

  In connected mode, the interface's UD QP is still used for multicast
  and communication with peers that don't support connected mode. In
  this case, RX emulation of ICMP PMTU packets is used to cause the
  networking stack to use the smaller UD MTU for these neighbours.

Stateless offloads
==================

  If the IB HW supports IPoIB stateless offloads, IPoIB advertises
  TCP/IP checksum and/or Large Send (LSO) offloading capability to the
  network stack.

  Large Receive (LRO) offloading is also implemented and may be turned
  on/off using ethtool calls.  Currently LRO is supported only for
  checksum offload capable devices.

  Stateless offloads are supported only in datagram mode.
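
  One way to check which offloads an interface currently advertises is
  to filter the output of ``ethtool -k``.  The helper below is a
  hypothetical sketch; it assumes the usual ``<feature>: on|off`` line
  layout printed by ethtool and the feature names ethtool itself uses
  (for example ``large-receive-offload``).

```shell
# Hypothetical helper: read `ethtool -k <intf>` output on stdin and print
# the state ("on" or "off") of the feature named in $1.
# Assumes each feature is printed as "<name>: <state> [optional note]".
offload_state() {
    sed -n "s/^[[:space:]]*$1: \([a-z]*\).*/\1/p"
}
```

  On a live system this could be used as
  ``ethtool -k ib0 | offload_state large-receive-offload``, and LRO
  could then be toggled with ``ethtool -K ib0 lro on`` or
  ``ethtool -K ib0 lro off``, provided the device supports it.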

Interrupt moderation
====================

  If the underlying IB device supports CQ event moderation, one can
  use ethtool to set interrupt mitigation parameters and thus reduce
  the overhead incurred by handling interrupts.  The main code path of
  IPoIB doesn't use events for TX completion signaling so only RX
  moderation is supported.
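
  A sketch of how RX moderation might be requested through ethtool is
  given below.  The helper name is hypothetical, and the choice of the
  ``rx-usecs`` and ``rx-frames`` coalescing parameters is an assumption
  about what a given device honors; only RX parameters appear because
  TX moderation is not supported.

```shell
# Hypothetical helper: print the ethtool command that would set RX
# interrupt moderation; review it, then run it as root.
# $1 = interface, $2 = rx-usecs value, $3 = rx-frames value
ipoib_rx_moderation_cmd() {
    printf 'ethtool -C %s rx-usecs %d rx-frames %d\n' "$1" "$2" "$3"
}
```

  The values passed in are placeholders to be tuned per workload; the
  current settings can be inspected with ``ethtool -c ib0``.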

Debugging Information
=====================

  By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
  to 'y', tracing messages are compiled into the driver.  They are
  turned on by setting the module parameters debug_level and
  mcast_debug_level to 1.  These parameters can be controlled at
  runtime through files in /sys/module/ib_ipoib/.
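
  As an illustration of the runtime control described above, the
  helper below writes one level to both tracing parameters.  Its name
  is hypothetical, and the default directory
  (/sys/module/ib_ipoib/parameters) is an assumption based on where
  module parameters are normally exposed; the second argument
  overrides it.

```shell
# Hypothetical helper: write level $1 to both tracing parameters.
# $2 optionally overrides the parameter directory; the default is an
# assumption about where the ib_ipoib parameters are exposed.
ipoib_set_debug() {
    dir=${2:-/sys/module/ib_ipoib/parameters}
    for p in debug_level mcast_debug_level; do
        echo "$1" > "$dir/$p" || return 1
    done
}
```

  Run as root, ``ipoib_set_debug 1`` would turn tracing on and
  ``ipoib_set_debug 0`` would turn it off again.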

  CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs
  virtual filesystem.  By mounting this filesystem, for example with::

    mount -t debugfs none /sys/kernel/debug

  it is possible to get statistics about multicast groups from the
  files /sys/kernel/debug/ipoib/ib0_mcg and so on.

  The performance impact of this option is negligible, so it
  is safe to enable this option with debug_level set to 0 for normal
  operation.

  CONFIG_INFINIBAND_IPOIB_DEBUG_DATA enables even more debug output in
  the data path when data_debug_level is set to 1.  However, even with
  the output disabled, enabling this configuration option will affect
  performance, because it adds tests to the fast path.

References
==========

  Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
    http://ietf.org/rfc/rfc4391.txt

  IP over InfiniBand (IPoIB) Architecture (RFC 4392)
    http://ietf.org/rfc/rfc4392.txt

  IP over InfiniBand: Connected Mode (RFC 4755)
    http://ietf.org/rfc/rfc4755.txt