.. SPDX-License-Identifier: GPL-2.0

.. _devlink_port:

============
Devlink Port
============

``devlink-port`` is a port that exists on the device. It is a logically
separate ingress/egress point of the device. A devlink port can be any one
of many flavours. A devlink port flavour, along with the port attributes,
describes what a port represents.

A device driver that intends to publish a devlink port sets the
devlink port attributes and registers the devlink port.

Devlink port flavours are described below.

.. list-table:: List of devlink port flavours
   :widths: 33 90

   * - Flavour
     - Description
   * - ``DEVLINK_PORT_FLAVOUR_PHYSICAL``
     - Any kind of physical port. This can be an eswitch physical port or any
       other physical port on the device.
   * - ``DEVLINK_PORT_FLAVOUR_DSA``
     - This indicates a DSA interconnect port.
   * - ``DEVLINK_PORT_FLAVOUR_CPU``
     - This indicates a CPU port applicable only to DSA.
   * - ``DEVLINK_PORT_FLAVOUR_PCI_PF``
     - This indicates an eswitch port representing a port of a PCI
       physical function (PF).
   * - ``DEVLINK_PORT_FLAVOUR_PCI_VF``
     - This indicates an eswitch port representing a port of a PCI
       virtual function (VF).
   * - ``DEVLINK_PORT_FLAVOUR_PCI_SF``
     - This indicates an eswitch port representing a port of a PCI
       subfunction (SF).
   * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL``
     - This indicates a virtual port for the PCI virtual function.

A devlink port can have a different type depending on its link layer, as
described below.

.. list-table:: List of devlink port types
   :widths: 23 90

   * - Type
     - Description
   * - ``DEVLINK_PORT_TYPE_ETH``
     - The driver should set this port type when the link layer of the port is
       Ethernet.
   * - ``DEVLINK_PORT_TYPE_IB``
     - The driver should set this port type when the link layer of the port is
       InfiniBand.
   * - ``DEVLINK_PORT_TYPE_AUTO``
     - The user indicates this type when the driver should detect the port
       type automatically.

PCI controllers
---------------
In most cases a PCI device has only one controller. A controller consists of
potentially multiple physical functions, virtual functions and subfunctions.
A function consists of one or more ports. Each such port is represented by a
devlink eswitch port.

A PCI device connected to multiple CPUs or multiple PCI root complexes or a
SmartNIC, however, may have multiple controllers. For a device with multiple
controllers, each controller is distinguished by a unique controller number.
An eswitch on the PCI device supports ports of multiple controllers.

An example view of a system with two controllers::

                 ---------------------------------------------------------
                 |                                                       |
                 |           --------- ---------         ------- ------- |
    -----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
    | server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
    | pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
    | connect |  | -------                       -------                 |
    -----------  |     | controller_num=1 (no eswitch)                   |
                 ------|--------------------------------------------------
                 (internal wire)
                       |
                 ---------------------------------------------------------
                 | devlink eswitch ports and reps                        |
                 | ----------------------------------------------------- |
                 | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
                 | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
                 | ----------------------------------------------------- |
                 | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
                 | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
                 | ----------------------------------------------------- |
                 |                                                       |
                 |                                                       |
    -----------  |           --------- ---------         ------- ------- |
    | smartNIC|  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
    | pci rc  |==| -------   ----/---- ---/----- ------- ---/--- ---/--- |
    | connect |  | | pf0 |______/________/       | pf1 |___/_______/     |
    -----------  | -------                       -------                 |
                 |                                                       |
                 |  local controller_num=0 (eswitch)                     |
                 ---------------------------------------------------------

In the above example, the external controller (identified by controller
number = 1) doesn't have the eswitch. The local controller (identified by
controller number = 0) has the eswitch. The devlink instance on the local
controller has eswitch devlink ports for both controllers.
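
As a rough illustration of that last point, the following Python sketch
enumerates the eswitch ports the local controller would host in the example.
It is only a model of the diagram: the ``ctrl-N`` and ``pfXvfN`` labels are
copied from the figure, and the constants and the function name are
hypothetical, not kernel or devlink identifiers.

.. code-block:: python

    from itertools import product

    # Hypothetical model of the diagram: two controllers, each with two PFs,
    # and each PF contributing the PF port plus its VF and SF ports.
    CONTROLLERS = (0, 1)        # 0 = local (has the eswitch), 1 = external
    PFS = (0, 1)
    KINDS = ("", "vfN", "sfN")  # the PF itself, its VFs, its SFs


    def eswitch_port_labels():
        """Return (controller, port) labels hosted by the local eswitch."""
        return [(f"ctrl-{c}", f"pf{pf}{kind}")
                for c, pf, kind in product(CONTROLLERS, PFS, KINDS)]


    labels = eswitch_port_labels()
    for ctrl, port in labels:
        print(ctrl, port)

The list contains entries for both ``ctrl-0`` and ``ctrl-1`` even though only
the local controller has an eswitch.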

Function configuration
======================

A user can configure the function attribute before enumerating the PCI
function. Usually this means the user should configure the function attribute
before a bus-specific device for the function is created. However, when
SRIOV is enabled, virtual function devices are created on the PCI bus.
Hence, the function attribute should be configured before binding the virtual
function device to the driver. For subfunctions, this means the user should
configure the port function attribute before activating the port function.

A user may set the hardware address of the function using the
``devlink port function set hw_addr`` command. For an Ethernet port function
this means a MAC address.
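
A minimal Python sketch of how a management script might assemble that command
is shown below. The helper name, the example PCI address and the port index
are hypothetical. The argument order assumes the ``devlink`` tool from
iproute2, with the port handle in the ``pci/<bus_addr>/<port_index>`` form.
The function only builds the argument list and does not run it.

.. code-block:: python

    import re

    # Six colon-separated pairs of hex digits
    MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


    def hw_addr_command(port, mac):
        """Build the argv that sets a port function's hardware address.

        ``port`` is a devlink port handle; its exact form depends on the
        device and is an assumption here.
        """
        mac = mac.strip().lower()
        if not MAC_RE.match(mac):
            raise ValueError(f"not a MAC address: {mac!r}")
        return ["devlink", "port", "function", "set", port, "hw_addr", mac]


    print(" ".join(hw_addr_command("pci/0000:06:00.0/2", "00:11:22:33:44:55")))

Validating the address before invoking the tool keeps malformed input from
reaching the device.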

Subfunction
============

A subfunction is a lightweight function that has a parent PCI function on
which it is deployed. Subfunctions are created and deployed in units of one.
Unlike SRIOV VFs, a subfunction doesn't require its own PCI virtual function.
A subfunction communicates with the hardware through the parent PCI function.

To use a subfunction, a three-step setup sequence is followed:

(1) create - create a subfunction;
(2) configure - configure subfunction attributes;
(3) deploy - deploy the subfunction;

Subfunction management is done using the devlink port user interface.
The user performs setup on the subfunction management device.

(1) Create
----------
A subfunction is created using the devlink port interface. A user adds the
subfunction by adding a devlink port of subfunction flavour. The devlink
kernel code calls down to the subfunction management driver (devlink ops) and
asks it to create a subfunction devlink port. The driver then instantiates the
subfunction port and any associated objects such as health reporters and a
representor netdevice.

(2) Configure
-------------
At this point the subfunction devlink port is created but it is not active
yet. That means the entities are created on the devlink side and the e-switch
port representor is created, but the subfunction device itself is not created.
A user might use the e-switch port representor to do settings, put it into a
bridge, add TC rules, etc. A user might also configure the hardware address
(such as a MAC address) of the subfunction while the subfunction is inactive.

(3) Deploy
----------
Once a subfunction is configured, the user must activate it to use it. Upon
activation, the subfunction management driver asks the subfunction management
device to instantiate the subfunction device on a particular PCI function.
A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst <auxiliary_bus>`.
At this point a matching subfunction driver binds to the subfunction's auxiliary device.
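
The three steps can be summarised with a small Python model. This is only a
sketch of the ordering described above, not of any driver code. The class,
its attributes and the ``pfnum``/``sfnum`` values are hypothetical, and
rejecting configuration after activation is a simplifying assumption of this
model.

.. code-block:: python

    class Subfunction:
        """Toy model of the create -> configure -> deploy sequence."""

        def __init__(self, pfnum, sfnum):
            # (1) Create: the devlink port and its representor exist, but the
            # subfunction device does not; the port function is inactive.
            self.pfnum = pfnum
            self.sfnum = sfnum
            self.hw_addr = None
            self.active = False

        def set_hw_addr(self, mac):
            # (2) Configure: this model accepts attribute changes only while
            # the function is inactive (a simplifying assumption).
            if self.active:
                raise RuntimeError("configure the function before activating it")
            self.hw_addr = mac

        def activate(self):
            # (3) Deploy: stands in for the auxiliary device being created
            # and a subfunction driver binding to it.
            self.active = True


    sf = Subfunction(pfnum=0, sfnum=88)
    sf.set_hw_addr("00:00:00:00:88:88")
    sf.activate()
    print(sf.active, sf.hw_addr)

Calling ``set_hw_addr()`` after ``activate()`` raises an error in this model,
mirroring the guidance to configure the port function before activating it.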

Rate object management
======================

Devlink provides an API to manage the TX rates of a single devlink port or a
group. This is done through rate objects, which can be one of two types:

``leaf``
  Represents a single devlink port; created/destroyed by the driver. Since a
  leaf has a 1:1 mapping to its devlink port, in userspace it is referred to
  as ``pci/<bus_addr>/<port_index>``;

``node``
  Represents a group of rate objects (leafs and/or nodes); created/deleted by
  request from userspace; initially empty (no rate objects added). In
  userspace it is referred to as ``pci/<bus_addr>/<node_name>``, where
  ``node_name`` can be any identifier except a decimal number, to avoid
  collisions with leafs.
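
The naming rule makes the two types distinguishable from the handle alone.
A minimal Python sketch of that rule follows. The function name and the
example handles are hypothetical, and treating only ASCII digits as a
"decimal number" is an assumption.

.. code-block:: python

    def rate_object_kind(handle):
        """Classify a rate object handle as ``leaf`` or ``node``."""
        # The last path component is either <port_index> or <node_name>.
        _, _, name = handle.rpartition("/")
        if not name:
            raise ValueError(f"malformed handle: {handle!r}")
        # Only plain decimal digits count as a port index.
        return "leaf" if name.isascii() and name.isdigit() else "node"


    print(rate_object_kind("pci/0000:03:00.0/1"))       # leaf
    print(rate_object_kind("pci/0000:03:00.0/group1"))  # node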

The API allows configuring the following rate object parameters:

``tx_share``
  Minimum TX rate value shared among all other rate objects or, if the object
  is part of a group, among the rate objects of that parent group.

``tx_max``
  Maximum TX rate value.

``parent``
  Parent node name. Parent node rate limits are applied as additional limits
  on top of the limits of all the node's children. ``tx_max`` is an upper
  limit for children. ``tx_share`` is a total bandwidth distributed among
  children.

Driver implementations may support either or both rate object types and the
methods for setting their parameters.
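
One way to read the ``parent`` semantics is that a rate object's effective
maximum is the tightest ``tx_max`` found on the path from the object to the
root. The Python sketch below models that reading. It is an interpretation
for illustration, not a description of how any driver enforces limits. The
class, the unit-less rate values and the object names are hypothetical.

.. code-block:: python

    class RateObject:
        """Toy rate object; rates are plain numbers in an arbitrary unit."""

        def __init__(self, name, tx_share=0, tx_max=None, parent=None):
            self.name = name
            self.tx_share = tx_share  # minimum TX rate
            self.tx_max = tx_max      # maximum TX rate, None means unlimited
            self.parent = parent      # parent node, None for a root object

        def effective_tx_max(self):
            """Tightest tx_max on the path from this object up to the root."""
            limits = []
            obj = self
            while obj is not None:
                if obj.tx_max is not None:
                    limits.append(obj.tx_max)
                obj = obj.parent
            return min(limits) if limits else None


    group = RateObject("group1", tx_share=500, tx_max=1000)
    leaf_a = RateObject("1", tx_share=100, tx_max=2000, parent=group)
    leaf_b = RateObject("2", tx_share=100, parent=group)

    print(leaf_a.effective_tx_max())  # 1000, capped by the parent
    print(leaf_b.effective_tx_max())  # 1000, taken from the parent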

Terms and Definitions
=====================

.. list-table:: Terms and Definitions
   :widths: 22 90

   * - Term
     - Definitions
   * - ``PCI device``
     - A physical PCI device having one or more PCI buses. It consists of one
       or more PCI controllers.
   * - ``PCI controller``
     - A controller consists of potentially multiple physical functions,
       virtual functions and subfunctions.
   * - ``Port function``
     - An object to manage the function of a port.
   * - ``Subfunction``
     - A lightweight function that has a parent PCI function on which it is
       deployed.
   * - ``Subfunction device``
     - A bus device of the subfunction, usually on an auxiliary bus.
   * - ``Subfunction driver``
     - A device driver for the subfunction auxiliary device.
   * - ``Subfunction management device``
     - A PCI physical function that supports subfunction management.
   * - ``Subfunction management driver``
     - A device driver for a PCI physical function that supports
       subfunction management using the devlink port interface.
   * - ``Subfunction host driver``
     - A device driver for a PCI physical function that hosts subfunction
       devices. In most cases it is the same as the subfunction management
       driver. When a subfunction is used on an external controller, the
       subfunction management and host drivers are different.