cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

netdevices.rst (9580B)


.. SPDX-License-Identifier: GPL-2.0

=====================================
Network Devices, the Kernel, and You!
=====================================


Introduction
============
The following is a random collection of documentation regarding
network devices.

struct net_device lifetime rules
================================
Network device structures need to persist even after the module is unloaded and
must be allocated with alloc_netdev_mqs() and friends.
If the device has registered successfully, it will be freed on last use
by free_netdev(). This is required to handle the pathological case cleanly
(example: ``rmmod mydriver </sys/class/net/myeth/mtu``).

alloc_netdev_mqs() / alloc_netdev() reserve extra space for driver
private data, which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(netdev_priv()), then it is up to the module exit handler to free that.
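
The single-allocation layout described above can be modelled in user space.
This is a simplified sketch and not the kernel implementation:
``struct model_netdev``, ``MODEL_ALIGN`` and the ``model_*`` helpers are
hypothetical names, and the 32-byte alignment is an assumption made only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not the kernel definitions. */
#define MODEL_ALIGN 32u
#define MODEL_ALIGN_UP(x) \
	(((size_t)(x) + MODEL_ALIGN - 1) & ~(size_t)(MODEL_ALIGN - 1))

struct model_netdev {
	char name[16];
	unsigned int mtu;
};

struct my_device_priv {
	uint64_t tx_packets;
	void *obj; /* separately allocated data would hang off here */
};

/* One allocation holds the device structure followed by the private area. */
static struct model_netdev *model_alloc_netdev(size_t sizeof_priv)
{
	size_t total = MODEL_ALIGN_UP(sizeof(struct model_netdev)) + sizeof_priv;
	struct model_netdev *dev;

	/* aligned_alloc() wants a size that is a multiple of the alignment. */
	total = MODEL_ALIGN_UP(total);
	dev = aligned_alloc(MODEL_ALIGN, total);
	if (dev)
		memset(dev, 0, total);
	return dev;
}

/* The private area starts at the first aligned offset past the device. */
static void *model_netdev_priv(struct model_netdev *dev)
{
	return (char *)dev + MODEL_ALIGN_UP(sizeof(struct model_netdev));
}

/* Freeing the device frees the embedded private area with it. */
static void model_free_netdev(struct model_netdev *dev)
{
	free(dev);
}
```

In this model anything reachable only through ``priv->obj`` is not covered by
``model_free_netdev()``, which mirrors the rule that separately allocated data
has to be released by the driver itself.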

There are two groups of APIs for registering struct net_device.
The first group can be used in normal contexts where ``rtnl_lock`` is not already
held: register_netdev(), unregister_netdev().
The second group can be used when ``rtnl_lock`` is already held:
register_netdevice(), unregister_netdevice(), free_netdev().

Simple drivers
--------------

Most drivers (especially device drivers) handle the lifetime of struct net_device
in a context where ``rtnl_lock`` is not held (e.g. driver probe and remove paths).

In that case the struct net_device registration is done using
the register_netdev() and unregister_netdev() functions:

.. code-block:: c

  int probe()
  {
    struct my_device_priv *priv;
    struct net_device *dev;
    int err;

    dev = alloc_netdev_mqs(...);
    if (!dev)
      return -ENOMEM;
    priv = netdev_priv(dev);

    /* ... do all device setup before calling register_netdev() ...
     */

    err = register_netdev(dev);
    if (err)
      goto err_undo;

    /* net_device is visible to the user! */
    return 0;

  err_undo:
    /* ... undo the device setup ... */
    free_netdev(dev);
    return err;
  }

  void remove()
  {
    /* dev is the net_device that probe() registered */
    unregister_netdev(dev);
    free_netdev(dev);
  }

Note that after calling register_netdev() the device is visible in the system.
Users can open it and start sending / receiving traffic immediately,
or run any other callback, so all initialization must be done prior to
registration.

unregister_netdev() closes the device and waits for all users to be done
with it. The memory of struct net_device itself may still be referenced
by sysfs but all operations on that device will fail.

free_netdev() can be called after unregister_netdev() returns or when
register_netdev() has failed.

Device management under RTNL
----------------------------

Registering struct net_device while in a context which already holds
the ``rtnl_lock`` requires extra care. In those scenarios most drivers
will want to make use of struct net_device's ``needs_free_netdev``
and ``priv_destructor`` members for freeing of state.

Example flow of netdev handling under ``rtnl_lock``:

.. code-block:: c

  static void my_setup(struct net_device *dev)
  {
    dev->needs_free_netdev = true;
  }

  static void my_destructor(struct net_device *dev)
  {
    struct my_device_priv *priv = netdev_priv(dev);

    some_obj_destroy(priv->obj);
    some_uninit(priv);
  }

  int create_link()
  {
    struct my_device_priv *priv;
    struct net_device *dev;
    int err;

    ASSERT_RTNL();

    dev = alloc_netdev(sizeof(*priv), "net%d", NET_NAME_UNKNOWN, my_setup);
    if (!dev)
      return -ENOMEM;
    priv = netdev_priv(dev);

    /* Implicit constructor */
    err = some_init(priv);
    if (err)
      goto err_free_dev;

    priv->obj = some_obj_create();
    if (!priv->obj) {
      err = -ENOMEM;
      goto err_some_uninit;
    }
    /* End of constructor, set the destructor: */
    dev->priv_destructor = my_destructor;

    err = register_netdevice(dev);
    if (err)
      /* register_netdevice() calls destructor on failure */
      goto err_free_dev;

    /* If anything fails now unregister_netdevice() (or unregister_netdev())
     * will take care of calling my_destructor and free_netdev().
     */

    return 0;

  err_some_uninit:
    some_uninit(priv);
  err_free_dev:
    free_netdev(dev);
    return err;
  }

If struct net_device.priv_destructor is set, it will be called by the core
some time after unregister_netdevice(); it will also be called if
register_netdevice() fails. The callback may be invoked with or without
``rtnl_lock`` held.

There is no explicit constructor callback; the driver "constructs" the private
netdev state after allocating it and before registration.

Setting struct net_device.needs_free_netdev makes the core call free_netdev()
automatically after unregister_netdevice() when all references to the device
are gone. It only takes effect after a successful call to register_netdevice(),
so if register_netdevice() fails the driver is responsible for calling
free_netdev().

free_netdev() is safe to call on error paths right after unregister_netdevice()
or when register_netdevice() fails. Parts of the netdev (de)registration process
happen after ``rtnl_lock`` is released, therefore in those cases free_netdev()
will defer some of the processing until ``rtnl_lock`` is released.

Devices spawned from struct rtnl_link_ops should never free the
struct net_device directly.

.ndo_init and .ndo_uninit
~~~~~~~~~~~~~~~~~~~~~~~~~

``.ndo_init`` and ``.ndo_uninit`` callbacks are called during net_device
registration and de-registration, under ``rtnl_lock``. Drivers can use
those e.g. when parts of their init process need to run under ``rtnl_lock``.

``.ndo_init`` runs before the device is visible in the system. ``.ndo_uninit``
runs during de-registration, after the device is closed but while other
subsystems may still have outstanding references to the netdevice.

MTU
===
Each network device has a Maximum Transfer Unit (MTU). The MTU does not
include any link layer protocol overhead. Upper layer protocols must
not pass a socket buffer (skb) to a device to transmit with more data
than the MTU. For example, on Ethernet with the standard MTU of
1500 bytes, the actual skb will contain up to 1514 bytes because of the
Ethernet header. Devices should allow for the 4 byte VLAN header as well.

Segmentation Offload (GSO, TSO) is an exception to this rule.  The
upper layer protocol may pass a large socket buffer to the device
transmit routine, and the device will break that up into separate
packets based on the current MTU.

MTU is symmetrical and applies both to receive and transmit. A device
must be able to receive at least the maximum size packet allowed by
the MTU. A network device may use the MTU as a mechanism to size receive
buffers, but the device should allow packets with a VLAN header. With the
standard Ethernet MTU of 1500 bytes, the device should allow up to
1518 byte packets (1500 + 14 header + 4 tag).  The device may
drop, truncate, or pass up oversize packets, but dropping oversize
packets is preferred.
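
The frame size arithmetic above can be written out as a small user-space
sketch. The constant and function names below are hypothetical and chosen for
this example. Only the sizes (14 byte Ethernet header, 4 byte VLAN tag) come
from the text.

```c
#include <assert.h>

/* Sizes taken from the text above; the names are hypothetical. */
enum {
	MODEL_ETH_HLEN  = 14, /* Ethernet header */
	MODEL_VLAN_HLEN = 4,  /* single VLAN tag */
};

static_assert(MODEL_ETH_HLEN + MODEL_VLAN_HLEN == 18,
	      "link layer overhead assumed by this sketch");

/* Largest untagged frame an upper layer may hand to the device. */
static unsigned int max_tx_frame_len(unsigned int mtu)
{
	return mtu + MODEL_ETH_HLEN;
}

/* Largest frame the device should accept on receive, allowing one VLAN tag. */
static unsigned int max_rx_frame_len(unsigned int mtu)
{
	return mtu + MODEL_ETH_HLEN + MODEL_VLAN_HLEN;
}

/* Preferred policy from the text: drop oversize packets. */
static int rx_should_drop(unsigned int frame_len, unsigned int mtu)
{
	return frame_len > max_rx_frame_len(mtu);
}
```

With an MTU of 1500 the two helpers give 1514 and 1518 bytes, matching the
figures quoted above. GSO/TSO buffers are outside the scope of this sketch.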


struct net_device synchronization rules
=======================================
ndo_open:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_stop:
	Synchronization: rtnl_lock() semaphore.
	Context: process
	Note: netif_running() is guaranteed false

ndo_do_ioctl:
	Synchronization: rtnl_lock() semaphore.
	Context: process

	This is only called by network subsystems internally,
	not by user space calling ioctl as it was before
	linux-5.14.

ndo_siocbond:
	Synchronization: rtnl_lock() semaphore.
	Context: process

	Used by the bonding driver for the SIOCBOND family of
	ioctl commands.

ndo_siocwandev:
	Synchronization: rtnl_lock() semaphore.
	Context: process

	Used by the drivers/net/wan framework to handle
	the SIOCWANDEV ioctl with the if_settings structure.

ndo_siocdevprivate:
	Synchronization: rtnl_lock() semaphore.
	Context: process

	This is used to implement SIOCDEVPRIVATE ioctl helpers.
	These should not be added to new drivers, so don't use them.

ndo_eth_ioctl:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_get_stats:
	Synchronization: rtnl_lock() semaphore, dev_base_lock rwlock, or RCU.
	Context: atomic (can't sleep under rwlock or RCU)

ndo_start_xmit:
	Synchronization: __netif_tx_lock spinlock.

	When the driver sets NETIF_F_LLTX in dev->features this will be
	called without holding netif_tx_lock. In this case the driver
	has to lock by itself when needed.
	The locking there should also properly protect against
	set_rx_mode. WARNING: use of NETIF_F_LLTX is deprecated.
	Don't use it for new drivers.

	Context: Process with BHs disabled or BH (timer),
		 will be called with interrupts disabled by netconsole.

	Return codes:

	* NETDEV_TX_OK: everything ok.
	* NETDEV_TX_BUSY: cannot transmit the packet, try later.
	  Usually a bug; it means queue start/stop flow control is broken in
	  the driver. Note: the driver must NOT put the skb in its DMA ring.
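
The queue start/stop flow control mentioned under NETDEV_TX_BUSY can be
illustrated with a user-space model. This is a sketch under stated assumptions
and not driver code: the ``model_*`` names, the ring size and the worst-case
descriptor count per packet are all hypothetical. The point it shows is that a
transmit path which stops its queue while one worst-case packet still fits
should never need to report busy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the return codes and queue state. */
enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY };

#define RING_SIZE        8u
#define MAX_DESC_PER_SKB 3u /* assumed worst case for one packet */

struct model_txq {
	unsigned int used; /* descriptors currently owned by hardware */
	bool stopped;      /* models the stopped state of the queue */
};

static unsigned int ring_free(const struct model_txq *q)
{
	return RING_SIZE - q->used;
}

/* Transmit path: the stack only calls this while the queue is not stopped. */
static enum model_tx model_start_xmit(struct model_txq *q, unsigned int ndesc)
{
	assert(ndesc >= 1 && ndesc <= MAX_DESC_PER_SKB);

	if (ring_free(q) < ndesc) {
		/* Only reachable if the stop/wake accounting is broken. */
		q->stopped = true;
		return MODEL_TX_BUSY; /* the packet is NOT placed in the ring */
	}

	q->used += ndesc;

	/* Stop early so the next worst-case packet is guaranteed to fit. */
	if (ring_free(q) < MAX_DESC_PER_SKB)
		q->stopped = true;

	return MODEL_TX_OK;
}

/* Completion path: reclaim descriptors and wake the queue once room is back. */
static void model_tx_complete(struct model_txq *q, unsigned int ndesc)
{
	assert(ndesc <= q->used);
	q->used -= ndesc;

	if (q->stopped && ring_free(q) >= MAX_DESC_PER_SKB)
		q->stopped = false;
}
```

A caller that respects ``stopped`` only ever sees ``MODEL_TX_OK`` in this
model. The busy branch exists to show where a real driver would end up if its
accounting were wrong.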

ndo_tx_timeout:
	Synchronization: netif_tx_lock spinlock; all TX queues frozen.
	Context: BHs disabled
	Notes: netif_queue_stopped() is guaranteed true

ndo_set_rx_mode:
	Synchronization: netif_addr_lock spinlock.
	Context: BHs disabled

struct napi_struct synchronization rules
========================================
napi->poll:
	Synchronization:
		NAPI_STATE_SCHED bit in napi->state.  The device
		driver's ndo_stop method will invoke napi_disable() on
		all NAPI instances, which will do a sleeping poll on the
		NAPI_STATE_SCHED napi->state bit, waiting for all pending
		NAPI activity to cease.

	Context:
		 softirq
		 will be called with interrupts disabled by netconsole.
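
The role of the scheduling bit can be sketched in user space with C11 atomics.
This is a model under assumptions, not the kernel's NAPI code: ``struct
model_napi`` and the ``model_napi_*`` functions are hypothetical, only a single
bit of state is kept, and the disable path spins where the text above describes
a sleeping poll.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one NAPI instance; only the SCHED bit is kept. */
struct model_napi {
	atomic_flag sched; /* set while a poll is pending or running */
};

static void model_napi_init(struct model_napi *n)
{
	atomic_flag_clear(&n->sched);
}

/* Returns true for exactly one caller until the poll completes. */
static bool model_napi_schedule_prep(struct model_napi *n)
{
	return !atomic_flag_test_and_set(&n->sched);
}

/* Called by the poll routine when it has no more work. */
static void model_napi_complete(struct model_napi *n)
{
	atomic_flag_clear(&n->sched);
}

/*
 * Wait for pending poll activity to cease, then keep the bit set
 * so that nothing can schedule the instance again.
 */
static void model_napi_disable(struct model_napi *n)
{
	while (atomic_flag_test_and_set(&n->sched))
		; /* this model spins; the text describes a sleeping poll */
}

/* Release the bit taken by model_napi_disable(). */
static void model_napi_enable(struct model_napi *n)
{
	atomic_flag_clear(&n->sched);
}
```

Because the disable path owns the bit once it returns, any later attempt to
schedule the instance fails until it is enabled again. That ownership is what
lets a stop routine assume no poll is running.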