cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

spider_net.rst (10026B)


.. SPDX-License-Identifier: GPL-2.0

===========================
The Spidernet Device Driver
===========================

Written by Linas Vepstas <linas@austin.ibm.com>

Version of 7 June 2007

Abstract
========
This document sketches the structure of portions of the spidernet
device driver in the Linux kernel tree. The spidernet is a gigabit
ethernet device built into the Toshiba southbridge commonly used
in the SONY Playstation 3 and the IBM QS20 Cell blade.

The Structure of the RX Ring.
=============================
The receive (RX) ring is a circular linked list of RX descriptors,
together with three pointers into the ring that are used to manage its
contents.

The elements of the ring are called "descriptors" or "descrs"; they
describe the received data. This includes a pointer to a buffer
containing the received data, the buffer size, and various status bits.

There are three primary states that a descriptor can be in: "empty",
"full" and "not-in-use".  An "empty" or "ready" descriptor is ready
to receive data from the hardware. A "full" descriptor has data in it,
and is waiting to be emptied and processed by the OS. A "not-in-use"
descriptor is neither empty nor full; it is simply not ready. It may
not even have a data buffer in it, or may be otherwise unusable.

During normal operation, on device startup, the OS (specifically, the
spidernet device driver) allocates a set of RX descriptors and RX
buffers. These are all marked "empty", ready to receive data. This
ring is handed off to the hardware, which sequentially fills in the
buffers, and marks them "full". The OS follows up, taking the full
buffers, processing them, and re-marking them empty.

This filling and emptying is managed by three pointers: the "head"
and "tail" pointers, managed by the OS, and a hardware current
descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
currently being filled. When this descr is filled, the hardware
marks it full, and advances the GDACTDPA by one.  Thus, when there is
flowing RX traffic, every descr behind it should be marked "full",
and everything in front of it should be "empty".  If the hardware
discovers that the current descr is not empty, it will signal an
interrupt, and halt processing.

The tail pointer tails or trails the hardware pointer. When the
hardware is ahead, the tail pointer will be pointing at a "full"
descr. The OS will process this descr, and then mark it "not-in-use",
and advance the tail pointer.  Thus, when there is flowing RX traffic,
all of the descrs in front of the tail pointer should be "full", and
all of those behind it should be "not-in-use". When RX traffic is not
flowing, then the tail pointer can catch up to the hardware pointer.
The OS will then note that the current tail is "empty", and halt
processing.

The head pointer (somewhat mis-named) follows after the tail pointer.
When traffic is flowing, then the head pointer will be pointing at
a "not-in-use" descr. The OS will perform various housekeeping duties
on this descr. This includes allocating a new data buffer and
dma-mapping it so as to make it visible to the hardware. The OS will
then mark the descr as "empty", ready to receive data. Thus, when there
is flowing RX traffic, everything in front of the head pointer should
be "not-in-use", and everything behind it should be "empty". If no
RX traffic is flowing, then the head pointer can catch up to the tail
pointer, at which point the OS will notice that the head descr is
"empty", and it will halt processing.

Thus, in an idle system, the GDACTDPA, tail and head pointers will
all be pointing at the same descr, which should be "empty". All of the
other descrs in the ring should be "empty" as well.
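
The pointer discipline described above can be illustrated with a small
user-space model. This is only a sketch, not driver code: the names
``rx_ring``, ``hw_receive()``, ``os_process_tail()`` and
``os_refill_head()`` are hypothetical, the ring holds 8 descrs purely for
brevity, and buffer allocation and DMA mapping are reduced to a state change.

```c
#include <stddef.h>

/* The three descriptor states named in the text (hypothetical enum). */
enum rx_state { RX_EMPTY, RX_FULL, RX_NOT_IN_USE };

#define RING_SIZE 8 /* illustrative size only */

struct rx_ring {
    enum rx_state descr[RING_SIZE];
    size_t hw;   /* stands in for the hardware GDACTDPA pointer */
    size_t tail; /* OS: next descr to process */
    size_t head; /* OS: next descr to refill */
};

/* Device startup: every descr is "empty", all pointers coincide. */
static void ring_init(struct rx_ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++)
        r->descr[i] = RX_EMPTY;
    r->hw = r->tail = r->head = 0;
}

/* Hardware receives one packet. Returns 0 if the current descr is
 * not empty, which is where the real hardware would signal an
 * interrupt and halt. */
static int hw_receive(struct rx_ring *r)
{
    if (r->descr[r->hw] != RX_EMPTY)
        return 0;
    r->descr[r->hw] = RX_FULL;
    r->hw = (r->hw + 1) % RING_SIZE;
    return 1;
}

/* OS side: process "full" descrs at the tail and mark them
 * "not-in-use". Stops at the first descr that is not full. */
static size_t os_process_tail(struct rx_ring *r)
{
    size_t n = 0;
    while (r->descr[r->tail] == RX_FULL) {
        r->descr[r->tail] = RX_NOT_IN_USE;
        r->tail = (r->tail + 1) % RING_SIZE;
        n++;
    }
    return n;
}

/* OS side: refill "not-in-use" descrs at the head and mark them
 * "empty". Stops at the first descr that is already empty. */
static size_t os_refill_head(struct rx_ring *r)
{
    size_t n = 0;
    while (r->descr[r->head] == RX_NOT_IN_USE) {
        r->descr[r->head] = RX_EMPTY;
        r->head = (r->head + 1) % RING_SIZE;
        n++;
    }
    return n;
}
```

In this model, after a burst of traffic has been received, processed and
refilled, all three indices coincide again and every descr is back in the
"empty" state, which matches the idle condition described above.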

The show_rx_chain() routine will print out the locations of the
GDACTDPA, tail and head pointers. It will also summarize the contents
of the ring, starting at the tail pointer, and listing the status
of the descrs that follow.

A typical example of the output, for a nearly idle system, might be::

    net eth1: Total number of descrs=256
    net eth1: Chain tail located at descr=20
    net eth1: Chain head is at 20
    net eth1: HW curr desc (GDACTDPA) is at 21
    net eth1: Have 1 descrs with stat=x40800101
    net eth1: HW next desc (GDACNEXTDA) is at 22
    net eth1: Last 255 descrs with stat=xa0800000

In the above, the hardware has filled in one descr, number 20. Both
head and tail are pointing at 20, because it has not yet been emptied.
Meanwhile, hw is pointing at 21, which is free.

The "Have nnn descrs" refers to the descr starting at the tail: in this
case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
to all of the rest of the descrs, from the last status change. The "nnn"
is a count of how many descrs have exactly the same status.

The status x4... corresponds to "full" and status xa... corresponds
to "empty". The actual value printed is RXCOMST_A.

In the device driver source code, a different set of names is
used for these same concepts, so that::

    "empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
    "full"  == SPIDER_NET_DESCR_FRAME_END == 0x4
    "not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
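
As a reading aid for the stat= values printed by show_rx_chain(), the
mapping above can be applied with a few lines of C. This is a sketch under
one assumption that is inferred from the sample output rather than stated
in the text: the state is taken to be the top four bits of the 32-bit status
word, so that x40800101 decodes as 0x4. The enum and the function
``rx_state_of()`` are hypothetical names, not driver symbols.

```c
#include <stdint.h>

/* Hypothetical names for the states listed in the table above. */
enum rx_state {
    RX_STATE_EMPTY,      /* 0xa, SPIDER_NET_DESCR_CARDOWNED */
    RX_STATE_FULL,       /* 0x4, SPIDER_NET_DESCR_FRAME_END */
    RX_STATE_NOT_IN_USE, /* 0xf, SPIDER_NET_DESCR_NOT_IN_USE */
    RX_STATE_OTHER       /* any value not covered by this document */
};

/* Assumption: the state lives in the most significant nibble. */
static enum rx_state rx_state_of(uint32_t status)
{
    switch (status >> 28) {
    case 0xa:
        return RX_STATE_EMPTY;
    case 0x4:
        return RX_STATE_FULL;
    case 0xf:
        return RX_STATE_NOT_IN_USE;
    default:
        return RX_STATE_OTHER;
    }
}
```

With this reading, both x40800101 and x40800001 from the sample output
count as "full"; they differ only in the lower status bits, which this
sketch ignores.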


The RX RAM full bug/feature
===========================

As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local RAM fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS).  When the RX RAM full condition occurs,
a certain bug/feature is triggered that has to be specially handled.
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a
deadlock condition, as the tail pointer will be pointing at this descr,
which, from the OS point of view, is empty; the OS will be waiting for
this descr to be filled. However, the hardware has skipped this descr,
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill,
while the hardware is waiting for a different set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following::

    net eth1: Spider RX RAM full, incoming packets might be discarded!
    net eth1: Total number of descrs=256
    net eth1: Chain tail located at descr=255
    net eth1: Chain head is at 255
    net eth1: HW curr desc (GDACTDPA) is at 0
    net eth1: Have 1 descrs with stat=xa0800000
    net eth1: HW next desc (GDACNEXTDA) is at 1
    net eth1: Have 127 descrs with stat=x40800101
    net eth1: Have 1 descrs with stat=x40800001
    net eth1: Have 126 descrs with stat=x40800101
    net eth1: Last 1 descrs with stat=xa0800000

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the previous section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
Since it is already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
descr 254, since tail was at 255.) Thus, the system is deadlocked,
and there can be no forward progress; the OS thinks there's nothing
to do, and the hardware has nowhere to put incoming data.

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there.  Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes.

As of this writing, the spider_net_resync() strategy seems to work very
well, even under heavy network loads.
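
The "search the ring for the next full descr" step can be sketched as
follows. This is not the code of spider_net_resync_head_ptr(); it is a
stand-alone illustration over an array of status words, and it assumes
(as the sample output suggests) that a "full" descr has 0x4 in the top four
bits of its status. The function name ``find_next_full()`` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scan the ring once, starting at index 'start' and wrapping around,
 * and return the index of the first descr whose status marks it
 * "full". Returns -1 if no descr in the ring is full.
 */
static long find_next_full(const uint32_t *status, size_t n, size_t start)
{
    assert(n > 0); /* an empty ring has nothing to scan */

    for (size_t i = 0; i < n; i++) {
        size_t idx = (start + i) % n;
        if ((status[idx] >> 28) == 0x4) /* assumed "full" encoding */
            return (long)idx;
    }
    return -1;
}
```

Applied to the hung ring shown above, with the scan starting at the stuck
tail (descr 255), the first full descr found is descr 0, which is where the
hardware pointer had stopped.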


The TX ring
===========
The TX ring uses a low-watermark interrupt scheme to make sure that
the TX queue is appropriately serviced for large packet sizes.

For packet sizes greater than about 1 KByte, the kernel can fill
the TX ring quicker than the device can drain it. Once the ring
is full, the netdev is stopped. When there is room in the ring,
the netdev needs to be reawakened, so that more TX packets are placed
in the ring. The hardware can empty the ring about four times per jiffy,
so it is not appropriate to wait for the poll routine to refill, since
the poll routine runs only once per jiffy.  The low-watermark mechanism
marks a descr about 1/4th of the way from the bottom of the queue, so
that an interrupt is generated when the descr is processed. This
interrupt wakes up the netdev, which can then refill the queue.
For large packets, this mechanism generates a relatively small number
of interrupts, about 1K/sec. For smaller packets, this will drop to zero
interrupts, as the hardware can empty the queue faster than the kernel
can fill it.
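
The placement of the low-watermark descr can be illustrated with a little
index arithmetic. The sketch below is an assumption-laden reading of the
text, not the driver's implementation: it takes "about 1/4th of the way
from the bottom of the queue" to mean that roughly a quarter of the queued
descrs are still pending when the marked descr is processed, and it treats
the ring as a plain array indexed from the tail. The function name
``low_watermark_index()`` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * tail:      index of the oldest queued descr (drained first)
 * queued:    number of descrs currently queued for transmission
 * ring_size: total number of descrs in the TX ring
 *
 * Returns the index of the descr to flag for the low-watermark
 * interrupt, three quarters of the way into the queue.
 */
static size_t low_watermark_index(size_t tail, size_t queued,
                                  size_t ring_size)
{
    assert(ring_size > 0 && queued <= ring_size);

    size_t offset = (queued * 3) / 4; /* leave about 1/4 still queued */
    return (tail + offset) % ring_size;
}
```

For example, with 200 descrs queued from index 0 in a 256-entry ring, the
marked descr is number 150, so about 50 descrs are still waiting when the
interrupt wakes the netdev.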