=========================
NXP SJA1105 switch driver
=========================

Overview
========

The NXP SJA1105 is a family of 10 SPI-managed automotive switches:

- SJA1105E: First generation, no TTEthernet
- SJA1105T: First generation, TTEthernet
- SJA1105P: Second generation, no TTEthernet, no SGMII
- SJA1105Q: Second generation, TTEthernet, no SGMII
- SJA1105R: Second generation, no TTEthernet, SGMII
- SJA1105S: Second generation, TTEthernet, SGMII
- SJA1110A: Third generation, TTEthernet, SGMII, integrated 100base-T1 and
  100base-TX PHYs
- SJA1110B: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX
- SJA1110C: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX
- SJA1110D: Third generation, TTEthernet, SGMII, 100base-T1

Being automotive parts, their configuration interface is geared towards
set-and-forget use, with minimal dynamic interaction at runtime. They
require a static configuration to be composed by software and packed
with CRC and table headers, and sent over SPI.

The static configuration is composed of several configuration tables. Each
table takes a number of entries. Some configuration tables can be (partially)
reconfigured at runtime, some not. Some tables are mandatory, some not:

============================= ================== =============================
Table                          Mandatory          Reconfigurable
============================= ================== =============================
Schedule                       no                 no
Schedule entry points          if Scheduling      no
VL Lookup                      no                 no
VL Policing                    if VL Lookup       no
VL Forwarding                  if VL Lookup       no
L2 Lookup                      no                 no
L2 Policing                    yes                no
VLAN Lookup                    yes                yes
L2 Forwarding                  yes                partially (fully on P/Q/R/S)
MAC Config                     yes                partially (fully on P/Q/R/S)
Schedule Params                if Scheduling      no
Schedule Entry Points Params   if Scheduling      no
VL Forwarding Params           if VL Forwarding   no
L2 Lookup Params               no                 partially (fully on P/Q/R/S)
L2 Forwarding Params           yes                no
Clock Sync Params              no                 no
AVB Params                     no                 no
General Params                 yes                partially
Retagging                      no                 yes
xMII Params                    yes                no
SGMII                          no                 yes
============================= ================== =============================


Also the configuration is write-only (software cannot read it back from the
switch except for very few exceptions).

The driver creates a static configuration at probe time, and keeps it at
all times in memory, as a shadow for the hardware state. When required to
change a hardware setting, the static configuration is also updated.
If that changed setting can be transmitted to the switch through the dynamic
reconfiguration interface, it is; otherwise the switch is reset and
reprogrammed with the updated static configuration.
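
The update policy described above can be sketched as a small shell function.
This is only an illustration and not driver code: the function name
``update_path`` and the lowercase table identifiers are hypothetical, and the
classification is simply copied from the "Reconfigurable" column of the table
above.

```shell
#!/bin/bash

# Hypothetical helper: report how a change to a given configuration table
# would reach the hardware, following the "Reconfigurable" column above.
update_path() {
        case "$1" in
        vlan-lookup|retagging|sgmii)
                # Fully reconfigurable at runtime.
                echo "dynamic" ;;
        l2-forwarding|mac-config|l2-lookup-params|general-params)
                # Only partially reconfigurable: some changes still need a reset.
                echo "dynamic-or-reset" ;;
        *)
                # Any other table: reset the switch and reload the whole
                # static configuration.
                echo "reset" ;;
        esac
}

update_path vlan-lookup
update_path general-params
update_path l2-policing
```

In every case the in-memory static configuration is updated first; the result
only describes how the change is then propagated to the switch.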

Switching features
==================

The driver supports the configuration of L2 forwarding rules in hardware for
port bridging. The forwarding, broadcast and flooding domain between ports can
be restricted through two methods: either at the L2 forwarding level (isolate
one bridge's ports from another's) or at the VLAN port membership level
(isolate ports within the same bridge). The final forwarding decision taken by
the hardware is a logical AND of these two sets of rules.
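
As a minimal sketch of that logical AND, the two rule sets can be modeled as
port bitmasks. The mask values and variable names below are made up for
illustration and do not correspond to real register contents.

```shell
#!/bin/bash

# Bit N set means "port N is a permitted destination".
# Illustrative values for a frame received on port 0 of a 5-port switch:
l2_fwd_mask=$(( 0x0e ))        # L2 forwarding allows ports 1, 2 and 3
vlan_member_mask=$(( 0x16 ))   # the frame's VLAN has member ports 1, 2 and 4

# The frame may only leave through ports allowed by both rule sets.
dest_mask=$(( l2_fwd_mask & vlan_member_mask ))

printf "destination ports: 0x%02x\n" "${dest_mask}"
```

With these values only ports 1 and 2 remain (mask ``0x06``): port 3 is excluded
by VLAN membership and port 4 by the L2 forwarding rules.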

The hardware tags all traffic internally with a port-based VLAN (pvid), or it
decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification
is not possible. Once attributed a VLAN tag, frames are checked against the
port's membership rules and dropped at ingress if they don't match any VLAN.
This behavior is available when switch ports are enslaved to a bridge with
``vlan_filtering 1``.

Normally the hardware is not configurable with respect to VLAN awareness, but
by changing what TPID the switch searches 802.1Q tags for, the semantics of a
bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or
untagged), and therefore this mode is also supported.

Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but
all bridges should have the same level of VLAN awareness (either all have
``vlan_filtering`` 0, or all have ``vlan_filtering`` 1).

Topology and loop detection through STP is supported.

Offloads
========

Time-aware scheduling
---------------------

The switch supports a variation of the enhancements for scheduled traffic
specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to
ensure deterministic latency for priority traffic that is sent in-band with its
gate-open event in the network schedule.

This capability can be managed through the tc-taprio offload ('flags 2'). The
difference compared to the software implementation of taprio is that the latter
would only be able to shape traffic originated from the CPU, but not
autonomously forwarded flows.

The device has 8 traffic classes, and maps incoming frames to one of them based
on the VLAN PCP bits (if no VLAN is present, the port-based default is used).
As described in the previous sections, depending on the value of
``vlan_filtering``, the EtherType recognized by the switch as being VLAN can
either be the typical 0x8100 or a custom value used internally by the driver
for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone
or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100
EtherType. In these modes, injecting into a particular TX queue can only be
done by the DSA net devices, which populate the PCP field of the tagging header
on egress. Using ``vlan_filtering=1``, the behavior is the other way around:
offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA
net devices are no longer able to do that. To inject frames into a hardware TX
queue with VLAN awareness active, it is necessary to create a VLAN
sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged
frames towards the switch, with the VLAN PCP bits set appropriately.

Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the
notable exception: the switch always treats it with a fixed priority and
disregards any VLAN PCP bits even if present. The traffic class for management
traffic has a value of 7 (highest priority) at the moment, which is not
configurable in the driver.
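
The classification rules from the last two paragraphs can be summarized in a
short sketch. The helper ``traffic_class`` and its arguments are hypothetical
and only model the behavior described here; the value ``none`` stands for "the
switch does not see a VLAN tag", which is also the case for 0x8100-tagged frames
when ``vlan_filtering=0``.

```shell
#!/bin/bash

# Hypothetical helper: pick one of the 8 traffic classes for a frame.
#   $1: destination MAC address (colon-separated)
#   $2: VLAN PCP (0-7), or "none" when the switch sees no VLAN tag
#   $3: port-based default priority
traffic_class() {
        local dmac
        local pcp="$2"
        local port_default="$3"

        dmac=$(echo "$1" | tr 'A-F' 'a-f')

        case "${dmac}" in
        01:80:c2:*|01:19:1b:*)
                # Management traffic: fixed priority, PCP is disregarded.
                echo 7 ;;
        *)
                if [ "${pcp}" = "none" ]; then
                        echo "${port_default}"
                else
                        echo "${pcp}"
                fi ;;
        esac
}

traffic_class 01:80:C2:00:00:0E 3 0      # management frame
traffic_class 42:be:24:9b:76:20 5 0      # tagged frame, VLAN-aware port
traffic_class 42:be:24:9b:76:20 none 0   # no recognized VLAN tag
```

The three calls print 7, 5 and 0 respectively.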

Below is an example of configuring a 500 us cyclic schedule on egress port
``swp5``. The traffic class gate for management traffic (7) is open for 100 us,
and the gates for all other traffic classes are open for 400 us::

  #!/bin/bash

  set -e -u -o pipefail

  NSEC_PER_SEC="1000000000"

  gatemask() {
          local tc_list="$1"
          local mask=0

          for tc in ${tc_list}; do
                  mask=$((${mask} | (1 << ${tc})))
          done

          printf "%02x" ${mask}
  }

  if ! systemctl is-active --quiet ptp4l; then
          echo "Please start the ptp4l service"
          exit 1
  fi

  now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
  # Phase-align the base time to the start of the next second.
  sec=$(echo "${now}" | gawk -F. '{ print $1; }')
  base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"

  tc qdisc add dev swp5 parent root handle 100 taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S $(gatemask 7) 100000 \
          sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
          flags 2
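
To make the schedule entries in this script easier to read, the sketch below
evaluates the same ``gatemask`` helper on its own, without touching ``tc`` or
the PTP clock, and adds up the two interval lengths.

```shell
#!/bin/bash

# Same helper as in the script above: build a hexadecimal gate mask from a
# space-separated list of traffic classes.
gatemask() {
        local tc_list="$1"
        local mask=0

        for tc in ${tc_list}; do
                mask=$((mask | (1 << tc)))
        done

        printf "%02x" "${mask}"
}

mgmt_mask=$(gatemask 7)                  # only bit 7 set
other_mask=$(gatemask "0 1 2 3 4 5 6")   # bits 0 to 6 set
cycle_ns=$((100000 + 400000))            # sum of both sched-entry intervals

echo "management gate: ${mgmt_mask}"
echo "other gates:     ${other_mask}"
echo "cycle time:      ${cycle_ns} ns"
```

This prints the masks ``80`` and ``7f`` and a cycle time of 500000 ns, which is
the 500 us cycle stated above.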

It is possible to apply the tc-taprio offload on multiple egress ports. There
are hardware restrictions related to the fact that no gate event may trigger
simultaneously on two ports. The driver checks the consistency of the schedules
against this restriction and errors out when appropriate. Schedule analysis is
needed to avoid this, which is outside the scope of this document.

Routing actions (redirect, trap, drop)
--------------------------------------

The switch is able to offload flow-based redirection of packets to a set of
destination ports specified by the user. Internally, this is implemented by
making use of Virtual Links, a TTEthernet concept.

The driver supports 2 types of keys for Virtual Links:

- VLAN-aware virtual links: these match on destination MAC address, VLAN ID and
  VLAN PCP.
- VLAN-unaware virtual links: these match on destination MAC address only.

The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while
there are virtual link rules installed.

Composing multiple actions inside the same rule is supported. When only routing
actions are requested, the driver creates a "non-critical" virtual link. When
the action list also contains tc-gate (more details below), the virtual link
becomes "time-critical" (draws frame buffers from a reserved memory partition,
etc).

The 3 routing actions that are supported are "trap", "drop" and "redirect".

Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the
CPU and to swp3. This type of key (DA only) is used when the port's VLAN
awareness state is off::

  tc qdisc add dev swp2 clsact
  tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action mirred egress redirect dev swp3 \
          action trap

Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID
of 100 and a PCP of 0::

  tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
          dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop

Time-based ingress policing
---------------------------

The TTEthernet hardware abilities of the switch can be constrained to act
similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in
IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform
tight timing-based admission control for up to 1024 flows (identified by a
tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which
are received outside their expected reception window are dropped.

This capability can be managed through the offload of the tc-gate action. As
routing actions are intrinsic to virtual links in TTEthernet (which performs
explicit routing of time-critical traffic and does not leave that in the hands
of the FDB, flooding etc), the tc-gate action may never appear alone when
asking sja1105 to offload it. One (or more) redirect or trap actions must also
follow along.

Example: create a tc-taprio schedule that is phase-aligned with a tc-gate
schedule (the clocks must be synchronized by a 1588 application stack, which is
outside the scope of this document). No packet delivered by the sender will be
dropped. Note that the reception window is larger than the transmission window
(and much more so, in this example) to compensate for the packet propagation
delay of the link (which can be determined by the 1588 application stack).

Receiver (sja1105)::

  tc qdisc add dev swp2 clsact
  now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo $now | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc filter add dev swp2 ingress flower skip_sw \
          dst_mac 42:be:24:9b:76:20 \
          action gate base-time ${base_time} \
          sched-entry OPEN  60000 -1 -1 \
          sched-entry CLOSE 40000 -1 -1 \
          action trap

Sender::

  now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo $now | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc qdisc add dev eno0 parent root taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S 01  50000 \
          sched-entry S 00  50000 \
          flags 2
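
As a quick sanity check of the numbers in this example, the following sketch
verifies that both schedules repeat with the same period and computes how much
longer the reception window is than the transmission window. The variable names
are made up for illustration; the interval values are the ones used in the two
commands above.

```shell
#!/bin/bash

# Interval lengths in nanoseconds, copied from the two commands above.
rx_open=60000;  rx_close=40000     # tc-gate on the receiver
tx_open=50000;  tx_close=50000     # tc-taprio on the sender

rx_cycle=$((rx_open + rx_close))
tx_cycle=$((tx_open + tx_close))

# Both schedules must repeat with the same period to stay phase-aligned.
if [ "${rx_cycle}" -ne "${tx_cycle}" ]; then
        echo "cycle mismatch: rx ${rx_cycle} ns, tx ${tx_cycle} ns"
        exit 1
fi

# Extra time the receive gate stays open after the transmit gate closes.
slack=$((rx_open - tx_open))
echo "cycle ${rx_cycle} ns, reception slack ${slack} ns"
```

Both cycles are 100000 ns long, and the receiver keeps its gate open 10000 ns
longer than the sender transmits, which is the margin available for propagation
delay in this example.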

The engine used to schedule the ingress gate operations is the same as the
one used for the tc-taprio offload. Therefore, the restriction that no two
gate actions (either tc-gate or tc-taprio gates) may fire at the same time
(during the same 200 ns slot) still applies.
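
A much simplified version of such a consistency check is sketched below. It
assumes that all schedules share the same base time and cycle length, which the
real driver does not require; the helper names and the second schedule are
hypothetical.

```shell
#!/bin/bash

SLOT_NS=200

# Print the offsets (in ns from the common base time) at which the gate
# events of one schedule fire during a cycle.
#   $1: phase offset of the schedule; remaining arguments: entry durations
gate_events() {
        local t="$1"
        local d
        shift

        for d in "$@"; do
                echo "${t}"
                t=$((t + d))
        done
}

# Succeed if any event of schedule $1 falls into the same 200 ns slot as an
# event of schedule $2.
schedules_conflict() {
        local a b

        for a in $1; do
                for b in $2; do
                        if [ $((a / SLOT_NS)) -eq $((b / SLOT_NS)) ]; then
                                return 0
                        fi
                done
        done

        return 1
}

swp5=$(gate_events 0 100000 400000)             # events at 0 and 100000
swp4_aligned=$(gate_events 0 250000 250000)     # events at 0 and 250000
swp4_shifted=$(gate_events 1000 250000 250000)  # events at 1000 and 251000

schedules_conflict "${swp5}" "${swp4_aligned}" && echo "aligned: conflict"
schedules_conflict "${swp5}" "${swp4_shifted}" || echo "shifted: ok"
```

Shifting the second schedule by 1 us is enough to move its events out of the
slots used by the first one, so only the aligned variant is reported as a
conflict.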

Conveniently, it is possible to share time-triggered virtual links across
more than 1 ingress port, via flow blocks. In this case, the restriction of
firing at the same time does not apply because there is a single schedule in
the system, that of the shared virtual link::

  tc qdisc add dev swp2 ingress_block 1 clsact
  tc qdisc add dev swp3 ingress_block 1 clsact
  tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action gate index 2 \
          base-time 0 \
          sched-entry OPEN 50000000 -1 -1 \
          sched-entry CLOSE 50000000 -1 -1 \
          action trap

Hardware statistics for each flow are also available ("pkts" counts the number
of dropped frames, which is a sum of frames dropped due to timing violations,
lack of destination ports and MTU enforcement checks). Byte-level counters are
not available.

Limitations
===========

The SJA1105 switch family always performs VLAN processing. When configured as
VLAN-unaware, frames carry a different VLAN tag internally, depending on
whether the port is standalone or under a VLAN-unaware bridge.

The virtual link keys are always fixed at {MAC DA, VLAN ID, VLAN PCP}, but the
driver asks for the VLAN ID and VLAN PCP when the port is under a VLAN-aware
bridge. Otherwise, it fills in the VLAN ID and PCP automatically, based on
whether the port is standalone or in a VLAN-unaware bridge, and accepts only
"VLAN-unaware" tc-flower keys (MAC DA).

The existing tc-flower keys that are offloaded using virtual links are no
longer operational after one of the following happens:

- port was standalone and joins a bridge (VLAN-aware or VLAN-unaware)
- port is part of a bridge whose VLAN awareness state changes
- port was part of a bridge and becomes standalone
- port was standalone, but another port joins a VLAN-aware bridge and this
  changes the global VLAN awareness state of the bridge

The driver cannot veto all these operations, and it cannot update/remove the
existing tc-flower filters either. So for proper operation, the tc-flower
filters should be installed only after the forwarding configuration of the port
has been made, and removed by user space before making any changes to it.

Device Tree bindings and board design
=====================================

This section references ``Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml``
and aims to showcase some potential switch caveats.

RMII PHY role and out-of-band signaling
---------------------------------------

In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by
an external oscillator (but not by the PHY).
But the spec is rather loose and devices go outside it in several ways.
Some PHYs go against the spec and may provide an output pin where they source
the 50 MHz clock themselves, in an attempt to be helpful.
On the other hand, the SJA1105 is only binary configurable - when in the RMII
MAC role it will also attempt to drive the clock signal. To prevent this from
happening it must be put in RMII PHY role.
But doing so has some unintended consequences.
In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0].
These are practically some extra code words (/J/ and /K/) sent prior to the
preamble of each frame. The MAC does not have this out-of-band signaling
mechanism defined by the RMII spec.
So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the
clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105
emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to
frame preambles, which the real PHY is not expected to understand. So the PHY
simply encodes the extra symbols received from the SJA1105-as-PHY onto the
100Base-Tx wire.
On the other side of the wire, some link partners might discard these extra
symbols, while others might choke on them and discard the entire Ethernet
frames that follow along. This looks like packet loss with some link partners
but not with others.
The take-away is that in RMII mode, the SJA1105 must be allowed to drive the
reference clock if connected to a PHY.

RGMII fixed-link and internal delays
------------------------------------

As mentioned in the bindings document, the second generation of devices has
tunable delay lines as part of the MAC, which can be used to establish the
correct RGMII timing budget.
When powered up, these can shift the Rx and Tx clocks with a phase difference
between 73.8 and 101.7 degrees.
The catch is that the delay lines need to lock onto a clock signal with a
stable frequency. This means that there must be at least 2 microseconds of
silence between the clock at the old frequency and the clock at the new one.
Otherwise the lock is lost and the delay lines must be reset (powered down and
back up).
In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25
MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the
AN process.
In the situation where the switch port is connected through an RGMII fixed-link
to a link partner whose link state life cycle is outside the control of Linux
(such as a different SoC), then the delay lines would remain unlocked (and
inactive) until there is manual intervention (ifdown/ifup on the switch port).
The take-away is that in RGMII mode, the switch's internal delays are only
reliable if the link partner never changes link speeds, or if it does, it does
so in a way that is coordinated with the switch port (practically, both ends of
the fixed-link are under control of the same Linux system).
As to why a fixed-link interface would ever change link speeds: there are
Ethernet controllers out there which come out of reset in 100 Mbps mode, and
their driver inevitably needs to change the speed and clock frequency if it's
required to work at gigabit.
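
To relate the phase range quoted above to a time delay, a phase shift can be
converted using the clock period. The sketch below does this for the 125 MHz
gigabit clock only, assuming the quoted phase range is relative to one period
of the RGMII clock; the helper name ``phase_to_ns`` is made up for
illustration.

```shell
#!/bin/bash

# Convert a clock phase shift in degrees to a delay in nanoseconds.
#   $1: phase shift in degrees
#   $2: clock frequency in MHz
phase_to_ns() {
        awk -v deg="$1" -v mhz="$2" \
                'BEGIN { printf "%.2f\n", deg / 360 * 1000 / mhz }'
}

min_delay=$(phase_to_ns 73.8 125)
max_delay=$(phase_to_ns 101.7 125)

echo "delay at 125 MHz: ${min_delay} ns to ${max_delay} ns"
```

At 125 MHz one period lasts 8 ns, so the range works out to 1.64 ns to 2.26 ns.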

MDIO bus and PHY management
---------------------------

The SJA1105 does not have an MDIO bus and does not perform in-band AN either.
Therefore there is no link state notification coming from the switch device.
A board would need to hook up the PHYs connected to the switch to any other
MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO
bus). Link state management then works by the driver manually keeping in sync
(over SPI commands) the MAC link speed with the settings negotiated by the PHY.

By comparison, the SJA1110 supports an MDIO slave access point over which its
internal 100base-T1 PHYs can be accessed from the host. This is, however, not
used by the driver. Instead, the internal 100base-T1 and 100base-TX PHYs are
accessed through SPI commands, modeled in Linux as virtual MDIO buses.

The microcontroller attached to the SJA1110 port 0 also has an MDIO controller
operating in master mode. However, the driver does not support this either,
since the microcontroller gets disabled when the Linux driver operates.
Discrete PHYs connected to the switch ports should have their MDIO interface
attached to an MDIO controller from the host system and not to the switch,
similar to SJA1105.

Port compatibility matrix
-------------------------

The SJA1105 port compatibility matrix is:

===== ============== ============== ==============
Port   SJA1105E/T     SJA1105P/Q     SJA1105R/S
===== ============== ============== ==============
0      xMII           xMII           xMII
1      xMII           xMII           xMII
2      xMII           xMII           xMII
3      xMII           xMII           xMII
4      xMII           xMII           SGMII
===== ============== ============== ==============


The SJA1110 port compatibility matrix is:

===== ============== ============== ============== ==============
Port   SJA1110A       SJA1110B       SJA1110C       SJA1110D
===== ============== ============== ============== ==============
0      RevMII (uC)    RevMII (uC)    RevMII (uC)    RevMII (uC)
1      100base-TX     100base-TX     100base-TX
       or SGMII                                     SGMII
2      xMII           xMII           xMII           xMII
       or SGMII                                     or SGMII
3      xMII           xMII           xMII
       or SGMII       or SGMII                      SGMII
       or 2500base-X  or 2500base-X                 or 2500base-X
4      SGMII          SGMII          SGMII          SGMII
       or 2500base-X  or 2500base-X  or 2500base-X  or 2500base-X
5      100base-T1     100base-T1     100base-T1     100base-T1
6      100base-T1     100base-T1     100base-T1     100base-T1
7      100base-T1     100base-T1     100base-T1     100base-T1
8      100base-T1     100base-T1     n/a            n/a
9      100base-T1     100base-T1     n/a            n/a
10     100base-T1     n/a            n/a            n/a
===== ============== ============== ============== ==============