=====================
PHY Abstraction Layer
=====================

Purpose
=======

Most network devices consist of a set of registers which provide an interface
to a MAC layer, which communicates with the physical connection through a
PHY.  The PHY concerns itself with negotiating link parameters with the link
partner on the other side of the network connection (typically, an ethernet
cable), and provides a register interface to allow drivers to determine what
settings were chosen, and to configure what settings are allowed.

While these devices are distinct from the network devices, and conform to a
standard layout for the registers, it has been common practice to integrate
the PHY management code with the network driver.  This has resulted in large
amounts of redundant code.  Also, on embedded systems with multiple (and
sometimes quite different) ethernet controllers connected to the same
management bus, it is difficult to ensure safe use of the bus.

Since the PHYs are devices, and the management busses through which they are
accessed are, in fact, busses, the PHY Abstraction Layer treats them as such.
In doing so, it has these goals:

#. Increase code-reuse
#. Increase overall code-maintainability
#. Speed development time for new network drivers, and for new systems

Basically, this layer is meant to provide an interface to PHY devices which
allows network driver writers to write as little code as possible, while
still providing a full feature set.

The MDIO bus
============

Most network devices are connected to a PHY by means of a management bus.
Different devices use different busses (though some share common interfaces).
In order to take advantage of the PAL, each bus interface needs to be
registered as a distinct device.

#. read and write functions must be implemented. Their prototypes are::

	int write(struct mii_bus *bus, int mii_id, int regnum, u16 value);
	int read(struct mii_bus *bus, int mii_id, int regnum);

   mii_id is the address on the bus for the PHY, and regnum is the register
   number.  These functions are guaranteed not to be called from interrupt
   time, so it is safe for them to block, waiting for an interrupt to signal
   the operation is complete.

#. A reset function is optional. This is used to return the bus to an
   initialized state.

#. A probe function is needed.  This function should set up anything the bus
   driver needs, set up the mii_bus structure, and register with the PAL using
   mdiobus_register.  Similarly, there's a remove function to undo all of
   that (use mdiobus_unregister).

#. Like any driver, the device_driver structure must be configured, and init
   and exit functions are used to register the driver.

#. The bus must also be declared somewhere as a device, and registered.

As an example for how one driver implemented an mdio bus driver, see
drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
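
The shape of the read and write callbacks can be illustrated with a small
user-space model. This is only a sketch, not kernel code: struct demo_mii_bus
and every demo_* name are hypothetical stand-ins, the register array replaces
real hardware, and the limit of 32 addresses with 32 registers each is an
assumption based on the classic MII management layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PHY_MAX_ADDR 32	/* assumed: 32 PHY addresses per bus */
#define DEMO_NUM_REGS     32	/* assumed: 32 registers per PHY */

/* Hypothetical stand-in for struct mii_bus: callbacks plus simulated state. */
struct demo_mii_bus {
	int (*read)(struct demo_mii_bus *bus, int mii_id, int regnum);
	int (*write)(struct demo_mii_bus *bus, int mii_id, int regnum,
		     uint16_t value);
	uint16_t regs[DEMO_PHY_MAX_ADDR][DEMO_NUM_REGS];
};

static int demo_valid(int mii_id, int regnum)
{
	return mii_id >= 0 && mii_id < DEMO_PHY_MAX_ADDR &&
	       regnum >= 0 && regnum < DEMO_NUM_REGS;
}

/* Returns the 16-bit register value, or a negative value on error. */
static int demo_mdio_read(struct demo_mii_bus *bus, int mii_id, int regnum)
{
	assert(bus != NULL);
	if (!demo_valid(mii_id, regnum))
		return -1;
	return bus->regs[mii_id][regnum];
}

/* Returns 0 on success, or a negative value on error. */
static int demo_mdio_write(struct demo_mii_bus *bus, int mii_id, int regnum,
			   uint16_t value)
{
	assert(bus != NULL);
	if (!demo_valid(mii_id, regnum))
		return -1;
	bus->regs[mii_id][regnum] = value;
	return 0;
}

/* Plays the role of the probe step: clear state and install the callbacks. */
static void demo_mdio_init(struct demo_mii_bus *bus)
{
	assert(bus != NULL);
	memset(bus, 0, sizeof(*bus));
	bus->read = demo_mdio_read;
	bus->write = demo_mdio_write;
}
```

A caller would run demo_mdio_init() once and then go through bus->read and
bus->write only, which is roughly how the PAL reaches a real bus driver.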

(RG)MII/electrical interface considerations
===========================================

The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
electrical signal interface using a synchronous 125MHz clock signal and several
data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
between the clock line (RXC or TXC) and the data lines to let the PHY (clock
sink) have a large enough setup and hold time to sample the data lines correctly. The
PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
the PHY driver, and optionally the MAC driver, implement the required delay. The
values of phy_interface_t must be understood from the perspective of the PHY
device itself, leading to the following:

* PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
  internal delay by itself, it assumes that either the Ethernet MAC (if capable)
  or the PCB traces insert the correct 1.5-2ns delay

* PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
  for the transmit data lines (TXD[3:0]) processed by the PHY device

* PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
  for the receive data lines (RXD[3:0]) processed by the PHY device

* PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
  both transmit AND receive data lines from/to the PHY device

Whenever possible, use the PHY side RGMII delay for these reasons:

* PHY devices may offer sub-nanosecond granularity in how they allow a
  receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
  precision may be required to account for differences in PCB trace lengths

* PHY devices are typically qualified for a large range of applications
  (industrial, medical, automotive...), and they provide a constant and
  reliable delay across temperature/pressure/voltage ranges

* PHY device drivers in PHYLIB being reusable by nature, being able to
  correctly configure a specified delay enables more designs with similar delay
  requirements to be operated correctly

For cases where the PHY is not capable of providing this delay, but the
Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
configured correctly in order to provide the required transmit and/or receive
side delay from the perspective of the PHY device. Conversely, if the Ethernet
MAC driver looks at the phy_interface_t value, for any other mode but
PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
disabled.
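
The division of responsibility between PHY and MAC can be summarised as a
lookup. The following is a user-space sketch only: the demo_* enum and struct
are hypothetical and merely mirror the four PHY_INTERFACE_MODE_RGMII* values
described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the four RGMII variants of phy_interface_t. */
enum demo_rgmii_mode {
	DEMO_RGMII,		/* no PHY delay; MAC or PCB provides it */
	DEMO_RGMII_ID,		/* PHY delays both RX and TX */
	DEMO_RGMII_RXID,	/* PHY delays RX only */
	DEMO_RGMII_TXID,	/* PHY delays TX only */
};

struct demo_delays {
	bool phy_rx;		/* PHY inserts the receive-side delay */
	bool phy_tx;		/* PHY inserts the transmit-side delay */
	bool mac_may_delay;	/* MAC-level delays are allowed */
};

static struct demo_delays demo_rgmii_delays(enum demo_rgmii_mode mode)
{
	struct demo_delays d = { false, false, false };

	switch (mode) {
	case DEMO_RGMII:
		d.mac_may_delay = true;
		break;
	case DEMO_RGMII_ID:
		d.phy_rx = true;
		d.phy_tx = true;
		break;
	case DEMO_RGMII_RXID:
		d.phy_rx = true;
		break;
	case DEMO_RGMII_TXID:
		d.phy_tx = true;
		break;
	default:
		assert(0 && "not an RGMII mode");
	}
	return d;
}
```

Only the plain RGMII mode leaves mac_may_delay set, matching the rule that a
MAC driver which inspects the mode should disable its own delays for every
other value.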

In case neither the Ethernet MAC, nor the PHY are capable of providing the
required delays, as defined per the RGMII standard, several options may be
available:

* Some SoCs may offer a pin pad/mux/controller capable of configuring a given
  set of pins' strength, delays, and voltage; and it may be a suitable
  option to insert the expected 2ns RGMII delay.

* Modifying the PCB design to include a fixed delay (e.g: using a specifically
  designed serpentine), which may not require software configuration at all.

Common problems with RGMII delay mismatch
-----------------------------------------

When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, the
clock and data line signals will most likely be unstable when the PHY or MAC
takes a snapshot of these signals to translate them into logical 1 or 0 states
and reconstruct the data being transmitted/received. Typical symptoms include:

* Transmission/reception partially works, and there is frequent or occasional
  packet loss observed

* Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
  or just discard them all

* Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
  (since there is enough setup/hold time in that case)

Connecting to a PHY
===================

Sometime during startup, the network driver needs to establish a connection
between the PHY device, and the network device.  At this time, the PHY's bus
and drivers need to all have been loaded, so that they are ready for the
connection.  At this point, there are several ways to connect to the PHY:

#. The PAL handles everything, and only calls the network driver when
   the link state changes, so it can react.

#. The PAL handles everything except interrupts (usually because the
   controller has the interrupt registers).

#. The PAL handles everything, but checks in with the driver every second,
   allowing the network driver to react first to any changes before the PAL
   does.

#. The PAL serves only as a library of functions, with the network device
   manually calling functions to update status, and configure the PHY.


Letting the PHY Abstraction Layer do Everything
===============================================

If you choose option 1 (the hope is that every driver can, though the PAL aims
to remain useful to drivers that can't), connecting to the PHY is simple:

First, you need a function to react to changes in the link state.  This
function follows this prototype::

	static void adjust_link(struct net_device *dev);

Next, you need to know the device name of the PHY connected to this device.
The name will look something like, "0:00", where the first number is the
bus id, and the second is the PHY's address on that bus.  Typically,
the bus is responsible for making its ID unique.
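
As a rough illustration of that naming scheme, the helper below builds such a
name in user space. It is a sketch under one assumption: that the address is
printed as two hexadecimal digits after the bus id, which is how the "0:00"
example reads. demo_phy_name() is a hypothetical helper, not a PAL function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Writes "<bus id>:<two-digit hex address>" into buf and returns the
 * number of characters snprintf() wanted to write.
 */
static int demo_phy_name(char *buf, size_t len, const char *bus_id,
			 unsigned int addr)
{
	assert(buf != NULL && bus_id != NULL);
	return snprintf(buf, len, "%s:%02x", bus_id, addr);
}
```

With a bus id of "0", address 0 yields "0:00" and address 31 yields "0:1f".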

Now, to connect, just call this function::

	phydev = phy_connect(dev, phy_name, &adjust_link, interface);

*phydev* is a pointer to the phy_device structure which represents the PHY.
If phy_connect is successful, it will return the pointer.  dev, here, is the
pointer to your net_device.  Once done, this function will have started the
PHY's software state machine, and registered for the PHY's interrupt, if it
has one.  The phydev structure will be populated with information about the
current state, though the PHY will not yet be truly operational at this
point.

PHY-specific flags should be set in phydev->dev_flags prior to the call
to phy_connect() such that the underlying PHY driver can check for flags
and perform specific operations based on them.
This is useful if the system has put hardware restrictions on
the PHY/controller, of which the PHY needs to be aware.

*interface* is a phy_interface_t value which specifies the connection type
used between the controller and the PHY.  Examples are GMII, MII,
RGMII, and SGMII.  See "PHY interface modes" below.  For a full
list, see include/linux/phy.h

Now just make sure that phydev->supported and phydev->advertising have any
values pruned from them which don't make sense for your controller (a 10/100
controller may be connected to a gigabit capable PHY, so you would need to
mask off SUPPORTED_1000baseT*).  See include/linux/ethtool.h for definitions
for these bitfields. Note that you should not SET any bits, except the
SUPPORTED_Pause and SUPPORTED_Asym_Pause bits (see below), or the PHY may get
put into an unsupported state.
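
The pruning step amounts to clearing bits in a mask. The sketch below shows it
in user space for a 10/100 controller. The DEMO_* constants are local copies
whose values are assumed to match the legacy SUPPORTED_1000baseT_Half and
SUPPORTED_1000baseT_Full definitions; a real driver would use the kernel's own
definitions and whatever helpers its kernel version provides.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the legacy ethtool SUPPORTED_1000baseT_* bit values. */
#define DEMO_SUPPORTED_1000baseT_Half	(1u << 4)
#define DEMO_SUPPORTED_1000baseT_Full	(1u << 5)

/* Removes the gigabit modes that a 10/100 controller cannot use. */
static uint32_t demo_prune_for_fast_ethernet(uint32_t phy_supported)
{
	uint32_t gigabit = DEMO_SUPPORTED_1000baseT_Half |
			   DEMO_SUPPORTED_1000baseT_Full;
	uint32_t pruned = phy_supported & ~gigabit;

	/* Pruning may only clear bits, never set new ones. */
	assert((pruned & ~phy_supported) == 0);
	return pruned;
}
```

The same value would be applied to both the supported and the advertising
masks, so that the PHY never offers a mode the controller cannot carry.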

Lastly, once the controller is ready to handle network traffic, you call
phy_start(phydev).  This tells the PAL that you are ready, and configures the
PHY to connect to the network. If the MAC interrupt of your network driver
also handles PHY status changes, just set phydev->irq to PHY_MAC_INTERRUPT
before you call phy_start and use phy_mac_interrupt() from the network
driver. If you don't want to use interrupts, set phydev->irq to PHY_POLL.
phy_start() enables the PHY interrupts (if applicable) and starts the
phylib state machine.

When you want to disconnect from the network (even if just briefly), you call
phy_stop(phydev). This function also stops the phylib state machine and
disables PHY interrupts.

PHY interface modes
===================

The PHY interface mode supplied to the phy_connect() family of functions
defines the initial operating mode of the PHY interface.  This is not
guaranteed to remain constant; there are PHYs which dynamically change
their interface mode without software interaction depending on the
negotiation results.

Some of the interface modes are described below:

``PHY_INTERFACE_MODE_SMII``
    This is serial MII, clocked at 125MHz, supporting 100M and 10M speeds.
    Some details can be found in
    https://opencores.org/ocsvn/smii/smii/trunk/doc/SMII.pdf

``PHY_INTERFACE_MODE_1000BASEX``
    This defines the 1000BASE-X single-lane serdes link as defined by the
    802.3 standard Clause 36.  The link operates at a fixed bit rate of
    1.25Gbaud using an 8B/10B encoding scheme, resulting in an underlying
    data rate of 1Gbps.  Embedded in the data stream is a 16-bit control
    word which is used to negotiate the duplex and pause modes with the
    remote end.  This does not include "up-clocked" variants such as 2.5Gbps
    speeds (see below.)

``PHY_INTERFACE_MODE_2500BASEX``
    This defines a variant of 1000BASE-X which is clocked 2.5 times as fast
    as the 802.3 standard, giving a fixed bit rate of 3.125Gbaud.

``PHY_INTERFACE_MODE_SGMII``
    This is used for Cisco SGMII, which is a modification of 1000BASE-X
    as defined by the 802.3 standard.  The SGMII link consists of a single
    serdes lane running at a fixed bit rate of 1.25Gbaud with 8B/10B
    encoding.  The underlying data rate is 1Gbps, with the slower speeds of
    100Mbps and 10Mbps being achieved through replication of each data symbol.
    The 802.3 control word is re-purposed to send the negotiated speed and
    duplex information from the PHY to the MAC, and for the MAC to acknowledge
    receipt.  This does not include "up-clocked" variants such as 2.5Gbps
    speeds.

    Note: mismatched SGMII vs 1000BASE-X configuration on a link can
    successfully pass data in some circumstances, but the 16-bit control
    word will not be correctly interpreted, which may cause mismatches in
    duplex, pause or other settings.  This is dependent on the MAC and/or
    PHY behaviour.

``PHY_INTERFACE_MODE_5GBASER``
    This is the IEEE 802.3 Clause 129 defined 5GBASE-R protocol. It is
    identical to the 10GBASE-R protocol defined in Clause 49, with the
    exception that it operates at half the frequency. Please refer to the
    IEEE standard for the definition.

``PHY_INTERFACE_MODE_10GBASER``
    This is the IEEE 802.3 Clause 49 defined 10GBASE-R protocol used with
    various different mediums. Please refer to the IEEE standard for a
    definition of this.

    Note: 10GBASE-R is just one protocol that can be used with XFI and SFI.
    XFI and SFI permit multiple protocols over a single SERDES lane, and
    also define the electrical characteristics of the signals with a host
    compliance board plugged into the host XFP/SFP connector. Therefore,
    XFI and SFI are not PHY interface types in their own right.

``PHY_INTERFACE_MODE_10GKR``
    This is the IEEE 802.3 Clause 49 defined 10GBASE-R with Clause 73
    autonegotiation. Please refer to the IEEE standard for further
    information.

    Note: due to legacy usage, some 10GBASE-R usage incorrectly makes
    use of this definition.

``PHY_INTERFACE_MODE_25GBASER``
    This is the IEEE 802.3 PCS Clause 107 defined 25GBASE-R protocol.
    The PCS is identical to 10GBASE-R, i.e. 64B/66B encoded
    running 2.5 times as fast, giving a fixed bit rate of 25.78125 Gbaud.
    Please refer to the IEEE standard for further information.

``PHY_INTERFACE_MODE_100BASEX``
    This defines IEEE 802.3 Clause 24.  The link operates at a fixed line
    rate of 125Mbps using a 4B/5B encoding scheme, resulting in an underlying
    data rate of 100Mbps.
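
The relation between the line rates and data rates quoted above is plain
arithmetic: the data rate is the line rate scaled by the ratio of data bits to
coded bits. The helper below is a user-space sketch for checking those
figures; demo_payload_mbps() is a hypothetical name and not part of the PAL.

```c
#include <assert.h>

/*
 * Payload rate in Mbps for a link running at line_mbaud whose block code
 * carries data_bits of payload in every code_bits on the wire.
 */
static double demo_payload_mbps(double line_mbaud, unsigned int data_bits,
				unsigned int code_bits)
{
	assert(code_bits != 0 && data_bits <= code_bits);
	return line_mbaud * data_bits / code_bits;
}
```

For instance, demo_payload_mbps(1250.0, 8, 10) gives the 1Gbps of 1000BASE-X
and SGMII, demo_payload_mbps(125.0, 4, 5) gives the 100Mbps of 100BASE-X, and
demo_payload_mbps(25781.25, 64, 66) gives 25Gbps for 25GBASE-R.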

Pause frames / flow control
===========================

The PHY does not participate directly in flow control/pause frames except by
making sure that the SUPPORTED_Pause and SUPPORTED_Asym_Pause bits are set in
MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
controller supports such a thing. Since flow control/pause frames generation
involves the Ethernet MAC driver, it is recommended that this driver takes care
of properly indicating advertisement and support for such features by setting
the SUPPORTED_Pause and SUPPORTED_Asym_Pause bits accordingly. This can be done
either before or after phy_connect() and/or as a result of implementing the
ethtool::set_pauseparam feature.
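
How a MAC driver might turn its receive/transmit pause capabilities into the
two advertisement bits is sketched below in user space. The mapping used here
(Pause follows rx, Asym_Pause is set when rx and tx differ) is the
conventional encoding and is stated as an assumption, as are the DEMO_* bit
values; consult the kernel's own pause helpers before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the legacy ethtool Pause and Asym_Pause bit values. */
#define DEMO_PAUSE	(1u << 13)
#define DEMO_ASYM_PAUSE	(1u << 14)

/* Returns 'advertising' with the two pause bits rewritten for rx/tx. */
static uint32_t demo_set_pause(uint32_t advertising, bool rx, bool tx)
{
	uint32_t mask = DEMO_PAUSE | DEMO_ASYM_PAUSE;
	uint32_t result = advertising & ~mask;

	if (rx)
		result |= DEMO_PAUSE;
	if (rx != tx)
		result |= DEMO_ASYM_PAUSE;

	/* Bits outside the pause pair must be left untouched. */
	assert(((result ^ advertising) & ~mask) == 0);
	return result;
}
```

A set_pauseparam implementation could compute the new mask this way and then
restart autonegotiation so that the link partner sees the change.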


Keeping Close Tabs on the PAL
=============================

It is possible that the PAL's built-in state machine needs a little help to
keep your network device and the PHY properly in sync.  If so, you can
register a helper function when connecting to the PHY, which will be called
every second before the state machine reacts to any changes.  To do this, you
need to manually call phy_attach() and phy_prepare_link(), and then call
phy_start_machine() with the second argument set to point to your special
handler.

Currently there are no examples of how to use this functionality, and testing
on it has been limited because the author does not have any drivers which use
it (they all use option 1).  So Caveat Emptor.

Doing it all yourself
=====================

There's a remote chance that the PAL's built-in state machine cannot track
the complex interactions between the PHY and your network device.  If this is
so, you can simply call phy_attach(), and not call phy_start_machine or
phy_prepare_link().  This will mean that phydev->state is entirely yours to
handle (phy_start and phy_stop toggle between some of the states, so you
might need to avoid them).

An effort has been made to make sure that useful functionality can be
accessed without the state-machine running, and most of these functions are
descended from functions which did not interact with a complex state-machine.
However, again, no effort has been made so far to test running without the
state machine, so tryer beware.

Here is a brief rundown of the functions::

 int phy_read(struct phy_device *phydev, u16 regnum);
 int phy_write(struct phy_device *phydev, u16 regnum, u16 val);

Simple read/write primitives.  They invoke the bus's read/write function
pointers.
::

 void phy_print_status(struct phy_device *phydev);

A convenience function to print out the PHY status neatly.
::

 void phy_request_interrupt(struct phy_device *phydev);

Requests the IRQ for the PHY interrupts.
::

 struct phy_device *phy_attach(struct net_device *dev, const char *phy_id,
                               phy_interface_t interface);

Attaches a network device to a particular PHY, binding the PHY to a generic
driver if none was found during bus initialization.
::

 int phy_start_aneg(struct phy_device *phydev);

Using variables inside the phydev structure, either configures advertising
and resets autonegotiation, or disables autonegotiation, and configures
forced settings.
::

 static inline int phy_read_status(struct phy_device *phydev);

Fills the phydev structure with up-to-date information about the current
settings in the PHY.
::

 int phy_ethtool_ksettings_set(struct phy_device *phydev,
                               const struct ethtool_link_ksettings *cmd);

Ethtool convenience function.
::

 int phy_mii_ioctl(struct phy_device *phydev,
                   struct mii_ioctl_data *mii_data, int cmd);

The MII ioctl.  Note that this function will completely screw up the state
machine if you write registers like BMCR, BMSR, ADVERTISE, etc.  Best to
use this only to write registers which are not standard, and don't set off
a renegotiation.

PHY Device Drivers
==================

With the PHY Abstraction Layer, adding support for new PHYs is
quite easy. In some cases, no work is required at all! However,
many PHYs require a little hand-holding to get up-and-running.

Generic PHY driver
------------------

If the desired PHY doesn't have any errata, quirks, or special
features you want to support, then it may be best to not add
support, and let the PHY Abstraction Layer's Generic PHY Driver
do all of the work.

Writing a PHY driver
--------------------

If you do need to write a PHY driver, the first thing to do is
make sure it can be matched with an appropriate PHY device.
This is done during bus initialization by reading the device's
UID (stored in registers 2 and 3), then comparing it to each
driver's phy_id field by ANDing it with each driver's
phy_id_mask field.  Also, it needs a name.  Here's an example::

   static struct phy_driver dm9161_driver = {
         .phy_id         = 0x0181b880,
         .name           = "Davicom DM9161E",
         .phy_id_mask    = 0x0ffffff0,
         ...
   };
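
The matching rule itself is a masked comparison. The user-space sketch below
spells it out; both demo_* functions are hypothetical, and placing register 2
in the upper 16 bits of the UID is an assumption about how the two registers
are combined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Builds the 32-bit UID: register 2 in the upper half, register 3 below. */
static uint32_t demo_phy_uid(uint16_t reg2, uint16_t reg3)
{
	return ((uint32_t)reg2 << 16) | reg3;
}

/* True when the device UID equals the driver's phy_id under its mask. */
static bool demo_driver_matches(uint32_t dev_uid, uint32_t drv_id,
				uint32_t drv_mask)
{
	/* A zero mask would match every PHY, which is rarely intended. */
	assert(drv_mask != 0);
	return (dev_uid & drv_mask) == (drv_id & drv_mask);
}
```

With the mask 0x0ffffff0 from the example above, a device reporting
0x0181b881 still matches phy_id 0x0181b880, because the low four bits (often
a revision number) are ignored.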

Next, you need to specify what features (speed, duplex, autoneg,
etc) your PHY device and driver support.  Most PHYs support
PHY_BASIC_FEATURES, but you can look in include/linux/mii.h for other
features.

Each driver consists of a number of function pointers, documented
in include/linux/phy.h under the phy_driver structure.

Of these, only config_aneg and read_status are required to be
assigned by the driver code.  The rest are optional.  Also, it is
preferred to use the generic phy driver's versions of these two
functions if at all possible: genphy_read_status and
genphy_config_aneg.  If this is not possible, it is likely that
you only need to perform some actions before and after invoking
these functions, and so your functions will wrap the generic
ones.

Feel free to look at the Marvell, Cicada, and Davicom drivers in
drivers/net/phy/ for examples (the lxt and qsemi drivers have
not been tested as of this writing).

The PHY's MMD register accesses are handled by the PAL framework
by default, but can be overridden by a specific PHY driver if
required. This could be the case if a PHY was released for
manufacturing before the MMD PHY register definitions were
standardized by the IEEE. Most modern PHYs will be able to use
the generic PAL framework for accessing the PHY's MMD registers.
An example of such usage is for Energy Efficient Ethernet support,
implemented in the PAL. This support uses the PAL to access MMD
registers for EEE query and configuration if the PHY supports
the IEEE standard access mechanisms, or can use the PHY's specific
access interfaces if overridden by the specific PHY driver. See
the Micrel driver in drivers/net/phy/ for an example of how this
can be implemented.

Board Fixups
============

Sometimes the specific interaction between the platform and the PHY requires
special handling.  For instance, to change where the PHY's clock input is,
or to add a delay to account for latency issues in the data path.  In order
to support such contingencies, the PHY Layer allows platform code to register
fixups to be run when the PHY is brought up (or subsequently reset).

When the PHY Layer brings up a PHY it checks to see if there are any fixups
registered for it, matching based on UID (contained in the PHY device's phy_id
field) and the bus identifier (contained in phydev->dev.bus_id).  Both must
match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as
wildcards for the bus ID and UID, respectively.

When a match is found, the PHY layer will invoke the run function associated
with the fixup.  This function is passed a pointer to the phy_device of
interest.  It should therefore only operate on that PHY.

The platform code can either register the fixup using phy_register_fixup()::

	int phy_register_fixup(const char *phy_id,
		u32 phy_uid, u32 phy_uid_mask,
		int (*run)(struct phy_device *));

Or using one of the two stubs, phy_register_fixup_for_uid() and
phy_register_fixup_for_id()::

 int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
		int (*run)(struct phy_device *));
 int phy_register_fixup_for_id(const char *phy_id,
		int (*run)(struct phy_device *));

The stubs set one of the two matching criteria, and set the other one to
match anything.
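
The two-part match with wildcards can be modelled in user space as shown
below. This is a sketch, not the PAL's implementation: struct demo_fixup and
demo_needs_fixup() are hypothetical, and the values chosen for the two
wildcard constants are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins for the PHY_ANY_ID and PHY_ANY_UID wildcards. */
#define DEMO_ANY_ID	"MATCH ANY PHY"
#define DEMO_ANY_UID	0xffffffffu

/* Hypothetical record of one registered fixup's matching criteria. */
struct demo_fixup {
	const char *bus_id;
	uint32_t phy_uid;
	uint32_t phy_uid_mask;
};

/* True when both the bus identifier and the masked UID match. */
static bool demo_needs_fixup(const struct demo_fixup *f,
			     const char *phy_bus_id, uint32_t phy_uid)
{
	bool id_ok, uid_ok;

	assert(f != NULL && f->bus_id != NULL && phy_bus_id != NULL);

	id_ok = strcmp(f->bus_id, DEMO_ANY_ID) == 0 ||
		strcmp(f->bus_id, phy_bus_id) == 0;
	uid_ok = f->phy_uid == DEMO_ANY_UID ||
		 (f->phy_uid & f->phy_uid_mask) ==
		 (phy_uid & f->phy_uid_mask);

	return id_ok && uid_ok;
}
```

In this model a fixup registered through the \*_for_uid() stub corresponds to
a record whose bus_id is the wildcard, and one registered through the
\*_for_id() stub to a record whose phy_uid is the wildcard.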

When phy_register_fixup() or \*_for_uid()/\*_for_id() is called at module load
time, the module needs to unregister the fixup and free allocated memory when
it's unloaded.

Call one of the following functions before unloading the module::

 int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
 int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
 int phy_unregister_fixup_for_id(const char *phy_id);

Standards
=========

IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf

RGMII v1.3:
http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf

RGMII v2.0:
http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf