Rocker Network Switch Register Programming Guide
Copyright (c) Scott Feldman <sfeldma@gmail.com>
Copyright (c) Neil Horman <nhorman@tuxdriver.com>
Version 0.11, 12/29/2014

LICENSE
=======

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

SECTION 1: Introduction
=======================

Overview
--------

This document describes the hardware/software interface for the Rocker switch
device. The intended audience is authors of OS drivers and device emulation
software.

Notations and Conventions
-------------------------

o In register descriptions, [n:m] indicates a range from bit n to bit m,
inclusive.
o Use of leading 0x indicates a hexadecimal number.
o Use of leading 0b indicates a binary number.
o The use of RSVD or Reserved indicates that a bit or field is reserved for
future use.
o Field width is in bytes, unless otherwise noted.
o Registers are (R) read-only, (R/W) read/write, (W) write-only, or (COR) clear
on read.
o TLV values in network-byte-order are designated with (N).


SECTION 2: PCI Configuration Registers
======================================

PCI Configuration Space
-----------------------

Each switch instance registers as a PCI device with PCI configuration space:

    offset  width   description             value
    ---------------------------------------------
    0x0     2       Vendor ID               0x1b36
    0x2     2       Device ID               0x0006
    0x4     4       Command/Status
    0x8     1       Revision ID             0x01
    0x9     3       Class code              0x2800
    0xC     1       Cache line size
    0xD     1       Latency timer
    0xE     1       Header type
    0xF     1       Built-in self test
    0x10    4       Base address low
    0x14    4       Base address high
    0x18-28         Reserved
    0x2C    2       Subsystem vendor ID     *
    0x2E    2       Subsystem ID            *
    0x30-38         Reserved
    0x3C    1       Interrupt line
    0x3D    1       Interrupt pin           0x00
    0x3E    1       Min grant               0x00
    0x3F    1       Max latency             0x00
    0x40    1       TRDY timeout
    0x41    1       Retry count
    0x42    2       Reserved


* Assigned by sub-system implementation

SECTION 3: Memory-Mapped Register Space
=======================================

There are two memory-mapped BARs. BAR0 maps device register space and is
0x2000 in size. BAR1 maps MSI-X vector and PBA tables and is also 0x2000 in
size, allowing for 256 MSI-X vectors.

All registers are 4 or 8 bytes long. It is assumed host software will access
4-byte registers with one 4-byte access, and 8-byte registers with either two
4-byte accesses or a single 8-byte access. In the case of two 4-byte accesses,
the lower 4 bytes must be accessed first and then the upper 4 bytes, in that
order.

BAR0 device register space is organized as follows:

    offset          description
    ------------------------------------------------------
    0x0000-0x000f   Bogus registers to catch misbehaving
                    drivers. Writes do nothing. Reads
                    back as 0xDEADBABE.
    0x0010-0x00ff   Test registers
    0x0300-0x03ff   General purpose registers
    0x1000-0x1fff   Descriptor control

Holes in register space are reserved. Writes to reserved registers do nothing.
Reads from reserved registers return 0.
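
The two-access rule for 8-byte registers can be sketched as below. This is
only an illustration, not driver code: BAR0 is modelled as a plain byte array
so the sketch runs anywhere, and the helper names (reg_write32, reg_read32,
reg_write64, reg_read64) are hypothetical. The little-endian byte layout
follows the Endianness rules in Section 4. A real driver would use its
platform's MMIO accessors on the mapped BAR instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: `bar0` stands in for the mapped BAR0 region. */

/* Write a 4-byte register in little-endian byte order. */
static void reg_write32(uint8_t *bar0, uint32_t off, uint32_t val)
{
    assert((off & 3) == 0);             /* registers are 4-byte aligned */
    for (int i = 0; i < 4; i++)
        bar0[off + i] = (uint8_t)(val >> (8 * i));
}

/* Read a 4-byte little-endian register. */
static uint32_t reg_read32(const uint8_t *bar0, uint32_t off)
{
    uint32_t val = 0;

    assert((off & 3) == 0);
    for (int i = 0; i < 4; i++)
        val |= (uint32_t)bar0[off + i] << (8 * i);
    return val;
}

/* 8-byte register as two 4-byte accesses: lower half first, then upper. */
static void reg_write64(uint8_t *bar0, uint32_t off, uint64_t val)
{
    reg_write32(bar0, off, (uint32_t)val);
    reg_write32(bar0, off + 4, (uint32_t)(val >> 32));
}

static uint64_t reg_read64(const uint8_t *bar0, uint32_t off)
{
    uint64_t lo = reg_read32(bar0, off);        /* lower 4 bytes first */
    uint64_t hi = reg_read32(bar0, off + 4);    /* then upper 4 bytes  */

    return lo | (hi << 32);
}
```

Keeping the ordering inside one helper means callers cannot accidentally
access the upper half first.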

No fancy stuff like write-combining is enabled on any of the registers.

BAR1 MSI-X register space is organized as follows:

    offset          description
    ------------------------------------------------------
    0x0000-0x0fff   MSI-X vector table (256 vectors total)
    0x1000-0x1fff   MSI-X PBA table


SECTION 4: Interrupts, DMA, and Endianness
==========================================

PCI Interrupts
--------------

The device supports only MSI-X interrupts. The BAR1 memory-mapped region
contains the MSI-X vector and PBA tables, with support for up to 256 MSI-X
vectors.

The vector assignment is:

    vector  description
    -----------------------------------------------------
    0       Command descriptor ring completion
    1       Event descriptor ring completion
    2       Test operation completion
    3       RSVD
    4-255   Tx and Rx descriptor ring completion
                Tx vector is even
                Rx vector is odd

An MSI-X vector table entry is 16 bytes:

    field       offset  width   description
    -------------------------------------------------------------
    lower_addr  0x0     4       [31:2]  message address[31:2]
                                [1:0]   Rsvd (4 byte alignment
                                        required)
    upper_addr  0x4     4       [31:19] Rsvd
                                [14:0]  message address[46:32]
    data        0x8     4       message data[31:0]
    control     0xc     4       [31:1]  Rsvd
                                [0]     mask (0 = enable,
                                        1 = masked)

Software should install the Interrupt Service Routine (ISR) before any ports
are enabled or any commands are issued on the command ring.

DMA Operations
--------------

DMA operations are used for packet DMA to/from the CPU, command and event
processing. Command processing includes statistical counters and table dumps,
table insertion/deletion, and more. Event processing provides an async
notification method for device-originating events. Each DMA operation has a
set of control registers to manage a descriptor ring.
The descriptor rings are allocated from contiguous host DMA-able memory, and
registers specify each ring's base address, size and current head and tail
indices. Software always writes the head, and hardware always writes the tail.

The higher-order bit of DMA_DESC_COMP_ERR is used to mark hardware completion
of a descriptor. Software will clear this bit when posting a descriptor to the
ring, and hardware will set this bit when the descriptor is complete.

Descriptor ring sizes must be a power of 2 and range from 2 to 64K entries.
Each descriptor ring's base address must be 8-byte aligned. Descriptors must
be packed within the ring. Each descriptor in each ring must also be aligned
on an 8-byte boundary. Each descriptor ring will have these registers:

    DMA_DESC_xxx_BASE_ADDR, offset 0x1000 + (x * 32), 64-bit, (R/W)
    DMA_DESC_xxx_SIZE, offset 0x1008 + (x * 32), 32-bit, (R/W)
    DMA_DESC_xxx_HEAD, offset 0x100c + (x * 32), 32-bit, (R/W)
    DMA_DESC_xxx_TAIL, offset 0x1010 + (x * 32), 32-bit, (R)
    DMA_DESC_xxx_CTRL, offset 0x1014 + (x * 32), 32-bit, (W)
    DMA_DESC_xxx_CREDITS, offset 0x1018 + (x * 32), 32-bit, (R/W)
    DMA_DESC_xxx_RSVD1, offset 0x101c + (x * 32), 32-bit, (R/W)

Where x is descriptor ring index:

    index   ring
    --------------------
    0       CMD
    1       EVENT
    2       TX (port 0)
    3       RX (port 0)
    4       TX (port 1)
    5       RX (port 1)
    .
    .
    .
    124     TX (port 61)
    125     RX (port 61)
    126     Resv
    127     Resv

Writing BASE_ADDR or SIZE will reset HEAD and TAIL to zero. HEAD cannot be
written past TAIL. To do so would wrap the ring. An empty ring is when HEAD
== TAIL. A full ring is when HEAD is one position behind TAIL. Both HEAD and
TAIL increment and modulo wrap at the ring size.
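
The register layout and the HEAD/TAIL rules above can be sketched in a few
helpers. This is a minimal illustration under stated assumptions: the enum
and function names are hypothetical, and only the offsets, the ring index
table, the size limits and the empty/full definitions come from this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMA_DESC_RING_BASE    0x1000u   /* register block of ring 0 */
#define DMA_DESC_RING_STRIDE  32u       /* register bytes per ring  */

/* Offsets inside one ring's register block (names are hypothetical). */
enum ring_reg {
    RING_REG_BASE_ADDR = 0x00,
    RING_REG_SIZE      = 0x08,
    RING_REG_HEAD      = 0x0c,
    RING_REG_TAIL      = 0x10,
    RING_REG_CTRL      = 0x14,
    RING_REG_CREDITS   = 0x18,
};

/* BAR0 offset of register `reg` for descriptor ring index x (0-127). */
static uint32_t ring_reg_offset(uint32_t x, enum ring_reg reg)
{
    assert(x < 128);
    return DMA_DESC_RING_BASE + x * DMA_DESC_RING_STRIDE + (uint32_t)reg;
}

/* Ring index for the zero-based port number used in the index table. */
static uint32_t tx_ring_index(uint32_t port) { return 2 + 2 * port; }
static uint32_t rx_ring_index(uint32_t port) { return 3 + 2 * port; }

/* Sizes must be a power of 2 in the range 2..64K entries. */
static bool ring_size_valid(uint32_t size)
{
    return size >= 2 && size <= 65536 && (size & (size - 1)) == 0;
}

/* Indices wrap modulo the (power-of-2) ring size. */
static uint32_t ring_next(uint32_t idx, uint32_t size)
{
    return (idx + 1) & (size - 1);
}

/* Empty: HEAD == TAIL. */
static bool ring_empty(uint32_t head, uint32_t tail)
{
    return head == tail;
}

/* Full: HEAD is one position behind TAIL. */
static bool ring_full(uint32_t head, uint32_t tail, uint32_t size)
{
    return ring_next(head, size) == tail;
}
```

With these definitions a ring of N entries holds at most N - 1 posted
descriptors, which is what keeps the empty and full states distinguishable.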

CTRL register bits:

    bit     name            description
    ------------------------------------------------------------------------
    [0]     CTRL_RESET      Reset the descriptor ring
    [31:1]  Reserved

All descriptor types share some common fields:

    field                   width   description
    -------------------------------------------------------------------
    DMA_DESC_BUF_ADDR       8       Phys addr of desc payload, 8-byte
                                    aligned
    DMA_DESC_COOKIE         8       Desc cookie for completion matching,
                                    upper-most bit is reserved
    DMA_DESC_BUF_SIZE       2       Desc payload size in bytes
    DMA_DESC_TLV_SIZE       2       Desc payload total size in bytes
                                    used for TLVs. Must be <=
                                    DMA_DESC_BUF_SIZE.
    DMA_DESC_COMP_ERR       2       Completion status of associated
                                    desc payload. High order bit is
                                    clear on new descs, toggled by
                                    hw for completed items.

To support forward- and backward-compatibility, descriptor and completion
payloads are specified in TLV format. Fields are packed with Type=field name,
Length=field length, and Value=field value. Software will ignore unknown fields
filled in by the switch. Likewise, the switch will ignore unknown fields
filled in by software.

Descriptor payload buffer is 8-byte aligned and TLVs are 8-byte aligned. The
value within a TLV is also 8-byte aligned. The (packed, 8 byte) TLV header is:

    field   width   description
    -----------------------------
    type    4       TLV type
    len     2       TLV value length
    pad     2       Reserved

The alignment requirements for descriptors and TLVs are to avoid unaligned
access exceptions in software. Note that the payload for each TLV is also
8-byte aligned.

Figure 1 shows an example descriptor buffer with two TLVs.

               <------- 8 bytes ------->

8-byte +––––+  +–––––––––––+–––––+–––––+                      +–+
align          |    type   | len | pad |   TLV#1 hdr            |
               +–––––––––––+–––––+–––––+   (len=22)             |
               |                       |                        |
               | value                 |   TLV#1 value          |
               |                       |   (padded to 8-byte    |
               |                 +–––––+    alignment)          |
               |                 |/////|                        |
8-byte +––––+  +–––––––––––+–––––––––––+                        |
align          |    type   | len | pad |   TLV#2 hdr      DESC_BUF_SIZE
               +–––––+–––––+–––––+–––––+   (len=2)              |
               |value|/////////////////|   TLV#2 value          |
               +–––––+/////////////////|                        |
               |///////////////////////|                        |
               |///////////////////////|                        |
               |///////////////////////|                        |
               |////////unused/////////|                        |
               |////////space//////////|                        |
               |///////////////////////|                        |
               |///////////////////////|                        |
               |///////////////////////|                        |
               +–––––––––––––––––––––––+                      +–+

                        fig. 1

TLVs can be nested within the NEST TLV type.

Interrupt credits
^^^^^^^^^^^^^^^^^

MSI-X vectors used for descriptor ring completions use a credit mechanism for
efficient device, PCIe bus, OS and driver operations. Each descriptor ring has
a credit count which represents the number of outstanding descriptors to be
processed by the driver. As the device marks descriptors complete, the credit
count is incremented. As the driver processes those outstanding descriptors,
it returns credits back to the device. This way, the device knows the driver's
progress and can make decisions about when to fire the next interrupt or not.
When the credit count is zero, and the first descriptors are posted for the
driver, a single interrupt is fired. Once the interrupt is fired, the
interrupt is disabled (auto-masked*). In response to the interrupt, the driver
will process descriptors and PIO write a returned credit value for that
descriptor ring. If the driver returns all credits (the driver caught up with
the device and there is no outstanding work), then the interrupt is unmasked,
but not fired.
If only partial credits are returned, the interrupt remains
masked but the device generates an interrupt, signaling the driver that more
outstanding work is available.

(* this masking is unrelated to the MSI-X interrupt mask register)

Endianness
----------

Device registers are hard-coded to little-endian (LE). The driver should
convert to/from host endianness to LE for device register accesses.

Descriptors are LE. Descriptor buffer TLVs will have LE type and length
fields, but the value field can either be LE or network-byte-order, depending
on context. TLV values containing network packet data will be in network-byte
order. A TLV value containing a field or mask used to compare against network
packet data is network-byte order. For example, flow match fields (and masks)
are network-byte-order since they're matched directly, byte-by-byte, against
network packet data. All non-network-packet TLV multi-byte values will be LE.

TLV values in network-byte-order are designated with (N).


SECTION 5: Test Registers
=========================

Rocker has several test registers to support troubleshooting register access,
interrupt generation, and DMA operations:

    TEST_REG, offset 0x0010, 32-bit (R/W)
    TEST_REG64, offset 0x0018, 64-bit (R/W)
    TEST_IRQ, offset 0x0020, 32-bit (R/W)
    TEST_DMA_ADDR, offset 0x0028, 64-bit (R/W)
    TEST_DMA_SIZE, offset 0x0030, 32-bit (R/W)
    TEST_DMA_CTRL, offset 0x0034, 32-bit (R/W)

Reads from TEST_REG and TEST_REG64 return a value equal to twice the last
value written to the register. The 32-bit and 64-bit versions are for testing
32-bit and 64-bit host accesses.

A vector can be written to TEST_IRQ and the device will generate an interrupt
for that vector.

To test basic DMA operations, allocate a DMA-able host buffer and put the
buffer address into TEST_DMA_ADDR and size into TEST_DMA_SIZE.
Then, write to
TEST_DMA_CTRL to manipulate the buffer contents. TEST_DMA_CTRL operations are:

    operation               value   description
    -----------------------------------------------------------
    TEST_DMA_CTRL_CLEAR     1       clear buffer
    TEST_DMA_CTRL_FILL      2       fill buffer bytes with 0x96
    TEST_DMA_CTRL_INVERT    4       invert bytes in buffer

Various buffer addresses and sizes should be tested to verify that no address
boundary issue exists. In particular, buffers that start on an odd 8-byte
boundary and/or span multiple PAGE sizes should be tested.


SECTION 6: Ports
================

Physical and Logical Ports
--------------------------

The switch supports up to 62 physical (front-panel) ports. Register
PORT_PHYS_COUNT returns the actual number of physical ports available:

    PORT_PHYS_COUNT, offset 0x0304, 32-bit, (R)

In addition to front-panel ports, the switch supports logical ports for
tunnels.

Front-panel ports and logical tunnel ports are mapped into a single 32-bit port
space. A special CPU port is assigned port 0. The front-panel ports are
mapped to ports 1-62. A special loopback port is assigned port 63. Logical
tunnel ports are assigned ports 0x00010000-0x0001ffff.
To summarize the port assignments:

    port                    mapping
    -------------------------------------------------------
    0                       CPU port (for packets to/from host CPU)
    1-62                    front-panel physical ports
    63                      loopback port
    64-0x0000ffff           RSVD
    0x00010000-0x0001ffff   logical tunnel ports
    0x00020000-0xffffffff   RSVD

Physical Port Mode
------------------

Switch front-panel ports operate in a mode. Currently, the only mode is
OF-DPA. OF-DPA[1] mode is based on OpenFlow Data Plane Abstraction (OF-DPA)
Abstract Switch Specification, Version 1.0, from Broadcom Corporation. To
set/get the mode for front-panel ports, see port settings, below.
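
As an illustration of the 32-bit port space summarized in the port assignment
table earlier in this section, the ranges can be expressed as a small
classifier. This is a sketch only: the enum and function names are
hypothetical and not part of the device interface. The range boundaries are
taken directly from the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the ranges in the port assignment table. */
enum rocker_port_kind {
    ROCKER_PORT_CPU,            /* port 0                      */
    ROCKER_PORT_FRONT_PANEL,    /* ports 1-62                  */
    ROCKER_PORT_LOOPBACK,       /* port 63                     */
    ROCKER_PORT_TUNNEL,         /* ports 0x00010000-0x0001ffff */
    ROCKER_PORT_RSVD,           /* everything else             */
};

/* Map a 32-bit port number onto the range it falls in. */
static enum rocker_port_kind rocker_classify_port(uint32_t port)
{
    if (port == 0)
        return ROCKER_PORT_CPU;
    if (port <= 62)
        return ROCKER_PORT_FRONT_PANEL;
    if (port == 63)
        return ROCKER_PORT_LOOPBACK;
    if (port >= 0x00010000u && port <= 0x0001ffffu)
        return ROCKER_PORT_TUNNEL;
    return ROCKER_PORT_RSVD;
}
```

A driver could use a check like this to reject reserved port numbers before
building a command descriptor.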

Port Settings
-------------

Link status for all front-panel ports is available via PORT_PHYS_LINK_STATUS:

    PORT_PHYS_LINK_STATUS, offset 0x0310, 64-bit, (R)

        Value is port bitmap. Bits 0 and 63 always read 0. Bits 1-62
        read 1 for link UP and 0 for link DOWN for respective front-panel
        ports.

Other properties for front-panel ports are available via DMA CMD descriptors:

    Get PORT_SETTINGS descriptor:

        field           width   description
        ----------------------------------------------
        PORT_SETTINGS   2       CMD_GET
        PPORT           4       Physical port #

    Get PORT_SETTINGS completion:

        field           width   description
        ----------------------------------------------
        PPORT           4       Physical port #
        SPEED           4       Current port interface speed, in Mbps
        DUPLEX          1       1 = Full, 0 = Half
        AUTONEG         1       1 = enabled, 0 = disabled
        MACADDR         6       Port MAC address
        MODE            1       0 = OF-DPA
        LEARNING        1       MAC address learning on port
                                  1 = enabled
                                  0 = disabled
        PHYS_NAME       <var>   Physical port name (string)

    Set PORT_SETTINGS descriptor:

        field           width   description
        ----------------------------------------------
        PORT_SETTINGS   2       CMD_SET
        PPORT           4       Physical port #
        SPEED           4       Port interface speed, in Mbps
        DUPLEX          1       1 = Full, 0 = Half
        AUTONEG         1       1 = enabled, 0 = disabled
        MACADDR         6       Port MAC address
        MODE            1       0 = OF-DPA

Port Enable
-----------

Front-panel ports are initially disabled, which means port ingress and egress
packets will be dropped. To enable or disable a port, use PORT_PHYS_ENABLE:

    PORT_PHYS_ENABLE: offset 0x0318, 64-bit, (R/W)

        Value is bitmap of first 64 ports. Bits 0 and 63 are ignored
        and always read as 0. Write 1 to enable port; write 0 to disable it.
        Default is 0.


SECTION 7: Switch Control
=========================

This section covers switch-wide register settings.

Control
-------

This register is used for low level control of the switch.

    CONTROL: offset 0x0300, 32-bit, (W)

    bit     name            description
    ------------------------------------------------------------------------
    [0]     CONTROL_RESET   If set, device will perform reset
    [31:1]  Reserved

Switch ID
---------

The switch has a SWITCH_ID to be used by software to uniquely identify the
switch:

    SWITCH_ID: offset 0x0320, 64-bit, (R)

        Value is opaque to switch software and no special encoding is implied.


SECTION 8: Events
=================

Non-I/O asynchronous events from the device are notified to the host using the
event ring. The TLV structure for events is:

    field       width   description
    ---------------------------------------------------
    TYPE        4       Event type, one of:
                          1: LINK_CHANGED
                          2: MAC_VLAN_SEEN
    INFO        <nest>  Event info (details below)

Link Changed Event
------------------

When link status changes on a physical port, this event is generated.

    field       width   description
    ---------------------------------------------------
    INFO        <nest>
      PPORT     4       Physical port
      LINKUP    1       Link status:
                          0: down
                          1: up

MAC VLAN Seen Event
-------------------

When a packet ingresses on a port and the source MAC/VLAN isn't known to the
device, the device will generate this event. In response to the event, the
driver should install the MAC/VLAN for that port into the device's bridge
table. Once installed, the MAC/VLAN is known on the port and this event will
no longer be generated.

    field       width   description
    ---------------------------------------------------
    INFO        <nest>
      PPORT     4       Physical port
      MAC       6       MAC address
      VLAN      2       VLAN ID


SECTION 9: CPU Packet Processing
================================

Ingress packets directed to the host CPU for further processing are delivered
in the DMA RX ring. Likewise, host CPU originating packets destined to egress
on switch ports are scheduled by software using the DMA TX ring.

Tx Packet Processing
--------------------

Software schedules packets for egress on switch ports using the DMA TX ring. A
TX descriptor buffer describes the packet location and size in host DMA-able
memory, the destination port, and any hardware-offload functions (such as L3
payload checksum offload). Software then bumps the descriptor head to signal
hardware of new Tx work. In response, hardware will DMA read Tx descriptors up
to head, DMA read descriptor buffer and packet data, perform offloading
functions, and finally frame packet on wire (network). Once packet processing
is complete, hardware will writeback status to descriptor(s) to signal to
software that Tx is complete and software resources (e.g. skb) backing packet
can be released.

Figure 2 shows an example 3-fragment packet queued with one Tx descriptor. A
TLV is used for each packet fragment.

                                           pkt frag 1
                                           +–––––––+  +–+
                                       +–––+       |    |
                        desc buf       |   |       |    |
                        +––––––––+     |   |       |    |
        Tx ring     +–––+        +–––––+   |       |    |
      +–––––––––+   |   |  TLVs  |         +–––––––+    |
      |         +–––+   +––––––––+         pkt frag 2   |
      | desc 0  |       |        +–––––+   +–––––––+    |
      +–––––––––+       |  TLVs  |     +–––+       |    |
head+–+         |       +––––––––+         |       |    |
      | desc 1  |       |        +–––––+   +–––––––+    |pkt
      +–––––––––+       |  TLVs  |     |                |
      |         |       +––––––––+     |   pkt frag 3   |
      |         |                      |   +–––––––+    |
      +–––––––––+                      +–––+       |    |
      |         |                          |       |    |
      |         |                          |       |    |
      +–––––––––+                          |       |    |
      |         |                          |       |    |
      |         |                          |       |    |
      +–––––––––+                          |       |    |
      |         |                          +–––––––+  +–+
      |         |
      +–––––––––+

                        fig 2.

The TLVs for Tx descriptor buffer are:

    field               width   description
    ---------------------------------------------------------------------
    PPORT               4       Destination physical port #
    TX_OFFLOAD          1       Hardware offload modes:
                                  0: no offload
                                  1: insert IP csum (ipv4 only)
                                  2: insert TCP/UDP csum
                                  3: L3 csum calc and insert
                                     into csum offset (TX_L3_CSUM_OFF)
                                     16-bit 1's complement csum value.
                                     IPv4 pseudo-header and IP
                                     already calculated by OS
                                     and inserted.
                                  4: TSO (TCP Segmentation Offload)
    TX_L3_CSUM_OFF      2       For L3 csum offload mode, the offset,
                                from the beginning of the packet,
                                of the csum field in the L3 header
    TX_TSO_MSS          2       For TSO offload mode, the
                                Maximum Segment Size in bytes
    TX_TSO_HDR_LEN      2       For TSO offload mode, the
                                length of ethernet, IP, and
                                TCP/UDP headers, including IP
                                and TCP options.
    TX_FRAGS            <array> Packet fragments
      TX_FRAG           <nest>  Packet fragment
        TX_FRAG_ADDR    8       DMA address of packet fragment
        TX_FRAG_LEN     2       Packet fragment length

Possible status return codes in descriptor on completion are:

    DESC_COMP_ERR       reason
    --------------------------------------------------------------------
    0                   OK
    -ROCKER_ENXIO       address or data read err on desc buf or packet
                        fragment
    -ROCKER_EINVAL      bad pport or TSO or csum offloading error
    -ROCKER_ENOMEM      no memory for internal staging tx fragment

Rx Packet Processing
--------------------

Packets ingressing on switch ports that are not forwarded by the switch but
rather directed to the host CPU for further processing are delivered in the
DMA RX ring. Rx descriptor buffers are allocated by software and placed on the
ring. Hardware will fill Rx descriptor buffers with packet data, write the
completion, and signal to software that a new packet is ready. Since Rx packet
size is not known a priori, the Rx descriptor buffer must be allocated for
worst-case packet size. A single Rx descriptor will contain the entire Rx
packet data in one RX_FRAG. Other Rx TLVs describe any hardware offloads
performed on the packet, such as checksum validation.

The TLVs for Rx descriptor buffer are:

    field               width   description
    ---------------------------------------------------
    PPORT               4       Source physical port #
    RX_FLAGS            2       Packet parsing flags:
                                  (1 << 0): IPv4 packet
                                  (1 << 1): IPv6 packet
                                  (1 << 2): csum calculated
                                  (1 << 3): IPv4 csum good
                                  (1 << 4): IP fragment
                                  (1 << 5): TCP packet
                                  (1 << 6): UDP packet
                                  (1 << 7): TCP/UDP csum good
                                  (1 << 8): Offload forward
    RX_CSUM             2       IP calculated checksum:
                                  IPv4: IP payload csum
                                  IPv6: header and payload csum
                                (Only valid if RX_FLAGS:csum calc is set)
    RX_FRAG_ADDR        8       DMA address of packet fragment
    RX_FRAG_MAX_LEN     2       Packet maximum fragment length
    RX_FRAG_LEN         2       Actual packet fragment length after receive

The Offload forward RX_FLAG indicates the device has already forwarded the
packet, so the host CPU should not also forward the packet.

Possible status return codes in descriptor on completion are:

    DESC_COMP_ERR       reason
    --------------------------------------------------------------------
    0                   OK
    -ROCKER_ENXIO       address or data read err on desc buf
    -ROCKER_ENOMEM      no memory for internal staging desc buf
    -ROCKER_EMSGSIZE    Rx descriptor buffer wasn't big enough to contain
                        packet data TLV and other TLVs.


SECTION 10: OF-DPA Mode
=======================

OF-DPA mode allows the switch to offload flow packet processing functions to
hardware. An OpenFlow controller would communicate with an OpenFlow agent
installed on the switch. The OpenFlow agent would (directly or indirectly)
communicate with the Rocker switch driver, which in turn would program switch
hardware with flow functionality, as defined in OF-DPA.
The block diagram is:

        +––––––––––––––––––––––+
        |          OF          |
        |  Remote Controller   |
        +––––––––+–––––––––––––+
                 |
                 |
        +––––––––+–––––––––+
        |       OF         |
        |   Local Agent    |
        +––––––––––––––––––+
        |                  |
        |   Rocker Driver  |
        +––––––––––––––––––+
             <this spec>
        +––––––––––––––––––+
        |                  |
        |   Rocker Switch  |
        +––––––––––––––––––+

To participate in flow functions, ports must be configured for OF-DPA mode
during switch initialization.

OF-DPA Flow Table Interface
---------------------------

There are commands to add, modify, delete, and get stats of flow table entries.
The commands are issued using the DMA CMD descriptor ring. The following
commands are defined:

    CMD_ADD:        add an entry to flow table
    CMD_MOD:        modify an entry in flow table
    CMD_DEL:        delete an entry from flow table
    CMD_GET_STATS:  get stats for flow entry

TLVs for add and modify commands are:

    field                     width   description
    ----------------------------------------------------
    OF_DPA_CMD                2       CMD_[ADD|MOD]
    OF_DPA_TBL                2       Flow table ID
                                        0: ingress port
                                        10: vlan
                                        20: termination mac
                                        30: unicast routing
                                        40: multicast routing
                                        50: bridging
                                        60: ACL policy
    OF_DPA_PRIORITY           4       Flow priority
    OF_DPA_HARDTIME           4       Hard timeout for flow
    OF_DPA_IDLETIME           4       Idle timeout for flow
    OF_DPA_COOKIE             8       Cookie

Additional TLVs based on flow table ID:

Table ID 0: ingress port

    field                     width   description
    ----------------------------------------------------
    OF_DPA_IN_PPORT           4       ingress physical port number
    OF_DPA_GOTO_TBL           2       goto table ID; zero to drop

Table ID 10: vlan

    field                     width   description
    ----------------------------------------------------
    OF_DPA_IN_PPORT           4       ingress physical port number
    OF_DPA_VLAN_ID            2       (N) vlan ID
    OF_DPA_VLAN_ID_MASK       2       (N) vlan ID mask
    OF_DPA_GOTO_TBL           2       goto table ID; zero to drop
    OF_DPA_NEW_VLAN_ID        2       (N) new vlan ID

Table ID 20: termination mac

    field                     width   description
    ----------------------------------------------------
    OF_DPA_IN_PPORT           4       ingress physical port number
    OF_DPA_IN_PPORT_MASK      4       ingress physical port number mask
    OF_DPA_ETHERTYPE          2       (N) must be either 0x0800 or 0x86dd
    OF_DPA_DST_MAC            6       (N) destination MAC
    OF_DPA_DST_MAC_MASK       6       (N) destination MAC mask
    OF_DPA_VLAN_ID            2       (N) vlan ID
    OF_DPA_VLAN_ID_MASK       2       (N) vlan ID mask
    OF_DPA_GOTO_TBL           2       only acceptable values are
                                      unicast or multicast routing
                                      table IDs
    OF_DPA_OUT_PPORT          2       if specified, must be
                                      controller, set zero otherwise

Table ID 30: unicast routing

    field                     width   description
    ----------------------------------------------------
    OF_DPA_ETHERTYPE          2       (N) must be either 0x0800 or 0x86dd
    OF_DPA_DST_IP             4       (N) destination IPv4 address.
                                          Must be unicast address
    OF_DPA_DST_IP_MASK        4       (N) IP mask. Must be prefix mask
    OF_DPA_DST_IPV6           16      (N) destination IPv6 address.
                                          Must be unicast address
    OF_DPA_DST_IPV6_MASK      16      (N) IPv6 mask. Must be prefix mask
    OF_DPA_GOTO_TBL           2       goto table ID; zero to drop
    OF_DPA_GROUP_ID           4       data for GROUP action must
                                      be an L3 Unicast group entry

Table ID 40: multicast routing

    field                     width   description
    ----------------------------------------------------
    OF_DPA_ETHERTYPE          2       (N) must be either 0x0800 or 0x86dd
    OF_DPA_VLAN_ID            2       (N) vlan ID
    OF_DPA_SRC_IP             4       (N) source IPv4. Optional,
                                          can contain IPv4 address,
                                          must be completely masked
                                          if not used
    OF_DPA_SRC_IP_MASK        4       (N) IP Mask
    OF_DPA_DST_IP             4       (N) destination IPv4 address.
                                          Must be multicast address
    OF_DPA_SRC_IPV6           16      (N) source IPv6 Address. Optional.
                                          Can contain IPv6 address,
                                          must be completely masked
                                          if not used
    OF_DPA_SRC_IPV6_MASK      16      (N) IPv6 mask.
    OF_DPA_DST_IPV6           16      (N) destination IPv6 Address.
                                          Must be multicast address
    OF_DPA_GOTO_TBL           2       goto table ID; zero to drop
    OF_DPA_GROUP_ID           4       data for GROUP action must
                                      be an L3 multicast group entry

Table ID 50: bridging

    field                     width   description
    ----------------------------------------------------
    OF_DPA_VLAN_ID            2       (N) vlan ID
    OF_DPA_TUNNEL_ID          4       tunnel ID
    OF_DPA_DST_MAC            6       (N) destination MAC
    OF_DPA_DST_MAC_MASK       6       (N) destination MAC mask
    OF_DPA_GOTO_TBL           2       goto table ID; zero to drop
    OF_DPA_GROUP_ID           4       data for GROUP action must
                                      be a L2 Interface, L2
                                      Multicast, L2 Flood,
                                      or L2 Overlay group entry
                                      as appropriate
    OF_DPA_TUNNEL_LPORT       4       unicast Tenant Bridging
                                      flows specify a tunnel
                                      logical port ID
    OF_DPA_OUT_PPORT          2       data for OUTPUT action,
                                      restricted to CONTROLLER,
                                      set to 0 otherwise

Table ID 60: acl policy

    field                     width   description
    ----------------------------------------------------
    OF_DPA_IN_PPORT           4       ingress physical port number
    OF_DPA_IN_PPORT_MASK      4       ingress physical port number mask
    OF_DPA_ETHERTYPE          2       (N) ethertype
    OF_DPA_VLAN_ID            2       (N) vlan ID
    OF_DPA_VLAN_ID_MASK       2       (N) vlan ID mask
    OF_DPA_VLAN_PCP           2       (N) vlan Priority Code Point
    OF_DPA_VLAN_PCP_MASK      2       (N) vlan Priority Code Point mask
    OF_DPA_SRC_MAC            6       (N) source MAC
    OF_DPA_SRC_MAC_MASK       6       (N) source MAC mask
    OF_DPA_DST_MAC            6       (N) destination MAC
    OF_DPA_DST_MAC_MASK       6       (N) destination MAC mask
    OF_DPA_TUNNEL_ID          4       tunnel ID
    OF_DPA_SRC_IP             4       (N) source IPv4. Optional,
                                          can contain IPv4 address,
                                          must be completely masked
                                          if not used
    OF_DPA_SRC_IP_MASK        4       (N) IP Mask
    OF_DPA_DST_IP             4       (N) destination IPv4 address.
                                          Must be multicast address
    OF_DPA_DST_IP_MASK        4       (N) IP Mask
    OF_DPA_SRC_IPV6           16      (N) source IPv6 Address. Optional.
                                          Can contain IPv6 address,
                                          must be completely masked
                                          if not used
    OF_DPA_SRC_IPV6_MASK      16      (N) IPv6 mask
    OF_DPA_DST_IPV6           16      (N) destination IPv6 Address.
                                          Must be multicast address.
    OF_DPA_DST_IPV6_MASK      16      (N) IPv6 mask
    OF_DPA_SRC_ARP_IP         4       (N) source IPv4 address in the ARP
                                          payload. Only used if ethertype
                                          == 0x0806.
    OF_DPA_SRC_ARP_IP_MASK    4       (N) IP Mask
    OF_DPA_IP_PROTO           1       IP protocol
    OF_DPA_IP_PROTO_MASK      1       IP protocol mask
    OF_DPA_IP_DSCP            1       DSCP
    OF_DPA_IP_DSCP_MASK       1       DSCP mask
    OF_DPA_IP_ECN             1       ECN
    OF_DPA_IP_ECN_MASK        1       ECN mask
    OF_DPA_L4_SRC_PORT        2       (N) L4 source port, only for
                                          TCP, UDP, or SCTP
    OF_DPA_L4_SRC_PORT_MASK   2       (N) L4 source port mask
    OF_DPA_L4_DST_PORT        2       (N) L4 destination port, only for
                                          TCP, UDP, or SCTP
    OF_DPA_L4_DST_PORT_MASK   2       (N) L4 destination port mask
    OF_DPA_ICMP_TYPE          1       ICMP type, only if IP
                                      protocol is 1
    OF_DPA_ICMP_TYPE_MASK     1       ICMP type mask
    OF_DPA_ICMP_CODE          1       ICMP code
    OF_DPA_ICMP_CODE_MASK     1       ICMP code mask
    OF_DPA_IPV6_LABEL         4       (N) IPv6 flow label
    OF_DPA_IPV6_LABEL_MASK    4       (N) IPv6 flow label mask
    OF_DPA_GROUP_ID           4       data for GROUP action
    OF_DPA_QUEUE_ID_ACTION    1       write the queue ID
    OF_DPA_NEW_QUEUE_ID       1       queue ID
    OF_DPA_VLAN_PCP_ACTION    1       write the VLAN priority
    OF_DPA_NEW_VLAN_PCP       1       VLAN priority
    OF_DPA_IP_DSCP_ACTION     1       write the DSCP
    OF_DPA_NEW_IP_DSCP        1       new DSCP
    OF_DPA_TUNNEL_LPORT       4       restrict to valid tunnel
                                      logical port, set to 0
                                      otherwise.
    OF_DPA_OUT_PPORT          2       data for OUTPUT action,
                                      restricted to CONTROLLER,
                                      set to 0 otherwise
    OF_DPA_CLEAR_ACTIONS      4       if 1 packets matching flow are
                                      dropped (all other instructions
                                      ignored)

TLVs for flow delete and get stats command are:

    field                     width   description
    ---------------------------------------------------
    OF_DPA_CMD                2       CMD_[DEL|GET_STATS]
    OF_DPA_COOKIE             8       Cookie

On completion of get stats command, the descriptor buffer is written back with
the following TLVs:

    field                     width   description
    ---------------------------------------------------
    OF_DPA_STAT_DURATION      4       Flow duration
    OF_DPA_STAT_RX_PKTS       8       Received packets
    OF_DPA_STAT_TX_PKTS       8       Transmit packets

Possible status return codes in descriptor on completion are:

    DESC_COMP_ERR       command            reason
    --------------------------------------------------------------------
    0                   all                OK
    -ROCKER_EFAULT      all                head or tail index outside
                                           of ring
    -ROCKER_ENXIO       all                address or data read err on
                                           desc buf
    -ROCKER_EMSGSIZE    GET_STATS          cmd descriptor buffer wasn't
                                           big enough to contain write-back
                                           TLVs
    -ROCKER_EINVAL      all                invalid parameters passed in
    -ROCKER_EEXIST      ADD                entry already exists
    -ROCKER_ENOSPC      ADD                no space left in flow table
    -ROCKER_ENOENT      MOD|DEL|GET_STATS  cookie invalid

Group Table Interface
---------------------

There are commands to add, modify, delete, and get stats of group table
entries. The commands are issued using the DMA CMD descriptor ring.
The
following commands are defined:

    CMD_ADD:        add an entry to group table
    CMD_MOD:        modify an entry in group table
    CMD_DEL:        delete an entry from group table
    CMD_GET_STATS:  get stats for group entry

TLVs for add and modify commands are:

    field                   width   description
    -----------------------------------------------------------
    FLOW_GROUP_CMD          2       CMD_[ADD|MOD]
    FLOW_GROUP_ID           2       Flow group ID
    FLOW_GROUP_TYPE         1       Group type:
                                      0: L2 interface
                                      1: L2 rewrite
                                      2: L3 unicast
                                      3: L2 multicast
                                      4: L2 flood
                                      5: L3 interface
                                      6: L3 multicast
                                      7: L3 ECMP
                                      8: L2 overlay
    FLOW_VLAN_ID            2       Vlan ID (types 0, 3, 4, 6)
    FLOW_L2_PORT            2       Port (type 0)
    FLOW_INDEX              4       Index (all types but 0)
    FLOW_OVERLAY_TYPE       1       Overlay sub-type (type 8):
                                      0: Flood unicast tunnel
                                      1: Flood multicast tunnel
                                      2: Multicast unicast tunnel
                                      3: Multicast multicast tunnel
    FLOW_GROUP_ACTION       nest
      FLOW_GROUP_ID         2       next group ID in chain (all
                                    types except 0)
      FLOW_OUT_PORT         4       egress port (types 0, 8)
      FLOW_POP_VLAN_TAG     1       strip outer VLAN tag (type 1
                                    only)
      FLOW_VLAN_ID          2       (types 1, 5)
      FLOW_SRC_MAC          6       (types 1, 2, 5)
      FLOW_DST_MAC          6       (types 1, 2)

TLVs for flow delete and get stats command are:

    field                   width   description
    -----------------------------------------------------------
    FLOW_GROUP_CMD          2       CMD_[DEL|GET_STATS]
    FLOW_GROUP_ID           2       Flow group ID

On completion of get stats command, the descriptor buffer is written back with
the following TLVs:

    field                   width   description
    ---------------------------------------------------
    FLOW_GROUP_ID           2       Flow group ID
    FLOW_STAT_DURATION      4       Flow duration
    FLOW_STAT_REF_COUNT     4       Flow reference count
    FLOW_STAT_BUCKET_COUNT  4       Flow bucket count

Possible status return codes in descriptor on completion are:

    DESC_COMP_ERR       command            reason
    --------------------------------------------------------------------
    0                   all                OK
    -ROCKER_EFAULT      all                head or tail index outside
                                           of ring
    -ROCKER_ENXIO       all                address or data read err on
                                           desc buf
    -ROCKER_ENOSPC      GET_STATS          cmd descriptor buffer wasn't
                                           big enough to contain write-back
                                           TLVs
    -ROCKER_EINVAL      ADD|MOD            invalid parameters passed in
    -ROCKER_EEXIST      ADD                entry already exists
    -ROCKER_ENOSPC      ADD                no space left in flow table
    -ROCKER_ENOENT      MOD|DEL|GET_STATS  group ID invalid
    -ROCKER_EBUSY       DEL                group reference count non-zero
    -ROCKER_ENODEV      ADD                next group ID doesn't exist



References
==========

[1] OpenFlow Data Plane Abstraction (OF-DPA) Abstract Switch Specification,
Version 1.0, from Broadcom Corporation, February 21, 2014.