.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=============================================
Chelsio N210 10Gb Ethernet Network Controller
=============================================

Driver Release Notes for Linux

Version 2.1.1

June 20, 2005

.. Contents

 INTRODUCTION
 FEATURES
 PERFORMANCE
 DRIVER MESSAGES
 KNOWN ISSUES
 SUPPORT


Introduction
============

 This document describes the Linux driver for Chelsio 10Gb Ethernet Network
 Controller. This driver supports the Chelsio N210 NIC and is backward
 compatible with the Chelsio N110 model 10Gb NICs.


Features
========

Adaptive Interrupts (adaptive-rx)
---------------------------------

 This feature provides an adaptive algorithm that adjusts the interrupt
 coalescing parameters, allowing the driver to dynamically adapt the latency
 settings to achieve the highest performance during various types of network
 load.

 The interface used to control this feature is ethtool. Please see the
 ethtool manpage for additional usage information.

 By default, adaptive-rx is disabled.
 To enable adaptive-rx::

     ethtool -C <interface> adaptive-rx on

 To disable adaptive-rx, use ethtool::

     ethtool -C <interface> adaptive-rx off

 After disabling adaptive-rx, the timer latency value will be set to 50us.
 You may set the timer latency after disabling adaptive-rx::

     ethtool -C <interface> rx-usecs <microseconds>

 An example to set the timer latency value to 100us on eth0::

     ethtool -C eth0 rx-usecs 100

 You may also provide a timer latency value while disabling adaptive-rx::

     ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

 If adaptive-rx is disabled and a timer latency value is specified, the timer
 will be set to the specified value until changed by the user or until
 adaptive-rx is enabled.


 To view the status of the adaptive-rx and timer latency values::

     ethtool -c <interface>


TCP Segmentation Offloading (TSO) Support
-----------------------------------------

 This feature, also known as "large send", enables a system's protocol stack
 to offload portions of outbound TCP processing to a network interface card
 thereby reducing system CPU utilization and enhancing performance.

 The interface used to control this feature is ethtool version 1.8 or higher.
 Please see the ethtool manpage for additional usage information.

 By default, TSO is enabled.
 To disable TSO::

     ethtool -K <interface> tso off

 To enable TSO::

     ethtool -K <interface> tso on

 To view the status of TSO::

     ethtool -k <interface>


Performance
===========

 The following information is provided as an example of how to change system
 parameters for "performance tuning" and what value to use. You may or may not
 want to change these system parameters, depending on your server/workstation
 application. Doing so is not warranted in any way by Chelsio Communications,
 and is done at "YOUR OWN RISK". Chelsio will not be held responsible for loss
 of data or damage to equipment.

 Your distribution may have a different way of doing things, or you may prefer
 a different method. These commands are shown only to provide an example of
 what to do and are by no means definitive.

 Making any of the following system changes will only last until you reboot
 your system. You may want to write a script that runs at boot-up which
 includes the optimal settings for your system.


 Setting PCI Latency Timer::

     setpci -d 1425:* 0x0c.l=0x0000F800

 Disabling TCP timestamp::

     sysctl -w net.ipv4.tcp_timestamps=0

 Disabling SACK::

     sysctl -w net.ipv4.tcp_sack=0

 Setting large number of incoming connection requests::

     sysctl -w net.ipv4.tcp_max_syn_backlog=3000

 Setting maximum receive socket buffer size::

     sysctl -w net.core.rmem_max=1024000

 Setting maximum send socket buffer size::

     sysctl -w net.core.wmem_max=1024000

 Set smp_affinity (on a multiprocessor system) to a single CPU::

     echo 1 > /proc/irq/<interrupt_number>/smp_affinity

 Setting default receive socket buffer size::

     sysctl -w net.core.rmem_default=524287

 Setting default send socket buffer size::

     sysctl -w net.core.wmem_default=524287

 Setting maximum option memory buffers::

     sysctl -w net.core.optmem_max=524287

 Setting maximum backlog (# of unprocessed packets before kernel drops)::

     sysctl -w net.core.netdev_max_backlog=300000

 Setting TCP read buffers (min/default/max)::

     sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

 Setting TCP write buffers (min/default/max)::

     sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

 Setting TCP buffer space (min/pressure/max)::

     sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"

 TCP window size for single connections:

   The receive buffer (RX_WINDOW) size must be at least as large as the
   Bandwidth-Delay Product of the communication link between the sender and
   receiver. Due to the variations of RTT, you may want to increase the buffer
   size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
   "TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.

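As a rough sketch of the Bandwidth-Delay Product sizing described above, the
arithmetic can be done with plain shell integer math. The function name
``bdp_bytes`` and its argument order are hypothetical and not part of the
driver or of any tool mentioned in this document; the link rate and RTT are
example inputs only.

```shell
#!/bin/sh
# bdp_bytes RATE_BITS_PER_SEC RTT_MICROSECONDS   (hypothetical helper)
# Prints the Bandwidth-Delay Product in bytes using integer arithmetic.
bdp_bytes() {
    rate_bps=$1
    rtt_us=$2
    # bits/s divided by 8 gives bytes/s; RTT in microseconds divided by
    # 1,000,000 gives seconds.
    echo $(( rate_bps / 8 * rtt_us / 1000000 ))
}

# Example inputs: 10 Gb/s link, 100 us round-trip time.
bdp=$(bdp_bytes 10000000000 100)
echo "BDP (minimum RX_WINDOW): ${bdp} bytes"
echo "2x BDP (upper bound):    $(( bdp * 2 )) bytes"
```

With these example inputs the minimum comes out to 125000 bytes and the
doubled value to 250000 bytes; measure the RTT of your own link before
choosing a value.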

   At 10Gb speeds, use the following formula::

       RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
       Example for RTT with 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000

   RX_WINDOW sizes of 256KB - 512KB should be sufficient.

   Setting the min, max, and default receive buffer (RX_WINDOW) size::

       sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"

 TCP window size for multiple connections:

   The receive buffer (RX_WINDOW) size may be calculated the same as single
   connections, but should be divided by the number of connections. The
   smaller window prevents congestion and facilitates better pacing,
   especially if/when MAC level flow control does not work well or when it is
   not supported on the machine. Experimentation may be necessary to attain
   the correct value. This method is provided as a starting point for the
   correct receive buffer size.

   Setting the min, max, and default receive buffer (RX_WINDOW) size is
   performed in the same manner as single connection.


Driver Messages
===============

 The following messages are the most common messages logged by syslog. These
 may be found in /var/log/messages.

 Driver up::

     Chelsio Network Driver - version 2.1.1

 NIC detected::

     eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

 Link up::

     eth#: link is up at 10 Gbps, full duplex

 Link down::

     eth#: link is down


Known Issues
============

 These issues have been identified during testing. The following information
 is provided as a workaround to the problem. In some cases, this problem is
 inherent to Linux or to a particular Linux Distribution and/or hardware
 platform.

 1. Large number of TCP retransmits on a multiprocessor (SMP) system.

      On a system with multiple CPUs, the interrupt (IRQ) for the network
      controller may be bound to more than one CPU. This will cause TCP
      retransmits if the packet data were to be split across different CPUs
      and re-assembled in a different order than expected.

      To eliminate the TCP retransmits, set smp_affinity on the particular
      interrupt to a single CPU. You can locate the interrupt (IRQ) used on
      the N110/N210 by using ifconfig::

          ifconfig <dev_name> | grep Interrupt

      Set the smp_affinity to a single CPU::

          echo 1 > /proc/irq/<interrupt_number>/smp_affinity

      It is highly suggested that you do not run the irqbalance daemon on your
      system, as this will change any smp_affinity setting you have applied.
      The irqbalance daemon runs on a 10 second interval and binds interrupts
      to the least loaded CPU determined by the daemon. To disable this daemon::

          chkconfig --level 2345 irqbalance off

      By default, some Linux distributions enable the kernel feature,
      irqbalance, which performs the same function as the daemon. To disable
      this feature, add the following line to your bootloader::

          noirqbalance

      Example using the Grub bootloader::

          title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
                 root (hd0,0)
                 kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
                 initrd /initrd-2.4.21-27.ELsmp.img

 2. After running insmod, the driver is loaded and the incorrect network
    interface is brought up without running ifup.

      When using 2.4.x kernels, including RHEL kernels, the Linux kernel
      invokes a script named "hotplug". This script is primarily used to
      automatically bring up USB devices when they are plugged in; however,
      the script also attempts to automatically bring up a network interface
      after loading the kernel module. The hotplug script does this by scanning
      the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
      for HWADDR=<mac_address>.


      If the hotplug script does not find the HWADDR within any of the
      ifcfg-eth# files, it will bring up the device with the next available
      interface name. If this interface is already configured for a different
      network card, your new interface will have incorrect IP address and
      network settings.

      To solve this issue, you can add the HWADDR=<mac_address> key to the
      interface config file of your network controller.

      To disable this "hotplug" feature, you may add the driver (module name)
      to the "blacklist" file located in /etc/hotplug. It has been noted that
      this does not work for network devices because the net.agent script
      does not use the blacklist file. Simply remove, or rename, the net.agent
      script located in /etc/hotplug to disable this feature.

 3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
    on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.

      If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
      chipset, you may experience the "133-MHz Mode Split Completion Data
      Corruption" bug identified by AMD while using a 133MHz PCI-X card on the
      PCI-X bus.

      AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
      can provide stale data via split completion cycles to a PCI-X card that
      is operating at 133 Mhz", causing data corruption.


      AMD provides three workarounds for this problem; however, Chelsio
      recommends the first option for best performance with this bug:

        For 133MHz secondary bus operation, limit the transaction length and
        the number of outstanding transactions, via BIOS configuration
        programming of the PCI-X card, to the following:

           Data Length (bytes): 1k

           Total allowed outstanding transactions: 2

      Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
      section 56, "133-MHz Mode Split Completion Data Corruption" for more
      details on this bug and the workarounds suggested by AMD.

      It may be possible to work outside AMD's recommended PCI-X settings; try
      increasing the Data Length to 2k bytes for increased performance. If you
      have issues with these settings, please revert to the "safe" settings
      and duplicate the problem before submitting a bug or asking for support.

      .. note::

          The default setting on most systems is 8 outstanding transactions
          and 2k bytes data length.

 4. On multiprocessor systems, it has been noted that an application which
    is handling 10Gb networking can switch between CPUs causing degraded
    and/or unstable performance.

      If running on an SMP system and taking performance measurements, it
      is suggested you either run the latest netperf-2.4.0+ or use a binding
      tool such as Tim Hockin's procstate utilities (runon)
      <http://www.hockin.org/~thockin/procstate/>.

      Binding netserver and netperf (or other applications) to particular
      CPUs can make a significant difference in performance measurements.
      You may need to experiment to find which CPU to bind the application to
      in order to achieve the best performance for your system.

      If you are developing an application designed for 10Gb networking,
      please keep in mind you may want to look at the sched_setaffinity and
      sched_getaffinity system calls to bind your application.


      If you are just running user-space applications such as ftp, telnet,
      etc., you may want to try the runon tool provided by Tim Hockin's
      procstate utility. You could also try binding the interface to a
      particular CPU: runon 0 ifup eth0


Support
=======

 If you have problems with the software or hardware, please contact our
 customer support team via email at support@chelsio.com or check our website
 at http://www.chelsio.com

-------------------------------------------------------------------------------

::

 Chelsio Communications
 370 San Aleso Ave.
 Suite 100
 Sunnyvale, CA 94085
 http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS`` AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Copyright |copy| 2003-2005 Chelsio Communications. All rights reserved.