                         Mellanox Technologies
                         =====================

===============================================================================
                  BXOFED 1.5.1 for Linux Release Notes
                      Revision 1.3.6, August 2010
===============================================================================

Contents:
=========
1. Introduction
2. Supported Gateway Platforms and Firmware
3. Supported Operating Systems
4. Changes From Previous Versions
   4.1. Changes From Version 1.4.1-1.3.6
   4.2. Changes From Version 1.4.1-1.3.5
   4.3. Changes From Version 1.4.1-1.3.5-G
   4.4. Changes From Version 1.4.1-1.1.2-E
   4.5. Changes From Version 1.4.1-1.1.2-A
   4.6. Changes From Version 1.4.1-1.1.2
   4.7. Changes From Version 1.4.1-1.1.1-rc3
   4.8. Changes From Version 1.4.1-rc11.05
5. Known Issues
   5.1. EoIB Host
   5.2. FCoE/FCoIB Host
   5.3. SDP Host
6. Other Resources


1. Introduction
===============
These are the release notes for "Mellanox BXOFED 1.5.1", Rev 1.3.6.
The BXOFED package is based on OFED 1.5.1 with the addition of the mlx4_vnic
and mlx4_ofc modules for supporting Mellanox BridgeX(TM) gateway platforms.
The SDP ULP included in this package was taken from OFED 1.5.2 sources.


2. Supported Gateway Platforms and Firmware
===========================================
  o Supported Gateway Platform systems
    - BX4010 PPC 460 CPU
    - BX4020 Intel x86 CPU
    - BridgeX, firmware fw-BridgeX v8.3.0000
    - BridgeX, FIT-BXM: 1.3.6-4
    - BridgeX, BXM: bxm-1.3.6
  o Host, ConnectX (CX & CX2), 2.7.000 for EoIB, and 2.7.700 for FCoE/FCoIB.
  o BridgeX Programmer's Reference Manual (PRM) Rev. 0.90.


3. Supported Operating Systems
==============================
  o CPU architectures:
    - x86_64

  o Linux Operating Systems:
    - RedHat EL5 up4:  2.6.18-164.el5
    - RedHat EL5 up5:  2.6.18-194.el5 (*)
    - SLES11:          2.6.27.19-5-default
    - OEL5.5:          2.6.32.0707 (*)
    - kernel.org:      2.6.26 (*)
                       2.6.27 (*)(**)
                       2.6.30 (*)(**)

    (*)  FCoE/FCoIB is not supported for this kernel.
    (**) This kernel was partially tested.


4. Changes From Previous Versions
=================================

4.1 Changes From Version 1.4.1-1.3.6
------------------------------------
- Rebased to OFED-1.5.1 (previously the code was based on OFED-1.4.1)
- Merged SDP driver from latest OFED-1.5.2
- Enhanced mlx4_vnic_info and mlx4_vnic_confd tools
- Supported operating systems list changed
- Bug fixes
- Enhanced FCoE/FCoIB error handling, syncing up with latest
  open-libfc/libfcoe

4.2 Changes From Version 1.4.1-1.3.5
------------------------------------
- Modification to supported kernels
- Memory corruption stabilization
- vNic login retry limitation
- Partial PKEY support
- Modification to mlx4_vnic_info

4.3 Changes From Version 1.4.1-1.3.5-G
--------------------------------------
- Fixed memory leak
- PKEY SA query mask
- Added mlx4_vnic_info -u flag
- Minor modifications to the mlx4_vnic_info output

4.4 Changes From Version 1.4.1-1.1.2-E
--------------------------------------
- Fixed rare server hangs when closing vNics under stress
- Added support for BridgeX GT
- Fixed a rare bug that caused zombie vNics that persisted after module unload
- Replaced source git tree for librdmacm with OFED 1.4.2 GA.

4.5 Changes From Version 1.4.1-1.1.2-A
--------------------------------------
- Fixed data RX ring restart due to ingress packet in excess of MTU (bad flow)
- Added capability to move control traffic to a different SL according to ADV
- Added the ability to send periodic mcast solicitation packets. This solves
  the issue of vNics that stay at WAIT-IOA (FM #78804)
- GW lookup in ingress control packets now uses port_id and not the eport name
  (FM #79085)
- Added support for kernel 2.6.18-EL5.4
- Added ingress traffic destination MAC replacement for shared vNic according
  to module parameter.
- Added shared vNic support. A shared vNic will attach to an MGID based on its
  IP address.
- Added sending of KA packets from HW interrupt if the KA timer is slipping
  due to CPU or IRQ load.
- Moved sending KA from work queue context to hr_timers. This gives better
  accuracy and shorter delays under load.
- Added module parameter "control_sl" that enables moving control traffic to
  another SL. This might be needed in cases of high data load delaying control
  KA. It is possible, and recommended, to use BXM configuration to move the
  data SL to a different value instead of using this parameter.
- Moved EoIB control RX pre-processing from workqueue to IRQ context.
- Added mlx4_vnic_helper module. This module prevents kernel calls to
  unregistered netdevices that resulted in occasional server hangs.

4.6 Changes From Version 1.4.1-1.1.2
------------------------------------
- Fixed hang in receive of control traffic under heavy control stress that
  caused vNics to close and re-open.

4.7 Changes From Version 1.4.1-1.1.1 (GA2)
------------------------------------------
- Changed vid value for no VLAN in the sysfs command used by host administered
  vNics from 0 to -1
- Added support for kernel 2.6.28
- Added EoIB user manual.
- Added mlx4_vnic_confd service for easy management of host administered
  vNics. See EoIB_README.txt or User Manual.
- Changed default ONBOOT behavior to YES.
- Fixed vNic stability issues during vNic creation stress and events where
  vNics disappeared and reappeared (due to missed keep alive packet on BXM
  side).
- Fixed spurious tx timeouts when downing an interface.
- Fixed multicast performance issue when uninitialized (ifup) vNics were
  available.
- Fixed ethtool warnings
- Solved various stability issues.
- Solved stability issues with SM change / LID change event processing.
- Added eport_state_enforce module parameter to bring vNic link indication up
  only when the corresponding External Port is up. This is needed for vNic
  failover.
- Fixed issue where vNic neighbor tables got out of sync. This might cause
  loss of connectivity between different vNics on the internal fabric.

4.8 Changes From Version 1.4.1-rc11.05 (GA1)
--------------------------------------------
- Added SL support.
  Note: need to use 'opensm -Q' to support a non-zero SL setting.
- OpenSM handover is now supported.
- Added TSS support for kernels that enable it (2.6.27 and above)
- Latency optimizations
- Replaced the IPv6 kernel module on RHEL5 U2/U3 to fix a known kernel bug
  (#11469)
- Bug fixes


5. Known Issues
===============

5.1. EoIB Host:
---------------
- After installing EoIB on systems using kernels with versions lower than
  2.6.27 (all RH distributions), it is recommended to reboot the host, as the
  IPv6 module is replaced with a module that contains a bug fix for a known
  kernel bug.
- The mlx4_vnic_helper module cannot be unloaded by default. This module is
  loaded automatically on the mlx4_vnic module load. To enable unloading the
  module, run:
    # echo 1 > /sys/module/mlx4_vnic_helper/enable_unload
- When running event and unload stress tests with debug kernel 2.6.26/28, the
  error message "BUG: MAX_LOCKDEP_ENTRIES too low!" is displayed. This message
  is expected in module recycling tests.
- IPv6 is currently not supported.
- Using many vNics on the same vHub may cause performance reduction. The
  reduction is a little above linear in the number of vNics.
- vNic teardown can be slow (up to 1 second per vNic).
- Re-installing BXOFED does not preserve old OFED related values in
  modprobe.conf
- When using auto network administered vNics, the MAC address might change.
  On SLES this might cause the network interface name to change (due to SLES
  persistent naming behavior). You may use udev rules for persistent naming.
- For non-automatic vNics, the command 'mlx4_vnic_info -u' may generate
  non-persistent unique names in sequential runs.
- You may see the following warning messages in /var/log/messages. These
  messages are harmless and can be ignored (tickets #78716, #87473):
    ethX:vnic_vhube_del:520: couldn't find 00:25:8b:aa:bb:cc
    unregister_netdevice: waiting for ethX to become free. Usage count = 3
- Do not open more than 50 host administered vNics per port due to host memory
  availability.
- In rare cases, the host assigned vNIC state is displayed as link down in the
  host and the vNIC is not shown in the FabricIT, although the actual state
  should have been link up.
  Workaround: Restart the host's driver. Run:
    /etc/init.d/openibd restart; /etc/init.d/mlx4_vnic_confd start

5.2. FCoE/FCoIB Host:
---------------------
See FCoE/FCoIB README for FCoIB known issues.

5.3. SDP Host:
--------------
- SDP is at beta level on the InfiniHost HCA family.
- TCP allows connecting to IP_ANY (0.0.0.0) as a destination address. SDP
  does not allow connecting to IP_ANY and will reject the connection.
- setsockopt(SO_RCVBUF) does not work on SDP sockets. To limit the
  system-wide SDP memory usage for receive, use the module parameter
  top_mem_usage.
- Failures occur when using OOB.
- Each SDP socket currently consumes up to 2 MBytes of memory. If this value
  is high for your installation, it is possible to trade off performance for
  lower memory utilization per socket by reducing the value of the
  "rcvbuf_scale" module parameter (default: 16).
  Note: The minimum legal value for the "rcvbuf_scale" module parameter is 1.
  At this parameter value, each socket will consume approximately 128 KBytes.
- Small message size performance is low when messages are sent by the client
  at a rate lower than the rate at which they are consumed by the server, and
  when TCP_CORK is not set. This is observed, for example, with the iperf
  benchmark. As a workaround, set the TCP_CORK socket option to ensure data is
  sent in at least 32K byte chunks.
- Performance is low on 32-bit kernels, as SDP utilizes high memory to ease
  memory pressure. Moving to a 64-bit kernel solves this problem even if the
  application remains a 32-bit one.
- By default, SDP utilizes a 2 Kbyte MTU size. This may cause PCI-X cards
  using Mellanox Technologies "InfiniHost" HCAs to experience low bandwidth.
  Workaround: Reset the MTU size to 1K in this situation, using either of the
  two methods below:
  1. Activate the "tavor quirk" workaround in opensm:
     a. Create an opensm options cache file (/var/cache/osm/opensm.opts):
        > opensm --cache-options -o
     b. Add the following line to /var/cache/osm/opensm.opts:
        enable_quirks TRUE
     c. Rerun opensm using your usual command line options to activate the
        opensm quirk option.
  2. Activate the "tavor quirk" workaround in cma: set the tavor_quirk module
     parameter of the rdma_cm module to value 1 (default: 0).
- ZCopy is enabled by default for blocks larger than 64K. ZCopy can be
  disabled by setting the module parameter sdp_zcopy_thresh to zero. Setting
  it to a non-zero value changes the threshold to that value.
- ZCOPY mode gives efficient performance for large blocks with very small CPU
  utilization. When in use, all messages longer than 'sdp_zcopy_thresh' bytes
  will cause the user space buffer to be pinned and the data sent directly
  from the original buffer. This results in less CPU usage and, on many
  systems, in enhanced bandwidth. ZCOPY is most efficient with multi stream
  jobs, and it performs better as the message size increases. The default 64K
  value for 'sdp_zcopy_thresh' may be too low for some systems. You must
  experiment with your hardware to select the best value.
- ZCOPY vs BCOPY: ZCOPY is more efficient with weak CPUs and multiple streams,
  whereas BCOPY is more efficient with a single stream.
- To use SDP over RoCE, set the 'sdp_link_layer_ib_only' module parameter
  to 0.
- Infrequently, a kernel assertion warning appears for the socket reference
  counter.
- On errors during ZCopy traffic, a kernel panic might occur.


6. Other Resources
==================
- See the EOIB_README.txt available in the docs directory or the EoIB chapter
  in the OFED user manual on how to install and configure EoIB on the host
  server.
- See OFED-1.5.1 release notes for additional known issues.
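
Note on the "rcvbuf_scale" trade-off in section 5.3: the two figures given
there (up to 2 MBytes per socket at the default value of 16, approximately
128 KBytes at the minimum value of 1) are consistent with roughly 128 KBytes
per unit of rcvbuf_scale. The shell sketch below assumes that this
relationship is linear, which is inferred from those two data points only and
is not stated by the driver documentation. The function name
sdp_socket_kbytes is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate the upper bound of per-socket SDP memory
# in KBytes for a given rcvbuf_scale value.
# ASSUMPTION: memory scales linearly at ~128 KBytes per unit of rcvbuf_scale,
# inferred from section 5.3 (1 -> ~128 KBytes, 16 -> up to 2 MBytes).
sdp_socket_kbytes() {
    scale=$1
    # Section 5.3 gives 1 as the minimum legal value for rcvbuf_scale.
    if [ "$scale" -lt 1 ]; then
        echo "rcvbuf_scale must be >= 1" >&2
        return 1
    fi
    echo $(( scale * 128 ))
}

sdp_socket_kbytes 16   # default rcvbuf_scale, prints 2048
sdp_socket_kbytes 4    # lower value trading performance for memory, prints 512
```

Multiplying the result by the expected number of concurrent SDP sockets gives
a rough upper bound for planning purposes. Actual usage should be measured on
the target system.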