Mellanox Technologies
*********************

InfiniBand OFED Driver for VMware(R) Infrastructure 3 version 3.5
Version 1.1 Release Notes

January 2008

===============================================================================
Table of Contents
===============================================================================
1. Overview
   1.1 Contents
   1.2 Supported Platforms and Guest Operating Systems
   1.3 Supported HCA Cards and Firmware Versions
   1.4 Supported Switches, Gateways and Storage
2. Special Configuration
3. Known Issues

===============================================================================
1. Overview
===============================================================================
These are the release notes of "InfiniBand OFED Driver for VMware(R)
Infrastructure 3 version 3.5" (GA release). The sources of this package are
based on Linux OFED 1.2.5 (kernel 2.6.22).

The package is provided in RPM format and documentation is available from:
http://www.mellanox.com/products/ciov_ib_drivers_vi3-1.php.

1.1 Contents
------------
The package contains the following components:
o OpenFabrics based core and Upper Layer Protocols (ULPs):
   - IB HCA drivers: mthca, mlx4
   - core
   - ULPs: IPoIB and SRP Initiator
o Mellanox Technologies firmware burning tool: mstflint
o Documentation

Note: The mlx4 driver for the ConnectX IB is not enabled by default. For
      instructions on how to enable it, please refer to the Installation
      Guide.

1.2 Supported Platforms and Guest Operating Systems
---------------------------------------------------
o Supported HCL and Guest Operating Systems:
   Please refer to VMware support documentation at
   http://www.vmware.com/support, and follow the "Compatibility Guides" link

o Tested Guest Operating Systems:

   Architecture  OS                                     Kernel
   --------------------------------------------------------------------
   x86_64        SUSE Linux Enterprise Server 10        2.6.16.21-0.8
   x86_64        Red Hat Enterprise Linux Server 5.0    2.6.18-8
   x86_64        Red Hat Enterprise Linux AS 4.0-U4     2.6.9-42
   x86_64        Microsoft Windows Server 2003 R2 SP2
   x86_64        Microsoft Windows XP Professional SP2
   x86           SUSE Linux Enterprise Server 10        2.6.16.21-0.8
   x86           Red Hat Enterprise Linux Server 5.0    2.6.18-8
   x86           Red Hat Enterprise Linux AS 4.0-U4     2.6.9-42
   x86           Microsoft Windows Server 2003 R2 SP2
   x86           Microsoft Windows XP Professional SP2

1.3 Supported Host Channel Adapters (HCAs)
------------------------------------------
This release supports Mellanox Technologies InfiniBand (IB) HCAs:
o InfiniHost(TM) III Ex (MemFree: Firmware fw-25218 Rev 5.3.000
                         with memory: Firmware fw-25208 Rev 4.8.200)
o InfiniHost(TM) III Lx (firmware fw-25204 Rev 1.2.000)
o ConnectX(TM) IB (firmware fw-25408 Rev 2.3.000)

For the latest firmware versions, visit http://www.mellanox.com and follow
the firmware download page.

1.4 Supported Switches, Gateways and Storage
--------------------------------------------
o All production InfiniBand switches and gateways are supported.
o Tested platforms:
   - This release was tested with switches provided by Cisco, Voltaire and
     Flextronics.
   - Cisco Gateway: Basic testing was done with Cisco SFS 3012 Multifabric
     Server Switch
     Note that the OFED 1.3 Subnet Manager OpenSM v3.1.8 was used for testing.
   - Storage:
      o Full testing was performed using the OFED 1.3 SRP target.
      o Partial testing was performed using the EMC Clariion CX500 storage
        device.

===============================================================================
2. Special Configuration
===============================================================================
The following sub-sections describe special configurations that you may need
to apply for IB ULPs.

2.1 IPoIB's MTU
----------------
Operating systems other than VMware(R) ESX Server can set the MTU to values
other than 1500 bytes, and their usual default for InfiniBand is 2K bytes.
However, IPoIB under VMware ESX Server is currently limited to 1500 bytes,
and larger received packets are silently dropped. Therefore, set the IPoIB
interface MTU to 1500 bytes on every OS that communicates with the ESX Server
over IPoIB.

The procedure for changing the MTU depends on the OS. For example:
o Linux OS - if the IPoIB interface is named ib0, run:
   ifconfig ib0 mtu 1500
o Microsoft Windows - execute the following steps:
   1. Open "Network Connections"
   2. Select the IPoIB adapter and right click on it
   3. Select "Properties"
   4. Press "Configure" and then go to the "Advanced" tab
   5. Select the payload MTU size and change it to 1500

2.2 SRP Module Parameters
--------------------------
o List of parameters for performance tuning:
   1. srp_sg_tablesize: Maximum number of scatter entries supported per I/O.
      Default value is 32.
   2. srp_cmd_per_lun: Maximum number of outstanding I/O commands that can be
      queued per target LUN. Default value is 63.
   3. srp_can_queue: Maximum number of outstanding I/O commands that can be
      queued by each SCSI host (i.e., HCA). Default value is 128.

o Other parameters:
   1. dead_state_time: Maximum number of minutes a target can be in DEAD
      state before moving to REMOVED state. Once the target is in REMOVED
      state, SRP returns DID_BAD_TARGET status and you may experience errors
      or data corruption. However, this allows the ESX Server to reboot
      cleanly after "dead_state_time" minutes. Default value is 3000 minutes.
   2. max_srp_targets: Maximum number of SRP targets on the fabric.
      Default value is 128.

o How to change the default values of SRP parameters:
   1. Edit the driver management script configuration file
      /etc/sysconfig/infiniband/mgmt-mlx.conf
   2. Include an SRP_MOD_PARAM variable with the assignments for the
      parameters you wish to change. For example:
      SRP_MOD_PARAM="srp_cmd_per_lun=32 srp_can_queue=256"

===============================================================================
3. Known Issues
===============================================================================
The following is a list of general limitations and known issues of this
release.

IPoIB:
o IPv6 is not supported by IPoIB.

o The IPoIB uplink's logical name in the VMware(R) Virtual Infrastructure
  Client or VMware(R) VirtualCenter may be displayed as vmhba or as vmnic.

o IPoIB networking failover is supported for the "Link Status Only" detection
  method. "Beacon Probing" failover detection is not supported.

o When using VLAN Virtual Guest Tagging (VGT), at least one packet should be
  transmitted for each new interface created within the virtual machine.
  For example:
   1. Create a new interface (e.g., ) with IP address .
   2. To guarantee that a packet is transmitted from the new interface, run:
      arping -I -c 1

o Changing the IPoIB uplink while the virtual machine is powered on may
  disturb multicast communication.
  Workaround: Restart the processes that use a multicast connection within
  the virtual machine.

SRP:
o An SRP HBA's logical name may appear as a vmnic name instead of vmhba. This
  may prevent performing some operations on the HBA (e.g., storage adapters
  rescanning).
  Workaround:
   1. Log into the ESX Server as root
   2. Edit /etc/vmware/esx.conf
   3. Delete the lines of the form
      /device//vmkname = "vmnic"
      for the corresponding SRP HBAs.
   4. Save the file and reboot the machine

o When shutting down the ESX Server while all paths to the SRP target are
  dead (or the SRP target is offline), the ESX Server may hang for up to
  "dead_state_time" minutes. Please see the section SRP Module Parameters
  (above) for details.
  Workaround: Power off the ESX Server machines or wait up to
  "dead_state_time" minutes.

o When testing with multiple OFED SRP targets in 'blockio' mode and using the
  same block device, make sure to use a different and unique scsi_vdisk_ID
  value for each OFED SRP target.

o After ESX Server reboot, SCSI targets connected to the SRP storage adapter
  may not be recognized.
  Workaround:
   1. Use VMware Virtual Infrastructure Client (VIC) to connect to ESX Server
   2. Go to the host "Configuration" tab
   3. Select "Storage Adapters"
   4. Press "Rescan"

o Under a heavy load of I/Os from multiple ESX Servers to a single OFED SRP
  target, performing one of the following actions may cause ESX Servers to
  lose access to data storage, VMs, virtual or RDM disks located on that
  OFED SRP target: (1) rebooting an ESX Server, (2) booting up a new VM, or
  (3) creating a new virtual disk for a VM. Consequently, ESX Server I/Os may
  fail due to too many reservation conflicts, or LUNs of the OFED SRP target
  may appear to have 0MB capacity.
  Workaround: Reboot the relevant ESX Servers.

General:
o Ignore the following warning messages in the VMware ESX Server log file
  /var/log/vmkernel:
   - IPoIB dropped packets
   - Async event for bogus QP

o For OFED 1.2.5 issues, check its release notes at
  http://www.openfabrics.org.

o For OFED 1.3 SRP target issues, check its release notes at
  http://www.openfabrics.org.
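
When reviewing the ESX Server log, the two warning messages listed under
"General" can be hidden so that other entries stand out. The following is a
minimal sketch, not part of the driver package. It assumes the log lines
contain the two phrases verbatim, which these notes do not confirm, and the
function name filter_benign is hypothetical.

```shell
#!/bin/sh
# Sketch: print a log file without the two warnings that the release notes
# say can be ignored.
# Assumption: each warning appears in the log with exactly this wording;
# adjust the patterns if the real messages differ.
# filter_benign is a hypothetical helper name, not a VMware or Mellanox tool.
filter_benign() {
    # $1: path to the log file to review
    grep -v -e 'IPoIB dropped packets' \
            -e 'Async event for bogus QP' "$1"
}
```

Call it with the path of the log file named in the General list above, for
example by piping its output to a pager. Note that grep exits with status 1
when every line is filtered out, so an empty result is not an error.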