Checklist Before Submitting an RMA Request

In our experience, a short troubleshooting session often reveals the root cause of a failure and prevents unnecessary shipment of parts.

Before submitting an RMA request, please complete the following steps:

  1. Carry out the troubleshooting steps in the relevant checklist for an early fault determination.
  2. In the "Problem Description" section of the request, list the troubleshooting steps you performed.
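
As an illustration of step 2, the following minimal sketch shows one possible way to collect the steps you performed into plain text that can be pasted into the "Problem Description" section. The `Step` class, the `format_problem_description` function, and the sample wording are hypothetical and are not part of any Mellanox tool or form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One troubleshooting step and its observed result (hypothetical structure)."""
    action: str
    result: str


def format_problem_description(symptom: str, steps: List[Step]) -> str:
    """Render the symptom and the performed steps as numbered plain text."""
    lines = [f"Symptom: {symptom}", "Troubleshooting steps performed:"]
    for number, step in enumerate(steps, start=1):
        lines.append(f"  {number}. {step.action} -> {step.result}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample entries only; replace them with what you actually did and observed.
    print(format_problem_description(
        "Port 1 link does not come up",
        [
            Step("Reseated the card in the PCI slot", "card recognized by the OS"),
            Step("Replaced the cable with a known working cable", "link still down"),
        ],
    ))
```

Recording the observed result next to each action makes it easier for support to see which parts have already been ruled out.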

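Checklists are provided below for the following Mellanox product types (assets): network adapter cards, switch power supply and fan FRUs, remotely-managed (unmanaged) switches, managed switches, leaf/spine modules, and cables.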


Network Adapter Card Checklist

  1. Reseat the card in the server PCI slot, and verify that the card is recognized by the OS.

    On a Linux-based OS:
    lspci | grep -i Mellanox

    On Windows:
    Under “Device Manager -> Other devices”, select “InfiniBand Controller” or “Unknown Devices”. If you cannot find the device, click “Action -> Scan for hardware changes”.

  2. Swap the card with a known working card.

    Note: If the issue recurs with the known working card, the original card is most likely not faulty.

  3. Replace the cable connected to the card with a known working cable.

  4. Connect the cable to another known working port destination.

  5. Read the port LED indications.

    Do the LEDs indicate a fault state?

  6. Verify that the card firmware version is up to date.

    For the network adapter card firmware query and upgrade procedure, please refer to Network Adapter Firmware Query and Upgrade Procedure.

  7. If InfiniBand is the working protocol, verify that the SM (Subnet Manager) is running in the fabric.

  8. Verify that the driver version installed on the server is up to date.

    For the driver query and upgrade procedure, please refer to Driver Query and Upgrade Procedure.
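
The Linux check in step 1 can also be scripted. The sketch below is one possible way to do it, assuming `lspci` is installed and on the PATH. The function names are hypothetical, and the match is a simple case-insensitive search for "Mellanox", mirroring `grep -i`.

```python
import subprocess
from typing import List


def find_mellanox_devices(lspci_output: str) -> List[str]:
    """Return the lspci lines that mention Mellanox, ignoring case (like grep -i)."""
    return [line for line in lspci_output.splitlines() if "mellanox" in line.lower()]


def card_is_recognized() -> bool:
    """Run lspci and report whether at least one Mellanox device is listed."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return bool(find_mellanox_devices(result.stdout))
```

Calling `card_is_recognized()` on the server returns `False` when no Mellanox device appears in the `lspci` output, in which case reseating the card is the first thing to try.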

 

Switch Power Supply and Fan FRU (Field Replaceable Unit) Checklist

  1. Reseat the FRU module.

  2. Swap the FRU module with a known working FRU.

    Note: If the issue recurs with the known working FRU, this would most likely indicate that the FRU slot is faulty rather than the FRU.

  3. Read the power supply/fan LED indication.

    Does the LED indicate a fault state?

  4. Verify that the FRU module is recognized by the switch’s OS by invoking the following commands:

    For IS5xxx/SX6xxx/SX1xxx switch series:
    show inventory
    show module

    For ISR4xxx switch series:
    front show
    or:
    rear show


Remotely-managed (Unmanaged) Switch Checklist

  1. Verify that the remotely-managed switch’s firmware version is up to date.

    For the remotely-managed switch firmware query and upgrade procedure, please refer to Remotely-managed Switch Firmware Query and Upgrade Procedure.

  2. If you encounter an issue with bringing up ports, please perform the following:

    a. Replace the connected cable(s) with known working cable(s).
    b. Connect the cable(s) to other known working port(s).
    c. Perform a loopback test by connecting the faulty port(s) to another known working port in the same leaf.
    d. Read the port LED indications. Do the LEDs indicate a fault state?

  3. Check the switch’s status LED indications. Do they indicate a fault state?

  4. In order for the InfiniBand protocol to work, verify that the SM is running in the fabric.

 

Managed Switch Checklist

  1. Please verify that the managed switch’s software and firmware versions are up to date.

    The firmware version is automatically upgraded during the upgrade of the software.
    For the managed switch’s firmware query and upgrade procedure, please refer to Managed Switch Software Query and Upgrade Procedure.

  2. If you encounter an issue with bringing up ports, please perform the following:

    a. Replace the connected cable(s) with known working cable(s).
    b. Connect the cable(s) to other known working port(s).
    c. Perform a loopback test by connecting the faulty port(s) to another known working port in the same leaf.
    d. Read the port LED indications. Do the LEDs indicate a fault state?

  3. In order for the InfiniBand protocol to work, verify that the SM is running in the fabric.

  4. Check the switch’s status LED indications. Do they indicate a fault state?

  5. Create the switch system dump file.

    To create the switch system dump file, please refer to Creating the Switch System Dump File.


Leaf/Spine Module Checklist

  1. The leaf/spine is installed in a modular managed switch. Please verify that the managed switch software and firmware versions are up to date.

    The firmware version is automatically upgraded during the upgrade of the software.
    For the managed switch’s firmware query and upgrade procedure, please refer to Managed Switch Software Query and Upgrade Procedure.

  2. If you encounter an issue with bringing up internal ports, please perform the following:

    a. Reseat the leaf/spine module and the corresponding spine/leaf module (respectively).
    b. Swap the leaf/spine module and the corresponding spine/leaf module (respectively).

    On each internal link, the failing part (leaf, spine, or backplane) should be identified by elimination. Swapping the relevant leaf and spine pinpoints the part that is causing the issue. Normally, the issue migrates with the faulty part.

  3. If you encounter an issue with bringing up external ports, please perform the following:

    a. Replace the connected cable(s) with known working cable(s).
    b. Connect the cable(s) to other known working port(s).
    c. Perform a loopback test by connecting the faulty port(s) to another known working port in the same leaf.
    d. Read the port LED indications. Do the LEDs indicate a fault state?

  4. In order for the InfiniBand protocol to work, verify that the SM is running in the fabric.

  5. Check the switch’s status LED indications. Do they indicate a fault state?

  6. Create the switch system dump file.

    To create the switch system dump file, please refer to Creating the Switch System Dump File.
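
The swap test for internal links (step 2 above) follows a simple elimination logic, sketched below. The function and parameter names are hypothetical. Treating a fault that stays in place after both swaps as a backplane fault is an assumption drawn from the leaf/spine/backplane elimination described in that step, not a documented diagnostic rule.

```python
def isolate_internal_link_fault(follows_leaf: bool, follows_spine: bool) -> str:
    """Suggest the likely faulty part of an internal link from swap-test results.

    follows_leaf:  the issue moved with the leaf module when it was swapped.
    follows_spine: the issue moved with the spine module when it was swapped.
    """
    if follows_leaf and follows_spine:
        # Contradictory observations; repeat the swaps one module at a time.
        return "inconclusive"
    if follows_leaf:
        return "leaf"
    if follows_spine:
        return "spine"
    # Assumption: the issue stayed with the slots, so neither module is at fault.
    return "backplane"
```

For example, if the issue appears on the new link after the leaf is moved but does not move with the spine, `isolate_internal_link_fault(True, False)` suggests the leaf as the part to report in the RMA request.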


Cable Checklist

  1. If you encounter an issue with link bring-up or link errors, verify that the firmware versions of the connected devices (HCAs and/or switches) are up to date.

    For the HCA firmware query and upgrade procedure, please refer to Network Adapter Firmware Query and Upgrade Procedure.
    For the remotely-managed switch firmware query and upgrade procedure, please refer to Remotely-managed Switch Firmware Query and Upgrade Procedure.
    For the managed switch firmware query and upgrade procedure, please refer to Managed Switch Software Query and Upgrade Procedure.


  2. Reseat the cable on both ends.

  3. Connect the cable to another known working port. Repeat the test on both ends of the cable.

  4. Replace the cable with a known working cable with a similar part number (P/N).

    Note: If the issue recurs with the replacement cable, the original cable is most likely not faulty.