Checklist Before Submitting an RMA Request

Based on prior experience, a short troubleshooting session may reveal the root cause of a failure and prevent the unnecessary shipment of parts.

Please apply the steps below before submitting an RMA request:

  1. Carry out the troubleshooting steps in the relevant checklist for early fault determination.
  2. List the troubleshooting steps you performed in the "Problem Description" section of the request.

We recommend attaching any relevant troubleshooting data (e.g. logs, screenshots) to your RMA request, should you wish to proceed. This will enable Mellanox Support to identify the issue more quickly and save you time. Checklists are provided below for each Mellanox product type (asset).


Network Adapter Card Checklist

  1. If the card isn’t recognized by the OS, reseat the card in the server PCI slot and verify that it is recognized by the OS.

    In Linux based OS:
    lspci | grep -i Mellanox

    In Windows OS:
    Under “Device Manager-> Other devices”, select “InfiniBand Controller” or “Unknown Devices”. If you can't find the device, click “Action-> Scan for hardware changes”.

  2. Swap the card with a known working card of a similar P/N.

    Note: If the issue recurs with the known working card, this would most likely indicate that the original card is not faulty.

  3. Replace the cable connected to the card with a known working cable.

  4. Connect the cable to another known working port destination.

  5. Read the port LED indications.

    Are the LEDs indicating a fault state?

  6. Verify that the card firmware version is up to date.

    For the network adapter card firmware query and upgrade procedure, please refer to Network Adapter Firmware Query and Upgrade Procedure.

  7. For the InfiniBand protocol to work, verify that the SM is running in the fabric.

  8. Verify that the driver version installed on the server is up to date.

    For the driver query and upgrade procedure, please refer to Driver Query and Upgrade Procedure.
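
The OS-level detection check in step 1 can be scripted on a Linux host. The following is a minimal sketch, assuming a POSIX shell and that lspci (from pciutils) is available. The function names count_mellanox_devices and report_adapter_visibility are hypothetical helpers, not part of any Mellanox tool. They read lspci-style text on standard input, so saved output can be checked as well as live output.

```shell
# Count lines of lspci-style text on stdin that mention Mellanox.
# (Illustrative helper name; not a Mellanox utility.)
count_mellanox_devices() {
    # grep -c prints the match count; "|| true" keeps a zero count
    # from being treated as a failure by the calling script.
    grep -ci 'mellanox' || true
}

# Print a one-line verdict based on the count above.
report_adapter_visibility() {
    found=$(count_mellanox_devices)
    if [ "$found" -gt 0 ]; then
        echo "Mellanox devices detected: $found"
    else
        echo "No Mellanox device detected; reseat the card and rescan"
    fi
}

# Usage on the server under test:
#   lspci | report_adapter_visibility
```

Keeping the printed verdict together with the raw lspci output gives you a record that can be pasted into the "Problem Description" section.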

 

Switch Power Supply and Fan FRU (Field Replaceable Unit) Checklist

  1. Reseat the FRU module.

  2. Swap the FRU module with a known working FRU.

    Note: If the issue recurs with the known working FRU, this would most likely indicate that the FRU slot is faulty rather than the FRU.

  3. Read the power supply/fan LED indication.

    Does the LED indicate a fault state?

  4. Verify that the FRU module is recognized by the switch’s OS (Mellanox Onyx or MLNX-OS) by running the following commands:


    show inventory
    show module

Please note: before submitting an RMA request, we recommend including the top serial numbers of the chassis/switch in the request. This may save time when identifying the asset.


Remotely-managed (Unmanaged) Switch Checklist

  1. Verify that the firmware version of the remotely-managed switch is up to date.

    For the remotely-managed switch firmware query and upgrade procedure, please refer to Remotely-managed Switch Firmware Query and Upgrade Procedure.

  2. If you encounter an issue bringing up ports, please perform the following:

    a. Replace the connected cable(s) with known working cable(s).
    b. Connect the cable(s) to other known working port(s).
    c. Perform a loopback test by connecting the faulty port(s) to another known working port in the same leaf.
    d. Read the port LED indications. Are the LEDs indicating a fault state?

  3. Refer to the switch’s status LED indications. Do they indicate a fault state?

  4. For the InfiniBand protocol to work, verify that the SM is running in the fabric.
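
The SM check that appears in several of these checklists can be run from any Linux host attached to the fabric. The sketch below assumes that the sminfo utility from the infiniband-diags package is installed and that, as in the versions we are aware of, its output line includes the state name SMINFO_MASTER when a master SM answers. The helper name sm_is_master is hypothetical.

```shell
# Succeed (exit status 0) if sminfo-style text on stdin reports a master SM.
# (Illustrative helper name; the SMINFO_MASTER string is an assumption
# about the sminfo output format.)
sm_is_master() {
    grep -q 'SMINFO_MASTER'
}

# Usage on a host connected to the fabric:
#   if sminfo | sm_is_master; then
#       echo "A master SM is running in the fabric"
#   else
#       echo "No master SM found"
#   fi
```

If the check fails, confirm that an SM is started on a managed switch or on a host before treating the hardware as faulty.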

 

Managed Switch Checklist

  1. Verify that the managed switch’s software and firmware versions are up to date.

    The firmware version is automatically upgraded during the upgrade of the software.
    For the managed switch’s firmware query and upgrade procedure, please refer to Managed Switch Software Query and Upgrade Procedure.

  2. If you encounter an issue bringing up ports, please perform the following:

    a. Replace the connected cable(s) with known working cable(s).
    b. Connect the cable(s) to other known working port(s).
    c. Perform a loopback test by connecting the faulty port(s) to another known working port in the same leaf.
    d. Read the port LED indications. Are the LEDs indicating a fault state?

  3. For the InfiniBand protocol to work, verify that the SM is running in the fabric.

  4. Refer to the switch’s status LED indications. Do they indicate a fault state?

  5. Create the switch system dump file.

    To create the switch system dump file, please refer to Creating the Switch System Dump File.

    Please note: we recommend attaching the Sysdump file to your RMA request. This could help us identify the issue much faster and save you time.

Leaf/Spine Module Checklist

  1. The leaf/spine is installed in a modular managed switch. Please verify that the managed switch’s software and firmware versions are up to date.

    The firmware version is automatically upgraded during the upgrade of the software.
    For the managed switch’s firmware query and upgrade procedure, please refer to Managed Switch Software Query and Upgrade Procedure.

  2. To query the state of the modules via the OS (MLNX-OS, Mellanox Onyx), run the “show inventory” and “show module” commands.
  3. If you encounter an issue bringing up internal ports, please perform the following:

    a. Reseat the leaf/spine module and the corresponding spine/leaf module (respectively).
    b. Swap the leaf/spine module and the corresponding spine/leaf module (respectively).

    For each internal link, identify the failing part (leaf, spine or backplane) by elimination. Swapping the relevant leaf and spine will pinpoint the part that is causing the issue. Normally, the issue follows the faulty part.

  4. If you encounter an issue bringing up external ports, please perform the following:

    a. Replace the connected cable(s) with known working cable(s).
    b. Connect the cable(s) to other known working port(s).
    c. Perform a loopback test by connecting the faulty port(s) to another known working port in the same leaf.
    d. Read the port LED indications. Are the LEDs indicating a fault state?

  5. For the InfiniBand protocol to work, verify that the SM is running in the fabric.

  6. Refer to the switch’s status LED indications. Do they indicate a fault state?

  7. Create the switch system dump file.

    To create the switch system dump file, please refer to Creating the Switch System Dump File.

    Please note: we recommend attaching the Sysdump file to your RMA request. This could help us identify the issue much faster and save you time.


Cable Checklist

  1. If you encounter an issue with link bring-up or link errors, verify that the firmware versions of the connected devices (HCAs and/or switches) are up to date.

    For the HCA firmware query and upgrade procedure, please refer to Network Adapter Firmware Query and Upgrade Procedure.
    For the remotely-managed switch firmware query and upgrade procedure, please refer to Remotely-managed Switch Firmware Query and Upgrade Procedure.
    For the managed switch firmware query and upgrade procedure, please refer to Managed Switch Software Query and Upgrade Procedure.


  2. Reseat the cable on both ends.

  3. Connect the cable to another known working port. Repeat the test on both ends of the cable.

  4. Replace the cable with a known working cable of a similar P/N.

    Note: If the issue recurs with the replacement cable, this would most likely indicate that the original cable is not faulty.
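
When testing a cable on an InfiniBand host, it can help to record the port state after each reseat or swap. The following is a minimal sketch, assuming that the ibstat utility (infiniband-diags) is installed and prints "State:" and "Physical state:" lines for each port. The helper name port_states is hypothetical; it only extracts those two lines from text on standard input.

```shell
# Extract the logical and physical port state lines from ibstat-style
# text on stdin and print them as name=value pairs.
# (Illustrative helper name; the line labels are an assumption about
# the ibstat output format.)
port_states() {
    awk -F': ' '
        /^[ \t]*(State|Physical state):/ {
            sub(/^[ \t]+/, "", $1)   # strip the leading indentation
            print $1 "=" $2
        }'
}

# Usage on the host at either end of the cable:
#   ibstat | port_states
```

Running it once per step (after the reseat, after moving ports, after replacing the cable) produces a short before-and-after record for the RMA request.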

