Mellanox Academy


Mellanox Academy Course


Mellanox InfiniBand Fabric Troubleshooting

Product: Mellanox MTR-IB-TS – ADVANCED
Duration: 2 Days


What's in it for me?

Technicians often face complex, ongoing technical problems and challenges. Identifying
the source of a problem and fixing it means exploring many systems, parameters,
interfaces, and other potential causes.

Not every problem is the responsibility of the local technical staff: some are simple
to solve locally, while others are more complex and require intervention from other
internal or external resources.

This new course from Mellanox Education Services will help you reduce maintenance
dependency, maximize your cluster performance quickly and efficiently, and provide
the required information to the next maintenance tier, which shortens MTTR and
improves your system’s performance.


Course Overview

This class provides common cluster troubleshooting skills, tools, methodology and
practical analysis in order to assist you with daily maintenance tasks.


Course Objectives

Upon completion of this course, students will be able to support and troubleshoot level
2 fabric debug functions and maintain primary InfiniBand fabrics using Mellanox tools,
UFM® (Unified Fabric Manager), performance test tools, and best practices.
Specifically, students should be able to do the following:

  • Describe InfiniBand cluster functionality
  • Explain the main InfiniBand protocol errors
  • Utilize the InfiniBand UFM® Fabric Diagnostic tool
  • Describe the InfiniBand Protocol Integral Diagnostic tool
  • Utilize SM log files
  • Perform InfiniBand cluster troubleshooting analysis
  • Utilize InfiniBand performance testing tools

Target Audience

  • Network and system administrators
  • Network and system engineers

Prerequisites

  • Data communications and Linux knowledge
  • Mellanox InfiniBand Foundations course (Product: Mellanox MTR-IB-OST-B)
  • Mellanox Foundations Advanced course (Product: Mellanox MTR-IB-OST-A)

Day 1 Agenda

  • InfiniBand port main error and data counters
  • InfiniBand Management Datagram (MAD) packets
  • InfiniBand protocol main traps and events
  • OpenSM log file
  • OpenSM configuration file
  • InfiniBand DIAGNET troubleshooting tools
  • InfiniBand DIAGNET fabric issue identification
  • Mellanox switch log and event files

Day 2 Agenda

  • Server – HCA-OFED issues analysis
  • Switch issues analysis
  • SM connectivity issues analysis
  • Routing issues analysis
  • Congestion issues analysis
  • Performance issues analysis

Get assistance from
Mellanox Academy

Call us: +1-408-419-0461
Email us: training@mellanox.com

