Accelerate Your TensorFlow with RDMA over InfiniBand and Ethernet
How to Provide a Highly Scalable and Robust Fabric for Deep Learning/Machine Learning Clusters
Webinar Date: Wednesday, May 13, 2020
Webinar Time: 11:00am - 12:00pm India Standard Time
GPU-accelerated computing and ever-scaling Deep Learning/Machine Learning workloads pose a unique challenge to network architects looking to design the ideal interconnect fabric. Efficient and sustainable scaling requires expanding the role of the interconnect beyond that of a standard message-passing agent to a more intelligent entity that can accelerate the overall compute process.
In this talk, we will present the next generation of InfiniBand and Ethernet solutions designed to provide a highly scalable and robust fabric for Artificial Intelligence clusters. We will discuss how RDMA forms the backbone of accelerated computing and the capabilities the interconnect must provide to meet these new demands. We will cover both InfiniBand and Ethernet fabrics, introduce how to use TensorFlow with RDMA on both InfiniBand and Ethernet GPU clusters, and conclude with practical network designs for small to mid-sized GPU clusters.
In this webinar you will learn:
- How to design a highly scalable and robust fabric for DL/ML workloads
- How RDMA accelerates overall computing capabilities
- How to build InfiniBand and Ethernet network designs for GPU clusters
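As context for the TensorFlow RDMA topic above: in TensorFlow 1.x, the contrib `verbs` module exposed RDMA tensor transfers by passing `protocol="grpc+verbs"` to `tf.train.Server` (this requires a TensorFlow build with verbs support). The sketch below is plain Python with hypothetical hostnames; it only assembles the cluster spec and server arguments one would hand to `tf.train.ClusterSpec` and `tf.train.Server`, without importing TensorFlow:

```python
# Sketch: building a TF 1.x distributed cluster definition that selects the
# RDMA ("grpc+verbs") transport. Hostnames below are hypothetical placeholders.

def make_cluster_spec(ps_hosts, worker_hosts):
    """Return the job->addresses dict accepted by tf.train.ClusterSpec."""
    return {"ps": list(ps_hosts), "worker": list(worker_hosts)}

def server_kwargs(cluster, job_name, task_index, use_rdma=True):
    """Keyword arguments for tf.train.Server; 'grpc+verbs' routes tensor
    transfers over RDMA (needs a TensorFlow build with verbs enabled),
    while plain 'grpc' falls back to TCP."""
    return {
        "server_or_cluster_def": cluster,
        "job_name": job_name,
        "task_index": task_index,
        "protocol": "grpc+verbs" if use_rdma else "grpc",
    }

cluster = make_cluster_spec(["ps0:2222"], ["worker0:2222", "worker1:2222"])
kwargs = server_kwargs(cluster, "worker", 0)
print(kwargs["protocol"])  # grpc+verbs
```

On an InfiniBand fabric this works natively; on Ethernet, the same verbs path runs over RoCE, so the TensorFlow-side configuration is identical on both fabrics.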
Principal Engineer, Solution Architect
NVIDIA, Mellanox Networking Business Unit