The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics accelerators a compelling platform for computationally demanding tasks in a wide variety of application domains. Due to the great computational power of the GPU, general-purpose GPU (GPGPU) computing has proven valuable in various areas of science and technology.
GPU-based clusters are being used to perform compute-intensive tasks such as finite element computations, Computational Fluid Dynamics, and Monte-Carlo simulations. Several of the world's leading supercomputers use GPUs to achieve the desired performance. Because GPUs provide high core counts and high floating-point throughput, a high-speed interconnect such as InfiniBand is required to connect the GPU platforms, providing the throughput and low latency needed for GPU-to-GPU communications.
While GPUs have been shown to provide worthwhile performance acceleration, yielding benefits in both price/performance and power/performance, several areas of GPU-based clusters could be improved to deliver higher performance and efficiency. One of the main performance issues in deploying clusters of multi-GPU nodes is the interaction between the GPUs, i.e., the GPU-to-GPU communication model. Prior to GPU-Direct technology, any communication between GPUs had to involve the host CPU and required buffer copies. The communication model required the CPU to initiate and manage memory transfers between the GPUs and the InfiniBand network. Each GPU-to-GPU communication involved the following steps:
- The GPU writes data to host memory dedicated to the GPU
- The host CPU copies the data from the GPU-dedicated host memory to host memory available for the InfiniBand devices to use for RDMA communications
- The InfiniBand device reads the data from that memory area and sends it to the remote node