RoCE in the Data Center

Today’s data centers demand that the underlying interconnect provide the utmost bandwidth and extremely low latency. While high bandwidth is important, it is not worth much without low latency. Moving large amounts of data through a network can be done with TCP/IP, but only RDMA delivers the low latency that avoids costly transmission delays.

The speedy transfer of data is critical to using it efficiently. An interconnect based on Remote Direct Memory Access (RDMA) offers the ideal option for boosting data center efficiency, reducing overall complexity, and increasing data delivery performance. Mellanox RDMA enables sub-microsecond latency and up to 56Gb/s bandwidth, translating to screamingly fast application performance, better storage and data center utilization, and simplified network management.

Until recently, though, RDMA was only available in InfiniBand fabrics. With the advent of the industry standard “RDMA over Converged Ethernet (RoCE)”, the benefits of RDMA are now available to data centers that are based on an Ethernet or mixed-protocol fabric as well.

Mellanox RDMA was designed to address the challenges that make TCP/IP-based interconnect protocols inefficient. OS Bypass, Zero Copy, and CPU Offloading were built into the architecture from the start with high performance in mind:

  • OS Bypass gives an application direct access to the network card: the CPU communicates directly with the I/O adapter, with no involvement from the OS or driver, making each interconnect transaction far more efficient than its TCP/IP equivalent.
  • Zero Copy transfer enables the receiving node to read data directly from the sending node’s memory, reducing the overhead of CPU involvement. TCP/IP, by contrast, uses the Sockets API as its interface to the network, which requires two-sided communication, adding to overall transfer time and consuming compute resources.
  • With CPU Offloading, the transport protocol stack runs in hardware instead of software, meaning less CPU involvement and more reliable transport than TCP/IP provides. (A brief sketch of how these pieces come together appears after this list.)
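
To make the list above concrete, here is a minimal sketch of the resource setup an RDMA application performs through the verbs API (libibverbs). It is illustrative only: error handling, queue pair connection, and the out-of-band exchange of buffer addresses and keys with the peer are omitted.

```c
/* Minimal sketch of the verbs setup behind OS Bypass and Zero Copy.
 * Assumes libibverbs (rdma-core); compile with -libverbs.
 * QP state transitions and peer address/rkey exchange are omitted. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* user-space handle to the NIC */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                /* protection domain */

    /* Register an application buffer; the NIC can now DMA directly to/from it
     * (Zero Copy), and later data transfers bypass the kernel entirely. */
    size_t len = 4096;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* Completion queue and queue pair: work requests are posted from user
     * space straight to the adapter, which runs the transport in hardware. */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);
    struct ibv_qp_init_attr qp_attr = {
        .send_cq = cq, .recv_cq = cq,
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,                            /* reliable connection */
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &qp_attr);

    printf("registered %zu bytes, lkey=0x%x rkey=0x%x, qp_num=0x%x\n",
           len, mr->lkey, mr->rkey, qp->qp_num);

    /* Once the QP is connected and the peer's address/rkey have been exchanged
     * out of band, an RDMA WRITE work request would look like this:
     *   struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey };
     *   struct ibv_send_wr wr = { .opcode = IBV_WR_RDMA_WRITE, .sg_list = &sge, .num_sge = 1,
     *                             .wr.rdma = { .remote_addr = peer_addr, .rkey = peer_rkey } };
     *   ibv_post_send(qp, &wr, &bad_wr);   -- no system call on the data path
     */

    ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

The key point is that once the buffer is registered and the queue pair is connected, transfers are posted from user space directly to the adapter, with no system call and no data copy on the critical path.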

Figure 1: Typical TCP/IP Interconnect

Figure 2: Typical RDMA Interconnect

The overall effect is a significant reduction in CPU overhead, maximizing efficiency and providing a lightning-fast interconnect.

RDMA over Converged Ethernet (RoCE) is an industry standard from the InfiniBand Trade Association (IBTA) that allows storage and compute clusters to enjoy the advantages of RDMA on Ethernet networks. RoCE and InfiniBand RDMA use the same verbs API, which allows developers to reuse the same code regardless of the transport underneath.
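
As a small illustration of that portability, the sketch below (again assuming libibverbs) walks the local RDMA devices and reports each port’s link layer. The only RoCE-specific detail visible to the application is that the link layer reports Ethernet rather than InfiniBand; the protection domains, memory regions, queue pairs, and work requests are identical code on both fabrics.

```c
/* Sketch: the same verbs API serves both InfiniBand and RoCE; the only hint
 * of the underlying fabric is the port's link layer. Assumes libibverbs. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;
        struct ibv_port_attr port;
        if (ibv_query_port(ctx, 1, &port) == 0) {
            /* IBV_LINK_LAYER_ETHERNET indicates a RoCE port; everything the
             * application does above this check is transport-agnostic. */
            printf("%s: link layer = %s\n",
                   ibv_get_device_name(devs[i]),
                   port.link_layer == IBV_LINK_LAYER_ETHERNET ?
                       "Ethernet (RoCE)" : "InfiniBand");
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}
```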

The latest version of RoCE (RoCEv2) adds even greater functionality. By changing the packet encapsulation to include IP and UDP headers, RoCEv2 allows RDMA to be used across both Layer 2 and Layer 3 networks. This enables Layer 3 routing, bringing RDMA to networks with multiple subnets, and it also makes IP multicast possible.
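
For illustration only, here is a rough sketch of that encapsulation in C. The header layouts are simplified (the real InfiniBand Base Transport Header carries additional fields), but the layering and the IANA-assigned UDP destination port 4791 are what let ordinary IP routers forward RoCEv2 traffic.

```c
/* Conceptual sketch of RoCEv2 encapsulation: InfiniBand transport headers
 * carried inside ordinary UDP/IP. Field layouts are simplified. */
#include <stdio.h>
#include <stdint.h>

#pragma pack(push, 1)
struct eth_hdr  { uint8_t dst[6], src[6]; uint16_t ethertype; };   /* 0x0800 = IPv4 */
struct ipv4_hdr { uint8_t ver_ihl, tos; uint16_t len, id, frag;
                  uint8_t ttl, proto; uint16_t csum;               /* proto 17 = UDP */
                  uint32_t src, dst; };
struct udp_hdr  { uint16_t sport, dport, len, csum; };             /* dport 4791 = RoCEv2 */
struct ib_bth   { uint8_t opcode, flags; uint16_t pkey;            /* simplified IB Base */
                  uint32_t dest_qp_psn[2]; };                      /* Transport Header   */
#pragma pack(pop)

int main(void)
{
    /* RoCE v1: Ethernet | IB GRH | IB BTH | payload            (L2 only, not routable)
     * RoCE v2: Ethernet | IPv4/IPv6 | UDP(4791) | IB BTH | payload   (L3 routable)    */
    printf("RoCEv2 header overhead before the payload: %zu bytes\n",
           sizeof(struct eth_hdr) + sizeof(struct ipv4_hdr) +
           sizeof(struct udp_hdr) + sizeof(struct ib_bth));
    return 0;
}
```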

RoCE finally makes it possible to experience the lowest available interconnect latency in a legacy Ethernet data center.

RESOURCES:

Data Center Solutions

Mellanox RDMA over Converged Ethernet (RoCE)

Author: Brian Klaff is a Senior Technical Communicator at Mellanox. Prior to Mellanox, Brian served as Director of Technical & Marketing Communications for ExperTeam, Ltd. He has also spent time as a Technical Communications Manager at Amdocs Israel, as a Product Marketing Manager at Versaware Technologies, and as a consultant specializing in mobile telecommunications for Mercer Management Consulting. Brian holds a BA in Economics & Near Eastern Studies from Johns Hopkins University.