Road to 100Gb/sec…Innovation Required! (Part 1 of 3)


During my undergraduate days at UC Berkeley in the 1980s, I remember climbing through the attic of Cory Hall running 10Mbit/sec coaxial cables to professors’ offices. Man, that 10Base2 coax was fast!! Here we are in 2014, right on the verge of 100Gbit/sec networks. A four-orders-of-magnitude increase in bandwidth is no small engineering feat, and achieving 100Gb/s network communications requires innovation at every level of the seven-layer OSI model.

To tell you the truth, I never really understood the top three layers of the OSI model: I prefer the TCP/IP model, which collapses all of them into a single “Application” layer, and that makes more sense to me. Unfortunately, it also collapses the Link layer and the Physical layer, and I don’t think it makes sense to combine those two. So I like to use my own ‘hybrid’ model that collapses the top three layers into an Application layer but keeps the Link and Physical layers separate.

[Figure 1: the ‘hybrid’ network layer model]

It turns out that a tremendous amount of innovation is required in these bottom four layers to achieve effective 100Gb/s communications networks. The application layer needs to change as well to take full advantage of 100Gb/s networks, but for now we’ll focus on the bottom four layers.

Transport Layer Innovation: RDMA

Let’s start at the transport layer and work down. The transport layer provides reliable connections to move data from A to B. These reliable connections are most commonly based on the TCP/IP stack, which was developed in the 1980s during an era of 10Mbit/sec link technologies that were assumed to be unreliable. TCP/IP uses implicit congestion notification to detect congestion in the network: it relies on packet acknowledgements, sequence number checking, and software timeouts at the sender.

[Figure 2: car crash]

Basically, TCP/IP intentionally allows the network to become congested and drop packets. The sender uses sequence numbers to keep track of packets and waits for acknowledgements to return from the receiver to determine that the packets have safely arrived.

If a packet gets lost, a software timeout at the sender eventually occurs that implicitly signals that congestion has occurred in the network. Once this congestion is detected, the sender starts resending packets but also throttles back its sending rate to try to avoid future congestion. With modern high-speed networks this is a really bad idea!

It is like driving down the freeway at high speed and rear-ending someone to determine that there is congestion – and only then putting on the brakes!

While this works, it’s obviously not the best idea.
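
To make the mechanism concrete, here is a toy C sketch of that send/ack/timeout loop. It is purely illustrative (the window size, timeout value, and simulated packet loss are made up), not real TCP code, but it shows how the sender only learns about congestion after a timeout fires, and only then retransmits and throttles back.

```c
/*
 * Toy simulation of TCP-style implicit congestion detection.
 * Hypothetical numbers throughout; this is NOT real TCP code.
 * The sender tracks unacked sequence numbers and only learns of
 * congestion when a retransmission timer fires, at which point it
 * resends and throttles back its window.
 */
#include <stdbool.h>
#include <stdio.h>

#define RTO_MS 200  /* illustrative retransmission timeout */

int main(void) {
    int cwnd = 10;        /* congestion window, in packets        */
    int next_seq = 0;     /* next sequence number to send         */
    int last_acked = -1;  /* highest cumulatively acked sequence  */

    for (int tick = 0; tick < 5; tick++) {
        /* Send a window's worth of packets. */
        for (int i = 0; i < cwnd; i++)
            printf("tick %d: send seq %d\n", tick, next_seq++);

        /* Pretend the network silently dropped the tail of window 2. */
        bool loss = (tick == 2);
        if (!loss) {
            last_acked = next_seq - 1;  /* ACKs came back    */
            cwnd += 1;                  /* additive increase */
        } else {
            /* No ACK arrives; only after RTO_MS does the sender
             * infer congestion, go back, resend, and slow down. */
            printf("tick %d: timeout after %d ms, resend from seq %d\n",
                   tick, RTO_MS, last_acked + 1);
            next_seq = last_acked + 1;       /* retransmit              */
            cwnd = cwnd > 1 ? cwnd / 2 : 1;  /* multiplicative decrease */
        }
        printf("tick %d: cwnd is now %d packets\n", tick, cwnd);
    }
    return 0;
}
```

Notice that the loss itself is silent; the only congestion signal is the timer expiring long after the damage is done.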

The original Ethernet coax used shared media, where collisions and lost packets were the norm at the physical and link layers. All modern high-speed networks use dedicated point-to-point links and switches, so packet loss or corruption at the link and physical layers is the exception rather than the norm.

TCP/IP is very expensive, both in terms of CPU utilization and, more importantly, in terms of latency. In addition, using dropped packets to detect network congestion is simply not the right thing to do. You cannot use millisecond timeouts, dropped packets, and resends to deal with congestion in a 100Gbit/sec network.
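
A quick back-of-the-envelope calculation (the 1 ms figure is just illustrative) shows why: during a single millisecond timeout, a 100Gb/s link delivers on the order of 100 megabits, roughly 12.5 megabytes, that must be buffered, dropped, or resent.

```c
/* Back-of-the-envelope: how much data a 100Gb/s link delivers while a
 * sender sits out a single 1 ms software timeout (illustrative values). */
#include <stdio.h>

int main(void) {
    const double link_bps  = 100e9;  /* 100 Gbit/sec link rate */
    const double timeout_s = 1e-3;   /* one 1 ms timeout       */

    double bits  = link_bps * timeout_s;  /* data delivered during the wait */
    double bytes = bits / 8.0;

    printf("One 1 ms timeout at 100Gb/s covers %.0f Mbit (%.1f MB) of data\n",
           bits / 1e6, bytes / 1e6);  /* prints 100 Mbit (12.5 MB) */
    return 0;
}
```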

With 100Gb/sec it is critical to use a transport protocol that is high performance, low latency, and doesn’t rely on software. The transport protocol needs to allow an application on one server to share data with an application running on a remote server as if that data were in the same physical machine. The ‘server’ can be a virtual machine running on a physical server performing a compute or storage function.

[Figure 3: RDMA]

RDMA (Remote Direct Memory Access) is the key 100Gb/s transport protocol that achieves low-latency, high-throughput, reliable connections. It does all of the heavy-lifting protocol processing in hardware and delivers data directly to and from applications without involving the CPU or software stack. RDMA is available over both InfiniBand and Ethernet connections and provides the offloaded, low-latency connections needed to take advantage of 100Gb/s links.
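
To give a feel for what that looks like from software, here is a minimal C sketch using the libibverbs API to post a one-sided RDMA WRITE. It assumes an RDMA-capable NIC (InfiniBand or RoCE); the queue pair setup and the out-of-band exchange of the peer’s buffer address and rkey are omitted, so treat it as a fragment under those assumptions rather than a complete program.

```c
/* Sketch of a one-sided RDMA WRITE using libibverbs. Queue pair setup
 * and the exchange of the peer's address/rkey are assumed to have been
 * done elsewhere. Compile and link with -libverbs. */
#include <infiniband/verbs.h>
#include <stdint.h>

int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       struct ibv_cq *cq,
                       uint64_t remote_addr, uint32_t remote_rkey)
{
    static char buf[4096] = "hello from the 100Gb/s fabric";

    /* Register the local buffer so the NIC can DMA directly from it. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, sizeof(buf),
                                   IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = sizeof(buf),
        .lkey   = mr->lkey,
    };

    /* One-sided write: the remote CPU and software stack are not involved. */
    struct ibv_send_wr wr = {0}, *bad_wr = NULL;
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = remote_rkey;

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Poll the completion queue rather than waiting on software timeouts. */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;  /* busy-poll for the completion */

    ibv_dereg_mr(mr);
    return (wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}
```

The point to notice is that once the work request is posted, the NIC moves the data directly into the remote application’s registered memory; the remote CPU and kernel networking stack never touch the transfer.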

RDMA transport is absolutely critical to taking advantage of high-performance networks; however, a detailed discussion is beyond the scope of this overview. I’ll come back to this topic and cover it in more detail in another post.

About Kevin Deierling

Kevin Deierling has served as Mellanox's vice president of marketing since March 2013. Previously, he was chief architect at Silver Spring Networks from 2007 to 2012. From 2005 to 2007, he was vice president of marketing and business development at Spans Logic. From 1999 to 2005, Mr. Deierling was vice president of product marketing at Mellanox Technologies. Kevin has contributed to multiple technology standards through organizations including the InfiniBand Trade Association and the PCI Industrial Computer Manufacturers Group. He has over 20 patents and was a contributing author of a text on BiCMOS design. Kevin holds a BA in Solid State Physics from UC Berkeley. Follow Kevin on Twitter: @TechseerKD
