A typical metric used to evaluate network performance is its latency for point-to-point communications. But more important, and sometimes overlooked, is the latency of collective communications, such as barrier synchronization, used to synchronize a set of processes, and all-reduce, used to perform distributed reductions. For many high-performance computing applications, the performance of such collective operations plays a critical role in determining overall application scalability and performance. As such, a system-oriented approach to network design is essential for achieving the network performance needed to reach extreme system scales.
The CORE-Direct technology introduced by Mellanox was a first step toward taking a holistic system view, by implementing the execution of collective communications in the network. The SHArP technology being introduced is an extension of this technology, which moves support for collective communication from the network edges, i.e., the hosts, to the core of the network: the switch fabric. Processing of collective communication moves to dedicated silicon within the InfiniBand Switch-IB 2 switch, thus providing the means for accelerating the performance of these collective operations by an order of magnitude.
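To see what an all-reduce computes, here is a toy sketch in Python: several simulated ranks each contribute a local value, and every rank ends up with the global reduction. This only illustrates the semantics of the operation; it says nothing about how SHArP implements it in switch silicon.

```python
from functools import reduce

def all_reduce(local_values, op=lambda a, b: a + b):
    """Toy all-reduce: every simulated rank receives the reduction
    of all local contributions."""
    result = reduce(op, local_values)
    # In a real cluster each rank would receive this over the network;
    # with SHArP, the reduction itself is computed inside the switch.
    return [result] * len(local_values)

# Four simulated ranks, each holding one partial sum
print(all_reduce([1, 2, 3, 4]))  # every rank gets 10
```

In an MPI program the same semantics are provided by `MPI_Allreduce`; the point of in-network processing is that the reduction no longer has to bounce between hosts at the network edge.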
The OpenStack Summit is a four-day conference for developers, users, and administrators of OpenStack cloud software. Held every six months, the conference schedule rotates based on the OpenStack software release cycle. This week, the summit is being held in Tokyo, Japan at the Grand Prince International Convention Center.
Today, we held a joint session (with Irena Berezovsky of Midokura and Livnat Peer of Red Hat) about Quality of Service in the cloud. I presented a customer use case and talked about Mellanox NEO, containers, virtualization, auto provisioning, and SR-IOV LAG.
Tomorrow is the last chance to visit Mellanox’s booth (S8) and see the 100Gbps Cloud Solution based on Spectrum, ConnectX-4 and Ceph RDMA. Make sure to stop by and talk with us! Here are some photos from today’s session along with the Mellanox booth:
Mellanox has a long heritage in high-bandwidth use cases for high performance computing and enterprise applications, but one little-known development is with real-time video transports. Traditionally, the broadcast industry has used a proprietary interface called SDI (Serial Digital Interface) to move uncompressed video signals around a broadcast plant. SDI is a family of digital video interfaces first standardized by the Society of Motion Picture and Television Engineers (SMPTE) in 1989 for broadcast-grade video. The speed of SDI technology, however, has not kept up with the accelerating network speeds and bandwidth of Internet Protocol technology (IP for short).
On Tuesday, October 6, QCT opened its Cloud Solution Center located within QCT’s new U.S. corporate headquarters in San Jose. The new facility is designed to test and demonstrate modern cloud datacenter solutions that have been jointly developed by QCT and its technology partners. Among the demonstrated solutions was an innovative VDI deployment jointly developed by QCT and Mellanox, based on a virtualized hyper-converged infrastructure with scale-out software-defined storage connected over 40GbE.
VDI enables companies to centralize all of their desktop services over a virtualized data center. With VDI, users are not tied to a specific PC and can access their desktop and run applications from anywhere. VDI also helps IT administrators by creating more efficient and secure environments, which enables them to better serve their customers’ business needs.
VDI efficiency is measured by the number of virtual desktops that a specific infrastructure can support, or, in other words, by measuring the cost per user. The major limiting factor is the access time to storage. Replacing the traditional Storage Area Network (SAN) architecture with a modern scale-out software-defined storage architecture, on a fast interconnect supporting 40GbE, eliminates these potential bottlenecks, enabling the lowest total cost of ownership (TCO) and highest efficiency.
With the explosion of data over the past few years, data storage has become a hot topic among corporate decision makers. It is no longer sufficient to have adequate space for the massive quantities of data that must be stored; it is just as critical that stored data be accessible without any bottlenecks that impede the ability to process and analyze data in real time.
Traditionally, accessing hard disk storage took tens of milliseconds, and the corresponding network and protocol overheads were in the hundreds of microseconds, a negligible percentage of the overall access time.
At that time, networks ran on 1Gb/s bandwidth, and SCSI was the protocol used for accessing storage locally while iSCSI based on TCP was developed for remote access.
However, once storage technology improved and Solid-State Disks (SSD) became the norm, access time dropped by two orders of magnitude to the hundreds of microseconds. Unless network and protocol access times decreased by a similar factor, they would create a bottleneck that negated the gains made by the new media technology.
This meant that the network had to handle higher bandwidths, such as 40Gb/s and now even 100Gb/s, driving faster data transfers. For remote access, iSCSI is still the protocol of choice; however, TCP was no longer efficient enough, so RDMA (RoCE) became the transport of choice for data-plane operation, and iSER was developed as an RDMA-based enhancement of iSCSI.
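The arithmetic behind this bottleneck argument is easy to check. Assuming ballpark figures of roughly 10 ms for a hard-disk access, 100 µs for an SSD access, and 300 µs of combined network and protocol overhead (illustrative numbers, not measurements from the text), the overhead goes from a rounding error to the dominant cost:

```python
def overhead_fraction(media_access_us, network_overhead_us):
    """Fraction of total remote-I/O time spent in network/protocol overhead."""
    total = media_access_us + network_overhead_us
    return network_overhead_us / total

hdd = overhead_fraction(10_000, 300)  # hard disk: overhead is a few percent
ssd = overhead_fraction(100, 300)     # SSD: overhead dominates the access time
print(f"HDD: {hdd:.0%}, SSD: {ssd:.0%}")
```

With these example numbers the overhead is about 3% of a hard-disk access but 75% of an SSD access, which is exactly why the transport had to get faster along with the media.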
In my first blog on Ceph, I explained what it is and why it’s hot. But what does Mellanox, a networking company, have to do with Ceph, a software-defined storage solution? The answer lies in the Ceph scale-out design. And some empirical results are found in the new “Red Hat Ceph Storage Clusters on Supermicro storage servers” reference architecture published August 10th.
Ceph has two logical networks, the client-facing (public) and the cluster (private) networks. Communication with clients or application servers is via the former while replication, heartbeat, and reconstruction traffic run on the latter. You can run both logical networks on one physical network or separate the networks if you have a large cluster or lots of activity.
Figure 1: Logical diagram of the two Ceph networks
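Separating the two logical networks onto distinct physical networks is done in the cluster’s ceph.conf. A minimal sketch, using hypothetical example subnets:

```ini
[global]
# Client-facing (public) network: client and application-server traffic
public network = 192.168.1.0/24
# Cluster (private) network: replication, heartbeat, and reconstruction
cluster network = 192.168.2.0/24
```

If both options point at the same subnet (or the cluster network is omitted), all traffic shares one physical network, which is the simpler choice for small or lightly loaded clusters.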
Just one more week to go before VMworld 2015 begins at Moscone Center in San Francisco. VMworld is the go-to event where business and technical decision makers converge. In recent years, this week-long conference has become the major virtualization technologies event, and this year is expected to be the biggest ever.
We are thrilled to co-present a breakout session in the Technology Deep Dives and Futures track: Delivering Maximum Performance for Scale-Out Applications with ESX 6 [Tuesday, September 1, 2015: 11AM-Noon]
Session CTO6454: Presented by Josh Simons, Office of the CTO, HPC – VMware and Liran Liss, Senior Principal Architect, Mellanox.
An increasing number of important scale-out workloads – Telco Network Function Virtualization (NFV), in-memory distributed databases, parallel file systems, Microsoft Server Message Block (SMB) Direct, and High Performance Computing – benefit significantly from network interfaces that provide ultra-low latency, high bandwidth, and high packet rates. Prior to ESX 6.0, Single-Root I/O Virtualization (SR-IOV) and Fixed Pass Through (FPT), which allow placing hardware network interfaces directly under VM control, introduced significant latency and CPU overheads relative to bare-metal configurations. ESXi 6.0 introduces support for Write Combining, which eliminates these overheads, resulting in near-native performance on this important class of workloads. The benefits of these improvements will be demonstrated using several prominent workloads, including a High Performance Computing (HPC) application, a Data-Plane-Development-Kit (DPDK) based NFV appliance, and the Windows SMB Direct storage protocol. Detailed information will be provided to show attendees how to configure systems to achieve these results.
Back in April 2015, during the Ethernet Technology Summit conference, my colleague Rob Davis wrote a great blog about NVMe Over Fabrics. He outlined the basics of what NVMe is and why Mellanox is collaborating on a standard to access NVMe devices over networks (over fabrics). We had two demos from two vendors in our booth:
Mangstor’s NX-Series array with NVMe Over Fabrics, using Mellanox 56GbE RoCE (or FDR InfiniBand), demonstrated >10GB/s read throughput and >2.5 million 4KB random read IOPS.
Saratoga Speed’s Altamont XP-L with iSER (iSCSI Extensions for RDMA), using Mellanox 56Gb/s RoCE, reached 11.6GB/s read throughput and 2.7 million 4KB sequential read IOPS.
These numbers were pretty impressive, but in the technology world, nothing stands still. One must always strive to be faster, cheaper, more reliable, and/or more efficient.
The Story Gets Better
Today—four months after Ethernet Technology Summit—kicked off the Flash Memory Summit in Santa Clara, California. Mellanox issued a press release highlighting the fact that we now have NINE vendors showing TWELVE demos of flash (or other non-volatile memory) being accessed using high-speed Mellanox networks at 40, 56, or even 100Gb/s speeds. Mangstor and Saratoga Speed are both back with faster, more impressive demos and we have other demos from Apeiron, HGST, Memblaze, Micron, NetApp, PMC-Sierra, and Samsung. Here’s a quick summary:
In my previous post, I outlined how Gartner and The Register were predicting a gloomy outcome for Fibre Channel over Ethernet (FCoE) and made the assertion that in contrast RDMA over Converged Ethernet (RoCE) had quite a rosy future. The key here is that RoCE has crossed the chasm from technology enthusiasts and early adopters to mainstream buyers.
In his eponymous book, Moore outlines that the main challenge of Crossing the Chasm is that the Early Majority are pragmatists interested in the quality, reliability, and business value of a technology. Whereas visionaries and enthusiasts relish new, disruptive technologies, the pragmatist values solutions that integrate smoothly into the existing infrastructure. Pragmatists prefer well-established suppliers and seek references from other mature customers in their industry. And pragmatists look for technologies with a competitive multi-vendor eco-system that gives them flexibility, bargaining power, and leverage.
To summarize, the three key requirements for a technology to cross the chasm are:
Demonstration that the technology delivers clear business value
Penetration of key beachhead in a mainstream market
Emergence of a competitive, multi-vendor eco-system
I was talking with my colleague, Rob Davis, recently and he commented that “RoCE has leaped the canyon.” Now Rob is from Minnesota and they talk kind of funny there, but despite the rewording, I realized instantly what he meant. RoCE, of course, refers to RDMA over Converged Ethernet technology, and “has leaped the canyon” was a more emphatic way of saying has “crossed the chasm.”
This is, of course, the now proverbial CHASM: the gap between early adopters and mainstream users made famous by the book, “Crossing the Chasm” by @GeoffreyAMoore. If you are serious about high-tech marketing and haven’t read this book, then you should cancel your afternoon meetings, download it onto your Kindle, and dive in! Moore’s Chasm, along with Clayton Christensen’s The Innovator’s Dilemma and Michael Porter’s Competitive Strategy, comprise the sacred trilogy for technology marketers.