Enabling Higher Azure Stack Efficiency – Networking Matters

 

A couple of weeks ago, Mellanox’s ConnectX®-3/ConnectX-3 Pro and ConnectX-4/ConnectX-4 Lx NICs became the first to pass Microsoft’s Server Software Defined Data Center (SDDC) Premium certification for Windows at all standard Ethernet speeds: 10, 25, 40, 50, and 100 GbE. It is the latest, and a crucial, milestone in the journey that Microsoft and Mellanox started more than six years ago to enable our networking hardware to deliver the most efficient solutions for new Windows Server and Azure Stack-based deployments. These ConnectX NICs have already been certified by the world’s leading server OEMs (HPE, Dell, Lenovo, and others¹), and when deployed with the most advanced switches and cables, such as Mellanox’s Spectrum switches and LinkX copper and optical cables, they have been proven to provide the most efficient Azure Stack solutions. This latest milestone is a good occasion to look back at the progress that has led to this point. For brevity’s sake, let’s start in 2012.

In 2012, Microsoft released Windows Server 2012. With that release, Microsoft launched a game-changing, enterprise-class storage solution called Storage Spaces. The new solution was developed to handle the exponential growth of data, which was creating significant challenges for IT; traditional block storage no longer proved effective for query and analysis. Compared to older storage solutions, Storage Spaces doubled performance at half the cost, enabling significantly higher efficiency in Windows-based data centers.

To accomplish this, Microsoft’s storage team leveraged new technologies such as flash-based storage and high-performance networking, and made a couple of brave decisions that helped them reach their goals. The first was to enable SMB 3.0 to run over an RDMA-enabled network (SMB Direct). The second was to replace the traditional Fibre Channel (FC)-based Storage Area Network (SAN) architecture, which fell short of addressing modern data centers’ storage needs, with Storage Spaces running over SMB Direct, which meant using only Ethernet networking, a faster and lower-cost replacement for FC SAN.
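For readers who want to check whether SMB Direct is actually in play on their own Windows Server hosts, here is a minimal sketch that wraps the standard in-box PowerShell cmdlets (Get-NetAdapterRdma, Get-SmbClientNetworkInterface, Get-SmbMultichannelConnection) in a small Python script. It is only an illustration, not part of Microsoft’s or Mellanox’s tooling, and it assumes a Windows Server host with PowerShell on the PATH.

```python
# Minimal sketch: check whether a Windows Server host is ready for SMB Direct.
# Assumes Windows Server 2012 or later with PowerShell available on the PATH;
# the cmdlets used are standard in-box cmdlets.
import subprocess

def run_ps(command: str) -> str:
    """Run a PowerShell command and return its text output."""
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # 1. Which NICs report RDMA capability, and is it enabled?
    print(run_ps("Get-NetAdapterRdma | Format-Table Name, Enabled"))

    # 2. Does the SMB client see those interfaces as RDMA capable?
    print(run_ps("Get-SmbClientNetworkInterface | "
                 "Format-Table FriendlyName, RdmaCapable, LinkSpeed"))

    # 3. Once traffic is flowing, confirm that SMB Multichannel actually
    #    negotiated RDMA for the active connections.
    print(run_ps("Get-SmbMultichannelConnection | "
                 "Format-Table ServerName, ClientRdmaCapable"))
```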

Figure 1: Windows Server 2012 Storage Spaces over SMB Direct

 

Windows Server 2012 supported only a Converged System architecture (with dedicated servers for compute and dedicated servers for storage), so Storage Spaces could run only as a Scale-Out File Server (SOFS). Even so, the efficiency boost of replacing FC with an RDMA-enabled network, as published by Microsoft in Windows Server 2012 R2 Storage, showed a 50 percent lower cost per GB of storage.

Figure 2: A comparison of acquisition costs between model scenarios

 

Immediately after the release of Windows Server 2012, several papers were published demonstrating the higher efficiency of the solution, including “Achieving Over 1-Million IOPS from Hyper-V VMs in a Scale-Out File Server Cluster Using Windows Server 2012 R2” and “Optimizing MS-SQL AlwaysOn Availability Groups With Server SSD.” All showed the advantages of using Mellanox’s RDMA-enabled network solutions in scale-out deployments.

Microsoft continued to develop and enhance Storage Spaces features and capabilities, and in the Windows Server 2016 release they added support for Hyperconverged systems: a Software-Defined Storage (SDS) solution that runs compute and storage on the same servers by using Storage Spaces over RDMA-enabled networks (Storage Spaces Direct, or S2D).
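As a rough sketch of what standing up such a Hyperconverged S2D cluster looks like in practice, the script below drives the commonly documented Windows Server 2016 PowerShell steps (Test-Cluster, New-Cluster, Enable-ClusterStorageSpacesDirect, New-Volume) from Python. The node names, cluster name, and volume size are hypothetical placeholders, not values from any deployment described in this post.

```python
# Rough sketch: stand up a Storage Spaces Direct (S2D) cluster on
# Windows Server 2016. Node and cluster names below are hypothetical.
import subprocess

NODES = "node1,node2,node3,node4"   # hypothetical host names
CLUSTER = "s2d-demo"                # hypothetical cluster name

def run_ps(command: str) -> None:
    """Run a PowerShell command, raising an error if it fails."""
    subprocess.run(["powershell.exe", "-NoProfile", "-Command", command],
                   check=True)

# 1. Validate the nodes, including the Storage Spaces Direct-specific checks.
run_ps(f"Test-Cluster -Node {NODES} -Include "
       "'Storage Spaces Direct', Inventory, Network, 'System Configuration'")

# 2. Create the failover cluster that will host both compute and storage.
run_ps(f"New-Cluster -Name {CLUSTER} -Node {NODES} -NoStorage")

# 3. Pool the local drives on every node into one shared S2D pool.
run_ps("Enable-ClusterStorageSpacesDirect")

# 4. Carve a resilient, cluster-shared volume out of the pool.
run_ps("New-Volume -StoragePoolFriendlyName 'S2D*' -FriendlyName 'Volume01' "
       "-FileSystem CSVFS_ReFS -Size 1TB")
```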

Figure 3: Windows Server, past and future Storage Solutions (source: Microsoft Storage Spaces Direct – the Future of Hyper-V and Azure Stack)

 

The efficiency boost that Microsoft’s new Hyperconverged S2D system delivers is clearly illustrated in Figure 3. However, building a Hyperconverged system requires special attention to network performance, as the network must handle all data communication, including:

  • Application to application
  • Applications to storage
  • Management
  • User access
  • Backup and recovery
  • Compute, storage and management traffic

Figure 4: Networking matters in a Hyperconverged deployment – CapEx only

 

When building a system in which 25GbE replaces the more traditional 10GbE, the higher bandwidth enables close to two times higher efficiency, as shown in Figure 4. Above and beyond the higher bandwidth, an RDMA-enabled network such as RDMA over Converged Ethernet (RoCE) reduces overall data-communication latency and maximizes server utilization, resulting in even better deployment efficiency.
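To make the CapEx arithmetic behind that claim concrete, here is a back-of-envelope sketch. The dollar figures are purely illustrative placeholders, not numbers taken from Figure 4 or any price list; the point is only that when a 25GbE connection costs modestly more than a 10GbE one, the cost per Gb/s drops by roughly half.

```python
# Back-of-envelope CapEx-per-bandwidth comparison, 10GbE vs. 25GbE.
# All prices are illustrative placeholders, NOT figures from this post.

def cost_per_gbps(nic_usd: float, switch_port_usd: float, cable_usd: float,
                  port_speed_gbps: float) -> float:
    """CapEx of one server network connection divided by its bandwidth."""
    return (nic_usd + switch_port_usd + cable_usd) / port_speed_gbps

# Hypothetical per-connection prices (one port per server).
ten_gbe = cost_per_gbps(nic_usd=300, switch_port_usd=250, cable_usd=50,
                        port_speed_gbps=10)
twenty_five_gbe = cost_per_gbps(nic_usd=350, switch_port_usd=300, cable_usd=60,
                                port_speed_gbps=25)

print(f"10GbE : ${ten_gbe:.1f} per Gb/s")
print(f"25GbE : ${twenty_five_gbe:.1f} per Gb/s")
print(f"25GbE advantage: {ten_gbe / twenty_five_gbe:.1f}x lower CapEx per Gb/s")
```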

Jose Barreto delivered a fascinating presentation at Microsoft’s Ignite 2015, where he showed in real time the performance boost that RDMA enables. In a three-minute video, Barreto compared TCP/IP against RDMA (Ethernet vs. RoCE) and clearly showed that RoCE delivers almost two times higher bandwidth and two times lower latency than plain Ethernet, at half the CPU utilization required for the data-communication task.

Figure 5: Mellanox RoCE solutions maximize S2D efficiency

 

The presentation also analyzed the “magic” behind RoCE’s advantage, showing that when running over TCP/IP, all of the CPU cores assigned to communication tasks were 100 percent utilized, whereas with RoCE those cores were barely used. As a result, the TCP/IP protocol stack could not scale, while RoCE could support much larger scale-out cluster sizes. With such performance advantages, Mellanox’s RoCE solutions became the de facto standard for Windows Server 2016 S2D benchmarks and products, to the degree that a number of record-level benchmarks and products were announced at Ignite ’16.

Figure 6: Storage Spaces Direct record performance of 1.2 Terabit/sec over 12-node cluster

 

One of the most impressive demos at the show was a record-breaking benchmark performed on a 12-node cluster of HPE DL380 Gen9 servers, each with four Micron 9100MAX NVMe storage cards and dual Mellanox ConnectX-4 100GbE NICs, all connected by Mellanox’s Spectrum 100GbE switch and LinkX cables. The cluster delivered an astonishing sustained 1.2 Tb/s of application-to-application bandwidth, which of course translates into higher data center efficiency. A separate benchmark, using a smaller cluster of only four nodes to run MS SQL, achieved a record performance of 2.5 million transactions per minute (using SOFS). In addition, many other blogs have been published showing the competitive advantages that Mellanox networking solutions enable in Windows Server 2016-based deployments.
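A quick sanity check of those headline numbers, using nothing more than the figures quoted above (12 nodes, dual 100GbE NICs per node, 1.2 Tb/s sustained):

```python
# Arithmetic only -- all inputs are the figures quoted in the paragraph above.
nodes = 12
nics_per_node = 2
nic_speed_gbps = 100                                   # ConnectX-4 100GbE
raw_capacity_gbps = nodes * nics_per_node * nic_speed_gbps
sustained_gbps = 1200                                  # 1.2 Tb/s demonstrated

print(f"Raw NIC capacity     : {raw_capacity_gbps} Gb/s")   # 2400 Gb/s
print(f"Sustained throughput : {sustained_gbps} Gb/s")
print(f"Per node             : {sustained_gbps / nodes:.0f} Gb/s")
print(f"Share of raw capacity: {sustained_gbps / raw_capacity_gbps:.0%}")
```

In other words, the cluster sustained roughly 100 Gb/s per node, about half of each server’s 200 Gb/s of raw NIC capacity.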

 

At Ignite 2016, Microsoft also announced that it expects the leading OEMs to release Azure Stack-based Hyperconverged systems in June 2017. Such systems will be compatible with Microsoft’s Azure cloud, which already runs over RoCE, enabling seamless operation between on-premises (private) and off-premises (Azure public) clouds.

Figure 7: Quote from Albert Greenberg, Microsoft / Azure, presenting ONS2014 Keynote

 

Azure Stack can run on Hyperconverged systems that use traditional networking, which require the SDDC Standard certification, or on Software-Defined Networking (SDN)-based Hyperconverged systems, which require the SDDC Premium certification. DataON’s Hyperconverged system, for example, is connected by Mellanox’s end-to-end networking solution; the DataON appliance launched at Ignite ’16 has already been deployed and delivers significant competitive advantages to its users.

The Storage Spaces Direct journey that started six years ago amid uncertainty has established S2D as the foundation of next-generation Windows-based deployments. S2D leverages the high performance that RoCE delivers, enabling better performance at lower cost and replacing the traditional FC-based SAN. The journey continues: additional networking enhancements to the solution’s capabilities are under development and will be added soon.

¹References:

About Motti Beck

Motti Beck is Sr. Director of Enterprise Market Development at Mellanox Technologies Inc. Before joining Mellanox, Motti was a founder of BindKey Technologies, an EDA startup that provided deep-submicron semiconductor verification solutions and was acquired by DuPont Photomasks, and of Butterfly Communications, a pioneering provider of Bluetooth solutions that was acquired by Texas Instruments. Prior to that, he was a business unit director at National Semiconductor. Motti holds a B.Sc. in computer engineering from the Technion – Israel Institute of Technology. Follow Motti on Twitter: @MottiBeck
