Monthly Archives: September 2013

Advantages of RDMA for Big Data Applications

Hadoop MapReduce is the leading Big Data analytics framework. It enables data scientists to process data at volumes and varieties never handled before, and the results of that processing drive new business creation and operational efficiency.

As MapReduce and Hadoop mature, more organizations are trying to use the framework for near real-time workloads. Leveraging RDMA (Remote Direct Memory Access) to accelerate Hadoop MapReduce has proven to be a successful approach.

In our presentation at Oracle OpenWorld 2013, we show the advantages RDMA brings to enterprises deploying Hadoop and other Big Data applications:

  • Doubling analytics performance by accelerating the MapReduce framework
  • Doubling Hadoop file system ingress capabilities
  • Reducing NoSQL database latencies by 30%

On the analytics side, UDA (Unstructured Data Accelerator) doubles the available computation power by offloading networking and buffer copying from the server's CPU to the network controller. In addition, a novel shuffle-and-merge approach helps achieve the needed performance acceleration. UDA is an open source package, available at https://code.google.com/p/uda-plugin/. The HDFS (Hadoop Distributed File System) layer is also getting its share of the performance boost.

While the community continues to improve HDFS, work conducted at Ohio State University brings RDMA capabilities to the HDFS data ingress process. Initial testing shows over 80% improvement in the write path to the HDFS repository. The RDMA HDFS acceleration research and a downloadable package are available from the Ohio State University website at http://hadoop-rdma.cse.ohio-state.edu/

We expect RDMA acceleration to be enabled for more Big Data frameworks in the future. If you have a good use case, we will be glad to discuss your needs and help with the implementation.

Contact us through the comments section below or at bigdata@mellanox.com

 

Author: Eyal Gutkind is a Senior Manager, Enterprise Market Development at Mellanox Technologies, focusing on Web 2.0 and Big Data applications. He has held several engineering and management roles at Mellanox Technologies over the last 11 years. Eyal holds a BSc degree in Electrical Engineering from Ben Gurion University in Israel and an MBA from the Fuqua School of Business at Duke University, North Carolina.

Accelerating Red Hat’s new OpenStack cloud platform with Mellanox Interconnect

Red Hat Enterprise Linux OpenStack Platform is a leading new open-source Infrastructure-as-a-Service (IaaS) solution for building and deploying cloud-enabled workloads. This new cloud platform gives customers the agility to scale and quickly meet customer demands without compromising on availability, security, or performance.

Red Hat built an industry-leading certification program for its OpenStack platform. By achieving this technology certification, partners can assure customers that their solutions have been validated with Red Hat OpenStack technology. Anyone who earns this new certification will be able to show that they can accomplish the following tasks:

  • Install and configure Red Hat Enterprise Linux OpenStack Platform.
  • Manage users, projects, flavors, and rules.
  • Configure and manage images.
  • Add compute nodes.
  • Manage storage using Swift and Cinder.

 

Mellanox is listed in the Red Hat marketplace as a certified hardware partner for the Networking (Neutron) and Block Storage (Cinder) services. This ensures that Mellanox ConnectX-3 hardware has been tested and certified, and is now supported, with Red Hat OpenStack technology.

Mellanox Technologies offers seamless integration between its products and Red Hat OpenStack services and provides unique functionality that includes application and storage acceleration, network provisioning, automation, hardware-based security, and isolation. Furthermore, using Mellanox interconnect products allows cloud providers to save significant capital and operational expenses through network and I/O consolidation and by increasing the number of virtual machines (VMs) per server.

With the Mellanox ConnectX-3 card and OpenStack plugins, customers benefit from superior performance and native integration with Neutron.

 

The Mellanox OpenStack solution extends the Cinder project by adding iSCSI running over RDMA (iSER). Leveraging RDMA, Mellanox OpenStack delivers 5x higher data throughput (for example, increasing from 1GB/s to 5GB/s) with up to 80% lower CPU utilization.


Mellanox ConnectX-3 adapters are equipped with an onboard embedded switch (eSwitch) capable of performing Layer 2 switching between the different VMs running on the server. Using the eSwitch delivers higher performance levels in addition to security and QoS. Through the Mellanox Neutron plugin, the eSwitch configuration is transparent to the Red Hat Enterprise Linux OpenStack Platform administrator. By implementing a technology called SR-IOV (Single Root I/O Virtualization) and running RDMA over the eSwitch, we were able to show a dramatic difference (20x) compared to a para-virtualized vNIC running a TCP stream.


Learn more:

  • Mellanox and Red Hat OpenStack joint solution
  • View the Mellanox certification

Author: Eli Karpilovski manages Cloud Market Development at Mellanox Technologies. In addition, he serves as Chairman of the Cloud Advisory Council. Previously, he served as a product manager for the HCA Software division at Mellanox Technologies. Mr. Karpilovski holds a Bachelor of Science in Engineering from the Holon Institute of Technology and a Master of Business Administration from The Open University of Israel.

Virtual Modular Switch (VMS): A Network Evolution Story – Part 1

Traditionally, while Ethernet networks served low-end, non-performance-driven applications, the network topology was based on an access layer with a very high port count and a very low rate of traffic generation. This allowed a very high, yet acceptable, blocking ratio and a situation where a single uplink (or two, where high availability was needed) served all purposes and connected to an almighty aggregation chassis that catered to the whole network.

As applications evolved to become more bandwidth-hungry, latency-sensitive and capacity-driven, the need for a wider pipe between the access and aggregation elements became the driver for the evolution of the entire network. This, in turn, pushed users toward consuming more interfaces on the aggregation chassis and pushed the network into a price-to-performance gridlock.

The need for a high count of high-capacity interfaces on the aggregation switch translates into a very large and complicated chassis. Although such chassis are available, they traditionally lag a step behind the physical evolution of Ethernet technologies: late to arrive with a sufficient number of higher-speed interfaces, and limited in their ability to carry the extra volume in terms of power, cooling, control tables and switching matrix. This situation can be resolved by eventually replacing the existing chassis with a newer model that promises to be more future-tolerant than its predecessor, and of course by accepting the additional cost of a huge device (or two, where high availability is needed).


An alternative to hanging your entire network from a single element is to use a fabric of smaller, simpler and more cost-effective elements in order to create a network entity with the required port count, capacity and other performance attributes. This essentially means replacing your modular switch with a Virtual Modular Switch, or as we like to call it, the VMS.

A VMS is a fat-tree topology of Ethernet switches, with OSPF routing used for topology discovery and ECMP used to load-balance traffic between the leaf (access) elements of the VMS via its spine (core) elements.
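
To make the scale arithmetic concrete, below is a minimal Python sketch of how such a two-tier fabric sizes up; the switch port counts and the uplink split are assumptions chosen only for the example, not a recommended Mellanox configuration. Because ECMP spreads each leaf's traffic evenly across its uplinks, the ratio of leaf downlinks to uplinks sets the fabric's oversubscription.

  # Illustrative sizing of a two-tier leaf/spine VMS.  The 36-port switch
  # size and the 12-uplink split are assumed example values, not a specific
  # Mellanox configuration.
  def size_vms(spine_ports, leaf_ports, uplinks_per_leaf):
      """Return (leaves, spines, access_ports, oversubscription) for the fabric."""
      downlinks = leaf_ports - uplinks_per_leaf
      spines = uplinks_per_leaf   # each leaf runs one uplink to every spine
      leaves = spine_ports        # every spine port feeds one leaf
      return leaves, spines, leaves * downlinks, downlinks / float(uplinks_per_leaf)

  leaves, spines, ports, ratio = size_vms(spine_ports=36, leaf_ports=36, uplinks_per_leaf=12)
  print("%d leaves x %d spines = %d access ports at %.1f:1 oversubscription"
        % (leaves, spines, ports, ratio))
  # 36 leaves x 12 spines = 864 access ports at 2.0:1 oversubscription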

Stay tuned for further exploration of the pros and cons of deploying a VMS versus a modular chassis.

Author: Since 2011, Ran Almog has served as Sr. Product Manager for Ethernet Products at Mellanox. Prior to joining Mellanox, Ran worked at Nokia Siemens Networks as a solution sales and marketing specialist for the packet networks business unit. Ran holds a BSc. in Electrical Engineering and Computer Sciences from the University of Tel Aviv, Israel.

Deploying HPC Clusters with Mellanox InfiniBand Interconnect Solutions

High-performance simulations require the most efficient compute platforms. The execution time of a given simulation depends upon many factors, such as the number of CPU/GPU cores and their utilization factor, and the interconnect performance, efficiency, and scalability. Efficient high-performance computing systems require high-bandwidth, low-latency connections between thousands of multi-processor nodes, as well as high-speed storage systems.

Mellanox has released “Deploying HPC Clusters with Mellanox InfiniBand Interconnect Solutions”. This guide describes how to design, build, and test a high-performance computing (HPC) cluster using the Mellanox® InfiniBand interconnect, covering the installation and setup of the infrastructure, including:

  • HPC cluster design
  • Installation and configuration of the Mellanox Interconnect components
  • Cluster configuration and performance testing

 

Author: Scot Schultz is an HPC technology specialist with broad knowledge of operating systems, high-speed interconnects and processor technologies. Joining the Mellanox team in March 2013 as Director of HPC and Technical Computing, Schultz is a 25-year veteran of the computing industry. Prior to joining Mellanox, he spent 17 years at AMD in various engineering and leadership roles, most recently in strategic HPC technology ecosystem enablement. Scot was also instrumental in the growth and development of the OpenFabrics Alliance as co-chair of the board of directors. He also serves as Director of Educational Outreach and is a founding member of the HPC Advisory Council and various other industry organizations.

ConnectX-3 Pro Hardware Offload Engines

ConnectX-3 Pro, a new addition to the ConnectX-3 family, shows significant CPU overhead reduction and performance improvement when running NVGRE, dramatically improving ROI for cloud providers by reducing application running costs.

We conducted initial tests to measure the performance improvements and the CPU overhead reduction while utilizing the ConnectX-3 Pro NVGRE hardware offload engines.


Results show a 2x performance improvement and a 40% reduction in CPU overhead!


ConnectX-3 Pro supports VXLAN hardware offload engines in addition to NVGRE, and is the first adapter on the market to support hardware offload engines for overlay networks, i.e., NVGRE and VXLAN.
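
For readers less familiar with these overlay protocols, the short Python sketch below illustrates what NVGRE and VXLAN encapsulation add to every tenant packet, using the commonly cited IPv4 outer-header sizes; without hardware offload engines, building and parsing these extra headers, and the loss of the NIC's standard stateless offloads on the inner packet, typically land on the host CPU. The payload size in the example is an arbitrary assumption.

  # Per-packet bytes added by overlay encapsulation (IPv4 outer headers,
  # no outer VLAN tag).  Header sizes are the standard values; the
  # 1450-byte payload is an arbitrary assumption for the example.
  OUTER_ETHERNET = 14
  OUTER_IPV4 = 20
  GRE_WITH_KEY = 8    # NVGRE: GRE header whose key field carries the virtual subnet ID
  OUTER_UDP = 8
  VXLAN_HEADER = 8    # VXLAN: carries the 24-bit VXLAN network identifier

  overhead = {
      "NVGRE": OUTER_ETHERNET + OUTER_IPV4 + GRE_WITH_KEY,              # 42 bytes
      "VXLAN": OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER,  # 50 bytes
  }

  payload = 1450  # assumed tenant bytes per packet
  for name, extra in sorted(overhead.items()):
      print("%s adds %d bytes per packet; payload is %.1f%% of the wire bytes"
            % (name, extra, 100.0 * payload / (payload + extra)))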

Author: Gadi Singer – Product Manager, Adapter Drivers. Gadi manages the Adapters Product Line at Mellanox Technologies. He served as Marketing Product Manager for the HCA Software division at Mellanox Technologies from 2012 to 2013. Prior to joining Mellanox, Gadi worked at Anobit and PMC-Sierra as a Product Line Manager. Mr. Singer holds a BSc degree in Electrical Engineering from Ben-Gurion University in Israel.

How to Increase Virtual Desktop Infrastructure (VDI) Efficiency

Every IT professional’s goal is to improve TCO. In a Virtual Desktop Infrastructure (VDI) application, the objective is to increase efficiency by maximizing the number of virtual desktops per server while maintaining user response times comparable to those of a physical desktop. In addition, the solution must be resilient, since downtime of the VDI application idles hundreds to thousands of users, reducing overall organizational productivity and increasing user frustration.

Low-latency data requests from storage or other servers are the key to enabling more VDI sessions without increasing user response times. Legacy Fibre Channel-connected storage subsystems provide the shared storage that enables moving virtual machines between physical servers, while leveraging an existing Ethernet infrastructure saves costs by combining networking and storage I/O over the same cable. iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA). It uses the upper layers of iSCSI for session management, discovery, recovery, and so on, and is thus compatible with all the features and functions supported by iSCSI. Using iSER, however, eliminates the transport bottleneck through the following mechanisms:

  • Uses zero copy via RDMA technology
  • CRC is calculated by hardware
  • Works with message boundaries instead of streams
  • The transport protocol is implemented in hardware (minimal CPU cycles per IO)


Recently, at VMworld 2013, LSI Corporation and Mellanox Technologies presented a joint solution that accelerates access to storage. The solution includes LSI’s Nytro MegaRAID NMR 8110-4i card, which has 200GB of on-card flash and eight SAS HDDs, and Mellanox’s ConnectX®-3 Pro adapter, which provides 10Gb/s RoCE storage connectivity between the servers. VDI performance (over TCP/IP and RoCE) was measured using Login VSI’s VDI load generator, which creates the actual workload of a typical Windows user running Microsoft Office.

Running Login VSI showed that over 10GbE TCP/IP only 65 virtual desktops responded within 5 seconds or less, versus 140 over 10GbE RoCE. Supporting more than twice as many desktops per server translates into more than a 2x cost saving on the VDI hardware infrastructure, proving an excellent, economical alternative to legacy Fibre Channel-based storage subsystems.

Author: Motti Beck is the Director of Marketing, Enterprise Data Center market segment at Mellanox Technologies, Inc. Before joining Mellanox, Motti founded several startup companies, including BindKey Technologies, which was acquired by DuPont Photomask (today Toppan Printing Company LTD), and Butterfly Communications, which was acquired by Texas Instruments. Prior to that, he was a Business Unit Director at National Semiconductor. Motti holds a B.Sc in computer engineering from the Technion – Israel Institute of Technology.

Advancing Applications Performance With InfiniBand

High-performance scientific applications typically require the lowest possible latency in order to keep the parallel processes as synchronized as possible. In the past, this requirement drove the adoption of SMP machines, where the floating-point elements (CPUs, GPUs) were placed as close together as possible on the same board. With the increasing demand for higher compute capability and the lower cost of adoption that makes large-scale HPC more accessible, we have witnessed the rise of clustering as the preferred architecture for high-performance computing.

In this presentation, we introduce and explore some of the latest advancements in high-speed networking and suggest new usage models that leverage the latest technologies to meet the requirements of today’s demanding applications. The recently launched Mellanox Connect-IB™ InfiniBand adapter introduces a novel high-performance and scalable architecture for high-performance clusters. The architecture was designed from the ground up to provide high performance and scalability for the largest supercomputers in the world, today and in the future.

The device includes a new network transport mechanism called Dynamically Connected Transport™ Service (DCT), which was invented to provide a Reliable Connection transport mechanism (the service behind many of InfiniBand’s advanced capabilities, such as RDMA, large message sends, and low-latency kernel bypass) at an unlimited cluster size. We also discuss optimizations for MPI collective communications, which are frequently used for process synchronization, and show how their performance is critical for scalable, high-performance applications.
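
As a rough illustration of the scalability problem DCT addresses, here is a hedged back-of-the-envelope Python sketch; the node counts, the 16 processes per node and the DCT context pool size are assumptions for the example, not figures from the presentation. With the classic Reliable Connection (RC) transport, a fully connected job keeps a queue pair per peer process, so connection state grows with the job, whereas DCT keeps a small, roughly constant pool of contexts per process.

  # Per-node connection state for a fully connected MPI job: classic RC
  # versus DCT.  16 processes per node and a 4-context DCT pool are
  # assumed values for illustration only.
  def rc_qps_per_node(nodes, ppn):
      """RC: each process keeps one queue pair to every other process."""
      return ppn * (nodes * ppn - 1)

  def dct_contexts_per_node(ppn, contexts_per_process=4):
      """DCT: a small, fixed pool of dynamically connected contexts per process."""
      return ppn * contexts_per_process

  for nodes in (100, 1000, 10000):
      print("%5d nodes x 16 ppn: RC ~%d QPs per node, DCT ~%d contexts per node"
            % (nodes, rc_qps_per_node(nodes, 16), dct_contexts_per_node(16)))

The RC figure keeps growing with the job while the DCT figure stays flat, which is what allows the transport to offer reliable-connection semantics at effectively unlimited cluster sizes.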

 

Presented by: Pak Lui, Application Performance Manager, Mellanox – August 12, 2013 – International Computing for the Atmospheric Sciences Symposium, Annecy, France

Mellanox Delivers High Speed Interconnect Solutions for New IBM NeXtScale System

IBM recently introduced its new NeXtScale System, a flexible computing platform that provides 3x as many cores as current one-unit rack servers, making it ideal for the fastest-growing workloads, such as social media, analytics, technical computing and cloud delivery.

IBM NeXtScale n1200 enclosure: chassis front, fully loaded

IBM and Mellanox have worked closely to develop a platform that addresses multiple large-scale markets and solves a variety of complex research and business issues.

Through the use of ConnectX-3 FDR 56Gb/s InfiniBand and 10/40GbE adapters and SwitchX-2 FDR 56Gb/s InfiniBand and 10/40GbE switches, we can provide IBM NeXtScale customers with unrivaled interconnect performance to address the needs of:

  • Large data centers requiring efficiency, density, scale, and scalability;
  • Public, private and hybrid cloud infrastructures;
  • Data analytics applications like customer relationship management, operational optimization, risk/financial management, and new business models;
  • Internet media applications such as online gaming and video streaming;
  • High-resolution imaging for applications ranging from medicine to oil and gas exploration;
  • “Departmental” uses where a small solution can increase the speed of outcome prediction, engineering analysis, and design and modeling

Mellanox’s technology, combined with the IBM NeXtScale compute density, provides customers with a sustainable competitive advantage in building scale-out compute infrastructures. Customers deploying the joint Mellanox-IBM solution will receive maximum bandwidth, lower power consumption and superior application performance.



Driving Innovation with OpenEthernet

Authored by: Amir Sheffer, Sr. Product Manager

For years, data center Ethernet switching equipment has been based on closed, proprietary vendor implementations, providing very limited flexibility for the user. The progress made in open source applications and software can be leveraged in Ethernet switches to create a new generation of open, flexible and customizable solutions.

Open Source Enables New Solutions / Trends / Technologies

Switches based on the OpenEthernet approach will replace traditional closed-code switches and allow data center customization for optimized and efficient operation. An OpenEthernet switch is based on functionality developed by the equipment vendor, integrated with public, open cores and tools that can be freely downloaded from the internet.

As a leader of this approach, Mellanox is investing in the integration and development of such tools, which, when combined, provide complete functionality. Examples of such tools include OpenFlow for flow configuration, Puppet and Chef for switch configuration, and Quagga for routing protocols.


Mellanox switch software runs over Linux. Although the Linux kernel provides good infrastructure for the switch, it lacks the functionality to connect it to the switching and routing functions. For example, a routing reflector unit is required to synchronize the Linux kernel, the routing stack and the silicon data path. For this purpose, we are developing such “reflector” units and opening them to the community.
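
As a purely conceptual sketch of what such a reflector does, the Python snippet below mirrors kernel route updates into the switch data path. The SwitchSdk class and its methods are hypothetical placeholders, not the actual Mellanox SDK API, and a real reflector would listen to netlink events from the kernel rather than replay a static list.

  # Conceptual sketch only: keep the silicon data path in sync with the
  # Linux routing table.  SwitchSdk and its methods are hypothetical
  # placeholders, not the real SDK API.
  class SwitchSdk(object):
      """Stand-in for a vendor SDK that programs routes into the switch silicon."""
      def add_route(self, prefix, next_hop):
          print("HW: add route %s via %s" % (prefix, next_hop))

      def del_route(self, prefix):
          print("HW: delete route %s" % prefix)

  def reflect(route_events, sdk):
      """Replay kernel route add/delete events into the hardware forwarding table."""
      for action, prefix, next_hop in route_events:
          if action == "add":
              sdk.add_route(prefix, next_hop)
          elif action == "del":
              sdk.del_route(prefix)

  # In a real switch these events would arrive over netlink as the routing
  # stack (e.g. Quagga) updates the kernel; here we feed a static list.
  reflect([("add", "10.0.0.0/24", "192.168.1.1"),
           ("del", "10.0.1.0/24", None)],
          SwitchSdk())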

Another example is the hardware driver, or the software development kit (SDK) application programming interface (API), for the switch. By opening the API to the community, we will be the first to enable full flexibility and ease of implementation for our customers, and we believe others will follow.

In parallel, Mellanox is participating in industry-wide groups that are taking a similar approach. One example is the OpenStack community, in which Mellanox is an active member. Another is the Open Compute Project (OCP), which is defining open, standard equipment for data centers. Mellanox already builds OCP-compatible NICs and has recently contributed the hardware design documents of the SX1024 switch system to OCP.

So far, we have briefly touched on several aspects of OpenEthernet. An important feature that will be explained in the coming weeks is hardware-software separation.

To be continued…