Category Archives: InfiniBand

Deploying HPC Clusters with Mellanox InfiniBand Interconnect Solutions

High-performance simulations require the most efficient compute platforms. The execution time of a given simulation depends on many factors, such as the number of CPU/GPU cores, their utilization, and the interconnect's performance, efficiency, and scalability. Efficient high-performance computing systems require high-bandwidth, low-latency connections between thousands of multi-processor nodes, as well as high-speed storage systems.

Mellanox has released “Deploying HPC Clusters with Mellanox InfiniBand Interconnect Solutions”. This guide describes how to design, build, and test a high-performance computing (HPC) cluster using Mellanox® InfiniBand interconnect, covering the installation and setup of the infrastructure, including:

  • HPC cluster design
  • Installation and configuration of the Mellanox Interconnect components
  • Cluster configuration and performance testing

Author: Scot Schultz is an HPC technology specialist with broad knowledge of operating systems, high-speed interconnects and processor technologies. Joining the Mellanox team in March 2013 as Director of HPC and Technical Computing, Schultz is a 25-year veteran of the computing industry. Prior to joining Mellanox, he spent 17 years at AMD in various engineering and leadership roles, most recently in strategic HPC technology ecosystem enablement. Scot was also instrumental in the growth and development of the OpenFabrics Alliance as co-chair of its board of directors. He currently maintains his role as Director of Educational Outreach and founding member of the HPC Advisory Council, as well as roles in various other industry organizations.

Advancing Applications Performance With InfiniBand

High-performance scientific applications typically require the lowest possible latency in order to keep parallel processes as closely synchronized as possible. In the past, this requirement drove the adoption of SMP machines, where the floating-point elements (CPUs, GPUs) were placed on the same board as much as possible. With increasing demands for compute capability, and the drive to lower the cost of adoption and make large-scale HPC more accessible, we have witnessed the rise of clustering as the preferred architecture for high-performance computing.

We introduce and explore some of the latest advancements in high-speed networking and suggest new usage models that leverage these technologies to meet the requirements of today's demanding applications. The recently launched Mellanox Connect-IB™ InfiniBand adapter introduced a novel high-performance and scalable architecture for high-performance clusters. The architecture was designed from the ground up to provide high performance and scalability for the largest supercomputers in the world, today and in the future.

The device includes a new network transport mechanism called Dynamically Connected Transport™ Service (DCT), which was invented to provide a Reliable Connection transport mechanism (the service behind many of InfiniBand's advanced capabilities, such as RDMA, large message sends, and low-latency kernel bypass) at unlimited cluster sizes. We will also discuss optimizations for MPI collective communications, which are frequently used for process synchronization, and show how their performance is critical for scalable, high-performance applications.
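For readers less familiar with MPI collectives, the snippet below is a minimal sketch, using the generic mpi4py bindings rather than any Mellanox-specific API, of the kind of operation these optimizations target: every rank contributes a value, and no rank can proceed until the combined result is available, which is why collective latency gates application scalability.

    # Minimal illustration of an MPI collective operation (using mpi4py),
    # the class of communication that collective-offload optimizations target.
    # Run with, for example: mpirun -np 4 python allreduce_demo.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Every rank contributes its own partial result...
    local_value = float(rank + 1)

    # ...and the allreduce combines them. No rank can proceed until the
    # reduction has completed across the whole communicator, so the latency
    # of this call is felt by every process in the job.
    global_sum = comm.allreduce(local_value, op=MPI.SUM)

    print(f"rank {rank}: global sum = {global_sum}")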

Presented by:  Pak Lui, Application Performance Manager, Mellanox – August 12, 2013 – International Computing for the Atmospheric Sciences Symposium, Annecy, France

Why I left HP after 19 years to join ProfitBricks

On 02.12.13, in Cloud Computing, by Pete Johnson, new Platform Evangelist

Woz once said, “I thought I’d be an HPer for life.” While I don’t usually claim to have a whole lot in common with the man who designed the first computer I ever saw (an Apple II, summer ’78), in this instance it’s true. As it turns out, we were both wrong.

Pete Johnson, new Platform Evangelist for ProfitBricks

I stayed at HP as long as I did for lots of reasons. Business model diversity is one: over the last two decades, I was lucky enough to be a front-line coder, a tech lead, a project manager, and an enterprise architect while working on web sites for enterprise support, consumer ecommerce sales, enterprise online sales, all forms of marketing, and even post-sales printing press supplies reordering. Most recently I was employee #37 for HP's new public cloud offering, where I performed a lot of roles, including project management of web development teams, customer-facing demonstrations at trade shows, and sales pitches for Fortune 500 CIOs. But I also remained at HP because of the culture and values that came straight from Bill Hewlett and Dave Packard, which my early mentors instilled in me. You can still find those values there today if you look hard enough, and if anybody gets that, Meg Whitman does.

Why leave HP for ProfitBricks then?

So if I still have such a rosy view of HP, despite recent bumpiness, why did I leave to become the Platform Evangelist for ProfitBricks?

Three reasons:

  1. InfiniBand
  2. InfiniBand
  3. InfiniBand

If you are anything like the sample of computer industry veterans I told about my move last week, you just said, “What the heck is InfiniBand?” Let me explain what it is and why it is poised to fundamentally change cloud computing.

Ethernet is the dominant network technology used in data centers today. Originally created during the Carter administration, it uses a hierarchical structure of LAN segments, which ultimately means that packets have exactly one path to traverse when moving from point A to point B anywhere in the network. InfiniBand, a popular 21st-century technology in the supercomputing and high-performance computing (HPC) communities, uses a grid or mesh system that gives packets multiple paths from point A to point B. This key difference, among other nuances, gives InfiniBand a top speed of 80 Gbit/sec, 80 times faster than the 1 Gbit/sec standard Ethernet connections of Amazon's AWS.

What’s the big deal about InfiniBand?

“So what?” you may be thinking. “A faster cloud network is nice, but it doesn’t seem like THAT big a deal.”

Actually, it is a VERY big deal when you stop and think about how a cloud computing provider can take advantage of a network like this.

As founder and CMO Andreas Gauger put it to me during the interview process, virtualization is a game of Tetris in which you are trying to fit virtual machines of various sizes on top of physical hardware to maximize utilization. This is particularly critical for a public cloud provider. With InfiniBand, ProfitBricks can rearrange the pieces, and at 80 Gbit/sec, our hypervisor can move a VM from one physical machine to another without the VM ever knowing. This helps us maximize the physical hardware and keep prices competitive, but it also means two other things for our customers:

  • You can provision any combination of CPU cores and RAM you want, up to and including the size of the full physical hardware we use
  • You can change the number of CPU cores or amount of RAM on the fly, live, without rebooting the VM (a rough sketch of how this kind of live resizing works at the hypervisor level follows this list)
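For the curious, here is a minimal sketch of hypervisor-level live resizing using the generic libvirt Python bindings. This is not ProfitBricks' management API, and the domain name and sizes are made up; it simply illustrates that a running guest's vCPU count and memory can be grown without a reboot.

    # Hedged illustration only: hot-adding vCPUs and memory to a running
    # guest through the generic libvirt API. This is NOT ProfitBricks'
    # management interface; the domain name and sizes are made up.
    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("example-guest")   # hypothetical domain name

    # Grow the live vCPU count (must stay within the domain's configured maximum).
    dom.setVcpusFlags(8, libvirt.VIR_DOMAIN_AFFECT_LIVE)

    # Grow the live memory allocation, in KiB (here 16 GiB), again bounded
    # by the maximum memory defined for the domain.
    dom.setMemoryFlags(16 * 1024 * 1024, libvirt.VIR_DOMAIN_AFFECT_LIVE)

    conn.close()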

In a world where other public cloud providers force you into cookie-cutter VM sizes in an attempt to simplify the game of Tetris for themselves, the first feature is obviously differentiating. But when most people hear the second one, their reaction is that it can't possibly be true; it must be a lie. You can't change virtual hardware on a VM without rebooting it, can you?

No way you can change CPU or RAM without rebooting a VM!

Do you suppose I checked that out before leaving the only employer I've ever known in my adult life?

I spun up a VM, installed Apache, launched a load test from my desktop against the web server I had just created, changed both the CPU cores and RAM on the server instance, confirmed the change at the VM command line, and allowed the load test to end. You know what the load test log showed?

Number of errors: 0.

The Apache web server never went down, despite the virtual hardware change, and kept handling HTTP requests every 40 milliseconds. I never even lost my remote login session. Whoa.
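If you want to reproduce a crude version of that check yourself, a small polling loop is enough. This is only a minimal stand-in for a real load-test tool, and the address and timings are placeholders:

    # Crude stand-in for the load test: poll the web server continuously
    # while the VM's CPU/RAM are being changed, and count any failures.
    # The address and timings are placeholders.
    import time
    import urllib.request

    URL = "http://203.0.113.10/"   # placeholder address of the test VM
    errors = 0
    requests_sent = 0

    deadline = time.time() + 300   # poll for five minutes
    while time.time() < deadline:
        try:
            urllib.request.urlopen(URL, timeout=5).read()
        except Exception:
            errors += 1
        requests_sent += 1
        time.sleep(0.04)           # roughly one request every 40 ms

    print(f"Requests: {requests_sent}, number of errors: {errors}")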

But wait, there’s more (and more to come)

Throw in the fact that the ProfitBricks block storage platform takes advantage of InfiniBand to provide not only RAID 10 redundancy, but RAID 10 mirrored across two availability zones, and I was completely sold. I realized that ProfitBricks founder, CTO, and CEO Achim Weiss took the data center efficiency knowledge that gave 1&1 a tremendous price advantage and combined it with supercomputing technology to create a cloud computing game-changer that his engineering team is just beginning to tap into. I can't wait to see what they do with object storage, databases, and everything else that you'd expect from a full IaaS offering. I had to be a part of that.

Simply put: ProfitBricks uses InfiniBand to enable Cloud Computing 2.0.

And that’s why, after 19 years, I left HP.

Mellanox InfiniBand and Ethernet Switches Receive IPv6 Certification

I am proud to announce that Mellanox's SwitchX® line of InfiniBand and Ethernet switches has received a gold certification for Internet Protocol version 6 (IPv6) from the Internet Protocol Forum. Adding IPv6 support to our SwitchX series is another milestone for Mellanox's InfiniBand and Ethernet interconnect solutions, and it demonstrates our commitment to producing quality, interoperable InfiniBand and Ethernet products optimized for the latest Internet protocols.

SX1036 - 36-port 40GbE Switch

Mellanox's drive to satisfy stringent requirements led to this gold certification under the IPv6 Ready Logo Program, a conformance and interoperability testing program designed to increase user confidence in IPv6 as the future of network architecture.

We at Mellanox feel that as global technology adoption rates increase, there is a greater need for larger networks and, consequently, more IP addresses. As background, Internet Protocol version 4 (IPv4), still in dominant use, is now reaching the limit of its capacity. The next generation of IP, IPv6, is designed to provide a vastly expanded address space: it quadruples the number of network address bits from 32 bits in IPv4 to 128 bits, providing more than enough globally unique IP addresses for every networked device on the planet.
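The difference in scale is easy to verify with a quick back-of-the-envelope check using Python's standard library:

    # Back-of-the-envelope comparison of the IPv4 and IPv6 address spaces.
    import ipaddress

    ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses   # 2**32
    ipv6_total = ipaddress.ip_network("::/0").num_addresses        # 2**128

    print(f"IPv4 addresses: {ipv4_total:,}")   # about 4.3 billion
    print(f"IPv6 addresses: {ipv6_total:,}")   # about 3.4 * 10**38
    print(f"IPv6 is 2**{ipv6_total.bit_length() - ipv4_total.bit_length()} times larger")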

Regards,

Amit Katz

Director, Product Management

National Supercomputing Centre in Shenzhen (NSCS) – #2 on June 2010 Top500 list

I had the pleasure of being a little bit involved in the creation of the fastest supercomputer in Asia, and the second-fastest supercomputer in the world: the Dawning “Nebulae” petaflop supercomputer at SIAT. If we look at the peak flops capacity of the system, nearly 3 petaflops, it is the largest supercomputer in the world. I visited the supercomputer site in April and saw how fast it was assembled. It took around three weeks to get it up and running, which is amazing and is one of the benefits of using a cluster architecture instead of expensive proprietary systems. The first picture, by the way, was taken during the system setup in Shenzhen.

The system includes 5,200 Dawning TC3600 blades, each with an NVIDIA Fermi GPU, providing 120K cores, all connected with Mellanox ConnectX InfiniBand QDR adapters, IS5000 switches and the fabric management. It is the third system in the world to deliver sustained petaflop performance (after Roadrunner and Jaguar). Unlike Cray's Jaguar, which requires around 20K nodes to reach that performance, Nebulae does it with only 5.2K nodes, reducing the required real estate and making it much more cost-effective. It is yet more proof that commodity-based supercomputers can deliver better performance, better cost/performance, and better results on other such metrics than proprietary systems. As GPUs gain popularity, we are also witnessing the effort being made to create and port the needed applications to GPU-based environments, which will bring a new era of GPU computing. It is clear that GPUs will drive the next phase of supercomputers, along with new interconnect speeds and feeds (such as the IBTA's new specifications for FDR/EDR InfiniBand).

The second picture was taken at the ISC’10 conference, after the Top500 award ceremony. You can see the Top500 certificates…

Regards,

Gilad Shainer
Shainer@mellanox.com

The biggest winner of the new June 2010 Top500 Supercomputers list? InfiniBand!

Published twice a year, the Top500 supercomputers list ranks the world's fastest supercomputers and provides a great indication of HPC market trends and usage models, as well as a tool for future predictions. The 35th release of the Top500 list was just published, and according to the new results, InfiniBand has become the de facto interconnect technology for high-performance computing.

What hasn't been said about InfiniBand by the competition? Too many times I have heard that InfiniBand is dead and that Ethernet is the killer. I just sit in my chair and laugh. InfiniBand is the only interconnect that is growing on the Top500 list, with more than 30% growth year over year (YoY), and it is growing by continuing to uproot Ethernet and proprietary solutions. Ethernet is down 14% YoY, and it has become very difficult to spot a proprietary clustered interconnect. Even more telling, in the hard core of HPC, the Top100, 64% of the systems use InfiniBand solutions from Mellanox. InfiniBand has proven that it provides the needed scalability, efficiency and performance, and that it really delivers the highest CPU or GPU availability to the user and the applications. Connecting 208 systems on the list, it is only steps away from connecting the majority of the systems.

What makes InfiniBand so strong? The fact that it solves issues rather than shifting them to other parts of the system. In a balanced HPC system, each component needs to do its own work and not rely on other components to handle its overhead. Mellanox does a great job of providing solutions that offload all of the communications, provide the needed acceleration for the CPU or GPU, and maximize the CPU/GPU cycles available to the applications. The collaborations with NVIDIA on NVIDIA GPUDirect, Mellanox CORE-Direct and so forth are just a few examples.

GPUDirect is a great example of how Mellanox can free the CPU from being involved in GPU-to-GPU communications. No other InfiniBand vendor can do it without using Mellanox technology. GPUDirect requires network offloading, or it does not work. Simple. If you want to remove the CPU from GPU-to-GPU communications but your interconnect needs the CPU to handle the transport (because it is an onloading solution), then the CPU is still involved in every GPU transaction. Only offloading interconnects, such as Mellanox InfiniBand, can really deliver the benefits of GPUDirect.

If you want more information on GPUDirect and other solutions, feel free to drop a note to hpc@mellanox.com.

Gilad

Visit Mellanox at ISC’10

It's almost time for ISC'10 in Hamburg, Germany (May 31 to June 3). Please stop by the Mellanox Technologies booth (#331) to learn more about how our products deliver market-leading bandwidth, performance, scalability, power conservation and cost-effectiveness while converging multiple legacy network technologies into one future-proof solution.

Mellanox’s end-to-end 40Gb/s InfiniBand connectivity products deliver the industry’s leading CPU efficiency rating on the TOP500. Come see our application acceleration and offload technologies that decrease run time and increase cluster productivity.

Hear from our HPC Industry Experts

Exhibitor Forum Session – Tuesday, June 1, 9:40AM – 10:10AM

Speaking: Gilad Shainer, Sr. Director of HPC Marketing / Michael Kagan, CTO

HOT SEAT SESSION – Tuesday, June 1, 3:15PM – 3:30PM

Speaking: Michael Kagan, CTO

JuRoPA Breakfast Session – Wednesday, June 2, 7:30AM – 8:45AM

Speaking: Gilad Shainer, Sr. Director of HPC Marketing / Michael Kagan, CTO

“Low Latency, High Throughput, RDMA & the Cloud In-Between” – Wednesday, June 2, 10:00AM – 10:30AM

Speaking: Gilad Shainer, Sr. Director of HPC Marketing

“Collectives Offloads for Large Scale Systems” – Thursday, June 3, 11:40AM – 12:20PM

Speaking: Gilad Shainer, Mellanox Technologies; Prof. Dr. Richard Graham, Oak Ridge National Laboratory

“RoCE – New Concept of RDMA over Ethernet” – Thursday, June 3, 12:20PM – 1:00PM

Speaking: Gilad Shainer, Sr. Director of HPC Marketing and Bill Lee, Sr. Product Marketing Manager

GPU-Direct Technology – Accelerating GPU-Based Systems

The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics accelerators a compelling platform for computationally demanding tasks in a wide variety of application domains. Due to the great computational power of the GPU, the GPGPU method has proven valuable in various areas of science and technology.

GPU-based clusters are being used to perform compute-intensive tasks such as finite element computations, computational fluid dynamics, and Monte Carlo simulations. Several of the world's leading supercomputers use GPUs to achieve the desired performance. Since GPUs provide high core counts and strong floating-point capability, high-speed networking such as InfiniBand is required to connect the GPU platforms and provide the needed throughput and the lowest latency for GPU-to-GPU communications.

While GPUs have been shown to provide worthwhile performance acceleration, yielding benefits for both price/performance and power/performance, several areas of GPU-based clusters could be improved in order to provide higher performance and efficiency. One of the main performance issues with deploying clusters of multi-GPU nodes involves the interaction between the GPUs, or the GPU-to-GPU communication model. Prior to GPU-Direct technology, any communication between GPUs had to involve the host CPU and required buffer copies. The GPU communication model required the CPU to initiate and manage memory transfers between the GPUs and the InfiniBand network. Each GPU-to-GPU communication had to follow these steps (a rough code sketch of this staging path appears after the list):

  1. The GPU writes data to host memory dedicated to the GPU
  2. The host CPU copies the data from the GPU-dedicated host memory to host memory available for the InfiniBand device to use for RDMA communications
  3. The InfiniBand device reads the data from that area and sends it to the remote node
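To make that staging path concrete, here is a rough sketch of the three steps using pycuda and mpi4py as stand-ins; buffer sizes and names are illustrative, and in an MPI-over-InfiniBand stack the final send is where the RDMA transfer happens.

    # Rough sketch of the pre-GPU-Direct staging path, with pycuda and mpi4py
    # standing in for the CUDA and InfiniBand (MPI/RDMA) layers.
    # Buffer sizes and names are illustrative. Run with two MPI ranks.
    import numpy as np
    import pycuda.autoinit            # create a CUDA context on this rank
    import pycuda.driver as cuda
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    n = 1 << 20                       # one million floats, as an example

    # Stand-in for data produced by a kernel and living in GPU memory.
    gpu_buf = cuda.mem_alloc(n * 4)
    cuda.memcpy_htod(gpu_buf, np.ones(n, dtype=np.float32))

    # Step 1: the GPU's data lands in host memory dedicated to the GPU.
    gpu_host_staging = cuda.pagelocked_empty(n, dtype=np.float32)
    cuda.memcpy_dtoh(gpu_host_staging, gpu_buf)

    # Step 2: the host CPU copies it into a buffer the InfiniBand stack
    # can use for RDMA (here, an ordinary MPI send buffer).
    ib_send_buf = np.empty_like(gpu_host_staging)
    np.copyto(ib_send_buf, gpu_host_staging)

    # Step 3: the InfiniBand device reads that buffer and sends it to the
    # remote node (MPI over InfiniBand performs the actual RDMA transfer).
    if comm.Get_rank() == 0:
        comm.Send(ib_send_buf, dest=1, tag=0)
    elif comm.Get_rank() == 1:
        recv_buf = np.empty(n, dtype=np.float32)
        comm.Recv(recv_buf, source=0, tag=0)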

Gilad Shainer
Senior Director of HPC and Technical Marketing

InfiniBand Leads the Russian Top50 Supercomputers List; Connects 74 Percent, Including Seven of the Top10 Supercomputers

Announced last week, the Russia TOP50 lists the fastest computers in Russia ranked according to Linpack benchmark results.  This list provides an important tool for tracking usage trends in high-performance computing in Russia.

Mellanox 40Gb/s InfiniBand adapters and switches enable the fastest supercomputer on the Russian Top50 list, with a peak performance of 414 teraflops. More importantly, it is clear that InfiniBand dominates the list as the most used interconnect solution, connecting 37 systems, including the top three systems and seven of the Top10. InfiniBand's high system efficiency and utilization allow users to maximize the return on investment of their high-performance computing server and storage infrastructure, demonstrating up to 92 percent efficiency on the Linpack benchmark. Nearly three quarters of the list, represented by leading research laboratories, universities, industrial companies and banks in Russia, rely on industry-leading InfiniBand solutions to provide the highest bandwidth, efficiency, scalability, and application performance.
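For readers unfamiliar with the metric, Linpack efficiency is simply the measured result (Rmax) divided by the theoretical peak (Rpeak); the numbers in the example below are hypothetical and chosen only to show what 92 percent efficiency means.

    # Linpack efficiency is the measured result (Rmax) divided by the
    # theoretical peak (Rpeak). These numbers are hypothetical, chosen
    # only to illustrate what "92 percent efficiency" means.
    rpeak_tflops = 100.0   # hypothetical theoretical peak, in TFlops
    rmax_tflops = 92.0     # hypothetical measured Linpack result, in TFlops

    efficiency_pct = rmax_tflops / rpeak_tflops * 100
    print(f"Linpack efficiency: {efficiency_pct:.0f}%")   # -> 92%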

Highlights of InfiniBand usage on the April 2010 Russia TOP50 list include:

  • Mellanox InfiniBand connects 74 percent of the Top50 list, including seven of the Top10 most prestigious positions (#1, #2, #3, #6, #8, #9 and #10)
  • Mellanox InfiniBand provides world-leading system utilization, up to 92 percent efficiency as measured by the Linpack benchmark
  • The list showed a sharp increase in aggregate performance: the total peak performance exceeded 1 PFlops, reaching 1152.9 TFlops, an increase of 120 percent compared to the September 2009 list, highlighting the increasing demand for higher performance
  • Ethernet connects only 14 percent of the list (seven systems), and there were no 10GigE clusters
  • Proprietary clustering interconnects declined 40 percent, connecting only three systems on the list

I look forward to seeing the results of the Top500 in June at the International Supercomputing Conference. I will be attending and look forward to catching up with all of our HPC friends in Germany.

Brian Sparks
Sr. Director of Marketing Communications

Partners Healthcare Cuts Latency of Cloud-based Storage Solution Using Mellanox InfiniBand Technology

An interesting article just came out from Dave Raffo at SearchStorage.com. I have a quick summary below, but you should certainly read the full article: “Health care system rolls its own data storage ‘cloud’ for researchers.”

Partners HealthCare, a non-profit organization founded in 1994 by Brigham and Women’s Hospital and Massachusetts General Hospital, is an integrated health care system that offers patients a continuum of coordinated high-quality care.

Over the past few years, ever-increasing advances in the resolution and accuracy of medical devices and instrumentation technologies have led to an explosion of data in biomedical research. Partners recognized early on that a Cloud-based research compute and storage infrastructure could be a compelling alternative for their researchers. Not only would it enable them to distribute costs and provide storage services on demand, but it would save on IT management time that was spent fixing all the independent research computers distributed across the Partners network.

Initially, Partners HealthCare chose Ethernet as the network transport technology. As demand grew, the solution began hitting significant performance bottlenecks, particularly when reading and writing hundreds of thousands of small files. The issue was found to lie with the interconnect: Ethernet created problems due to its high natural latency. To provide a scalable, low-latency solution, Partners HealthCare turned to InfiniBand. With InfiniBand on the storage back end, Partners experienced roughly two orders of magnitude faster read times. “One user had over 1,000 files, but only took up 100 gigs or so,” said Brent Richter, corporate manager for enterprise research infrastructure and services at Partners HealthCare System. “Doing that with Ethernet would take about 40 minutes just to list that directory. With InfiniBand, we reduced that to about a minute.”

Also, Partners chose InfiniBand over 10-Gigabit Ethernet because InfiniBand is a lower latency protocol. “InfiniBand was price competitive and has lower latency than 10-Gig Ethernet,” he said.

Richter said the final price tag came to about $1 per gigabyte.

By integrating Mellanox InfiniBand into the storage solution, Partners HealthCare was able to reduce latency to near zero and increase performance, providing its customers with faster response times and higher capacity.

Till next time,

Brian Sparks

Sr. Director, Marketing Communication