Tag Archives: High Performance Computing (HPC)

Mellanox Results are the Best on TopCrunch

The HPC Advisory Council published a best practices paper showing record application performance for LS-DYNA® Automotive Crash Simulation, one of the industry’s most computationally and network-intensive applications for automotive design and safety. The paper can be downloaded here: HPC Advisory Council: LS-DYNA Performance Benchmark and Profiling.

 

The LS-DYNA benchmarks were run on a Dell™ PowerEdge R720-based cluster of 32 nodes, with networking provided by Mellanox Connect-IB™ 56Gb/s InfiniBand adapters and switches. The results demonstrate that the combined solution delivers world-leading performance versus any system of comparable size, as well as versus larger core-count supercomputers based on Ethernet or proprietary interconnects.

 

The TopCrunch project is used to track the aggregate performance trends of high performance computer systems and engineering software.  Rather than using a synthetic benchmark, actual engineering software applications are used with real datasets and run on high performance computer systems.

 


Author: Scot Schultz is an HPC technology specialist with broad knowledge of operating systems, high-speed interconnects, and processor technologies. Prior to joining Mellanox, he spent 17 years at AMD in various engineering and leadership roles, most recently in strategic HPC technology ecosystem enablement. Scot was also instrumental in the growth and development of the OpenFabrics Alliance as co-chair of its board of directors. Follow him on Twitter: @ScotSchultz.

 

InfiniBand Enables the Most Powerful Cloud: Windows Azure

Windows Azure continues to be the leader in High-Performance Computing cloud services. Delivering an HPC solution built on top of Windows Server technology and Microsoft HPC Pack, Windows Azure offers the performance and scalability of a world-class supercomputing center to everyone, on demand, in the cloud.

 

Customers can now run compute-intensive workloads such as parallel Message Passing Interface (MPI) applications with HPC Pack in Windows Azure. By choosing compute-intensive instances such as A8 and A9 for their cloud compute resources, customers can deploy these resources on demand in Windows Azure in a “burst to the cloud” configuration, and take advantage of InfiniBand interconnect technology with low latency and high throughput, including Remote Direct Memory Access (RDMA) technology for maximum efficiency. The new high-performance A8 and A9 compute instances also provide customers with ample memory and the latest CPU technology.
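
To make the MPI part concrete, here is a minimal sketch (illustrative only, not an official Azure or HPC Pack sample) of the kind of MPI program such a deployment runs; each rank simply reports which compute instance it landed on:

```c
/* Minimal MPI sketch (illustrative only, not an Azure or HPC Pack sample):
 * each rank reports which compute instance it is running on. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(host, &len);     /* name of the node we landed on */

    printf("rank %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```

Under HPC Pack such a binary would typically be launched with mpiexec across the allocated A8/A9 instances, with the RDMA-capable InfiniBand fabric carrying the MPI traffic between them.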

 

The new Windows Azure services can burst and scale on demand, deploying Virtual Machines and Cloud Services when users require them. Learn more about the new Azure services: http://www.windowsazure.com/en-us/solutions/big-compute/

Author: Eli Karpilovski manages Cloud Market Development at Mellanox Technologies. In addition, Mr. Karpilovski serves as the Cloud Advisory Council Chairman. Previously, he served as product manager for the HCA Software division at Mellanox Technologies. Mr. Karpilovski holds a Bachelor of Science in Engineering from the Holon Institute of Technology and a Master of Business Administration from The Open University of Israel. Follow him on Twitter.

Process Affinity: Hop on the Bus, Gus!

This is an excerpt of a post published today on the Cisco HPC Networking blog by Joshua Ladd, Mellanox:

At some point in the process of pondering this blog post I noticed that my subconscious had, much to my annoyance, registered a snippet of the chorus to Paul Simon’s timeless classic “50 Ways to Leave Your Lover” with my brain’s internal progress thread. Seemingly, endlessly repeating, billions of times over (well, at least ten times over) the catchy hook that offers one, of presumably 50, possible ways to leave one’s lover – “Hop on the bus, Gus.” Assuming Gus does indeed wish to extricate himself from a passionate predicament, this seems a reasonable suggestion. But, supposing Gus has a really jilted lover; his response to Mr. Simon’s exhortation might be “Just how many hops to that damn bus, Paul?”

HPC practitioners may find themselves asking a similar question, though in a somewhat less contentious context (pun intended). Given the complexity of modern HPC systems with their increasingly stratified memory subsystems and myriad ways of interconnecting memory, networking, computing, and storage components such as NUMA nodes, computational accelerators, host channel adapters, NICs, VICs, JBODs, Target Channel Adapters, etc., reasoning about process placement has become a much more complex task with much larger performance implications between the “best” and the “worst” placement policies. To compound this complexity, the “best” and “worst” placement necessarily depends upon the specific application instance and its communication and I/O pattern. Indeed, an in-depth discussion of Open MPI’s sophisticated process affinity system is far beyond the scope of this humble blog post, and I refer the interested reader to the deep-dive talk Jeff Squyres (Cisco) gave at EuroMPI on this topic.

In this posting I’ll only consider the problem framed by Gus’ hypothetical query: how can one map MPI processes as close to an I/O device as possible, thereby minimizing data movement, or ‘hops’, through the intranode interconnect for those processes? This is a very reasonable request, but the ability to automate this process has remained mostly absent in modern HPC middleware. Fortunately, powerful tools such as “hwloc” are available to help us with just such a task. Hwloc usually manipulates processing units and memory, but it can also discover I/O devices and report their locality as well. In simplest terms, this can be leveraged to place I/O-intensive applications on cores near the I/O devices they use. Whereas Gus probably didn’t have the luxury to choose his locality so as to minimize the number of hops necessary to get on his bus, Open MPI, with the help of hwloc, now provides a mechanism for mapping MPI processes to NUMA nodes “closest” to an I/O device.
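
To see the raw information hwloc exposes, here is a minimal sketch against hwloc’s C API (assuming hwloc 2.x and its I/O type filter) that lists each OS device, such as an InfiniBand HCA, together with the cpuset of its nearest non-I/O ancestor, i.e. the cores “closest” to that device:

```c
/* Minimal hwloc sketch: report the cores nearest each I/O (OS) device.
 * Assumes hwloc 2.x; compile with e.g. gcc locality.c -lhwloc */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_obj_t osdev = NULL;

    hwloc_topology_init(&topo);
    /* keep I/O objects (PCI and OS devices) in the discovered topology */
    hwloc_topology_set_io_types_filter(topo, HWLOC_TYPE_FILTER_KEEP_ALL);
    hwloc_topology_load(topo);

    /* walk every OS device (e.g. an HCA, a NIC, a disk) and print the cpuset
     * of its closest non-I/O ancestor, i.e. its local NUMA neighborhood */
    while ((osdev = hwloc_get_next_osdev(topo, osdev)) != NULL) {
        hwloc_obj_t parent = hwloc_get_non_io_ancestor_obj(topo, osdev);
        char cpus[256] = "unknown";
        if (parent && parent->cpuset)
            hwloc_bitmap_snprintf(cpus, sizeof(cpus), parent->cpuset);
        printf("%-12s nearest cpuset: %s\n", osdev->name, cpus);
    }

    hwloc_topology_destroy(topo);
    return 0;
}
```

This is exactly the kind of locality information a launcher, or Open MPI itself, can use when binding ranks to the cores nearest the HCA they will drive.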

Read the full text of the blog here.

Joshua Ladd is an Open MPI developer and HPC algorithms engineer at Mellanox Technologies. His primary interests reside in algorithm design and development for extreme-scale high performance computing systems. Prior to joining Mellanox Technologies, Josh was a staff research scientist at Oak Ridge National Laboratory, where he was engaged in R&D on high-performance communication middleware. Josh holds a B.S., M.S., and Ph.D., all in applied mathematics.

 

Deploying HPC Clusters with Mellanox InfiniBand Interconnect Solutions

High-performance simulations require the most efficient compute platforms. The execution time of a given simulation depends upon many factors, such as the number of CPU/GPU cores and their utilization, and the interconnect’s performance, efficiency, and scalability. Efficient high-performance computing systems require high-bandwidth, low-latency connections between thousands of multi-processor nodes, as well as high-speed storage systems.

Mellanox has released “Deploying HPC Clusters with Mellanox InfiniBand Interconnect Solutions”. This guide describes how to design, build, and test a high-performance computing (HPC) cluster using Mellanox® InfiniBand interconnect, covering the installation and setup of the infrastructure, including:

  • HPC cluster design
  • Installation and configuration of the Mellanox Interconnect components
  • Cluster configuration and performance testing
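
As a flavor of the performance-testing step, the sketch below is a minimal MPI ping-pong latency test (illustrative only, not taken from the guide; production clusters would normally use established suites such as the OSU micro-benchmarks). Two ranks bounce a small message back and forth and report the average one-way latency across the fabric:

```c
/* Minimal MPI ping-pong latency sketch (illustrative, not from the guide).
 * Rank 0 and rank 1 bounce a small message and report average one-way latency. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;
    char buf[8] = {0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);          /* start both ranks together */
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("average one-way latency: %.2f us\n",
               (t1 - t0) * 1e6 / (2.0 * iters));

    MPI_Finalize();
    return 0;
}
```

Run across two nodes of the cluster (for example, mpirun -np 2 --host node1,node2 ./pingpong, with placeholder host names), a healthy InfiniBand link typically reports latencies on the order of a microsecond.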

 

Author: Scot Schultz is an HPC technology specialist with broad knowledge of operating systems, high-speed interconnects, and processor technologies. Joining the Mellanox team in March 2013 as Director of HPC and Technical Computing, Schultz is a 25-year veteran of the computing industry. Prior to joining Mellanox, he spent 17 years at AMD in various engineering and leadership roles, most recently in strategic HPC technology ecosystem enablement. Scot was also instrumental in the growth and development of the OpenFabrics Alliance as co-chair of its board of directors. Scot currently maintains his role as Director of Educational Outreach and founding member of the HPC Advisory Council, and is active in various other industry organizations.

Why I left HP after 19 years to join ProfitBricks

On 02.12.13, in Cloud Computing, by Pete Johnson, new Platform Evangelist

Woz once said, “I thought I’d be an HPer for life.” While I don’t usually claim to have a whole lot in common with the man who designed the first computer I ever saw (an Apple II, summer ’78), in this instance it’s true. As it turns out, we were both wrong.


I stayed at HP as long as I did for lots of reasons. Business model diversity is one: over the last two decades, I was lucky enough to be a front-line coder, a tech lead, a project manager, and an enterprise architect while working on web sites for enterprise support, consumer ecommerce sales, enterprise online sales, all forms of marketing, and even post-sales printing press supplies reordering. Most recently I was employee #37 for HP’s new public cloud offering, where I performed a lot of roles, including project management of web development teams, customer-facing demonstrations at trade shows, and sales pitches for Fortune 500 CIOs. But I also remained at HP because of the culture and values that came straight from Bill Hewlett and Dave Packard, which my early mentors instilled in me. You can still find those values there today if you look hard enough, and if anybody gets that, Meg Whitman does.

Why leave HP for ProfitBricks then?

So if I still have such a rosy view of HP, despite recent bumpiness, why did I leave to become the Platform Evangelist for ProfitBricks?

Three reasons:

  1. InfiniBand
  2. InfiniBand
  3. InfiniBand

If you are anything like the sample of computer industry veterans I told about my move last week, you just said, “What the heck is InfiniBand?” Let me explain what it is and why it is poised to fundamentally change cloud computing.

Ethernet is the dominant network technology used in data centers today. Originally created during the Carter administration, it uses a hierarchical structure of LAN segments, which ultimately means that packets have exactly one path to traverse when moving from point A to point B anywhere in the network. InfiniBand, a popular 21st-century technology in the supercomputing and high-performance computing (HPC) communities, uses a grid or mesh system that gives packets multiple paths from point A to point B. This key difference, among other nuances, gives InfiniBand a top speed of 80 Gbits/sec, 80x faster than the standard 1 Gbit/sec Ethernet connections of Amazon’s AWS.

What’s the big deal about InfiniBand?

“So what?” you may be thinking. “A faster cloud network is nice, but it doesn’t seem like THAT big a deal.”

Actually, it is a VERY big deal when you stop and think about how a cloud computing provider can take advantage of a network like this.

As founder and CMO Andreas Gauger put it to me during the interview process, virtualization is a game of Tetris in which you are trying to fit various sizes of Virtual Machines on top of physical hardware to maximize utilization. This is particularly critical for a public cloud provider. With InfiniBand, ProfitBricks can rearrange the pieces, and at 80 Gbits/sec, our hypervisor can move a VM from one physical machine to another without the VM ever knowing. This helps us maximize the physical hardware and keep prices competitive, but it also means two other things for our customers:

  • You can provision any combination of CPU cores and RAM you want, up to and including the size of the full physical hardware we use
  • You can change the number of CPU cores or amount of RAM on-the-fly, live, without rebooting the VM

In a world where other public cloud providers force you into cookie-cutter VM sizes in an attempt to simplify the game of Tetris for themselves, the first feature is obviously differentiating. But when most people hear the second one, their reaction is that it can’t possibly be true — it must be a lie. You can’t change virtual hardware on a VM without rebooting it, can you?

No way you can change CPU or RAM without rebooting a VM!

Do you suppose I’d check that out before leaving the only employer I’ve ever known in my adult life?

I spun up a VM, installed Apache, launched a load test from my desktop against the web server I had just created, changed both the CPU cores and RAM on the server instance, confirmed the change at the VM command line, and allowed the load test to end. You know what the load test log showed?

Number of errors: 0.

The Apache web server never went down, despite the virtual hardware change, and handled HTTP requests every 40 milliseconds. I never even lost my remote login session. Whoa.

But wait, there’s more (and more to come)

Throw in the fact that the ProfitBricks block storage platform takes advantage of InfiniBand to provide not only RAID 10 redundancy, but RAID 10 mirrored across two availability zones, and I was completely sold. I realized that ProfitBricks founder, CTO, and CEO Achim Weiss took the data center efficiency knowledge that gave 1&1 a tremendous price advantage and combined it with supercomputing technology to create a cloud computing game-changer that his engineering team is just beginning to tap into. I can’t wait to see what they do with object storage, databases, and everything else that you’d expect from a full IaaS offering. I had to be a part of that.

Simply put: ProfitBricks uses InfiniBand to enable Cloud Computing 2.0.

And that’s why, after 19 years, I left HP.