One of the biggest catchphrases in modern science is the human genome, the DNA code that largely predetermines who we are and many of our medical outcomes. By mapping and analyzing the structure of the human genetic code, scientists and doctors have already started to identify the causes of many diseases and to pinpoint effective treatments based on the specific genetic sequence of a given patient. With the advanced data that such analysis provides, doctors can offer more targeted strategies for potentially terminal patients at times when no other clinically relevant treatment options exist.
The University of Edinburgh’s entry into the ISC 2014 Student Cluster Competition, EPCC, has been awarded first place in the LINPACK test. The EPCC team harnessed Boston’s HPC cluster to break the 10 Tflops mark for the first time, beating the previous record of 9.27 Tflops set by students at ASC14 earlier this month. The team recorded a score of 10.14 Tflops at 3.38 Tflops/kW, an efficiency that would rank #4 in the Green500, a list of the most energy-efficient supercomputers in the world.
This achievement was made possible by a high-performance, liquid-cooled GPU cluster provided by Boston. The system consisted of four 1U Supermicro servers, each comprising two Intel® Xeon™ ‘Ivy Bridge’ processors and two NVIDIA® Tesla K40 GPUs, connected with Mellanox FDR 56Gb/s InfiniBand adapters, switches and cables.
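As a quick sanity check on the figures quoted above, the short sketch below derives the power draw implied by the reported score and efficiency, and converts the efficiency into the MFLOPS-per-watt unit the Green500 list uses. It relies only on the numbers stated in the post; the variable names are illustrative, and the result is an implied average, not a measured value.

```python
# Figures quoted in the post.
rmax_tflops = 10.14               # reported LINPACK score
efficiency_tflops_per_kw = 3.38   # reported energy efficiency
previous_record_tflops = 9.27     # previous student record (ASC14)

# Power draw implied by score / efficiency.
power_kw = rmax_tflops / efficiency_tflops_per_kw

# 1 Tflops/kW = 1e12 flops per 1e3 W = 1000 MFLOPS/W.
efficiency_mflops_per_w = efficiency_tflops_per_kw * 1000

# Relative improvement over the previous record, in percent.
improvement_pct = (rmax_tflops / previous_record_tflops - 1) * 100

print(f"implied power: {power_kw:.2f} kW")
print(f"efficiency:    {efficiency_mflops_per_w:.0f} MFLOPS/W")
print(f"improvement:   {improvement_pct:.1f}% over the previous record")
```

The implied draw of roughly 3 kW across four nodes is consistent with a small, tightly power-budgeted competition cluster.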
As data continues to grow exponentially, storing today’s data volumes efficiently is a challenge. Many traditional storage solutions neither scale out nor make it feasible, from a CapEx and OpEx perspective, to deploy petabyte or exabyte data stores.
In this newly published whitepaper, we summarize the installation and performance benchmarks of a Ceph storage solution. Ceph is a massively scalable, open source, software-defined storage solution, which uniquely provides object, block and file system services with a single, unified Ceph storage cluster. The testing emphasizes the careful network architecture design necessary to handle users’ data throughput and transaction requirements.
Mellanox recently announced a collaboration with IBM to produce tightly integrated server and storage solutions that incorporate our end-to-end FDR 56Gb/s InfiniBand and 10/40 Gigabit Ethernet interconnect solutions with IBM POWER CPUs. Combining IBM POWER CPUs with the world’s highest-performance interconnect solution will drive data at optimal rates, maximizing performance and efficiency for all types of applications and workloads, and will enable dynamic storage solutions that allow multiple applications to efficiently share data repositories.
Advances in high-performance applications are enabling analysts, researchers, scientists and engineers to run more complex and detailed simulations and analyses in a bid to gather game-changing insights and deliver new products to market. This is placing greater demand on existing IT infrastructures, driving a need for instant access to resources – compute, storage, and network.
Companies are looking for faster and more efficient ways to drive business value from their applications and data. The combination of IBM processor technologies and Mellanox high-speed interconnect solutions can provide clients with an advanced and efficient foundation to achieve their goals.
Shout out to anyone attending the GPU Technology Conference 2014! This conference is touted as the world’s biggest and most important GPU developer conference. Follow all the social conversation around the event using the hashtag #GTC2014. The conference will be held next week, March 24-27, 2014, at the San Jose McEnery Convention Center in San Jose, CA.
This is the fourth year I am attending this event, and I will be hanging out at the “Ask the Expert Table” at the GTC. Feel free to swing by and chat about any burning questions you may have on GPUDirect RDMA with Mellanox InfiniBand!
New advances in Big Data applications are enabling analysts, researchers, scientists and engineers to run more complex and detailed simulations and analyses than ever before. These applications deliver game-changing insights, bring new products to market and place greater demand on existing IT infrastructures.
This ever-growing demand drives the need for instant access to resources – compute, storage, and network. Users are seeking cutting-edge technologies and tools to help them better capture, understand and leverage increasing volumes of data, as well as build infrastructures that are energy-efficient and can easily scale as their business grows.
The HPC Advisory Council published a best practices paper showing record application performance for LS-DYNA® Automotive Crash Simulation, one of the automotive industry’s most computationally and network-intensive applications for automotive design and safety. The paper can be downloaded here: HPC Advisory Council: LS-DYNA Performance Benchmark and Profiling.
The LS-DYNA benchmarks were run on a 32-node Dell™ PowerEdge R720-based cluster, with networking provided by Mellanox Connect-IB™ 56Gb/s InfiniBand adapters and switch. The results demonstrate that the combined solution delivers world-leading performance versus any other system of this size, and versus supercomputers with larger core counts that are based on Ethernet or proprietary interconnects.
The TopCrunch project is used to track the aggregate performance trends of high performance computer systems and engineering software. Rather than using a synthetic benchmark, actual engineering software applications are used with real datasets and run on high performance computer systems.
Author: Scot Schultz is an HPC technology specialist with broad knowledge of operating systems, high-speed interconnects and processor technologies. Prior to joining Mellanox, he spent 17 years at AMD in various engineering and leadership roles, most recently in strategic HPC technology ecosystem enablement. Scot was also instrumental in the growth and development of the OpenFabrics Alliance as co-chair of the board of directors. Follow him on Twitter: @ScotSchultz.
Windows Azure continues to be the leader in High-Performance Computing Cloud services. Delivering an HPC solution built on top of Windows Server technology and Microsoft HPC Pack, Windows Azure offers the performance and scalability of a world-class supercomputing center to everyone, on demand, in the cloud.
Customers can now run compute-intensive workloads such as parallel Message Passing Interface (MPI) applications with HPC Pack in Windows Azure. By choosing compute-intensive instances such as A8 and A9 for the cloud compute resources, customers can deploy these resources on demand in Windows Azure in a “burst to the cloud” configuration, and take advantage of low-latency, high-throughput InfiniBand interconnect technology, including Remote Direct Memory Access (RDMA) for maximum efficiency. The new high-performance A8 and A9 compute instances also provide customers with ample memory and the latest CPU technology.
The new Windows Azure services can burst and scale on demand, deploying virtual machines and cloud services when users require them. Learn more about Azure’s new services: http://www.windowsazure.com/en-us/solutions/big-compute/
Author: Eli Karpilovski manages Cloud Market Development at Mellanox Technologies. In addition, Mr. Karpilovski serves as the Cloud Advisory Council Chairman. Mr. Karpilovski served as product manager for the HCA Software division at Mellanox Technologies. Mr. Karpilovski holds a Bachelor of Science in Engineering from the Holon Institute of Technology and a Master of Business Administration from The Open University of Israel. Follow him on Twitter.
This is an excerpt of a post published today on the Cisco HPC Networking blog by Joshua Ladd, Mellanox:
At some point in the process of pondering this blog post I noticed that my subconscious had, much to my annoyance, registered a snippet of the chorus to Paul Simon’s timeless classic “50 Ways to Leave Your Lover” with my brain’s internal progress thread. It was seemingly endlessly repeating, billions of times over (well, at least ten times over), the catchy hook that offers one of presumably 50 possible ways to leave one’s lover: “Hop on the bus, Gus.” Assuming Gus does indeed wish to extricate himself from a passionate predicament, this seems a reasonable suggestion. But supposing Gus has a really jilted lover, his response to Mr. Simon’s exhortation might be “Just how many hops to that damn bus, Paul?”
HPC practitioners may find themselves asking a similar question, though in a somewhat less contentious context (pun intended). Given the complexity of modern HPC systems, with their increasingly stratified memory subsystems and myriad ways of interconnecting memory, networking, computing, and storage components such as NUMA nodes, computational accelerators, host channel adapters, NICs, VICs, JBODs, Target Channel Adapters, etc., reasoning about process placement has become a much more complex task, with much larger performance differences between the “best” and the “worst” placement policies. To compound this complexity, the “best” and “worst” placements necessarily depend upon the specific application instance and its communication and I/O pattern. Indeed, an in-depth discussion of Open MPI’s sophisticated process affinity system is far beyond the scope of this humble blog post, and I refer the interested reader to the deep dive talk Jeff Squyres (Cisco) gave at EuroMPI on this topic.
In this posting I’ll only consider the problem framed by Gus’ hypothetical query: how can one map MPI processes as close to an I/O device as possible, thereby minimizing data movement or ‘hops’ through the intranode interconnect for those processes? This is a very reasonable request, but the ability to automate this process has remained mostly absent in modern HPC middleware. Fortunately, powerful tools such as “hwloc” are available to help us with just such a task. Hwloc usually manipulates processing units and memory, but it can also discover I/O devices and report their locality as well. In simplest terms, this can be leveraged to place I/O intensive applications on cores near the I/O devices they use. Whereas Gus probably didn’t have the luxury of choosing his locality so as to minimize the number of hops necessary to get on his bus, Open MPI, with the help of hwloc, now provides a mechanism for mapping MPI processes to NUMA nodes “closest” to an I/O device.
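To make the idea of device locality concrete, here is a minimal sketch of how one might look up which NUMA node an InfiniBand adapter is attached to and which CPUs sit on that node. It is not the Open MPI mechanism described in the post and does not use the hwloc API; it is an illustration that reads the Linux sysfs files `device/numa_node` and `cpulist` directly. The device name `mlx4_0` and the function names are assumptions made for the example.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return sorted(cpus)


def device_numa_node(device, sysfs_root="/sys/class/infiniband"):
    """Return the NUMA node Linux reports for the device, or None if unknown."""
    node_file = Path(sysfs_root) / device / "device" / "numa_node"
    try:
        node = int(node_file.read_text().strip())
    except (OSError, ValueError):
        return None
    # The kernel reports -1 when it has no locality information.
    return node if node >= 0 else None


def cpus_on_node(node, node_root="/sys/devices/system/node"):
    """Return the CPU ids that belong to the given NUMA node."""
    cpulist = Path(node_root) / f"node{node}" / "cpulist"
    return parse_cpulist(cpulist.read_text())


if __name__ == "__main__":
    device = "mlx4_0"  # hypothetical adapter name; substitute your own
    node = device_numa_node(device)
    if node is None:
        print(f"no NUMA locality reported for {device}")
    else:
        print(f"{device} is on NUMA node {node}, CPUs {cpus_on_node(node)}")
```

On a Linux host, the resulting CPU list could then be used to pin an I/O-heavy process, for example with `os.sched_setaffinity`. A library such as hwloc discovers this kind of locality in a portable way, which is why middleware would normally rely on it instead of parsing sysfs by hand.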
Read the full text of the blog here.
Joshua Ladd is an Open MPI developer & HPC algorithms engineer at Mellanox Technologies. His primary interests reside in algorithm design and development for extreme-scale high performance computing systems. Prior to joining Mellanox Technologies, Josh was a staff research scientist at the Oak Ridge National Lab where he was engaged in R&D on high-performance communication middleware. Josh holds a B.S., M.S., and Ph.D. all in applied mathematics.
We want to thank everyone for joining us at SC13 in Denver, Colorado last month. We hope you had a chance to become more familiar with our end-to-end interconnect solutions for HPC.
Check out the videos of the presentations given during the Mellanox Evening Event, held on November 20, 2013, at the Sheraton Denver Downtown Hotel. The keynote was given by Eyal Waldman, President and CEO of Mellanox Technologies: