Shout out to anyone attending the GPU Technology Conference 2014! The conference is touted as the world’s biggest and most important GPU developer conference. Follow the social conversation around the event using the hashtag #GTC2014. The conference will be held next week, March 24-27, 2014, at the San Jose McEnery Convention Center in San Jose, CA.
This is the fourth year I am attending this event, and I will be hanging out at the “Ask the Expert Table” at GTC. Feel free to swing by and chat about any burning questions you may have on GPUDirect RDMA with Mellanox InfiniBand!
As covered in previous blog posts (Part 1 and Part 2), the concept of the Virtual Modular Switch (VMS) is clearly an advantage for networks of medium to large scale. As we move into huge networks where multiple modular switches are needed, this advantage shrinks to the point where it is a matter of personal preference whether to implement using VMS or multiple chassis.
When the odds are even, this preference can come down to a matter of cost of equipment, cost of operating the equipment, certain network KPIs that need to be met or any other parameter that the network facilitator will care about.
The Mellanox implementation of VMS is based on our own ASIC design, known as SwitchX. It is used as the fabric element in every switch of our Ethernet (and InfiniBand) product lines. SwitchX carries 36 high-speed interfaces of standard 40 GbE. When used in a non-blocking fat tree topology, this allows 18 ports to serve as external interfaces and 18 ports to serve as internal interfaces towards the spine layer of the VMS fat tree. Having 36 ports on each of the spine elements allows as many as 36 leaf elements. The total number of external ports in a non-blocking two-tier VMS is 36*18=648.
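The port arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the returned keys are hypothetical, and the spine count assumes one uplink from every leaf to every spine, which the post does not state explicitly.

```python
def two_tier_fat_tree(radix: int = 36) -> dict:
    """Size a non-blocking two-tier fat tree built from identical switches.

    radix is the number of ports per switch (36 for SwitchX, per the text).
    """
    if radix <= 0 or radix % 2:
        raise ValueError("radix must be a positive even number")

    external_per_leaf = radix // 2             # ports facing servers/racks
    uplinks_per_leaf = radix - external_per_leaf  # ports facing the spine layer
    max_leaves = radix                         # each spine port reaches one leaf
    # Assumption: every leaf has exactly one uplink to every spine.
    spines = uplinks_per_leaf

    return {
        "external_per_leaf": external_per_leaf,
        "uplinks_per_leaf": uplinks_per_leaf,
        "max_leaves": max_leaves,
        "spines": spines,
        "external_ports": max_leaves * external_per_leaf,
    }


if __name__ == "__main__":
    sizing = two_tier_fat_tree(36)
    print(sizing["external_ports"])  # 36 leaves * 18 external ports each
```

With a radix of 36 the sketch reproduces the 36*18=648 figure quoted above.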
New advances in Big Data applications are enabling analysts, researchers, scientists and engineers to run more complex and detailed simulations and analyses than ever before. These applications deliver game-changing insights, bring new products to market and place greater demand on existing IT infrastructures.
This ever-growing demand drives the need for instant access to resources – compute, storage, and network. Users are seeking cutting-edge technologies and tools to help them better capture, understand and leverage increasing volumes of data, as well as build infrastructures that are energy-efficient and can easily scale as their businesses grow.
It is no secret that recent market trends have forced the traditional desktop to go through a dramatic transformation. It’s also easy to predict that sooner, rather than later, the traditional way of sitting and working in front of a desktop will disappear. Why is this happening? Desktops that led the digital revolution and ruled the digital world for more than 30 years are going to experience a sudden death. This reminds me of the way the dinosaurs disappeared. What is the “asteroid” that will destroy such a large and well-established infrastructure? Can it be stopped?
The HPC Advisory Council published a best practices paper showing record application performance for LS-DYNA® Automotive Crash Simulation, one of the automotive industry’s most compute- and network-intensive applications for automotive design and safety. The paper can be downloaded here: HPC Advisory Council : LS-Dyna Performance Benchmark and Profiling.
The LS-DYNA benchmarks were run on a 32-node cluster of Dell™ PowerEdge R720 servers, with networking provided by Mellanox Connect-IB™ 56Gb/s InfiniBand adapters and switch. The results demonstrate that the combined solution delivers world-leading performance versus any given system of this size, and versus supercomputers with larger core counts that are based on Ethernet or proprietary interconnects.
The TopCrunch project is used to track the aggregate performance trends of high performance computer systems and engineering software. Rather than using a synthetic benchmark, actual engineering software applications are used with real datasets and run on high performance computer systems.
Scot Schultz is an HPC technology specialist with broad knowledge in operating systems, high-speed interconnects and processor technologies. Prior to joining Mellanox, he spent 17 years at AMD in various engineering and leadership roles, most recently in strategic HPC technology ecosystem enablement. Scot was also instrumental in the growth and development of the Open Fabrics Alliance as co-chair of the board of directors. Follow him on Twitter: @ScotSchultz
Congratulations go out to Yarden Gerbi, who took home the silver medal at the Judo Grand Prix, recently held in Dusseldorf, Germany. This competition brought together 370 athletes from 55 countries. Gerbi secured victories over competitors from Mongolia and Austria and moved on to the semi-finals. Gerbi is currently training in preparation for the 2016 Rio Olympic Games.
If you search the internet for data center automation tools, you will come up with many options. You can easily find software tools that automate server provisioning, network equipment configuration or monitor the different elements. But you cannot find tools for automatic fabric configuration.
Fabrics are becoming more popular these days. Whereas traditional aggregation switching in the data centers of Cloud providers, Web 2.0 providers, and large-scale enterprises has been based on modular switches, we now see them being replaced by fabrics – arrays of fixed, 1U switches. These fabrics increase flexibility and efficiency in data center aggregation: lower cost of equipment, reduced power, better scalability and high resiliency.
Mellanox Virtual Modular Switch™ (VMS) is such a fabric, comprised of Mellanox 10, 40, and 56GbE fixed switches. It provides an optimized approach for aggregating racks. The VMS excels in its flexibility, power savings and performance. Based on Mellanox switches, the VMS leverages the unique advantages of the SwitchX-2, the highest performing 36-port 40GbE switching IC.
The Need for Automation
The scalability that the fabrics bring drives a change in the way they are configured. The legacy way to configure switches and routers is scripting – each device has its management interface, typically CLI, and when the right configuration script is applied to each switch, they interwork as a single fabric. However, this approach does not scale and one cannot configure big fabrics in mega data centers this way, since creating and maintaining scripts for many fixed switches can become a nightmare. So, fabric creation automation is required – a tool that can do it both automatically and fast, to allow short setup time.
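To make the scaling problem concrete, here is a minimal sketch of what such automation might do: derive every switch's link list from just two numbers instead of hand-writing one script per device. Everything in it is hypothetical, including the switch names, the port numbering, and the vendor-neutral output format. It is not the interface of any Mellanox tool or CLI.

```python
from collections import defaultdict


def build_fabric_plan(num_leaves: int, num_spines: int) -> dict:
    """Return {switch_name: [(local_port, peer_name, peer_port), ...]}.

    Assumed wiring rule (illustrative only): uplink N on each leaf goes
    to spine N, and port M on each spine goes to leaf M.
    """
    plan = defaultdict(list)
    for leaf in range(1, num_leaves + 1):
        for spine in range(1, num_spines + 1):
            leaf_name = f"leaf-{leaf:02d}"
            spine_name = f"spine-{spine:02d}"
            plan[leaf_name].append((spine, spine_name, leaf))
            plan[spine_name].append((leaf, leaf_name, spine))
    return dict(plan)


def render_config(switch: str, links: list) -> str:
    """Render one switch's links in a made-up, human-readable format."""
    lines = [f"# switch {switch} (hypothetical format)"]
    for local_port, peer, peer_port in sorted(links):
        lines.append(f"port {local_port}: link to {peer} port {peer_port}")
    return "\n".join(lines)


if __name__ == "__main__":
    plan = build_fabric_plan(num_leaves=4, num_spines=2)
    for name in sorted(plan):
        print(render_config(name, plan[name]))
        print()
```

The point of the sketch is that the per-switch output is generated from one description of the whole fabric, so adding a leaf means changing a single number rather than editing many scripts.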
People often ask me why Mellanox is interested in storage, since we make high-speed InfiniBand and Ethernet infrastructure, but don’t sell disks or file systems. It is important to understand the four biggest changes going on in storage today: Flash, Scale-Out, Appliances, and Cloud/Big Data. Each of these really deserves its own blog but it’s always good to start with an overview.
Flash is a hot topic, with IDC forecasting it will consume 17% of enterprise storage spending within three years. It’s 10x to 1000x faster than traditional hard disk drives (HDDs) with both higher throughput and lower latency. It can be deployed in storage arrays or in the servers. If in the storage, you need faster server-to-storage connections. If in the servers, you need faster server-to-server connections. Either way, traditional Fibre Channel and iSCSI are not fast enough to keep up. Even though Flash is cheaper than HDDs on a cost/performance basis, it’s still 5x to 10x more expensive on a cost/capacity basis. Customers want to get the most out of their Flash and not “waste” its higher performance on a slow network.
Flash can be 10x faster in throughput, 300-4000x faster in IOPS per GB (slide courtesy of EMC Corporation)
Windows Azure continues to be the leader in High-Performance Computing Cloud services. Delivering an HPC solution built on top of Windows Server technology and Microsoft HPC Pack, Windows Azure offers the performance and scalability of a world-class supercomputing center to everyone, on demand, in the cloud.
Customers can now run compute-intensive workloads such as parallel Message Passing Interface (MPI) applications with HPC Pack in Windows Azure. By choosing compute intensive instances such as A8 and A9 for the cloud compute resources, customers can deploy these compute resources on demand in Windows Azure in a “burst to the cloud” configuration, and take advantage of InfiniBand interconnect technology with low-latency and high-throughput, including Remote Direct Memory Access (RDMA) technology for maximum efficiency. The new high performance A8 and A9 compute instances also provide customers with ample memory and the latest CPU technology.
The new Windows Azure services can burst and scale on demand, deploying Virtual Machines and Cloud Services when users require them. Learn more about Azure’s new services: http://www.windowsazure.com/en-us/solutions/big-compute/
Eli Karpilovski manages the Cloud Market Development at Mellanox Technologies. In addition, Mr. Karpilovski serves as the Cloud Advisory Council Chairman. Previously, he served as product manager for the HCA Software division at Mellanox Technologies. Mr. Karpilovski holds a Bachelor of Science in Engineering from the Holon Institute of Technology and a Master of Business Administration from The Open University of Israel. Follow him on Twitter
Cloud computing was developed specifically to overcome issues of localization and limitations of power and physical space. Yet many data center facilities are in danger of running out of power, cooling, or physical space.
Mellanox offers an alternative and cost-efficient solution. Mellanox’s new MetroX® long-haul switch system makes it possible to move from the paradigm of multiple, disconnected data centers to a single multi-point meshed mega-cloud. In other words, remote data center sites can now be localized through long-haul connectivity, providing benefits such as faster compute, higher volume data transfer, and improved business continuity. MetroX makes room for more applications and more cloud users, leading to faster product development, quicker backup, and more immediate disaster recovery.
The more physical data centers you join using MetroX, the more you scale your company’s cloud into a mega-cloud. You can continue to scale your cloud by adding data centers at opportune moments and places, where real estate is inexpensive and power is at its lowest rates, without concern for distance from existing data centers and without fear that there will be a degradation of performance.
Moreover, you can take multiple distinct clouds, whether private or public, and use MetroX to combine them into a single mega-cloud. This enables you to scale your cloud offering without adding significant infrastructure, and it enables your cloud users to access more applications and to conduct more wide-ranging research while maintaining the same level of performance.