When it comes to advanced scientific and computational research in Australia, the leading organization is the National Computational Infrastructure (NCI). NCI was tasked with forming a national research cloud as part of a government effort to connect eight geographically distinct Australian universities and research institutions into a single national cloud system.
NCI decided to build a high-performance cloud based on Mellanox 56Gb/s Ethernet solutions. NCI, home to the Southern Hemisphere’s most powerful supercomputer, is hosted by the Australian National University and supported by three government agencies: Geoscience Australia, the Bureau of Meteorology, and the Commonwealth Scientific and Industrial Research Organisation (CSIRO).
During my undergraduate days at UC Berkeley in the 1980s, I remember climbing through the attic of Cory Hall running 10Mbit/sec coaxial cables to professors’ offices. Man, that 10BASE2 coax was fast!! Here we are in 2014, right on the verge of 100Gbit/sec networks. A four-orders-of-magnitude increase in bandwidth is no small engineering feat, and achieving 100Gb/s network communications requires innovation at every level of the seven-layer OSI model.
To tell you the truth, I never really understood the top three layers of the OSI model. I prefer the TCP/IP model, which collapses all of them into a single “Application” layer; that makes more sense to me. Unfortunately, the TCP/IP model also collapses the Link and Physical layers, and I don’t think it makes sense to combine those two. I like to build my own ‘hybrid’ model that collapses the top three layers into an Application layer but lets you consider the Link and Physical layers separately.
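The layer groupings described above can be written down as a simple mapping from each model’s layers to the OSI layers they cover. This is a minimal Python sketch for illustration only: the layer names are the standard OSI and TCP/IP names, while `HYBRID_MODEL` and `covers_all_osi_layers` are hypothetical names chosen here, not part of any library.

```python
from typing import Dict, List

# Standard OSI layers, ordered from layer 1 (bottom) to layer 7 (top).
OSI_LAYERS: List[str] = [
    "Physical", "Data Link", "Network", "Transport",
    "Session", "Presentation", "Application",
]

# TCP/IP model: Physical + Data Link become "Link"; layers 5-7 become "Application".
TCPIP_MODEL: Dict[str, List[str]] = {
    "Link": ["Physical", "Data Link"],
    "Internet": ["Network"],
    "Transport": ["Transport"],
    "Application": ["Session", "Presentation", "Application"],
}

# Hypothetical 'hybrid' model from the text: merge layers 5-7 like TCP/IP,
# but keep the Link and Physical layers separate (five layers in total).
HYBRID_MODEL: Dict[str, List[str]] = {
    "Physical": ["Physical"],
    "Link": ["Data Link"],
    "Network": ["Network"],
    "Transport": ["Transport"],
    "Application": ["Session", "Presentation", "Application"],
}

def covers_all_osi_layers(model: Dict[str, List[str]]) -> bool:
    """True if the model accounts for every OSI layer exactly once, bottom to top."""
    flattened = [layer for group in model.values() for layer in group]
    return flattened == OSI_LAYERS

if __name__ == "__main__":
    for name, model in [("TCP/IP", TCPIP_MODEL), ("Hybrid", HYBRID_MODEL)]:
        print(f"{name}: {len(model)} layers, bottom to top: {list(model)}")
```

In the hybrid mapping, the first four entries (Physical, Link, Network, Transport) are the “bottom four layers” the post goes on to discuss.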
It turns out that a tremendous amount of innovation is required in these bottom four layers to achieve effective 100Gb/s communications networks. The application layer needs to change as well to fully take advantage of 100Gb/s networks. For now we’ll focus on the bottom four layers.
Virtualization has already proven itself to be the best way to improve data center efficiency and simplify management tasks. However, getting those benefits requires using the various services that the hypervisor provides. This introduces delay and results in longer execution times compared to running on a non-virtualized data center (native infrastructure). This drawback has not escaped the high-tech R&D community, which has been seeking ways to enjoy the advantages of virtualization with minimal effect on performance.
One of the most popular solutions today for achieving native performance is SR-IOV (Single Root I/O Virtualization), which bypasses the hypervisor and creates a direct link between the VM and the I/O adapter. However, although the VM gets native performance, it loses all of the hypervisor services. Important features such as high availability (HA) and VM migration can’t be done easily. SR-IOV also requires the VM to have the specific driver for the NIC it communicates with, which complicates management because IT managers can’t use the common driver that runs between the VM and the hypervisor.
As virtualization becomes a standard technology, the industry continues to find ways to improve performance without losing its benefits, and organizations have started to invest more in deploying RDMA-enabled interconnects in virtualized data centers. In one of my previous blogs, I discussed the proven deployment of RoCE (RDMA over Converged Ethernet) in Azure using SMB Direct (SMB 3.0 over RDMA), which enables faster access to storage.
Today’s data centers demand that the underlying interconnect provide the utmost bandwidth and extremely low latency. While high bandwidth is important, it is not worth much without low latency. Moving large amounts of data through a network can be achieved with TCP/IP, but only RDMA can produce the low latency that avoids costly transmission delays.
The speedy transfer of data is critical to using it efficiently. An interconnect based on Remote Direct Memory Access (RDMA) is the ideal option for boosting data center efficiency, reducing overall complexity, and increasing data delivery performance. Mellanox RDMA enables sub-microsecond latency and up to 56Gb/s bandwidth, translating to screamingly fast application performance, better storage and data center utilization, and simplified network management.
Big Data solutions such as Hadoop and NoSQL applications are no longer a game played only by Internet moguls. Today’s retail, transportation and entertainment corporations use Big Data practices such as Hadoop for data storage and data analytics.
IBM BigInsights makes Big Data deployments easier for the system architect. BigInsights with IBM’s GPFS-FPO file system support provides an enterprise-level Big Data solution, eliminating single points of failure and increasing ingress and analytics performance.
The inherent RDMA support in IBM’s GPFS takes performance a notch higher. Testing conducted at the Mellanox Big Data Lab with IBM BigInsights 2.1, GPFS-FPO and FDR 56Gb/s InfiniBand showed write and read performance increases of 35% and 50%, respectively, compared to a vanilla HDFS deployment. On the analytics benchmarks, enabling the RDMA feature provided a 35% throughput gain.
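The point that bandwidth is not worth much without low latency can be made concrete with a first-order model: the time to deliver one message is a fixed latency plus the message size divided by the link rate. This is a minimal Python sketch under that simplifying assumption; the 56Gb/s link rate comes from the text, but the 1 us and 10 us latencies are hypothetical placeholders chosen to show the trend, not measurements of RDMA or TCP/IP.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """First-order model: fixed per-message latency plus serialization time."""
    # bits / (Gb/s * 1e9) gives seconds; times 1e6 gives microseconds,
    # which simplifies to bits / (Gb/s * 1e3).
    serialization_us = message_bytes * 8 / (bandwidth_gbps * 1e3)
    return latency_us + serialization_us

def effective_gbps(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Throughput one message actually achieves once latency is included."""
    elapsed_us = transfer_time_us(message_bytes, latency_us, bandwidth_gbps)
    # bits per microsecond is Mb/s; divide by 1e3 to get Gb/s.
    return message_bytes * 8 / (elapsed_us * 1e3)

if __name__ == "__main__":
    LINK_GBPS = 56.0  # link rate cited in the text
    # Hypothetical latencies, for illustration only.
    for label, latency in [("1 us latency", 1.0), ("10 us latency", 10.0)]:
        for size in (64, 4096, 1_048_576):
            rate = effective_gbps(size, latency, LINK_GBPS)
            print(f"{label}: {size:>9} B message -> {rate:6.2f} Gb/s effective")
```

For small messages the fixed latency dominates, so the 10 us case delivers only a fraction of the 56Gb/s link rate; only very large transfers approach the full bandwidth regardless of latency.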
In 1967, Gene Amdahl developed a formula that calculates the overall speedup, and hence the efficiency, of a computer system by analyzing how much of the processing can be parallelized and how much parallelism the specific system can apply.
At that time, a deeper performance analysis had to consider the efficiency of the three main hardware resources needed for a computation job: compute, memory and storage.
On the compute side, efficiency is measured by how many threads can run in parallel, which depends on the number of cores. The memory size determines the percentage of I/O operations that must access storage, which significantly slows execution and lowers overall system efficiency.
Analysis based on those three hardware resources worked very well until the early 2000s. At that time, the computer industry started to adopt grid computing or, as it is known today, scale-out systems. The benefits of the scale-out architecture are clear: it enables building higher-performance systems that are easy to scale, with built-in high availability, at a lower cost. However, the efficiency of those systems depends heavily on the performance and resiliency of the interconnect solution.
The interconnect became even more important in the virtualized data center, where the amount of east-west traffic continues to grow as more work is done in parallel. So, if we want to use Amdahl’s law to analyze the efficiency of a scale-out system, then in addition to the three traditional items (compute, memory and storage), a fourth item, the interconnect, has to be considered as well.
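Amdahl’s law gives the speedup on n workers as S(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work; dividing by n gives the parallel efficiency. The Python sketch below implements that classic formula and, to illustrate why the interconnect matters in a scale-out system, adds a communication term c * (n - 1). That extra term is a hypothetical model chosen here for illustration (not from the post or from Amdahl): c stands for the assumed fraction of single-node runtime that each additional node spends exchanging data, so a faster interconnect means a smaller c.

```python
def _check(p: float, n: int) -> None:
    """Validate the parallel fraction and worker count."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")

def amdahl_speedup(p: float, n: int) -> float:
    """Classic Amdahl's law: S(n) = 1 / ((1 - p) + p / n)."""
    _check(p, n)
    return 1.0 / ((1.0 - p) + p / n)

def speedup_with_interconnect(p: float, n: int, c: float) -> float:
    """Hypothetical extension: S(n) = 1 / ((1 - p) + p / n + c * (n - 1)).

    c is an assumed per-node communication cost, expressed as a fraction of
    the single-node runtime; c = 0 reduces to plain Amdahl's law.
    """
    _check(p, n)
    if c < 0.0:
        raise ValueError("c must be non-negative")
    return 1.0 / ((1.0 - p) + p / n + c * (n - 1))

def parallel_efficiency(speedup: float, n: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n

if __name__ == "__main__":
    p = 0.95  # hypothetical parallel fraction
    for n in (1, 4, 16, 64):
        ideal = amdahl_speedup(p, n)
        fast = speedup_with_interconnect(p, n, c=0.0001)  # hypothetical fast interconnect
        slow = speedup_with_interconnect(p, n, c=0.001)   # hypothetical slow interconnect
        print(f"n={n:3d}  Amdahl={ideal:6.2f}  fast={fast:6.2f}  slow={slow:6.2f}  "
              f"efficiency(slow)={parallel_efficiency(slow, n):.2f}")
```

With the larger c, adding nodes eventually lowers the speedup, which is the sense in which the interconnect becomes a fourth resource alongside compute, memory and storage.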
Mellanox’s Ethernet and InfiniBand interconnects enable and enhance world-leading cloud infrastructures around the globe. Using Mellanox’s fast server and storage interconnect solutions, these cloud vendors have maximized their cloud efficiency and reduced their cost per application.
Mellanox is now working with a variety of incubators, accelerators, co-working spaces and venture capitalists to introduce cloud vendors built on Mellanox interconnect solutions to emerging startup companies. These startups can enjoy the best performance, with the added benefit of reduced cost, as they advance application development. In this post, we will discuss the advantages of using Mellanox-based clouds.
RDMA (Remote Direct Memory Access) is a critical element in building the most scalable and cost-effective cloud environments and achieving the highest return on investment. For example, Microsoft Azure’s InfiniBand-based cloud, listed on the TOP500 list of the world’s most powerful systems, demonstrated 33% lower application cost compared to other clouds on the same list.
Mellanox’s InfiniBand and RoCE (RDMA over Converged Ethernet) cloud solutions deliver world-leading Ethernet-based interconnect density for compute and storage. Mellanox’s Virtual Protocol Interconnect (VPI) technology incorporates both InfiniBand and Ethernet into the same solution to provide interconnect flexibility for cloud providers.
Performance: 56Gb/s per port with RDMA, 2us VM-to-VM connectivity, 3.5x faster VM migration, and 6x faster storage access.
Cost-effective storage: higher storage density with RDMA and utilization of existing disk bays.
Higher infrastructure efficiency: support for more VMs per server, hypervisor CPU offload, and I/O consolidation (one wire).
Don’t waste resources worrying about bringing up dedicated cloud infrastructures. Instead, keep your developers focused on developing applications that are strategic to your business. By choosing an RDMA-based cloud from one of our partners, you can rest assured that you will have the most efficient, scalable, and cost-effective cloud platform available.
Author: Eli Karpilovski manages Cloud Market Development at Mellanox Technologies. In addition, Mr. Karpilovski serves as Chairman of the Cloud Advisory Council. He previously served as product manager for the HCA Software division at Mellanox Technologies. Mr. Karpilovski holds a Bachelor of Science in Engineering from the Holon Institute of Technology and a Master of Business Administration from The Open University of Israel.
Hadoop MapReduce is the leading Big Data analytics framework. It enables data scientists to process data of a volume and variety never processed before. The results of this processing are new business creation and greater operational efficiency.
As MapReduce and Hadoop advance, more organizations try to use the frameworks for near real-time processing. Leveraging RDMA (Remote Direct Memory Access) to accelerate Hadoop MapReduce has proven to be a successful method.
In our presentation at Oracle Open World 2013, we showed the advantages RDMA brings to enterprises deploying Hadoop and other Big Data applications.
On the analytics side, UDA (Unstructured Data Accelerator) doubles the computation power by offloading networking and buffer copying from the server’s CPU to the network controller. In addition, a novel shuffle-and-merge approach helped achieve the needed performance acceleration. UDA is an open source package available here (https://code.google.com/p/uda-plugin/). The HDFS (Hadoop Distributed File System) layer is also getting its share of the performance boost.
While the community continues to improve the feature, work conducted at Ohio State University brings RDMA capabilities to the HDFS data ingress process. Initial testing shows over 80% improvement in the data write path to the HDFS repository. The RDMA HDFS acceleration research and a downloadable package are available from the Ohio State University website at: http://hadoop-rdma.cse.ohio-state.edu/
We expect RDMA acceleration to be enabled in more Big Data frameworks in the future. If you have a good use case, we will be glad to discuss your needs and help with the implementation.
Author: Eyal Gutkind is a Senior Manager, Enterprise Market Development at Mellanox Technologies, focusing on Web 2.0 and Big Data applications. Eyal has held several engineering and management roles at Mellanox Technologies over the last 11 years. He holds a B.Sc. degree in Electrical Engineering from Ben Gurion University in Israel and an MBA from the Fuqua School of Business at Duke University, North Carolina.
One of the barriers to adoption of blade server technology has been the limited number of network switches available. Organizations requiring unique switching capabilities or extra bandwidth have had to rely on top-of-rack switches built by networking companies that have little or no presence in the server market. The result was a base of potential customers who wanted to realize the benefits of blade server technology but were forced to stay with rack servers and switches due to a lack of alternative networking products. This is where Hewlett-Packard has once again shown why it remains the leader in blade server technology, by announcing a new blade switch that leaves the others in the dust.
Mellanox SX1018HP Ethernet Blade Switch
Working closely with our partner Mellanox, HP has just announced a new blade switch for the c-Class enclosure, designed specifically for customers who demand performance and raw bandwidth. The Mellanox SX1018HP is built on the latest SwitchX ASIC technology and, for the first time, gives servers a direct path to 40Gb. In fact, this switch can provide up to sixteen 40Gb server downlinks and up to eighteen 40Gb network uplinks, for an amazing aggregate throughput of more than 1.3Tb/s (34 ports × 40Gb/s = 1.36Tb/s). Now even the most demanding virtualized server applications can get the bandwidth they need. Financial services customers, especially those involved in High Frequency Trading, look to squeeze every drop of latency out of their network. Here again the Mellanox SX1018HP excels, dropping port-to-port latency to an industry-leading 230ns at 40Gb. There is no other blade switch currently available that can make that claim.
For customers currently running InfiniBand networks, the appeal of collapsing their data requirements onto a single network has always been tempered by the lack of support for Remote Direct Memory Access (RDMA) on Ethernet networks. Again, HP and Mellanox lead the way in blade switches. The SX1018HP supports RDMA over Converged Ethernet (RoCE), allowing RDMA-tuned applications to work across both InfiniBand and Ethernet networks. When coupled with the recently announced HP544M 40Gb Ethernet/FDR InfiniBand adapter, customers can now support RDMA end to end on either network and begin the migration to a single Ethernet infrastructure. Finally, many customers already familiar with Mellanox IB switches provision and manage their network with Unified Fabric Manager (UFM). The SX1018HP can be managed and provisioned with this same tool, providing a seamless transition to the Ethernet world. Of course, standard CLI and secure web browser management are also available.
Incorporating this switch along with the latest generation of HP blade servers and network adapters now gives any customer the same speed, performance and scalability that was previously limited to rack deployments using a hodgepodge of suppliers. Data center operations that cater to High Performance Cluster Computing (HPCC), Telecom, Cloud Hosting Services and Financial Services will find the HP blade server/Mellanox SX1018HP blade switch a compelling and unbeatable solution.
As flash storage has become increasingly available at lower and lower prices, many organizations are leveraging flash’s low-latency features to boost application and storage performance in their data centers. Flash storage vendors claim their products can increase application performance by leaps and bounds, and a great many data center administrators have found that to be true. But what if your flash could do even more?
One of the main features of flash storage is its ability to drive massive amounts of data to the network with very low latencies. Data can be written to and retrieved from flash storage in a matter of microseconds at speeds exceeding several gigabytes per second, allowing applications to get the data they need and store their results in record time. Now, suppose you connect that ultra-fast storage to your compute infrastructure using 1GbE technology. A single 1GbE port can transfer data at around 120MB/s. For a flash-based system driving, say, 8GB/s of data, you’d need sixty-seven 1GbE ports to avoid bottlenecking your system. Most systems have only eight ports available, so using 1GbE would limit your lightning-fast flash to just under 1GB/s, an eighth of the performance you could be getting. That’s a bit like buying a Ferrari F12berlinetta (max speed: >211 mph) and committing to drive it only on residential streets (speed limit: 25 mph). Sure, you’d look cool, but racing neighborhood kids on bicycles isn’t really the point of a Ferrari, is it? Upgrade that 1GbE connection to 10GbE, and you can cover your full flash bandwidth with seven ports, provided your CPU can handle the increased TCP stack overhead and still perform application tasks. In terms of our vehicular analogy, you’re driving the Ferrari on the highway now, but you’re still stuck in third gear. So, how do you get that Ferrari to the Bonneville Salt Flats and really let loose?
Go one step further in your interconnect deployment and upgrade that 10GbE connection to 40GbE with RDMA over Converged Ethernet (RoCE) or to 56Gb/s FDR InfiniBand. Two ports of either protocol will give you full-bandwidth access to your flash system, and RDMA features mean ultra-low CPU overhead and increased overall efficiency. Your flash system will perform to its fullest potential, and your application performance will improve drastically. Think land-speed records, except in a data center.
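The port counts quoted above follow from simple division: the storage rate divided by the usable throughput of one port, rounded up. This Python sketch reproduces that arithmetic under one stated assumption: every port delivers about 96% of its line rate as usable payload, the ratio implied by “around 120MB/s” for 1GbE. It does not model protocol- or encoding-specific overheads (such as FDR InfiniBand’s line encoding), so treat the results as rough sizing, not vendor figures.

```python
import math

# Assumption: usable payload is ~96% of line rate, the ratio implied by
# "around 120MB/s" for a 1Gb/s port. Real efficiency varies by protocol.
EFFICIENCY = 0.96

def usable_mb_per_s(link_gbps: float) -> float:
    """Approximate usable throughput of one port, in MB/s."""
    return link_gbps * 1000 / 8 * EFFICIENCY

def ports_needed(storage_gb_per_s: float, link_gbps: float) -> int:
    """Smallest number of ports whose combined throughput covers the storage rate."""
    return math.ceil(storage_gb_per_s * 1000 / usable_mb_per_s(link_gbps))

def capped_mb_per_s(ports: int, link_gbps: float) -> float:
    """Throughput ceiling imposed by a fixed number of ports."""
    return ports * usable_mb_per_s(link_gbps)

if __name__ == "__main__":
    FLASH_GB_PER_S = 8.0  # example flash system from the text
    links = [("1GbE", 1), ("10GbE", 10), ("40GbE (RoCE)", 40), ("56Gb/s FDR InfiniBand", 56)]
    for name, gbps in links:
        print(f"{name}: {ports_needed(FLASH_GB_PER_S, gbps)} ports to carry {FLASH_GB_PER_S} GB/s")
    # With only eight 1GbE ports, the flash system is capped near 960 MB/s.
    print(f"8 x 1GbE ceiling: {capped_mb_per_s(8, 1):.0f} MB/s")
```

The sketch yields 67 ports for 1GbE, 7 for 10GbE, and 2 for either 40GbE or 56Gb/s, matching the figures in the text; the eight-port 1GbE ceiling of about 960MB/s is the “just under 1GB/s” case.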
So, if your flash-enhanced application performance isn’t quite what you expected, perhaps it’s your interconnect and not your flash system that’s underperforming.