InfiniBand is a network communications protocol that offers a switch-based fabric of point-to-point bi-directional serial links between processor nodes, as well as between processor nodes and input/output nodes, such as disks or storage. Every link has exactly one device connected to each end of the link, such that the characteristics controlling the transmission (sending and receiving) at each end are well defined and controlled.
InfiniBand creates a private, protected channel directly between the nodes via switches, and facilitates data and message movement without CPU involvement with Remote Direct Memory Access (RDMA) and Send/Receive offloads that are managed and performed by InfiniBand adapters. The adapters are connected on one end to the CPU over a PCI Express interface and to the InfiniBand subnet through InfiniBand network ports on the other. This provides distinct advantages over other network communications protocols, including higher bandwidth, lower latency, and enhanced scalability.
Hadoop has become a leading programming framework in the big data space. Organizations are replacing several traditional architectures with Hadoop and using it as a storage, database, business intelligence, and data warehouse solution. Enabling a single file system for Hadoop and other programming frameworks benefits users who need dynamic scalability of compute and/or storage capabilities.
This week is EMC World, a huge event with tens of thousands of customers, partners, resellers and EMC employees talking about cloud, storage, and virtualization. EMC sells many storage solutions but most of the excitement and recent growth (per the latest EMC earnings announcement) are about scale-out storage, including EMC’s Isilon, XtremIO, and ScaleIO solutions.
As mentioned in my blog on the four big changes in storage, traditional scale-out storage connects many storage controllers together, while the new scale-out server storage links the storage on many servers. In both designs the disk or flash on all the nodes is viewed and managed as one large pool of storage. Instead of having to manually partition and assign workloads to different storage systems, workloads can be either shifted seamlessly from node to node (no downtime) or distributed across the nodes.
Clients connect to (scale-out storage) or run on (scale-out server storage) different nodes but must be able to access storage on other nodes as if it were local. If I’m connecting to node A, I need rapid access to the storage on node A, B, C, D, and all the other nodes in the cluster. The system may also migrate data from one node to another, and rapidly exchange metadata or control traffic to keep track of who has which data.
In 1967, Gene Amdahl developed a formula that calculates the overall efficiency of a computer system by analyzing how much of the processing can be parallelized and the amount of parallelization that can be applied in the specific system.
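As a quick illustration, Amdahl's law is usually written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can be parallelized and n is the number of parallel workers. The sketch below simply evaluates that formula; the function name and the sample values are chosen here for illustration and are not taken from the original post.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Illustrative values only: 95% of the job is assumed to be parallelizable.
    for workers in (1, 2, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.95, workers), 2))
```

Even with 95% of the work parallelized, the speedup can never exceed 1 / 0.05 = 20, no matter how many workers are added, because the serial part does not shrink.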
At that time, deeper performance analysis had to take into consideration the efficiency of three main hardware resources that are needed for the computation job: the compute, memory and storage.
On the compute side, efficiency has to be measured by how many threads can run in parallel (which depends on the number of cores). The memory size affects the percentage of I/O operations that need to access the storage, which significantly slows execution time and reduces overall system efficiency.
Those three hardware resources worked very well until the early 2000s. At that time, the computer industry started to use grid computing or, as it is known today, scale-out systems. The benefits of the scale-out architecture are clear: it enables building higher-performance systems that are easy to scale, with built-in high availability, at a lower cost. However, the efficiency of those systems depends heavily on the performance and resiliency of the interconnect solution.
The importance of the interconnect became even greater in the virtualized data center, where the amount of east-west traffic continues to grow (as more parallel work is being done). So, if we want to use Amdahl’s law to analyze the efficiency of a scale-out system, then in addition to the three traditional items (compute, memory, and storage), a fourth item, the interconnect, has to be considered as well.
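One rough way to picture the interconnect as a fourth term is to add a communication cost to the usual Amdahl denominator. The model below is an assumption made purely for illustration and is not part of Amdahl's original formula or of this post: it charges a fixed, hypothetical cost `comm_cost_per_node` (as a fraction of the single-node run time) for every additional node in the cluster.

```python
def scale_out_speedup(parallel_fraction: float, nodes: int,
                      comm_cost_per_node: float) -> float:
    """Amdahl-style speedup with an assumed linear communication penalty."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial = 1.0 - parallel_fraction
    compute = parallel_fraction / nodes
    # Hypothetical model: each extra node adds a fixed communication cost.
    communication = comm_cost_per_node * (nodes - 1)
    return 1.0 / (serial + compute + communication)


if __name__ == "__main__":
    # Illustrative numbers: same workload, slower versus faster interconnect.
    for cost in (0.001, 0.0001):
        print(cost, round(scale_out_speedup(0.95, 64, cost), 2))
```

Under this simplified model, the same 64-node cluster delivers a noticeably higher speedup when the per-node communication cost drops, which is the point the paragraph above makes about the interconnect.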
People often ask me why Mellanox is interested in storage, since we make high-speed InfiniBand and Ethernet infrastructure, but don’t sell disks or file systems. It is important to understand the four biggest changes going on in storage today: Flash, Scale-Out, Appliances, and Cloud/Big Data. Each of these really deserves its own blog but it’s always good to start with an overview.
Flash is a hot topic, with IDC forecasting it will consume 17% of enterprise storage spending within three years. It’s 10x to 1000x faster than traditional hard disk drives (HDDs) with both higher throughput and lower latency. It can be deployed in storage arrays or in the servers. If in the storage, you need faster server-to-storage connections. If in the servers, you need faster server-to-server connections. Either way, traditional Fibre Channel and iSCSI are not fast enough to keep up. Even though Flash is cheaper than HDDs on a cost/performance basis, it’s still 5x to 10x more expensive on a cost/capacity basis. Customers want to get the most out of their Flash and not “waste” its higher performance on a slow network.
Flash can be 10x faster in throughput, 300-4000x faster in IOPS per GB (slide courtesy of EMC Corporation)
Last week (on December 9th, 2013), Symantec announced the GA of their clustered file storage (CFS). The new solution enables customers to access mission critical data and applications 400% faster than traditional Storage Area Networks (SANs) at 60% of the cost.
Faster is cheaper! Sounds like magic! How are they doing it?
To understand the “magic”, it is important to understand the advantages that SSDs combined with a high-performance interconnect enable in modern scale-out (or clustered) storage systems. Up to now, SAN-based storage has typically been used to increase performance and provide data availability for multiple applications and clustered systems. However, with the recent demands of high-performance applications, SAN vendors are trying to add SSDs into the storage array itself to provide higher bandwidth and lower latency.
Since SSDs offer incredibly high IOPS and bandwidth, it is important to use the right interconnect technology and to avoid bottlenecks associated with access to storage. Older fabrics such as Fibre Channel (FC) cannot keep up with the demand for faster pipes, as 8Gb/s (or even 16Gb/s) of bandwidth is not enough to satisfy application requirements. While 40Gb/s Ethernet may look like an alternative, InfiniBand (IB) currently supports up to 56Gb/s, with a roadmap to 100Gb/s next year.
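A back-of-the-envelope calculation shows how quickly a handful of SSDs can fill a link. The sketch below uses the nominal link rates quoted above and ignores encoding and protocol overhead; the 500 MB/s per-SSD throughput is a hypothetical figure picked for illustration, not a measured value.

```python
import math

# Nominal link rates in Gb/s, as quoted in the text (overheads ignored).
LINK_GBPS = {
    "8Gb/s Fibre Channel": 8,
    "16Gb/s Fibre Channel": 16,
    "40Gb/s Ethernet": 40,
    "56Gb/s InfiniBand": 56,
}


def ssds_to_saturate(link_gbps: float, ssd_mbytes_per_s: float) -> int:
    """Number of SSDs whose combined throughput fills the link."""
    link_mbytes_per_s = link_gbps * 1000 / 8  # Gb/s -> MB/s (decimal units)
    return math.ceil(link_mbytes_per_s / ssd_mbytes_per_s)


if __name__ == "__main__":
    assumed_ssd_mbytes_per_s = 500  # hypothetical per-SSD throughput
    for name, gbps in LINK_GBPS.items():
        print(name, ssds_to_saturate(gbps, assumed_ssd_mbytes_per_s))
```

With these assumptions, two SSDs are enough to fill an 8Gb/s link, while a 56Gb/s link has room for fourteen.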
Mellanox’s Ethernet and InfiniBand interconnects enable and enhance world-leading cloud infrastructures around the globe. Utilizing Mellanox’s fast server and storage interconnect solutions, these cloud vendors maximized their cloud efficiency and reduced their cost-per-application.
Mellanox is now working with a variety of incubators, accelerators, co-working spaces and venture capitalists to introduce cloud vendors whose offerings are built on Mellanox interconnect solutions to new and evolving startup companies. These new companies can enjoy the best performance with the added benefit of reduced cost as they advance their application development. In this post, we will discuss the advantages of using Mellanox-based clouds.
RDMA (Remote Direct Memory Access) is a critical element in building the most scalable and cost-effective cloud environments and in achieving the highest return on investment. For example, Microsoft Azure’s InfiniBand-based cloud, as listed on the TOP500 list of the world’s highest-performing systems, demonstrated 33% lower application cost compared to other clouds on the same list.
Mellanox’s InfiniBand and RoCE (RDMA over Converged Ethernet) cloud solutions deliver world-leading Ethernet based interconnect density, compute and storage. Mellanox’s Virtual Protocol Interconnect (VPI) technology incorporates both InfiniBand and Ethernet into the same solution to provide interconnect flexibility for cloud providers.
56Gb/s per port with RDMA
2us for VM to VM connectivity
3.5x faster VM migration
6x faster storage access
Cost Effective Storage
Higher storage density with RDMA
Utilization of existing disk bays
Higher Infrastructure Efficiency
Support more VMs per server
Offload hypervisor CPU
I/O consolidation (one wire)
Don’t waste resources worrying about bringing up dedicated cloud infrastructures. Instead, keep your developers focused on developing applications that are strategic to your business. By choosing an RDMA-based cloud from one of our partners, you can rest assured that you will have the most efficient, scalable, and cost-effective cloud platform available.
Author: Eli Karpilovski manages the Cloud Market Development at Mellanox Technologies. In addition, Mr. Karpilovski serves as the Cloud Advisory Council Chairman. Mr. Karpilovski served as product manager for the HCA Software division at Mellanox Technologies. Mr. Karpilovski holds a Bachelor of Science in Engineering from the Holon Institute of Technology and a Master of Business Administration from The Open University of Israel.
High-performance simulations require the most efficient compute platforms. The execution time of a given simulation depends upon many factors, such as the number of CPU/GPU cores and their utilization factor and the interconnect performance, efficiency, and scalability. Efficient high-performance computing systems require high-bandwidth, low-latency connections between thousands of multi-processor nodes, as well as high-speed storage systems.
Mellanox has released “Deploying HPC Clusters with Mellanox InfiniBand Interconnect Solutions”. This guide describes how to design, build, and test a high performance compute (HPC) cluster using Mellanox® InfiniBand interconnect covering the installation and setup of the infrastructure including:
HPC cluster design
Installation and configuration of the Mellanox Interconnect components
Cluster configuration and performance testing
Author: Scot Schultz is an HPC technology specialist with broad knowledge in operating systems, high speed interconnects and processor technologies. Joining the Mellanox team in March 2013 as Director of HPC and Technical Computing, Schultz is a 25-year veteran of the computing industry. Prior to joining Mellanox, he spent the past 17 years at AMD in various engineering and leadership roles, most recently in strategic HPC technology ecosystem enablement. Scot was also instrumental in the growth and development of the OpenFabrics Alliance as co-chair of the board of directors. Scot currently maintains his role as Director of Educational Outreach and is a founding member of the HPC Advisory Council and a member of various other industry organizations.
High-performance scientific applications typically require the lowest possible latency in order to keep the parallel processes in sync as much as possible. In the past, this requirement drove the adoption of SMP machines, where the floating point elements (CPUs, GPUs) were placed as much as possible on the same board. With the increased demand for higher compute capability, and with lower adoption costs making large-scale HPC more available, we have witnessed the rise of clustering as the preferred architecture for high-performance computing.
We introduce and explore some of the latest advancements in the areas of high speed networking and suggest new usage models that leverage the latest technologies that meet the desired requirements of today’s demanding applications. The recently launched Mellanox Connect-IB™ InfiniBand adapter introduced a novel high-performance and scalable architecture for high-performance clusters. The architecture was designed from the ground up to provide high performance and scalability for the largest supercomputers in the world, today and in the future.
The device includes a new network transport mechanism called Dynamically Connected Transport™ Service (DCT), which was invented to provide a Reliable Connection Transport mechanism (the service that provides many of InfiniBand’s advanced capabilities such as RDMA, large message sends, and low latency kernel bypass) at an unlimited cluster size. We will also discuss optimizations for MPI collective communications, which are frequently used for process synchronization, and show how their performance is critical for scalable, high-performance applications.
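To see why a connection-oriented transport becomes harder to scale, consider a simple count of connections. The sketch below assumes that every process keeps one Reliable Connection queue pair for each process on every other node, which is a common full-connectivity assumption rather than something stated in this abstract; the function name and the sample cluster sizes are hypothetical.

```python
def rc_queue_pairs_per_node(nodes: int, procs_per_node: int) -> int:
    """Queue pairs one node holds under the assumed full-connectivity model."""
    if nodes < 1 or procs_per_node < 1:
        raise ValueError("nodes and procs_per_node must be at least 1")
    # Each local process connects to every process on every other node.
    remote_procs = (nodes - 1) * procs_per_node
    return procs_per_node * remote_procs


if __name__ == "__main__":
    # Illustrative cluster sizes with 16 processes per node.
    for nodes in (10, 100, 1000, 10000):
        print(nodes, rc_queue_pairs_per_node(nodes, 16))
```

Under this assumption the per-node connection state grows linearly with the number of nodes, which is the kind of growth a dynamically connected transport is meant to avoid.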
Benchmarking is a term heard throughout the tech industry as a measure of success and pride in a particular solution’s ability to handle this or that workload. However, most benchmarks feature a simulated workload, and in reality, a deployed solution may perform much differently. This is especially true with databases, since the types of data and workloads can vary greatly.
StorageReview.com and MarkLogic recently bucked the benchmarking trend, developing a benchmark that tests storage systems against an actual NoSQL database instance. Testing is done in the StorageReview lab, and the first round focused heavily on host-side flash solutions. Not surprisingly, flash-accelerated solutions took the day, with the lowest overall latencies for all database operations, generally blowing non-flash solutions out of the water and showing that NoSQL database environments can benefit significantly from the addition of flash-accelerated systems.
In order to accurately test all of these flash solutions, the test environment had to be set up so that no other component would bottleneck the testing. As it’s often the interconnect between database, client and storage nodes that limits overall system performance, StorageReview plumbed the test setup with none other than Mellanox ultra low-latency, FDR 56Gb/s InfiniBand adapter cards and switches to ensure full flash performance realization and true apples-to-apples test results.