When it comes to advanced scientific and computational research in Australia, the leading organization is the National Computational Infrastructure (NCI). NCI was tasked to form a national research cloud, as part of a government effort to connect eight geographically distinct Australian universities and research institutions into a single national cloud system.
NCI decided to establish a high-performance cloud, based on Mellanox 56Gb/s Ethernet solutions. NCI, home to the Southern Hemisphere’s most powerful supercomputer, is hosted by the Australian National University and supported by three government agencies: Geoscience Australia, the Bureau of Meteorology, and the Commonwealth Scientific and Industrial Research Organisation (CSIRO).
Network and Link Layer Innovation: Lossless Networks
In a previous post, I discussed that innovations are required to take advantage of 100Gb/s at every layer of the communications protocol stack networks – starting off with the need for RDMA at the transport layer. So now let’s look at the requirements at the next two layers of the protocol stack. It turns out that RDMA transport requires innovation at the Network and Link layers in order to provide a lossless infrastructure.
‘Lossless’ in this context does not mean that the network can never lose a packet, as some level of noise and data corruption is unavoidable. Rather by ‘lossless’ we mean a network that is designed such that it avoids intentional, systematic packet loss as a means of signaling congestion. That is packet loss is the exception rather than the rule.
Lossless networks can be achieved by using priority flow control at the link layer which allows packets to be forwarded only if there is buffer space available in the receiving device. In this way buffer overflow and packet loss is avoided and the network becomes lossless.
In the Ethernet world, this is standardized as 802.1 QBB Priority Flow Control (PFC) and is equivalent to putting stop lights at each intersection. A packet on a given priority class can only be forwarded when the light is green.
The rapid pace of change in data and business requirements is the biggest challenge when deploying a large scale cloud. It is no longer acceptable to spend years designing infrastructure and developing applications capable to cope with data and users at scale. Applications need to be developed in a much more agile manner, but in such a way that allows dynamic reallocation of infrastructure to meet changing requirements.
Choosing an architecture that can scale is critical. Traditional “scale-up” technologies are too expensive and can ultimately limit growth as data volumes grow. Trying to accommodate data growth without proper architectural design, results in un-needed infrastructure complexity and cost.
The most challenging task for the cloud operator in a modern cloud data center supporting thousands or even hundreds-of-thousands of hosts is scaling and automating network services. Fortunately, server virtualization has enabled automation of routine tasks – reducing the cost and time required to deploy a new application from weeks to minutes. Yet, reconfiguring the network for a new or migrated virtual workload can take days and cost thousands of dollars.
To solve these problems, you need to think differently about your data center strategy. Here are three technology innovations that will help data center architects design a more efficient and cost-effective cloud:
1. Overlay Networks
Overlay network technologies such as VXLAN and NVGRE, make the network as agile and dynamic as other parts of the cloud infrastructure. These technologies enable automated network segment provisioning for cloud workloads, resulting in a dramatic increase in cloud resource utilization.
Overlay networks provide for ultimate network flexibility and scalability and the possibility to:
Combine workloads within pods
Move workloads across L2 domains and L3 boundaries easily and seamlessly
Integrate advanced firewall appliances and network security platform seamlessly
As data continues to grow exponentially storing today’s data volumes in an efficient way is a challenge. Many traditional storage solutions neither scale-out nor make it feasible from Capex and Opex perspective, to deploy Peta-Byte or Exa-Byte data stores.
In this newly published whitepaper, we summarize the installation and performance benchmarks of a Ceph storage solution. Ceph is a massively scalable, open source, software-defined storage solution, which uniquely provides object, block and file system services with a single, unified Ceph storage cluster. The testing emphasizes the careful network architecture design necessary to handle users’ data throughput and transaction requirements.
This past week in Atlanta, I got the chance to attend the sessions, presented and exhibited at the OpenStack Summit. The Summit was attended by over 4,500 registered participants. Today there are more users than ever! More than 200 companies have joined the project, and the main contributors of current OpenStack release are Red Hat, HP and IBM. The OpenStack Foundation has posted a recap video showing some highlights:
Some themes emerged during the summit. The new concept of bigusers becoming major contributors is really taking off. Big users are becoming major contributors to the project because it means they can move faster as a company. These big users include large banks, manufacturing, retailers, government agencies, entertainment and everything between. Instead of spending time trying to convince vendors to add features, these large organizations have realized that they can work with the OpenStack community directly to add those features and move faster as a business as a result.
People often ask me why Mellanox is interested in storage, since we make high-speed InfiniBand and Ethernet infrastructure, but don’t sell disks or file systems. It is important to understand the four biggest changes going on in storage today: Flash, Scale-Out, Appliances, and Cloud/Big Data. Each of these really deserves its own blog but it’s always good to start with an overview.
Flash is a hot topic, with IDC forecasting it will consume 17% of enterprise storage spending within three years. It’s 10x to 1000x faster than traditional hard disk drives (HDDs) with both higher throughput and lower latency. It can be deployed in storage arrays or in the servers. If in the storage, you need faster server-to-storage connections. If in the servers, you need faster server-to-server connections. Either way, traditional Fibre Channel and iSCSI are not fast enough to keep up. Even though Flash is cheaper than HDDs on a cost/performance basis, it’s still 5x to 10x more expensive on a cost/capacity basis. Customers want to get the most out of their Flash and not “waste” its higher performance on a slow network.
Flash can be 10x faster in throughput, 300-4000x faster in IOPS per GB (slide courtesy of EMC Corporation)
Windows Azure continues to be the leader in High-Performance Computing Cloud services. Delivering a HPC solution built on top of Windows Server technology and Microsoft HPC Pack, Windows Azure offers the performance and scalability of a world-class supercomputing center to everyone, on demand, in the cloud.
Customers can now run compute-intensive workloads such as parallel Message Passing Interface (MPI) applications with HPC Pack in Windows Azure. By choosing compute intensive instances such as A8 and A9 for the cloud compute resources, customers can deploy these compute resources on demand in Windows Azure in a “burst to the cloud” configuration, and take advantage of InfiniBand interconnect technology with low-latency and high-throughput, including Remote Direct Memory Access (RDMA) technology for maximum efficiency. The new high performance A8 and A9 compute instances also provide customers with ample memory and the latest CPU technology.
Author: Eli Karpilovski manages the Cloud Market Development at Mellanox Technologies. In addition, Mr. Karpilovski serves as the Cloud Advisory Council Chairman. Mr. Karpilovski served as product manager for the HCA Software division at Mellanox Technologies. Mr. Karpilovski holds a Bachelor of Science in Engineering from the Holon Institute of Technology and a Master of Business Administration from The Open University of Israel. Follow him on Twitter.
Cloud computing was developed specifically to overcome issues of localization and limitations of power and physical space. Yet many data center facilities are in danger of running out of power, cooling, or physical space.
Mellanox offers an alternative and cost-efficient solution. Mellanox’s new MetroX® long-haul switch system makes it possible to move from the paradigm of multiple, disconnected data centers to a single multi-point meshed mega-cloud. In other words, remote data center sites can now be localized through long-haul connectivity, providing benefits such as faster compute, higher volume data transfer, and improved business continuity. MetroX provides the ability for more applications and more cloud users, leading to faster product development, quicker backup, and more immediate disaster recovery.
The more physical data centers you join using MetroX, the more you scale your company’s cloud into a mega-cloud. You can continue to scale your cloud by adding data centers at opportune moments and places, where real estate is inexpensive and power is at its lowest rates, without concern for distance from existing data centers and without fear that there will be a degradation of performance.
Moreover, you can take multiple distinct clouds, whether private or public, and use MetroX to combine them into a single mega-cloud. This enables you to scale your cloud offering without adding significant infrastructure, and it enables your cloud users to access more applications and to conduct more wide-ranging research while maintaining the same level of performance.
Mellanox is a CloudNFV integration partner providing ConnectX-3 and ConnectX-3 PRO 10/40GbE NIC on Dell Servers
“The CloudNFV team will be starting PoC execution in mid-January, reporting on our results beginning of February, and contributing four major documents to the ISG’s process through the first half of 2014.” said Tom Nolle, President of CIMI Corporation, Chief Architect of CloudNFV in his recent related blog.and enabling active high performance data center. Telefonica and Sprint have agreed to sponsor
the CloudNFV PoC.
“We’re already planning additional PoCs, some focusing on specific areas and developed by our members and some advancing the boundaries of NFV into the public and private cloud and into the world of pan-provider services and global telecommunications.”
Mellanox server and storage interconnect enable telecom data plane virtual network functions with near bare metal server performance in OpenStack Cloud environment through integration to NFV Orchestration and SDN platforms.
Author: As a Director of Business Development at Mellanox, Eran Bello handles the business, solutions and product development and strategy for the growing Telecom and Security markets. Prior to joining Mellanox, Eran was Director of Sales and Business Development at Anobit Technologies where he was responsible for the development of the ecosystem for Anobit new Enterprise SSD business as well as portfolio introduction and business engagements with key Server OEMs, Storage Solution providers and mega datacenters. Earlier on Eran was VP of Marketing and Sales for North and Central America at Runcom Technologies, the first company to deliver Mobile WiMAX/4G End to End solution and was a member of the WiMAX/4G Forum.
Mellanox’s Ethernet and InfiniBand interconnects enable and enhance world-leading cloud infrastructures around the globe. Utilizing Mellanox’s fast server and storage interconnect solutions, these cloud vendors maximized their cloud efficiency and reduced their cost-per-application.
Mellanox is now working with a variety of incubators, accelerators, co-working spaces and venture capitalists to introduce these cloud vendors that are based on Mellanox interconnect cloud solution to new evolving startup companies. These new companies can enjoy best performance with the added benefit of reduced cost, as they advance application development. In this post, we will discuss the advantages of using Mellanox based clouds.
RDMA (Remote Direct Memory Access) is a critical element in building the most scalable and cost-effective cloud environments and to achieve the highest return-on-investment. For example, Microsoft Azure’s InfiniBand based cloud, as listed on the world’s top performance capable systems (TOP500), demonstrated 33% lower application cost compared to other clouds on the same list.
Mellanox’s InfiniBand and RoCE (RDMA over Converged Ethernet) cloud solutions deliver world-leading Ethernet based interconnect density, compute and storage. Mellanox’s Virtual Protocol Interconnect (VPI) technology incorporates both InfiniBand and Ethernet into the same solution to provide interconnect flexibility for cloud providers.
56Gb/s per port with RDMA
2us for VM to VM connectivity
3.5x faster VM migration
6x faster storage access
Cost Effective Storage
Higher storage density with RDMA
Utilization of existing disk bays
Higher Infrastructure Efficiency
Support more VMs per server
Offload hypervisor CPU
I/O consolidation (one wire)
Don’t waste resources worried about bringing up dedicated cloud infrastructures. Instead, keep your developers focused on developing applications that are strategic to your business. By choosing a RDMA-based cloud from one of our partners, you can be rest assured that you will have the most efficient, scalable, and cost-effective cloud platform available.
Author: Eli Karpilovski manages the Cloud Market Development at Mellanox Technologies. In addition, Mr. Karpilovski serves as the Cloud Advisory Council Chairman. Mr. Karpilovski served as product manager for the HCA Software division at Mellanox Technologies. Mr. Karpilovski holds a Bachelor of Science in Engineering from the Holon Institute of Technology and a Master of Business Administration from The Open University of Israel.