The rapid pace of change in data and business requirements is the biggest challenge when deploying a large-scale cloud. It is no longer acceptable to spend years designing infrastructure and developing applications capable of coping with data and users at scale. Applications need to be developed in a much more agile manner, in a way that allows dynamic reallocation of infrastructure to meet changing requirements.
Choosing an architecture that can scale is critical. Traditional “scale-up” technologies are too expensive and can ultimately limit growth as data volumes increase. Trying to accommodate data growth without proper architectural design results in unneeded infrastructure complexity and cost.
The most challenging task for the cloud operator in a modern cloud data center supporting thousands or even hundreds of thousands of hosts is scaling and automating network services. Fortunately, server virtualization has enabled automation of routine tasks, reducing the time required to deploy a new application from weeks to minutes and cutting the cost accordingly. Yet reconfiguring the network for a new or migrated virtual workload can still take days and cost thousands of dollars.
To solve these problems, you need to think differently about your data center strategy. Here are three technology innovations that will help data center architects design a more efficient and cost-effective cloud:
1. Overlay Networks
Overlay network technologies such as VXLAN and NVGRE make the network as agile and dynamic as other parts of the cloud infrastructure. These technologies enable automated network segment provisioning for cloud workloads, resulting in a dramatic increase in cloud resource utilization.
Overlay networks provide network flexibility and scalability, and make it possible to:
- Combine workloads within pods
- Move workloads across L2 domains and L3 boundaries easily and seamlessly
- Integrate advanced firewall appliances and network security platforms seamlessly
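To make the idea of an overlay segment concrete, here is a minimal sketch of how a VXLAN header carries its segment identifier. The text above does not describe the wire format; the 8-byte layout used here (a flags byte with the "I" bit set, then a 24-bit VXLAN Network Identifier, with the remaining bits reserved) follows RFC 7348. The function names `pack_vxlan_header` and `unpack_vni` are hypothetical, chosen only for illustration.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field is valid (RFC 7348)
MAX_VNI = (1 << 24) - 1       # the VNI is a 24-bit field


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given segment ID (VNI)."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI must be in 0..{MAX_VNI}, got {vni}")
    # Word 1: flags in the top byte, 24 reserved bits set to zero.
    # Word 2: VNI in the top 24 bits, 8 reserved bits set to zero.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


if __name__ == "__main__":
    header = pack_vxlan_header(5000)
    print(header.hex(), unpack_vni(header))
```

Because the identifier is 24 bits wide, an overlay can address roughly 16 million segments, compared with the 4,096 IDs available in a 12-bit VLAN tag. That larger ID space is one reason segment provisioning can be automated per workload.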
As data continues to grow exponentially, storing today’s data volumes efficiently is a challenge. Many traditional storage solutions do not scale out, and they do not make it feasible, from a Capex and Opex perspective, to deploy petabyte or exabyte data stores.
In this newly published whitepaper, we summarize the installation and performance benchmarks of a Ceph storage solution. Ceph is a massively scalable, open source, software-defined storage solution, which uniquely provides object, block and file system services with a single, unified Ceph storage cluster. The testing emphasizes the careful network architecture design necessary to handle users’ data throughput and transaction requirements.
One of the most important value-add solutions that Mellanox provides to its customers and partners is Educational Services. We offer a variety of learning methods to our partners, customers and other technology leaders.
One of the most successful learning platforms for our customers is our open enrollment courses. These 3-4 day instructor-led courses are available worldwide: in the United Kingdom, Germany, France, Israel, Australia and China, and in the US in New York, California, Massachusetts and Washington. Soon we will offer an “after hours, virtual” format, giving students the benefit of blended learning (remote instructor-led sessions along with online training) and the flexibility to take the course without missing many working hours.
This past week in Atlanta, I got the chance to attend sessions, present and exhibit at the OpenStack Summit. The Summit drew more than 4,500 registered participants, and there are more users today than ever. More than 200 companies have joined the project, and the main contributors to the current OpenStack release are Red Hat, HP and IBM. The OpenStack Foundation has posted a recap video showing some highlights.
Some themes emerged during the summit. One is that big users are becoming major contributors to the project, because contributing lets them move faster as a company. These big users include large banks, manufacturers, retailers, government agencies, entertainment companies and everything in between. Instead of spending time trying to convince vendors to add features, these large organizations have realized that they can work with the OpenStack community directly to add those features, and move faster as a business as a result.
Big Data solutions such as Hadoop and NoSQL applications are no longer the exclusive domain of Internet giants. Today’s retail, transportation and entertainment corporations use Big Data technologies such as Hadoop for data storage and data analytics.
IBM BigInsights makes Big Data deployments an easier task for the system architect. BigInsights with support for IBM’s GPFS-FPO file system provides an enterprise-level Big Data solution, eliminating single points of failure and increasing ingress and analytics performance.
The inherent RDMA support in IBM’s GPFS takes performance a notch higher. Testing conducted at the Mellanox Big Data Lab with IBM BigInsights 2.1, GPFS-FPO and FDR 56Gbps InfiniBand showed write and read performance gains of 35% and 50%, respectively, compared to a vanilla HDFS deployment. On the analytics benchmarks, enabling the RDMA feature provided a 35% throughput gain.
This week is EMC World, a huge event with tens of thousands of customers, partners, resellers and EMC employees talking about cloud, storage, and virtualization. EMC sells many storage solutions but most of the excitement and recent growth (per the latest EMC earnings announcement) are about scale-out storage, including EMC’s Isilon, XtremIO, and ScaleIO solutions.
As mentioned in my blog on the four big changes in storage, traditional scale-out storage connects many storage controllers together, while the new scale-out server storage links the storage on many servers. In both designs the disk or flash on all the nodes is viewed and managed as one large pool of storage. Instead of having to manually partition and assign workloads to different storage systems, workloads can be either shifted seamlessly from node to node (no downtime) or distributed across the nodes.
Clients connect to (scale-out storage) or run on (scale-out server storage) different nodes but must be able to access storage on other nodes as if it were local. If I’m connecting to node A, I need rapid access to the storage on nodes A, B, C, D, and all the other nodes in the cluster. The system may also migrate data from one node to another, and rapidly exchange metadata or control traffic to keep track of which node has which data.
In 1967, Gene Amdahl developed a formula that calculates the overall efficiency of a computer system by analyzing how much of the processing can be parallelized and the amount of parallelization that can be applied in the specific system.
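As a quick illustration of the formula, Amdahl's law is commonly written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can be parallelized and n is the number of parallel workers. The sketch below simply evaluates that expression; the function and parameter names are chosen for illustration and are not from the original post.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    workers: number of parallel workers (cores, nodes, ...), at least 1.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Show how the speedup flattens as workers are added to a 95% parallel job.
    for n in (1, 2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

The serial part sets a hard ceiling: a job that is 95% parallelizable can never run more than 20 times faster, no matter how many workers are added.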
At that time, deeper performance analysis had to take into consideration the efficiency of the three main hardware resources needed for a computation job: compute, memory and storage.
On the compute side, efficiency has to be measured by how many threads can run in parallel (which depends on the number of cores). The memory size affects the percentage of I/O operations that need to access storage, which significantly slows execution time and reduces overall system efficiency.
Analyzing those three hardware resources worked very well until the early 2000s. At that time, the computer industry started to use grid computing or, as it is known today, scale-out systems. The benefits of the scale-out architecture are clear: it enables building systems with higher performance that are easy to scale and have built-in high availability, at a lower cost. However, the efficiency of those systems depends heavily on the performance and resiliency of the interconnect solution.
The importance of the interconnect grew further in the virtualized data center, where the amount of east-west traffic continues to grow (as more parallel work is being done). So, if we want to use Amdahl’s law to analyze the efficiency of a scale-out system, then in addition to the three traditional items (compute, memory and storage), a fourth item, the interconnect, has to be considered as well.
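One way to picture the interconnect as a fourth term is to add a communication cost to the denominator of Amdahl's formula. The sketch below is only an illustrative model, not a formula from the original post: it assumes, hypothetically, that each additional node adds a fixed communication overhead, expressed as a fraction of the single-node runtime (`comm_overhead_per_node`). Real interconnect costs depend on topology, bandwidth, latency and traffic pattern.

```python
def scale_out_speedup(parallel_fraction: float, nodes: int,
                      comm_overhead_per_node: float) -> float:
    """Amdahl-style speedup with an assumed linear interconnect cost.

    comm_overhead_per_node is a hypothetical parameter: the extra runtime,
    as a fraction of the single-node runtime, added by each extra node.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if comm_overhead_per_node < 0.0:
        raise ValueError("comm_overhead_per_node must be non-negative")
    serial = 1.0 - parallel_fraction
    compute = parallel_fraction / nodes
    # Assumption: communication time grows linearly with the node count.
    communication = comm_overhead_per_node * (nodes - 1)
    return 1.0 / (serial + compute + communication)


if __name__ == "__main__":
    # Compare an ideal interconnect with two assumed overhead levels.
    for overhead in (0.0, 0.001, 0.005):
        row = [round(scale_out_speedup(0.95, n, overhead), 2)
               for n in (1, 4, 16, 64)]
        print(overhead, row)
```

Under these assumptions a slower interconnect (a larger overhead value) lowers the achievable speedup, and beyond some node count adding nodes makes the job slower. With the overhead set to zero, the model reduces to the classic form of Amdahl's law.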