Companies today are finding that the size and growth of stored data is becoming overwhelming. As databases grow, the challenge is to create value by discovering insights and connections in them in as close to real time as possible. In the recently published whitepaper, “Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks,” we describe a combination of high performance networking and graph database and analytics technologies that addresses this need.
Each of the examples in the paper is based on an element of a typical analysis solution. The first example, Vertex Ingest Rate, shows the value of using high performance equipment to enhance real-time data availability. Vertex objects represent nodes in a graph, such as Customers, so this test is representative of the most basic operation: loading new customer data into the graph. The second example, Vertex Query Rate, highlights the improvement in the time needed to receive results, such as finding a particular customer record or a group of customers. A minimal sketch of these two operations follows.
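To make those first two benchmarks concrete, here is a toy, self-contained sketch of vertex ingest and vertex query against an in-memory store. InfiniteGraph's actual API is Java-based and differs; the `GraphStore` class and its methods are purely illustrative.

```python
# A toy in-memory stand-in for the two operations benchmarked above.
# InfiniteGraph's real API is Java-based; every name here is illustrative.

class GraphStore:
    def __init__(self):
        self.vertices = {}   # vertex_id -> property dict
        self.by_name = {}    # indexed "name" property -> set of vertex_ids

    def ingest_vertex(self, vertex_id, **properties):
        """Vertex Ingest: load one node (e.g., a Customer) into the graph."""
        self.vertices[vertex_id] = properties
        name = properties.get("name")
        if name is not None:
            self.by_name.setdefault(name, set()).add(vertex_id)

    def query_vertices(self, name):
        """Vertex Query: look up vertices by an indexed property."""
        return [self.vertices[v] for v in self.by_name.get(name, ())]

store = GraphStore()
store.ingest_vertex(1, name="Alice", segment="retail")
print(store.query_vertices("Alice"))   # [{'name': 'Alice', 'segment': 'retail'}]
```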
The third example, Distributed Graph Navigation, starts at a vertex and explores its connections to other vertices. This is representative of traversing social networks, finding optimal transportation or communications routes, and similar problems. The final example, Task Ingest Rate, shows the performance improvement when loading the data that connects the vertices, similar to entering orders for products, transit times over a communications path, and so on. A sketch of the navigation step follows.
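The navigation benchmark is essentially a breadth-first exploration outward from a starting vertex. The sketch below shows the idea on a toy adjacency map; the `edges` data and `navigate` helper are hypothetical, and a distributed engine would shard this traversal across machines.

```python
from collections import deque

# Toy adjacency map: vertex_id -> the vertices it connects to.
# A production engine shards this traversal across machines.
edges = {1: {2, 3}, 2: {4}, 3: {4}, 4: set()}

def navigate(start, max_depth):
    """Breadth-first exploration outward from a starting vertex."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        yield vertex, depth
        if depth < max_depth:
            for neighbor in edges.get(vertex, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))

for vertex, depth in navigate(1, max_depth=2):
    print(f"reached vertex {vertex} at depth {depth}")
```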
Each of these elements is an important part of a Big Data analysis solution. Taken together, they show that InfiniteGraph can be made significantly more effective when combined with Mellanox interconnect technology.
Resources: Mellanox Web 2.0 Solutions
We recently posted a whitepaper on “Deploying Ceph with High Performance Networks,” which used Ceph as a block storage device. In this post, we review the advantages of using CephFS as an alternative to HDFS.
Hadoop has become a leading programming framework in the big data space. Organizations are replacing several traditional architectures with Hadoop, using it as a storage, database, business intelligence, and data warehouse solution. Enabling a single file system for Hadoop and other programming frameworks benefits users who need dynamic scalability of compute and/or storage capabilities. A configuration sketch follows.
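As a rough illustration of what “a single file system for Hadoop” looks like in practice, the sketch below generates the `core-site.xml` properties that point Hadoop at CephFS instead of HDFS. The property names follow the CephFS Hadoop plugin documentation as we recall it, and the monitor address and paths are placeholders; verify all of them against your plugin and Ceph versions.

```python
# Hadoop-side settings for mounting CephFS in place of HDFS. Property
# names are taken from the CephFS Hadoop plugin docs and should be
# double-checked; the monitor host and file paths are placeholders.

core_site = {
    "fs.default.name": "ceph://mon-host:6789/",
    "fs.ceph.impl": "org.apache.hadoop.fs.ceph.CephFileSystem",
    "ceph.conf.file": "/etc/ceph/ceph.conf",
}

def to_core_site_xml(props):
    """Render the properties as a core-site.xml fragment."""
    rows = (
        f"  <property>\n    <name>{k}</name>\n"
        f"    <value>{v}</value>\n  </property>"
        for k, v in props.items()
    )
    return "<configuration>\n" + "\n".join(rows) + "\n</configuration>"

print(to_core_site_xml(core_site))
```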
The rapid pace of change in data and business requirements is the biggest challenge when deploying a large scale cloud. It is no longer acceptable to spend years designing infrastructure and developing applications capable of coping with data and users at scale. Applications need to be developed in a much more agile manner, but in a way that allows dynamic reallocation of infrastructure to meet changing requirements.
Choosing an architecture that can scale is critical. Traditional “scale-up” technologies are too expensive and can ultimately limit growth as data volumes increase. Trying to accommodate data growth without proper architectural design results in unneeded infrastructure complexity and cost.
The most challenging task for the cloud operator in a modern cloud data center supporting thousands or even hundreds of thousands of hosts is scaling and automating network services. Fortunately, server virtualization has enabled automation of routine tasks, reducing the cost and time required to deploy a new application from weeks to minutes. Yet reconfiguring the network for a new or migrated virtual workload can take days and cost thousands of dollars.
To solve these problems, you need to think differently about your data center strategy. Here are three technology innovations that will help data center architects design a more efficient and cost-effective cloud:
1. Overlay Networks
Overlay network technologies such as VXLAN and NVGRE make the network as agile and dynamic as other parts of the cloud infrastructure. These technologies enable automated network segment provisioning for cloud workloads, resulting in a dramatic increase in cloud resource utilization.
Overlay networks provide ultimate network flexibility and scalability, making it possible to do the following (a short encapsulation sketch follows the list):
- Combine workloads within pods
- Move workloads across L2 domains and L3 boundaries easily and seamlessly
- Integrate advanced firewall appliances and network security platforms seamlessly
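To see what an overlay actually adds on the wire, here is a minimal sketch of VXLAN encapsulation: the original Ethernet frame is prefixed with an 8-byte VXLAN header carrying a 24-bit VXLAN Network Identifier (VNI) and is then carried in a UDP datagram (destination port 4789). The `vxlan_encapsulate` helper is illustrative only; in practice the encapsulation is done by the vSwitch or, with hardware offloads such as those in Mellanox adapters, by the NIC.

```python
import struct

VXLAN_PORT = 4789   # IANA-assigned UDP destination port for VXLAN
I_FLAG = 0x08       # "valid VNI" flag bit in the VXLAN header

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an Ethernet frame with an 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # byte 0: flags; bytes 1-3: reserved; bytes 4-6: VNI; byte 7: reserved
    header = struct.pack("!B3s3sB", I_FLAG, b"\x00" * 3,
                         vni.to_bytes(3, "big"), 0)
    return header + inner_frame

packet = vxlan_encapsulate(b"\x00" * 60, vni=5000)  # dummy 60-byte frame
print(len(packet), packet[:8].hex())
```

Because the VNI is 24 bits wide, an overlay can address roughly 16 million tenant segments, versus the 4,096 available with traditional VLAN IDs.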
As data continues to grow exponentially, storing today’s data volumes efficiently is a challenge. Many traditional storage solutions neither scale out nor make it feasible, from a CapEx and OpEx perspective, to deploy petabyte or exabyte data stores.
In this newly published whitepaper, we summarize the installation and performance benchmarks of a Ceph storage solution. Ceph is a massively scalable, open source, software-defined storage solution, which uniquely provides object, block and file system services with a single, unified Ceph storage cluster. The testing emphasizes the careful network architecture design necessary to handle users’ data throughput and transaction requirements.
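As a small taste of the object service, the following sketch writes and reads one object through `librados`, Ceph's native object API (the `rados` Python binding ships with Ceph). It assumes a reachable cluster, a valid `/etc/ceph/ceph.conf` with client credentials, and an existing pool named `data`; the pool and object names are placeholders.

```python
import rados  # python-rados, shipped with Ceph

# Connect using the cluster configuration and default client credentials.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("data")   # "data" is a placeholder pool name
    try:
        ioctx.write_full("hello-object", b"stored via librados")
        print(ioctx.read("hello-object"))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```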
This past week in Atlanta, I had the chance to attend sessions, present, and exhibit at the OpenStack Summit. The Summit drew over 4,500 registered participants. Today there are more users than ever! More than 200 companies have joined the project, and the main contributors to the current OpenStack release are Red Hat, HP and IBM. The OpenStack Foundation has posted a recap video showing some highlights:
Some themes emerged during the summit. The new concept of big users becoming major contributors is really taking off. These big users include large banks, manufacturers, retailers, government agencies, entertainment companies and everything in between. Instead of spending time trying to convince vendors to add features, these large organizations have realized that they can work with the OpenStack community directly to add those features and move faster as a business as a result.
Big Data solutions such as Hadoop and NoSQL applications are no longer solely the domain of Internet moguls. Today’s retail, transportation and entertainment corporations use Big Data practices such as Hadoop for data storage and data analytics.
IBM BigInsights makes Big Data deployments an easier task for the system architect. BigInsights with IBM’s GPFS-FPO file system support provides an enterprise-level Big Data solution, eliminating single points of failure and increasing ingest and analytics performance.
The inherent RDMA support in IBM’s GPFS takes performance a notch higher. Testing conducted at the Mellanox Big Data Lab with IBM BigInsights 2.1, GPFS-FPO and FDR 56Gb/s InfiniBand showed write and read performance gains of 35% and 50%, respectively, compared to a vanilla HDFS deployment. On the analytics benchmarks, enabling the RDMA feature provided a 35% throughput gain.
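For readers who want to reproduce the RDMA comparison, the sketch below shows how the relevant GPFS settings are typically switched on. `verbsRdma` and `verbsPorts` are documented GPFS tuning parameters, but the HCA device string (`mlx4_0/1` here) is an example; substitute the device and port reported by `ibstat` on your nodes.

```python
import subprocess

# verbsRdma and verbsPorts are documented GPFS tuning parameters; the
# HCA port string "mlx4_0/1" is an example only. The mm* commands
# usually live in /usr/lpp/mmfs/bin, which must be on PATH.

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["mmchconfig", "verbsRdma=enable"])
run(["mmchconfig", "verbsPorts=mlx4_0/1"])
run(["mmlsconfig"])  # verify the cluster-wide settings took effect
```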
Las Vegas, Nevada is not only the home of gaming, art, shows and fun; it also serves as home to one of the largest Hadoop clusters in the world!
Racks in the Switch SuperNAP – Photo Courtesy of Switch
During the upcoming 2014 EMC World show, we invite you to join us for an informative tour of SuperNAP, the world leader in data center ecosystem development and home of the 1,000-node Hadoop cluster. On this tour, we will show how a Hadoop cluster is deployed and maintained in a co-location data center and how it provides analytics tools for a large community of businesses and academic institutions. It will be a great opportunity to learn about actual working cluster workloads, design considerations and available tools for next-generation business opportunities in Big Data.
Congratulations go out to Yarden Gerbi, who took home the silver medal at the Judo Grand Prix recently held in Dusseldorf, Germany. The competition brought together 370 athletes from 55 countries. Gerbi secured victories over competitors from Mongolia and Austria and moved on to the semi-finals. She is currently training in preparation for the 2016 Rio Olympic Games.
Windows Azure continues to be the leader in High-Performance Computing cloud services. Delivering an HPC solution built on top of Windows Server technology and Microsoft HPC Pack, Windows Azure offers the performance and scalability of a world-class supercomputing center to everyone, on demand, in the cloud.
Customers can now run compute-intensive workloads such as parallel Message Passing Interface (MPI) applications with HPC Pack in Windows Azure. By choosing compute-intensive instances such as A8 and A9, customers can deploy these compute resources on demand in Windows Azure in a “burst to the cloud” configuration and take advantage of low-latency, high-throughput InfiniBand interconnect technology, including Remote Direct Memory Access (RDMA), for maximum efficiency. The new high performance A8 and A9 compute instances also provide customers with ample memory and the latest CPU technology.
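As a feel for the kind of workload that benefits from that fabric, here is a minimal MPI ping-pong latency test written with `mpi4py`. It is a generic sketch, not Microsoft's benchmark; the message size, iteration count, and launch command are arbitrary choices, and on Windows HPC Pack you would launch it with the cluster's MPI job tooling rather than a bare `mpiexec`.

```python
from mpi4py import MPI
import time

# Minimal ping-pong between ranks 0 and 1. Launch (for example) with:
#   mpiexec -n 2 python pingpong.py
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank
msg = bytearray(8)       # small message to expose latency, not bandwidth
iterations = 1000

comm.Barrier()
start = time.perf_counter()
for _ in range(iterations):
    if rank == 0:
        comm.Send(msg, dest=peer)
        comm.Recv(msg, source=peer)
    elif rank == 1:
        comm.Recv(msg, source=peer)
        comm.Send(msg, dest=peer)
elapsed = time.perf_counter() - start

if rank == 0:
    # Each iteration is a full round trip, so halve it for one-way latency.
    print(f"one-way latency: {elapsed / (2 * iterations) * 1e6:.1f} us")
```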
The new Windows Azure services can burst and scale on demand, deploying Virtual Machines and Cloud Services when users require them. Learn more about the new Azure services: http://www.windowsazure.com/en-us/solutions/big-compute/
Eli Karpilovski manages the Cloud Market Development at Mellanox Technologies. In addition, Mr. Karpilovski serves as the Cloud Advisory Council Chairman. Previously, Mr. Karpilovski served as product manager for the HCA Software division at Mellanox Technologies. Mr. Karpilovski holds a Bachelor of Science in Engineering from the Holon Institute of Technology and a Master of Business Administration from The Open University of Israel. Follow him on Twitter.
Cloud computing was developed specifically to overcome issues of localization and limitations of power and physical space. Yet many data center facilities are in danger of running out of power, cooling, or physical space.
Mellanox offers an alternative and cost-efficient solution. Mellanox’s new MetroX® long-haul switch system makes it possible to move from the paradigm of multiple, disconnected data centers to a single multi-point meshed mega-cloud. In other words, remote data center sites can now be localized through long-haul connectivity, providing benefits such as faster compute, higher volume data transfer, and improved business continuity. MetroX supports more applications and more cloud users, leading to faster product development, quicker backup, and more immediate disaster recovery.
The more physical data centers you join using MetroX, the more you scale your company’s cloud into a mega-cloud. You can continue to scale your cloud by adding data centers at opportune moments and places, where real estate is inexpensive and power is at its lowest rates, without concern for distance from existing data centers and without fear that there will be a degradation of performance.
Moreover, you can take multiple distinct clouds, whether private or public, and use MetroX to combine them into a single mega-cloud. This enables you to scale your cloud offering without adding significant infrastructure, and it enables your cloud users to access more applications and to conduct more wide-ranging research while maintaining the same level of performance.