Companies today are finding that the size and growth of stored data is becoming overwhelming. As the databases grow, the challenge is to create value by discovering insights and connections in the big databases in as close to real time as possible. In the recently published whitepaper, “Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks“ we describe a combination of high performance networking and graph base and analytics technologies which offers a solution to this need.
Each of the examples in the paper is based on an element of a typical analysis solution. In the first example, involving Vertex Ingest Rate shows the value of using high performance equipment to enhance real-time data availability. Vertex objects represent nodes in a graph, such as Customers, so this test is representative of the most basic operation: loading new customer data into the graph. In the second example, Vertex Query Rate highlights the improvement in the time needed to receive results, such as finding a particular customer record or a group of customers.
The third example, Distributed graph navigation processing starts at a Vertex and explores its connections to other Vertices. This is representative of traversing social networks, finding optimal transportation or communications routes and similar problems. The final example, Task Ingest Rate shows the performance improvement when loading the data connecting each of the vertices. This is similar to entering orders for products, transit times over a communications path and so on.
Each of these elements is an important part of a Big Data analysis solution. Taken together, they show that InfiniteGraph can be made significantly more effective when combined with Mellanox interconnect technology.
Resources: Mellanox Web 2.0 Solutions
Big Data solutions such as Hadoop and NoSQL applications are no longer a sole game for Internet moguls. Today’s retail, transportation and entertainment corporations use Big Data practices such as Hadoop for data storage and data analytics.
IBM BigInsights makes Big Data deployments an easier task for the system architect. BigInsights with IBM’s GPFS-FPO file system support provides enterprise level Big Data solution, eliminating Single Point of Failure structures and increasing ingress and analytics performance.
The inherent RDMA support in IBM’s GPFS takes the performance aspect a notch higher. The testing conducted at Mellanox Big Data Lab with IBM BigInsights 2.1, GPFS-FPO and FDR 56Gbps InfiniBand showed an increased performance for write and read of 35% and 50 %, respectively, comparing to a vanilla HDFS deployment. On the analytics benchmarks, the system provided 35% throughput gain by enabling the RDMA feature.
Mellanox recently announced a collaboration with IBM to produce a tightly integrated server and storage solutions that incorporate our end-to-end FDR 56Gb/s InfiniBand and 10/40 Gigabit Ethernet interconnect solutions with IBM POWER CPUs. By combining IBM POWER CPUs with the world’s highest-performance interconnect solution will drive data at optimal rates, maximizing performance and efficiency for all types of applications and workloads, as well as enable dynamic storage solutions to allow multiple applications to efficiently share data repositories.
Advances in high-performance applications are enabling analysts, researchers, scientists and engineers to run more complex and detailed simulations and analyses in a bid to gather game-changing insights and deliver new products to market. This is placing greater demand on existing IT infrastructures, driving a need for instant access to resources – compute, storage, and network.
Companies are looking for faster and more efficient ways to drive business value from their applications and data. The combination of IBM processor technologies and Mellanox high-speed interconnect solutions can provide clients with an advanced and efficient foundation to achieve their goals.
People often ask me why Mellanox is interested in storage, since we make high-speed InfiniBand and Ethernet infrastructure, but don’t sell disks or file systems. It is important to understand the four biggest changes going on in storage today: Flash, Scale-Out, Appliances, and Cloud/Big Data. Each of these really deserves its own blog but it’s always good to start with an overview.
Flash is a hot topic, with IDC forecasting it will consume 17% of enterprise storage spending within three years. It’s 10x to 1000x faster than traditional hard disk drives (HDDs) with both higher throughput and lower latency. It can be deployed in storage arrays or in the servers. If in the storage, you need faster server-to-storage connections. If in the servers, you need faster server-to-server connections. Either way, traditional Fibre Channel and iSCSI are not fast enough to keep up. Even though Flash is cheaper than HDDs on a cost/performance basis, it’s still 5x to 10x more expensive on a cost/capacity basis. Customers want to get the most out of their Flash and not “waste” its higher performance on a slow network.
Flash can be 10x faster in throughput, 300-4000x faster in IOPS per GB (slide courtesy of EMC Corporation)
Hadoop MapReduce is the leading Big Data analytics framework. This framework enables data scientists to process data volumes and variety never processed before. The result from this data processing is new business creation and operation efficiency.
As MapReduce and Hadoop advance, more organizations try to use the frameworks in near real-time capabilities. Leveraging RDMA (Remote Direct Memory Access) capabilities for faster Hadoop MapReduce capabilities has proven to be a successful method.
In our presentation at Oracle Open World 2013, we show the advantages RDMA brings to enterprises deploying Hadoop and other Big Data applications:
- Double analytics performance, accelerating MapReduce framework
- Double Hadoop file system ingress capabilities
- Reducing NoSQL Databases’ latencies by 30%
On the analytics side, UDA (Unstructured Data Accelerator), doubles the computation power by offloading networking and buffer copying from the server’s CPU to the network controller. In addition, a novel shuffle and merge approach helped to achieve the needed performance acceleration. The UDA package is and open source package available here (https://code.google.com/p/uda-plugin/). The HDFS (Hadoop Distributed File System) layer is also getting its share of performance boost.
While the community continues to improve the feature, work conducted at Ohio State University brings the RDMA capabilities to the data ingress process of HDFS. Initial testing shows over 80% improvement in the data write path to the HDFS repository. The RDMA HDFS acceleration research and downloadable package is available from the Ohio State University website at: http://hadoop-rdma.cse.ohio-state.edu/
We are expecting more RDMA acceleration enablement to different Big Data frameworks in the future. If you have a good use case, we will be glad to discuss the need and help with the implementation.
Contact us through the comments section below or at firstname.lastname@example.org
: Eyal Gutkind is a Senior Manager, Enterprise Market Development at Mellanox Technologies focusing on Web 2.0 and Big Data applications. Eyal held several engineering and management roles at Mellanox Technologies over the last 11 years. Eyal Gutkind holds a BSc. degree in Electrical Engineering from Ben Gurion University in Israel and MBA from Fuqua School of Business at Duke University, North Carolina.