New Territories Explored: Hadoop and Its Distributed File System

It took me a while, but I’m back – I hope you’ve all been waiting to hear from me.

With that, I’ve decided to go into uncharted territories… HADOOP.

 

Hadoop is an Apache project. It is a framework, written in Java, for running applications on large clusters built with commodity hardware (distributed computing). Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
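To give a feel for what running an application on Hadoop actually looks like, here is a minimal job-driver sketch against the Hadoop Java MapReduce API. It’s just an illustration (not code from our test setup): the input and output paths come from the command line, and it reuses the stock TokenCounterMapper and IntSumReducer classes that ship with Hadoop so the sketch stays self-contained.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // picks up the cluster settings
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        // The framework splits the input into fragments; each fragment becomes
        // one map task that can be executed (or re-executed) on any node.
        job.setMapperClass(TokenCounterMapper.class);   // stock mapper: emits (word, 1)
        job.setReducerClass(IntSumReducer.class);       // stock reducer: sums the 1s
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The framework takes care of splitting the input into fragments, scheduling one map task per fragment across the cluster, and re-executing any task whose node fails.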

 

Hadoop provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Applications read and write HDFS files through a simple Java file API, sketched below.
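For readers who haven’t touched HDFS before, here is a rough sketch of how a client writes and then reads back a file through Hadoop’s FileSystem API. The path and the string written are hypothetical placeholders; the FileSystem.get/create/open calls are the standard HDFS client interface.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // Reads the cluster's fs.defaultFS / fs.default.name setting, so the
        // same code talks to HDFS on a cluster and to the local FS in tests.
        FileSystem fs = FileSystem.get(new Configuration());

        Path file = new Path("/tmp/hello.txt");          // hypothetical path

        // Write: HDFS chops the stream into blocks and replicates them
        // across the data nodes behind the scenes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello, hadoop");
        }

        // Read it back through the same API.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }
}
```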

 

The two key functions in Hadoop are map and reduce, and I would like to briefly touch on what they mean. The map function processes a key/value pair to generate a set of intermediate key/value pairs, while the reduce function merges all intermediate values associated with the same intermediate key. A MapReduce job usually splits the input data set into independent chunks that are processed by the map tasks in a completely parallel manner; the framework then sorts the outputs of the maps and feeds them as input to the reduce tasks. Typically both the input and the output of the job are stored in the file system, with the framework itself following a master/slave architecture.
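To make that concrete, here is the canonical word-count pattern written against the Hadoop Java MapReduce API – a minimal sketch for illustration, not our benchmark code. The map function emits an intermediate (word, 1) pair for every word it sees; the framework then groups the intermediate pairs by key, and the reduce function sums all of the values that arrived for each word.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // map: (byte offset, line of text) -> one intermediate (word, 1) pair per word.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);    // emit the intermediate pair
                }
            }
        }
    }

    // reduce: (word, [1, 1, 1, ...]) -> (word, total count), i.e. merge all
    // intermediate values that share the same intermediate key.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```

The shuffle between the two phases is what delivers every intermediate pair with the same key to the same reduce task – exactly the “merge” step described above.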

 

The test we conducted was the DFSIO benchmark, a MapReduce job in which each map task opens a file, writes to or reads from it, closes it, and measures the I/O time. There is only a single reduce task, which aggregates the individual times and sizes. We limited the test to measurements of 10 to 50 files of 360 MB each, which we found reasonable given the ratio of the number of nodes used to the number of files. We then compared that to publicly published Yahoo results, which used 14K files over 4K nodes. That boils down to 3.5 files per node, whereas we used 50 files over 12 nodes, which equates to just over 4 files per node.
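For context, DFSIO (TestDFSIO) ships with the Hadoop test jar and is typically launched with something along the lines of “hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 50 -fileSize 360” (the exact jar name and flag spelling vary between Hadoop versions). The sketch below shows, in very simplified form, what each map task essentially does in the write case: create a file on HDFS, stream a fixed amount of data into it, and time the I/O. The path, buffer size, and reporting are hypothetical illustration code, not the actual benchmark source.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsWriteTimer {
    private static final long MB = 1024L * 1024L;

    public static void main(String[] args) throws Exception {
        long fileSizeMb = 360;                          // per-file size used in our runs
        Path file = new Path("/benchmarks/sample_0");   // hypothetical output path
        byte[] buffer = new byte[(int) MB];             // 1 MB write buffer

        FileSystem fs = FileSystem.get(new Configuration());

        long start = System.currentTimeMillis();
        try (FSDataOutputStream out = fs.create(file, true)) {
            for (long written = 0; written < fileSizeMb * MB; written += buffer.length) {
                out.write(buffer);                      // stream the data to HDFS
            }
        }
        long elapsedMs = System.currentTimeMillis() - start;

        // In the real benchmark a single reduce task aggregates the per-file
        // times and sizes into overall throughput numbers.
        double mbPerSec = fileSizeMb * 1000.0 / elapsedMs;
        System.out.println("Wrote " + fileSizeMb + " MB in " + elapsedMs
                + " ms (" + mbPerSec + " MB/s)");
    }
}
```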

 

Given the configuration and the test described above, here is a snapshot of the results we’ve seen:

 

 



It can clearly be seen from the above, as well as from other results we’ve been given, that InfiniBand and 10GigE (via our ConnectX adapters) cut the execution time in half and more than triple the bandwidth… these are very conclusive results by any metric.

 

A very interesting point to review is that the tests executed with the DFS located on a hard disk already showed significantly better performance, but when testing with a RAM disk the gap increased even more – e.g., latency went from one-half down to one-third… it seems like a clear way to unleash the potential.

 

In my next blog post I plan to either review a new application or cover another aspect of this one.

 

 

Nimrod Gindi

Director of Corporate Strategy

nimrodg@mellanox.com