We recently posted a whitepaper, “Deploying Ceph with High Performance Networks,” that covered Ceph as a block storage device. In this post, we review the advantages of using CephFS as an alternative to HDFS.
Hadoop has become a leading programming framework in the big data space. Organizations are replacing several traditional architectures with Hadoop and using it as a storage, database, business intelligence, and data warehouse solution. Enabling a single file system for Hadoop and other programming frameworks benefits users who need to scale compute and/or storage capabilities dynamically.
The Apache™ Hadoop® project and storage software vendors are independently developing distributed file systems as alternatives to HDFS. Ceph, an emerging software storage solution used mainly for cloud-based installations, has a file system plugin for Hadoop. Ceph, in conjunction with a high-performance InfiniBand network, provides an innovative way to store and process petabytes and exabytes of information for Big Data applications.
The benchmarking tests we conducted show that Ceph with Mellanox FDR InfiniBand can surpass native HDFS performance by 20% and provides manageability and resiliency advantages over native HDFS.
To learn more about the setup we used for this benchmark, please view the whitepaper (registration required).
Author: Eyal Gutkind is a Senior Manager, Enterprise Market Development at Mellanox Technologies, focusing on Web 2.0 and Big Data applications. Eyal has held several engineering and management roles at Mellanox Technologies over the last 11 years. He holds a BSc degree in Electrical Engineering from Ben Gurion University in Israel and an MBA from the Fuqua School of Business at Duke University, North Carolina.