We recently posted a whitepaper on “Deploying Ceph with High Performance Networks” using Ceph as a block storage device. In this post, we review the advantages of using CephFS as an alternative for HDFS.
Hadoop has become a leading programming framework in the big data space. Organizations are replacing several traditional architectures with Hadoop and use it as a storage, data base, business intelligence and data warehouse solution. Enabling a single file system for Hadoop and other programming frameworks benefits users who need dynamic scalability of compute and or storage capabilities.
Virtualization has already proven itself to be the best way to improve data center efficiency and to simplify management tasks. However, getting those benefits requires using the various services that the Hypervisor provides. This introduces delay and results in longer execution time, compared to running over a non-virtualized data center (native infrastructure). This drawback hasn’t been hidden from the eyes of the high-tech R&D community seeking ways to enjoy the advantages of virtualization with a minimal effect on performance.
One of the most popular solutions today to enable native performance is to use the SR-IOV (Single Root IO Virtualization) mechanism which bypasses the Hypervisor and enables a direct link between the VM to the IO adapter. However, although the VM gets the native performance, it loses all of the Hypervisor services. Important features like high availability (HA) or VM migration can’t be done easily. Using SR-IOV requires that the VM must have the specific NIC driver (that he communicates with) which results in more complicated management since IT managers can’t use the common driver that runs between the VM to the Hypervisor.
As virtualization becomes a standard technology, the industry continues to find ways to improve performance without losing benefits, and organizations have started to invest more in the deployment of RDMA enabled interconnects in virtualized data centers. In one my previous blogs, I discussed the proven deployment of RoCE (RDMA over Converged Ethernet) in Azure using SMB Direct (SMB 3.0 over RDMA) enabling faster access to storage.
Today’s data centers demand that the underlying interconnect provide the utmost bandwidth and extremely low latency. While high bandwidth is important, it is not worth much without low latency. Moving large amounts of data through a network can be achieved with TCP/IP, but only RDMA can produce the low latency that avoids costly transmission delays.
The speedy transfer of data is critical to it being used efficiently. Interconnect based on Remote Direct Memory Access (RDMA) offers the ideal option for boosting data center efficiency, reducing overall complexity, and increasing data delivery performance. Mellanox RDMA enables sub-microsecond latency and up to 56Gb/s bandwidth, translating to screamingly fast application performance, better storage and data center utilization, and simplified network management.