John Kim is Director of Storage Marketing at Mellanox Technologies, where he helps storage customers and vendors benefit from high performance interconnects and RDMA (Remote Direct Memory Access). After starting his high tech career in an IT helpdesk, John worked in enterprise software and networked storage, with many years of solution marketing, product management, and alliances at enterprise software companies, followed by 12 years working at NetApp and EMC. Follow him on Twitter: @Tier1Storage
The latest buzz about Ethernet is that 25GbE is coming. Scratch that, it’s already here and THE hot topic in the Ethernet world, with multiple vendors sampling 25GbE wares and Mellanox already shipping an end-to-end solution with adapters, switches and cables that support 25, 50, and 100GbE speeds. Analysts predict 25GbE sales will ramp faster than any previous Ethernet speed.
Why? What’s driving this shift?
Figure 1: Analysts predict 25/40/50/100GbE adapters reach 57% of a $1.8 Billion USD high-speed Ethernet adapter market by 2020. (Based on Crehan Research data published January 2016.)
These new speeds are so hot that, like the ageless celebrities you just saw on the Oscar Night red carpet, we say “25 is the new 10 and 50 is the new 40.” But whoa! Sure, everyone wants to look younger for the camera, but no 25-year-old actor wants to look 10. More importantly, why would anyone want 25GbE or 50GbE when we already have 40GbE and 100GbE?
I recently saw an infographic titled “2015 Data Storage Roadmap” and was pleasantly surprised to see Mellanox listed under the storage networking section. The side comment was “Ethernet Becoming The Standard Storage Network.”
Figure 1: Tech Expectations blog infographic shows the new storage networking vendors. (Graphic excerpted from the larger original graphic, which is available here.)
Why surprised? Because in the past, when people said “Storage Networking” they usually meant Fibre Channel. But the growth of cloud, software-defined, and scale-out storage, along with hyper-converged and big data solutions, has made Ethernet the new standard storage network (rather than Fibre Channel), just as the infographic above says. Since Mellanox is the leading vendor of networking equipment for speeds above 10Gb/s, it’s really not a surprise after all to have Mellanox on the leaderboard.
In my first blog on Ceph, I explained what it is and why it’s hot; in my second blog on Ceph, I showed how faster networking can enable faster Ceph performance (especially throughput). But many customers are asking how to make Ceph even faster. Recent testing by Red Hat and Mellanox, along with key partners like Supermicro, QCT (Quanta Cloud Technology), and Intel, has provided more insight into increasing Ceph performance, especially around IOPS-sensitive workloads.
Different Data in Ceph Imposes Different Workloads
Ceph can be used for block or object storage and for different workloads. Usually, block workloads consist of smaller, random I/O, where data is managed in blocks ranging from 1KB to 64KB in size. Object storage workloads usually involve large, sequential I/O with data chunks ranging from 16KB to 4MB in size (and individual objects can be many gigabytes in size). The stereotypical small, random block workload is a database such as MySQL or active virtual machine images. Common object data include archived log files, photos, or videos. However, in special cases, block I/O can be large and sequential (like copying a large part of a database) and object I/O can be small and random (like analyzing many small text files).
The different workloads put different requirements on the Ceph system. Large sequential I/O (usually objects) tends to stress the storage and network bandwidth for both reads and writes. Small random I/O (usually blocks) tends to stress the CPU and memory of the OSD server as well as the storage and network latency. Reads usually require fewer CPU cycles and are more likely to stress storage and network bandwidth, while writes are more likely to stress the CPUs as they calculate data placement. Erasure Coding writes require more CPU power but less network and storage bandwidth.
Analyst firm Neuralytix just published a terrific white paper about the revolution affecting data storage interconnects. Titled Faster Interconnects for Next Generation Data Centers, it explains why customers are rethinking their data center storage and networks, in particular how iSCSI and iSER (iSCSI with RDMA) are starting to replace Fibre Channel for block storage.
You can find the paper here. It’s on-target about iSCSI vs. FC, but it doesn’t cover the full spectrum of factors dooming FC to a long and slow fadeout from the storage connectivity market. I’ll summarize the key points of the paper as well as the other reasons Fibre Channel has no future.
Three reasons Fibre Channel is a Dead End, As Explained by Neuralytix:
1. Flash: Fast Storage Needs Fast Networking
Today’s flash far outperforms hard drives for throughput, latency, IOPS, power consumption, and reliability. It has better price/performance than hard disks and already represents between 10% and 15% of shipping enterprise storage capacity, according to analysts. With fast storage, your physical network and your network protocol must have high bandwidth and low latency; otherwise you’re wasting much of the value of flash. Tomorrow’s NVMe devices will support up to 2-3GB/s (16-24Gb/s) each with latencies <50 µs (that’s <0.05 milliseconds vs. 2-5 milliseconds for hard drives). Modern Ethernet supports speeds of 100Gb/s per link, with latencies of several microseconds, and combined with the hardware-accelerated iSER block protocol, it’s perfect for supporting maximum performance on non-volatile memory (NVM), whether today’s flash or tomorrow’s next-gen solid state storage.
In my first blog on Ceph, I explained what it is and why it’s hot. But what does Mellanox, a networking company, have to do with Ceph, a software-defined storage solution? The answer lies in the Ceph scale-out design. And some empirical results are found in the new “Red Hat Ceph Storage Clusters on Supermicro storage servers” reference architecture published August 10th.
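The unit conversions in this paragraph are easy to check. Here is a minimal sketch in Python, assuming 8 bits per byte and ignoring protocol overhead; the function names are made up for illustration and are not from the Neuralytix paper.

```python
def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert gigabytes per second to gigabits per second (8 bits per byte)."""
    return gbytes_per_s * 8


def us_to_ms(microseconds: float) -> float:
    """Convert microseconds to milliseconds."""
    return microseconds / 1000


LINK_GBITS = 100  # one 100Gb/s Ethernet link, the speed cited in the text

for device_gbytes in (2, 3):
    device_gbits = gbytes_to_gbits(device_gbytes)
    # Raw bandwidth only: a real link also carries protocol overhead.
    devices_per_link = LINK_GBITS / device_gbits
    print(f"{device_gbytes} GB/s = {device_gbits:.0f} Gb/s; "
          f"about {devices_per_link:.1f} such devices fill a {LINK_GBITS}Gb/s link")

print(f"50 us = {us_to_ms(50)} ms")
```

By this rough count, only a handful of such NVMe devices would be enough to fill even a 100Gb/s link, which is the point of the paragraph above.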
Ceph has two logical networks, the client-facing (public) and the cluster (private) networks. Communication with clients or application servers is via the former while replication, heartbeat, and reconstruction traffic run on the latter. You can run both logical networks on one physical network or separate the networks if you have a large cluster or lots of activity.
Figure 1: Logical diagram of the two Ceph networks
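As a rough illustration of the two logical networks, the sketch below uses Python's standard configparser module to produce a ceph.conf-style fragment. The option names public_network and cluster_network are the ones Ceph documents for this purpose; the subnets and the helper function are hypothetical placeholders, not values from the reference architecture.

```python
import configparser
import io


def build_ceph_network_config(public_subnet, cluster_subnet=None):
    """Return a ceph.conf-style [global] fragment for the two logical networks.

    If cluster_subnet is None, only the public network is written, which
    corresponds to running both kinds of traffic on one physical network.
    """
    config = configparser.ConfigParser()
    config["global"] = {"public_network": public_subnet}
    if cluster_subnet is not None:
        config["global"]["cluster_network"] = cluster_subnet
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical subnets: clients on one network, replication traffic on another.
print(build_ceph_network_config("192.168.1.0/24", "192.168.2.0/24"))
```

Leaving out the second argument models the simpler case of one shared network; adding it models the separated setup suggested for large or busy clusters.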
Back in April 2015, during the Ethernet Technology Summit conference, my colleague Rob Davis wrote a great blog about NVMe Over Fabrics. He outlined the basics of what NVMe is and why Mellanox is collaborating on a standard to access NVMe devices over networks (over fabrics). We had two demos from two vendors in our booth:
Mangstor’s NX-Series array with NVMe Over Fabrics, using Mellanox 56GbE RoCE (or FDR InfiniBand), demonstrated >10GB/s read throughput and >2.5 million 4KB random read IOPS.
Saratoga Speed’s Altamont XP-L with iSER (iSCSI RDMA), using Mellanox 56GbE RoCE to reach 11.6GB/s read throughput and 2.7 million 4KB sequential read IOPS.
These numbers were pretty impressive, but in the technology world, nothing stands still. One must always strive to be faster, cheaper, more reliable, and/or more efficient.
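To relate the IOPS figures to the throughput figures, multiply IOPS by the I/O size. The sketch below assumes 4KB means 4096 bytes and 1GB means 10^9 bytes; the vendors may have counted differently, and the peak throughput numbers were not necessarily measured with 4KB I/O.

```python
def iops_to_gbytes_per_s(iops: float, block_bytes: int) -> float:
    """Throughput in GB/s (10^9 bytes) implied by an IOPS rate at a given I/O size."""
    return iops * block_bytes / 1e9


BLOCK_4KB = 4 * 1024  # assumption: "4KB" means 4096 bytes

for label, iops in (("2.5 million IOPS", 2.5e6), ("2.7 million IOPS", 2.7e6)):
    gbytes = iops_to_gbytes_per_s(iops, BLOCK_4KB)
    print(f"{label} at 4KB is about {gbytes:.2f} GB/s")
```

Under these assumptions, 2.5 million 4KB IOPS works out to a little over 10GB/s, in line with the throughput quoted for the same demo.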
The Story Gets Better
Today, four months after Ethernet Technology Summit, the Flash Memory Summit kicked off in Santa Clara, California. Mellanox issued a press release highlighting the fact that we now have NINE vendors showing TWELVE demos of flash (or other non-volatile memory) being accessed using high-speed Mellanox networks at 40, 56, or even 100Gb/s speeds. Mangstor and Saratoga Speed are both back with faster, more impressive demos and we have other demos from Apeiron, HGST, Memblaze, Micron, NetApp, PMC-Sierra, and Samsung. Here’s a quick summary:
In talks with customers, server vendors, the IT press, and even within Mellanox, one of the hottest storage topics is Ceph. You’ve probably heard of it and many big customers are implementing it or evaluating it. But I am also frequently asked the following:
What is Ceph?
Why is it a hot topic in storage?
Why does Mellanox, a networking company, care about Ceph, and why should Ceph customers care about networking?
I’ll answer #1 and #2 in this blog and #3 in another blog.
Figure 1: A bigfin reef squid (Sepioteuthis lessoniana) of the Class Cephalopoda
Early May is a time of celebrations. May 1 is the traditional start of summer, as well as International Workers Day. Cinco De Mayo celebrates the Mexican victory over the French in 1862. In the United States, it’s time for Mother’s Day.
Figure 1: A traditional Maypole celebration in England
Most importantly for IT, it’s time for EMC World. EMC is Mother Storage to many enterprise customers gathered in Las Vegas this week.
Figure 2: EMC CEO Joe Tucci says “Live Long and Prosper” to mothers (and storage users) across the galaxy.
This week is EMC World, a huge event with tens of thousands of customers, partners, resellers and EMC employees talking about cloud, storage, and virtualization. EMC sells many storage solutions but most of the excitement and recent growth (per the latest EMC earnings announcement) are about scale-out storage, including EMC’s Isilon, XtremIO, and ScaleIO solutions.
As mentioned in my blog on the four big changes in storage, traditional scale-out storage connects many storage controllers together, while the new scale-out server storage links the storage on many servers. In both designs the disk or flash on all the nodes is viewed and managed as one large pool of storage. Instead of having to manually partition and assign workloads to different storage systems, workloads can be either shifted seamlessly from node to node (no downtime) or distributed across the nodes.
Clients connect to (scale-out storage) or run on (scale-out server storage) different nodes but must be able to access storage on other nodes as if it were local. If I’m connecting to node A, I need rapid access to the storage on node A, B, C, D, and all the other nodes in the cluster. The system may also migrate data from one node to another, and rapidly exchange metadata or control traffic to keep track of who has which data.
This week, Las Vegas hosts the National Association of Broadcasters conference, or NAB Show. A big focus is the technology needed to deliver movies and TV shows using 4K video.
Standard DVD video resolution is 720×480. Blu-ray resolution is 1920×1080. But, thanks to digital projection in movie theatres and huge flat-screen TVs at home, more video today is being shot in 4K (4096×2160) resolution. The video is stored compressed but must be streamed uncompressed for many editing, rendering, and other post-production workflows. Each frame has over 8 million pixels and requires about 25x greater bandwidth than DVD (over 4x greater bandwidth than Blu-ray).
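The bandwidth multiples follow directly from the pixel counts. Here is a minimal sketch, assuming bandwidth scales with pixels per frame at the same frame rate and color depth. With 4096×2160 the ratios come out near 25.6x and 4.3x; the round 24x and 4x figures correspond to a 3840×2160 frame. The frame rate and bits per pixel in the last step are illustrative assumptions, not values from the post.

```python
RESOLUTIONS = {
    "DVD": (720, 480),
    "Blu-ray": (1920, 1080),
    "4K": (4096, 2160),
}


def pixels(name: str) -> int:
    """Pixels per frame for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def pixel_ratio(a: str, b: str) -> float:
    """How many times more pixels per frame format a has than format b."""
    return pixels(a) / pixels(b)


def uncompressed_gbits_per_s(name: str, fps: float, bits_per_pixel: int) -> float:
    """Uncompressed stream bandwidth in Gb/s for the given frame rate and depth."""
    return pixels(name) * fps * bits_per_pixel / 1e9


print(f"4K pixels per frame: {pixels('4K'):,}")
print(f"4K vs DVD: {pixel_ratio('4K', 'DVD'):.1f}x")
print(f"4K vs Blu-ray: {pixel_ratio('4K', 'Blu-ray'):.1f}x")
# Assumed for illustration only: 24 frames per second, 24 bits per pixel.
print(f"4K uncompressed: {uncompressed_gbits_per_s('4K', 24, 24):.1f} Gb/s")
```

Higher frame rates or deeper color push the uncompressed figure up proportionally, which is why post-production workflows lean on fast storage networks.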