All posts by John F. Kim

About John F. Kim

John Kim is Director of Storage Marketing at Mellanox Technologies, where he helps storage customers and vendors benefit from high-performance interconnects and RDMA (Remote Direct Memory Access). After starting his high-tech career on an IT helpdesk, John worked in enterprise software and networked storage, with many years of solution marketing, product management, and alliances at enterprise software companies, followed by 12 years at NetApp and EMC. Follow him on Twitter: @Tier1Storage

The Best Flash Array Controller Is a System-on-Chip called BlueField

As the storage world turns to flash and flash turns to NVMe over Fabrics, the BlueField SoC could be the most highly integrated and most efficient flash controller ever. Let me explain why.

The backstory—NVMe Flash Changes Storage

Dramatic changes are happening in the storage market. This change comes from NVMe over Fabrics, which comes from NVMe, which comes from flash. Flash has been capturing more and more of the storage market. IDC reported that in Q2 2017, all-flash array (AFA) revenue grew 75% YoY while the overall external enterprise storage array market was slightly down. In the past this flash consisted entirely of SAS and SATA solid state drives (SSDs), but SSDs have long been fast enough that the SATA and SAS interfaces impose bandwidth bottlenecks and add latency.

 

Figure 1: SATA and SAS controllers can cause a bottleneck and result in higher latency.

 

The SSD vendors developed the Non-Volatile Memory Express (NVMe) standard and command set (version 1.0 released March 2011), which runs over a PCIe interface. NVMe allows higher throughput, up to 20Gb/s per SSD today (and more in the near future), and lower latency. It eliminates the SAS/SATA controller and connects each SSD directly over PCIe, typically 4 PCIe Gen3 lanes per SSD. Many servers deployed with local flash now enjoy the higher performance of NVMe SSDs.
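As a rough sanity check on those bandwidth numbers, here is a short sketch (my own arithmetic, not figures from the post) of the usable bandwidth of the x4 PCIe Gen3 link that typically feeds one NVMe SSD:

```python
# Rough, illustrative arithmetic: usable bandwidth of the x4 PCIe Gen3 link
# that typically feeds one NVMe SSD.
GEN3_GT_PER_LANE = 8.0            # GT/s per PCIe Gen3 lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding overhead

lanes = 4
usable_gbps = lanes * GEN3_GT_PER_LANE * ENCODING_EFFICIENCY
print(f"x4 Gen3 link: ~{usable_gbps:.1f} Gb/s (~{usable_gbps / 8:.1f} GB/s) before protocol overhead")
# ~31.5 Gb/s of link bandwidth, which is why an NVMe SSD can sustain ~20 Gb/s
# today with headroom to grow, versus ~6 Gb/s for a SATA III port.
```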

 

How to Share Fast SSD Goodness

But local flash deployed this way is “trapped in the server” because each server can only use its own flash. Different servers need different amounts of flash at different times, but with a local model you must overprovision enough flash in each server to support the maximum that might be needed, even if you need the extra flash for only a few hours at some point in the future. The answer over the last 20 years has been to centralize and network the storage using iSCSI, Fibre Channel Protocol, iSER (iSCSI Extensions for RDMA), or NAS protocols like SMB and NFS.

But these all use either SCSI commands or file semantics and were not optimized for flash performance, so they deliver good, but not the best possible, performance. As a result, the NVMe community, including Mellanox, created NVMe over Fabrics (NVMe-oF) to allow fast, efficient sharing of NVMe flash over a fabric. It allows the lean and efficient NVMe command set to operate across an RDMA network using protocols like RoCE and InfiniBand, and it maintains the efficiency and low latency of NVMe while allowing sharing, remote access, replication, failover, and more. A good overview of NVMe over Fabrics is in this YouTube video:

 

Video 1: An overview of how NVMe over Fabrics has Evolved
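For a sense of what the host side looks like in practice, here is a minimal, hedged sketch of attaching a Linux initiator to an NVMe-oF target over RDMA with the standard nvme-cli tool; the address and NQN are placeholders, and the exact flags may vary by distribution:

```python
# Minimal sketch: connect a Linux host to an NVMe-oF target over RDMA using
# nvme-cli. The target address and NQN below are placeholders.
import subprocess

TARGET_ADDR = "192.168.1.100"                  # placeholder RDMA-capable target IP
TARGET_NQN = "nqn.2016-06.io.example:jbof1"    # placeholder subsystem NQN

# Discover the subsystems the target exports (4420 is the default NVMe-oF port).
subprocess.run(["nvme", "discover", "-t", "rdma", "-a", TARGET_ADDR, "-s", "4420"],
               check=True)

# Connect; the remote namespaces then appear as local /dev/nvmeXnY block devices.
subprocess.run(["nvme", "connect", "-t", "rdma", "-a", TARGET_ADDR, "-s", "4420",
                "-n", TARGET_NQN], check=True)
```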

NVMe over Fabrics Frees the Flash But Doesn’t Come Free

Once NVMe-oF frees the flash from the server, you need a CPU in the Just-a-Bunch-of-Flash (JBOF) box to process the NVMe commands, plus more CPU power if it’s a storage controller running storage software. You need DRAM to hold the buffers and queues. You need a PCIe switch to connect the SSDs. And you need RDMA-capable NICs (rNICs) fast enough to keep up with all the NVMe SSDs. In other words, you have to build a complete server design with enhanced internal and external connectivity to support this faster storage. For a storage controller this is not unusual, but for a JBOF it’s more complex and costly than what vendors are accustomed to building with SAS or SATA HBAs and expanders, which don’t require CPUs, DRAM, PCIe switches, or rNICs.

Also, since NVMe SSDs and the NVMe over Fabrics protocol are inherently low latency, the latency of everything else in the system—software, network, HBAs, cache or DRAM access, and so on—becomes more prominent, and reducing latency in those areas becomes more critical.

A New SoC Is the Most Efficient Way to Drive NVMe-oF

Fortunately, there is a new way to build NVMe-oF systems: a single chip, the Mellanox BlueField, that provides everything needed other than the SSDs and the DRAM DIMMs. It includes:

  • A ConnectX-5 high-speed NIC (up to 2x100Gb/s ports, Ethernet or InfiniBand)
  • Up to 16 ARM A72 (64-bit) CPU cores
  • A built-in PCIe switch (32 lanes at Gen3/Gen4)
  • A DRAM controller and coherent cache
  • A fast mesh fabric to connect it all

 

Figure 2: BlueField (logical design illustration) includes networking, CPU cores, cache, DRAM controllers, and a PCIe switch all on one chip.

 

The embedded ConnectX-5 delivers not just 200Gb/s of network bandwidth but also the full ConnectX-5 feature set, including RDMA and NVMe protocol offloads. This means NVMe-oF data traffic can go directly from SSD to NIC (or NIC to SSD) without interrupting the CPU. It also means overlay network encapsulation (like VXLAN), virtual switch features (such as OVS), erasure coding, T10 Data Integrity Field (T10-DIF) signatures, and stateless TCP offloads can all be processed by the NIC without involving the CPU cores. The CPU cores remain free to run storage software, security, encryption, or other functionality.

 

The fast internal mesh fabric enables near-instantaneous data movement between the PCIe, CPU, cache, and networking elements as needed, and operates much more efficiently than a classic server design, where traffic between the SSDs and NIC(s) must traverse the PCIe switch and DRAM multiple times for each I/O. With this design, NVMe-oF data traffic queues and buffers can be handled completely in the on-chip cache and don’t need to go to external DRAM; the DRAM is only needed if additional storage functions running on the CPU cores are applied to the data. Otherwise the DRAM can be used for control plane traffic, reporting, and management. The PCIe switch supports up to 32 lanes of either Gen3 or Gen4, so it can transfer more than 200Gb/s of data to and from SSDs and is ready for the new PCIe Gen4-enabled SSDs expected to arrive in 2018. (PCIe Gen4 can transfer twice as much traffic per lane as PCIe Gen3.)
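A quick back-of-the-envelope check (again my arithmetic, using published PCIe signaling rates) shows why 32 lanes are enough to keep the 2x100Gb/s network ports busy:

```python
# Illustrative arithmetic: aggregate bandwidth of a 32-lane PCIe switch at
# Gen3 versus Gen4 signaling rates, after 128b/130b line encoding.
def usable_gbps(lanes, gt_per_lane, encoding=128 / 130):
    """Usable link bandwidth in Gb/s, before protocol overhead."""
    return lanes * gt_per_lane * encoding

print(f"32 lanes, Gen3 (8 GT/s):  ~{usable_gbps(32, 8):.0f} Gb/s")    # ~252 Gb/s
print(f"32 lanes, Gen4 (16 GT/s): ~{usable_gbps(32, 16):.0f} Gb/s")   # ~504 Gb/s
# Either generation comfortably exceeds the 2x100GbE (200 Gb/s) network ports.
```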

BlueField is the FIRST SoC to include all of these features and this level of performance, making it uniquely well suited to controlling flash arrays, in particular NVMe-oF arrays and JBOFs.

 

BlueField Is the Most Integrated NVMe-oF Solution

We’ve seen that in the flash storage world, performance is very important. But simplicity of design and controlling costs are also important. By combining all the components of an NVMe-oF server into a single chip, BlueField makes the flash array design very simple and lowers the cost, while also allowing a smaller footprint and lower power consumption.

Figure 3: BlueField (logical design illustration) includes networking, CPU cores, cache, DRAM controllers, and a PCIe switch all on one chip.

 

Vendors Start Building Storage Solutions Based on BlueField

Not surprisingly, key Original Design Manufacturers (ODMs) and storage Original Equipment Manufacturers (OEMs) are already designing storage solutions based on the BlueField SoC. Mellanox is also working with key partners to create more BlueField solutions for network processing, cloud, security, machine learning, and other non-storage use cases. Mellanox has created a BlueField Storage Reference Platform that holds many NVMe SSDs and serves them up via NVMe over Fabrics. This is the perfect development and reference platform to help customers and partners test and develop their own BlueField-powered storage controllers and JBOFs.

Figure 4: The BlueField Reference System helps vendors and partners quickly develop BlueField-based storage systems.

 

BlueField is the Best Flash Array Controller

The optimized performance and tight integration of all the needed components make BlueField the perfect flash array controller, especially for NVMe-oF storage arrays and JBOFs. Designs using BlueField will deliver more flash performance at lower cost and with less power than standard server-based designs.

You can see the BlueField SoC and BlueField Storage Reference Platform this week (August 8-10) at Flash Memory Summit, in the Santa Clara Convention Center, in the Mellanox booth #138.

 

Supporting Resources:

 

 

The Ideal Network for Containers and NFV Microservices

Containers are the New Virtual Machine

Containers are a hot trend in cloud computing today. They provide server virtualization and application portability without the overhead of running a hypervisor on every host or carrying a full copy of the operating system in every virtual machine. That makes them more efficient than full virtual machines: you can pack more applications onto each server with containers than with a hypervisor.

Figure 1: Containers don’t replicate the entire OS for each application so have less overhead than virtual machines. Illustration courtesy of Docker, Inc. and RightScale, Inc.

 

Containers Make it Easy to Convert Legacy Appliances Into Microservices

Because they are more efficient, containers also make it easier to convert legacy networking appliances into Virtualized Network Functions (VNFs) and into microservices. It’s important to understand that network function virtualization (NFV) is not the same as re-architecting functions as microservices, but the two are highly complementary.

 

Figure 2: Docker Swarm and Kubernetes are tools to automate deployment of containers. Using containers increases IT and cloud flexibility but puts new demands on the network.

 

The Difference Between Microservices and Plain Old NFV

Strictly speaking, NFV simply replaces a dedicated appliance with the same application running as a virtual machine or container. The monolithic app remains monolithic and must be deployed in the same manner as if it were still on proprietary hardware, except it’s now running on commercial off-the-shelf (COTS) servers. These servers are cheaper than the dedicated appliances, but performance is often slower because generic server CPUs are generally not great at high-speed packet processing or switching.

Microservices means disaggregating a monolithic application into many small parts that can interact with each other and scale separately. Suppose my legacy appliance inspects packets, routes them to the correct destination, and analyzes suspicious traffic. As I deploy more appliances, I get these three capabilities in exactly the same ratio, even though one particular customer (or week, or day) might require substantially more routing and very little analysis, or vice versa. However, if I break my application into specific components, or microservices, that interoperate with each other, then I can scale only the services that are needed. Deploying microservices in containers makes it easy to add, reduce, or change the mix and ratio of running services from customer to customer, or even hour to hour. It also makes applications faster to deploy and easier to develop and update, because individual microservices can be designed, tested, deployed, or updated quickly without affecting all the other services.

So, NFV moves network functions from dedicated appliances to COTS servers and microservices disaggregates monolithic functions into scalable components. Doing both gives cloud service providers total flexibility in choosing which services are deployed and what hardware is used. But, one more critical element must be considered in the quest for total infrastructure efficiency—NFV optimized networking.

Figure 3: Plain NFV uses monolithic apps on commodity servers. Microservices decomposes apps into individual components that can be scaled separately.

 

Microservices and Containers Require the Right Network Infrastructure

When you decompose monolithic applications into microservices, you place greatly increased demand on the network. Monolithic apps connect their functions within one server so there is little or no east-west traffic — all traffic is north-south to and from the clients or routers. But, an app consisting of disaggregated microservices relies on the network for inter-service communication and can easily generate several times more east-west traffic than north-south traffic. Much of this traffic can even occur between containers on the same physical host, thereby taxing the virtual switch running in host software.

Figure 4: Changing to a microservices design allows flexibility to deploy exactly the services that are needed but greatly increases east-west network traffic, mandating the use of robust and reliable switches.
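As a toy illustration of that east-west multiplier, the sketch below uses assumed fan-out numbers (for illustration only, not measurements) to show how many internal calls a single stream of client requests can generate once an app is decomposed:

```python
# Toy model with assumed numbers: east-west traffic generated per client
# request after decomposing a monolith into microservices.
client_requests_per_sec = 10_000      # north-south requests hitting the app
services_per_request = 5              # e.g., gateway, auth, routing, inspection, analytics
internal_calls_per_service = 2        # lookups, caches, policy checks, etc.

east_west_calls = client_requests_per_sec * services_per_request * internal_calls_per_service

print(f"North-south: {client_requests_per_sec:,} requests/s")
print(f"East-west:   {east_west_calls:,} internal calls/s "
      f"({east_west_calls // client_requests_per_sec}x the client traffic)")
```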

Moving to COTS servers also poses a performance challenge, because the proprietary appliances use purpose-built chips to accelerate their packet processing, while general-purpose x86 CPUs require many cycles to process packet streams, especially for small packets.

The answer to both challenges is deploying the right networking hardware. The increased east-west traffic demands a switch that is not only fast and reliable, but able to handle micro-bursts of traffic while also fairly allocating performance across ports. Many Ethernet switches use merchant silicon that only delivers the advertised bandwidth for larger packet sizes, or only when certain combinations of ports are used. They might drop packets unexpectedly under load or switch from cut-through to store-and-forward networking, which greatly increases network latency. The main problem with these switches is that performance becomes unpredictable — sometimes it’s good and sometimes it’s bad, and this makes supporting cloud service level agreements impossible. On the other hand, choosing the right switch ensures good throughput and low latency across all packet sizes and port combinations, and eliminates packet loss during traffic microbursts.

Figure 5: Mellanox Spectrum has up to 8x better microburst absorption capability than the Broadcom Tomahawk silicon used in many other switches. Spectrum also delivers full rated bandwidth at all packet sizes without any avoidable packet loss.

Separately from the switch, an optimized smart NIC such as the Mellanox ConnectX®-4 includes Single Root I/O Virtualization (SR-IOV) and an internal Ethernet switch, or eSwitch, to accelerate network functions. These features let each container access the NIC directly and can offload inter-container traffic from the software virtual switch using an Open vSwitch (OVS) offload technology called ASAP2. These smart NICs also offload the protocol translation for overlay networks—such as VXLAN, NVGRE, and Geneve—which are used to provide improved container isolation and mobility. These features and offloads greatly accelerate container networking performance while reducing the host’s CPU utilization. Faster networking plus more available CPU cycles enables more containers per host, improving cloud scalability and reducing costs.

Figure 6: ASAP2 offloads packet processing from the software vSwitch to a hardware-accelerated eSwitch in the NIC, greatly accelerating container network performance.

 

Medallia Deploys Microservices Using Containers

Medallia provides a great case study of a modern cloud services provider that has embraced containers and advanced networking in order to deliver customer feedback management as Software-as-a-Service (SaaS). Medallia enables companies to track and improve their customers’ experiences. Every day, Medallia must capture and analyze online and social media feedback from millions of interactions and deliver real-time analysis and reporting, including personalized dashboards, to thousands of their customers’ employees. Medallia wanted to run their service on commodity hardware using open standards and fully automated provisioning. They also wanted full portability of any app, service, or networking function, making it easy to move, replace, or relaunch any function on any hardware.

 

To accomplish all this, they designed a software-defined, scalable cloud infrastructure using microservices and containers on the following components:

  • Docker for container management
  • Aurora, Mesos, and Bamboo for automation
  • Ceph for storage
  • Ubuntu Linux for compute servers and Cumulus Linux for networking
  • Mellanox ConnectX-4 Lx 50GbE adapters
  • Mellanox Spectrum switches running Cumulus Linux (50GbE to servers, 100GbE for aggregation)

Figure 7 and Video 1: Medallia uses containers, Cumulus Linux, and Ceph running on Mellanox adapters and switches to deliver a superior cloud SaaS to their customers.

 

Medallia found that using end-to-end Mellanox networking hardware to underlay their containers and microservices resulted in faster performance and a more reliable network. Their Ceph networked storage performance matched that of their local storage, and they were able to automate network management tasks and reduce the number of network cables per rack. All of this enables Medallia to deliver a better SaaS to their cloud customers, who, in turn, learn how to be better listeners and vendors to their own retail customers.

Mellanox is the Container Networking Company

The quest for NFV and containerization of microservices is a noble one that increases flexibility and lowers hardware costs. However, to do this correctly, cloud service providers need networking solutions like Mellanox ConnectX-4 adapters and Spectrum switches. Using the right network hardware ensures fast, reliable and secure performance from containers and VNFs, making Mellanox the ideal NFV and Container Networking Company.

Supporting Resources:

 

 

 

Excelero Unites NVMe Over Fabrics With Hyper-Converged Infrastructure

Two Hot IT Topics Standing Alone, Until Now…

Two of the hottest topics and IT trends right now are hyper-converged infrastructure (HCI) and NVMe over Fabrics (NVMe-oF). The hotness of HCI is evident in the IPO of Nutanix in September 2016 and HPE’s acquisition of SimpliVity in January 2017. The interest in NVMe-oF has been astounding, with all the major storage vendors working on it and all the major SSD vendors promoting it as well.

But the two trends have been completely separate—you could do one, the other, or both, but not together in the same architecture. HCI solutions could use NVMe SSDs but not NVMe-oF, while NVMe-oF solutions were being deployed either as separate, standalone flash arrays or NVMe flash shelves behind a storage controller. There was no easy way to create a hyper-converged solution using NVMe-oF.

 

Excelero NVMesh Combines NVMe-oF with HCI

Now a new solution launched by Excelero combines the low latency and high throughput of NVMe-oF with the scale-out and software-defined power of HCI. Excelero does this with a technology called NVMesh that takes commodity server, flash, and networking technology and connects it in a hyper-converged configuration using an enhanced version of the NVMe-oF protocol. With this solution, each node can act both as an application server and as a storage target, making its local flash storage accessible to all the other nodes in the cluster. It also supports a disaggregated flash model so customers have a choice between scale-out converged infrastructure and a traditional centralized storage array.

Figure 1: Excelero NVMesh combines NVMe-oF with HCI, much like combining peanut butter and chocolate into one tasty treat.

 

 

Remote Flash Access Without the Usual CPU Penalties

NVMesh creates a virtualized pool of block storage using the NVMe SSDs on each server and leverages a technology called Remote Direct Drive Access (RDDA) to let each node access flash storage remotely.   RDDA itself builds on top of industry-standard Remote Direct Memory Access (RDMA) networking to maintain the low latency of NVMe SSDs even when accessed over the network fabric.  The virtualized pools allow several NVMe SSDs to be accessed as one logical volume by either local or remote applications.

In a traditional hyper-converged model, storage sharing consumes part of the local CPU cycles, which are then not available to the application. The faster the storage and the network, the more CPU is required to share the storage. RDDA avoids this by allowing the NVMesh clients to directly access the remote storage without interrupting the target node’s CPU. This means high performance—whether throughput or IOPS—is supported across the cluster without eating up all the CPU cycles.

 

Recent testing showed a 4-server NVMesh cluster with 8 SSDs per server could support several million 4KB IOPS or over 6.5GB/s (>50Gb/s)—very impressive results for a cluster that size.

Figure 2: NVMesh leverages RDDA and RDMA to allow fast storage sharing with minimal latency and without consuming CPU cycles on the target. The control path passes through the management module and CPUs but the data path does not, eliminating potential performance bottlenecks.

 

Integrates with Docker and OpenStack

Another feature NVMesh has over the standard NVMe-oF 1.0 protocol is that it supports integration with Docker and OpenStack. NVMesh includes plugins for both Docker Persistent Volumes and Cinder, which makes it easy to support and manage container and OpenStack block storage. In a world where large clouds increasingly use either OpenStack or Docker, this is a critical feature.

Figure 3: Excelero’s NVMesh includes plug-ins for both Docker and OpenStack Cinder, making it easy to use it for both container and cloud block storage.
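To illustrate what a Docker volume plugin enables, here is a hedged sketch using the Docker SDK for Python to request a persistent volume from a third-party driver and hand it to a MySQL container; the driver name and options are placeholders, not Excelero’s documented plugin parameters:

```python
# Hedged sketch: request a persistent volume from a third-party Docker volume
# plugin and mount it in a container. Driver name and options are placeholders,
# not Excelero's documented parameters.
import docker

client = docker.from_env()

volume = client.volumes.create(
    name="mysql-data-01",
    driver="nvmesh",                 # hypothetical plugin name
    driver_opts={"size": "100GiB"},  # hypothetical option
)

# Any container on the host can then mount the volume like local storage.
client.containers.run(
    "mysql:5.7",
    detach=True,
    environment={"MYSQL_ROOT_PASSWORD": "example"},
    volumes={volume.name: {"bind": "/var/lib/mysql", "mode": "rw"}},
)
```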

 

 

Another Step Forward in the NVMe-oF Revolution

The launch of Excelero’s NVMesh is an important step forward in the ongoing revolution of NVMe over Fabrics. The open source NVMe-oF solution supports high performance, but only as centralized storage and without many important storage features. The NVMe-oF array vendors offer proven appliances, but some customers want a software-defined storage option built on their favorite server hardware. Excelero offers all of these together: hyper-converged infrastructure, NVMe over Fabrics technology, and software-defined storage.

 

Supporting Resources:

Storage Predictions for 2017

Looking at what’s to come for storage in 2017, I find three simple and easy predictions, which lead to four more complex predictions. Let’s start with the easy ones:

  • Flash keeps taking over
  • NVMe over Fabrics remains the hottest storage technology
  • Cloud continues to eat the world of IT

 

Flash keeps taking over

Every year, for the past four years, has been “The Year Flash Takes Over,” and every year flash captures a growing share of storage capacity and spending, but it’s still in the minority. 2017 is not the year flash surpasses disk in spending or capacity — there’s simply not enough NAND fab capacity yet, but it is the year all-flash arrays go mainstream. SSDs are now growing in capacity faster than HDDs (a 15TB SSD was recently announced) and every storage vendor offers an all-flash flavor. New forms of 3D NAND are lowering price per TB to compete with high-capacity disks, while persistent memory technologies like 3D XPoint (though not actually built on NAND flash) are pushing SSD performance even further above that of disk. HDDs will still dominate low-price, high-capacity storage for some years, but are rapidly becoming a niche technology.


Figure 1: TrendFocus 2015 chart shows worldwide hard drive shipments have fallen since 2010. Flash is one major reason, cloud is another.

 

According to IDC (Worldwide Quarterly Enterprise Storage Systems Tracker, September 2016) in Q2 2016 the all-flash array (AFA) market grew 94.5% YoY while the overall enterprise storage market grew 0%, giving AFAs 19.4% of the external (outside the server) enterprise storage systems market. This share will continue to rise.


Figure 2: Wikibon 2015 forecast predicts 4-year TCO of flash storage dropped below that of hard disk storage in 2016. 

 

NVMe over Fabrics (NVMe-oF) remains the hottest storage technology

It’s been a hot topic since 2014 and it’s getting hotter, even though production deployments are not yet widespread. The first new block storage protocol in 20 years has all the storage and SSD vendors excited because it makes their products and the applications running on them work better.  At least 4 startups have NVMe-oF products out with POCs in progress, while large vendors such as Intel, Samsung, Seagate, and Western Digital are demonstrating it regularly. Mainstream storage vendors are exploring how to use it while Web 2.0 customers want it to disaggregate storage, moving flash out of each individual server into more flexible, centralized repositories.

It’s so hot because it helps vendors and customers get the most out of flash (and other non-volatile memory) storage. Analyst firm G2M, Inc. predicts the NVMe market will exceed $57 billion by 2020, with a compound annual growth rate (CAGR) of 95%. They also predict that 40% of AFAs will use NVMe SSDs by 2020, and that hundreds of thousands of those arrays will connect with NVMe over Fabrics.


Figure 3: G2M predicts incredibly fast growth for NVMe SSDs, servers, appliances, and NVMe over Fabrics.

 

Cloud continues to eat the world of IT 

Nobody is surprised to hear cloud is growing faster than enterprise IT. IDC reported cloud (public + private) IT spending for Q2 2016 grew 14.5% YoY while traditional IT spending shrank 6% YoY. Cloud offers greater flexibility and efficiency, and in the case of public cloud the ability to replace capital expense investments with a pure OpEx model.

It’s not a panacea, as there are always concerns about security, privacy, and speed of access. Also, larger customers often find that on-premises infrastructure — often set up as private cloud — can cost less than public cloud in the long run. But there is no doubting the inexorable shift of projects, infrastructure, and spending to the cloud. This shift affects compute (servers), networking, software, and storage, and drives both cloud and enterprise customers to find more efficient solutions that offer lower cost and greater flexibility.


Figure 4: IDC Forecasts cloud will consume >40% of IT infrastructure spending by 2020. Full chart available at:  http://chartchannel.icharts.net/chartchannel/worldwide-cloud-it-infrastructure-market-forecast-deployment-type-2015-2020-shares

 

OK Captain Obvious, Now Make Some Real Predictions!

Now let’s look at the complex predictions which are derived from the easy ones:

  • Storage vendors consolidate and innovate
  • Fibre Channel continues its slow decline
  • Ceph grows in popularity for large customers
  • RDMA becomes more prevalent in storage

 

Traditional storage vendors consolidate and innovate

Data keeps growing at over 30% per year but spending on traditional storage is flat. This is forcing vendors to fight harder for market share by innovating more quickly to make their solutions more efficient, flexible, flash-focused, and cloud-friendly. Vendors that previously offered only standalone arrays are offering software-defined options, cloud-based storage, and more converged or hyper-converged infrastructure (HCI) options. For example, NetApp offers options to replicate or back up data from NetApp boxes to Amazon Web Services, while Dell EMC, HDS, and IBM all sell converged infrastructure racks. In addition, startup Zadara Storage offers enterprise storage-as-a-service running either in the public cloud or as an on-premises private cloud.

Meanwhile, major vendors all offer software versions of some of their products instead of only selling hardware appliances. For example, EMC ScaleIO, IBM Spectrum Storage, IBM Cloud Object Storage (formerly CleverSafe), and NetApp ONTAP Edge are all available as software that runs on commodity servers.

The environment for flash startups is getting tougher because all the traditional vendors now offer their own all-flash flavors. There are still startups making exciting progress in NVMe over Fabrics, object storage, hyper-converged infrastructure, data classification, and persistent memory, but only a few can grow into profitability on their own. 2017 will see a round of acquisitions as storage vendors who can’t grow enough organically look to expand their portfolios in these areas.

 

Fibre Channel Continues its Downward Spiral

One year ago I wrote a blog about why Fibre Channel (FC) is doomed, and all signs (and analyst forecasts) point to its continued slow decline. All the storage trends around efficiency, flash, performance, big data, Ceph, machine learning, object storage, containers, HCI, etc. are moving against Fibre Channel. (Remember the “Cloud Eats the World” chart above? They definitely don’t want to use FC either.) The only thing keeping FC hopes alive is the rapid growth of all-flash arrays, which mostly deploy FC today because they are replacing legacy disk or hybrid FC arrays. But even AFAs are trending toward more Ethernet and InfiniBand (and occasionally direct PCIe connections) to get more performance and flexibility at lower cost.

The FC vendors know the best they can hope for is to slow the rate of decline, so all of them have been betting on growing their Ethernet product lines. More recently, the FC vendors (Emulex, QLogic, Brocade) have been acquired by larger companies, not as hot growth engines but so the acquirers can milk the cash flow from expensive FC hardware before their customers convert to Ethernet and escape.

 

Ceph grows in Popularity for Large Customers

Ceph — both the community version and Red Hat Ceph Storage — continues to gain fans and use cases. Originally seen as suited only for storing big content on hard drives (low-cost, high-capacity storage), it’s now gained features and performance making it suitable for other applications. Vendors like Samsung, SanDisk (now WD), and Seagate are demonstrating Ceph on all-flash storage, while Red Hat and Supermicro teamed up with Percona to show Ceph works well as database storage (and is less expensive than Amazon storage for running MySQL).  I wrote a series of blogs on Ceph’s popularity, optimizing Ceph performance, and using Ceph for databases.

Ceph is still the only storage solution that is software-defined, open source, and scale-out while offering enterprise storage features (though Lustre is approaching this as well). Major contributors to Ceph development include not just Red Hat but also Intel, the drive/SSD makers, Linux vendors (Canonical and SUSE), Ceph customers, and, of course, Mellanox.

In 2016, Ceph added features and stability to its file/NAS offering, CephFS, as well as major performance improvements for Ceph block storage. In 2017, Ceph will improve performance, management, and CephFS even more while also enhancing RDMA support. As a result, its adoption will grow beyond its traditional base to add telcos, cable companies, and large enterprises that want a scalable software-defined storage solution for OpenStack.

 

 

RDMA More Prevalent in Storage

RDMA, or Remote Direct Memory Access, has actually been prevalent in storage for a long time as a cluster interconnect and for HPC storage. Just about all the high-performance scale-out storage products use Mellanox-powered RDMA for their cluster communications — examples include Dell FluidCache for SAN, EMC XtremIO, EMC VMAX3, IBM XIV, InfiniDat, Kaminario, Oracle Engineered Systems, Zadara Storage, and many implementations of Lustre and IBM Spectrum Scale (GPFS).

The growing use of flash media and intense interest in NVMe-oF are accelerating the move to RDMA. Faster storage requires faster networks, not just more bandwidth but also lower latency, and in fact the NVMe-oF spec requires RDMA to deliver its super performance.


Figure 5: Intel presented a chart at Flash Memory Summit 2016 showing how the latency of storage devices is rapidly decreasing, leading to the need to decrease software and networking latency with higher-speed networks (like 25GbE) and RDMA.

In addition to the exploding interest in NVMe-oF, Microsoft has improved support for RDMA access to storage in Windows Server 2016, using SMB Direct and Windows Storage Spaces Direct, and Ceph RDMA is getting an upgrade. VMware has enhanced support for iSER (iSCSI Extensions for RDMA) in vSphere 2016, and more storage vendors like Oracle (in tape libraries) and Synology have added iSER support to enable accelerated client access. On top of this, multiple NIC vendors (not just Mellanox) have announced support for RoCE (RDMA over Converged Ethernet) at 25, 40, 50, and 100Gb Ethernet speeds. These changes all mean more storage vendors and storage deployments will leverage RDMA in 2017.

 

So Let’s Get This Party Started

2017 promises to be a super year for storage innovation. With technology changes, disruption, and consolidation, not every vendor will be a winner and not every storage startup will find hockey-stick growth and riches, but it’s clear the storage hardware and software vendors are working harder than ever, and customers will be big winners in many ways.

 

 

Ceph For Databases? Yes You Can, and Should

Ceph is traditionally known for both object and block storage, but not for database storage. While its scale-out design supports both high capacity and high throughput, the stereotype is that Ceph doesn’t support the low latency and high IOPS typically required by database workloads.

However, recent testing by Red Hat, Supermicro, and Percona—one of the top suppliers of MySQL database software—shows that Red Hat Ceph Storage actually does a good job of supporting database storage, especially when running it on multiple VMs, and that it does very well compared to running MySQL on Amazon Web Services (AWS).

In fact, Red Hat was a sponsor of Percona Live Europe last week in Amsterdam, and it wasn’t just to promote Red Hat Enterprise Linux. Sr. Storage Architect Karan Singh presented a session titled “MySQL and Ceph: A tale of two friends.”


 

Figure 1: This shadowy figure with the stylish hat has been spotted storing MySQL databases in a lab near you.

 

MySQL Needs Performance, But Not Just Performance

The front page of the Percona Europe web site says “Database Performance Matters,” and so it does. But there are multiple ways to measure database performance—it’s not just about running one huge instance of MySQL on one huge bare-metal server with the fastest possible flash array. (Just in case that is what you want, check out conference sponsor Mangstor, which offers a very fast flash array connected using NVMe over Fabrics.) The majority of MySQL customers also consider other aspects of performance:

  • Performance across many instances: Comparing aggregate performance of many instances instead of just one large MySQL instance
  • Ease of deployment: The ability to spin up, manage, move and retire many MySQL instances using virtual machines.
  • Availability: Making sure the database keeps running even in case of hardware failure, and can be backed up and restored in case of corruption.
  • Storage management: Can the database storage be centralized, easily expanded, and possibly shared with other applications?
  • Price/Performance: Evaluating the cost of each database transaction or storage IOP.
  • Private vs. Public Cloud: Which instances should be run in a public cloud like AWS vs. in a private, on-premises cloud?

It’s common for customers to deploy many MySQL instances to support different applications, users, and projects. It’s also common to deploy them on virtual machines, which makes more efficient use of hardware and simplifies migration of instances. For example, a particular MySQL instance can be given more resources when it’s hot, then moved to an older server when it’s not.

Likewise, it’s preferable to offer persistent, shared storage that can scale up in both capacity and performance when needed. While a straight flash array or local server flash might offer more peak performance to one MySQL instance, Ceph’s scale-out architecture makes it easy to scale up storage performance to run many MySQL instances across many storage nodes. Persistent storage ensures the data continues to exist even if a database instance goes away. Ceph also features replication and erasure coding to protect against hardware failure, and snapshots to support quick backup and restore of databases.
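For readers who want to see what that looks like programmatically, here is a minimal sketch using the standard Ceph Python bindings (python-rados and python-rbd) to create a replicated RBD block image that could back a MySQL data disk, then snapshot it; the pool name, image name, and size are placeholders:

```python
# Minimal sketch with the Ceph Python bindings (python-rados / python-rbd).
# Pool name, image name, and size are placeholders for illustration.
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

ioctx = cluster.open_ioctx("rbd")    # assumes a replicated pool named "rbd"
try:
    # Create a 100 GiB block image to serve as a MySQL data disk; Ceph handles
    # replication (or erasure coding) across the cluster underneath it.
    rbd.RBD().create(ioctx, "mysql-data-01", 100 * 1024**3)

    # Take a snapshot for near-instant backup/restore of the database volume.
    image = rbd.Image(ioctx, "mysql-data-01")
    try:
        image.create_snap("before-upgrade")
    finally:
        image.close()
finally:
    ioctx.close()
    cluster.shutdown()
```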

As for the debate between public vs. private cloud, it has too many angles to cover here, but clearly there are MySQL customers who prefer to run in their own datacenter rather than AWS, and others who would happily go either way depending which costs less.


 

Figure 2: Ceph can scale out to many nodes for both redundancy and increased performance for multiple database instances.

But the questions remain: can Ceph perform well enough for a typical MySQL user, and how does it compare to AWS in performance and price? This is what Red Hat, Supermicro, and Percona set out to find out.

 


 

Figure 3: MySQL on AWS vs. MySQL on Red Hat Ceph Storage. Which is faster? Which is less expensive?

First, Red Hat ran baseline benchmarks on AWS EC2 (r3.2xlarge and m4.4xlarge) using Amazon’s Elastic Block Store (EBS) with provisioned IOPS set to 30 IOPS/GB, testing with Sysbench for 100% read and 100% write. Not surprisingly, after converting from Sysbench numbers (requests per second per MySQL instance) to IOPS, AWS performance was as advertised—30 read IOPS/GB and 26 write IOPS/GB.

Then they tested the Ceph cluster illustrated above: 5 Supermicro cloud servers (SSG-6028R-E1CF12L) with four NVMe SSDs each, plus 12 Supermicro client machines on dual 10GbE networks. Software was Red Hat Ceph Storage 1.3.2 on RHEL 7.2 with Percona Server. After running the same Sysbench tests against the Ceph cluster at 14% and 87% capacity utilization, they found read IOPS/GB were 8x or 5x better, while write IOPS/GB were 3x better than AWS at 14% utilization. At 87% utilization of the Ceph cluster, write IOPS/GB were 14% lower than AWS due to the write amplification from the combination of InnoDB write buffering, Ceph replication, and OSD journaling.


Figure 4: Ceph private cloud generated far better write IOPS/GB at 14% capacity and slightly lower IOPS/GB at 72% and 87% capacity.
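The conversion from Sysbench results to storage IOPS/GB is simple arithmetic; the sketch below shows the shape of that calculation, with the I/Os-per-request factor left as an explicit assumption since it depends on InnoDB settings and cache hit rates:

```python
# Illustrative conversion from Sysbench results to storage IOPS per GB.
# The ios_per_request factor is an assumption, not a number from the study.
def sysbench_to_iops_per_gb(requests_per_sec, ios_per_request, dataset_gb):
    """Convert Sysbench requests/sec for one MySQL instance into storage IOPS/GB."""
    storage_iops = requests_per_sec * ios_per_request
    return storage_iops / dataset_gb

# Hypothetical example: 3,000 write requests/sec, ~1.5 storage I/Os per request
# (redo log plus data page flushes), against a 200 GB dataset.
print(f"{sysbench_to_iops_per_gb(3000, 1.5, 200):.1f} write IOPS/GB")
```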

 

What about Price/Performance?

The Ceph cluster was always better than AWS for reads, and much better than AWS for writes when nearly empty but slightly slower than AWS for writes when nearly full. On the other hand, when looking at the cost per IOP for MySQL writes, Ceph was far less expensive than AWS in all scenarios: in the best case less than one-third the price per IOP, and in the worst case half the price per IOP, versus AWS EBS with provisioned IOPS.


Figure 5: MySQL on a Ceph private cloud showed much better (lower) price/performance than running on AWS EBS with Provisioned IOPS.

 

What Next for the Database Squid?

Having shown good performance chops running MySQL on Red Hat Ceph Storage, Red Hat also looked at tuning Ceph block storage performance, including RBD format, RBD order, RBD fancy striping, TCP settings, and various QEMU settings. These are covered in the Red Hat Summit presentation and Percona webinar.

For the next phase in this database testing, I’d like to see Red Hat, Supermicro, and Percona test larger server configurations that use more flash per server and faster networking. While this test only used dual 10GbE networks, previous testing has shown that using Mellanox 40 or 50Gb Ethernet can reduce latency and therefore increase IOPS performance for Ceph, even when dual 10GbE networks provide enough bandwidth. It would also be great to demonstrate the benefits of Ceph replication and cluster self-healing features for data protection as well as Ceph snapshots for nearly instant backup and restore of databases.

My key takeaways from this project are as follows:

  • Ceph is a good choice for many MySQL use cases
  • Ceph offers excellent performance and capacity scalability, even if it might not offer the fastest performance for one specific instance.
  • Ceph performance for MySQL compares favorably with AWS EBS Provisioned IOPS
  • You can build a private storage cloud with Red Hat Ceph Storage with a lower price/capacity and price/performance than running on AWS.

If you’re running a lot of MySQL instances, especially on AWS, it behooves you to evaluate Ceph as a storage option. You can learn more about this from the PerconaLive and Red Hat Summit presentations linked below.

Supporting Resources:

 

No Wrinkles as Mellanox Powers NVMe over Fabrics Demos at Flash Memory Summit and IDF

Mellanox just rounded out two very busy weeks with back-to-back trade shows related to storage. We were at Flash Memory Summit August 9-11 in Santa Clara, followed by Intel Developer Forum (IDF) August 16-18 in San Francisco. A common theme was seeing Mellanox networking everywhere to demonstrate the performance of flash storage.

The fun began at Flash Memory Summit with several demos of NVMe over Fabrics (NVMe-oF). As my colleague Rob Davis wrote in his blog, the 1.0 standard and community drivers were just released in June 2016, and while FMS 2015 also featured NVMe-oF demos from Mangstor, Micron and PMC Sierra (now Microsemi), all were pre-standard and only Mangstor had a shipping product. Plus all the demos ran only on Linux.


Figure 1: NVMe over Fabrics is nearly always powered by RoCE (RDMA over Converged Ethernet)

So it was extremely exciting this year to see FIVE demos of NVMe over Fabrics at FMS using Mellanox networking, with three of them available as products. All the demos either used the standard NVMe-oF drivers or were compatible with the standard drivers, and they showed initiators running on Windows and VMware, not just Linux.

  • E8 Storage showed a distributed, scale-out NVMe-oF software-defined storage solution
  • Mangstor showed a high-performance, scale-up NVMe-oF array, with initiators running on bare-metal Linux and on a Linux VM running on top of VMware ESXi
  • Micron showed a Windows NVMe-oF initiator interoperating with a Linux target
  • Newisys (division of Sanmina) showed a live NVMe-oF demo
  • Pavilion Data showed a super dense NVMe-oF custom array supporting up to 460TB, 40x40GbE connections, and up to 20 million IOPS, all in one 4RU box.


Figure 2: Pavilion Data’s custom-engineered all-flash array supports up to 460TB of raw capacity, 120GB/s of throughput, and 20M IOPS, all running NVMe-oF with up to forty 40GbE connections.

But NVMe over Fabrics wasn’t the only flash demo to leverage Mellanox networking! Samsung demonstrated an impressive Windows Storage Spaces Direct (S2D) cluster that reached 80GB/s (640 Gb/s) of data throughput. It used just 4 Dell servers, each with 4 Samsung NVMe SSDs and two Mellanox ConnectX-4 100GbE RDMA-enabled NICs, all connected by Mellanox’s Spectrum 2700 100GbE switch and LinkX® cables. Samsung also showed an all-flash reference design with 24 NVMe SSDs, capable of supporting several storage solutions including Ceph.
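That throughput is close to the wire limit of the cluster, as a quick calculation (mine, not Samsung’s) shows:

```python
# Back-of-the-envelope check: how close 80 GB/s comes to the cluster's
# aggregate network line rate (my arithmetic, not from the demo write-up).
servers = 4
nics_per_server = 2
gbps_per_nic = 100

line_rate_gbps = servers * nics_per_server * gbps_per_nic   # 800 Gb/s
measured_gbps = 80 * 8                                      # 80 GB/s = 640 Gb/s

print(f"Line rate: {line_rate_gbps} Gb/s, measured: {measured_gbps} Gb/s "
      f"({measured_gbps / line_rate_gbps:.0%} of aggregate link speed)")
```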

Nimbus Data unveiled a new family of flashy arrays which all support iSER (iSCSI Extensions for RDMA) on top of RoCE. Nexenta and Mellanox released a joint white paper showing how to deploy a hyper-converged software-defined storage (NexentaEdge) solution using Micron SSDs and Mellanox 50Gb Ethernet.


 

Figure 3: Nimbus Data’s Exaflash C-series supports up to 3PB raw flash and can connect at 100Gb/s with either Ethernet or InfiniBand

At IDF a week later, there were more flashy demos. This time HGST (a Western Digital brand), Seagate, and Samsung showed NVMe over Fabrics using Mellanox adapters. Newisys and E8 Storage returned with their NVMe-oF demos, while Samsung also brought back their glorious Windows S2D cluster. To add to the storage excitement, Plexistor showed a solution for Shared Persistent Memory (using technology similar to NVMe over Fabrics). Atto demonstrated ThunderLink, which connects Thunderbolt 3 devices to 40Gb Ethernet networks, and Nokia showed their Airframe OCP rack.

 


Figure 4: Seagate showed a 2U NVMe-oF system with 24 Seagate Nytro XF1440 NVMe SSDs, while Atto’s ThunderLink™ connects Thunderbolt™ 3 devices to 40GbE networks.

Even Intel themselves showed NVMe over Fabrics with Mellanox ConnectX-4 100GbE NICs, paired with their Storage Performance Development Kit (SPDK) and an Intel Silicon Photonics 100GbE cable. (Mellanox LinkX cables also support Silicon Photonics for 100GbE speeds at distances up to 2km.)


Figure 5: Intel showed NVMe over Fabrics using their SPDK software and Mellanox ConnectX-4 adapters.

The common thread across these demos at FMS and IDF? They all used Mellanox ConnectX-3 or ConnectX-4 network adapters, and they all ran at speeds of 25Gb/s or faster (many at 100Gb/s).  In fact as far as I could see, every single demonstration of NVMe over Fabrics used Mellanox adapters, except for demos by other network adapter or chip vendors who showed their own networking.

This is not surprising given that Mellanox adapters and switches are the first to support 25, 50, and 100GbE speeds, and the first and best at supporting low-latency RDMA—via InfiniBand or RoCE—for super-efficient data movement. In addition, ConnectX-4 makes RoCE—and thus NVMe over Fabrics—easier to deploy by allowing RoCE to run with Priority Flow Control (PFC), Explicit Congestion Notification (ECN), or both (see my blog about that).

The key takeaways from these recent events are as follows:

  • NVMe over Fabrics is now a released standard with working products from several vendors
  • NVMe-oF support is expanding to Windows and VMware, no longer Linux-only
  • The speed of flash absolutely requires faster network speeds: 25, 40, 50, or even 100Gb/s
  • RoCE on Mellanox adapters is by far the most popular RDMA solution for supporting NVMe over Fabrics
  • Other flash storage solutions—such as Windows Storage Spaces, NexentaEdge, Ceph, and Plexistor—also choose Mellanox networking for the higher performance and efficiency

Many of the presentations—some given by me and my colleagues—from these two shows are now available online (links in the Resources section below). And if you’d like to see more solutions leveraging the power and efficiency of Mellanox networking, look for Mellanox at an upcoming event near you.

Supporting Resources:

 

Resilient RoCE Relaxes RDMA Requirements

RoCE — or RDMA over Converged Ethernet — has already proven to be the most popular choice for cloud deployments of Remote Direct Memory Access (RDMA). And it’s increasingly being used for fast flash storage access, such as with NVMe Over Fabrics. But some customers prefer not to configure their networks to be lossless using priority flow control (PFC). Now, with new software from Mellanox, RoCE can be deployed either with or without PFC, depending upon customer network requirements, infrastructure, and preference. This makes RoCE easier to deploy for more customers and will accelerate adoption of RDMA.

Background: Why RDMA?

The increasing speed of CPUs, networks, and storage (flash) has amplified the advantages of RDMA, making it more popular. As CPUs and storage get faster, they support faster network speeds such as 25, 40, 50, and 100GbE. But as network speeds increase, more CPU cores are devoted to handling network traffic, with its related data copies and interrupts. And as solid-state storage offers ever lower latencies, the network stack latency becomes a greater and greater part of the total time to access data.


Figure 1: As storage gets faster, software latency becomes a larger part of total data access latency. (Source: Intel presentation on SPDK, May 2016.)
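The chart’s point can be illustrated with some rough, assumed latency numbers (mine, not values from the Intel chart): as the storage medium gets faster, a roughly fixed software-plus-network overhead becomes the dominant share of total access time.

```python
# Illustrative numbers only (assumed, not taken from the referenced chart):
# share of total access latency consumed by the software + network stack as
# the storage medium itself gets faster.
stack_overhead_us = 50   # assumed fixed software + network latency, in microseconds

for device, media_us in [("HDD", 5000), ("SATA SSD", 100), ("NVMe SSD", 20)]:
    total_us = media_us + stack_overhead_us
    share = stack_overhead_us / total_us
    print(f"{device:9s}: media {media_us:>5} us + stack {stack_overhead_us} us "
          f"-> stack is {share:.0%} of the total")
# The faster the device, the more the stack dominates, which is why RDMA's
# lower-latency, offloaded network path matters most for flash.
```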

RDMA solves both of these issues by reducing network latency and offloading the CPU. It uses zero-copy and hardware transport technology to transfer data directly from the memory of one server to another (or from server to storage) without making multiple copies, and hardware offloads relieve the CPU from managing any of the networking. This means that with RoCE, more CPU cores are available to run the important applications and the lower latency lets faster storage like flash shine.


Figure 2: RDMA increases network efficiency by transferring data directly to memory and bypassing the CPU. (Source: RoCE Initiative.)

The Purpose of Ethernet Flow Control

It’s clear that all RDMA performs best without packet loss, simply because detecting and retransmitting lost packets causes delays, no matter what protocol is used. The faster the network gets — such as 25, 40, 50, and 100GbE speeds — the greater the relative effect of packet loss and the more valuable it becomes to avoid packet loss.

RoCE has built-in error correction and retransmission mechanisms, so it does not require a lossless network; however, initial implementations recommended lossless networks. The most common source of packet loss within the datacenter is traffic overload on ports, such as an incast situation. So it was recommended that customers deploy RoCE with Priority Flow Control (PFC).

PFC is part of the Ethernet Data Center Bridging (DCB) specification, originally implemented to support FCoE, which requires a lossless network. It acts like a traffic light or traffic cop at intersections, preventing collisions and avoiding packet loss from overloaded switch ports. The “Priority” in PFC allows traffic to be grouped into several classes so more important or latency-sensitive packets (for example storage or RDMA traffic) get priority over less latency-sensitive traffic.

 

 

 

 

 


 

Figure 3: PFC prevents packet loss on busy networks, just like a traffic cop prevents accidents at busy intersections.

Priority Flow Control

Priority Flow Control works very well, all major enterprise switches (including Mellanox switches) support it, and it’s been successfully deployed with RoCE in very large networks. In fact, because PFC eliminates packet loss from port overload, it effectively makes any datacenter network lossless. However, PFC requires the network administrators to set up VLANs and configure the flow control priorities, and some network administrators prefer not to do this.

ECN Eliminates Congestion for Smoother Network Flows

But there is an alternative mechanism to avoid packet loss, which leverages Explicit Congestion Notification (ECN). ECN allows switches to notify hosts when congestion is likely to happen, and the end nodes adjust their data transmission speeds to prevent congestion before it occurs.

The RoCE congestion management protocol takes advantage of ECN to avoid congestion and packet loss. ECN-capable switches detect when a port is getting too busy and mark outbound packets from that port with the Congestion Experienced (CE) bit. The receiving NIC sees the CE indication and notifies the sending NIC with a Congestion Notification Packet (CNP). In turn, the sending NIC backs off its sending rate temporarily to prevent congestion from occurring. Once the risk of congestion declines sufficiently, the sender resumes full-speed data transmission.


Figure 4: RoCE congestion management leverages ECN to avoid both congestion and packet loss.

It’s like putting all the RoCE packets into self-driving cars which sense and avoid traffic jams using the data shared from all the other cars and local businesses. If a red light is ahead, the cars slow down so they won’t hit the red light, instead arriving at the intersection during the next green light.
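To make that feedback loop concrete, here is a deliberately simplified simulation of the behavior: the sender cuts its rate when a CNP arrives and gradually recovers otherwise. The constants are invented for illustration and are not the actual RoCE congestion-control parameters.

```python
# Deliberately simplified model of ECN-based rate control, for illustration
# only. The constants below are invented, not the real RoCE/DCQCN parameters.
import random

LINE_RATE_GBPS = 100.0
CONGESTION_THRESHOLD_GBPS = 80.0   # assume the switch marks CE above this load
rate = LINE_RATE_GBPS              # sender starts at full speed

for step in range(10):
    # The switch marks packets with CE when the port gets too busy; the receiver
    # then returns a Congestion Notification Packet (CNP) to the sender.
    other_traffic = random.uniform(0, 40)
    cnp_received = (rate + other_traffic) > CONGESTION_THRESHOLD_GBPS

    if cnp_received:
        rate *= 0.5                                 # back off quickly
    else:
        rate = min(LINE_RATE_GBPS, rate + 10.0)     # recover toward full speed

    print(f"step {step}: sending at {rate:5.1f} Gb/s "
          f"({'CNP received' if cnp_received else 'no congestion'})")
```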

Of course, ECN isn’t new. What is new is the latest software release, which takes advantage of the advanced hardware mechanisms in the Mellanox ConnectX®-4 and ConnectX-4 Lx adapters that are optimized for deployment with ECN. You can still use PFC alone. You can even use both in a “belt and suspenders” approach, where ECN prevents congestion but, just in case, PFC steps in as a “traffic cop” to prevent packet loss and keep flows orderly.


Figure 5: RoCE can be deployed with ECN only, PFC only, or both, if you want to ensure your pants (or network flows) won’t fall down.

It’s the Same RoCE Specification as Before

To be clear, this is still the same RoCE specification and wire protocol, which hasn’t changed. It’s simply an enhanced implementation of RoCE, leveraging the improved features and capability of the Mellanox ConnectX-4 adapter family and the ECN support found in advanced switches, including the Mellanox Spectrum switch family. Different RoCE-capable adapters still interoperate exactly as before.

Resilient RoCE delivers RDMA performance on lossy networks that is on par with lossless networks and substantially better than protocols that rely on TCP/IP for error recovery. It gives customers more flexibility to deploy RDMA in the way that best suits their network architecture and performance needs. Some customers will deploy only PFC, some will deploy only ECN, and some will deploy both.

RoCE Continues to Improve and Evolve

Resilient RoCE continues the evolution of RoCE to serve the needs of both bigger networks and more types of enterprise and cloud customers.

  • 2013: First L3-routable RoCE NICs shipped
  • 2014: L3-routable RoCE standard approved
  • 2015 (June): Soft-RoCE lets any NIC run RoCE (though only rNICs offer the hardware acceleration and offload)
  • 2015 (October): RoCE plugfest proves multiple RoCE rNIC vendors can interoperate
  • 2016: Resilient RoCE lets RoCE run on lossless or lossy networks


Figure 6: RoCE continues to evolve and improve (source: Mellanox and InfiniBand Trade Association).

RoCE On!

It’s clear why RoCE is the most popular way to use RDMA over Ethernet—it provides the best performance and greatest efficiency. Now, with the addition of Soft-RoCE and the ability to operate with or without lossless networks, RoCE has the most flexibility and largest ecosystem of any Ethernet-based RDMA technology.

RESOURCES:

 

 

The Drive for 25: HPE Introduces New 25GbE NICs


At the Discover Conference earlier this month, HPE introduced exciting new 25G networking technology in their “Drive for 25” initiative, including dual-port 25GbE adapters in both mezzanine and stand-up PCIe card form factors. These new adapters — based on the Mellanox ConnectX-4 Lx silicon — enable cloud and enterprise customers to improve network performance and efficiency while lowering total cost of ownership (TCO).


Figure 1: HPE dual-port 25GbE adapter in both mezzanine (640SFP28) and PCIe card (640FLR-SFP28) formats, both based on the Mellanox ConnectX-4 Lx silicon.

Increasing Demand for 25GbE

With the increasing levels of performance coming out of HPE servers, applications frequently need more network bandwidth. 25GbE is ideal for many workloads and servers, providing 2.5 times the bandwidth of 10GbE on each port. It accelerates many workloads, including database, virtualization, video streaming, high-frequency trading (HFT), and network function virtualization (NFV). 25GbE — along with its close cousins 50GbE and 100GbE — also accelerates the new generation of infrastructure, including hyper-converged infrastructure, in-memory computing, software-defined storage, and big data.


Figure 2: HPE offers new speedy 25GbE adapters as part of the “Drive to 25” solution

Two Ports, Flexible Connection Options

These new HPE 25GbE adapters each support two SFP28 ports to allow for high availability or connection to multiple physical networks. Using an SFP28 form factor allows each port on the adapter to support many connectivity options, giving HPE customers the ability to choose the best cabling option for their needs:

  • 10GbE or 25GbE speeds
  • Copper or fiber optic cabling
  • Cables and transceivers supporting distances from 0.5m (50cm) to 10km
  • No breakout cables required
  • Ability to re-use existing structured 10GbE fiber for 25GbE connections

Advanced Support for Public and Private Cloud Workloads

These new HPE adapters, with Mellanox ConnectX-4 technology, also support advanced cloud offloads to improve packet processing speeds and maximize performance in virtualized environments. They include features to optimize video streaming, and support Remote Direct Memory Access (RDMA) using the RDMA over Converged Ethernet (RoCE) protocol.

As customers increasingly deploy HPE servers to handle cloud workloads and as network speeds increase, the smart offloads in these new HPE adapters relieve the CPU and reduce network latency, delivering more CPU power to the applications. HPE customers will also leverage the increased bandwidth and efficiency to create more efficient software-defined storage and hyper-converged infrastructure solutions.

Highest Performance and Efficiency

The HPE 640SFP28 and 640FLR-SFP28 adapters come with impressive speed and green credentials as well. They feature some of the lowest latency and highest message rates of any 25GbE NIC, as well as very low power consumption for efficiency and a fan-less design for maximum reliability. The smart offloads allow more work to be accomplished more quickly by fewer CPU cores, and the two-port SFP28 design mentioned earlier allows a broad choice of the most efficient cabling for the distances required, including the ability to re-use existing structured fiber. (HPE also offers an EDR IB and 100Gb Ethernet adapter based on the Mellanox ConnectX-4 silicon.)

Mellanox Helps HPE Lead in Server Innovation

By offering 25GbE adapters with flexible ports and smart offloads, HPE and Mellanox are helping customers build more efficient datacenters. This “Drive to 25” is another example of the technology leadership that has made HPE a leader in server and networking technology for the last 25 years, and Mellanox is proud to be an HPE server networking partner.

RESOURCES

 

 

25 Is the New 10, 50 Is the New 40, 100 Is the New Amazing

(This blog was inspired by an insightful article in EE Times, written by my colleague, Chloe Jian Ma.)

The latest buzz about Ethernet is that 25GbE is coming. Scratch that, it’s already here and THE hot topic in the Ethernet world, with multiple vendors sampling 25GbE wares and Mellanox already shipping an end-to-end solution with adapters, switches and cables that support 25, 50, and 100GbE speeds. Analysts predict 25GbE sales will ramp faster than any previous Ethernet speed.

Why?????? What’s driving this shift?

Figure 1: Analysts predict 25/40/50/100GbE adapters will reach 57% of a $1.8 billion high-speed Ethernet adapter market by 2020. (Based on Crehan Research data published January 2016.)

These new speeds are so hot that, like the ageless celebrities you just saw on the Oscar Night red carpet, we say “25 is the new 10 and 50 is the new 40.” But whoa! Sure, everyone wants to look younger for the camera, but no 25-year-old actor wants to look 10. More importantly, why would anyone want 25GbE or 50GbE when we already have 40GbE and 100GbE?

Continue reading

Ethernet Is the New Storage Network

I recently saw an infographic titled “2015 Data Storage Roadmap” and was pleasantly surprised to see Mellanox listed under the storage networking section. The side comment was “Ethernet Becoming The Standard Storage Network.”

 

Figure 1: Tech Expectations blog infographic shows the new storage networking vendors. (Graphic excerpted from the larger original graphic, which is available here.)

 

 

Why surprised? Because in the past, when people said “Storage Networking” they usually meant Fibre Channel. But the growth of cloud, software-defined, and scale-out storage, as well as hyper-converged and big data solutions, has made Ethernet the new standard storage network (rather than Fibre Channel), just as the infographic above says. Since Mellanox is the leading vendor of networking equipment for speeds above 10Gb/s, it’s really not a surprise after all to have Mellanox on the leaderboard.

 

 

Continue reading