All posts by Bill Webb

About Bill Webb

Bill Webb is Director of Ethernet Switch Sales at Mellanox Technologies, Inc. In this role, he evangelizes the benefits of Mellanox’s Ethernet switch portfolio in scale-out data center networking, storage, cloud computing, and network-accelerated AI/machine learning environments. Bill has spent over 20 years in the networking industry in a variety of sales, engineering, and management roles. Most recently, Bill worked at Concurrent, where he introduced a Ceph-based scale-out storage product for media streaming, and at Ciena, where he led a team developing first-generation Software Defined Networking applications. Bill started his career at Nortel Networks and later worked at several start-ups building fiber-to-the-premises technology.

The Case for Whale Sharks and Micro Data Centers

The last several years have seen a huge migration of IT infrastructure to the public cloud.  And it all makes sense: let the cloud provider invest in, manage, and scale the infrastructure, which leads to lower costs and improved organizational agility for the end user.

Let’s compare the public cloud to the whale shark – one of the largest animals in the world. While its sheer size is imposing, it is actually docile and works in tandem with smaller fish. The smaller fish, or pilot fish, help keep parasites away from the whale shark, and in return, the whale shark acts as a bodyguard for the smaller fish.

Now, while large public cloud providers continue to grow (the whale sharks are getting bigger!), there is also huge growth in the number of Edge or Micro Data Centers. The fish are multiplying because they can be more agile and faster, and can go places where the public cloud cannot.

Why?  Autonomous vehicles are getting closer to reality. Smart Cities are emerging with the use of Internet of Things technology. Augmented and virtual reality (AR/VR) have seen huge advances. And, of course, enterprises are realizing that they must follow a hybrid strategy. They need aggregation data centers between their users and centralized cloud infrastructure, in addition to remote office/branch office (ROBO) hyperconverged (HCI) infrastructure.

In a recent blog, Yuval Bachar, Principal Engineer of Data Center Architecture of LinkedIn, noted that, “Looking at the number of locations and servers per locations globally, the number of nodes [in edge data centers] will accumulate to be larger than the number of compute nodes in the traditional cloud data centers within the next 3-5 years.”

Hybrid Cloud, Autonomous Vehicles, AR/VR, IoT – just some of the drivers toward Micro Data Centers.

This is resulting in the need for Micro Data Centers. In general, these are data centers with power consumption of less than 1 MW, and in many instances, significantly less. The Micro Data Center can take many forms, including a data center rack on wheels, a shipping container, a modular, expandable structure, a Navy ship, and even existing cell sites and telecom huts.

A wide variety of Micro Data Center form factors.


No matter the form factor of the Micro Data Center, it still provides the following benefits:

  • Immediate Response and Low Latency: provide compute services as close to the edge as possible.
  • Data Processing: locally process huge amounts of data for immediate action, often utilizing machine learning and analytics.
  • Data Aggregation: aggregate and summarize data before sending it to the centralized cloud, avoiding the need for large and expensive data movement across the WAN.
  • Interconnectedness: provide connectivity to the edge devices as well as centralized cloud resources and other Micro Data Centers.
  • Light Footprint: small physical form factors, as well as very low power and cooling needs, allowing for flexibility in where the Micro Data Center is deployed.

No wonder the whale shark hangs out with the pilot fish!

Mellanox builds Ethernet switches that are perfect for everything from Micro Data Centers to large, scale-out data centers. This allows you to deploy the switch you need, where you need it, and leverage a single family of switches across all your data centers: small, medium, and large.

A wide range of Ethernet switch form factors – perfect for any data center, micro to very large.


Take, for instance, our newest switch addition, the SN2010. It provides 18 x 10/25GbE and 4 x 100GbE interfaces, all in a half rack unit. This means you can provide a hyperconverged cluster with a fully redundant network configuration in one rack unit. A dual-switch configuration requires only 160 watts of power across both switches. This is perfect for a Micro Data Center requiring a light footprint for both physical space and power.
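
One common way to get that fully redundant configuration is to dual-home each hyperconverged node, with one link to each SN2010 and an LACP bond on the host (with MLAG or an equivalent mechanism on the switch side). Below is a minimal, host-side sketch in Python using standard iproute2 commands; the interface names and the address are placeholders, not a recommended configuration.

    # Minimal sketch: bond two host NICs, one cabled to each switch in the redundant pair,
    # into an 802.3ad (LACP) bond. Interface names and addressing are illustrative only.
    import subprocess

    def sh(cmd):
        """Run an iproute2 command and fail loudly if it errors."""
        subprocess.run(cmd.split(), check=True)

    sh("ip link add bond0 type bond mode 802.3ad")   # LACP toward the switch pair
    for slave in ("eth0", "eth1"):                   # one port per switch
        sh(f"ip link set {slave} down")
        sh(f"ip link set {slave} master bond0")
    sh("ip link set bond0 up")
    sh("ip addr add 192.0.2.10/24 dev bond0")        # example address only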

All Mellanox Spectrum switches utilize the same best-in-class Spectrum ASIC, built by Mellanox. They provide the predictable performance required for the real-time processing done in Micro Data Centers. In fact, Mellanox Spectrum switches are the only low-latency switches on the market for 25GbE and above networking. Moreover, they provide support for GPUDirect and RDMA over Converged Ethernet (RoCE) to significantly accelerate analytics and Machine Learning.

The Micro Data Center needs the ability to connect with other data centers. This will typically be a large, centralized data center (including public cloud), as well as other, peer Micro Data Centers. Mellanox Spectrum switches provide best-in-class EVPN and VxLAN support to accomplish just that.

Best-in-class, standards-based Data Center Interconnect solution.

Mellanox Spectrum switches provide significant scale advantages for VxLAN and DCI, which is critical as the number of Micro Data Centers increases. In addition, EVPN is a controller-less technology, which means the entire control plane is embedded in the switches – and Mellanox doesn’t charge extra for additional features. Therefore, you can leverage Mellanox switches for incredibly scalable DCI solutions, and do so very cost-effectively.
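
To make the data-plane piece of that concrete, here is a hedged sketch of the VxLAN construct that EVPN automates, expressed with stock Linux iproute2 commands rather than any switch CLI. The VNI, addresses, and interface names are examples only; on Spectrum switches the network OS and BGP EVPN handle this for you, including learning remote VTEPs instead of static flooding.

    # Illustrative only: a VxLAN tunnel endpoint (VTEP) attached to a bridge for VNI 100.
    import subprocess

    def sh(cmd):
        subprocess.run(cmd.split(), check=True)

    sh("ip link add br100 type bridge")                # L2 domain for VNI 100
    sh("ip link add vxlan100 type vxlan id 100 local 192.0.2.1 dstport 4789 nolearning")
    sh("ip link set vxlan100 master br100")            # attach the VTEP to the bridge
    for dev in ("vxlan100", "br100"):
        sh(f"ip link set {dev} up")
    # With EVPN, BGP distributes MAC/IP reachability and remote VTEPs automatically.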

Ease of deployment is critical for Micro Data Centers. When dozens or hundreds of Micro Data Centers are deployed or repositioned, automation and zero-touch provisioning are required to keep deployment cost-effective. Mellanox supports many options through its NEO network orchestration system and a suite of playbooks for both the Cumulus Linux and MLNX-OS network operating systems.
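
NEO and the playbooks are the supported paths; purely to illustrate the underlying pattern of pushing the same change to a fleet of small sites, here is a hedged sketch using plain SSH via paramiko. The hostnames, credentials, and command list are placeholders.

    # Illustrative fleet push over SSH; in practice, use NEO or the Ansible playbooks.
    import paramiko

    SWITCHES = ["micro-dc-01-leaf1", "micro-dc-02-leaf1"]   # placeholder hostnames
    COMMANDS = ["show version"]                              # placeholder command list

    for host in SWITCHES:
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username="admin", password="changeme")  # use keys or a vault in practice
        for cmd in COMMANDS:
            _, stdout, _ = client.exec_command(cmd)
            print(host, cmd, stdout.read().decode().strip())
        client.close()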

Micro Data Centers are one of the hottest topics now in the world of data center deployment and networking. While the shift to centralized cloud has been significant, there is nothing like having data processed and acted upon as close to the user as possible.

When building a Micro Data Center, the Mellanox Spectrum switches provide the perfect solution:

  • Unique form factors – from half-rack, low-power (80 watt) switches all the way up to full-density switches,
  • Predictable Performance – zero packet loss and low latency, accelerating real-time responses to users, at speeds of 10/25/50/100 GbE,
  • Data Center Interconnect – standards-based EVPN/VxLAN support with incredible scale at no additional cost, and
  • Ease of Deployment – NEO network orchestration and NetDevOps playbooks with Cumulus Linux and MLNX-OS.

What a catch! (Haha)

Are You Flying Blind with Ceph?

Ceph storage is great.  It’s flexible – you can use it for file, block, and object storage – even at the same time.  It’s everywhere – in cloud environments, containers, and microservices – the modern architectures.  It’s open – you can run it on any hardware you want.  It scales – you can keep adding storage nodes without the need for painful data migrations.  And it can be free – you can run the open source community version, or purchase support.

But this sort of flexibility comes at a cost.  Out of the box, Ceph is ready to run ‘okay’ for most use cases.  Think family minivan.  It can hold a fair amount, but it’s not the biggest on the road – maybe you really want something more like an 18-wheeler.  It can go a little above the speed limit, but not by much, and it will certainly take a while to get you where you want to go – maybe you really want something more like a Porsche.

How can you make Ceph what you want – and have the visibility you need?  This blog will discuss how Mellanox Spectrum switches allow you to Optimize, Operate, and Accelerate Ceph.


By its nature, Ceph has a many-to-one traffic pattern, also known as ‘incast’ traffic.  When data is written to Ceph, it is distributed evenly across all data nodes.  When a client reads data, it reads directly from the Ceph data nodes, resulting in a many-to-one communication pattern – incast.  Incast can cause microbursts, particularly on the client’s network port.  Replication, whether 3x replication or erasure coding, results in many-to-one traffic on the cluster network.  Spectrum switches have significant advantages in supporting incast traffic.  They benefit from a unique, shared buffer architecture, which means all buffering is available to all ports at any given time.  As a result, Spectrum switches offer 10x better microburst absorption compared to other switches.

Furthermore, anyone who has spent time with a Ceph deployment knows it needs to be tuned.  There are myriad settings that can be changed – Ceph settings, kernel settings, network settings…the list goes on.
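
As a hedged example of the kind of host-side knobs that typically get revisited during Ceph tuning, the sketch below bumps a few standard Linux network sysctls and shows how a Ceph OSD option can be injected at runtime.  The values are illustrative starting points, not recommendations for your cluster.

    # Illustrative tuning pass; values are examples, not recommendations.
    import subprocess

    KERNEL_TUNING = {
        "net.core.rmem_max": "67108864",              # larger socket buffers help with incast-heavy reads
        "net.core.wmem_max": "67108864",
        "net.ipv4.tcp_slow_start_after_idle": "0",    # avoid re-entering slow start between bursts
    }

    for key, value in KERNEL_TUNING.items():
        subprocess.run(["sysctl", "-w", f"{key}={value}"], check=True)

    # Ceph options can be injected at runtime as well, for example:
    subprocess.run(["ceph", "tell", "osd.*", "injectargs", "--osd_max_backfills 2"], check=True)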

When you change the settings, how do you know that it’s optimal?  Sure, run a storage benchmark, see that you get the throughput you want, and call it a day.  But, there could be trouble lurking.  You need very detailed insight into what’s happening on the network.

Fortunately for you, Mellanox Spectrum switches have the most advanced telemetry on the market today.  Every 128 nanoseconds, the Spectrum hardware can take samples of port queue depth and bandwidth.  These samples are then pulled into a histogram to show queue utilization over a short period of time.

The histogram data can then be used to detect very critical behavior in a Ceph cluster.  As you change tuning values, you can know if Ceph is –

  • Causing congestion in the network
  • Causing latency to increase over time (queue lengths gradually increasing)
  • Causing microbursts

The level of telemetry detail on Spectrum switches is far beyond anything seen on other switches.  For instance, in this picture, a competitor’s switch would show the queue length at time 19:19:49 as congestion.  By using Spectrum’s Advanced Telemetry, it’s clear that this is really a momentary microburst.  What you’d do as a Ceph optimizer differs greatly between the two.  For congestion, you’d examine your client load and/or add more network capacity.  For a microburst, you’d likely look closely at the TCP, messenger, and thread tuning values.
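
The distinction is easy to automate once you have fine-grained samples.  Here is a hedged, illustrative sketch of how queue-depth samples of the kind the histogram telemetry exposes can be classified as a momentary microburst versus sustained congestion; the sample values and thresholds are made up for illustration.

    # Classify a window of queue-depth samples; numbers are illustrative only.
    def classify(samples, depth_threshold, busy_fraction=0.5):
        """samples: queue-depth readings taken at a fixed, fine-grained interval."""
        busy = [s for s in samples if s > depth_threshold]
        if not busy:
            return "clean"
        if len(busy) / len(samples) >= busy_fraction:
            return "sustained congestion"   # most of the window above threshold: add capacity or reduce client load
        return "microburst"                 # brief spike: revisit TCP, messenger, and thread tuning

    window = [0, 0, 3, 72, 95, 4, 0, 0, 0, 0]    # example queue depths (KB) within one sampling window
    print(classify(window, depth_threshold=50))  # -> "microburst"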

So, next time you’re tuning Ceph, make sure you’re leveraging all the data possible.  Storage benchmarking only tells you part of the story – it’s the tip of the iceberg.  You need to know that the network is performing cleanly, as issues caused by improper tuning might rear their ugly heads when things get crazy during operations.



Operating a Ceph cluster is a walk in the park, right?  After all, it’s self-healing, it intelligently distributes data across the cluster using the CRUSH algorithm, and let’s face it – Ceph is pure magic.

The reality is, people who have operated a Ceph cluster likely have some scars to prove it.  The author of this blog has had times where he’s felt like the Jack Torrance character from The Shining after hours upon hours of poring over Ceph logs, checking kernel counters, and looking at network counters and details – trying to find out what caused the Ceph ‘slow request’. (If you don’t know what a Ceph ‘slow request’ is – be thankful!)

Things can get worse when there is a failure.  Ceph is self-healing.  Yes, it will rebuild data and rebalance the cluster.  But, this happens at a cost.  The same storage node CPUs that are processing client requests are also responsible for performing the recovery and backfill.  Even though a Ceph cluster might not have lost data during a failure, its performance can be severely degraded.

In this situation, the Spectrum switches provide benefits beyond anything else out there.  For one, Spectrum switches are the only switches that can provide Predictable Performance – no packet loss at any packet size, consistent 300 nanosecond latency, and shared buffers that provide 10x better microburst absorption.  This means that when Ceph is busy self-healing, the Spectrum switches are providing best-in-class performance, allowing the recovery to go smoothly.  If you want to learn more about Spectrum’s Predictable Performance, check out this Tolly Report, where Spectrum performance is proven by an unbiased third party to be far superior to alternative vendors.

Furthermore, you need to know what’s going on – any information… any useful information.  The Advanced Telemetry data can be streamed to monitoring tools in real time.  And just like Ceph, these can be open source, free tools, such as Grafana.  You can then use the monitoring information as an additional feedback point when adjusting Ceph backfill tunables.
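
As a hedged sketch of what that feedback loop can look like, the example below writes queue-depth samples into InfluxDB (a common Grafana backend) and throttles Ceph backfill when the network shows sustained pressure.  The hostnames, database name, and sample values are placeholders, and the switch-side export mechanism is not shown.

    # Illustrative telemetry sink plus a Ceph backfill throttle.
    import subprocess
    from influxdb import InfluxDBClient   # 1.x Python client (pip install influxdb)

    client = InfluxDBClient(host="monitor.example.com", port=8086, database="telemetry")

    def record(switch, port, depth_bytes):
        client.write_points([{
            "measurement": "queue_depth",
            "tags": {"switch": switch, "port": port},
            "fields": {"bytes": depth_bytes},
        }])

    def throttle_backfill():
        # osd_max_backfills and osd_recovery_max_active are standard Ceph OSD options
        subprocess.run(["ceph", "tell", "osd.*", "injectargs",
                        "--osd_max_backfills 1", "--osd_recovery_max_active 1"], check=True)

    record("micro-dc-01-leaf1", "swp7", 131072)   # example sample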

In addition, the monitoring information can be used to identify other problems during operations.  Odd network traffic can indicate other issues, like runaway clients, hard drives about to fail, and much more.  Let’s face it, being scale-out storage, Ceph is as dependent on a high-performing network as it is on the storage itself.  And to operate Ceph successfully, you need all the information you can get from the network.

To close out the discussion of Ceph operations, let’s have our dessert.  We’ve already discussed the highly granular telemetry information that is available from Spectrum switches.  Well, all of that information is available to anyone with direct access to the Spectrum SDK.

Spectrum switches have the ability to run Docker containers directly on the switches.  The containers can directly access the SDK to read information about the switch, and also interact with the network OS to make configuration changes.

Furthermore, a containerized agent can be responsible for storage service policy changes.  This gives the storage administrator the ability to change relevant parts of the network configuration without the networking group yielding full control.  This helps break down the organizational walls between groups – the storage team can get what it needs from the network, while the network team still maintains control of the network.
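
As a purely illustrative sketch of such an agent, the snippet below polls storage-facing ports and applies a scoped policy change when queues stay deep.  The module name spectrum_sdk and its functions are hypothetical stand-ins for whatever SDK bindings the container is given; only the pattern is the point.

    # Hypothetical on-switch agent; 'spectrum_sdk' and its calls are illustrative stand-ins.
    import time
    # import spectrum_sdk                     # hypothetical SDK bindings mounted into the container

    STORAGE_PORTS = ["swp1", "swp2"]          # example ports carrying Ceph traffic

    def poll_queue_depth(port):
        # return spectrum_sdk.get_queue_depth(port)          # hypothetical call
        return 0                                              # placeholder so the sketch runs standalone

    def raise_storage_priority(port):
        # spectrum_sdk.set_port_priority(port, tc=3)          # hypothetical, scoped change only
        print(f"would raise priority on {port}")

    while True:
        for port in STORAGE_PORTS:
            if poll_queue_depth(port) > 1_000_000:            # illustrative threshold in bytes
                raise_storage_priority(port)
        time.sleep(5)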



Even if you haven’t seen the movie Zootopia, you might have seen the highly popular trailer that featured a slow-speaking sloth and a fast-moving rabbit, the movie’s main character.

The comparison to storage is apt.  Traditional, magnetic storage – where a platter spins and a head moves to read data – is slow as a sloth.  Milliseconds…to…get…my…data…!  Compare that to solid state storage such as flash and Optane – with access times measured in microseconds.  Fast as a rabbit!

What does this mean for the network?  A whole lot.  Mellanox has been the driving force behind a technology called Remote Direct Memory Access, or RDMA.  In an Ethernet environment, it is called RDMA over Converged Ethernet, or RoCE.  RoCE allows applications to bypass the local operating system when transferring data across nodes – significantly increasing application performance and freeing up the CPU to do more useful things.  That’s a win-win – faster applications and lower cost, since your CPUs can run more applications.
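
For Ceph specifically, turning this on happens through the messenger settings.  Below is a hedged sketch of the global ceph.conf options commonly used to put Ceph’s async messenger onto RoCE; verify the option names against your Ceph release, and note that the RDMA device name is just an example.

    # Hedged sketch: settings for Ceph's async RDMA messenger; these belong in the
    # [global] section of ceph.conf (verify option names for your release).
    RDMA_SETTINGS = {
        "ms_type": "async+rdma",
        "ms_async_rdma_device_name": "mlx5_0",   # example RDMA device; check with `ibv_devices`
    }

    for option, value in RDMA_SETTINGS.items():
        print(f"{option} = {value}")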

We’ve done extensive benchmarking running Ceph with RoCE, and here are some of the results –

Mellanox Spectrum switches used in this benchmarking are the best RoCE switches on the market today.  The Predictable Performance, consistent 300 nanosecond latency, and 10x better microburst absorption discussed in the previous blog are one part of it.

Beyond that, the Spectrum switches include superior congestion avoidance and handling – including Fast ECN (cutting 8ms+ off the time required to notify a client of congestion), and intelligent handling of flow control to avoid congestion spreading and victim flows.

The Advanced Telemetry features of Spectrum bring it all together.  The 128 nanosecond resolution of the buffer and bandwidth monitoring is crucial when storage can operate with sub-10-microsecond latency.  The smallest changes in network performance can have a major impact on storage, making real-time monitoring even more crucial.



When deploying Ceph, it is critical that you have real-time visibility into the network.  It requires Optimization for your use-case and on-going insights as you Operate the cluster.  These needs significantly increase as you Accelerate Ceph when adopting solid state storage.

So, when building your Ceph network, the network must provide –

  • Predictable Performance – zero packet loss, consistent low latency, superior microburst absorption
  • Advanced Telemetry – sub-microsecond sampling, real-time monitoring, on-switch agent deployment with direct SDK access
  • Network Acceleration – RDMA/RoCE, the best RoCE switch on the market