Overlay networks (such as VXLAN) are the most widely deployed implementation of Software Defined Networking (SDN). But deploying an overlay SDN network requires a bulletproof underlay network. Many simply assume that the underlay will perform reliably and predictably, but it turns out that at the highest network speeds, predictable performance is extremely hard to deliver, and some vendors actually fall short. Unfortunately, for application-level and data center architects, the unpredictability of the underlying network can be hidden from view. It is fruitless to debug unpredictable application behavior at the system or application level when it is the underlying network that is behaving chaotically and dropping packets. At Mellanox, we deliver predictable networks that take the network out of the equation and let providers and customers focus only on their applications – knowing that data communications just work. So for SDN deployments, Spectrum is the best underlay for your overlay network.
For those not familiar with overlay networks, this SDN Whitepaper explains more about overlay networks and other options to implement SDN networks.
In order to achieve predictable performance, it’s important to understand how modern, open networking equipment is built. At this year’s Open Compute Project (OCP) Summit in San Jose, we introduced Open Composable Networks (OCN) – which represents the realization of the vision of the Open-Ethernet initiative first launched early in 2013. OCN demonstrates the power of open networking as is explained in the blog: Why are Open Composable Networks like Lego?
By disaggregating switches, OCN enables customers to choose the best hardware and the best software. At Mellanox, we are happy to provide customers with solutions at multiple levels, as we know that fundamentally we deliver predictable performance with the best switching solutions available, from the platform all the way down to the ASIC level. This blog provides the details to support that claim and on how the Spectrum switches deliver predictable performance.
The most obvious advantages of the Spectrum switch are 37% lower power and less than half the latency of Broadcom devices. But, in fact, Predictable Performance is perhaps even more important to application performance and customer experience.
Today’s advanced switching devices are complex beasts, and unfortunately their features sometimes get reduced to a short list of simple bullets. So when comparing Mellanox Spectrum based switches to Broadcom Tomahawk based offerings (Tolly Report), one might make the error of thinking they are roughly the same.
Of course, the Spectrum is about 37% lower power and has less than half the latency of the Broadcom device, but one might conclude that the Mellanox advantages end there. Other specifications seem to indicate the devices are equivalent, with both devices having 32 ports operating at 100 Gb/s, and each with ~16 MB of on-board memory for buffering of data flows.
But in fact, the seemingly identical specifications belie significant performance and predictability differences, which profoundly impact real-world customer experience. Even though nominally similar, the devices are in fact worlds apart in functionality – which ultimately makes the difference between a smoothly operating data center and one that leaves you scratching your head wondering what the #$%! is going on and why you aren’t able to deliver a predictable experience to your customers.
Achieving predictable performance has many aspects, but three key elements to consider are:

- Zero Packet Loss forwarding at full wire speed for all packet sizes
- Resilience to microbursts through efficient buffering
- Fairness in bandwidth allocation, independent of physical port
Together these make the difference between behavior that is expected and predictable vs a customer experience that is maddening, unpredictable, and ultimately disruptive to business operations. Mellanox Spectrum based switches have key advantages in all three of these areas.
A key element of any networking component is how fast it is able to forward packets. A Zero Packet Loss switch is able to support full wire speed forwarding for all packet sizes. By contrast, a switch that does not have an adequate packet forwarding rate is unable to keep up and will drop packets unnecessarily. Making sure that networking elements do not suffer from avoidable packet loss has a significant effect on overall application performance. This is true for both adapters and switches, but it is particularly important for the switch, which, when unable to keep up, impacts many different endpoints throughout the network. Thus it is critical that the switch ASIC is able to forward packets fast enough to prevent avoidable packet loss. For example, in Voice-over-IP type applications, the typical packet size is very small – around 100 bytes. If the network devices handling these packets can’t keep up with forwarding all of them, choppy voice, jitter, or even dropped calls will occur.
Many switch vendors claim they operate at “full wire speed”; however, you need to pay close attention to these claims, as often there is an asterisk explaining that this is true for only certain packet sizes. Claiming full wire speed for the best-case scenario really doesn’t tell the full story, as sooner or later a burst of small control, acknowledgment, and messaging packets will align – and will cause some chips to fail. Both Switch-En and Spectrum based switches support Zero Packet Loss and operate at full wire speed for all packet sizes without dropping packets.
An Ethernet switch with 32 ports operating at 100 Gb/s needs to support a packet forwarding rate of 4.76 billion packets per second in order to be line rate for all packet sizes. The Spectrum switch supports this packet rate and switches at line rate for all packet sizes. The standard way to measure packet rate forwarding performance is RFC 2544 testing. As can be seen in the diagram, the Broadcom Tomahawk cannot support the full packet rate and therefore starts to drop data at packet sizes of 200 bytes or lower. Even a small packet loss rate can severely degrade application performance as a result of software timeouts and packet resends. In fact, one packet loss benchmark demonstrated that losses even as low as 0.01% (1 out of 10,000 packets) can cause nearly a 40% decrease in file transfer performance. In the case of the Broadcom device, for packets of 200 bytes or smaller the packet loss rate is extremely high (~20-30%) – making the network virtually unusable for these packet sizes.
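To get a feel for why even tiny loss rates hurt so much, the well-known Mathis model approximates steady-state TCP throughput as inversely proportional to the square root of the loss rate. The sketch below is a back-of-the-envelope illustration only; the MSS, RTT, and link speed are illustrative assumptions, not parameters from the benchmark cited above:

```python
# Back-of-the-envelope TCP throughput vs. packet loss using the Mathis
# model: throughput <= (MSS / RTT) * (C / sqrt(p)), with C ~= 1.22.
# All parameters below are illustrative assumptions.
from math import sqrt

MSS = 1460 * 8   # maximum segment size, in bits
RTT = 100e-6     # assumed 100 us round-trip time (data center scale)
LINK = 100e9     # 100 Gb/s link
C = 1.22         # Mathis model constant

def tcp_throughput_bps(loss_rate):
    """Upper bound on a single TCP flow's throughput, capped at link speed."""
    return min(LINK, (MSS / RTT) * (C / sqrt(loss_rate)))

for p in (1e-6, 1e-5, 1e-4, 1e-3):
    print(f"loss {p:.0e}: ~{tcp_throughput_bps(p) / 1e9:.1f} Gb/s")
```

Even under these rough assumptions, a loss rate of just 0.01% caps a single flow well below line rate, which is consistent in spirit with the file transfer degradation described above.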
Note that this type of packet loss is not due to congestion or to an incast condition (where more than one ingress port targets the same egress port), but rather is the result of internal limitations of the ASIC itself. So in a very real sense this is avoidable packet loss that should not occur.
A 4.76 billion packets-per-second forwarding rate is required to switch minimum-size packets at full wire speed. Anything less will lose data when small-packet microbursts occur. As you can imagine, losing data due to avoidable packet loss at signaling rates of 25, 50, and 100 Gb/s has dire consequences for network and application performance.
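The 4.76 billion packets-per-second figure follows directly from standard Ethernet framing arithmetic. A minimal sketch of the math (standard Ethernet constants only, no vendor-specific assumptions):

```python
# Required forwarding rate for 32 x 100 Gb/s ports at minimum frame size.
# Every frame on the wire carries 20 bytes of overhead beyond the frame
# itself: 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte
# inter-packet gap.
PORTS = 32
PORT_SPEED = 100e9          # bits per second per port
MIN_FRAME = 64              # minimum Ethernet frame size, in bytes
WIRE_OVERHEAD = 7 + 1 + 12  # preamble + SFD + inter-packet gap, in bytes

bits_per_frame = (MIN_FRAME + WIRE_OVERHEAD) * 8   # 672 bits on the wire
pps_per_port = PORT_SPEED / bits_per_frame          # ~148.8 Mpps per port
total_pps = PORTS * pps_per_port                    # ~4.76 Bpps per switch

print(f"per port: {pps_per_port / 1e6:.1f} Mpps, total: {total_pps / 1e9:.2f} Bpps")
```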
A microburst is a temporary condition where multiple flows target the same output port – a condition technically referred to as ‘incast’. These incast microbursts happen in the real world all the time, whenever two or more flows collide for some period of time. Microbursts are particularly common during the coalescing phase of distributed applications such as clustered SQL database log file operations, Hadoop MapReduce, or software-defined storage solutions like Ceph, VSAN, and others.
Microbursts are analogous to traffic jams where too many cars are trying to go to the same place at the same time. The solution to microbursts is large, efficient buffering that smooths them out and keeps traffic moving. The bigger and more efficient the buffer, the lower the probability that traffic backs up to the point where packets are dropped because of congestion.
“Microbursts are inevitable, but not something that application and data center architects should really have to worry about. … Spectrum based switches deliver 9X to 15X better microburst resilience than Tomahawk based switches.”
Note that resilience to microbursts is not just a matter of the overall size of the switch buffering; it also depends on whether the on-chip buffering is used efficiently. Nominally, both the Spectrum and the Tomahawk devices have the same amount of buffering; however, Spectrum has a fully shared buffer and is therefore far more efficient in using that buffering as a shock absorber to tolerate temporary microbursts. By contrast, Tomahawk based switches have only a fraction of the buffering available to any given egress port, and thus their buffers overflow and drop packets much earlier than Spectrum based switches do.
There is no switch technology or network topology that is able to overcome congestion/blocking resulting from incast over-subscription forever. Even a switch such as Spectrum that has a full line rate packet switching internal crossbar can only support over-subscribed incast traffic for a finite amount of time (which is equivalent to a finite microburst size).
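To get a feel for how buffer architecture translates into microburst tolerance, consider a simplified 2:1 incast model: the buffer fills at the excess arrival rate (ingress minus egress bandwidth), and the absorption time is the available buffer divided by that fill rate. The buffer size and the static partitioning scheme below are illustrative assumptions for the sketch, not published vendor specifications:

```python
# How long can a switch buffer absorb a 2:1 incast at 100 Gb/s before
# dropping? Simplified model: fill rate = excess ingress bandwidth beyond
# what the egress port can drain; absorption time = buffer / fill rate.
# The static per-port partitioning below is an assumed scheme for
# illustration, not a published vendor specification.
BUFFER = 16e6 * 8        # ~16 MB of on-chip buffer, in bits
PORT_SPEED = 100e9       # bits per second
PORTS = 32

def absorb_us(available_bits, ingress_ports=2):
    """Microseconds of N:1 incast a buffer can absorb without dropping."""
    fill_rate = (ingress_ports - 1) * PORT_SPEED  # excess beyond egress drain
    return available_bits / fill_rate * 1e6

shared = absorb_us(BUFFER)                # fully shared: whole buffer available
partitioned = absorb_us(BUFFER / PORTS)   # static split: one port's slice only

print(f"fully shared buffer: ~{shared:.0f} us of 2:1 incast")
print(f"static per-port slice: ~{partitioned:.0f} us of 2:1 incast")
```

The exact ratio depends on the real partitioning scheme, which is why measured results (such as the 9X to 15X figure below) differ from this idealized sketch – but the direction is the same: a fully shared buffer absorbs dramatically longer bursts than a statically partitioned one.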
The size of microburst that a switch can accommodate without dropping packets is critical to overall application performance. Microbursts are inevitable, but they are not something that application and data center architects should really have to worry about. Microburst incast over-subscription is an application-dependent phenomenon; typically, small microbursts are common while large sustained bursts are much less so. So having a larger microburst tolerance makes a switch much less susceptible to inevitable incast conditions.
Spectrum based switches deliver 9X to 15X better microburst resilience than Tomahawk based switches. This significantly higher resiliency to temporary incast conditions prevents packet loss and flow control, reduces communication latency, and greatly improves application level performance.
The third element required to achieve predictable performance is fairness. Fairness is critical in order to enable service providers to deliver service level guarantees to customers. As discussed previously, both packet forwarding limits and inefficient buffering of microbursts can lead to packet loss.
But even before packet loss occurs, a more basic requirement is being able to deliver fair and predictable behavior that does not depend on the specific physical location of the virtual machine hosting an application. As can be seen, Mellanox Spectrum based switches have a uniform, balanced architecture that provides fair bandwidth arbitration independent of the specific port that a particular service is connected to.
By contrast, the fragmented architecture of the Tomahawk switch means that performance is highly dependent on which particular physical ports are connected to server or storage platforms. This results in unfair allocation of bandwidth to resources depending on the particular topology and flow patterns.
Worse yet, in a virtualized environment this behavior is completely unpredictable, as virtual machines are migrated between physical servers, creating new chaotic patterns of connectivity. Because the underlying network is not able to provide fair and balanced flows to endpoints, the resulting behavior is completely unpredictable. By contrast, the Spectrum arbitrates bandwidth fairly, independent of which physical port a resource is connected to. As a result, Spectrum based switches deliver fair, predictable behavior, enabling cloud and communication service providers to deliver service level guarantees to their customers.
Obviously, customers truly care about predictable performance, fair bandwidth, efficient buffering, and line rate performance. Applications suffer when a network lacks these fundamental performance characteristics and there are ramifications to your business when an application does not perform well. We’d like to hear from you and we will put our money where our mouth is. Contact us today to find out how our new 25/50/100G Spectrum based switches outperform the competition in real-world applications.