Building a large-scale data center is no easy task, and it comes at considerable cost. The larger the cluster, the larger the core switching element needs to be to carry traffic between the servers and storage elements of the data center.
Multiple redundancy and distribution mechanisms are needed to avoid network outages, make implementations resilient and reduce the business impact of failed network elements.
The Virtual Modular Switch (VMS) solution provides a distributed core element for the data center. The VMS is logically placed where you would traditionally place a chassis. Its benefit is increased resiliency, achieved through built-in redundancy and distribution of the networking load across multiple elements.
As covered in previous blog posts (Part 1 and Part 2), the Virtual Modular Switch (VMS) concept is a clear advantage for networks of medium to large scale. As we move into huge networks where multiple modular switches are needed, this advantage shrinks to the point where choosing between a VMS and multiple chassis becomes a matter of preference.
When the odds are even, that preference can come down to equipment cost, operating cost, specific network KPIs that must be met, or any other parameter the network operator cares about.
The Mellanox implementation of VMS is based on our own ASIC design, known as SwitchX. It is used as the fabric element in each of our Ethernet (and InfiniBand) switch product lines. SwitchX carries 36 high-speed interfaces of standard 40GbE; when used in a non-blocking fat-tree topology, 18 ports serve as external interfaces and 18 as internal interfaces toward the spine layer of the VMS fat tree. Having 36 ports on each spine element allows as many as 36 leaf elements, so the total number of external ports in a non-blocking two-tier VMS is 36 × 18 = 648.
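To make the arithmetic concrete, here is a minimal sketch (a generic helper, not a Mellanox tool) that derives these figures from the switch radix:

```python
def two_tier_fat_tree(radix: int) -> dict:
    """Port counts for a non-blocking two-tier fat tree built from
    identical switches with `radix` ports each (an 18/18 split for 36)."""
    down = radix // 2              # leaf ports facing servers/storage
    up = radix - down              # leaf ports facing the spine layer
    return {
        "leaf_downlinks": down,
        "leaf_uplinks": up,
        "spines": up,              # one spine per leaf uplink
        "max_leaves": radix,       # each spine port hosts one leaf
        "external_ports": radix * down,
    }

print(two_tier_fat_tree(36))
# {'leaf_downlinks': 18, 'leaf_uplinks': 18, 'spines': 18,
#  'max_leaves': 36, 'external_ports': 648}
```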
If you search the internet for data center automation tools, you will come up with many options. You can easily find software tools that automate server provisioning, configure network equipment, or monitor the various elements. What you will not find are tools for automatic fabric configuration.
Fabrics are becoming more popular these days. Where traditional aggregation switching in the data centers of cloud providers, Web 2.0 providers, and large-scale enterprises was based on modular switches, we now see those switches being replaced by fabrics: arrays of fixed, 1U switches. These fabrics increase flexibility and efficiency in data center aggregation through lower equipment cost, reduced power, better scalability, and higher resiliency.
Mellanox Virtual Modular Switch™ (VMS) is such a fabric, built from Mellanox 10, 40, and 56GbE fixed switches. It provides an optimized approach to aggregating racks. The VMS excels in flexibility, power savings, and performance. Based on Mellanox switches, the VMS leverages the unique advantages of SwitchX-2, the highest-performing 36-port 40GbE switching IC.
The Need for Automation
The scalability that fabrics bring drives a change in the way they are configured. The legacy way to configure switches and routers is scripting: each device has its own management interface, typically a CLI, and when the right configuration script is applied to each switch, the devices interwork as a single fabric. This approach, however, does not scale; you cannot configure big fabrics in mega data centers this way, since creating and maintaining scripts for many fixed switches can become a nightmare. Fabric creation therefore needs automation: a tool that can do the job both automatically and fast, allowing a short setup time.
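As a rough illustration of what such a tool automates away, here is a minimal sketch that renders a per-switch configuration from a leaf-spine topology description. The naming, addressing scheme, and config syntax are all assumptions for illustration, not an actual vendor CLI or Mellanox product:

```python
# Illustrative sketch: generate one config stanza per switch for a small
# leaf-spine fabric. Names, addressing, and syntax are assumptions.

def fabric_links(num_spines: int, num_leaves: int):
    """Every leaf connects once to every spine (two-tier fat tree)."""
    for s in range(num_spines):
        for lf in range(num_leaves):
            yield (f"spine{s + 1}", f"leaf{lf + 1}", f"10.0.{s}.{lf * 2}/31")

def render_config(switch: str, links) -> str:
    """Emit a per-switch stanza advertising its fabric links via OSPF."""
    lines = [f"hostname {switch}", "router ospf 1"]
    for spine, leaf, subnet in links:
        if switch in (spine, leaf):
            lines.append(f"  network {subnet} area 0")
    return "\n".join(lines)

links = list(fabric_links(num_spines=2, num_leaves=4))
for sw in ["spine1", "spine2", "leaf1", "leaf2", "leaf3", "leaf4"]:
    print(render_config(sw, links), end="\n---\n")
```

With the topology described once, every device's configuration falls out mechanically, which is exactly the step that becomes unmanageable when done by hand at fabric scale.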
Distributed elements, in any sector, have basic benefits and drawbacks compared to a single large tool. It is similar to preferring small aircraft over a jumbo 747 for carrying passengers between nearby airfields, or to using a bus versus multiple private cars to move a football team around.
In networking, the analysis between a Virtual Modular Switch (VMS) and a modular switch is cost and performance driven: a network operator will prefer the solution that gets the job done at the lowest cost, and the analysis produces different results depending on the cluster's size. Even if the number of network ports required for the solution fits into a single chassis-based device, that chassis, although equipped with redundant peripheral elements such as fans and power units, presents a single point of failure in the network. To solve this, a second chassis is introduced to share the load and provide connectivity in case of chassis failure.
From a financial point of view, if you had a 1000-port chassis in full use, you would need to deploy 2000 ports for high availability, a 100% price increase. Using only 2/3 of the ports in each chassis translates to a 200% increase on top of the truly required investment, and more such examples are easy to find. Another problem with the chassis is that it comes in very few form factors: if your solution requires 501 ports while the chassis of choice supports 500, you need to add another chassis and pay double.
Alternatively, breaking the solution into multiple devices in a VMS provides both finer granularity in port count and higher availability in terms of failure impact. In loose terms, if the VMS consists of 20 switches, the failure of a single switch translates to a 5% loss of network capacity. Regardless of how powerful and complicated the chassis is, this is a classic case where the strength of many tops the strength of one.
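A back-of-the-envelope check of the figures above (the numbers are the examples from this post; the helper itself is just a sketch):

```python
import math

def chassis_overhead(ports_needed: int, chassis_ports: int) -> float:
    """Extra spend, as a fraction of the required investment, when a
    redundant pair of chassis is deployed for high availability."""
    count = 2 * math.ceil(ports_needed / chassis_ports)
    return count * chassis_ports / ports_needed - 1.0

print(chassis_overhead(1000, 1000))  # 1.0 -> 100% increase, fully used
print(chassis_overhead(1000, 1500))  # 2.0 -> 200% when only 2/3 utilized
print(1 / 20)                        # 0.05 -> losing 1 of 20 VMS switches = 5%
```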
Traditionally, when Ethernet networks served low-end, non-performance-driven applications, the topology was based on an access layer with a very high port count and a very low rate of traffic generation. This made a very high blocking ratio acceptable, so a single uplink (or two, where high availability was needed) served all purposes and connected to an almighty aggregation chassis that catered to the whole network.
As applications evolved to become more bandwidth-hungry, latency-sensitive, and capacity-driven, a wider pipe between the access and aggregation elements became the enabler for the network's entire evolution. This, in turn, drove users toward consuming more interfaces on the aggregation chassis, and drove the network into a price-to-performance gridlock.
The need for a high count of high-capacity interfaces on the aggregation switch translates into a very large and complicated chassis. Although such chassis are available, they traditionally run a step behind the physical evolution of Ethernet technologies: late to arrive with an adequate number of higher-speed interfaces, and limited in their ability to carry the extra volume in terms of power, cooling, control tables, and switching matrix. This situation is eventually resolved by replacing the existing chassis with a newer model that promises to be more future-tolerant than its predecessor, and, of course, by accepting the additional cost of a huge device (or two, where high availability is needed).
An alternative to hanging your entire network from a single element is to use a fabric of smaller, simpler, and more cost-effective elements to create a network entity with the required port count, capacity, and other performance attributes. This essentially means replacing your modular switch with a Virtual Modular Switch, or as we like to call it, the VMS.
A VMS is a fat-tree topology of Ethernet switches, with OSPF routing used for topology discovery and ECMP used to load-balance traffic between the leaf (access) elements of the VMS via its spine (core) elements.
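To illustrate the ECMP half of that statement, here is a minimal sketch of flow-hash-based spine selection. Hashing the 5-tuple is a common ECMP approach, used here as an assumption rather than a description of the SwitchX hardware hash:

```python
import hashlib

SPINES = ["spine1", "spine2", "spine3", "spine4"]

def ecmp_next_hop(src_ip: str, dst_ip: str, src_port: int,
                  dst_port: int, proto: str = "tcp") -> str:
    """Hash a flow's 5-tuple to pick a spine: packets of the same flow
    stay on one path, while distinct flows spread across all spines."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return SPINES[digest % len(SPINES)]

print(ecmp_next_hop("10.0.0.1", "10.0.1.9", 40000, 80))  # same flow, same spine
print(ecmp_next_hop("10.0.0.2", "10.0.1.9", 40001, 80))  # new flow may take another spine
```

Because the selection is deterministic per flow, packet ordering within a flow is preserved while the aggregate load spreads across the spine layer.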
Stay tuned for further exploration of the pros and cons of deploying a VMS versus a modular chassis.
Author: Since 2011, Ran has served as Sr. Product Manager for Ethernet Products. Prior to joining Mellanox, Ran worked at Nokia Siemens Networks as a solution sales and marketing specialist for the packet networks business unit. Ran holds a BSc in Electrical Engineering and Computer Sciences from Tel Aviv University, Israel.