From Network Function Virtualization to Network Function Cloudification: Secrets to VNF Elasticity


According to a recent survey by Light Reading, SDN/NFV edged out 5G and the Internet of Things (IoT) to claim the title of hottest topic at the 2015 Mobile World Congress in Barcelona. Why are people so enthused about SDN and NFV? Two key things: agility and elasticity. Communication Service Providers (CSPs) and enterprises alike can spin networks and services up and down on demand, and scale them to the size that fits their business needs.

[Figure 1]


But these are really the benefits of cloud, not just virtualization. Virtualization and cloud are often used interchangeably, but they are not the same concept. Fundamentally, virtualization is the act of creating a virtual (rather than actual) version of something, including but not limited to a virtual computer hardware platform, operating system (OS), storage device, or network resource. Virtualization improves resource utilization and lets you pack more applications onto your infrastructure.


On the other hand, cloud computing is the delivery of shared computing resources on demand over the Internet or enterprise private networks. Cloud can provide self-service, elasticity, automated management, scalability, and pay-as-you-go service, none of which are inherent in virtualization, though virtualization makes them easier to achieve.


So the Nirvana of Network Function Virtualization is really Network Function Cloudification. But exactly what do we need to do to get there?



To take full advantage of a virtualized cloud environment, it is not sufficient to simply port applications from bare metal or purpose-built appliances onto Virtual Machines (VMs). If you virtualize a complex mess, you get a virtualized complex mess. A large portion of Virtualized Network Functions (VNFs) are stateful applications that have been optimized over the years to run in a scale-up environment, normally an appliance with a CPU, local storage, and a dedicated ASIC for packet processing. In stateful VNFs, the state is tightly coupled with the packet processing unit itself, whereas cloud-native applications maintain a clear separation between application and data: the application is stateless, and state is stored in logically centralized cloud storage.

[Figure 7]

The old wisdom associated performance with statefulness, and simplicity with statelessness. As new technologies emerge, assumptions are invalidated and rules rewritten. With the enhanced performance of CPUs, network, and storage I/O, stateless applications can achieve very good performance while being much easier to scale out and to recover from failure.


Let’s use a firewall as an example to understand the stateful-to-stateless transition that delivers better scalability and reliability.


[Figure 2]


Before virtualization, a session-aware Application Delivery Controller (ADC) needs to sit in front of the physical firewall array to load-balance traffic. Each firewall appliance stores only the sessions that were initiated locally, and it is difficult to exchange information between the firewall boxes. The ADC needs to know which firewall appliance can handle which sessions and distribute packets to them accordingly.


By simply virtualizing the firewall appliance to run on VMs, you can spin a firewall instance up or down fairly quickly and move it around if needed, but the added benefits pretty much stop there. Newly added firewall VMs can only start handling new sessions and don’t automatically offload existing VMs; the ADCs need to remember more state and may themselves need to scale out to more VMs; it is still hard to recover from a firewall VM failure; and it is still hard to scale down even when a firewall VM is running well below full utilization, resulting in low infrastructure efficiency.


Ultimately, the firewall application software needs to be re-architected to split into a stateless firewall processing element and a session state store. The store is logically centralized but can certainly be physically distributed and, depending on the performance requirements, use different tiers of storage. In this new model, the firewall processing elements are stateless and identical to one another, which makes it much easier to scale the service up and down.
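To make the split concrete, here is a minimal sketch of what a stateless firewall processing element could look like, assuming a shared key-value session store (Redis is used here purely as an illustration, with a made-up hostname) and simplified packet and policy objects; it is not a description of any particular vendor’s implementation.

```python
import redis  # illustrative choice of shared session store; any networked KV store works

# Logically centralized session store; physically it may be replicated or sharded.
SESSION_STORE = redis.Redis(host="session-store.example.internal", port=6379)
SESSION_TTL_SECONDS = 300


def session_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Canonical 5-tuple key, so every stateless worker derives the same key."""
    return f"fw:{proto}:{src_ip}:{src_port}:{dst_ip}:{dst_port}"


def process_packet(pkt, policy):
    """Stateless firewall element: all session state lives in the shared store."""
    key = session_key(pkt["src_ip"], pkt["src_port"],
                      pkt["dst_ip"], pkt["dst_port"], pkt["proto"])

    # Established session? Any worker can answer, because the state is shared.
    if SESSION_STORE.get(key):
        SESSION_STORE.expire(key, SESSION_TTL_SECONDS)  # refresh the idle timeout
        return "FORWARD"

    # New session: consult the policy and record the verdict centrally.
    if policy.allows(pkt):
        SESSION_STORE.set(key, "ESTABLISHED", ex=SESSION_TTL_SECONDS)
        return "FORWARD"
    return "DROP"
```

Because each worker keeps nothing locally, any instance can handle any packet, a failed instance loses nothing that matters, and scaling down is simply a matter of draining traffic away from an instance and shutting it off.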


The ADC VMs also become much simpler in the sense that they don’t need to track any session state; they just need to apply a simple stateless hash. When a firewall VM fails, the other VMs can automatically pick up its load. If you are familiar with the pets-vs.-cattle metaphor, you have just turned your firewall VMs from pets into cattle, so the whole firewall application can scale out and be cloud native.
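For illustration, that stateless hash could be a consistent hash over the flow 5-tuple, so packets of the same flow keep landing on the same firewall instance without the ADC keeping any per-flow table. A minimal sketch under that assumption (the instance names are invented):

```python
import hashlib
from bisect import bisect


class StatelessHashBalancer:
    """Maps each flow 5-tuple to a firewall instance via a consistent-hash ring."""

    def __init__(self, instances, vnodes=100):
        # Place several virtual nodes per instance to smooth the distribution.
        self.ring = sorted(
            (self._hash(f"{inst}#{v}"), inst)
            for inst in instances
            for v in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def pick(self, five_tuple):
        """The same 5-tuple always hashes to the same instance; no per-flow table."""
        idx = bisect(self.keys, self._hash(str(five_tuple))) % len(self.ring)
        return self.ring[idx][1]


balancer = StatelessHashBalancer(["fw-vm-1", "fw-vm-2", "fw-vm-3"])
print(balancer.pick(("10.0.0.5", 51324, "192.0.2.10", 443, "tcp")))
```

Strictly speaking, once the firewall elements are stateless and backed by the shared session store, even a plain modulo hash would do; consistent hashing just limits how many flows move to a different instance when firewall VMs are added or removed.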


Network Function Cloudification is not pulled out of thin air.  Many web services and SaaS applications have been architected to be stateless, and even in the NFV world, VNF vendors such as Metaswitch are promoting a new stateless model to make VNFs cloud native. I borrowed the following picture from an Infonetics webinar on “Deploying IMS in the Cloud with NFV” with Martin Taylor, CTO of Metaswitch Networks.


[Figure 3]


Indeed, storage needs to play a much bigger role in Network Function Cloudification, and as network functions move to VMs, the storage architecture evolves through three stages: local, network, and distributed storage.


Local storage: In the pre-virtualization world of purpose-built appliances, each appliance has its own dedicated, local storage, either internal to the appliance or nearby and directly attached, hence the term Direct-Attached Storage (DAS). DAS is cheap, but since it’s not shared, storage utilization is typically low. If an appliance fails, the sessions and related state data are usually lost, and migrating sessions to another appliance is difficult. Local storage is connected over a SAS, SATA, or PCIe interface.


[Figure 4]


Network storage:  With the move to virtualization, network storage became popular because it  allows many VMs to share the same storage array. If one VM or hypervisor host dies or needs maintenance, it’s easy to migrate the network function to another VM while the data stays in the same place. This storage is more expensive but since it’s shared, utilization is higher and it’s easier to leverage replication, deduplication, and instant backups to increase efficiency.  Network storage can be connected to the servers by Ethernet, InfiniBand, or Fibre Channel, using storage protocols such as iSCSI, iSER (iSCSI Extensions for RDMA), and FCP (Fibre Channel Protocol).


[Figure 5]


Distributed storage: The newest paradigm is to share the local storage in each hypervisor host over a fast, low-latency network. This is also known as “hyper-converged” infrastructure or Server SAN, because each server does both compute and storage. It combines the low cost of local storage with the sharing capabilities and higher efficiency of network storage. The VMs see all the local disks as one virtual storage device, allowing sharing of data and state as well as rapid backup, migration, or failover of both the virtualized functions and the associated session data.
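To make the idea concrete, here is a toy sketch of the placement logic behind a Server SAN: each data block is deterministically hashed to the hosts whose local disks hold it, so any VM on any host computes the same placement and reaches the data over the low-latency fabric. The node names and replica count are invented for illustration; real products such as VSAN or ScaleIO add replication protocols, rebalancing, and failure handling on top.

```python
import hashlib

# Hypothetical cluster: each hypervisor host contributes its local disks to the pool.
NODES = ["host-a", "host-b", "host-c", "host-d"]
REPLICAS = 2  # keep each block on two different hosts for availability


def owners(block_id, nodes=NODES, replicas=REPLICAS):
    """Deterministically map a block to the hosts whose local disks store it."""
    h = int(hashlib.sha256(block_id.encode()).hexdigest(), 16)
    start = h % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Any VM, on any host, computes the same placement and reads or writes over the
# low-latency fabric, so the pool behaves like one shared virtual storage device.
print(owners("volume42/block001377"))  # two host names, stable for this block ID
```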


[Figure 6]


As Network Function Cloudification advances, vendors and customers increasingly incorporate either network storage or distributed storage into their infrastructure designs for their higher efficiency and their ability to scale cloud-native services up or down. To meet stringent carrier-grade service level agreements, both approaches require fast, low-latency network connections to deliver the desired performance in virtualized environments, and this is especially true with the growing popularity of flash-based storage. Network storage requires fast connections between the VMs and the storage, while distributed storage requires fast connections between the VMs.


Wouldn’t it be nice for any VM to be able to access any storage, whether local on the same server or remote on another server or storage device, with the same low and deterministic latency, while adding no burden on the CPU? Indeed, that is exactly what RDMA (Remote Direct Memory Access) and SR-IOV (Single Root I/O Virtualization) technologies can bring you today.

These advanced acceleration technologies have been integrated into major cloud and virtualization software platforms, and cloud providers are increasingly adopting RDMA or RDMA over Converged Ethernet (RoCE) over higher bandwidth connections running at 40, 56, or 100Gb/s when performance and latency can’t be compromised.


Mellanox is a leader in high-performance interconnects for all three of the storage deployment models above and has excellent integration with leading hypervisors, including KVM, VMware, and Hyper-V. Mellanox also provides an end-to-end networking solution of adapters, switches, and cables, making Mellanox an easy choice for both compute and storage connections in the world of Network Function Virtualization.


Several leading network storage vendors already use Mellanox for server connectivity, including IBM, NetApp (E-series), Violin Memory, Toshiba, X-IO, and Zadara Storage. Simultaneously Mellanox networking has been shown to optimize distributed storage solutions from VMware (VSAN), EMC (ScaleIO), Citrix (SANbolic), and others.


Storage is an inseparable piece of the Network Function Cloudification puzzle, and at present a largely ignored one. But as VNF vendors and CSPs get over the initial hump, they will see that efficient storage and high-performance interconnects are essential to truly achieving scale-out elasticity and high availability for their cloud services.


