On Tuesday, October 6, QCT opened its Cloud Solution Center, located within QCT’s new U.S. corporate headquarters in San Jose. The new facility is designed to test and demonstrate modern cloud datacenter solutions that have been jointly developed by QCT and its technology partners. Among the demonstrated solutions was an innovative VDI deployment, jointly developed by QCT and Mellanox, that is based on a virtualized hyper-converged infrastructure with scale-out software-defined storage connected over 40GbE.
VDI enables companies to centralize all of their desktop services over a virtualized data center. With VDI, users are not tied to a specific PC and can access their desktop and run applications from anywhere. VDI also helps IT administrators by creating more efficient and secure environments, which enables them to better serve their customers’ business needs.
VDI efficiency is measured by the number of virtual desktops that a specific infrastructure can support, or, in other words, by the cost per user. The major limiting factor is the access time to storage. Replacing the traditional Storage Area Network (SAN) architecture with a modern scale-out software-defined storage architecture on a fast 40GbE interconnect removes much of this bottleneck, lowering total cost of ownership (TCO) and raising efficiency.
Just one more week to go before VMworld 2015 begins at Moscone Center in San Francisco. VMworld is the go-to event where business and technical decision makers converge. In recent years, this week-long conference has become the major virtualization technologies event, and this year is expected to be the biggest ever.
We are thrilled to co-present a breakout session in the Technology Deep Dives and Futures track: Delivering Maximum Performance for Scale-Out Applications with ESX 6 [Tuesday, September 1, 2015: 11AM-Noon]
Presented by Josh Simons, Office of the CTO, HPC – VMware and Liran Liss, Senior Principal Architect, Mellanox.
An increasing number of important scale-out workloads – Telco Network Function Virtualization (NFV), in-memory distributed databases, parallel file systems, Microsoft Server Message Block (SMB) Direct, and High Performance Computing – benefit significantly from network interfaces that provide ultra-low latency, high bandwidth, and high packet rates. Prior to ESXi 6.0, Single-Root I/O Virtualization (SR-IOV) and Fixed Passthrough (FPT), which allow placing hardware network interfaces directly under VM control, introduced significant latency and CPU overheads relative to bare-metal configurations. ESXi 6.0 introduces support for Write Combining, which eliminates these overheads, resulting in near-native performance on this important class of workloads. The benefits of these improvements will be demonstrated using several prominent workloads, including a High Performance Computing (HPC) application, a Data Plane Development Kit (DPDK) based NFV appliance, and the Windows SMB Direct storage protocol. Detailed information will be provided to show attendees how to configure systems to achieve these results.
Over the past couple of years, we have witnessed significant architectural changes affecting modern data center storage systems. These changes have had a dramatic effect, as they have practically replaced the traditional Storage Area Network (SAN), which has been the dominant solution for over a decade.
When analyzing the market trends that led to this change, it becomes very clear that virtualization is the main culprit. The SAN architecture was very efficient when only one workload was accessing the storage array, but it has become much less efficient in a virtualized environment in which different workloads arrive from different independent Virtual Machines (VMs).
To better understand this concept, let’s use a city’s traffic light system as an analogy to a data center’s data traffic. In this analogy, the cars are the data packets (coming in different sizes), and the traffic lights are the data switches. Before the city programs a traffic light’s control, it conducts a thorough study of the traffic patterns of that intersection and the surrounding area.
Enable Higher IOPS while Maximizing CPU Utilization
As virtualization is now a standard technology in the modern data center, IT managers are seeking ways to increase efficiency by adopting new architectures and technologies that process data faster and execute more jobs over the same infrastructure, thereby lowering the cost per job. Since CPUs and storage systems are the two main contributors to infrastructure cost, using fewer CPU cycles and accelerating access to storage are keys to achieving higher efficiency.
The ongoing demand to support mobility and real-time analytics on constantly growing amounts of data requires new architectures and technologies, specifically ones that make smarter use of expensive CPU cycles and that replace old storage systems, which were very efficient in the past but have become hard to manage and extremely expensive to scale in modern virtualized environments.
With an average cost of $2,500 per CPU, about 50% of compute server cost is due to the CPUs. The I/O controllers, on the other hand, cost less than $100. Thus, offloading tasks from the CPU to the I/O controller frees expensive CPU cycles, increasing overall server efficiency. Other expensive components, such as SSDs, then no longer need to wait extra cycles for the CPU. This means that using advanced I/O controllers with offload engines results in a much more balanced system that increases overall infrastructure efficiency.
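The arithmetic behind this argument can be sketched in a few lines of Python. Only the $2,500 CPU price and the roughly 50% share come from the text above; the two-socket server, the $10,000 server price, and the 20% of cycles freed by offload are hypothetical values chosen purely for illustration.

```python
def cpu_share_of_server_cost(cpu_unit_cost, cpus_per_server, server_cost):
    """Fraction of the server price that is spent on CPUs."""
    return cpu_unit_cost * cpus_per_server / server_cost


def value_of_freed_cycles(cpu_unit_cost, cpus_per_server, freed_fraction):
    """Dollar value of the CPU capacity handed back to applications
    when a fraction of the cycles is offloaded to the I/O controller."""
    return cpu_unit_cost * cpus_per_server * freed_fraction


if __name__ == "__main__":
    CPU_COST = 2500        # from the text
    CPUS = 2               # assumption: two-socket server
    SERVER_COST = 10_000   # assumption: chosen so CPUs are ~50% of the cost
    FREED = 0.20           # assumption: hypothetical share of cycles freed by offload

    share = cpu_share_of_server_cost(CPU_COST, CPUS, SERVER_COST)
    freed_value = value_of_freed_cycles(CPU_COST, CPUS, FREED)
    print(f"CPU share of server cost: {share:.0%}")
    print(f"CPU capacity recovered per server: ${freed_value:,.0f}")
```

Under these assumed numbers, the recovered CPU capacity is worth several times the price of a sub-$100 I/O controller, which is the point the paragraph makes qualitatively.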
Guest Blog post by Giacomo Losio, Head of Technology – ProLabs
Original equipment manufacturers (OEMs) have long dominated the optical components market, but a new study now suggests that, as a result of tighter margins and greater competition, customers are putting quality and price before brand. Is the era of the big OEM at an end?
When asked their views of the optical transceiver market at the European Conference on Optical Communications (ECOC) in Cannes, over 120 attendees revealed a trend which indicates a paradigm shift in attitudes.
Why do they buy? What do they buy? What keeps them up at night? The answers may surprise you:
- 98% of respondents ranked quality as one of their top three priorities when purchasing fibre optics
- 89% of respondents placed price in the top three list of priorities
- Yet only 14% of respondents even considered brand names to be a top three priority – or even a concern
Today’s data centers demand that the underlying interconnect provide the utmost bandwidth and extremely low latency. While high bandwidth is important, it is not worth much without low latency. Moving large amounts of data through a network can be achieved with TCP/IP, but only RDMA can produce the low latency that avoids costly transmission delays.
The speedy transfer of data is critical to it being used efficiently. Interconnect based on Remote Direct Memory Access (RDMA) offers the ideal option for boosting data center efficiency, reducing overall complexity, and increasing data delivery performance. Mellanox RDMA enables sub-microsecond latency and up to 56Gb/s bandwidth, translating to screamingly fast application performance, better storage and data center utilization, and simplified network management.
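To see why bandwidth alone is not enough, here is a minimal sketch of a simple transfer-time model: total time is the fixed per-message latency plus the time to serialize the message onto the wire. The model ignores protocol overheads and congestion. The 56Gb/s figure comes from the text, while the 4 KB message size and the 1 µs and 20 µs latencies are hypothetical inputs, not measurements.

```python
def transfer_time_us(message_bytes, bandwidth_gbps, latency_us):
    """Estimated one-way transfer time in microseconds.

    Simple model (an assumption, not a benchmark):
        time = fixed latency + message size / link bandwidth
    """
    bits = message_bytes * 8
    bits_per_us = bandwidth_gbps * 1000  # 1 Gb/s == 1000 bits per microsecond
    return latency_us + bits / bits_per_us


if __name__ == "__main__":
    MESSAGE_BYTES = 4096   # assumption: a small 4 KB I/O
    BANDWIDTH_GBPS = 56    # from the text

    # Hypothetical latencies, used only to compare a low-latency path
    # with a higher-latency path on the same link speed.
    for label, latency in (("low-latency path", 1.0), ("higher-latency path", 20.0)):
        total = transfer_time_us(MESSAGE_BYTES, BANDWIDTH_GBPS, latency)
        print(f"{label}: {total:.2f} us")
```

For small messages the serialization term is well under a microsecond at 56Gb/s, so in this model the fixed latency dominates the total, no matter how fast the link is.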
Big Data solutions such as Hadoop and NoSQL applications are no longer the exclusive domain of Internet moguls. Today’s retail, transportation and entertainment corporations use Big Data practices such as Hadoop for data storage and data analytics.
IBM BigInsights makes Big Data deployments an easier task for the system architect. BigInsights with IBM’s GPFS-FPO file system support provides an enterprise-level Big Data solution, eliminating single points of failure and increasing ingress and analytics performance.
The inherent RDMA support in IBM’s GPFS takes performance a notch higher. Testing conducted at the Mellanox Big Data Lab with IBM BigInsights 2.1, GPFS-FPO and FDR 56Gb/s InfiniBand showed write and read performance gains of 35% and 50%, respectively, compared with a vanilla HDFS deployment. On the analytics benchmarks, enabling the RDMA feature provided a 35% throughput gain.
In 1967, Gene Amdahl developed a formula that calculates the overall efficiency of a computer system by analyzing how much of the processing can be parallelized and the amount of parallelization that can be applied in the specific system.
At that time, deeper performance analysis had to take into consideration the efficiency of three main hardware resources that are needed for the computation job: the compute, memory and storage.
On the compute side, efficiency has to be measured by how many threads can run in parallel (which depends on the number of cores). The memory size affects the percentage of I/O operations that need to access the storage, which significantly slows execution and reduces overall system efficiency.
Those three hardware resources worked very well until the beginning of the 2000s. At that time, the computer industry started to use grid computing, or, as it is known today, scale-out systems. The benefits of the scale-out architecture are clear: it enables building systems with higher performance that are easy to scale and have built-in high availability, at a lower cost. However, the efficiency of those systems depends heavily on the performance and resiliency of the interconnect solution.
The importance of the interconnect became even greater in the virtualized data center, where the amount of east-west traffic continues to grow (as more parallel work is being done). So, if we want to use Amdahl’s law to analyze the efficiency of a scale-out system, then in addition to the three traditional items (compute, memory and storage), a fourth item, the interconnect, has to be considered as well.
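As a rough illustration, the sketch below computes the classic Amdahl speedup, 1 / ((1 - p) + p / n), for a parallel fraction p and n workers. It then adds a deliberately simplified interconnect term. That second function is not part of Amdahl's original formula: the linear per-worker communication overhead is a hypothetical model, used only to show how the interconnect can cap scale-out efficiency.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Classic Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_with_interconnect(parallel_fraction, workers, overhead_per_worker):
    """Hypothetical extension (an assumption, not Amdahl's formula):
    each extra worker adds a fixed communication cost, expressed as a
    fraction of the original single-node run time."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    communication = overhead_per_worker * (workers - 1)
    return 1.0 / (serial_fraction + parallel_fraction / workers + communication)


if __name__ == "__main__":
    P = 0.95  # assumption: 95% of the job can run in parallel
    for n in (4, 16, 64):
        ideal = amdahl_speedup(P, n)
        slow = speedup_with_interconnect(P, n, overhead_per_worker=0.002)
        fast = speedup_with_interconnect(P, n, overhead_per_worker=0.0002)
        print(f"n={n:3d}  ideal={ideal:5.2f}  slow link={slow:5.2f}  fast link={fast:5.2f}")
```

In this toy model, the gap between the two overhead values widens as nodes are added, which matches the argument that the interconnect becomes a fourth factor in scale-out efficiency.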
Every IT professional’s goal is to improve TCO. In a Virtual Desktop Infrastructure (VDI) application, the objective is to increase the efficiency by maximizing the number of virtual desktops per server while maintaining response times to users that would be comparable to a physical desktop. In addition, the solution must be resilient since downtime of the VDI application causes the idling of hundreds to thousands of users and consequently reduces overall organizational productivity and increases user frustration.
Low-latency data requests from storage or other servers are the key to enabling more VDI sessions without increasing user response times. Legacy Fibre Channel-connected storage subsystems provide shared storage, which enables moving virtual machines between physical servers. Leveraging an existing Ethernet infrastructure saves costs by combining networking and storage I/O over the same cable. iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA). It uses the upper layers of iSCSI for session management, discovery, recovery, etc., and is thus compatible with all the features and functions supported by iSCSI. However, using iSER eliminates the bottleneck through the following mechanisms:
- Uses zero copy via RDMA technology
- CRC is calculated by hardware
- Works with message boundaries instead of streams
- The transport protocol is implemented in hardware (minimal CPU cycles per IO)
Recently, at VMworld’13, LSI Corporation and Mellanox Technologies presented a joint solution that accelerates storage access. The solution includes LSI’s Nytro MegaRAID NMR 8110-4i card with 200GB of on-card flash, eight SAS HDDs, and Mellanox’s ConnectX®-3 Pro adapter, which supports 10Gb/s RoCE storage connectivity between the servers. VDI performance (over TCP/IP and RoCE) was measured using Login VSI’s VDI load generator, which creates the actual workload of a typical Windows user using Microsoft Office.
Running Login VSI showed that over 10GbE TCP/IP only 65 virtual desktops responded within 5 seconds or less, versus 140 over 10GbE RoCE. This translates into a more than 2X cost saving on the VDI hardware infrastructure and proves to be an excellent economical alternative to legacy Fibre Channel-based storage subsystems.
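The "more than 2X" figure follows directly from the two desktop counts. Here is a minimal sketch of that calculation: the 65 and 140 desktop counts come from the test above, while the $10,000 infrastructure cost is a hypothetical placeholder that cancels out of the ratio.

```python
def density_gain(baseline_desktops, improved_desktops):
    """How many times more desktops the improved setup supports."""
    return improved_desktops / baseline_desktops


def cost_per_desktop(infrastructure_cost, desktops):
    """Hardware cost attributed to each virtual desktop."""
    return infrastructure_cost / desktops


if __name__ == "__main__":
    TCP_DESKTOPS = 65      # from the text: 10GbE TCP/IP
    ROCE_DESKTOPS = 140    # from the text: 10GbE RoCE
    INFRA_COST = 10_000    # assumption: hypothetical cost of the same hardware

    gain = density_gain(TCP_DESKTOPS, ROCE_DESKTOPS)
    print(f"Desktop density gain: {gain:.2f}x")
    print(f"Cost per desktop over TCP/IP: ${cost_per_desktop(INFRA_COST, TCP_DESKTOPS):.2f}")
    print(f"Cost per desktop over RoCE:   ${cost_per_desktop(INFRA_COST, ROCE_DESKTOPS):.2f}")
```

Because the same hardware is assumed in both cases, the cost per desktop falls by the same factor as the density gain, whatever the actual infrastructure price is.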
HP updated its enterprise hardware portfolio, with the most notable addition being networking devices that combine wired and wireless infrastructure to better manage bring-your-own-device policies. One of the highlights is the Mellanox SX1018HP Ethernet switch, which lowers port latency and improves downlinks.
The Mellanox SX1018HP Ethernet Switch is the highest-performing Ethernet fabric solution in a blade switch form factor. It delivers up to 1.36Tb/s of non-blocking throughput, perfect for High-Performance Computing, High Frequency Trading and Enterprise Data Center applications.
Utilizing the latest Mellanox SwitchX ASIC technology, the SX1018HP is an ultra-low-latency switch that is ideally suited as an access switch, providing InfiniBand-like performance with sixteen 10Gb/40Gb server-side downlinks and eighteen 40Gb QSFP+ uplinks to the core, with port-to-port latency as low as 230ns.
The Mellanox SX1018HP Ethernet Switch has a rich set of Layer 2 networking and security features and supports faster application performance and enhanced server CPU utilization with RDMA over Converged Ethernet (RoCE), making this switch the perfect solution for any high performance Ethernet network.
HP is the first to provide 40Gb downlinks to each blade server, enabling InfiniBand-like performance in an Ethernet blade switch. In another industry first, the low-latency SX1018HP Ethernet Switch provides the lowest port-to-port latency of any blade switch, more than four times faster than previous switches.
When combined with the space, power and cooling benefits of blade servers, the Mellanox SX1018HP Ethernet Blade Switch provides the perfect network interface for financial applications and high-performance clusters.