All posts by Motti Beck

About Motti Beck

Motti Beck is Sr. Director, Enterprise Market Development at Mellanox Technologies Inc. Before joining Mellanox, Motti was a founder of BindKey Technologies, an EDA startup that provided deep-submicron semiconductor verification solutions and was acquired by DuPont Photomasks, and of Butterfly Communications, a pioneering startup provider of Bluetooth solutions that was acquired by Texas Instruments. Prior to that, he was a Business Unit Director at National Semiconductor. Motti holds a B.Sc. in computer engineering from the Technion – Israel Institute of Technology. Follow Motti on Twitter: @MottiBeck

Things Are About to Get RoCE with Mellanox in Las Vegas

Mellanox will accelerate the speed of data in the virtualized data center from 10G to new heights of 25G at VMworld 2016, which convenes in Las Vegas from Aug. 29 to Sept. 1 at Mandalay Bay.


With an announcement coming for Mellanox’s ConnectX®-4 Ethernet and RoCE (RDMA over Converged Ethernet), things are about to get rocky in the best possible way. For starters, Mellanox will be showing how easy it is to increase the efficiency of the cloud just by deploying tomorrow’s networking solutions today. Mellanox delivers network products that improve productivity, scalability, and flexibility, all of which enable the industry to define a data center that meets your needs now and is future-proof at the same time. As the leading provider of high-performance Ethernet NICs, with an 85 percent market share above 10GbE, Mellanox offers ConnectX-4 and ConnectX-4 Lx NICs that not only support all data speeds but also include efficient offload engines that accelerate data center applications with minimal CPU overhead. Stay tuned at the show for exciting news about Mellanox’s 10/25/40/50/100Gb/s Ethernet and RoCE end-to-end solution.


Mellanox will also be hosting technology demos and ongoing presentations at booth #2223, where show attendees can learn about the benefits of our Spectrum 10/25/40/50/100GbE switch, which doesn’t lose packets, and why vSphere runs best over it. Showgoers will also have the chance to win prizes, all around the theme of driving the industry to 25G. In addition, a number of key Mellanox partners will be participating in the Mellanox booth presentations to show the superior capabilities of our network solutions, including end-to-end support for 25GbE, which delivers much more efficient networking than 10GbE, a speed that is no longer fast enough to meet today’s performance and efficiency needs. This is why, when it’s time to choose your next networking technology provider for your cloud deployment, or for your hyper-converged systems, you will choose Mellanox. But don’t just believe me; come visit our booth and see for yourself!

Lastly, don’t miss one of our paper presentations that were selected by the VMworld committee this year:

  • Achieving New Levels of Cloud Efficiency over vSphere based Hyper-Converged Infrastructure [HBC9453-SPO]
    • Monday Aug 29 from 5:30 p.m. to 6:30 p.m.
  • iSCSI/iSER: HW SAN Performance Over the Converged Data Center [INF8469]
    • Wednesday, Aug 31 from 1 p.m. to 2 p.m.
  • Latencies and Extreme Bandwidth Without Compromise [CTO8519]
    • Thursday, Sept 1 from 12 p.m. to 1 p.m.

Hope to see you there.




Achieving New Levels of Application Efficiency with Dell’s PowerEdge Connected over 25GbE

After many years of extensive development of data center virtualization technologies, which started with server virtualization and continued with networking virtualization and storage virtualization, the time has arrived to maximize the efficiency of the data centers that have been deployed over those advanced solutions. The rationale for doing this is pretty clear. New data centers are based on the Hyper-Converged architecture, which eliminates the need for dedicated storage systems (such as SAN or NAS) and for servers dedicated solely to storage. Modern servers used in such Hyper-Converged deployments usually contain multiple CPUs and large storage capacity. Modern CPUs have double-digit core counts that enable the servers to support tens, and in some cases hundreds, of Virtual Machines (VMs). From the storage point of view, such servers have a higher number of PCIe slots, which enables NVMe storage to be used, as well as the ability to host 24 or 48 SAS/SATA SSDs, both of which result in extremely high storage capacity.


Figure 1: Microsoft’s Windows Server 2016 Hyper-Converged Architecture, in which the same server is used for Compute and Storage.

Now that there are high-performance servers, each capable of tens of VMs and millions of IOPS, IT managers must take a careful look at the networking capabilities and avoid I/O-bound situations. The network must now support all traffic classes: the compute communication, the storage communication, the control traffic, and so on. As such, not having high enough networking bandwidth will result in unbalanced systems (see: How Scale-Out Systems Affect Amdahl’s Law) and will therefore reduce the overall deployment efficiency. That is why Dell has equipped their PowerEdge 13th generation servers with Mellanox’s ConnectX®-4 Lx 10/25Gb/s Ethernet adapters, delivering significant application efficiency advantages and cost savings for private and hybrid clouds running demanding big data, Web 2.0, analytics, and storage workloads.

In addition to data communication over 25GbE, Dell’s PowerEdge servers, equipped with ConnectX-4 Lx-based 10/25GbE adapters, are capable of accelerating latency-sensitive data center applications over RoCE (RDMA over Converged Ethernet), which enables similar performance in a virtualized infrastructure as in a non-virtualized infrastructure. This, of course, further maximizes system efficiency.

A good example of the efficiency that higher-bandwidth, lower-latency networks enable is Microsoft’s recent blog post, which published the performance results of a benchmark run over a 4-node Dell PowerEdge R730XD cluster connected over 100Gb Ethernet. Each node was equipped with the following hardware:

  • 2x Xeon E5-2660v3 2.6GHz (10c20t)
  • 256GB DRAM (16x 16GB DDR4 2133 MHz DIMM)
  • 4x Samsung PM1725 3.2TB NVMe SSD (PCIe 3.0 x8 AIC)
  • Dell HBA330
    • 4x Intel S3710 800GB SATA SSD
    • 12x Seagate 4TB Enterprise Capacity 3.5” SATA HDD
  • 2x Mellanox ConnectX-4 100Gb (Dual Port 100Gb PCIe 3.0 x16)
    • Mellanox FW v. 12.14.2036
    • Mellanox ConnectX-4 Driver v. 1.35.14894
    • Device PSID MT_2150110033
    • Single port connected / adapter


Figure 2: Storage throughput with Storage Spaces Direct (TP5)

The Microsoft team measured the storage performance, and, in order to maximize the traffic, they ran 20 VMs per server (total of 80 VMs for the entire cluster). They achieved astonishing performance of 60GB/s over a 4-node cluster, which perfectly demonstrates the higher efficiency that can be achieved when the three components of compute, storage, and networking are balanced, minimizing potential bottlenecks that can occur in an unbalanced system.
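As a rough sanity check on these figures, the following Python sketch converts the reported 60GB/s cluster throughput into a per-node rate and compares it with the per-node link capacity implied by the hardware list (two adapters, one 100Gb port connected on each). It assumes decimal units and, as a deliberately pessimistic upper bound, that every byte crosses the network; in a hyper-converged cluster some I/O is served locally, so the real network load would be lower. The function names are illustrative and do not come from the benchmark.

```python
def per_node_gbits(cluster_gbytes_per_s, nodes):
    """Convert aggregate cluster throughput (GB/s) to a per-node rate in Gb/s."""
    return cluster_gbytes_per_s / nodes * 8


def link_utilization(cluster_gbytes_per_s, nodes, ports_per_node, port_gbits):
    """Fraction of per-node link capacity needed if ALL traffic used the network.

    This is an upper-bound assumption: locally served I/O never touches the wire.
    """
    capacity = ports_per_node * port_gbits
    return per_node_gbits(cluster_gbytes_per_s, nodes) / capacity


if __name__ == "__main__":
    # Figures taken from the text: 60 GB/s, 4 nodes, 2 connected 100Gb ports per node.
    print("per node:", per_node_gbits(60, 4), "Gb/s")
    print("share of 2x100Gb:", link_utilization(60, 4, 2, 100))
    # Same port count at 10GbE, for comparison (a hypothetical configuration).
    print("share of 2x10Gb:", link_utilization(60, 4, 2, 10))
```

Under these assumptions, each node would move up to 120Gb/s, which fits within two 100Gb ports but would be six times the capacity of two 10GbE ports.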

Another example that shows the efficiency advantages of a higher bandwidth network is a simple ROI analysis of VDI deployment of 5000 Virtual Desktops, which compares connectivity over 25GbE versus 10GbE (published in my previous blog: “10/40GbE Architecture Efficiency Maxed-Out? It’s Time to deploy 25/50/100GbE”). When looking at only the hardware CAPEX savings, running over 25GbE cuts the VM costs in half, while adding the cost of the software and the OPEX even further improves the ROI.


Modern data centers must be capable of handling the flow of data and enabling (near) real-time analysis, which is driving the demand for higher-performance and more efficient networks. New deployments that are based on Dell PowerEdge servers, equipped with Mellanox ConnectX-4 Lx 10/25GbE adapters, allow clients an easy migration from today’s 10GbE to 25GbE without demanding costly upgrades or incurring additional operating expenses.



Accelerating High-Frequency Trading with HPE Apollo 2000 and 25Gb/s Ethernet


The financial services industry (FSI) is facing various challenges these days, including the ongoing data explosion, new regulatory demands, more messages per trade, and increased competition. In a business where profits are directly determined by communications speed and latency, building a high-performance infrastructure that is capable of analyzing a high volume of data is critical. In particular, for high frequency trading applications saving a few microseconds in latency can be worth millions of dollars. Furthermore, in order to maintain a competitive advantage, financial firms must constantly upgrade infrastructure and accelerate data analytics. Given these factors the Trading and Market Data Applications market is one of the most demanding in terms of data center networking requirements, and requires IT managers to incorporate the most advanced networking technologies, supporting ultra-low latency and the highest possible throughput, while maintaining the lowest possible total cost of ownership (TCO).


Figure 1: HPE dual-port 25GbE adapter in both mezzanine (640SFP28) and PCIe card

This week at the HPE Discover 2016 conference, Mellanox announced the availability of new 25/100Gb/s Ethernet solutions for ProLiant and Apollo servers that will reach new levels of networking performance at lower TCO. The announcement includes two dual-port 10/25GbE network interface controllers (NICs): the HPE 10/25Gb/s 2-port 640SFP28 Ethernet Adapter and the HPE 10/25Gb/s 2-port 640FLR-SFP28 Ethernet Adapter. Both are based on the Mellanox ConnectX®-4 Lx 10/25GbE controller.

One of the simplest and most effective ways to take advantage of the higher speed is with VMA Messaging Acceleration Software. VMA is an open source, dynamically-linked user-space Linux library for accelerating messaging traffic, and is proven to boost performance of high frequency trading applications. Applications that utilize standard BSD sockets use the library to offload network processing from a server’s CPU. The traffic is passed directly to the NIC from the application user space, bypassing the kernel and IP stack and thereby minimizing context switches, buffer copies, and interrupts. This results in extremely low latency. VMA software runs on both of the new HPE Ethernet 10/25 Gb/s adapters and requires no changes to the applications.
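To make "standard BSD sockets" concrete, here is a minimal Python sketch of a UDP ping-pong that measures round-trip times over the loopback interface. It uses only the ordinary socket API, which is the kind of unmodified application code a preloaded acceleration library is designed to sit underneath. VMA is typically enabled by preloading the library when launching a program (for example `LD_PRELOAD=libvma.so`); whether it accelerates a given runtime such as the Python interpreter is an assumption to verify, and the loopback timings this sketch produces say nothing about VMA or 25GbE performance. The function names are illustrative.

```python
import socket
import threading
import time


def echo_server(sock, count):
    """Echo `count` datagrams back to their sender using plain BSD socket calls."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_udp_rtt(count=100, payload=b"x" * 64):
    """Return a list of UDP round-trip times (seconds) measured over loopback."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.settimeout(5.0)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5.0)

    worker = threading.Thread(target=echo_server, args=(server, count), daemon=True)
    worker.start()

    rtts = []
    try:
        for _ in range(count):
            start = time.perf_counter()
            client.sendto(payload, server.getsockname())
            client.recvfrom(2048)  # wait for the echo before sending the next one
            rtts.append(time.perf_counter() - start)
    finally:
        worker.join(timeout=5.0)
        client.close()
        server.close()
    return rtts


if __name__ == "__main__":
    samples = sorted(measure_udp_rtt())
    print("median RTT (us):", samples[len(samples) // 2] * 1e6)
```

Because the program never references the acceleration library, the same source can be run with or without it, which is the sense in which no application changes are required.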


Figure 2: VMA block diagram

Running trading and market data applications over 25GbE and VMA enables the lowest application latency, highest application throughput, and improved scalability compared to other solutions, making Mellanox Ethernet the best interconnect solution for high frequency trading. At the conference, Mellanox and HPE demonstrated HPE’s Trade and Match Server solution, which is based on the Apollo 2000 platform and has been designed to minimize system latency and optimized for higher performance, specifically for high-frequency trading operations. HPE has published benchmark results for the Trade and Match Server, connected by ConnectX-4 Lx 25GbE, that demonstrate the competitive advantages that Mellanox’s high-performance interconnect solutions enable.


Figure 3: HPE’s Trade and Match Server, with industry-leading TCP and UDP latencies when connected by ConnectX-4 Lx 25GbE

In another example, HPE compares UDP latency under various traffic load scenarios, thereby simulating the consumption of high volume market data feeds like OPRA, where systems are required to maintain low and consistent latency under high volumes of traffic from the feed. Here too, the solution is able to sustain very low latency even under conditions of high message rate.


Figure 4: VMA UDP latency under high message rates (sockperf)

In addition to its higher bandwidth and lower latency, the ConnectX-4 Lx also enables IT managers to leverage Remote Direct Memory Access (RDMA) offload engines by running latency-sensitive trading and market data applications over RoCE (RDMA over Converged Ethernet). RDMA enables the network adapter to transfer data directly from application to application without involving the operating system, thereby eliminating intermediate buffer copies. As such, running over RoCE minimizes the latency and maximizes the messages per second that the infrastructure is capable of providing, both of which are essential for businesses to maintain their competitive advantage in data analysis.

The financial services industry is one of the most demanding in terms of IT networking requirements. Much more data needs to be analyzed in real-time, and every microsecond can translate into millions of dollars of profits or losses. It is therefore crucial to improve system performance with low-latency, high-bandwidth connectivity such as Mellanox 25Gb/s Ethernet in order to maintain a sustainable advantage over the competition.

10/40GbE Architecture Efficiency Maxed-Out? It’s Time to Deploy 25/50/100GbE

In 2014, the IEEE rejected the idea of standardizing 25GbE and 50GbE over one lane and two lanes, respectively. It was then that a group of technology leaders (including Mellanox, Google, Microsoft, Broadcom, and Arista) formed the 25Gb Ethernet Consortium in order to create an industry standard for defining interoperable solutions. The Consortium has been so successful in its mission that many of the larger companies that had opposed standardizing 25GbE in the IEEE have joined the 25GbE Consortium and are now top-level promoters. Since then, the IEEE has changed its original position and has now standardized 25/50GbE.

However, now that 25/50GbE is an industry standard, it is interesting to look back and analyze whether the decision to form the Consortium was the right one.


There are many ways to handle such an analysis, but the best way is to compare the efficiency that modern ultra-fast and ultra-scalable data centers experience when running over a 10/40GbE architecture versus a 25/50/100GbE architecture. Here, too, there are many parameters that can be analyzed, but the most important is the architecture’s ability to achieve (near) real-time data processing (serving the ever-growing “mobile world”) at the lowest possible TCO per virtual machine (VM).

Of course, processing the data in (near) real-time requires higher performance, but it also needs cost-efficient storage systems, which implies that scale-out software defined storage with flash-based disks must be deployed. Doing so will enable Ethernet-based networking and eliminate the need for an additional separate network (like Fibre Channel) that is dedicated to storage, thereby reducing the overall deployment cost and maintenance.

To further reduce cost, and yet still support the faster speeds that flash-based storage can provide, it is more efficient to use one 25GbE NIC instead of three 10GbE NICs. Running over 25GbE also reduces the number of switch ports and the number of cables by a factor of three. So, access to storage is accelerated at a lower system cost. A good example of this is the NexentaEdge high performance scale-out block and object storage that has been deployed by Cambridge University for their OpenStack-based cloud.
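The factor of three follows from simple port arithmetic, sketched below in Python. The per-server bandwidth target of 25Gb/s comes from the comparison above; the rack size of 40 servers is a purely hypothetical number chosen for illustration, and the function names are not taken from any tool.

```python
import math


def ports_needed(target_gbps, port_speed_gbps):
    """Smallest number of ports of a given speed that covers the target bandwidth."""
    return math.ceil(target_gbps / port_speed_gbps)


def rack_port_count(servers, target_gbps, port_speed_gbps):
    """NIC ports per rack; each one also needs a switch port and a cable."""
    return servers * ports_needed(target_gbps, port_speed_gbps)


if __name__ == "__main__":
    target = 25    # Gb/s per server, as in the comparison above
    servers = 40   # hypothetical rack size, for illustration only
    for speed in (10, 25):
        print(f"{speed}GbE: {ports_needed(target, speed)} port(s) per server, "
              f"{rack_port_count(servers, target, speed)} per rack")
```

With these inputs, 10GbE needs three ports per server where 25GbE needs one, so the rack-level count of NIC ports, switch ports, and cables drops by the same factor.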


Building a bottleneck-free storage system is critical for achieving the highest possible efficiency of various workloads in a virtualized data center. (For example, VDI performance issues begin in the storage infrastructure.) However, it is no less important to find ways to reduce the cost per VM, which can be best accomplished by maximizing the number of VMs that can run over a single server. With the growing number of cores per CPU, as well as the growing number of CPUs per server, hundreds of VMs can run over a single server, cutting the cost per VM. However, a faster network is essential to avoid being I/O-bound. For example, a simple ROI analysis of VDI deployment of 5000 Virtual Desktops that compares just the hardware CAPEX savings shows that running over 25GbE cuts the VM cost in half. Adding the cost of the software and the OPEX further improves the ROI.


The growth in computing power per server and the move to faster flash-based storage systems demand higher performance networking. The old 10/40GbE-based architecture simply cannot hit the right density/price point, and the new 25/50/100GbE speeds are therefore the right choice to close the ROI gap.

As such, the move by Mellanox, Google, Microsoft, and others to form the 25Gb Consortium in order to push ahead with 25/50GbE as a standard despite the IEEE’s initial short-sighted rejection now seems like an enlightened decision, not only because of the IEEE’s ultimate change of heart, but even more so because of the performance and efficiency gains that 25/50GbE bring to data centers.

Content at the Speed of Your Imagination

In the past, one port of 10GbE was enough to support the bandwidth need of 4K DPX, three ports could drive 8K formats, and four ports could drive 4K-Full EXR. However, the recent evolution in the media and entertainment industry that has been presented this week at the NAB Show showcases the need for higher resolution. This trend continues to drive the need for networking technologies that can stream more bits per second in real-time. Moreover, these port counts can drive only one stream of data. New films or video productions today include special effects that require support for multiple streams simultaneously in real-time. This creates a major “data size” challenge for the studios and post-production shops, as 10GbE interconnects have been maxed-out and can no longer provide an efficient solution that can handle the ever-growing workload demands.

This is why IT managers should consider using the newly emerging Ethernet speeds of 25, 50, and 100GbE. These speeds have been established as the new industry standard, driven by a consortium of companies that includes Google, Microsoft, Mellanox, Arista, and Broadcom, and recently adopted by the IEEE as well. A good example of the efficiency that higher speed enables is the Mellanox ConnectX-4 100GbE NIC that has been deployed in Netflix’s new data center. This solution now provides the highest-quality viewing experience for as many as 100K concurrent streams out of a single server. (Mellanox also published a CDN reference architecture based on our end-to-end 25/50/100GbE solutions, including the Mellanox Spectrum™ switch, the ConnectX®-4 and ConnectX-4 Lx NICs, and LinkX™ copper and optical cables.)



Bandwidth required for uncompressed 4K/8K video streams
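The bandwidth of an uncompressed stream can be estimated as width × height × bits per pixel × frames per second. The Python sketch below applies that formula under stated assumptions that are not taken from the original figure: DPX frames stored as 10-bit RGB packed into 32 bits per pixel, 24 frames per second, 4096×2160 for 4K and 8192×4320 for 8K, with no allowance for protocol overhead. Treat the results as rough estimates only.

```python
import math


def stream_gbps(width, height, bits_per_pixel, fps):
    """Raw bandwidth of one uncompressed video stream, in Gb/s (decimal units)."""
    return width * height * bits_per_pixel * fps / 1e9


def ports_needed(gbps, port_speed_gbps):
    """Smallest number of ports of the given speed that can carry the stream."""
    return math.ceil(gbps / port_speed_gbps)


if __name__ == "__main__":
    # Assumed formats: 10-bit RGB DPX packed into 32 bits per pixel at 24 fps.
    formats = {
        "4K DPX": (4096, 2160, 32, 24),
        "8K DPX": (8192, 4320, 32, 24),
    }
    for name, params in formats.items():
        rate = stream_gbps(*params)
        print(f"{name}: {rate:.2f} Gb/s, "
              f"{ports_needed(rate, 10)} x 10GbE or {ports_needed(rate, 25)} x 25GbE")
```

Under these assumptions a single 4K DPX stream fits in one 10GbE port and a single 8K stream needs three, and every additional simultaneous stream multiplies the requirement.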

Another important parameter that IT managers must take into account when building media and entertainment data centers is the latency incurred in streaming the data. Running multiple streams over the heavy and CPU-hungry TCP/IP protocol will result in lower CPU efficiency (as a significant percentage of the CPU cycles will be used to run the data communication protocol and not the workload itself), which will reduce the effective bandwidth that the real workload can use.

This is why IT managers should consider deploying RoCE (RDMA over Converged Ethernet). Remote Direct Memory Access (RDMA) makes data transfers more efficient and enables fast data movement between servers and storage without involving the server’s CPU. Throughput is increased, latency reduced, and CPU power freed up for video editing, compositing, and rendering work. RDMA technology is already widely used for efficient data transfer in render farms and in large cloud deployments such as Microsoft Azure, and can accelerate video editing, encoding/transcoding, and playback.



RoCE utilizes advances in Ethernet to enable more efficient implementations of RDMA over Ethernet. It enables widespread deployment of RDMA technologies in mainstream data center applications. RoCE-based network management is the same as that for any Ethernet network, eliminating the need for IT managers to learn new technologies. Using RoCE can result in 2X higher efficiency, since it doubles the number of streams compared to running over TCP (source: ATTO Technology).


The impact of RoCE for 40Gb/s vs. TCP on the number of supported video streams


Designing data centers that can serve the needs of the media and entertainment industry has traditionally been a complicated task that has often led to slow streams and bottlenecks in the pure storage performance, and in many cases has required the use of very expensive systems that resulted in lower-than-expected efficiency gains. Using high performance networking that supports higher bandwidth and low latency guarantees a hassle-free operation and enables extreme scalability and higher ROI for any industry-standard resolution and any content imaginable.

QCT’s Cloud Solution Center – Innovative Hyper Converged Solution at Work

On Tuesday, October 6, QCT opened its Cloud Solution Center located within QCT’s new U.S. corporate headquarters in San Jose. The new facility is designed to test and demonstrate modern cloud datacenter solutions that have been jointly developed by QCT and its technology partners. Among the demonstrated solutions was an innovative VDI deployment that has been jointly developed by QCT and Mellanox, based on a virtualized hyper-converged infrastructure with scale-out Software-Defined-Storage and connected over 40GbE.


VDI enables companies to centralize all of their desktop services over a virtualized data center. With VDI, users are not tied to a specific PC and can access their desktop and run applications from anywhere. VDI also helps IT administrators by creating more efficient and secure environments, which enables them to better serve their customers’ business needs.


VDI efficiency is measured by the number of virtual desktops that a specific infrastructure can support, or, in other words, by measuring the cost per user. The major limiting factor is the access time to storage. Replacing the traditional Storage Area Network (SAN) architecture with a modern scale-out software-defined storage architecture with a fast interconnect supporting 40GbE eliminates many potential bottlenecks, enabling the lowest total cost of ownership (TCO) and highest efficiency.


Ethernet That Delivers: VMworld 2015

Just one more week to go before VMworld 2015 begins at Moscone Center in San Francisco. VMworld is the go-to event where business and technical decision makers converge.  In recent years, this week-long conference has become the major virtualization technologies event, and this year is expected to be the biggest ever.


We are thrilled to co-present a breakout session in the Technology Deep Dives and Futures track: Delivering Maximum Performance for Scale-Out Applications with ESX 6 [Tuesday, September 1, 2015: 11AM-Noon]


Session CTO6454:
Presented by Josh Simons, Office of the CTO, HPC – VMware and Liran Liss, Senior Principal Architect, Mellanox.

An increasing number of important scale-out workloads – Telco Network Function Virtualization (NFV), in-memory distributed databases, parallel file systems, Microsoft Server Message Block (SMB) Direct, and High Performance Computing – benefit significantly from network interfaces that provide ultra-low latency, high bandwidth, and high packet rates. Prior to ESXi 6.0, Single-Root-IO-Virtualization (SR-IOV) and Fixed Passthrough (FPT), which allow placing hardware network interfaces directly under VM control, introduced significant latency and CPU overheads relative to bare-metal configurations. ESXi 6.0 introduces support for Write Combining, which eliminates these overheads, resulting in near-native performance on this important class of workloads. The benefits of these improvements will be demonstrated using several prominent workloads, including a High Performance Computing (HPC) application, a Data-Plane-Development-Kit (DPDK) based NFV appliance, and the Windows SMB Direct storage protocol. Detailed information will be provided to show attendees how to configure systems to achieve these results.


Storage Spaces Direct: If Not RDMA, Then What? If Not Mellanox, Then Who?

Over the past couple of years, we have witnessed significant architectural changes affecting modern data center storage systems. These changes have had a dramatic effect, as they have practically replaced the traditional Storage Area Network (SAN), which has been the dominant solution for over a decade.


When analyzing the market trends that led to this change, it becomes very clear that virtualization is the main culprit. The SAN architecture was very efficient when only one workload was accessing the storage array, but it has become much less efficient in a virtualized environment in which different workloads arrive from different independent Virtual Machines (VMs).


To better understand this concept, let’s use a city’s traffic light system as an analogy to a data center’s data traffic. In this analogy, the cars are the data packets (coming in different sizes), and the traffic lights are the data switches. Before the city programs a traffic light’s control, it conducts a thorough study of the traffic patterns of that intersection and the surrounding area.



Double Your Storage System Efficiency

Enable Higher IOPS while Maximizing CPU Utilization

As virtualization is now a standard technology in the modern data center, IT managers are now seeking ways to increase efficiency by adopting new architectures and technologies that enable faster data processing and execute more jobs over the same infrastructure, thereby lowering the cost per job. Since CPUs and storage systems are the two main contributors to infrastructure cost, using fewer CPU cycles and accelerating access to storage are keys toward achieving higher efficiency.


The ongoing need to support mobility and real-time analytics of constantly increasing amounts of data requires new architectures and technologies, specifically those that make smarter use of expensive CPU cycles and that replace old storage systems, which were very efficient in the past but have become hard to manage and extremely expensive to scale in modern virtualized environments.


With an average cost of $2,500 per CPU, about 50% of compute server cost is due to the CPUs. On the other hand, the I/O controllers cost less than $100. Thus, offloading tasks from the CPU to the I/O controller frees expensive CPU cycles, increasing the overall server efficiency. Other expensive components, such as SSDs, will therefore not need to wait extra cycles for the CPU. This means that using advanced I/O controllers with offload engines results in a much more balanced system that increases the overall infrastructure efficiency.



How to Achieve Higher Efficiency in Software Defined Networks (SDN) Deployments

During the last couple of years, the networking industry has invested a lot of effort into developing Software Defined Network (SDN) technology, which is drastically changing data center architecture and enabling large-scale clouds without significantly escalating the TCO (Total Cost of Ownership).


The secret of SDN is not that it enables control of data center traffic via software–it’s not like IT managers were using screwdrivers before to manage the network–but rather that it affords the ability to decouple the control path from the data path.  This represents a major shift from the traditional data center networking architecture and therefore offers agility and better economics in modern deployments.


For readers who are not familiar with SDN, a simple example can demonstrate the efficiency that SDN provides: Imagine a traffic light that makes its own decisions as to when to change and sends data to the other lamps. Now imagine if that were changed to a centralized control system that takes a global view of the entire traffic pattern throughout the city and therefore makes smarter decisions on how to route the traffic.


The centralized control unit tells each of the lights what to do (using a standard protocol), reducing the complexity of the local units while increasing overall agility. For example, in an emergency, the system can reroute traffic and allow rescue vehicles faster access to the source of the issue.
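The separation of control path and data path can be illustrated with a toy Python model. This is only a sketch of the idea, not any real SDN controller or protocol: the `Switch` and `Controller` classes, the five-switch topology, and the use of breadth-first search for path selection are all assumptions made for the example. The controller holds the global view and computes routes; each switch only looks up the rule it was given.

```python
from collections import deque


class Switch:
    """Data path only: forwards by table lookup, makes no routing decisions."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # destination -> next hop

    def install_rule(self, dst, next_hop):
        self.table[dst] = next_hop

    def forward(self, dst):
        return self.table.get(dst)


class Controller:
    """Control path: sees the whole topology and programs every switch."""

    def __init__(self, links):
        self.adj = {}
        for a, b in links:
            self.adj.setdefault(a, set()).add(b)
            self.adj.setdefault(b, set()).add(a)

    def shortest_path(self, src, dst, blocked=frozenset()):
        """Breadth-first search for a fewest-hop path that avoids blocked nodes."""
        queue, seen = deque([[src]]), {src}
        while queue:
            path = queue.popleft()
            if path[-1] == dst:
                return path
            for nxt in sorted(self.adj.get(path[-1], ())):
                if nxt not in seen and nxt not in blocked:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    def program(self, switches, src, dst, blocked=frozenset()):
        """Compute a path centrally and push one forwarding rule per hop."""
        path = self.shortest_path(src, dst, blocked)
        if path:
            for hop, nxt in zip(path, path[1:]):
                switches[hop].install_rule(dst, nxt)
        return path


if __name__ == "__main__":
    links = [("s1", "s2"), ("s2", "s4"), ("s1", "s3"), ("s3", "s5"), ("s5", "s4")]
    switches = {n: Switch(n) for n in ("s1", "s2", "s3", "s4", "s5")}
    ctrl = Controller(links)
    print(ctrl.program(switches, "s1", "s4"))                  # normal route
    print(ctrl.program(switches, "s1", "s4", blocked={"s2"}))  # reroute around s2
```

When `s2` is marked as blocked, the controller recomputes the route and overwrites the rule on `s1`, much like the central traffic system rerouting cars in an emergency, while the switches themselves stay simple.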


 Tokyo Traffic Control Center;  Photo Courtesy of @CScoutJapan

