Category Archives: 10 Gigabit Ethernet

Mellanox ConnectX Ethernet – I/O Consolidation for the Data Center

Today’s data center requires a low-cost, low-power I/O solution with the network flexibility to provide I/O consolidation on a single adapter. Network administrators want the best performance, scalability and latency while solving all of their LAN, SAN and IPC (clustering) needs with one adapter card in a virtualized or data center environment.

ConnectX® is a single-chip solution from Mellanox whose hardware and software capabilities provide these features for data center I/O unification. ConnectX® EN 10 Gigabit Ethernet drivers provide seamless connectivity by delivering optimized 10 Gigabit Ethernet I/O services that scale easily with multi-core CPUs and with virtualized server and storage architectures.

Mellanox’s entry into the 10GigE landscape was rather late, considering that 10GigE started showing up on servers in 2001 as PCI-X and later PCIe adapters. With its extensive experience in high-performance computing and its broad range of industry relationships, Mellanox has forged ahead with this technology and was the first company to offer 10GigE with PCIe 2.0. Along the way, our products have matured to become the market leader in performance and latency, as well as in consolidating all data center networking onto a single adapter.

In a span of less than two years, Mellanox has introduced a broad range of products supporting various media interconnects and cabling options, including UTP and CX4 for copper, and SR, LR and LRM for fiber optics.

Technology leadership in networking requires that a company not only have the best hardware solution, but also complement it with the best software solution to make a winning combination.

In my experience working at other early startups, as well as at bellwether 10 Gigabit Ethernet companies, the Gen1/Gen2 10GigE products they introduced lacked a clear vision of end-customer requirements. The products were a mish-mash of features addressing 10GigE for LAN, clustering (iWARP), TCP acceleration (aka TOE) and iSCSI acceleration. They missed the mark by not solving the data center’s pain points, be it blazing performance, low latency, CPU utilization or true I/O consolidation.

Mellanox took a holistic approach to data center networking, drawing on its InfiniBand leadership and knowledge base, its understanding of server and system configuration, virtualization requirements and benefits, and driver software requirements, and, most importantly, its understanding of customer requirements in each vertical segment.

Today, ConnectX® EN 10 Gigabit Ethernet drivers support a broad array of major operating systems, including Windows, Linux, VMware Infrastructure, Citrix XenServer and FreeBSD.

The ConnectX® EN 10 Gigabit Ethernet drivers provide:

– All stateless offload features
– Virtualized accelerations
– Data Center Ethernet (DCE) support
– FCoE with full hardware offload for SAN consolidation on 10GigE
– The lowest 10GigE (TCP/IP) latency, comparable to expensive iWARP solutions
– Single Root I/O Virtualization (SR-IOV) for superior virtualization performance
– Linux kernel and Linux distribution support
– WHQL-certified drivers for Windows Server 2003 and 2008
– VMware Ready certification for VMware Virtual Infrastructure (ESX 3.5)
– XenServer 4.1 inbox support
– Line-rate performance with very low CPU utilization
– The ability to replace multiple GigE NICs with a single ConnectX Ethernet adapter
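
As a practical aside, on a Linux host you can check which of the stateless offloads listed above the driver has enabled with the standard ethtool utility (ethtool -k <interface> prints one feature per line). The short helper below is only an illustrative sketch: it assumes Python 3, that ethtool is installed, and that the ConnectX port shows up as eth2 on your system.

```python
# Hypothetical helper: list which stateless offloads are enabled on an interface.
# Assumes the standard Linux `ethtool` utility is installed and the NIC is "eth2".
import subprocess

def enabled_offloads(iface="eth2"):
    # `ethtool -k <iface>` prints one "feature: on/off" line per offload setting.
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True, check=True).stdout
    enabled = []
    for line in out.splitlines():
        if ":" in line:
            feature, _, state = line.partition(":")
            if state.strip().startswith("on"):
                enabled.append(feature.strip())
    return enabled

if __name__ == "__main__":
    for feature in enabled_offloads():
        print(feature)
```

On a port with the stateless offloads active you should see entries such as TCP segmentation offload and scatter-gather reported as on.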

To complete the last piece of the puzzle, IPC (clustering) for the data center, I’ll soon post on the industry’s Low Latency Ethernet (LLE) initiative and its advantages compared to the clustering solutions currently available on 10GigE.

Regards,
Satish Kikkeri
satish@mellanox.com

Moore’s Law’s Data Center Disruption

Change happens, and when you talk to anyone involved in the enterprise data center, you hear that change has been accelerating and is making their lives more and more complicated. The most recent issue is the growing list of network protocols the network engineer has to choose from.

Previously, the decision on which network protocol to use was very simple. For IP traffic, you used Ethernet, and for storage, Fibre Channel. Speeds were also simple to choose: 1 Gb Ethernet for IP and 2 or 4 Gb Fibre Channel for storage. The only challenge was choosing the vendor to purchase the equipment from.

Now, Moore’s Law has made the legacy data center network obsolete. Moore’s Law was originally conceived by one of the founders of Intel, Gordon Moore. He noticed that every generation of microprocessor Intel made tracked a straight line when transistor count was plotted against time. More profoundly, he noticed that nearly all semiconductor companies tracked the same line. He determined that the transistor density of microprocessors doubled every 18 months. His world-famous plot is still used today to describe the steady march of technology.

Moore’s Law has caused an issue in the data center. Here is what has happened. For any data center to work properly, its major building blocks (storage, servers and network) should be in balance; to work most efficiently, they should be matched. All three components depend primarily on semiconductor manufacturing processes, i.e., on the advance of Moore’s Law. Historically, storage and servers have tracked Moore’s Law very nicely, but the network shows a big discrepancy: Ethernet and Fibre Channel have not been tracking Moore’s Law. The efficiencies of server processing power and storage bandwidth have recently progressed so far ahead of the network that the network has become a bottleneck.

Looking at present-day data center networks, you can see that not only is the performance sub-par relative to the I/O needs of the servers and storage, but the functionality and features are woefully behind as well. Why is this? If you look at Ethernet and Fibre Channel, you discover these protocols don’t track Moore’s Law. Go ahead and plot the advance in bandwidth over time for both Ethernet and Fibre Channel, then overlay that onto server CPU density and aggregated storage bandwidth, and you discover that the legacy network (Ethernet and Fibre Channel) has fallen way behind. Even their future roadmaps don’t track Moore’s Law. We are beginning to see the bottlenecks happen. While Ethernet is very popular, it was never designed for the data center (try pumping lots of data from tens to hundreds of servers and watch the congestion!). Fibre Channel is simply too slow; even 8 Gb is too slow. This failure to match the technological advance of the servers and storage has made traditional approaches to data center network topology a dead end. To get back in balance, the network needs to be matched using newer ways of deploying data center networks.

Getting back to my original point: the network administrator of a large data center is probably noticing network problems and is pretty fed up with having to run 8 to 10 network cables to every server. He can move servers anywhere from his desktop, but when it comes to the network, he has to physically go into the data center and add NICs, HBAs and cables. Throwing adapters and more cables at the problem is counterintuitive and not productive. These activities drive CapEx and OpEx through the roof.

There are many new network technologies available to the data center network administrator that offer compelling solutions to the Moore’s Law problem. 10Gb Ethernet, Low Latency Ethernet, Data Center Ethernet and InfiniBand all offer a wide range of features and solutions for the enterprise data center and cloud computing. The issue is, can people let go of the legacy way and embrace a new way to think about their network? It’s not about the protocol anymore; there are too many choices for that. The new way is to leverage what makes the most sense for the application, taking advantage of the newer protocols and their powerful features.

The change in the enterprise data center that is causing the network problems is actually a good thing. It is forcing people to think about how they deploy their networks in a new light. By adopting an open viewpoint rather than stubbornly holding onto legacy ways, the network engineer in the enterprise data center can leverage powerful alternatives that turn this abundance of choice into an advantage.


Tony Rea
tony@mellanox.com

Gain A Competitive Advantage

BridgeX received an excellent response from all the analysts that we briefed over the last few weeks. 

One article talked about how BridgeX reminded the author of the early days of networking, when networking companies delivered bridges for Ethernet, Token Ring and Banyan VINES. The other talked about the mish-mash of protocols in the data center as a familiar story.

In my opinion, when data centers moved from Fast Ethernet to Gigabit Ethernet it was an easy decision because of the 10x performance improvement necessitated by the growth in Internet applications. The same 10x performance jump is now available with 10 Gigabit Ethernet, but data centers have not jumped into deploying the technology yet. Why? The killer app for 10 Gigabit Ethernet is I/O consolidation, but the Ethernet protocol itself is still being enhanced before it can be deployed as an I/O consolidation fabric. Enhancements to the Ethernet protocol are being made within the IEEE Data Center Bridging workgroup. These enhancements will deliver new functionality to Ethernet, yet the timeline for products is still a big question mark. Normally, in a growth economy, products roll out within 12 to 18 months of spec finalization, whereas in the current economic conditions it might take longer, and the spec is at least 18 months away from finalization. Until then, 10 Gigabit Ethernet deployments will happen in smaller, niche data center applications and will not be deployed for I/O consolidation. So, if data centers want to save energy costs, reduce floor space and lower TCO today, then deploying a proven I/O consolidation fabric is critical.

Just some of the enhancements currently being made to the Ethernet protocol in the IEEE:

  1. Lossless fabric
  2. Creating Virtual Lanes and providing granular QoS
  3. Enabling Fat-Tree
  4. Congestion management

These are already part of the InfiniBand fabric which has been shipping for almost 9 years now, and has been successfully deployed in several data centers and high-performance commercial applications.

Oracle Exadata is a great product that drives InfiniBand to the forefront of data centers for database applications. Exadata brings in new thinking and new strategy for delivering higher I/Os and lowering energy costs. Exadata certainly delivers a competitive advantage. 

Similarly, BridgeX coupled with ConnectX adapters and InfiniScale switching platforms provides a competitive advantage by delivering a cost-optimized I/O consolidation fabric. Data centers can consolidate their I/O using InfiniBand as the physical fabric, while the virtual fabrics continue to be Ethernet or Fibre Channel. This means that applications that need an Ethernet transport or a Fibre Channel transport will run unmodified in the InfiniBand cloud.

I think it is time for data centers to take a new look at their infrastructure and re-strategize their investments to gain an even greater competitive advantage. When the economy turns around, those whose infrastructure can leapfrog the competition will eventually win.

TA Ramanujam (TAR)
tar@mellanox.com

Mellanox at VMworld Europe

Yesterday, Motti Beck, Ali Ayoub (our main VMware software developer at Mellanox) and I diligently put together a very compelling demo that highlights the convergence capabilities of our BridgeX BX 4000 gateway, which we announced last week.

We unpacked everything and got it all up and running in less than an hour (this after we sorted out the usual power and logistical issues that always come with having a booth).

The slide below illustrates the topology of the demo. Essentially, we have two ConnectX adapter cards in one of the Dell servers, running two different interconnect fabrics. One adapter is running 40Gb/s InfiniBand, while the other is running 10 Gigabit Ethernet.

1. The 40Gb/s InfiniBand adapter is connected to our MTS3600 40Gb/s InfiniBand switch, which passes the traffic through the BridgeX BX4020, where we convert the packets to Ethernet. The packets then run through the Arista 10GigE switch and into the LeftHand appliance virtual machine, which resides on the Dell server (running ESX 3.5 and our certified 10GigE driver over our ConnectX EN 10GigE SFP+ adapter). We are showing a movie streamed from the iSCSI storage on the InfiniBand end-point (the Dell Linux server).

2. The 10 Gigabit Ethernet adapter connects directly to the BridgeX BX4020, which converts the traffic to Fibre Channel (effectively FCoE). The traffic then moves through the Brocade Fibre Channel switch and directly to the NetApp storage. We are showing a movie streamed from the FC NetApp storage on the 10GigE end-point (the Dell Linux server).

If you are coming to VMworld Europe (or are already here), come and see us in Booth #100 and we will be happy to walk you through the demo.

Brian Sparks
Director, Marketing Communications
brian@mellanox.com

I/O Agnostic Fabric Consolidation

Today, we announced one of our most innovative and strategic products – BridgeX, an I/O-agnostic fabric consolidation silicon that, dropped into a 1U enclosure, becomes a full-fledged system (the BX4000).

A few years back we defined our product strategy to deliver single-wire I/O consolidation to data centers. The approach was not to support some random transports to deliver I/O consolidation, but to use the transports that data centers are already accustomed to for the smooth running of their businesses. ConnectX, an offspring of this strategy, supports InfiniBand, Ethernet and FCoE. ConnectX consolidates the I/O on the adapter, but the data still has to go through different access switches. BridgeX, the second offspring of our product strategy, provides stateless gateway functionality that allows for access-layer consolidation. BridgeX lets data centers innovate and remove two fabrics by deploying a single InfiniBand fabric that can present several virtualized GigE, 10GigE, and 2, 4 or 8Gb FC interfaces in a single physical server. BridgeX, with its software counterpart BridgeX Manager running alongside on a CPU, delivers management functionality for vNICs and vHBAs on both virtualized OSs (VMware, Xen, Hyper-V) and non-virtualized OSs (Linux and Windows).

Virtual I/O with BridgeX’s stateless gateway implementation preserves packet and frame integrity. The virtual I/O drivers on the host add InfiniBand headers to the Ethernet or Fibre Channel frames, and the gateway (BridgeX) removes the headers and delivers the traffic on the appropriate LAN or SAN port. Similarly, the gateway adds InfiniBand headers to the packets and frames it receives from the LAN/SAN side and sends them to the host, which removes the encapsulation and delivers the packet or frame to the application. This simple, easy and innovative implementation saves not only deployment costs but also significant energy and cooling costs.
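
To make the flow concrete, here is a deliberately simplified sketch of the encapsulate/strip idea. It is illustrative only: the 4-byte pseudo-header is invented for clarity and does not reflect the actual wire formats used by ConnectX and BridgeX.

```python
# Illustrative sketch of the encapsulate/strip flow described above.
# The 4-byte pseudo-header is invented for clarity; it is NOT the real
# encapsulation format used by ConnectX/BridgeX.
ETH_PROTO = 0x01   # hypothetical tag: payload is an Ethernet frame
FC_PROTO = 0x02    # hypothetical tag: payload is a Fibre Channel frame

def host_encapsulate(frame, proto, vport):
    """Host-side virtual I/O driver: prepend a header before sending over InfiniBand."""
    header = bytes([proto, vport]) + len(frame).to_bytes(2, "big")
    return header + frame

def gateway_strip(packet):
    """Gateway side: remove the header and recover the original frame unchanged."""
    proto, vport = packet[0], packet[1]
    length = int.from_bytes(packet[2:4], "big")
    return proto, vport, packet[4:4 + length]

# Round trip: the frame that leaves the host is exactly the frame delivered to
# the LAN/SAN port; the stateless gateway never modifies the inner frame.
original = b"\xff" * 14 + b"example Ethernet payload"
proto, vport, recovered = gateway_strip(host_encapsulate(original, ETH_PROTO, vport=7))
assert recovered == original and proto == ETH_PROTO and vport == 7
```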

We briefed several analysts over the last few weeks, and most of them concurred that the product is innovative and that, in times like these, a BridgeX-based solution can cut costs, speed up deployments and improve performance.

TA Ramanujam (TAR)
tar@mellanox.com

Performance Testing 29West LBM

As promised in my last blog post (over two weeks ago), this post will focus on results from a more financial market-related application. The results below come from testing performed with 29West LBM 3.3.9.

29West LBM offers topic-based publish/subscribe semantics without a central server. Its primary design goal is to minimize latency. Many end users and middleware providers incorporate LBM into their own software via the LBM API. Publish/subscribe is an asynchronous messaging paradigm in which senders (publishers) are not programmed to send their messages to specific receivers (subscribers). Rather, published messages are characterized into classes, without knowledge of what subscribers, if any, there may be. Subscribers express interest in one or more classes and receive only the messages that are of interest, without knowledge of what publishers, if any, there are.
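
For readers less familiar with the model, the toy sketch below shows that decoupling in plain Python. It is not the LBM API (and, unlike LBM, it routes through an in-process broker object); it only illustrates how publishers and subscribers stay unaware of each other.

```python
# Toy topic-based publish/subscribe broker, purely to illustrate the paradigm.
# This is NOT the 29West LBM API; LBM avoids any central server and is
# engineered for minimal latency.
from collections import defaultdict

class TopicBus:
    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """A subscriber expresses interest in a class (topic) of messages."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """A publisher sends to a topic without knowing who (if anyone) listens."""
        for callback in self._subscribers.get(topic, []):
            callback(message)

bus = TopicBus()
bus.subscribe("quotes.NASDAQ", lambda msg: print("received:", msg))
bus.publish("quotes.NASDAQ", {"symbol": "XYZ", "price": 10.25})
bus.publish("quotes.NYSE", {"symbol": "ABC", "price": 42.00})  # no subscriber, silently dropped
```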

We conducted the testing with two servers; the hardware setup was, as always, the default, out-of-the-box EDC testing cluster that we’ve all experienced and learned from during the first set of blog posts. With 29West LBM we ran two separate tests: lbmpong for latency, and lbmsrc/lbmrcv for message rate. For both tests, we used the following interconnects: GigE, Mellanox VPI 10GigE, and Mellanox VPI 40Gb/s InfiniBand.

When using InfiniBand we used three different Upper Layer Protocols (ULPs), none of which required any code changes: IPoIB connected mode (CM), IPoIB datagram mode (UD) and the Sockets Direct Protocol (SDP).

Unlike the hardware, which does not change, the software versions may change from post to post, since we use only off-the-shelf official releases and take whatever updates they bring. For this test, the Mellanox ConnectX VPI firmware version was 2.6.0 and the OFED (driver) version was 1.4, all running on RHEL 5 Update 2 as the OS.

We knew theoretically that the 40Gb/s InfiniBand results would be better, but we didn’t estimate the difference correctly. 10GigE and InfiniBand beat GigE in the following order (from highest to lowest): SDP, IPoIB connected mode, IPoIB datagram mode (up to 8KB messages) and 10GigE. In latency, the advantage over GigE ranges from 30-80%; in message rate, for message sizes bigger than 1KB, it ranges from 200-450%.

You can download the full results here.

In the next couple of weeks I will be traveling to Singapore to speak at the IDC FinTech conference. Look me up if you plan to attend. If I am not able to post another blog before then, I will make sure to eat the famous Singapore chili crab for my readers, and I will make sure to tell you how it was… I mean the conference as well, not only the crab.

Nimrod Gindi

nimrodg@mellanox.com

QuickTransit Performance Results

As previously suggested, in this post I will review a different application, one focused on converting protocols. QuickTransit, developed by a company called Transitive (recently acquired by IBM), is a cross-platform virtualization technology that allows applications compiled for one operating system and processor to run on servers that use a different processor and operating system, without requiring any source code or binary changes.

We used QuickTransit for Solaris/SPARC-to-Linux/x86-64, and we tested latency with a basic test modeled on how the financial industry operates, one that exercises the interconnect performance between servers.

The topology we used was two servers (the first acting as the server and the second as the client). We measured latency with different object sizes and rates over the following interconnects: GigE, Mellanox ConnectX VPI 10GigE, and Mellanox ConnectX VPI 40Gb/s InfiniBand. I would like to reiterate, for any of you who have not read the first posts, that we are committed to our “out-of-the-box” guideline, meaning that neither the application nor any of the drivers are changed after being downloaded off the web.
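
To give a feel for what such a client/server latency measurement looks like, here is a minimal ping-pong sketch. It is not the test we actually ran; the host, port, message size and iteration count are placeholder assumptions, and it expects a simple echo server on the other side.

```python
# Minimal sketch of a ping-pong latency measurement (not the actual test used).
# Host, port, message size and iteration count are placeholder assumptions.
import socket
import time

def pingpong_latency(host="192.168.1.10", port=5001, size=64, iterations=1000):
    msg = b"x" * size
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # avoid Nagle delays
        samples = []
        for _ in range(iterations):
            start = time.perf_counter()
            sock.sendall(msg)
            received = 0
            while received < size:                      # wait for the full echo
                received += len(sock.recv(size - received))
            samples.append((time.perf_counter() - start) / 2)   # one-way estimate
    samples.sort()
    return samples[len(samples) // 2] * 1e6             # median, in microseconds
```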

With InfiniBand we used three different Upper Layer Protocols (ULPs), none requiring code changes: IPoIB connected mode (CM), IPoIB datagram mode (UD), and the Sockets Direct Protocol (SDP). The results were stunning, mainly because our assumption was that with all the layers of software, in addition to the software that converts SPARC Solaris code to x86 Linux code, the interconnect would have little impact, if any.

We learned that 40Gb/s InfiniBand performance is significantly better than GigE across a wide range of packet sizes and transmission rates. We saw latency more than 2x better when using InfiniBand, and 30% faster execution when using 10GigE. Go and beat that…


Let’s look at the results in a couple of different ways. In particular, let’s look at the size of the messages being sent: the advantage above relates to small message sizes (see graph #2), while at larger message sizes the advantage (already striking) becomes enormous.

In my next blog I plan to show more results that are closely related to the financial markets. If anyone out there identifies an application they would like our dedicated team to benchmark, please step forward and send me an e-mail.

Nimrod Gindi
nimrodg@mellanox.com

Hitting the New Year Running – Virtualization


You don’t have to ask – vacation was awesome and, as always, not as long as we would have liked.

Now that we’ve taken the rust off our fingers, we’ve made progress with a somewhat more complex testbed.

We’ve decided to look at the virtualization space and run our next application on top of VMware ESX 3.5. The application we picked was the Dell DVD Store application. Dell DVD Store is a complete online e-commerce test application, with a backend database component, a web application layer, and driver programs. To stay in line with what is being used in the industry, we chose a 2-tier configuration using a Microsoft SQL Server database (running on VMware). This means we used (as you can see in the picture) two hosts/systems running 20 virtual machines, the Microsoft SQL Server, and the client driver.

The database was 1GB in size, serving 2,000,000 customers. During the testing we increased the number of virtual machines running the client driver from 2 to 20, and measured the number of orders per minute generated from the database.

The only change we made after the out-of-the-box deployment (which, if you recall, we set as our goal) was to add a few scripts we created for test execution and results analysis, in order to run the tests more efficiently.
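
In the same spirit as those analysis scripts, the sketch below shows how the aggregation could look. It is hypothetical, not the script we actually used: the CSV file name and its two columns (vm_count, orders_per_minute) are assumptions made for the example.

```python
# Hypothetical results-analysis sketch (not the actual script we used).
# Assumes a CSV named "ds2_results.csv" with columns: vm_count, orders_per_minute.
import csv
from collections import defaultdict

def average_opm_by_vm_count(path="ds2_results.csv"):
    totals = defaultdict(lambda: [0.0, 0])          # vm_count -> [sum, sample count]
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            vms = int(row["vm_count"])
            totals[vms][0] += float(row["orders_per_minute"])
            totals[vms][1] += 1
    return {vms: s / n for vms, (s, n) in sorted(totals.items())}

if __name__ == "__main__":
    for vms, opm in average_opm_by_vm_count().items():
        print(f"{vms:2d} VMs: {opm:,.0f} orders/minute")
```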

The results of our tests are shown in the graph below:

The results clearly show a greater than 10% benefit when using VPI (both 10GigE and 40Gb/s InfiniBand). We’ve included results up to 10 VMs, but the valid numbers (where jitter is not a factor) appear to be up to 8 VMs, and there seems to be a dependency on the number of cores in the systems running VMware.

In my next blog post I plan to review either a new application or another aspect of this one.

Nimrod Gindi
nimrodg@mellanox.com

Enabling the middleware to be super fast

As promised in my last post, and after reviewing the OpEx and CapEx savings provided by a Virtual Protocol Interconnect (VPI)-oriented data center, we need to look at how the business can benefit from using such unified systems.

As described in my first post, we will be using off-the-shelf market-known applications from companies which are known in the industry. This post will review work done with GigaSpaces, a leading application provider in the financial sector, using their XAP 6.6.0.

Benchmark Software/Middleware components:
– GigaSpaces XAP 6.6.0
– GigaSpaces API: Java openspaces
– Space operation measured: write
– Sun JVM 1.6

We wanted to focus on one of the most important factors for the financial sector, low latency, and compare the different interconnects: 1GigE, VPI (10GigE), and VPI (40Gb/s InfiniBand). The results were stunning for both the Mellanox High-Performance Enterprise Team and GigaSpaces (who provided great help in getting this benchmark running and in analyzing the results).
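
The shape of such a measurement is simple. The sketch below is only an illustration of the timing loop: the actual benchmark used GigaSpaces’ Java openspaces API, so the do_write callable here is a placeholder standing in for the real space write operation.

```python
# Minimal sketch of the latency-measurement shape (illustration only).
# `do_write` is a placeholder for the real space write call; the actual
# benchmark used GigaSpaces' Java openspaces API.
import time

def measure_write_latency(do_write, payload, iterations=10000):
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        do_write(payload)                      # write the object into the space
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "median_us": samples[len(samples) // 2] * 1e6,
        "p99_us": samples[int(len(samples) * 0.99)] * 1e6,
    }

# Example with a no-op stand-in for the space write:
stats = measure_write_latency(lambda obj: None, payload={"size": 4096})
print(stats)
```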

VPI (both InfiniBand and 10GbE) is better than GigE by 25% to 100% (the more partitions, the more users, and the larger the objects in use, the more benefit the VPI technology provides). Comparing the interconnect options provided by VPI, InfiniBand shows better performance than 10GbE. Latency as measured with GigaSpaces is below 1 ms per transaction, including synchronization with the backup, with 4K objects and large numbers of concurrent users hitting the system at a high update rate. As you know, I truly believe in seeing the results, and therefore below you’ll find the graphs from our testing (which instantly generated quite a bit of interest from people in the industry).

In my next blog post I will review a variety of applications which we’ve conducted tests on – stay tuned.

But before I say my goodbyes I’ve got good news and bad news… Where to start?

Well, I’ll start with the bad: my next blog post will only come next year. The good news (at least for me) is that I’ll be on vacation.

Have a happy new year…
Nimrod Gindi
nimrodg@mellanox.com

Look at this beautiful rack!

This week’s blog is short, but it’s about the candy: the Rack — the Data Center’s building block.
The pictures below visually describe what each one of us would like to have in their Data Center.

Density – over 150 cores within less than 10U. Three different interconnects, 1GigE, 10GigE and 40Gb/s InfiniBand, using two adapters and no thick jungle of cables. –> 25% savings in rack space.

Power – fewer servers, without giving up any compute power; fewer adapters, without giving up any capabilities; fewer switches, without giving up any reliability or bandwidth. –> 35% savings in power.

Cost – with fewer switches and smaller servers, the saved space enables better cooling. Cost is (inevitably) lower by 25%.

Just imagine this rack with only a single interconnect of choice, and you’ll experience what I and many others have seen: a simple, tidy solution leads to better-functioning teams and faster responses to problems (if they ever occur).

Bringing the rack into functional condition wasn’t the easiest thing, I admit. When I said last time that some “labor pain” was involved, I mainly meant the pain of finding a place in the data center… I never knew how hard it could be to allocate floor space before going through this experience. But once we got the rack set in place (standing there in the corner can be a bit claustrophobic), sliding in the servers and switches took almost zero time. And thanks to a pre-prepared OS image, the entire rack was up and running in less than 24 hours.

I’ll leave you at this point to see the rack for yourself. I’ll be back in my next post with the first market application that we’ve used with that “Data Center in a Rack” – GigaSpaces.

Nimrod Gindi
nimrodg@mellanox.com
