
Mellanox Named HP’s 2012 Innovation Supplier of the Year

We’re thrilled to start out 2013 with some great news: Mellanox was named HP’s 2012 Innovation Supplier of the Year at last month’s annual HP Supplier Summit in San Francisco.

Mellanox was chosen as the top supplier out of 600 worldwide contractors across all HP product lines. To determine the winner of the Innovation Supplier of the Year Award, HP evaluated an elite group of suppliers whose outstanding performance exemplified delivering greater value, including enhanced revenue, cost savings, and process efficiencies.

Earlier this year, Mellanox announced that its Ethernet and InfiniBand interconnect solutions are now available through HP to deliver leading application performance for the HP ProLiant Generation 8 (Gen8) servers. Specific products available include: Mellanox ConnectX®-3 PCIe 3.0 FDR 56Gb/s InfiniBand adapters and 10/40GbE NICs, and SwitchX® FDR 56Gb/s InfiniBand switch blades and systems. Mellanox offers the only interconnect option for the HP ProLiant Gen8 servers that includes PCIe 3.0-compliant adapters.

We look forward to the continued partnership with HP in 2013. And stay tuned to our blog to learn more about new and innovative partnerships between Mellanox and its customers throughout the year.

Microsoft WHQL Certified Mellanox Windows OpenFabrics Drivers

Last week, Mellanox released the latest Microsoft WHQL-certified Mellanox WinOF 2.0 (Windows OpenFabrics) drivers. The release provides superior performance for low-latency, high-throughput clusters running Microsoft Windows® HPC Server 2008.

You may be asking yourself: how does this address my cluster computing needs? Does the Windows OFED stack released by Mellanox provide the same performance as the Linux OFED stack release?

Well, the Windows networking stack is optimized to address the needs of various HPC vertical segments. In our benchmark tests with MPI applications that require low latency and high performance, latency comes in at the low end of the 1 µs range, with uni-directional bandwidth of 3 GByte/sec using the Microsoft MS-MPI protocol.
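For readers curious how numbers like these are gathered: MPI latency benchmarks are essentially ping-pong tests, where one rank sends a small message, the peer echoes it back, and the one-way latency is taken as half the averaged round-trip time. The following is a minimal, hypothetical Python sketch of that methodology over a local socket pair; it is not the MS-MPI benchmark itself, and a kernel TCP loopback will of course show far higher latency than RDMA hardware.

```python
import socket
import threading
import time

def pingpong_latency(msg_size=8, iters=1000):
    """Average one-way latency (microseconds) via a ping-pong over a socketpair."""
    a, b = socket.socketpair()

    def echo():
        # Echo side: bounce each message straight back. With small messages
        # and only one in flight, each recv() returns the whole message.
        for _ in range(iters):
            data = b.recv(msg_size)
            b.sendall(data)

    t = threading.Thread(target=echo)
    t.start()

    payload = b"x" * msg_size
    start = time.perf_counter()
    for _ in range(iters):
        a.sendall(payload)   # ping
        a.recv(msg_size)     # pong
    elapsed = time.perf_counter() - start
    t.join()
    a.close()
    b.close()

    # Convention: one-way latency is half the round-trip time.
    return (elapsed / iters / 2) * 1e6

print(f"avg one-way latency: {pingpong_latency():.1f} us")
```

Real MPI benchmarks (e.g., the OSU latency test) follow the same pattern, only over the interconnect fabric and the MPI point-to-point API.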

Mellanox’s 40Gb/s InfiniBand adapters (ConnectX) and switches (InfiniScale IV), with their proven performance, efficiency, and scalability, allow data centers to scale up to tens of thousands of nodes with no drop in performance. Our drivers and Upper Layer Protocols (ULPs) allow end-users to take advantage of the RDMA networking available in Windows® HPC Server 2008.

Here is the link showing the compute efficiency of Mellanox InfiniBand compute nodes compared to Gigabit Ethernet (GigE) compute nodes performing mathematical simulations on Windows® HPC Server 2008.

As the saying goes, “the proof is in the pudding.” Mellanox InfiniBand interconnect adapters and technology are the best option for all enterprise data center and high-performance computing needs.

Satish Kikkeri

QuickTransit Performance Results

As previously suggested, in this post I will review a different application, one focused on converting protocols. QuickTransit, developed by a company called Transitive (recently acquired by IBM), is a cross-platform virtualization technology that allows applications compiled for one operating system and processor to run on servers that use a different processor and operating system, without requiring any source code or binary changes.

We used QuickTransit for Solaris/SPARC-to-Linux/x86-64 and tested for latency with a basic test modeled on the way the financial industry operates, one that exercises server-to-server interconnect performance.

The topology we used was two servers (the first acting as server and the second as client). We measured latency with different object sizes and rates over the following interconnects: GigE, Mellanox ConnectX VPI 10GigE, and Mellanox ConnectX VPI 40Gb/s InfiniBand. For anyone who has not read the first posts, I would like to reiterate that we are committed to our “out-of-the-box” guideline, meaning that neither the application nor any of the drivers are changed after downloading them from the web.

With InfiniBand we used three different Upper Layer Protocols (ULPs), none requiring code intervention: IPoIB connected mode (CM), IPoIB datagram mode (UD), and the Sockets Direct Protocol (SDP). The results were stunning, mainly because our assumption was that with all the layers of software, on top of the software converting SPARC Solaris code to x86 Linux code, the interconnect would have little impact, if any.
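The reason no code intervention is needed is that all three ULPs present a standard sockets/IP interface: the same unmodified client binary runs over GigE or IPoIB simply by targeting an address on the desired fabric, and SDP could historically be slotted underneath an unchanged sockets application on Linux (via an LD_PRELOAD of libsdp). Here is a hypothetical sketch of such an interconnect-agnostic probe; the addresses and port are illustrative, not from our testbed.

```python
import socket
import time

def rtt_probe(host, port, msg_size=1024, iters=100):
    """Average round-trip time (seconds) for fixed-size messages to an echo endpoint.

    Nothing here is fabric-specific: pointing `host` at an address bound
    to an IPoIB interface (e.g., ib0) instead of an Ethernet port moves the
    traffic onto InfiniBand with zero code changes.
    """
    with socket.create_connection((host, port)) as s:
        payload = b"x" * msg_size
        start = time.perf_counter()
        for _ in range(iters):
            s.sendall(payload)
            received = 0
            while received < msg_size:  # TCP is a stream; drain the full echo
                received += len(s.recv(msg_size - received))
        return (time.perf_counter() - start) / iters

# Example (hypothetical address on an IPoIB interface rather than Ethernet):
#   rtt_probe("192.168.10.5", 9999)
```

Comparing the same probe across GigE, IPoIB CM, IPoIB UD, and SDP endpoints is essentially what our object-size/rate sweeps did, just with the application's own traffic patterns.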

We learned that 40Gb/s InfiniBand performance is significantly better than GigE across a wide range of packet sizes and transmission rates. We saw latency more than 2x better when using InfiniBand, and 30% faster execution when using 10GigE. Go and beat that…

Let’s look at the results in a couple of different ways, in particular at the size of the messages being sent. The advantage above applies to small message sizes (see graph #2); moving to larger message sizes, the advantage (already striking) becomes enormous.

In my next blog I plan to show more results that are closely related to the financial markets. If anyone out there identifies an application they would like our dedicated team to benchmark, please step forward and send me an e-mail.

Nimrod Gindi

Chilean Stock Exchange Streamlines Securities Transactions With IBM

In case you missed it, IBM recently made an announcement regarding their WebSphere MQ Low Latency Messaging running over native InfiniBand enabled Blade Servers.

The performance the Chilean Stock Exchange is seeing is really impressive: 3,000 orders per second, with latency reduced to 1/100th of its previous level. Latency is critical for the financial markets, and InfiniBand is certainly showing that it is the preferred data center interconnect.

Motti Beck

Hitting the New Year Running – Virtualization

You don’t have to ask – vacation was awesome and, as always, not as long as we would have liked.

Now that we’ve taken the rust off our fingers, we’ve made progress with a bit more complex testbed.

We decided to look at the virtualization space and run our next application on top of VMware ESX 3.5. The application we picked was Dell DVD Store, a complete online e-commerce test application with a backend database component, a web application layer, and driver programs. To stay in line with what is used in the industry, we took a 2-tier configuration using a Microsoft SQL Server database (running on VMware). This means we used (as you can see in the picture) two hosts/systems running 20 virtual machines, the Microsoft SQL Server, and the client driver.

The database was 1GB in size, serving 2,000,000 customers. During the testing we increased the number of virtual machines running the client driver from 2 to 20, and measured the number of orders per minute generated from the database.

The only change we made after the out-of-the-box deployment (which, if you recall, we set as our goal) was to develop some scripts for test execution and results analysis, so we could run the tests more efficiently.

The results of our tests are shown in the graph below:
The results clearly show a greater-than-10% benefit when using VPI (both 10GigE and 40Gb/s InfiniBand). We included results up to 10 VMs, but the valid numbers (where jitter is not a factor) run up to 8 VMs, and there seems to be a dependency on the number of cores in the systems running VMware.

In my next blog post I plan to review either a new application or another aspect of this application.
Nimrod Gindi

Enabling the middleware to be super fast

As promised in my last post, having reviewed the OPEX and CAPEX savings offered by a Virtual Protocol Interconnect (VPI)-oriented data center, we now need to look at how the business can benefit from using such unified systems.

As described in my first post, we will be using off-the-shelf market-known applications from companies which are known in the industry. This post will review work done with GigaSpaces, a leading application provider in the financial sector, using their XAP 6.6.0.

Benchmark Software/Middleware components:
- GigaSpaces XAP 6.6.0
- GigaSpaces API: Java openspaces
- Space operation measured: write
- Sun JVM 1.6
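The GigaSpaces Java API itself is not reproduced here, but the measurement side of a benchmark like this is worth sketching: time each write operation individually and summarize the distribution, since averages alone hide the tail latencies that matter to trading systems. Below is a generic, hypothetical Python harness; `fake_write` is a stand-in for the real XAP space write of a 4K object.

```python
import random
import statistics
import time

def bench_writes(write_op, n_ops=10000):
    """Time individual write operations and summarize the latency profile."""
    samples = []
    for _ in range(n_ops):
        start = time.perf_counter()
        write_op()
        samples.append((time.perf_counter() - start) * 1e6)  # microseconds
    samples.sort()
    return {
        "avg_us": statistics.fmean(samples),
        "p50_us": samples[len(samples) // 2],          # median
        "p99_us": samples[int(len(samples) * 0.99)],   # tail latency
    }

# Hypothetical stand-in: the real benchmark would perform a GigaSpaces
# XAP write of a 4K object over the wire, including sync with backup.
payload = bytearray(4096)
def fake_write():
    payload[0] = random.randrange(256)

print(bench_writes(fake_write, 1000))
```

Swapping `fake_write` for a call into the actual middleware is all it takes to turn this into a per-interconnect comparison harness.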

We wanted to focus on one of the most important factors for the financial sector, low latency, comparing the different interconnects: 1GigE, VPI (10GigE), and VPI (40Gb/s InfiniBand). The results were stunning for both the Mellanox High-Performance Enterprise Team and GigaSpaces (who provided great help in getting this benchmark running and analyzing the results).

VPI (both IB and 10GbE) is better than GigE by 25% to 100% (the more partitions, the more users, and the larger the objects used, the more benefit VPI technology provides). When comparing the interconnect options provided by VPI, IB shows better performance than 10GbE. Latency as measured with GigaSpaces is below 1 ms per transaction, including sync with backup, with 4K objects and large numbers of concurrent users hitting the system at a high update rate. As you know, I truly believe in seeing the results, so below you’ll find graphs of the results from our testing (which instantly generated quite a bit of interest among people in the industry).

In my next blog post I will review a variety of applications which we’ve conducted tests on – stay tuned.

But before I say my goodbyes I’ve got good news and bad news… Where to start?

Well, I’ll start with the bad: my next blog post won’t appear until next year. The good news (at least for me) is that I’ll be on vacation.

Have a Happy New Year…
Nimrod Gindi

Look at this beautiful rack!

This week’s blog is short, but it’s about the candy: the rack, the data center’s building block.
The pictures below visually describe what each one of us would like to have in their Data Center.

Density – over 150 cores within less than 10U. Three different interconnects (1GigE, 10GigE, and 40Gb/s InfiniBand) using two adapters and no thick jungle of cables. -> 25% savings in rack space.

Power – fewer servers without giving up any compute power; fewer adapters without giving up any capabilities; fewer switches without giving up any reliability or bandwidth. -> 35% savings in power.

Cost – with fewer switches and smaller servers, the saved space enables better cooling. Cost is (inevitably) lower by 25%.

Just imagine this rack with only a single interconnect of choice, and you’ll experience what I and many others have seen: a simple, tidy solution leads to better-functioning teams and faster responses to problems (if they ever occur).

Bringing the rack into a functional condition wasn’t the easiest thing, I agree. When I said last time that some “labor pain” was involved, I mainly meant the pain of finding a place in the data center… I never knew how hard it could be to allocate floor space before going through this experience. But once we got the rack built in place (standing there in the corner can be a bit claustrophobic), sliding in the servers and switches took almost no time. And thanks to a pre-prepared OS image, the entire rack was up and running within less than 24 hours.

I’ll leave you at this point to see the rack for yourself. I’ll be back in my next post with the first market application that we’ve used with that “Data Center in a Rack” – GigaSpaces.

Nimrod Gindi

System Picking: Ready, Set, Go!

To recap my previous post, we’ve been setting the stage upon which vendors were to be evaluated and we’re ready for the “big race” (which we’ll do without “naming names”):

System: I considered two different dense systems, both of which met the CPU and memory requirements: dual-socket quad core, 16GB memory (2GB per core), and support for PCI Express Gen2. One was a blade server system from a Tier-1 vendor and the other was a 1U server that provided more for less (2 servers in 1U). We reviewed the power requirements of each (blades were better in this category), cost (differences were >10%), and space (the 1U servers saved some space). Also, the blades would not have required an external switch, which impacts the big 3: power, cost, and space.

I/O: We wanted to have all 3 dominant interconnects and reviewed switches and NICs separately.

Switches: 1GigE (many options in 1U; we just had to compare power and cost); 10GigE (fewer options; we considered three that varied in performance and price); and 40Gb/s InfiniBand (from us/Mellanox).

NICs: 1GigE (we decided to use the on-board ports); for 10GigE and 40Gb/s InfiniBand we picked our Mellanox ConnectX adapters, which provide the Virtual Protocol Interconnect (VPI) option (best-in-class performance with both 10GigE and 40Gb/s InfiniBand on the same NIC).

Storage: As mentioned in my previous posts I wanted to use a Tier-1 vendor which would provide us the access to all I/O options, and if necessary, add a gateway to enable all of the options. (I’m planning phase 2 which would include Tier-2 vendors as well, but it is yet to be executed). The choice was fairly easy due to the limited number of players in the storage arena.

Needless to say, we negotiated prices (hopefully effectively) and shared our concerns and performance targets with all vendors involved to help them come forward with the best system meeting those requirements. As a result, we were exposed to many future systems that promise to meet our requirements. BUT, keeping ourselves honest to the “off-the-shelf” criteria we initially set and promised to follow, we narrowed the “sea of promises” down to what we can see, touch, and use today.

Picking the system proved to be a hard and long process, but nothing prepared me for the bureaucracy of the PO process (which I won’t go into…). At the end of the day we chose 1U servers and block storage with a file system overlaying it.

I’ll finish up with the savings numbers (if you would like additional details, you can send me an email), and in my next post I’ll briefly describe the labor pains of the hardware bring-up. Last, but not least, the HUGE differences: power savings at ~35%, CAPEX savings over 25%, and space savings at the 25% mark.

Nimrod Gindi

Enterprise Data Center: Picking Hardware Can Be Hard Work

Recapping last week’s post… I knew we wanted a system that would contain all the building blocks of the data center in a single (easily expandable) rack. Internally at Mellanox, I felt we should review the full procurement process to understand it, and to give data-center managers better knowledge of this hard, and sometimes painful, process.

Now, with that high-level understanding in place, we had to turn ideology into reality and decide on the components to be purchased. I wish it were as simple as it sounds… let’s buy it (storage, CPU, and I/O), receive it, use it… ya, right. When a data center manager attempts to buy hardware for a specific application or set of applications, there are many parameters to take into consideration (I bet each of us unconsciously does this when buying something for home use).

CPU – “can you feel the need? The need for speed.” Tom Cruise’s words from Top Gun apply here better than ever, and yes, we felt it too. We wanted a system with 8 cores (we do want it to still be valid next Monday, and I guess 8 cores can carry us at least that far). Since time was of the essence, we couldn’t wait for next-generation CPUs that were promised to be just around the corner.

Storage – for this component we had to ensure a stable platform with all the features (dedup, high availability, hot spares, etc.), and we wanted a variety of speeds (from SAS/FC 15k RPM to SATA). We narrowed things down to block storage with a file system overlaying it externally (which would let us use both when required).

I/O – we wanted a variety of interconnects: 1GigE, 10GigE, and 40Gb/s (QDR) InfiniBand. Having a Virtual Protocol Interconnect (VPI) available made our decision easier, as it covered 2 out of 3 in a single low-power adapter.

Bearing in mind all the above, we needed to pass our options through several filters to help us zero in on the right selection.

We started with the big 3: Business Alignment, Cost and Time.

Cost – this is a tricky one… there’s CAPEX and OPEX, which means we had to consider each component for low power consumption while still being reasonably priced.

Time – we were eager to start, so delivery time was a factor…Waiting 4 months for something was out of the question.

Business alignment – I guess this is the most important but hardest to capture. For us, the system needed to have all I/O options, consist of off-the-shelf products, and be usable with any application “you throw at it.”

If anyone thought the above took us all the way home… well, I guess they’re in for some surprises. In my next blog post I’ll list the differences we found between two setups, both of which could address our business needs but differed greatly in other major parameters.

Enterprise Data Center – Where do we start?

From my experience working with enterprise market users, I’ve learned that even though everyone uses similar building blocks for their data centers, with similar requirements, the heavy focus on the application creates endless diversification in deployments, and a need for application-centric, concrete data on which CIOs can base decisions.

When moving back to our HQ earlier this year, I was challenged with how to provide that information quickly and effectively.

Together with individuals from our marketing and architecture organizations, the idea to “become an end-user” came up. Easier said than done… how does an engineering-driven vendor do that?

I set out to take the off-the-shelf components that typically compose enterprise data centers, build them into a complete solution, and test it to give end-users some basic data points to consider (without, and before, any specific changes or tuning).
