Accelerating Genomic Analysis

One of the biggest catchphrases in modern science is the human genome: the DNA coding that largely predetermines who we are and many of our medical outcomes. By mapping and analyzing the structure of the human genetic code, scientists and doctors have already started to identify the causes of many diseases and to pinpoint effective treatments based on the specific genetic sequence of a given patient. With the advanced data that such analysis provides, doctors can offer more targeted strategies for potentially terminal patients at times when no other clinically relevant treatment options exist.


For example, in certain aggressive types of cancer, a patient would have little chance of survival without a precise reading of the genomic code behind the disease and the tailored treatment options that result from such analysis. One such cancer is neuroblastoma, which commonly affects children younger than age 5. Neuroblastoma is very difficult to treat effectively and has a high rate of recurrence unless the genetic coding of the cancerous cells can be pinpointed and the treatment tailored accordingly.

The field of precision medicine cannot, however, take flight without high-performance technology. The analysis of the human genome is a very specialized skill that falls into the interdisciplinary field of Bioinformatics, which uses today’s best technology to retrieve, store, organize, and evaluate biological data to solve practical issues. Thanks to partnerships between the scientific community and leading technology companies, there have been great advances toward Bioinformatics becoming a mainstream capability.

One such partnership between high-performance computing and Bioinformatics has been that of Dell’s Genomic Data Analysis Platform (GDAP) with the Translational Genomics Research Institute (TGen). TGen uses Next Generation Sequencing technologies to retrieve the genomes of pediatric neuroblastoma patients, while GDAP provides the computing power to analyze that data and produce relevant results.

The project has faced a number of technological challenges, however, and Mellanox has been instrumental in overcoming them. The data set from a single genome can range from 100 to 500GB of code. Large as that may seem, it pales in comparison to the nearly 5TB of raw data that is produced and stored when the code is then analyzed.

Besides physical cluster requirements, there is a need for massive computational and networking capabilities to handle the large amounts of data and the complex genomic analysis. This requires high-performance computers, large storage capacity and memory, high bandwidth to bridge storage and compute resources, and low latency to enable lightning-fast multi-node computation.

A full genomic analysis using traditional networking solutions can take up to a week. With certain cancers, however, even a few hours can make a huge difference in a patient’s chances of survival. Something needed to be done to speed up the process to give precision medicine a chance at being effective.

TGen opted for Mellanox FDR InfiniBand as the interconnect protocol for its GDAP implementation. Mellanox helped to sculpt a network that included:

• Mellanox FDR 56Gb/s Host Channel Adapters (HCAs) in the compute cluster, I/O servers, and connection to the Lustre file system
• Mellanox SX6036 FDR 56Gb/s InfiniBand switches
• RDMA-based storage
• Performance optimizations for the pipeline
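
To put the 56Gb/s figure in perspective, the back-of-the-envelope sketch below estimates how long it would take just to move the roughly 5TB of raw analysis data across a single link at different speeds. It is only an illustration: it assumes an ideal link running at its nominal rate with no encoding, protocol, or storage overhead, and the 1Gb/s and 10Gb/s comparison links are hypothetical, not a description of TGen's earlier network.

```python
# Idealized estimate only: real throughput is lower because of
# encoding, protocol, and storage overhead.
BITS_PER_BYTE = 8


def transfer_seconds(size_bytes, link_gbps):
    """Seconds to move size_bytes over a link running at link_gbps
    (decimal gigabits per second), assuming the full nominal rate."""
    return size_bytes * BITS_PER_BYTE / (link_gbps * 1e9)


# ~5TB of raw data per analysis, as quoted in the article (decimal units assumed).
RAW_DATA_BYTES = 5e12

# 56 is the FDR rate from the article; 1 and 10 are hypothetical comparison links.
for gbps in (1, 10, 56):
    minutes = transfer_seconds(RAW_DATA_BYTES, gbps) / 60
    print(f"{gbps:>2} Gb/s: about {minutes:.0f} minutes")
```

Under these assumptions, moving the data once takes roughly eleven hours at 1Gb/s and about twelve minutes at 56Gb/s. A real pipeline reads and writes the data many times, which is why interconnect bandwidth weighs so heavily on total time-to-analysis.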

TGen set a goal to reduce its time-to-analysis within its sequencing system from the existing standard of one week down to five days. This would give doctors critical time in which to provide a pinpointed treatment plan for patients with time-critical illnesses.

Incredibly, upon implementing GDAP with the Mellanox FDR InfiniBand interconnect, the resulting time-to-analysis was as little as one hour! By combining high-performance computing with a lightning-fast interconnect that overcomes bottlenecks in the transfer of data from storage and the file system to the compute cluster, and by using RDMA to avoid multiple copies of the sequencing data, the project achieved a performance boost of over 160X.
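
The speedup figure follows directly from the durations quoted in this article. Here is a quick check, assuming the one-week baseline means seven full 24-hour days and the best case is exactly one hour:

```python
HOURS_PER_DAY = 24


def speedup(baseline_hours, new_hours):
    """How many times faster the new run is than the baseline."""
    return baseline_hours / new_hours


baseline_hours = 7 * HOURS_PER_DAY  # existing standard: one week
goal_hours = 5 * HOURS_PER_DAY      # TGen's original target: five days
achieved_hours = 1                  # best reported result: one hour

print(speedup(baseline_hours, goal_hours))      # 1.4
print(speedup(baseline_hours, achieved_hours))  # 168.0
```

The original goal amounted to a 1.4X improvement. The one-hour result works out to 168X, which is consistent with the "over 160X" figure.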

Mellanox is proud to have accelerated the fight against pediatric cancer and helped bring precision medicine to a point where it can be clinically relevant.

Author: Brian Klaff is a Senior Technical Communicator at Mellanox. Prior to Mellanox, Brian served as Director of Technical & Marketing Communications for ExperTeam, Ltd. He has also spent time as a Technical Communications Manager at Amdocs Israel, as a Product Marketing Manager at Versaware Technologies, and as a consultant specializing in mobile telecommunications for Mercer Management Consulting. Brian holds a BA in Economics & Near Eastern Studies from Johns Hopkins University.