Fabric Collective Accelerator (FCA)


To meet the needs of scientific research and engineering simulations, supercomputers are growing at an unrelenting rate. As they scale from mere thousands to hundreds of thousands of processor cores, new performance and scalability challenges have emerged.

FCA is a Mellanox MPI-integrated software package that uses CORE-Direct technology to implement MPI collective communications. It can be used with all major commercial and open-source MPI solutions used for high-performance applications. FCA with CORE-Direct technology reduces the runtime of MPI collectives, increases the CPU availability to the application, and allows collective communication to overlap with computation. FCA provides an efficient collective communication flow optimized for the job and the topology. It also supports building runtime-configurable hierarchical collectives (HCOL) and applying multiple optimizations within a single collective algorithm.


  • Offload collective communication from MPI processes onto Mellanox interconnect hardware
  • Efficient collective communication flow optimized for the job and topology
  • Significantly reduce MPI collectives runtime
  • Native support for MPI-3
  • Blocking and nonblocking collectives
  • Hierarchical communication algorithms (HCOL)
  • Multiple optimizations within a single collective algorithm
  • Increase CPU availability and efficiency for increased application performance
  • Seamless integration with MPI libraries and job schedulers
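
The idea behind hierarchical collectives can be illustrated with a small simulation. The sketch below is not FCA code and does not use any FCA or MPI API. It is a plain-Python model, under simplifying assumptions, of a two-level allreduce (sum): ranks on the same node first combine their values locally, one leader per node exchanges the partial results, and the total is then distributed back. The function names, the `node_of_rank` mapping, and the message-counting model (reduce to a root followed by a broadcast) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def hierarchical_allreduce(values, node_of_rank):
    """Simulate a two-level allreduce (sum) over ranks grouped by node.

    values[r] is the contribution of rank r; node_of_rank[r] is a
    hypothetical node id for rank r. Returns the per-rank results and the
    number of inter-node messages under a reduce-then-broadcast model.
    """
    # Group ranks by the node they run on.
    ranks_on_node = defaultdict(list)
    for rank, node in enumerate(node_of_rank):
        ranks_on_node[node].append(rank)

    # Level 1: intra-node reduction to a per-node partial sum
    # (assumed to stay inside the node, so no inter-node traffic).
    partial = {
        node: sum(values[r] for r in ranks)
        for node, ranks in ranks_on_node.items()
    }

    # Level 2: node leaders combine partial sums. Modeled as a reduce to
    # one root leader plus a broadcast back: 2 * (nodes - 1) messages.
    total = sum(partial.values())
    inter_node_messages = 2 * (len(partial) - 1)

    # Level 3: each leader shares the total with the ranks on its node.
    results = [total] * len(values)
    return results, inter_node_messages


def flat_inter_node_messages(node_of_rank, root=0):
    """Inter-node messages for a flat reduce-to-root plus broadcast.

    Every rank that is not on the root's node sends one message to the
    root and receives one message back.
    """
    root_node = node_of_rank[root]
    return 2 * sum(1 for node in node_of_rank if node != root_node)


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8]          # one value per rank
    node_of_rank = [0, 0, 0, 0, 1, 1, 1, 1]    # 8 ranks on 2 nodes
    results, hier_msgs = hierarchical_allreduce(values, node_of_rank)
    print(results[0], hier_msgs, flat_inter_node_messages(node_of_rank))
```

In this toy model, eight ranks spread over two nodes need two inter-node messages with the hierarchical scheme and eight with the flat scheme. Real implementations use more elaborate algorithms and hardware offload, so the numbers only indicate why matching the communication flow to the topology can help.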