Performance of a LU decomposition on a multi-FPGA system compared to a low power commodity microprocessor system
2007; Volume: 8; Issue: 4; Language: English
DOI: 10.12694/scpe.v8i4.432
ISSN: 1895-1767
Authors: Thomas Häuser, Aravind Dasu, Arvind Sudarsanam, S. Young
Topic(s): Algorithms and Data Compression
Abstract: Lower/Upper triangular (LU) factorization plays an important role in scientific and high-performance computing. This paper presents an implementation of the LU decomposition algorithm for double-precision complex numbers on a star-topology-based multi-FPGA platform. The out-of-core implementation moves data through multiple levels of a hierarchical memory system (hard disk, DDR SDRAMs, and FPGA block RAMs) using completely pipelined data paths in all steps of the algorithm. Detailed performance numbers for all phases of the algorithm are presented and compared to a highly optimized implementation for a low-power microprocessor-based system. We also compare the performance per watt of the FPGA and the microprocessor system. Finally, we recommend improvements to the FPGA design that would increase the performance of the double-precision complex LU factorization on the FPGA-based system.
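For readers unfamiliar with the operation the abstract refers to, the following is a minimal sketch of LU factorization on complex numbers. It is not the paper's implementation: it is an in-memory, unblocked version in plain Python, and the use of partial pivoting as well as the names `lu_decompose` and `reconstruct` are assumptions made for illustration. Python's `complex` type stores two double-precision floats, which matches the double-precision complex data type named in the abstract.

```python
def lu_decompose(a):
    """Factor a square matrix (list of rows of complex numbers).

    Returns (perm, lu). lu holds L strictly below the diagonal (unit
    diagonal implied) and U on and above it. perm[i] is the index of
    the original row that ended up in row i.
    """
    n = len(a)
    lu = [[complex(x) for x in row] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting (an assumption of this sketch): choose the
        # row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                # Update the trailing submatrix.
                lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu


def reconstruct(perm, lu):
    """Multiply L by U; the result should equal the row-permuted input."""
    n = len(lu)
    out = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0j
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]  # unit diagonal of L
                s += l_ik * lu[k][j]
            out[i][j] = s
    return out
```

Calling `perm, lu = lu_decompose(a)` and then `reconstruct(perm, lu)` should reproduce the rows of `a` in the order given by `perm`, up to rounding error. The triple loop performs on the order of n³ complex multiply-add operations, which is the part of the computation that an out-of-core design has to stream through its memory hierarchy.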