Peer-reviewed article

Performance of a LU decomposition on a multi-FPGA system compared to a low power commodity microprocessor system

2007; Volume: 8; Issue: 4; Language: English

10.12694/scpe.v8i4.432

ISSN

1895-1767

Authors

Thomas Häuser, Aravind Dasu, Arvind Sudarsanam, S. Young

Topic(s)

Algorithms and Data Compression

Abstract

Lower/Upper triangular (LU) factorization plays an important role in scientific and high-performance computing. This paper presents an implementation of the LU decomposition algorithm for double-precision complex numbers on a star-topology-based multi-FPGA platform. The out-of-core implementation moves data through multiple levels of a hierarchical memory system (hard disk, DDR SDRAMs and FPGA block RAMs) using completely pipelined data paths in all steps of the algorithm. Detailed performance numbers for all phases of the algorithm are presented and compared to a highly optimized implementation for a low-power microprocessor-based system. We also compare the performance per watt of the FPGA and microprocessor systems. Finally, we recommend improvements to the FPGA design that would increase the performance of the double-precision complex LU factorization on the FPGA-based system.
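For readers unfamiliar with the operation being benchmarked, the following is a minimal in-core sketch of LU factorization on complex double-precision values, written in plain Python. It is not the paper's out-of-core, pipelined FPGA design. The function names `lu_decompose` and `matmul`, the use of partial (row) pivoting, and the sample matrix are all assumptions made for illustration; the abstract does not state which LU variant or pivoting strategy the authors used.

```python
def lu_decompose(a):
    """Factor a square complex matrix with partial pivoting.

    Returns (perm, l, u) such that row perm[i] of `a` equals row i of l*u,
    where l is unit lower triangular and u is upper triangular.
    Illustrative sketch only; names and pivoting choice are assumptions.
    """
    n = len(a)
    u = [[complex(x) for x in row] for row in a]  # working copy, becomes U
    l = [[0j] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if abs(u[p][k]) == 0:
            raise ValueError("matrix is singular")
        if p != k:
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]
            perm[k], perm[p] = perm[p], perm[k]
        l[k][k] = 1 + 0j
        # Eliminate column k below the diagonal.
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    return perm, l, u


def matmul(x, y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical 3x3 complex test matrix.
a = [[2 + 1j, 1, 0],
     [4, 3 - 1j, 1],
     [0, 1j, 2]]
perm, l, u = lu_decompose(a)
lu = matmul(l, u)
# Largest deviation between the permuted input and the product L*U.
max_err = max(abs(lu[i][j] - a[perm[i]][j])
              for i in range(3) for j in range(3))
```

The inner elimination loop is the part that dominates the arithmetic cost, which is presumably why a pipelined hardware data path is attractive for it; an out-of-core version would additionally process the matrix in blocks that fit each memory level.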
