Peer-reviewed article

Fine-grained parallelization of lattice QCD kernel routine on GPUs

2008; Elsevier BV; Volume: 68; Issue: 10; Language: English

10.1016/j.jpdc.2008.06.009

ISSN

1096-0848

Authors

Khaled Z. Ibrahim, François Bodin, O. Pène

Topic(s)

Particle physics theoretical and experimental studies

Abstract

Simulation time for the classical problem of Lattice Quantum Chromodynamics (Lattice QCD) is dominated by one kernel routine responsible for computing the actions of a Dirac operator. This paper describes an experience in parallelizing this kernel routine. We explore parallelization granularities for this kernel routine on Graphical Processing Units (GPUs). We show that fine-grained parallelism can outperform coarse-grained parallelization, provided that control-flow and communication effects are minimized. We propose two techniques for transforming control-flow-based code to control-free code. We also show how to reduce the communication effect by optimizing for commonly used sequences of calls to this routine. In our implementation on an NVIDIA 8800 GTX, we were able to achieve an 8.3x speedup over an SSE2-optimized version on a 2.8 GHz Intel Xeon CPU.
