Peer-reviewed article

Performance tuning of N-body codes on modern microprocessors: I. Direct integration with a hermite scheme on x86_64 architecture

2006; Elsevier BV; Volume: 12; Issue: 3; Language: English

10.1016/j.newast.2006.07.007

ISSN

1384-1092

Authors

Keigo Nitadori, Junichiro Makino, Piet Hut

Topic(s)

Gaussian Processes and Bayesian Inference

Abstract

The main performance bottleneck of gravitational N-body codes is the force calculation between two particles. We have succeeded in speeding up this pair-wise force calculation by factors between 2 and 10, depending on the code and the processor on which the code is run. These speed-ups were obtained by writing highly fine-tuned code for x86_64 microprocessors. Any existing N-body code, running on these chips, can easily incorporate our assembly code programs. In the current paper, we present an outline of our overall approach, which we illustrate with one specific example: the use of a Hermite scheme for a direct N² type integration on a single 2.0 GHz Athlon 64 processor, for which we obtain an effective performance of 4.05 Gflops, for double-precision accuracy. In subsequent papers, we will discuss other variations, including the combinations of N log N codes, single-precision implementations, and performance on other microprocessors.
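For readers unfamiliar with the kernel the abstract refers to, the following is a minimal, unoptimized sketch of the pair-wise calculation that a Hermite scheme needs in a direct N² summation: the acceleration and its time derivative (the jerk) on every particle. It is plain Python for illustration only and is not the authors' tuned x86_64 assembly code. The function name `acc_jerk`, the softening parameter `eps2`, and the choice of units with G = 1 are assumptions made here, not details taken from the record.

```python
import math

def acc_jerk(pos, vel, mass, eps2=0.0):
    """Direct-summation acceleration and jerk on each particle (G = 1).

    pos, vel: lists of [x, y, z]; mass: list of floats.
    eps2 is an assumed softening length squared (0.0 means unsoftened).
    """
    n = len(mass)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    jerk = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            # Relative position and velocity of particle j seen from i.
            dr = [pos[j][k] - pos[i][k] for k in range(3)]
            dv = [vel[j][k] - vel[i][k] for k in range(3)]
            r2 = sum(x * x for x in dr) + eps2
            rv = sum(x * y for x, y in zip(dr, dv))
            rinv3 = 1.0 / (r2 * math.sqrt(r2))
            alpha = 3.0 * rv / r2
            for k in range(3):
                # a_i += m_j * dr / r^3
                acc[i][k] += mass[j] * rinv3 * dr[k]
                # j_i += m_j * (dv / r^3 - 3 (dr.dv) dr / r^5)
                jerk[i][k] += mass[j] * rinv3 * (dv[k] - alpha * dr[k])
    return acc, jerk

# Two unit masses one length unit apart, the second moving sideways.
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mass = [1.0, 1.0]
acc, jerk = acc_jerk(pos, vel, mass)
```

The inner loop body is the pair-wise interaction whose cost dominates the run time. It executes N(N-1) times per full force evaluation, which is why the abstract singles it out as the target for tuning.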
