Scalable Detection of MPI-2 Remote Memory Access Inefficiency Patterns
2009; Springer Science+Business Media; Language: English
10.1007/978-3-642-03770-2_10
ISSN: 1611-3349
Authors: Marc-André Hermanns, Markus Geimer, Bernd Mohr, Felix Wolf
Topic(s): Distributed systems and fault tolerance
Abstract: Wait states in parallel applications can be identified by scanning event traces for characteristic patterns. In our earlier work, we have defined such patterns for MPI-2 one-sided communication, although still based on a trace-analysis scheme with limited scalability. Taking advantage of a new scalable trace-analysis approach based on a parallel replay, which was originally developed for MPI-1 point-to-point and collective communication, we show how wait states in one-sided communications can be detected in a more scalable fashion. We demonstrate the scalability of our method and its usefulness for the optimization cycle with applications running on up to 8,192 cores.