Remote OpenMP Offloading
2022; Springer Science+Business Media; Language: English
DOI: 10.1007/978-3-031-07312-0_16
ISSN: 1611-3349
Authors: Atmn Patel, Johannes Doerfert
Topic(s): Advanced Data Storage Technologies
Abstract: OpenMP has a long and successful history in parallel programming for CPUs. Since the introduction of accelerator offloading, it has evolved into a promising candidate for all intra-node parallel computing needs. While this addition broke with the shared-memory assumption OpenMP was initially developed with, efforts to employ OpenMP beyond shared-memory domains are practically non-existent. In this work, we show that the OpenMP accelerator offloading model is sufficient to seamlessly and efficiently utilize more than a single compute node and its connected accelerators. Without source code or compiler modifications, we run an OpenMP offload-capable program on a remote CPU or a remote accelerator (e.g., a GPU) as if it were a local one. For applications that support multi-device offloading, any combination of local and remote CPUs and accelerators can be utilized simultaneously, fully transparently to the user. Our low-overhead implementation of Remote OpenMP Offloading is integrated into the LLVM/OpenMP compiler infrastructure and publicly available (in parts) with LLVM 12 and later. LLVM-based (vendor) compilers are expected to be compatible as well. To evaluate our work, we provide detailed studies on microbenchmarks, as well as scaling results on two HPC proxy applications. We show scaling results across dozens of GPUs in multiple hosts, with effectiveness that is directly proportional to the ratio of computation time to memory transfer time. Our work outlines the capabilities and limits of OpenMP 5.1 to efficiently utilize a distributed heterogeneous system without source, compiler, or language modifications, as opposed to solutions such as MPI.
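To make the notion of "multi-device offloading" concrete, the following is a minimal sketch in C of the kind of program the abstract describes. It is not taken from the paper. The function names `visible_devices` and `scale_on_all_devices` are hypothetical, and the sketch assumes that, when remote offloading is enabled, the runtime reports remote devices through `omp_get_num_devices()` alongside local ones, so the source needs no changes.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of offload devices the runtime reports. Assumption: with remote
   offloading enabled, remote devices are counted here like local ones. */
static int visible_devices(void) {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0; /* compiled without OpenMP: no devices */
#endif
}

/* Hypothetical helper: scale x[0..n) by a, giving each visible device
   one contiguous chunk of the array. */
void scale_on_all_devices(double *x, size_t n, double a) {
    assert(x != NULL || n == 0);

    int ndev = visible_devices();
    if (ndev <= 0) {
        /* Host fallback when no offload device is available. */
        for (size_t i = 0; i < n; ++i)
            x[i] *= a;
        return;
    }

    size_t chunk = (n + (size_t)ndev - 1) / (size_t)ndev; /* ceil(n / ndev) */
    for (int d = 0; d < ndev; ++d) {
        size_t lo = (size_t)d * chunk;
        if (lo >= n)
            break;
        size_t len = (n - lo < chunk) ? (n - lo) : chunk;
        double *p = x + lo;

        /* device(d) selects the target; nowait turns the region into a
           deferred task so the chunks can run on all devices concurrently. */
        #pragma omp target teams distribute parallel for \
            device(d) map(tofrom: p[0:len]) nowait
        for (size_t i = 0; i < len; ++i)
            p[i] *= a;
    }

    /* Wait for every deferred target task launched above. */
    #pragma omp taskwait
}
```

Each chunk is copied to its device and back by the `map(tofrom: ...)` clause, so the loop body has to do enough work per transferred byte to pay for that traffic. This matches the abstract's observation that scaling depends on the ratio of computation time to memory transfer time.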