Journal article, Future Generation Computer Systems, Year: 2023

Asynchronous multi-phase task-based applications: Employing different nodes to design better distributions

Abstract

HPC infrastructures often present both intra-node heterogeneity (multi-core CPUs and multiple GPUs) and system-level heterogeneity (different nodes arranged into partitions). HPC applications with several phases, each with distinct resource requirements, can exploit the task-based programming paradigm to overlap phases and take advantage of inter-node heterogeneity to improve performance, provided their workload is correctly distributed. We study two applications with these characteristics: ExaGeoStat, a machine learning framework for geostatistics data, and Diodon, a multivariate data analysis library. We show how to (1) organize the application to improve runtime and scheduling decisions that affect the overlap of the asynchronous phases, with performance gains between 31% and 46% in ExaGeoStat and between 29% and 40% in Diodon when running on homogeneous nodes; and (2) create a per-phase distribution over heterogeneous nodes that considers overlap and reduces redistribution overhead, improving performance by up to 69% in ExaGeoStat and 73% in Diodon compared to a block-cyclic distribution, thereby taming the diversity in supercomputers.
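For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a standard 2D block-cyclic tile-to-node mapping, next to a naive speed-proportional split that hints at why heterogeneous nodes call for a different distribution. It is not the authors' method: the function names and the `speeds` parameter are hypothetical, and the sketch models neither phase overlap nor redistribution overhead, both of which the paper's distributions take into account.

```python
from collections import Counter


def block_cyclic_owner(i, j, p, q):
    """Return the rank that owns tile (i, j) on a p x q process grid."""
    return (i % p) * q + (j % q)


def tiles_per_node(n_tiles, p, q):
    """Count the tiles of an n_tiles x n_tiles matrix owned by each rank."""
    counts = Counter()
    for i in range(n_tiles):
        for j in range(n_tiles):
            counts[block_cyclic_owner(i, j, p, q)] += 1
    return dict(counts)


def proportional_shares(total_tiles, speeds):
    """Hypothetical helper: split tiles in proportion to relative node speeds."""
    total_speed = sum(speeds)
    shares = [total_tiles * s // total_speed for s in speeds]
    # Integer division can leave a few tiles unassigned; give them to the
    # fastest nodes first.
    leftover = total_tiles - sum(shares)
    fastest_first = sorted(range(len(speeds)), key=lambda k: -speeds[k])
    for idx in fastest_first[:leftover]:
        shares[idx] += 1
    return shares


if __name__ == "__main__":
    # Block-cyclic gives every rank the same load, whatever its speed.
    print(tiles_per_node(4, 2, 2))
    # A speed-aware split gives faster nodes more tiles (assumed speeds).
    print(proportional_shares(16, [1, 1, 3, 3]))
```

Block-cyclic assigns the same number of tiles to every rank, which balances load only when all nodes are equally fast; a speed-aware split shifts work toward faster nodes.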

Main file
FGCS.pdf (5.93 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04695275, version 1 (12-09-2024)

Cite

Lucas Leandro Nesi, Arnaud Legrand, Lucas Mello Schnorr. Asynchronous multi-phase task-based applications: Employing different nodes to design better distributions. Future Generation Computer Systems, 2023, 147, pp.119-135. ⟨10.1016/j.future.2023.05.005⟩. ⟨hal-04695275⟩