PCERE: Fine-grained Parallel Benchmark Decomposition for Scalability Prediction

Mihail Popov; Chadi Akel; Florent Conti; William Jalby; Pablo de Oliveira Castro

doi:10.1109/IPDPS.2015.19

Conference paper, Year: 2015


Mihail Popov

Role: Author

Université de Versailles Saint-Quentin-en-Yvelines

Chadi Akel

Role: Author

Université de Versailles Saint-Quentin-en-Yvelines

Florent Conti

Role: Author

William Jalby

Role: Author
PersonId : 964206

Université de Versailles Saint-Quentin-en-Yvelines

Pablo de Oliveira Castro

Role: Author
PersonId : 11170
IdHAL : pablooliveira
ORCID : 0000-0001-9007-6145
IdRef : 150785445

Université de Versailles Saint-Quentin-en-Yvelines

Laboratoire d'Informatique Parallélisme Réseaux Algorithmes Distribués

Abstract

Evaluating the strong scalability of OpenMP applications is a costly and time-consuming process. It traditionally requires executing the whole application multiple times with different numbers of threads. We propose the Parallel Codelet Extractor and REplayer (PCERE), a tool to reduce the cost of scalability evaluation. PCERE decomposes applications into small pieces called codelets: each codelet maps to an OpenMP parallel region and can be replayed as a standalone program. To accelerate scalability prediction, PCERE replays codelets while varying the number of threads. The prediction speedup comes from two key ideas. First, the number of invocations during replay can be significantly reduced: invocations that have the same performance are grouped together and a single representative is replayed. Second, the sequential parts of the program do not need to be replayed for each thread configuration. PCERE codelets can be captured once and replayed accurately on multiple architectures, enabling cross-architecture parallel performance prediction. We evaluate PCERE on a C version of the NAS 3.0 Parallel Benchmarks (NPB). We achieve an average speedup of 25 times when evaluating the scalability of OpenMP applications, with an average error of 4.9% (median error of 1.7%).
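The two ideas in the abstract can be illustrated with a minimal sketch. This is not PCERE's implementation: the grouping criterion (a relative tolerance on measured invocation times), the function names `group_invocations` and `predict_total`, and the `replay_time` callback are all assumptions introduced here for illustration.

```python
def group_invocations(times, tolerance=0.05):
    """Group invocations with similar measured times (assumed criterion).

    Returns a list of (representative_index, member_indices) pairs. An
    invocation joins the current group when its time is within `tolerance`
    (relative) of that group's representative, which is the fastest member.
    """
    groups = []
    for i in sorted(range(len(times)), key=lambda k: times[k]):
        if groups and times[i] <= times[groups[-1][0]] * (1 + tolerance):
            groups[-1][1].append(i)
        else:
            groups.append((i, [i]))
    return groups


def predict_total(groups, replay_time, sequential_time):
    """Estimate the run time for one thread configuration.

    Only one representative per group is replayed; its time is weighted by
    the group size. The sequential time is measured once and reused.
    """
    parallel = sum(len(members) * replay_time(rep) for rep, members in groups)
    return sequential_time + parallel


# Hypothetical per-invocation times (seconds) of one parallel region.
times = [1.0, 1.02, 5.0, 5.1, 1.01]
groups = group_invocations(times)

# Stand-in for an actual replay: assumes ideal scaling on 4 threads.
estimate = predict_total(groups, lambda rep: times[rep] / 4, sequential_time=2.0)
```

In this toy run, five invocations collapse into two groups, so only two replays are needed per thread count instead of five, and the sequential part is never re-executed.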

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]; Operating Systems [cs.OS]


https://hal.uvsq.fr/hal-01417304

Submitted on: Thursday, December 15, 2016, 15:09:08

Last modified on: Thursday, December 21, 2023, 11:56:04

Dates and versions

hal-01417304, version 1 (15-12-2016)

Identifiers

HAL Id: hal-01417304, version 1
DOI: 10.1109/IPDPS.2015.19

Cite

Mihail Popov, Chadi Akel, Florent Conti, William Jalby, Pablo de Oliveira Castro. PCERE: Fine-grained Parallel Benchmark Decomposition for Scalability Prediction. 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2015, Hyderabad, India. ⟨10.1109/IPDPS.2015.19⟩. ⟨hal-01417304⟩


Collections

UVSQ UNIV-PARIS-SACLAY LI-PARAD GS-ENGINEERING GS-COMPUTER-SCIENCE
