Piecewise Holistic Autotuning of Parallel Programs with CERE

The complexity of current architectures requires fine tuning of compiler and runtime parameters to achieve the best performance. Autotuning substantially improves on default parameters in many scenarios, but it is a costly process requiring long iterative evaluations. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program. Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations with the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover, unlike whole-program tuning, CERE customizes the set of best parameters for each specific OpenMP region or loop. We demonstrate CERE tuning of compiler optimizations, number of threads, thread affinity, and scheduling policy on both NUMA and heterogeneous architectures. Over the NAS benchmarks, we achieve an average speedup of 1.08x after tuning. Tuning a codelet is 13x cheaper than whole-program evaluation and predicts the tuning impact with 94.7% accuracy. Similarly, exploring thread configurations and scheduling policies for a Black-Scholes solver on a heterogeneous big.LITTLE architecture is over 40x faster using CERE.
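
To illustrate why per-region tuning can beat a single whole-program configuration, here is a minimal sketch in Python. It is not CERE's implementation or interface: the codelet names, configuration labels, and replay times are hypothetical values chosen for illustration, and the sketch assumes that each codelet's replay time is independent of the configuration chosen for the others.

```python
# Hypothetical replay times in seconds, per codelet and per configuration.
# Illustrative numbers only; they are not measurements from the paper.
replay_times = {
    "loop_a":   {"O2/4 threads": 4.0, "O3/4 threads": 3.0, "O3/8 threads": 3.5},
    "region_b": {"O2/4 threads": 2.0, "O3/4 threads": 2.5, "O3/8 threads": 1.5},
    "loop_c":   {"O2/4 threads": 1.0, "O3/4 threads": 1.2, "O3/8 threads": 1.4},
}


def best_global_config(times):
    """Whole-program tuning: one configuration shared by every codelet."""
    configs = list(next(iter(times.values())).keys())
    totals = {c: sum(per_codelet[c] for per_codelet in times.values())
              for c in configs}
    best = min(totals, key=totals.get)
    return best, totals[best]


def best_piecewise_configs(times):
    """Piecewise tuning: each codelet keeps its own fastest configuration."""
    choice = {name: min(per_codelet, key=per_codelet.get)
              for name, per_codelet in times.items()}
    total = sum(times[name][cfg] for name, cfg in choice.items())
    return choice, total


global_cfg, global_total = best_global_config(replay_times)
piecewise_cfg, piecewise_total = best_piecewise_configs(replay_times)
print("whole-program:", global_cfg, global_total)
print("piecewise:", piecewise_cfg, piecewise_total)
```

Under the independence assumption, the piecewise total can never exceed the whole-program total, since each codelet is free to pick the globally best configuration if it is also its own best.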

Domains

Computer Science [cs]

Main file

2017_CERE_tuning_Concurrency_and_Computation__Practice_and_Experience (1).pdf (847.97 KB)

Origin: Files produced by the author(s)

https://hal.uvsq.fr/hal-01542912

Submitted on: Tuesday, November 28, 2017, 18:19:27

Last modified on: Thursday, December 21, 2023, 11:56:04

Dates and versions

hal-01542912 , version 1 (27-06-2017)

hal-01542912 , version 2 (28-11-2017)

Identifiers

HAL Id: hal-01542912, version 2
DOI : 10.1002/cpe.4190

Cite

Mihail Popov, Chadi Akel, Yohan Chatelain, William Jalby, Pablo de Oliveira Castro. Piecewise Holistic Autotuning of Parallel Programs with CERE. Concurrency and Computation: Practice and Experience, 2017, 29 (15), pp.e4190. ⟨10.1002/cpe.4190⟩. ⟨hal-01542912v2⟩

Collections

UVSQ UNIV-PARIS-SACLAY LI-PARAD GS-ENGINEERING GS-COMPUTER-SCIENCE

160 Views

419 Downloads