Piecewise Holistic Autotuning of Compiler and Runtime Parameters - Université de Versailles Saint-Quentin-en-Yvelines
Conference Papers, Year: 2016

Piecewise Holistic Autotuning of Compiler and Runtime Parameters

Abstract

Current architecture complexity requires fine tuning of compiler and runtime parameters to achieve full potential performance. Autotuning substantially improves on default parameters in many scenarios, but it is a costly process requiring a long iterative evaluation. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program. Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations with the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover, unlike whole-program tuning, CERE customizes the set of best parameters for each specific OpenMP region or loop. We demonstrate CERE tuning of compiler optimizations, number of threads, and thread affinity on a NUMA architecture. On average over the NAS 3.0 benchmarks, we achieve a speedup of 1.08x after tuning. Tuning a single codelet is 13x cheaper than whole-program evaluation and estimates the tuning impact on the original region with 94.7% accuracy. On a Reverse Time Migration (RTM) proto-application, we achieve a 1.11x speedup with a 200x cheaper exploration.
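
The contrast the abstract draws between whole-program tuning and per-region tuning can be illustrated with a minimal Python sketch. It is not CERE's interface or the authors' implementation: the function names, the configuration tuples (compiler flag, thread count), and the timings in `toy` are all hypothetical values invented for illustration, and none of the numbers come from the paper. The sketch assumes that each codelet has already been replayed under every candidate configuration and that the measured times are available in a dictionary.

```python
from typing import Dict, Tuple

Config = Tuple[str, int]                   # (compiler flag, thread count), hypothetical
Timings = Dict[str, Dict[Config, float]]   # codelet name -> config -> replay time (s)


def best_global_config(timings: Timings) -> Config:
    """Whole-program style: pick the single config minimizing the summed time."""
    configs = next(iter(timings.values())).keys()
    return min(configs, key=lambda c: sum(t[c] for t in timings.values()))


def best_piecewise_configs(timings: Timings) -> Dict[str, Config]:
    """Piecewise style: pick the fastest config independently for each codelet."""
    return {name: min(t, key=t.get) for name, t in timings.items()}


def total_time(timings: Timings, choice: Dict[str, Config]) -> float:
    """Sum the time of every codelet under its chosen config."""
    return sum(timings[name][choice[name]] for name in timings)


# Invented timings for two codelets; NOT measurements from the paper.
toy: Timings = {
    "loop_a":   {("-O2", 8): 4.0, ("-O3", 8): 3.0, ("-O3", 16): 3.5},
    "region_b": {("-O2", 8): 2.0, ("-O3", 8): 2.5, ("-O3", 16): 1.5},
}

global_cfg = best_global_config(toy)
piecewise = best_piecewise_configs(toy)
global_total = total_time(toy, {name: global_cfg for name in toy})
piecewise_total = total_time(toy, piecewise)
print(global_cfg, global_total)
print(piecewise, piecewise_total)
```

With these invented timings, the per-codelet choice is never slower than the single global choice, because each codelet independently takes its own minimum. The sketch ignores interactions between regions and the cost of switching configurations at runtime, both of which a real autotuner would have to account for.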

Dates and versions

hal-01417211 , version 1 (15-12-2016)

Cite

Mihail Popov, Chadi Akel, William Jalby, Pablo de Oliveira Castro. Piecewise Holistic Autotuning of Compiler and Runtime Parameters. Euro-Par 2016 Parallel Processing - 22nd International Conference, Aug 2016, Grenoble, France. ⟨10.1007/978-3-319-43659-3_18⟩. ⟨hal-01417211⟩