Piecewise Holistic Autotuning of Compiler and Runtime Parameters, Euro-Par 2016: Parallel Processing, 22nd International Conference, pp.238-250, 2016.
DOI : 10.1145/1755888.1755903
URL : https://hal.archives-ouvertes.fr/hal-01417211
LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization (CGO 2004), pp.75-86, 2004.
Iterative compilation in program optimization, Proc. CPC10 (Compilers for Parallel Computers), pp.35-44, 2000.
Performance evaluation and analysis of thread pinning strategies on multi-core platforms: Case study of SPEC OMP applications on Intel architectures, 2011 International Conference on High Performance Computing & Simulation, pp.273-279, 2011.
DOI : 10.1109/HPCSim.2011.5999834
URL : https://hal.archives-ouvertes.fr/inria-00636845
Adagio, Proceedings of the 23rd International Conference on Supercomputing, ICS '09, pp.460-469, 2009.
DOI : 10.1145/1542275.1542340
Compiler optimization-space exploration, International Symposium on Code Generation and Optimization (CGO 2003), pp.204-215, 2003.
DOI : 10.1109/CGO.2003.1191546
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.131.1622
Acovea: Analysis of compiler options via evolutionary algorithm, 2007.
Optimizing for reduced code space using genetic algorithms, ACM SIGPLAN Notices, vol.34, issue.7, pp.1-9, 1999.
DOI : 10.1145/315253.314414
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.1586
COLE, Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '08, pp.165-174, 2008.
DOI : 10.1145/1356058.1356080
Adaptive sampling for performance characterization of application kernels, Concurrency and Computation: Practice and Experience, vol.1, issue.1, pp.2345-2362, 2013.
DOI : 10.1109/SC.2010.2
URL : https://hal.archives-ouvertes.fr/hal-00952288
Milepost GCC: Machine Learning Enabled Self-tuning Compiler, International Journal of Parallel Programming, vol.16, issue.2-3, pp.296-327, 2011.
DOI : 10.1088/1742-6596/16/1/071
URL : https://hal.archives-ouvertes.fr/hal-00685276
CERE: LLVM-based codelet extractor and replayer for piecewise benchmarking and optimization, ACM Transactions on Architecture and Code Optimization (TACO), vol.12, issue.1, article 6, 2015.
PCERE: Fine-Grained Parallel Benchmark Decomposition for Scalability Prediction, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.1151-1160, 2015.
DOI : 10.1109/IPDPS.2015.19
URL : https://hal.archives-ouvertes.fr/hal-01417304
A comparison of trace-sampling techniques for multi-megabyte caches, IEEE Transactions on Computers, vol.43, issue.6, pp.664-675, 1994.
DOI : 10.1109/12.286300
Finding groups in data: an introduction to cluster analysis, 2009.
DOI : 10.1002/9780470316801
Using Machine Learning to Focus Iterative Optimization, International Symposium on Code Generation and Optimization (CGO'06), pp.295-305, 2006.
DOI : 10.1109/CGO.2006.37
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.112.2976
CQA: A code quality analyzer tool at binary level, 2014 21st International Conference on High Performance Computing (HiPC), pp.1-10, 2014.
DOI : 10.1109/HiPC.2014.7116904
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010.
DOI : 10.1109/ICPPW.2010.38
URL : http://arxiv.org/abs/1004.4431
Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, vol.58, issue.301, pp.236-244, 1963.
DOI : 10.1080/01621459.1963.10500845
Who belongs in the family?, Psychometrika, vol.18, issue.4, pp.267-276, 1953.
DOI : 10.1007/BF02289263
Fine-grained Benchmark Subsetting for System Selection, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, p.132, 2014.
DOI : 10.1145/2581122.2544144
URL : https://hal.archives-ouvertes.fr/hal-00952256
Evaluating iterative optimization across 1000 data sets, Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI'10), 2010.
DOI : 10.1145/1809028.1806647
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.188.4481
Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes, IEEE Transactions on Parallel and Distributed Systems, vol.19, issue.10, pp.1396-1410, 2008.
DOI : 10.1109/TPDS.2007.70804
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.6395
The NAS parallel benchmarks: summary and preliminary results, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, Supercomputing '91, pp.158-165, 1991.
DOI : 10.1145/125826.125925
NAS 3.0 C OpenMP. http://benchmark-subsetting.github.io/cNPB
Baysal, E.: Reverse time migration, Geophysics, vol.48, issue.11, p.1514, 1983.
Clustering-based selection for the exploration of compiler optimization sequences, ACM Transactions on Architecture and Code Optimization (TACO), vol.13, issue.8, 2016.
COBAYN, ACM Transactions on Architecture and Code Optimization, vol.13, issue.2, 2016.
DOI : 10.1002/j.1556-6678.2002.tb00167.x
In search of near-optimal optimization phase orderings, ACM SIGPLAN Notices, vol.41, issue.7, pp.83-92, 2006.
DOI : 10.1145/1159974.1134663
ACME, ACM SIGPLAN Notices, vol.40, issue.7, pp.69-77, 2005.
DOI : 10.1145/1070891.1065921
Finding good optimization sequences covering program space, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, p.56, 2013.
DOI : 10.1145/2400682.2400715
Exploiting program microarchitecture independent characteristics and phase behavior for reduced benchmark suite simulation, Proceedings of the IEEE International Workload Characterization Symposium, pp.2-12, 2005.
DOI : 10.1109/IISWC.2005.1525996
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.217.3636
Basic block distribution analysis to find periodic behavior and simulation points in applications, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques, pp.3-14, 2001.
DOI : 10.1109/PACT.2001.953283
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.7813
BarrierPoint: Sampled simulation of multi-threaded applications, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2014.
DOI : 10.1109/ISPASS.2014.6844456
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations, pp.34-53, 2007.
DOI : 10.1109/SC.1998.10004
URL : https://hal.archives-ouvertes.fr/inria-00084110
Improving both the performance benefits and speed of optimization phase sequence searches, ACM SIGPLAN Notices, vol.45, issue.4, pp.95-104, 2010.
DOI : 10.1145/1755951.1755903
Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization, Languages and Compilers for Parallel Computing, pp.308-322, 2010.
DOI : 10.1007/978-3-642-13374-9_21
Is Source-Code Isolation Viable for Performance Characterization?, 2013 42nd International Conference on Parallel Processing, 2013.
DOI : 10.1109/ICPP.2013.116
URL : https://hal.archives-ouvertes.fr/hal-00952290