|
From: | Michael Richter |
Subject: | [Tinycc-devel] Implementing a Global RegisterAllocator for TCC |
Date: | Sat, 6 Aug 2022 08:35:18 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 |
Did anyone take note of this diplome thesis? It in german, but I
guess there are some germanspekaing here in the list. Author
claims 30% speed up generated code...
Thesis is from 2014, "Implementing a Global RegisterAllocator for
TCC", Sebastian Falbesoner TU Wien (Vienna)
https://www.complang.tuwien.ac.at/Diplomarbeiten/falbesoner14.pdf
Michael Richter
Complete abstract:
Register allocation is a long-standing research topic of computer
science that has been studied extensively
over the last decades. Its goal is to map a theoretically
infinite number of program variables
onto a finite, small set of CPU registers during compilation.
Eventhough caches try to bridge the
gap between register and memory access time, keeping as many values in registers as long as possible is
crucial for good performance. Hence registerallocation is still considered to be one of the most
important compiler optimizations.
The present diploma
thesis describes the process of implementing a register
allocatorfor TCC, a small single-pass C
compiler written in C. While TCC is very fast (up to amagnitude faster than gcc -O0 for the x86 architecture), it produces code that is
quite inefficient – some naive kind of
register allocation is done only on the basis of statements. Our goal of implementing register allocation done
on the basis of whole functions, that is,global register allocation, should
hence result in a notable performance increase.
As prerequesite for register allocation in TCC, a proper IR (Intermediate Representation) needs to be generated in a first pass. On top of this internal representation, live variable analysis, register allocation and finally the code generation is then performed. We determine the variable live intervals with an extraordinarily simple approach that avoidsthe usual costly data-flow analysis techniques – it trades off the accuracy of the calculated intervals for higher execution speed. For the register allocation we use Linear Scan, a strategy that doesn’t abstract the problem to graph coloring, but rather takes a simple approach involving the linear traversal of variable live intervals. While linear scan is very fast, it is stated that the generated code is almost as efficient as with graph coloring, making it a popular choice for JIT compilers. As target machine, ARM, a classical RISC architecture, now widespread especially inmobile phones and in many other embedded systems, is chosen. Since data processing operands must not reside in memory in thisLoad/Store architecture, register allocation iseven more important. We determine the performance gain with various test applications, predominantly fromthe benchmark suiteMiBench. Additionally, we compare compile-time as well as run-timeperformance with the widespread compilergcc.The execution time speedup of code generated by our implementation is about 32% onaverage (compared to the original TCC), while the compile-time is increased only marginallyand is still an order of magnitude lower than forgccwithout any optimization. Conse-quently, our register allocator implementation is an attractive trade-off for dynamic codegeneration with TCC.
-- ______________________________________________________________ Aktuelle Lichtmaschinenkunst http://mir52.wordpress.com
[Prev in Thread] | Current Thread | [Next in Thread] |