tinycc-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Tinycc-devel] Implementing a Global RegisterAllocator for TCC


From: Michael Richter
Subject: [Tinycc-devel] Implementing a Global RegisterAllocator for TCC
Date: Sat, 6 Aug 2022 08:35:18 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

Did anyone take note of this diplome thesis? It in german, but I guess there are some germanspekaing here in the list. Author claims 30% speed up generated code...

Thesis is from 2014, "Implementing a Global RegisterAllocator for TCC", Sebastian Falbesoner TU Wien (Vienna)

https://www.complang.tuwien.ac.at/Diplomarbeiten/falbesoner14.pdf

Michael Richter


Complete abstract:


Register allocation is a long-standing research topic of computer science that has been studied extensively over the last decades. Its goal is to map a theoretically infinite number of program variables onto a finite, small set of CPU registers during compilation. Eventhough caches try to bridge the gap between register and memory access time, keeping as many values in registers as long as possible is crucial for good performance. Hence registerallocation is still considered to be one of the most important compiler optimizations.

The present diploma thesis describes the process of implementing a register allocatorfor TCC, a small single-pass C compiler written in C. While TCC is very fast (up to amagnitude faster than gcc -O0 for the x86 architecture), it produces code that is quite inefficient – some naive kind of register allocation is done only on the basis of statements. Our goal of implementing register allocation done on the basis of whole functions, that is,global register allocation, should hence result in a notable performance increase.

As prerequesite for register allocation in TCC, a proper IR (Intermediate Representation) needs to be generated in a first pass. On top of this internal representation, live variable analysis, register allocation and finally the code generation is then performed. We determine the variable live intervals with an extraordinarily simple approach that avoidsthe usual costly data-flow analysis techniques – it trades off the accuracy of the calculated intervals for higher execution speed. For the register allocation we use Linear Scan, a strategy that doesn’t abstract the problem to graph coloring, but rather takes a simple approach involving the linear traversal of variable live intervals. While linear scan is very fast, it is stated that the generated code is almost as efficient as with graph coloring, making it a popular choice for JIT compilers. As target machine, ARM, a classical RISC architecture, now widespread especially inmobile phones and in many other embedded systems, is chosen. Since data processing operands must not reside in memory in thisLoad/Store architecture, register allocation iseven more important. We determine the performance gain with various test applications, predominantly fromthe benchmark suiteMiBench. Additionally, we compare compile-time as well as run-timeperformance with the widespread compilergcc.The execution time speedup of code generated by our implementation is about 32% onaverage (compared to the original TCC), while the compile-time is increased only marginallyand is still an order of magnitude lower than forgccwithout any optimization. Conse-quently, our register allocator implementation is an attractive trade-off for dynamic codegeneration with TCC.


--
______________________________________________________________
Aktuelle Lichtmaschinenkunst

http://mir52.wordpress.com

reply via email to

[Prev in Thread] Current Thread [Next in Thread]