
Re: [Tinycc-devel] Speed of development of a compiler.


From: Basile Starynkevitch
Subject: Re: [Tinycc-devel] Speed of development of a compiler.
Date: Tue, 24 Nov 2015 19:30:19 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.3.0

On 11/24/2015 05:09 PM, AlexandreFressange wrote:
> Hello,
>
> I saw the dates on the tcc page and wonder how much time it *realistically*
> take to create a compiler supporting one simple language like C (but not C) and
> two architectures (x86_64 and arm).

You should tell us much more, and more precisely: what exactly is the language you want to write your compiler for, and for which architectures and operating systems (both host & target)?


> The optimizing part is obviously the biggest "issue" (<-> skills). I hack
> kernels and have a pretty good understanding of the optimizations out there and low level
> stuffs. As well as readings on the compiler optimization subject.

You should say what kind of optimization you want. I blindly guess that you want performance similar to that of the code produced by gcc -O1. You should also tell us a lot more about your skills (what programs have you written, what studies have you done, what programming languages and operating systems do you know very well, what forums do you participate in, ...). Remember that programming is hard, and that everyone needs at least ten years to learn it; see http://norvig.com/21-days.html


> There isn't one answer to this question, really. I basically need your
> experience/opinion on this. From insiders.


It depends upon your skills, your objectives, and to a lesser extent the tools or languages you are using. But I imagine you'll need (assuming you have a small team of 3 to 5 people working full time with you) several years (more than 5, less than 15) to get a (nearly) C99-compliant compiler able to produce, for x86-64/Linux (e.g. most PCs running some recent Debian distribution) and ARM/Linux (e.g. a Raspberry Pi running some Debian), object code about as efficient as what GCC 5 produces with -O1.
This is still a guess, but a bit of an educated one.

Look for example into CompCert, http://compcert.inria.fr/compcert-C.html ; it is not free software, but the source code is available for academic usage. Xavier Leroy is probably the brightest active computer scientist that I have met (and worked with) in person. AFAIK he has been working on CompCert for 8 to 10 years (and there are also other bright people and top-class research scientists working on it). Of course, he is also teaching, publishing papers, and advising PhD students.

Look also into TinyCC and http://nwcc.sourceforge.net/ ; they probably don't support all of C99. They surely produce slower code than GCC does with -O1 (probably object code that is pathetically slow, by more than a factor of 3 w.r.t. `gcc -O1`). And both have been developed over several years (albeit by a single person initially).

I am working on MELT, see http://gcc-melt.org/ for more. It is mostly a simple domain specific language (and GPLv3 free software) to hack and customize GCC, in principle not a big deal. But I have been working on it nearly full time (mostly alone, with some very minor outside contributions) since 2009. Look notably into the documentation available on http://gcc-melt.org/docum.html since I have hundreds of slides and many links there relevant to compilers. Please read some of them; they will be useful to you (in particular to explain that a compiler is not mostly parsing).


If you care about designing a programming language which can have some (compiled implementation providing an) ABI compatible with C, and if you have (as I do) more fun in designing the language than in coding a compiler ex nihilo for it, do yourself a favor: base your work on existing compilers. You could generate C code (many compilers are doing that, see http://programmers.stackexchange.com/a/273895/40065 & http://programmers.stackexchange.com/a/257873/40065 etc.); you could use GCCJIT https://gcc.gnu.org/onlinedocs/jit/ or LLVM http://llvm.org/ or even some simpler JIT library like libjit. You could use or generate Common Lisp with SBCL, see http://sbcl.org/ ; you could generate Java or JVM bytecode. You could generate Ocaml or D or Go code. But don't waste your time on low-level optimizations; instead, leverage existing compilers or libraries which are doing that, and focus on the programming language and the front-end (the backend being the existing tool: a C compiler if you generate C, an Ocaml compiler if you generate Ocaml, or GCCJIT or LLVM or libjit, etc.). IMHO generating C++ is generally not worth the effort (unless you have to).
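
To illustrate the "generate C code" route, here is a minimal sketch of a toy front-end emitting C source, which an existing C compiler would then optimize. It is written in Python only for brevity; the tuple-based AST and the names emit_expr and emit_function are hypothetical, made up for this illustration.

```python
# Hypothetical toy front-end: the tuple-based AST and both helper functions
# are illustrative assumptions, not part of any real compiler.

def emit_expr(node):
    """Translate a tiny expression AST into a C expression string."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "var":
        return node[1]
    if kind in ("+", "-", "*"):
        # Parenthesize everything so C precedence never changes the meaning.
        return "(" + emit_expr(node[1]) + " " + kind + " " + emit_expr(node[2]) + ")"
    raise ValueError("unknown node kind: %r" % (kind,))

def emit_function(name, params, body):
    """Wrap an expression in a C function taking and returning long."""
    args = ", ".join("long " + p for p in params) or "void"
    return "long %s(%s) {\n    return %s;\n}\n" % (name, args, emit_expr(body))

# x * 2 + 1
ast = ("+", ("*", ("var", "x"), ("num", 2)), ("num", 1))
c_source = emit_function("f", ["x"], ast)
print(c_source)
```

The emitted text can be handed to gcc or tcc as is; all the register allocation and low-level optimization work then stays in the existing C compiler.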

But I guess that you want to make some C compiler for ARM & x86-64.

So some suggestions:

First, make your compiler free software from day 0. Start with an empty github project today. (There is absolutely no market for any proprietary compiler; if I am wrong, you have already found the several millions of euros in venture capital to fund your project, and you wouldn't be asking here.) At the very least, you'll be able to show your work, ask for help (e.g. specific technical questions on StackOverflow or other forums), and perhaps attract other contributor(s) and get some feedback from nice people testing your thing. Notice that there are not many proprietary compilers today (even Microsoft is open-sourcing theirs)!

Then, read an ISO C standard in its entirety (either C99 or C11), and some other reference manuals for languages you like. You'll be able to download the latest C99 or C11 or C++11 draft from the web (see the Wikipedia pages on C99 or C11).

Study some existing compilers, and notably their internal representations (IR). Read at least about Gimple & Tree in GCC, and about Clang and LLVM. Understand that the IR is a hard point, and that optimization passes are mostly IR -> IR transformations. The bulk of the work of any compiler is not its parser (building some AST from source), or its code generator (emitting assembler from an IR), but the optimization passes which transform some IR into another IR (very often, both the source and destination IR of a given optimization pass are of the same type, and GCC has hundreds of such passes!).
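
To make the "IR -> IR transformation" point concrete, here is a minimal sketch of one such pass: constant folding over a made-up three-address IR. The IR layout (dest, op, arg1, arg2) and the name fold_constants are assumptions for illustration only; they resemble neither Gimple nor LLVM IR, and the sketch only handles straight-line code.

```python
# Hypothetical three-address IR: each instruction is (dest, op, arg1, arg2),
# where arguments are ints (constants) or strings (names of values).

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def fold_constants(ir):
    """One pass: IR in, IR of the same type out.

    Replaces operations whose operands are known constants by a 'const'
    instruction, and propagates known constants into later operands.
    """
    known = {}  # names currently known to hold a constant
    out = []
    for dest, op, a, b in ir:
        a = known.get(a, a)
        b = known.get(b, b)
        if op == "const":
            known[dest] = a
            out.append((dest, "const", a, None))
        elif op in OPS and isinstance(a, int) and isinstance(b, int):
            value = OPS[op](a, b)
            known[dest] = value
            out.append((dest, "const", value, None))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out

ir = [
    ("t1", "const", 2, None),
    ("t2", "const", 3, None),
    ("t3", "mul", "t1", "t2"),
    ("t4", "add", "t3", "x"),
]
for instruction in fold_constants(ir):
    print(instruction)
```

Because input and output have the same type, such passes can be chained in any order, which is why a real compiler can afford hundreds of them.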

Decide also in what programming language you'll code your compiler. This is a difficult decision. Some points.

I don't think that coding your compiler in manually written C is worthwhile. You won't do better than TinyCC or nwcc for several years. And you probably won't have much fun. But if you do, start by building a compiler infrastructure: you'll need an efficient memory manager, and that practically means a garbage collector (able to deal with all the circular references any compiler has to work with). You'll need nice dumping routines to print the IR and any internal data. You might need some persistence machinery (maybe as simple as storing some IR in JSON format in sqlite). You might want to make a multithreaded compiler (there is none AFAIK, and I believe it would be useful today, but to code any kind of multithreaded compiler you need to start from scratch). So for the first year, work on the infrastructure, not on the compiler itself.
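
As a rough idea of what the dumping and persistence parts of such an infrastructure could look like, here is a minimal sketch using only the json and sqlite3 modules of the Python standard library. The IR layout, the table schema, and the names dump_ir, save_ir and load_ir are all hypothetical.

```python
import json
import sqlite3

# Hypothetical IR: a list of [dest, op, [args...]] entries. Lists (not tuples)
# are used so that a JSON round trip gives back an equal value.

def dump_ir(ir):
    """Human-readable dump of the IR, one instruction per line."""
    return "\n".join(
        "%s = %s %s" % (dest, op, ", ".join(str(a) for a in args))
        for dest, op, args in ir
    )

def save_ir(db, name, ir):
    """Persist the IR of one function as a JSON string in sqlite."""
    db.execute("CREATE TABLE IF NOT EXISTS ir (name TEXT PRIMARY KEY, body TEXT)")
    db.execute("INSERT OR REPLACE INTO ir VALUES (?, ?)", (name, json.dumps(ir)))
    db.commit()

def load_ir(db, name):
    """Reload a previously saved IR, or None if the name is unknown."""
    row = db.execute("SELECT body FROM ir WHERE name = ?", (name,)).fetchone()
    return None if row is None else json.loads(row[0])

db = sqlite3.connect(":memory:")  # an in-memory database, for the example only
ir = [["t1", "const", [2]], ["t2", "add", ["t1", "x"]]]
save_ir(db, "f", ir)
print(dump_ir(load_ir(db, "f")))
```

In C you would have to write (or generate) all of this by hand for every internal data type, which is part of why the first year goes into infrastructure.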

You could choose some higher-level language to code your compiler in. I've got some opinions and hints on that.

You could code in Scheme, or Javascript, or Common Lisp or some other dynamically typed language (avoid Python or Perl, they are probably too slow). Dynamic typing and garbage collection are a huge plus. You might perhaps choose some implementation which generates C code or which is written in C. For example, if using Scheme, consider Bigloo or Chicken. Both generate C code, and that generated C code is a very good test for your own compiler (this is one of the benefits of bootstrapping compilers, and it is a very significant one).

You could code in Ocaml or in Haskell or some other statically typed functional language with type inference. The type inference machinery would help in finding simple bugs (but not hard ones). The functional aspect (which Javascript, Scheme & Lisp also have) is essential: you'll use functional values to encode future computations (read more about continuations & continuation passing style, starting with Wikipedia). Garbage collection is a must.
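
A minimal sketch of what "functional values encoding future computations" means: a tiny expression evaluator in continuation passing style, where every function receives an extra argument k, the closure representing "the rest of the computation". The AST shape and the name eval_cps are hypothetical; Python is used only for brevity.

```python
# Hypothetical tuple-based AST: ("num", n), ("var", name), ("+", l, r), ("*", l, r).

def eval_cps(node, env, k):
    """Evaluate node in env, then pass the result to the continuation k.

    No call ever uses a returned value directly: whatever remains to be done
    is always packaged as a closure and passed along.
    """
    kind = node[0]
    if kind == "num":
        return k(node[1])
    if kind == "var":
        return k(env[node[1]])
    if kind == "+":
        return eval_cps(node[1], env, lambda left:
               eval_cps(node[2], env, lambda right:
               k(left + right)))
    if kind == "*":
        return eval_cps(node[1], env, lambda left:
               eval_cps(node[2], env, lambda right:
               k(left * right)))
    raise ValueError("unknown node kind: %r" % (kind,))

# x * 2 + 1 with x = 5; the initial continuation just returns its argument.
ast = ("+", ("*", ("var", "x"), ("num", 2)), ("num", 1))
result = eval_cps(ast, {"x": 5}, lambda value: value)
print(result)
```

Once the "rest of the computation" is an explicit value like this, control flow (early exits, exceptions, backtracking) becomes something a compiler can represent and transform.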

You could design your own domain specific language or DSL (exactly like I did for MELT). If you want to code a C compiler, I strongly invite you to think that way. Notice that GCC itself has about a dozen specialized C (or C++) code generators which generate parts of the compiler, and you might look at that as saying that GCC has a dozen DSLs inside it (even if most of the GCC code is sadly C++). You might even design yourDSL and implement a yourDSL->C translator (that takes at least one year but it is fun, notably if you start from scratch). Then the generated C code will be a very good test case for your C compiler.


If you have not read them, I recommend reading several books.

SICP https://mitpress.mit.edu/sicp/ is an absolute must; if you only read one book, read this one

Concepts, Techniques and Models of Computer Programming https://www.info.ucl.ac.be/~pvr/book.html

Lisp In Small Pieces https://pages.lip6.fr/Christian.Queinnec/WWW/LiSP.html ; if you read French, read the latest French version from the publisher ParaCampus

Programming Language Pragmatics http://www.cs.rochester.edu/~scott/pragmatics/


Artificial Beings: The Conscience of a Conscious Machine http://eu.wiley.com/WileyCDA/WileyTitle/productCd-1848211015.html ; this book by J.Pitrat is apparently far from compilation, but it is thought-provoking and much more relevant to programming language design than the title suggests. Read also his blog on http://bootstrappingartificialintelligence.fr/WordPress3/


Hope this helps. I'm waiting to read more about your skills, your languages and your efforts, and other opinions on the subject.

BTW, if you are young enough, find some PhD position where your goals could fit.

Cheers

--
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***



