dotgnu-general
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[DotGNU]Melody


From: Norbert Bollow
Subject: [DotGNU]Melody
Date: Mon, 6 Aug 2001 13:28:45 +0200

I'm still trying to digest this... do I understand this correcty
that instead of making gcc compile to Java bytecode or the
Intermediate Language defined in Microsoft's specs, we should
instead launch a low-urgency DotGNU project to 

1. Invent a new intermediate language (probably incorporating
   the "bytecode" ideas that have been discussed on the arch list)
   that can be called Melody.

2. Expand GCC so that it can compile from all its supported
   languages to Melody-based Portable Executables.

3. Make the DotGNU platform support Melody-based Portable
   Executables as a third portable executable format in
   addition to the two portable executable formats (those based
   on Java bytecode and IL) that we already want to support.

?

Greetings, Norbert.


Marco Manfredini <address@hidden> wrote:

> Hello, my name is Marco Manfredini and some of you probably remember my 
> from #dotgnu as marquise.
> 
> I'm reading the discussions regarding the dostgnu-IL support with great 
> interest and would like to make a couple of annotations from here.
> 
> I was looking on the gcc->bytecode problem, after I read the call for 
> volunteers and checked out this document:
> http://cobolforgcc.sourceforge.net/cobol_toc.html
> which has a intriguing chapter 14 that deals with the undocumented 
> internals of gcc, mainly the AST->RTL part.
> 
> Was I read was disillusioning. Not only that the author clearly shows 
> the coding problems that are raisen by the current coding practice in 
> gcc, he also made clear that neither RTL or the "tree" ADT are powerful 
> enough to preserve the semantics (i.e. signatures, expression logic) of 
> the source code in a reasonable manner.
> 
> MS-IL/Metadata works AFAIK on the premise, that you can reconstruct the 
> signatures of the "exported" entities (functions, classes). This is 
> required for different reasons. One is, that you can #import an 
> assembly, for reuse it in you code, another is for bytecode-validation 
> (i.e. security). I can only suppose that the type-information problem 
> can be solved by interpretation of debugging information (given the 
> "RTL-wants-registers" problem can be solved), which looks like a kludge.
> 
> Compiling the gcc languages to IL will raise more issues. For example, 
> it will probably turn out that the ulgy "managed" stuff (including the 
> __gc et.al. keyworkd) has to be supported with gcc, because data under 
> the control of the runtime cannot be handled so frankly as C/C++ allows it.
> 
> But what I want to say is, that we should not stick on the byte-code 
> idea for the dotgnu-IL because of these reasons:
> 
> * Sure, the dotgnu-IL would support signatures of the external visible 
> entities, but a byte-code representation would loose the expression 
> semantics. This becomes clear if we consider a JIT which tries to 
> generate optimal code for a platform. The usual approach is to 
> reconstruct the expression tree from the byte code, because the 
> expression tree gives a better overview about what happens and allows 
> global optimization and inlining. So why should we remove information 
> that has to be reconstructed later?
> 
> * A byte-code representation is too low-level which makes it less 
> expressive. It deals with "values", but no more with 
> "types","functions","scopes". This can make things very hard, for 
> example predicting which variables are active and alive for a given 
> byte-code line, and which associated cleanup code they have. This 
> information would be neccesary to implement dtor-cleanup, and had to be 
> included in the generated code as "Meta-Information". On the other hand, 
> information about the currently active values and their types could be 
> also used to implement an efficient garbage collector scheme, but once 
> we defined the IL-Format we are more or less stuck to the amount of 
> information we put into it.
> 
> * I dont't say that byte-code cannot do what I want, but I am sure that 
> interesting applications (instantiation of generic types and function, 
> global code optimization and inlining, aspect-oriented programming) are 
> made unneccesarily hard.
> 
> I agree that we should invent before they patent and want to suggest 
> that we should follow, what I can recognize from the previous posts on 
> this list: The idea to use a semantical rich tree structure as a IL, 
> which preserves as many informations about the code as possible (without 
> being to language specific) and that we should give it a stupid name. 
> I'd propose "melody", since this project offers a lot obvious puns on 
> the background of C#.
> 
> I am currently developing ideas on how melody could look like, and try 
> to gather information about similar ideas (i.e. anything that's like 
> "compressed binary AST" as IL). Less for inspritation, but for the our 
> invention claims.
> 
> Currently my melody boils down to the idea, that we'd have constructor 
> expressions which build the semantitic entities we wish to represent. To 
> give you an example. This C-function:
> 
> bool even(unsigned num)
> {
>   unsigned k=num%2;
>   return k==0;
> }
> 
> would look in a human-readable version of melody as
> 
> 001 function $even($num: uint32):boolean
> 002 {
> 003  let($k:uint32 ; modulo<uint32>($num, 2);)
> 004  {
> 005  (equal<uint32>($k,0))
> 006  }
> 007 }
> 
> The text-version looks like paraphrase of the original C function, which 
> doesn't mean that I cheat, but illustrates how much I would like to 
> preserve.
> 
> Just a short description on happes here:
> 
> First, the operations which do "stuff" look like this:
> 
> op<type>(parameters).
> 
> You see for example modulo<uint32>($num,2). This means "calculate modulo 
> with two parameter which are uint32". The compiler found, that $num is 
> unsigned and compiles the right function. This is obvious if you 
> remember that CPU usually has instructions like add.l addu.l addu.w, 
> i.e. the right op-code for the right operand.
> Btw. If we can maintain a readable textual representation for melody, 
> then it would also make a valuable debugging tool, since you can see the 
> hidden operations (conversions, temporaries etc.) that you compiler has 
> produced to compile you innocent looking code.
> 
> line 001 starts the constructor for a function which is named $even (to 
> distiguish the name from the "keywords"). The num parameter has become 
> an uint32, the C-translator decided that "unsigned int" should be a 
> 32-bit value. (Btw. In RTL or gcc-tree the analogous expression wouldn't 
> mention anymore, that the parameter is an /unsigned/ value. It only 
> contains the information that it is a 32-Bit value, but we need this 
> information to extract the signature.).
> 
> Now line 3 is interesting, because it shows how locals are introduced. 
> I'd propose to use the let-syntax which you might be familiar with, if 
> you are in functional programming. If not: My idea is, that every 
> statement represents a value. For example the above "function" 
> represents a ..eh..function. The let statement again, is the value of 
> its expression with a certain name binding. In Scheme I can say:
> 
>       (let ((x 0)(y 0)) (+ x y)
> 
> 
> You can read this as: the value of this statement is x+y where x is 0 
> and y is 0.
> 
> The melody-let statement here says: the value is $k=0 where $k is $num 
> mod 2.
> 
> So the let contains a lot of information: It says, that $k is only 
> available during the inner expression (which tells us about its block 
> scope), it says that $k is initialized with '$num mod 2' (which is 
> different from assigment, if you are familiar with C++ you will 
> understand what I mean). Note that let has a third parameter (left 
> blank), which is /the optional cleanup expression/ which has to be 
> evaluated when the let-body is evaluated. This can be used by a compiler 
> to represent its destructor semantics! (Now, do you see the fine 
> difference between initialization and assigment? If the initialization 
> expression generates an exception, then we are /not/ calling the cleanup 
> expression, since there is nothing to cleanup!).
> 
> Well, this is just an example I made up to entertain you, I admit that 
> there a still lot of things to consider (data aligment, how do I define 
> classes & operations on them, avoiding machine dependencies etc.), but I 
> hope you get the picture. But here's another teaser for you. Imagine 
> what you can, if melody would allow you to define functors (that is 
> functions in the melody language). For example:
> 
> functor even(T)
> {
>   function ($num:T)
>   {
>     let($k:T ; modulo<T>($num, 2);)
>     {
>      (equal<T>($k,0))
>     }
>   }
> }
> 
> melodies value-types are functions, structs, expressions. It can operate 
> on a type like on a number. So a "functor" implements essectially 
> genericity, since he f.e. compute a new function by replacing the 
> type-parameter!! This means, you can essentially /compile templates into 
> melody code/ and instantiate these generics on demand! And I can inline 
> them as well, because I'm producing a new expression tree, not stupid 
> bytecode.
> 
> Well, I hope I'm not completely on the wrong track and that the idea is 
> useful for the project. Comment please!
> 
> -- 
> Marco
> bytecodes? we don't need no stinkin' bytecodes.
> 
> P.S.
> It should be possible to build a frontend to gcc which understands melody.

Greetings, Norbert.

-- 
A member of FreeDevelopers and the DotGNU Core Team   http://dotgnu.org
Norbert Bollow, Weidlistr.18, CH-8624 Gruet  (near Zurich, Switzerland)
Tel +41 1 972 20 59      Fax +41 1 972 20 69      http://thinkcoach.com
Your own domain with all your Mailman lists: $15/month http://cisto.com



reply via email to

[Prev in Thread] Current Thread [Next in Thread]