epsilon-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [epsilon-devel] Questions


From: Luca Saiu
Subject: Re: [epsilon-devel] Questions
Date: Tue, 31 Dec 2013 01:09:01 +0100
User-agent: Gnus (Ma Gnus v0.8), GNU Emacs 24.3.50.2, x86_64-unknown-linux-gnu

Hello.

On 2013-12-30 at 06:51, address@hidden wrote:

> You pretty much nail down epsilon1 in the tutorial
> (http://ageinghacker.net/blog/posts/17/#Top) and it's good. But as
> much as it's important to document epsilon1, it's important to also
> document epsilon0 with plain english.

You're right.  As I told you in private, I cannot realistically expect
random users to read the formal specification and understand everything
from there.  That said, in practice epsilon1 is more important to
understand than epsilon0 for most users; this is why I only dealt with
epsilon1 in the tutorial.  When documenting extension features I'll have
to speak about epsilon0 as well.  Documentation is important, and at
this time it's very lacking.

Now I'm thinking that for the first release I'll have some tutorial
documentation, but no reference documentation.  I'll include the post
you saw in the Texinfo manual, but I have to add something more as well,
including the epsilon0 part.  Luckily, epsilon0 is really really simple.

> From the discussion we had, I
> think you really want to go forward implementing new personalities on
> top of epsilon1 (or is it something else?),

In my case it will definitely be on epsilon1; but if somebody disagrees
with my choices in epsilon1, it's also possible to replace parts of it
and start from a lower level.  For example, some people might not want
to use s-expressions at all.

> it is normal and it is the
> way forward, but other peoples, just like me, will want to understand
> how epsilon is made, why it works, how it works... and not only
> through code.

You're right, but I suggest you anyway to read the code in core.e.  It
should be quite easy to understand.

> Sorry, again, I just would like to prevent any mis-interpretation of
> my attitude and stress the fact that I'm not patronizing you or
> something, I'm only documenting my work on epsilon (and asking for
> some help) trying to be useful and if you don't feel like answering or
> find something not interesting/useful/whatever just say so. Thanks :-)

Don't worry, I'm not taking offense and there's absolutely no reason why
I should.  I appreciate your interest and when you point out the lack of
documentation, well, you're right.  In the mean time, I can answer
questions.  I'm also happy that this conversation is happening in
public, so that the archive will be available later for others.

> Sadly, I'm pretty much new to language creation and not very literate
> in Lisp/Scheme/Guile so I may mis-use some terms. E.g. I know what is
> a lisp/scheme macro, but I can't write one myself.

Ok, here comes a lightning-fast informal introduction.  Please tell me
if it's understandable; I'm particularly interested because I'll have to
write something similar in the documentation.

Standard Scheme macros work in a different way, but I think that the
macro used by Common Lisp, Emacs Lisp and epsilon1, and Guile's
define-macro (all essentially identical to one another) are very simple
to understand.

I think you already know quasiquoting.  Let's do it in Guile:

--8<---------------cut here---------------start------------->8---
;; This is quoting
'(this is a literal sexpression)
=> (this is a literal sexpression)
(define x 42)
;; This is quasiquoting.
;; ` is pronounced "quasiquote"
;; , is pronounced "unquote"
`(this would be a literal sexpression except for ,x which is not)
=> (this would be a literal sexpression except for 42 which is not)
`(this would be a literal sexpression except for ,(+ x 3) which is not)
=> (this would be a literal sexpression except for 45 which is not)
(define y '(a b c))
;; ,@ is pronounced "unquote splicing"
`(this would be a literal sexpression except for ,@y which is not)
=> (this would be a literal sexpression except for a b c which is not)
--8<---------------cut here---------------end--------------->8---

A macro is (informally) a function taking some sexpressions and
returning an sexpression, which should represent valid syntax, including
more macro uses.  Macroexpansion takes place before evaluation and macro
parameters are not evaluated at macroexpansion time, but that's
essentially it: in the macro body you can use the full language.
Quasiquoting tends to be particularly useful in macros, so you use it
in most macros.

Let's make a silly macro called square, which takes an expression and
returns the expression multiplied by itself, so that the given
expression occurs twice in the expansion (often a bad idea in practice,
but I like the example):

--8<---------------cut here---------------start------------->8---
(define-macro (square e)
  `(* ,e ,e)) ;; I've built something with two copies of e
(square 4)
===> (* 4 4) => 16
;; Let's check how macroexpansion works:
(macroexpand '(square 4)) ;; print the macroexpansion, but does
                          ;; not interpret the resulting expression
=> (* 4 4)
;; Let's try the macro with something having side effects:
(square (begin
          (display "foo\n")
          4))
|- foo
|- foo ;; it's been printed twice!
=> 16
;; It's not surprising if you see what the macro build:
(macroexpand '(square (begin
                        (display "foo\n")
                        4)))
=>
(* (begin (display "foo\n") 4) (begin (display "foo\n") 4))
--8<---------------cut here---------------end--------------->8---

This macro was dumb: it just inserted copies of the parameter into a
template.  But you can also do complicated computations instead of just
using ` and , with just a variable name.

> On 2013-12-30 02:56, Luca Saiu wrote:
>> On 2013-12-29 at 23:02, address@hidden wrote:
>
>>> Where buffer:make is defined in
>>> the final implementation of epsilon?
>>
>> The real definition of buffer:make is in C; it could be an assembly
>> wrapper instead, but however still out of epsilon0.  This is what a
>> primitive is: just like a procedure, but implemented out of epsilon
>> at a
>> lower level.
>
> So, if I understand correctly primitives are the primary forms?
> functions? operations? datums? provided by epsilon0 (language) upon
> which epsilon1 is built.

Primitives are just "procedures" implemented at a lower level; they
cannot do anything a procedure couldn't do: for example they can't
performs "jumps" to and from the user code like call/cc or a Python
generator.

epsilon0 and epsilon1 are call-by-value left-to-right languages, so
their way of evaluating calls are particularly simple: at call time you
always evaluate all parameters in order, from the left to the right,
obtaining results: those results are what the procedure or primitive
receives.

By contrast other entities which are not primitives or procedures can
have other evaluation rules: they might never evaluate a parameter, or
evaluate some multiple times.

e0:if-in is an example of a language form; of course it cannot be a
procedure nor a primitive, since it does not evaluate all its
parameters: (e0:if-in (fixnum:+ 2 4) (6) 345 (die-horribly))
=> 345

fixnum:+ is an example of a primitive, implemented in C for efficiency.

fixnum:** is an example of a procedure (for exponentiation), implemented
in epsilon0 using some primitives.

Our square macro, defined above in Guile, can also be seen as another
(user-defined) language form: indeed it also violates the evaluation
rules for procedures or primitives.

> As of current bzr repository revision 239, epislon bootstraped from
> Guile (http://bzr.savannah.gnu.org/lh/epsilon/trunk/revision/239),
> they are defined in four places:
>
> - runtime/c-primitives.c
> - bootstrap/scheme/epsilon0-in-scheme.scm
> - bootstrap/scheme/unexec.e
> - bootstrap/scheme/core.e
>
> Primitives written in C are the following
> - whatever:zero
> [...]
> - e0:eval-in-c

Correct.

By the way, notice that I also defined an epsilon0 interpreter in
epsilon0 itself, called e0:eval, as a procedure (mostly for
demonstration and documentation purposes; you might find it useful to
read).  It works just like e0:eval-in-c, even if it's slower.

> Primitives written in Guile in epsilon0-in-scheme.scm are the
> following
> (http://bzr.savannah.gnu.org/lh/epsilon/trunk/annotate/head:/bootstrap/scheme/epsilon0-in-scheme.scm):
>
> - e0:value
> - e0:the-empty-list
> - e0:unspecified
> - e0:if-in
> - e0:bundle
> - e0:let
> - e0:primitive
> - e0:call
> - e0:call-indirect
> - e0:join

All of these are *not* primitives, as you will have understood at this
point; they are all language forms.  In fact, they are *all* the
language forms in epsilon0.

> Things that I think are primitives in unexec.e
> (http://bzr.savannah.gnu.org/lh/epsilon/trunk/annotate/head:/bootstrap/scheme/unexec.e):
>
> - unexec:exec

Yes, I implemented it in C as an optimization, and to make execing
easier at startup.  Again, that's implemented both as a procedure and as
a primitive: the epsilon0 implementation as a procedure might be easier
to read than the C version as a primitive.

> Some primitives seem to be defined only in core.e which I think is
> epsilon in epsilon itself
> (http://bzr.savannah.gnu.org/lh/epsilon/trunk/annotate/head:/bootstrap/scheme/core.e#L33):
>
> - e0:fork

e0:fork is a language form.

> - e0:fork*

e0:fork* is a procedure returning an e0:fork expression as a data
structure.  Look for "procedural constructors" in core.e.  By the way,
I'm considering changing the suffix from "*" to "%".

The idea is that sometimes, when doing in meta-programming (or if for
some crazy reason you want to perform procedure definition via
"surgery", as we discussed yesterday), you can *build* expressions as
data structures.  "procedural constructors" let you do that.  Procedural
constructors are implemented as ordinary procedures: even epsilon0
expressions are also made of the same stuff as other data: fixnums and
pointers to buffers.

> - future:fork-procedure
> - future:asynchronously-call-closure

These are just helpers procedures used in e1:future, which is a nicer
way of making threads in epsilon1.  It's defined late because it depends
on closures.

> Chapter 2: The core language of epsilon0 > Section 2.2: Syntax
>
> - let
> - call
> - primitive
> - if
> - fork
> - join
> - bundle

call-indirect is introduced in Chapter 5 because it's technically just
an optimization; but it's important.  However yes, those seven plus
call-indirect make up all epsilon0 forms.

> Chapter 3: Reflection and self-modification > Section 3.2: Global
> definitions
>
> - state:global-set!
> - state:global-get
> - state:global-names
>
> - state:procedure-set!
> - state:procedure-names
> - state:procedure-get-formals
> - state:procedure-get-body

Funnily enough, these are all procedures.  Using my choice of primitives
(which is very low-level: I use buffer:get and buffer:set! a lot) I can
define all of those operations, which look up or alter global
definitions, as ordinary procedures.  Which is cute.

> Chapter 3: Reflection and self-modification > Section 3.3: unexec
>
> - unexec

Unexec is also implemented as a primitive, purely for efficiency reasons
-- if it were just available as a procedure, everything would still work
fine.  The need for this optimization will lessen when the gets more
efficient.

> Request: Can you add, strike with a comment things that are not
> primitives and comment if you find it important (maybe it's documented
> I did not pay attention yet). Can you mark anything not essential, I
> mean not part of epsilon0, as such.

I think I've done it, but I've not explained what all these things are
for; in some cases it's obvious, but in other it's probably not.  Ask
me, and I'll tell you.

> Questions:
>
> 0) Do you agree to call epsilon a language (creation?) framework?

Yes, I agree.

> I) Why some of the above primitives are not discussed in the thesis?

Primitives themselves are quite interesting.  Opening files, for
example, in practice has to be done via a primitive, but that's nothing
new.  I've discussed all language forms and essentially all the other
important linguistic mechanisms such as global definition access, the
extension mechanisms and all the extensions which existed at the time.

Some features such as pattern matching and promises are not discussed in
the document simply because I implemented them after submitting it; some
such as closure conversion are discussed very quickly, just because I
implemented before the submission but still too late to be able to
explain them in every detail.

> II) In the thesis, you say that compilation works like unexec, can you
> expand a bit?

Think of unexecing an expression in a certain state:

--8<---------------cut here---------------start------------->8---
(e1:define (fibo n)
  (e1:if (fixnum:< n 2)
    n
    (fixnum:+ (fibo (fixnum:- n 2))
              (fibo (fixnum:- n 1)))))
(e1:unexec "/tmp/image"
  (fio:write (i (fibo 10)) "\n"))
--8<---------------cut here---------------end--------------->8---

When execed, the image /tmp/image will compute (fibo 10) and print the
result.  The computation and printing is *not* done at unexec time, but
later, after you load the same image.

So the data saved into the image will have to contain *at least*:
* the procedures fibo and fio:write
* the procedures needed by fibo and fio:write
* the procedures needed by the procedures needed by fibo and fio:write
* the procedures needed by the procedures needed by the procedures
  needed by ...
* all the globals referenced by the procedure above.

This is idea of searching everything referred by something which has to
be included to include that as well is called *closure* -- in the
mathematics sense, not in the programming sense; think of the phrase
"the function successor is closed over the naturals".

So unexecing might work by closing the expression which is being dumped,
and dumping to a file everything needed, both data and code; in epsilon0
it's all data: procedure bodies are expressions, stored as part of
symbols.

(in actuality unexec is a little dumber, but the compiler, where
efficiency matters, really works like this).

Now, I've only spoken about dumping procedure bodies as epsilon0
expression data structures.  This is not really compiling.  But well,
when compiling a static program (which is to say, it doesn't modify
itself at run time, as per Chapter 3), you can translate each expression
into assembly code, and dump non-expressions as ordinary data.  Notice
that the compiler can be very simple, since expressions stored as
procedure bodies are always pure epsilon0, already macroexpanded and
transformed to the point of having rewritten away any extension.

As long as the program is static, this assembly code generated at
compile time will suffice.  Compiling non-static programs is more
difficult as it involves invalidating and replacing compiled code, and
possibly compiling at runtime as well.  It can and will be done (not
implemented yet), but it is a little more expensive and complicated.

Currently the compiler has a really, really bad interface; but it should
and will have the exact same interface as unexec, adding only some
parameters specifying compilation parameters (safety, staticity,
target architecture, possibly the optimization level).

> III) It seems to me, that all the above should start with e0:, isn't it?

I don't really understand this question.  Well, when looking at a global
state, as I was saying, all expressions have already been rewritten into
pure epsilon0, so things are simple.

> IV) When an epsilon program is unexec, it's sealed somehow, it is
> described in the thesis, what is required to be able run it (except
> the code required to unmarshal the dump)?

I told you that unexec was less smart than doing the closure operation:
in fact you read in Chapter 3 that unexecing can be performed in a very
simple way, by dumping a pair containing:
- the symbol table
- the main expression to be executed at exec time.

In fact unexec works just like that.  It's overkill, since the full
symbol table will contain *all* globals, which is potentially much more
than our closure -- but it's still correct.

So what do you do at exec time?  Easy: undump the file, obtaining the
pair.  The symbol table you got will become the new symbol table,
providing you all the globals and procedures you need.  Then you have
your expression to evaluate, which will use those globals and
procedures.  Feed the expression to your epsilon0 interpreter, and
you're done.

If you want to make your interpreter very fast you can generate
JavaScript code on the fly, or do your JIT tricks; just consider that
in general the unexeced state may still be non-static.

> V) As of yet, epsilon0 is written in Guile and Guile extensions
> written in C. It's possible in theory to provide a epsilon0.so against
> which bindings of some languages can be created and upon which new
> languages can be created, is it correct?

There are three implementations of epsilon0 in the current distribution:
i)   the one you're speaking about, in Guile, is only used for the first
     phases of the bootstrapping process.  You can ignore it;
ii)  the self-interpreter (an epsilon0 interpreter written in epsilon0);
     slow, mostly for documentation purposes.  In core.e;
iii) the epsilon0 interpreter in C (you saw the primitive e0:eval-in-c).
     This is the real thing, the one you use when running epsilon code
     from the REPL.  It will be optimized and/or replaced, but its
     performance is already quite acceptable.  epsilon0-interpreter.c
They are all tree interpreters, working directly on epsilon0
expressions -- which is not necessarily the fastest solution.

So the library you mention already exists.  If I recall correctly the
interpreter is in libepsilonruntime.


> VI) Does the following diagram describe correctly the current state of
> epsilon:
>
>        [guile] [pritimives-c]
>             \     /
>            [epsilon0]
>                |
>            [epsilon1]
>                |
>     [epsilon0 in epsilon][**]
>
> [**] this part is unexec when (load "bootstrap.scm") is called from
> guile+whatever prompt.

Yes, it's a reasonable approximation.  But consider, for example, that
epsilon0 "by itself" has to contain the global state, which is
implemented in symbol structures holding slots for macros and other
epsilon1 structures.  The dependencies of epsilon0 and epsilon1 look
quite "circular", and are difficult to extricate.  This is why
bootstrapping was difficult, but it's also why epsilon0 can be so
simple: it relies on the existence of global structures such as the
symbol table, which have to exist in the initial state of epsilon0, but
actually belongs more in epsilon1.  Disentangling the two in a very
precise way seems an idle exercise to me, but of course we can explain
what depends on what very clearly, if needed.

Your [epsilon0 in epsilon] can actually be [epsilon0 in epsilon0], but
there is also a corresponding primitive, and a corresponding guile
implementation.  And, in practice, any epsilon0 code will also use
e1:define.  Doing global definitions by surgery using epsilon0
expressions alone is too inconvenient for humans -- or actually I
understood it deeply too late, when I already had a temporary version of
e1:define which was usable in the mean time before the "real" one was
ready.  If I had understood all this complicated system of circular
dependencies from the beginning, I could have bootstrapped from Scheme
in a different and slightly simpler way, by having e1:define be a Scheme
macro building an epsilon0 definition working by surgery.

I don't think you'll need to worry too much about all of this now:
you'll never start from nothing; you'll always have epsilon1 available
as well.  You can use it to destroy part of it if you want, but you can
do that in epsilon1, which is quite convenient.

> After that when one use (load "quick-start.scm") from guile+whatever
> prompt, the diagram of the executing epsilon is:
>
> [guile parser] [**]
>            \    /
>           [epsilon]

Yes, and the Guile interpreter is there as well, even if you don't use
it very much: you typically use e1:toplevel or e1:define or
e1:define-macro or some other thing like that: they are all Guile macros
which immediately pass control to the epsilon0 interpreter.

When we bootstrap away from Guile and the thing gets clean, the
Read-Eval-Print Loop will simply be what it should be: a loop written in
epsilon1 calling read, eval, and print.


This was a long answer.  I've tried to be clear, but some points may
well still be cloudy; some of your questions touch the most difficult
issues to present.  Feel free to ask again, and let me know what's clear
and what's not.

I'm going to bed now.  Tomorrow in the morning I'll work and then, for
once, on the New Year's eve I'll be somewhere different than in front of
a keyboard; I'll probably not answer the rest of your email tomorrow.
Please bear with me; I'll come back soon.

Good night, and happy new year.

To the silent subscribers as well.

-- 
Luca Saiu
Home page:   http://ageinghacker.net
GNU epsilon: http://www.gnu.org/software/epsilon
Marionnet:   http://marionnet.org



reply via email to

[Prev in Thread] Current Thread [Next in Thread]