epsilon-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[epsilon-devel] Jitter: stacks as a new argument kind


From: Luca Saiu
Subject: [epsilon-devel] Jitter: stacks as a new argument kind
Date: Wed, 30 Jan 2019 02:25:45 +0100
User-agent: Gnus (Gnus v5.13), GNU Emacs 25.1.50.2, x86_64-unknown-linux-gnu

When idly playing with changes in a VM instruction I noticed something
surprising: all the care I put in writing my C code for stack operations
might have been excessive, or possibly, but I would not be so sure about
it, my stack data structure and operation macros are so clever that GCC
can do a good job generating code.  In reality GCC never ceases to amaze
me.

This was my original definition for the plus instruction in JitterLisp,
popping two elements from the main stack and pushing back their sum.  In
case of type errors, control fast-branches to the given label.

instruction primitive-primordial-plus (?f)
  code
    JITTERLISPVM_CHECK_TYPES_2(FIXNUM, FIXNUM, JITTER_ARGF0);
    JITTERLISP_PLUS_(JITTER_TOP_MAINSTACK(),
                     JITTER_UNDER_TOP_MAINSTACK(), JITTER_TOP_MAINSTACK());
    JITTER_NIP_MAINSTACK();
  end
end

The first line in the code is just the type check.  It doesn't affect
anything else shown here, so I will just ignore it and consider an
unsafe variant of jitterlisp, compiled with type checking disabled in VM
operations.  Either jitterlisp--unsafe--no-threading or
jitterlisp--unsafe--boehm--direct-threading will do.

The JITTERLISP_PLUS_ invocation assigns the sum of the second and third
argument to the first argument, an l-value.  Every object is tagged,
even with type checks disabled.  It is clear that JITTERLISP_PLUS_
updates the top element of mainstack, without touching the undertop
element.  The nip operation in the following line eats up the undertop
without disturbing the top, which sets up the stack to the right
configuration: the final effect is ( a b -- a+b ), as it should be.

It turns out that I could have defined the same operation in a much more
naïve way.  The following variant generates exactly the same code as the
version above, up to trivial reordering, with every GCC version I have
tested on every architecture:

instruction primitive-primordial-plus (?f)
  code
    JITTERLISPVM_CHECK_TYPES_2(FIXNUM, FIXNUM, JITTER_ARGF0);
    jitterlisp_object operand0 = JITTER_TOP_MAINSTACK();
    JITTER_DROP_MAINSTACK();
    jitterlisp_object operand1 = JITTER_TOP_MAINSTACK();
    JITTER_DROP_MAINSTACK();
    jitterlisp_object result;
    JITTERLISP_PLUS_(result, operand0, operand1);
    JITTER_PUSH_MAINSTACK(result);
  end
end

Considering TOS-optimization, and the fact that operand0 and operand1
are not even const, I find it very remarkable that GCC can safely see
through the drops and push, and just generate the same code as before.

This third variant also generates the same code:

instruction primitive-primordial-plus (?f)
  code
    JITTERLISPVM_CHECK_TYPES_2(FIXNUM, FIXNUM, JITTER_ARGF0);
    asm volatile (JITTER_ASM_DEBUGGING_NOP(2):::);
    const jitterlisp_object INPUT0_RVALUE = JITTER_UNDER_TOP_MAINSTACK();
    const jitterlisp_object INPUT1_RVALUE = JITTER_TOP_MAINSTACK();
    JITTER_DROP_MAINSTACK();
#define OUTPUT0_LVALUE JITTER_TOP_MAINSTACK()
    JITTERLISP_PLUS_(OUTPUT0_LVALUE, INPUT0_RVALUE, INPUT1_RVALUE);
#undef INPUT0_RVALUE
#undef INPUT1_RVALUE
#undef OUTPUT0_LVALUE
  end
end

Let us try another instruction.  swap seems a good candidate:

instruction swap ()
  begin
    JITTER_SWAP_MAINSTACK();
  end
end

It turns out that even the following version is just as good, and
generates the same three load, store and move hardware instructions:

instruction swap ()
  begin
    const jitterlisp_object INPUT0_RVALUE = JITTER_AT_DEPTH_MAINSTACK(1);
    const jitterlisp_object INPUT1_RVALUE = JITTER_AT_DEPTH_MAINSTACK(0);;
    JITTER_DROP_MAINSTACK();
    JITTER_DROP_MAINSTACK();
    JITTER_PUSH_UNSPECIFIED_MAINSTACK();
    JITTER_PUSH_UNSPECIFIED_MAINSTACK();
#define OUTPUT0_LVALUE JITTER_UNDER_TOP_MAINSTACK()
#define OUTPUT1_LVALUE JITTER_TOP_MAINSTACK()
    OUTPUT0_LVALUE = INPUT1_RVALUE;
    OUTPUT1_LVALUE = INPUT0_RVALUE;
#undef INPUT0_RVALUE
#undef INPUT1_RVALUE
#undef OUTPUT0_LVALUE
#undef OUTPUT1_LVALUE
  end
end

You may see where I am getting here.  This version of swap would be
interesting if the user only had to write the two lines
    OUTPUT0_LVALUE = INPUT1_RVALUE;
    OUTPUT1_LVALUE = INPUT0_RVALUE;
in the body, with the rest being machine-generated.

The fact that having the *stack effect* of a VM instruction achieved
automatically, by machine-generated code, with no performance penalty at
all, convinced me that I could afford a particularly nice form of
generalization to solve a factoring problem which was bothering me.

The problem is that for many if not most operations, it would be
desirable to have a variant working on the stack, and a variant working
with at least one literal or register operand.  It is common to have to
sum 1 to an expression, for example, and 1+ can be a special case
implemented by a special instruction.  I do that in JitterLisp.  But the
pattern should be generalized.  I do not want to implement the plus
instruction multiple times.

Here is the solution: *each stack should be a possible kind*, which VM
instructions can support as an alternative to other kinds, with any
mode.  An input-mode stack operand will be taken as an implicit pop; an
output-mode as an implicit push; an input-output mode as a pop followed
by a push.

I will change stack names to a single lowercase letter, exactly like
register classes.  Stacks will then be syntactically easy to "pass" to
VM instructions --- of course there will be no overhead: each
stack-using variant will be a different specialization.  Stacks are
never many in number, anyway, and with most operations only one stack
will be usable.

Then, the only definition of swap would be:

instruction swap (?!RS, ?!RS)
  begin
    OUTPUT0_LVALUE = INPUT1_RVALUE;
    OUTPUT1_LVALUE = INPUT0_RVALUE;
  end
end

The Forth-style version swapping the top and undertop elements on the
main stack would be invoked, unspecialized, as

  swap %s, %s

But swapping the TOS and a VM register would be equally easy, for
example by

  swap %s, %r0

(notice that there is no index for the stack, differently from the
register) or by the functionally equivalent

  swap %r0, %s

.  Yes, there is a potential problem of combinatorial explosion, the
same existing with registers.  Having stack operands as well makes the
situation only slightly worse, making the specialized instruction space
grow like having *one* more fast register.  The problem will be
alleviated, but I think this kind of power is worth the cost anyway,


The new plus will be something like:

instruction primitive-primordial-plus (?SRn, ?SRn, !SR, ?f)
  code
    /* [Type checking on INPUT0_RVALUE and INPUT1_RVALUE, with a new
        macro not only working on stack elements...] */
    JITTERLISP_PLUS_(OUTPUT0_LVALUE, INPUT0_RVALUE, INPUT1_RVALUE);
  end
end

Stack effects, if any, being implicit.  A version functionally
equivalent to the current stack-only version would be available as

  primitive-primordial-plus %s, %s, %s, $error-label

An use of 1+ might be compiled into:

  primitive-primordial-plus %s, 1, %s, $error-label

But it will be also easy, for example, to compile (set! a (+ b c)) to
code not touching the stack, as

  primitive-primordial-plus %r1, %r2, %r0, $error-label

, using just a different specialization of the same VM instruction.

Yes, some details, particularly regarding the types of immediates and
labels, will almost certainly need to change in an incompatible way.  I
accept that.  It will be worth the trouble.

I posit that having more orthogonal VM instructions will make code
generation much easier.  I will develop this point in the next message.

-- 
Luca Saiu
* GNU epsilon:           http://www.gnu.org/software/epsilon
* My personal web site:  http://ageinghacker.net

I support everyone's freedom of mocking any opinion or belief, no
matter how deeply held, with open disrespect and the same unrelented
enthusiasm of a toddler who has just learned the word "poo".

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]