gcl-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gcl-devel] Re: gzipped tar file on profiling


From: Camm Maguire
Subject: Re: [Gcl-devel] Re: gzipped tar file on profiling
Date: 02 Jan 2004 13:15:14 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Greetings!

Matt Kaufmann <address@hidden> writes:

> Thank you, Camm.  We already do some fixnum declaration and proclaiming in 
> ACL2
> for GCL, but I can imagine that we could benefit from doing some more.  I'll

OK, I was under the impression that the code you were running for AMD
was not stock ACL2 but something custom.  If there is an analogous
benchmark to what you've reported that can be run under stock ACL2,
that of course would facilitate the performance analysis greatly.

> try to do that at some point.  We can't just proclaim all integers to be
> fixnums, however, in case that is what sys-proclaim.lsp is intended to do.  We

No, the sys-proclaim.lsp basically collects certain information
gathered by the compiler in the form of a function proclamation.  Even
where you have declared a fixnum, the compiler will promote it to a
full lisp object if its passed as an argument to a function without a
proclamation (stating that the argument is a fixnum).


> do our own proclaiming automatically; roughly speaking, we use declare forms 
> of
> a defun for input types and THE forms for output types (but with 
> macroexpansion
> as necessary).  Our top-level function that does this is function 
> proclaim-file
> in source file acl2-fns.lisp, in case you're interested.
> 

I am indeed interested and will check it out -- thanks!

> Is there some way that make_fixnum could avoid creating new fixnum objects 
> more
> than once for each fixnum up to some limit, and also avoids garbage collecting
> such objects?  Maybe there is already such a mechanism in GCL and the limit
> merely needs to be increased, in order to fix most of this performance issue
> (since you say that GC time is the problem, presumably rather than
> fixnum-building time).  What do you think?
> 

Right now, make_fixnum returns a constant non-GC'ed fixnum for
integers between +- 1024.  If you would like to experiment with this
setting, you can change the value in object.h defined currently as

object.h:111:#define    SMALL_FIXNUM_LIMIT      1024

However, I really suspect that something like a heavily used loop
counter is not being compiled to a machine integer when it might in
principle be trivially so represented.  It would be very helpful if a
representative such function could be identified, in either stock acl2
or your custom stuff, (i.e. a heavily-used function which shows calls
to make_fixnum when run through (disassemble...)) -- its possible that
the compiler's determination of machine fixnum representations can be
improved, given the difference in performance with allegro.

Of course, we could also (eventually) take the ecl approach -- halve
or quarter the fixnum range and represent *all* fixnums as machine
integers with a flag bit set.  I'm not sure what possible negative
consequences could arise from restricting the fixnum range
(i.e. increasing the bignum range).


Take care,

> Thanks --
> -- Matt
>    Cc: address@hidden, address@hidden, address@hidden,
>          address@hidden
>    From: Camm Maguire <address@hidden>
>    Date: 30 Dec 2003 17:11:42 -0500
>    User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
>    Content-Type: text/plain; charset=us-ascii
> 
>    Greetings!  Just a preliminary note here -- the overwhelming majority
>    of the gbc cost with or without sgc is in collecting fixnum garbage.
>    Furthermore, the vast majority of this is generated in your compiled
>    lisp code.  
> 
>    As you probably know, if GCL can figure out that an integer is a
>    fixnum, usually via proclaims or declares, it will compile it directly
>    to a machine long integer avoiding any memory allocation.  Otherwise,
>    it calls the allocating routine make_fixnum.  
> 
>    At present, the only known (to me) system macro which often
>    unnecessarily allocates fixnums is dotimes -- if your code uses this a
>    lot let me know and I'll send you a replacement to try.  
> 
>    Otherwise these massive calls to make_fixnum are likely in code
>    written by the compiler.  You likely know which functions of yours
>    take the most time.  You can look for calls to make_fixnum via the
>    (disassemble...)  function which will output the generated C code.  A
>    simple (declare (fixnum i)) or similar where appropriate will likely
>    eliminate most of the runtime.  
> 
>    To be a bit more specific, to avoid any allocation, the compiler needs
>    to know both that the bounds of the integer are appropriate (which can
>    be gleaned from the (declare...) statement), and that any functions
>    using the variable as an argument have been compiled to take machine
>    integer arguments (which information typically comes from
>    (proclaims...)).  The compiler will "promote" a machine integer to a
>    lisp object via allocation if the integer is passed to a function
>    taking lisp objects as arguments.  You also probably know about the
>    automatic way of making a sys-proclaim.lsp for use by the compiler
>    in this regard.  Briefly, just load gcl_collectfn.o and execute
>    (compiler::emit-fn t) before compiling (generating .fn files for each
>    .lisp file), and then (compiler::make-all-proclaims "*.fn").  Then
>    load sys-proclaim.lsp before recompiling, or put this load command in
>    a gcl_cmpinit.lsp file in the source directory.  All this needs to be
>    automated in my opinion.
> 
>    The only current drawback of gprof profiling is that compiled lisp
>    code as conventionally loaded into the program's .data section is all
>    lumped together in one entry labeled <hicore>.  I'll be working on
>    improving this at some point.  For now, you can build your final
>    executable in an alternate way which will retain the identities of the
>    compiled lisp functions for the purposes of profiling should you be
>    unable to identify the culprit yourself.  Basically you set the
>    variable compiler::*default-system-p* to t before compiling, and
>    instead of using (si::save-system...), do
> 
>    (compiler::link
>          (list "module1.o" "module2.o" ...)
>          "new_image_name"
>          "(progn (lisp-code-to-run-after-loading-and-before-saving))")
> 
>    Here is how I do this for acl2 (this method is alas required on some
>    machines:) 
> 
>    
> =============================================================================
>    BINARIES:=acl2-fns.o axioms.o basis.o translate.o type-set-a.o 
> type-set-b.o rewrite.o simplify.o \
>            bdd.o other-processes.o induct.o history-management.o prove.o 
> defuns.o proof-checker-a.o\
>            defthm.o other-events.o ld.o proof-checker-b.o tutorial.o 
> interface-raw.o \
>            linear-a.o linear-b.o non-linear.o TMP1.o
>    BIN:=$(patsubst %,"%",$(BINARIES))
> 
>    INIT:=(initialize-acl2 (quote include-book) *acl2-pass-2-files* t t)
>    POST:=(load \"foo.lsp\")\
>        (setq compiler::*default-system-p* nil)\
>        (in-package \"acl2\")\
>        (save-acl2 (quote (initialize-acl2 (quote include-book) 
> *acl2-pass-2-files* t)) \"saved_acl2\"))
> 
>    nsaved_acl2: foo.lsp
>          echo '(load "foo.lsp")(in-package "acl2")(compile-acl2)' | gcl
>          echo '(load "foo.lsp")(in-package "acl2")(load-acl2)$(INIT)' | gcl
>          echo '(compiler::link (list $(BIN)) "$@" "$(POST)" "" nil)' |gcl
>    
> =============================================================================
> 
>    I'd be interested in knowing any results, as my current guess is that
>    Allegro can infer a fixnum where we cannot at present.  If you can
>    give me the details of such misinference, I can try to fix it.
> 
>    I would have guessed that you were doing a lot of array indexing, but
>    this appears not to be the case.
> 
>    Take care, 
> 
> 
>    Matt Kaufmann <address@hidden> writes:
> 
>    > Camm --
>    > 
>    > The run has completed.  It seemed to take much longer without SGC.  I'll 
> send
>    > you the uuencoded gzipped tar file (509K) now (see the README after you 
> expand
>    > it).  I'll spare those CCed on this email from that file.
>    > 
>    > -- Matt
>    >    cc: address@hidden, address@hidden, address@hidden,
>    >     address@hidden
>    >    From: Camm Maguire <address@hidden>
>    >    Date: 17 Dec 2003 17:11:44 -0500
>    >    User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
>    >    Content-Type: text/plain; charset=us-ascii
>    > 
>    >    Thanks Matt!  I got it now.  I wonder if you had also saved the gbc
>    >    output -- if I recall you usually run with *notify-gbc* turned on.
>    >    Also, a (room) before and after would be of interest, as well as the
>    >    effect of (si:sgc-on nil).
>    > 
>    >    On the one hand, GBC is clearly the performance culprit, and that is
>    >    very good news, as it is relatively easy to address with a better
>    >    memory layout.  What is odd about your profiling though is that
>    >    marking is taking up the lion's share of the time.  In normal GBC, its
>    >    the sweeper that costs, at least in the tests I've run so far.  And I
>    >    think I know why, it is the following extra pass that's required with
>    >    sgc marking:
>    > 
>    >      /* mark all non recent data on writable pages */
>    >      {
>    >        int t,i=page(heap_end);
>    >        struct typemanager *tm;
>    >        char *p;
>    > 
>    >        while (--i >= 0) {
>    >   if (WRITABLE_PAGE_P(i)
>    >       && (t=type_map[i]) < (int) t_end)
>    >     ;
>    >   else 
>    >     continue;
>    >   tm=tm_of(t);
>    >   p=pagetochar(i);
>    >   if ( t == t_cons) 
>    >     for (j = tm->tm_nppage; --j >= 0; p += tm_table[t_cons].tm_size/*  
> sizeof(struct cons) */) {
>    >       object x = (object) p; 
>    >       if (SGC_OR_M(x)) 
>    >         continue;
>    >       if (x->d.t==t_cons) {
>    >         x->d.m = TRUE; 
>    >         sgc_mark_cons(x);
>    >       } else
>    >         sgc_mark_object1(x);
>    >     }
>    >   else {
>    >     int size=tm->tm_size;
>    >     for (j = tm->tm_nppage; --j >= 0; p += size) {
>    >       object x = (object) p; 
>    >       if (SGC_OR_M(x)) continue;
>    >       sgc_mark_object1(x);
>    >     }
>    >   }
>    >        }
>    >      }
>    > 
>    > 
>    >    I'll have to think a bit about why this was put in, but in short, sgc
>    >    need not be a performance boost if most of your memory is writable
>    >    anyway, and might (speculation) even be less efficient.  This pertains
>    >    to that note I sent earlier about the enabling of SGC in ACL2, which
>    >    you asked me to clarify, and I never did, as I don't yet have a good
>    >    answer as to a metric to determine when SGC helps and when it does
>    >    not.  One thing should be clear, and that is many heap expansions,
>    >    which require an (si::sgc-on nil)(gbc t)(si::sgc-on t), are very
>    >    expensive, as these steps require that the read-only subset of memory
>    >    be determined all over again.
>    > 
>    >    To be a bit more verbose, SGC works by taking the memory set at time
>    >    of (si:sgc-on t), making it read-only, and allocating new writable
>    >    pages for use in subsequent allocation -- these are "SGC" pages,
>    >    i.e. the memory set is divided into two.  Write attempts on the old
>    >    memory make those pages writable, but they are not "SGC" pages as far
>    >    as the algorithm goes, and must be garbage collected separately.  The
>    >    idea is to enable sgc at the point where the most memory that will
>    >    essentially have no further modifications can be set aside.  Setting
>    >    aside a small amount of truly read-only memory, or a lot of memory
>    >    which has significant subsequent modifications, both suffer from
>    >    inefficiencies.
>    > 
>    >    >From looking at b.out, for example, half the time is spend in
>    >    sgc_mark_phase itself, and half in its main child process,
>    >    sgc_mark_object1.  This suggests to me that the ostensibly read-only
>    >    memory which was actually modified and thus made writable (writable
>    >    non-SGC pages) is substantial.  The loop shown above is similar to
>    >    that in the sweeper which is the main cost in normal GBC, so the more
>    >    memory it has to process, the worse performance will be.
>    > 
>    >    Anyway, these are just initial thoughts.  I'd like Vadim's input if
>    >    possible to construct for this case an analog of the maxima init file
>    >    he's just put together, so we can get a performance comparison with
>    >    ACL short of GBC.  This in turn will help focus our efforts on where
>    >    our out of the box algorithms and parameter settings can be improved. 
>    > 
>    >    Take care, 
>    > 
>    > 
>    >    Matt Kaufmann <address@hidden> writes:
>    > 
>    >    > Camm --
>    >    > 
>    >    > In case email from AMD to you is problematic, I've also just now 
> put the
>    >    > profile results on the web as a gzipped tar file:
>    >    > 
>    >    > /u/www/users/moore/acl2/seminar/temp/camm.tar.gz
>    >    > or
>    >    > http://www.cs.utexas.edu/users/moore/acl2/seminar/temp/camm.tar.gz
>    >    > 
>    >    > When you let me know you've received it, I'll delete this file.
>    >    > 
>    >    > -- Matt
>    >    > 
>    >    > 
>    >    > 
>    > 
>    >    -- 
>    >    Camm Maguire                                          address@hidden
>    >    
> ==========================================================================
>    >    "The earth is but one country, and mankind its citizens."  --  
> Baha'u'llah
>    > 
>    > 
>    > _______________________________________________
>    > Gcl-devel mailing list
>    > address@hidden
>    > http://mail.gnu.org/mailman/listinfo/gcl-devel
>    > 
>    > 
>    > 
> 
>    -- 
>    Camm Maguire                                               address@hidden
>    ==========================================================================
>    "The earth is but one country, and mankind its citizens."  --  Baha'u'llah
> 
> 
> 

-- 
Camm Maguire                                            address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah




reply via email to

[Prev in Thread] Current Thread [Next in Thread]