
[Gcl-devel] Re: gzipped tar file on profiling


From: Matt Kaufmann
Subject: [Gcl-devel] Re: gzipped tar file on profiling
Date: Thu, 18 Dec 2003 10:33:19 -0600

Good morning --

Seems like a tricky business!  But I'm glad you're encouraged.

Yes, we run with *notify-gbc* on.  I'll send you (and spare the CC people) the
gc messages from the runs in a moment, from my AMD account.

>> Also, a (room) before and after would be of interest, as well as the
>> effect of (si:sgc-on nil).

I started re-running the suite last night with
(si::set-gmp-allocate-relocatable t), (room) before and after, and
(si::sgc-on nil).  It's still running.  A single run (corresponding to file
b.out that I sent you) that took under 2 hours before has already taken over 6
hours and appears to be only about a third of the way done.  I'll send you
results when that's done.

-- Matt
   cc: address@hidden, address@hidden, address@hidden,
           address@hidden
   From: Camm Maguire <address@hidden>
   Date: 17 Dec 2003 17:11:44 -0500
   User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
   Content-Type: text/plain; charset=us-ascii

   Thanks Matt!  I got it now.  I wonder if you had also saved the gbc
   output -- if I recall you usually run with *notify-gbc* turned on.
   Also, a (room) before and after would be of interest, as well as the
   effect of (si:sgc-on nil).

   On the one hand, GBC is clearly the performance culprit, and that is
   very good news, as it is relatively easy to address with a better
   memory layout.  What is odd about your profiling, though, is that
   marking is taking up the lion's share of the time.  In normal GBC, it's
   the sweeper that costs, at least in the tests I've run so far.  And I
   think I know why: it is the following extra pass that's required with
   sgc marking:

     /* Mark all non-recent data on writable pages: objects on old (pre-SGC)
        pages that have been made writable since sgc-on. */
     {
       int t, i = page(heap_end);
       int j;   /* loop index; declared here so the excerpt stands alone */
       struct typemanager *tm;
       char *p;

       while (--i >= 0) {
         /* Skip read-only pages and pages without a valid object type. */
         if (!WRITABLE_PAGE_P(i) || (t = type_map[i]) >= (int) t_end)
           continue;
         tm = tm_of(t);
         p = pagetochar(i);
         if (t == t_cons)
           for (j = tm->tm_nppage; --j >= 0;
                p += tm_table[t_cons].tm_size /* sizeof(struct cons) */) {
             object x = (object) p;
             if (SGC_OR_M(x))
               continue;
             if (x->d.t == t_cons) {
               x->d.m = TRUE;
               sgc_mark_cons(x);
             } else
               sgc_mark_object1(x);
           }
         else {
           int size = tm->tm_size;
           for (j = tm->tm_nppage; --j >= 0; p += size) {
             object x = (object) p;
             if (SGC_OR_M(x))
               continue;
             sgc_mark_object1(x);
           }
         }
       }
     }


   I'll have to think a bit about why this was put in, but in short, sgc
   need not be a performance boost if most of your memory is writable
   anyway, and might (speculation) even be less efficient.  This pertains
   to that note I sent earlier about the enabling of SGC in ACL2, which
   you asked me to clarify and I never did, as I don't yet have a good
   metric for determining when SGC helps and when it does not.  One
   thing should be clear: many heap expansions, each of which requires
   an (si::sgc-on nil)(gbc t)(si::sgc-on t) sequence, are very
   expensive, as these steps require that the read-only subset of memory
   be determined all over again.
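To put a rough number on why many expansions hurt, here is a back-of-the-envelope sketch in C. It assumes (this is not stated in the thread) that each expansion's (si::sgc-on nil)(gbc t)(si::sgc-on t) sequence costs one unit of work per heap page after the expansion, since both the full collection and the re-protection touch every page. The function names and page counts are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cost model, an assumption rather than measured GCL behaviour:
   every expansion forces a full GC plus re-protection of the whole heap,
   costing one unit per page in the heap after the expansion. */

/* Heap grows by a fixed number of pages on each expansion. */
static uint64_t work_fixed_step(uint64_t initial_pages, uint64_t step_pages,
                                unsigned expansions)
{
    uint64_t pages = initial_pages, work = 0;
    for (unsigned k = 0; k < expansions; ++k) {
        pages += step_pages;
        work += pages;            /* full GC + re-protect touch every page */
    }
    return work;
}

/* Heap doubles on each expansion. */
static uint64_t work_doubling(uint64_t initial_pages, unsigned expansions)
{
    assert(expansions < 64);      /* keep the page count from overflowing */
    uint64_t pages = initial_pages, work = 0;
    for (unsigned k = 0; k < expansions; ++k) {
        pages *= 2;
        work += pages;
    }
    return work;
}
```

Under this model, growing from 100 to 800 pages in steps of 10 pages takes 70 expansions and costs work_fixed_step(100, 10, 70) = 31850 units, while doubling gets there in 3 expansions for work_doubling(100, 3) = 1400 units: the total grows quadratically with the number of fixed-size expansions.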

   To be a bit more verbose, SGC works by taking the memory set at the
   time of (si:sgc-on t), making it read-only, and allocating new writable
   pages for use in subsequent allocation -- these are "SGC" pages,
   i.e. the memory set is divided into two.  Write attempts on the old
   memory make those pages writable, but they are not "SGC" pages as far
   as the algorithm goes, and must be garbage collected separately.  The
   idea is to enable sgc at the point where the most memory that will
   essentially have no further modifications can be set aside.  Setting
   aside a small amount of truly read-only memory, or a lot of memory
   which has significant subsequent modifications, both suffer from
   inefficiencies.
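The page bookkeeping just described can be modelled as a small state machine. The C sketch below is a toy: the state names, MAX_PAGES and the toy_* helpers are hypothetical, and the write fault that unprotects an old page (typically a page-protection trap in a real implementation) is simulated by a plain function call. It shows why writing to old memory creates extra work: the page becomes writable but never becomes an SGC page.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 64   /* hypothetical toy heap capacity */

enum page_state {
    PAGE_FREE,          /* not in use */
    PAGE_IN_USE,        /* in use while SGC is off */
    PAGE_OLD_READONLY,  /* present at (si:sgc-on t), not written since */
    PAGE_OLD_WRITABLE,  /* present at sgc-on, but written to since */
    PAGE_SGC            /* allocated after sgc-on */
};

struct sgc_heap {
    enum page_state state[MAX_PAGES];
    size_t npages;
};

/* (si:sgc-on t): every page in use becomes read-only "old" memory. */
static void toy_sgc_on(struct sgc_heap *h)
{
    for (size_t i = 0; i < h->npages; ++i)
        if (h->state[i] != PAGE_FREE)
            h->state[i] = PAGE_OLD_READONLY;
}

/* Allocation after sgc-on lands on a fresh SGC page; returns its index,
   or (size_t)-1 when the toy heap is full. */
static size_t toy_alloc_page(struct sgc_heap *h)
{
    if (h->npages == MAX_PAGES)
        return (size_t)-1;
    h->state[h->npages] = PAGE_SGC;
    return h->npages++;
}

/* A write to an old read-only page makes it writable; it is still not
   an SGC page.  Writes to other pages change nothing here. */
static void toy_write(struct sgc_heap *h, size_t page)
{
    assert(page < h->npages);
    if (h->state[page] == PAGE_OLD_READONLY)
        h->state[page] = PAGE_OLD_WRITABLE;
}

/* Pages an SGC collection has to look at in this model: the SGC pages
   plus every old page that has been written to. */
static size_t toy_pages_to_scan(const struct sgc_heap *h)
{
    size_t n = 0;
    for (size_t i = 0; i < h->npages; ++i)
        if (h->state[i] == PAGE_SGC || h->state[i] == PAGE_OLD_WRITABLE)
            ++n;
    return n;
}
```

For example, with ten pages in use at sgc-on, one new allocation, and writes to two of the old pages, toy_pages_to_scan reports 3: one SGC page plus two dirtied old pages that SGC cannot treat as read-only any more.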

   From looking at b.out, for example, half the time is spent in
   sgc_mark_phase itself, and half in the main function it calls,
   sgc_mark_object1.  This suggests to me that the ostensibly read-only
   memory which was actually modified and thus made writable (writable
   non-SGC pages) is substantial.  The loop shown above is similar to
   that in the sweeper which is the main cost in normal GBC, so the more
   memory it has to process, the worse performance will be.
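To make the cost of the loop shown earlier visible, it can be reduced to a self-contained toy: it examines every object slot on every writable, in-use, non-SGC page, so its work grows with the number of such pages. The structures below (toy_heap, NPAGES, OBJS_PER_PAGE, the page_type enum) are hypothetical stand-ins for GCL's type_map, WRITABLE_PAGE_P and SGC_OR_M; the separate cons path and the pointer tracing done by sgc_mark_cons and sgc_mark_object1 are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8          /* hypothetical heap size in pages */
#define OBJS_PER_PAGE 4   /* hypothetical objects per page (cf. tm_nppage) */

/* T_UNUSED plays the role of t_end: anything at or above it is skipped. */
enum page_type { T_CONS, T_OTHER, T_UNUSED };

struct toy_heap {
    enum page_type type_map[NPAGES];      /* like type_map[] */
    bool writable[NPAGES];                /* like WRITABLE_PAGE_P(i) */
    bool sgc_page[NPAGES];                /* page allocated after sgc-on */
    bool marked[NPAGES][OBJS_PER_PAGE];   /* like the mark bit x->d.m */
};

/* Marks every unmarked object on writable, in-use, non-SGC pages and
   returns the number of object slots examined (a proxy for the cost). */
static size_t toy_mark_writable_old(struct toy_heap *h)
{
    size_t examined = 0;
    for (int i = NPAGES - 1; i >= 0; --i) {       /* top-down, as in the original */
        if (!h->writable[i] || h->type_map[i] >= T_UNUSED)
            continue;                              /* read-only or unused page */
        for (int j = OBJS_PER_PAGE - 1; j >= 0; --j) {
            ++examined;
            /* The SGC test is per page here; SGC_OR_M works per object. */
            if (h->sgc_page[i] || h->marked[i][j])
                continue;
            h->marked[i][j] = true;                /* treat as a root; no tracing */
        }
    }
    return examined;
}
```

Each additional dirtied old page adds OBJS_PER_PAGE slots to `examined`, which is the model's version of the observation above: the more old memory gets modified, the more this pass costs.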

   Anyway, these are just initial thoughts.  If possible, I'd like
   Vadim's input in constructing, for this case, an analog of the maxima
   init file he's just put together, so we can get a performance
   comparison with ACL excluding GBC.  This in turn will help focus our
   efforts on where our out-of-the-box algorithms and parameter settings
   can be improved.

   Take care, 


   Matt Kaufmann <address@hidden> writes:

   > Camm --
   > 
   > In case email from AMD to you is problematic, I've also just now put the
   > profile results on the web as a gzipped tar file:
   > 
   > /u/www/users/moore/acl2/seminar/temp/camm.tar.gz
   > or
   > http://www.cs.utexas.edu/users/moore/acl2/seminar/temp/camm.tar.gz
   > 
   > When you let me know you've received it, I'll delete this file.
   > 
   > -- Matt
   > 
   > 
   > 

   -- 
   Camm Maguire                                         address@hidden
   ==========================================================================
   "The earth is but one country, and mankind its citizens."  --  Baha'u'llah



