chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Speed up srfi-69 by caching current min and max length based


From: Mario Domenech Goulart
Subject: Re: [PATCH] Speed up srfi-69 by caching current min and max length based on load and *actually resizing*
Date: Sun, 21 Mar 2021 18:29:32 +0100

Hi,

On Thu, 18 Mar 2021 11:18:52 +0100 Peter Bex <peter@more-magic.net> wrote:

> I'm sure you've all read Ben Hoyt's blog post from earlier this week,
> https://benhoyt.com/writings/count-words/
>
> Of course (duh) I tried to implement a version in CHICKEN.  So did
> Vasilij and Mario, and we all came to the conclusion that our performance
> on that simple task is abysmal, we can't even outperform Python.
>
> So I had a look and came up with one somewhat simple improvement to
> srfi-69; every time you insert something into a hash table, it checks
> if it needs to be resized.  This check involves a computation which is
> needlessly heavy; it multiplies the min and max load factor with the
> current length, calls "floor" and inexact->exact on the result, just to
> compare it with the new size.
>
> The attached patch moves these calculations to when the hash table is
> created or resized, so that it can just read it out when checking for
> resize.  It is a bit ugly because now we have 13 slots in the hash table
> object, and keeping track of all the indexes becomes a bit cumbersome.
> This could be changed in a refactor if necessary.
>
> Then, I noticed that the resize check doesn't actually even resize it
> when needed!  The argument passed to hash-table-resize! is "len", while
> I think it should be "newsiz".  I've also fixed that in the patch.
>
> This improves performance on my machine by getting the benchmark down
> from 12 seconds to 4 seconds using Vasilij's code at http://ix.io/2Ths.
> This puts us at least in the same ball park as Python, using the "simple"
> version at least (which is comparable in code to Vasilij's).

Thanks, Peter.  I've applied your patch and released srfi-69 0.4.2.

I could not reproduce the same performance improvements on my system
(CHICKEN 5.2.0, GCC 8.3.0, Linux, X86-64), but I did get some
improvement: the run time of the test program by Vasilij went from 8.8s
to 7.0s on my system.

I ran salmonella over all eggs that directly depend on srfi-69 and
everything seems to look ok.

Since such operation is not trivial, I'll document here the procedure to
test a patched egg with salmonella.  The challenge here is to make
salmonella (or, rather, chicken-install) pick the patched egg, not the
upstream one.

What I did, basically, was:

1. Clone eggs-5-latest

   $ git clone git://code.call-cc.org/eggs-5-latest

2. Use henrietta-cache-overlay
   (https://github.com/mario-goulart/henrietta-cache-overlay) to convert
   the directory layout in eggs-5-latest to a directory layout that
   chicken-install understands:

   $ henrietta-cache-overlay $PWD/eggs-5-latest $PWD/eggs

3. Patch srfi-69 with your changes in the eggs/srfi-69 directory (which
   is a symlink to eggs-5-latest/srfi-69/<latest-version>)

4. Tweak the `location' field of setup.defaults to point to the `eggs'
   directory created by henrieta-cache-overlay (step 2).

5. Determine the direct dependencies of srfi-69.  To do that, I sloppily
   parsed the .dot file generated by salmonella-html-report.  By default
   our salmonella machines don't keep those files, so I just fetched a
   recent salmonella log file from a salmonella machine and ran
   salmonella-html-report with the `--keep-dot-files' option:

   $ wget 
http://salmonella-linux-x86.call-cc.org/master-debugbuild/gcc/linux/x86/2021/03/19/salmonella.log.bz2
   $ bzip2 -d salmonella.log.bz2
   $ salmonella-html-report ­-keep-dot-files salmonella.log html
   $ cd rev-dep-gaphs
   $ grep ' -> _srfi_69;' srfi-69.dot | awk '{print $1}' | sed 's/^_//; s/_/-/g'
   json memoize spiffy-request-vars awful-static-pages glls lmdb-ht
   statistics srfi-171 object-evict shen callable-data-structures
   postgresql srfi-209 webview condition-utils beaker medea inotify
   pyffi protobuf srfi-99 git sql-de-lite yasos dataframe moremacros
   s11n awful kiwi hyde http-client sqlite3 string-utils srfi-29 utf8
   edn srfi-113 srfi-19 sdl2 lowdown fmt chicken-doc http-session iup
   vlist comparse pwdb minissh stty concurrent-native-callbacks
   levenshtein chicken-doc-admin

   Those are the 52 eggs which directly depend on srfi-69.

6. Now that we know the eggs which depend on srfi-69, we can use
   salmonella with the chicken installation whose setup.defaults has
   been tweaked to point to a local directory of egg sources.  In my
   case I used salmonella-epidemy to speed things up a little:

   $ salmonella-epidemy --keep-repo --instances=$(nproc) <eggs from step 5>

   Since we tweaked setup.defaults to point to the egg source directory
   where srfi-69 has been patched, salmonella will build all eggs with
   the patched srfi-69.

7. salmonella-epidemy (step 6) generates a salmonella.log file, which
   where we can see the results and compare with the ones we get from
   the salmonella machines.  The comparison I did manually (just checked
   what was expected to fail).

   $ salmonella-log-viewer salmonella.log | grep '^==='

I actually executed that on one of our salmonella machines, just to be
sure that all the system dependencies were properly installed, so that I
could compare the results obtained with the patched srfi-69 against the
ones with the upstream egg.

All the best.
Mario
-- 
http://parenteses.org/mario



reply via email to

[Prev in Thread] Current Thread [Next in Thread]