chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-hackers] [PATCH] Improve irregex matching performance a lot by


From: Peter Bex
Subject: [Chicken-hackers] [PATCH] Improve irregex matching performance a lot by adding two type declarations
Date: Tue, 8 Dec 2015 22:12:46 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi all,

I've done some benchmarking of the irregex unit and I found that the
single most performance-critical part of the code is "cset-contains?".

This procedure is used to check the current character in the input string
against the regex DFA state's transition table while matching the input,
so as probably figured, it's called EXTREMELY often.

This procedure is compiled to pretty much optimal code (the inner loop
is compiled to a "goto loop" in C, which is the absolute best case in
CHICKEN), but it uses safe accessors.  Since the csets are never
exposed to the user, it's fine to use unsafe accessors here, so I tried
to add type declarations to the two arguments, and I'm happy to report
that this results in a very big improvement.

Using the uri-generic egg's irregex-based alternative from trunk:

-------------
(load "uri-generic.irregex.so") ;; Use the compiled version!
(load "../uri-generic.import.scm")
(import uri-generic)
(time 
  (let loop ((i 0))
     (unless (> i 10000)
       (uri-reference 
"http://call-with-current-continuation.org/foo/bar?mooh#qux";)
       (loop (add1 i)))))
-------------

Using the unpatched version:
1.336s CPU time, 0.028s GC time (major), 600061/2562 mutations (total/tracked), 
11/2823 GCs (major/minor)

Using the patched version:
0.828s CPU time, 0.036s GC time (major), 600061/2562 mutations (total/tracked), 
11/2823 GCs (major/minor)

The improvement in some other tests that I did weren't as drastic.  For
example the medea benchmarks are about 10% faster, but then it doesn't
rely so heavily on irregex as the uri-generic alternative above.  So
all in all, I think this is quite a worthwhile improvement to include.
In several cases where irregex-match is the bottleneck, this modified
version caused the bottleneck to move to other parts of the code.

Now, if you look at the patch you'll see that it changes irregex-core.scm,
which means it creates a "fork" against that part of the code in upstream.
I don't see a way to declare the type from the "outside", in irregex.scm,
unless we also copy that code body as a macro, but that feels really ugly
and will cause some code duplication in the resulting C code.

I really, really dislike patching the code for irregex itself, but in
this case it really pays off.  And the change is pretty straightforward:
just add two lines and reindent.  It'll create another maintenance burden
of course, but this is less invasive than the cond-expand just below the
definition of vector-copy (is that (still) needed?).

Cheers,
Peter

Attachment: 0001-Improve-irregex-matching-performance.chicken-5.patch
Description: Text Data

Attachment: 0001-Improve-irregex-matching-performance.master.patch
Description: Text Data

Attachment: signature.asc
Description: Digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]