[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-hackers] regex speedup

From: Jim Ursetto
Subject: [Chicken-hackers] regex speedup
Date: Tue, 11 Mar 2008 00:30:51 -0600

Core hackers,

The attached patch has been shown to speed up regex matching considerably
in a typical case.  It should also do so across the board.

Background: I observed a 40KB POST request to Spiffy
(x-www-form-urlencoded) to require 1.8 seconds on my machine for a
response.  Investigation showed the entire time was spent in
http:canonicalize-string.  There was a suboptimal algorithm there
which generated a lot of garbage; I fixed that in the http egg.

However, a core patch will complete the fix.  The problem is in
string-search-positions and ultimately in re-match.  Any time
a scheme string is passed to pcre_exec(), it is copied.
For large strings, or a typical use case for string-search-positions
-- e.g. repeatedly calling it to get the next match -- the overhead
is very high.

The patch just changes the type from c-string to scheme-pointer,
as pcre_exec() takes a length argument and doesn't need a null

With both changes made Spiffy response time drops from 1.8 s to 0.1 s.

For overkill purposes, here is some supporting data.

(use url http-utils)
(define (encode str)
  (string-translate (url-encode str (list #\_ #\- #\* #\. #\@ #\space))
                    " " "+"))
(define raw ...)    ; raw text, length 30029
(define cooked (encode raw))  ; length 36889

,t (begin (http:canonicalize-string cooked) (void))

original                w/regex patch

 1.957 s elapsed        0.334 s elapsed
 1.831 s in GC           0.23 s in GC
  2465 major GCs          311 major GCs

http fixes              w/regex patch

  0.26 s elapsed        0.022 s elapsed
 0.195 s in GC              0 s in GC
   175 major GCs            0 major GCs

Attachment: re-match.txt
Description: Text document

reply via email to

[Prev in Thread] Current Thread [Next in Thread]