[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Chicken-hackers] regex speedup
From: |
Jim Ursetto |
Subject: |
[Chicken-hackers] regex speedup |
Date: |
Tue, 11 Mar 2008 00:30:51 -0600 |
Core hackers,
The attached patch has been shown to speed up regex matching considerably
in a typical case. It should also do so across the board.
Background: I observed a 40KB POST request to Spiffy
(x-www-form-urlencoded) to require 1.8 seconds on my machine for a
response. Investigation showed the entire time was spent in
http:canonicalize-string. There was a suboptimal algorithm there
which generated a lot of garbage; I fixed that in the http egg.
However, a core patch will complete the fix. The problem is in
string-search-positions and ultimately in re-match. Any time
a scheme string is passed to pcre_exec(), it is copied.
For large strings, or a typical use case for string-search-positions
-- e.g. repeatedly calling it to get the next match -- the overhead
is very high.
The patch just changes the type from c-string to scheme-pointer,
as pcre_exec() takes a length argument and doesn't need a null
terminator.
With both changes made Spiffy response time drops from 1.8 s to 0.1 s.
For overkill purposes, here is some supporting data.
Prelude:
(use url http-utils)
(define (encode str)
(string-translate (url-encode str (list #\_ #\- #\* #\. #\@ #\space))
" " "+"))
(define raw ...) ; raw text, length 30029
(define cooked (encode raw)) ; length 36889
Timing:
,t (begin (http:canonicalize-string cooked) (void))
original w/regex patch
1.957 s elapsed 0.334 s elapsed
1.831 s in GC 0.23 s in GC
2465 major GCs 311 major GCs
http fixes w/regex patch
0.26 s elapsed 0.022 s elapsed
0.195 s in GC 0 s in GC
175 major GCs 0 major GCs
re-match.txt
Description: Text document
- [Chicken-hackers] regex speedup,
Jim Ursetto <=