chicken-hackers

Re: [Chicken-hackers] need irregex benchmark


From: Peter Bex
Subject: Re: [Chicken-hackers] need irregex benchmark
Date: Sun, 22 May 2011 23:33:45 +0200
User-agent: Mutt/1.4.2.3i

On Sun, May 22, 2011 at 11:25:44AM -0700, matt welland wrote:
> Hi Felix,
> 
> It isn't a particularly complex benchmark, but my logpro app relies
> heavily on regexes and I'm seeing somewhat slow performance. You
> can get it here: http://www.kiatoa.com/fossils/logpro.

I tried to clone it, but wasn't able to check out any files:

$ fossil clone http://www.kiatoa.com/fossils/logpro logpro.fossil
                Bytes      Cards  Artifacts     Deltas
Sent:              53          1          0          0
Received:         312          1          0          0
Sent:             680         27          0          0
fossil: *** time skew *** server is slow by 89.7 seconds
Total network traffic: 641 bytes sent, 0 bytes received
Rebuilding repository meta-data...
  0.0% complete...
project-id: (null)
server-id:  78c9b6811813c697324d20c0ac39eb9850f38ca2
admin-user: sjamaan (password is "7de28c")
$ mkdir logpro
$ cd logpro
$ fossil open ../logpro.fossil
$ ls
_FOSSIL_
$

I have no idea what's going on here.  Christian was able
to rebuild the repo structure using today's fossil:
http://paste.call-cc.org/paste?id=16cedff25f7cfa8bc83f6cb677bad9ba8e02274f

> We process some
> very large log files and have many waivers, ignores and error patterns.
> The procedure using 50% of the cycles, misc:line-match-regexs, merely
> applies a list of regexes to a line of text looking for the first match.
> I suspect there is a better way (suggestions welcome).

Perhaps you can construct one big regex from the list with an
(or X Y Z)-like combination.  If you need to know which regex
was matched, you can wrap each alternative in (submatch X) or
(submatch-named name X) and query the match object (for example
with irregex-match-substring) to see which submatch is non-#f.
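For comparison, here is a minimal sketch of the same idea written with
Python's `re` module rather than irregex (a stand-in, since I haven't
seen the logpro code).  The pattern list and its names are hypothetical.
Each pattern becomes a named group inside one alternation, and
`m.lastgroup` reports which alternative matched, much like checking
which named submatch is non-#f.

```python
import re

# Hypothetical patterns standing in for logpro's waivers, ignores and errors.
PATTERNS = [
    ("waiver", r"WARN: known issue \d+"),
    ("ignore", r"^DEBUG"),
    ("error", r"ERROR|FATAL"),
]

def naive_first_match(line):
    """Loop over the list, running one search per pattern."""
    for name, pat in PATTERNS:
        if re.search(pat, line):
            return name
    return None

# One combined alternation; each branch is wrapped in a named group
# so we can tell afterwards which branch produced the match.
COMBINED = re.compile("|".join("(?P<%s>%s)" % (name, pat) for name, pat in PATTERNS))

def combined_first_match(line):
    """Single search over the combined regex; lastgroup names the branch."""
    m = COMBINED.search(line)
    return m.lastgroup if m else None
```

One semantic difference to watch for: the combined search reports the
branch that matches leftmost in the line, while the loop reports the
first pattern in list order.  For "ERROR then WARN: known issue 7" the
loop says "waiver" but the combined regex says "error", so if list
order encodes priority you need to account for that.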

When possible, irregex tries to collapse common prefixes.  So
(or "aaabz" "aaacz") will be compiled to something equivalent
to (seq "aaa" (or "b" "c") "z").  The shared prefix can then be
checked just once, in a simple loop, and the remainder is only
two state changes, each of which needs to check a single
character ("b" or "c", then "z").
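To make the prefix collapsing concrete, here's a toy sketch in Python
that factors the shared prefix and suffix out of two literal
alternatives.  It's only an illustration of the rewrite; it is not how
irregex implements it, and the function name factor_literals is made
up for this example.

```python
import os
import re

def factor_literals(a, b):
    """Rewrite the literal alternation a|b as prefix(?:x|y)suffix.

    Toy version of (or "aaabz" "aaacz") -> (seq "aaa" (or "b" "c") "z").
    """
    # Longest common prefix of the two literals.
    prefix = os.path.commonprefix([a, b])
    rest_a, rest_b = a[len(prefix):], b[len(prefix):]
    # Longest common suffix of what remains (reverse, common prefix, reverse).
    suffix = os.path.commonprefix([rest_a[::-1], rest_b[::-1]])[::-1]
    mid_a = rest_a[:len(rest_a) - len(suffix)]
    mid_b = rest_b[:len(rest_b) - len(suffix)]
    return "%s(?:%s|%s)%s" % (re.escape(prefix), re.escape(mid_a),
                              re.escape(mid_b), re.escape(suffix))
```

Here factor_literals("aaabz", "aaacz") yields "aaa(?:b|c)z", which
accepts exactly the same strings as "aaabz|aaacz" but only has to walk
the "aaa" prefix once.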

If you match something like "aaaby" against "aaabz" and *then*
against "aaacz", it will need to check the prefix twice.

I apologize if this is obvious to you; I haven't been able to
look at your code, so I'm basically just guessing.  Perhaps
you can enable the code browser for anonymous users?

> On a separate note I'd like to turn logpro and megatest into egg apps
> someday. I read the distributed egg system docs and will give it a go
> one of these days....

I'd be interested to hear how this goes; I was unable to convince
fossil to create pre-packaged tarballs or point to the "tip" of
a file through the web interface.

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                                                        -- Donald Knuth
