Re: [Chicken-hackers] need irregex benchmark

From: matt welland
Subject: Re: [Chicken-hackers] need irregex benchmark
Date: Sun, 22 May 2011 15:26:18 -0700

On Sun, 2011-05-22 at 23:33 +0200, Peter Bex wrote:
> On Sun, May 22, 2011 at 11:25:44AM -0700, matt welland wrote:
> > Hi Felix,
> > 
> > It isn't a particularly complex benchmark but my logpro app relies
> > heavily on regexes and I'm seeing some somewhat slow performance. You
> > can get it here
> I tried to clone it, but wasn't able to update:
> $ fossil clone logpro.fossil
>                 Bytes      Cards  Artifacts     Deltas
> Sent:              53          1          0          0
> Received:         312          1          0          0
> Sent:             680         27          0          0
> fossil: *** time skew *** server is slow by 89.7 seconds
> Total network traffic: 641 bytes sent, 0 bytes received
> Rebuilding repository meta-data...
>   0.0% complete...
> project-id: (null)
> server-id:  78c9b6811813c697324d20c0ac39eb9850f38ca2
> admin-user: sjamaan (password is "7de28c")
> $ mkdir logpro
> $ cd logpro
> $ fossil open ../logpro.fossil
> $ ls
> $

My mistake. I created that repo with a newer version of fossil than the
one the server is running. Fossil's error messages and failure behavior
are a bit poor in that case, though. I think it will work if you try again.

> I have no idea what's going on here.  Christian was able
> to rebuild the repo structure using today's fossil:
> > We process some
> > very large log files and have many waivers, ignores and error patterns.
> > The procedure using 50% of the cycles, misc:line-match-regexs, merely
> > applies a list of regexes to a line of text looking for the first match.
> > I suspect there is a better way (suggestions welcome).
> Perhaps you can construct one big regex from the list with an
> (or X Y Z)-like combination.  If you need to know which regex
> was matched you can use (submatch X) or (submatch-named X), and
> query the result object to see which submatch is non-#f.
> When possible, irregex tries to collapse common prefixes.  So
> (or "aaabz" "aaacz") will be compiled to something equivalent
> to (seq "aaa" (or "b" "c") "z").  Of course the prefix can be
> easily checked in a loop.  The suffixes are two state changes
> which only need to check their two characters.
> If you match something like "aaaby" against "aaabz" and *then*
> against "aaacz", it will need to check the prefix twice.
> I apologize if this is obvious to you; I haven't been able to
> look at your code, so I'm basically just guessing.  Perhaps
> you can enable the code browser for anonymous users?

It did occur to me that sticking all the regexes into a single regex
might be faster, but now I have an idea why :) However, the subtleties
concern me. Will it stop at the first match? Does it work left to right?
Thanks for the idea, I will experiment with it.
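
To make the subtlety concrete, here is a rough sketch of the two
approaches. It uses Python's re module rather than irregex, so it only
illustrates the idea and says nothing definite about how irregex orders
its alternatives. The pattern names and patterns are made up for the
example and are not taken from logpro.

```python
import re

# Hypothetical patterns standing in for waivers, ignores and error rules.
PATTERNS = [
    ("ignore", r"^DEBUG"),
    ("waiver", r"error: known issue \d+"),
    ("error", r"error"),
]


def first_match_loop(line, compiled):
    """Try each regex in list order; return the name of the first that matches."""
    for name, rx in compiled:
        if rx.search(line):
            return name
    return None


def build_combined(patterns):
    """Join all patterns into one alternation, one named group per pattern."""
    return re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in patterns))


def first_match_combined(line, combined):
    """Search once; the named group that matched tells us which pattern hit."""
    m = combined.search(line)
    return m.lastgroup if m else None


compiled = [(name, re.compile(pat)) for name, pat in PATTERNS]
combined = build_combined(PATTERNS)

for line in ["DEBUG error", "error: known issue 42", "all good"]:
    print(line, first_match_loop(line, compiled), first_match_combined(line, combined))
```

In Python, the combined regex reports the match that starts leftmost in
the line, and list order only breaks ties between alternatives starting
at the same position. The loop always honours list order. So for the
patterns ("a", "bar") and ("b", "foo") and the line "foo bar", the loop
answers "a" but the combined regex answers "b". Whether irregex behaves
the same way is something I would have to test.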

> > On a separate note I'd like to turn logpro and megatest into egg apps
> > someday. I read the distributed egg system docs and will give it a go
> > one of these days....
> I'd be interested to hear how this goes; I was unable to convince
> fossil to create pre-packaged tarballs or point to the "tip" of
> a file through the web interface.

The more recent versions will make a tarball (browse the timeline to a
version node), and you can point to the tip of a file using the
doc/trunk path, for example:

> Cheers,
> Peter
