[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: memmem speedup
From: |
Ralf Wildenhues |
Subject: |
Re: memmem speedup |
Date: |
Mon, 7 Jan 2008 20:59:57 +0100 |
User-agent: |
Mutt/1.5.13 (2006-08-11) |
Hi Eric,
* Eric Blake wrote on Sun, Jan 06, 2008 at 02:23:33AM CET:
>
> I'd appreciate any reviews before checking it in.
Here's a rough glance at it. FWIW, the diff is not very readable
(there was a patch to diffutils out there for --more-readable).
> +/* We use the Two-Way string matching algorithm, which guarantees
> + linear complexity with constant space. Additionally, for long
> + needles, we also use a bad character shift table similar to the
> + Boyer-Moore algorithm to acheive sub-linear performance.
s/acheive/achieve/ (2 instances in the code)
> +/* Peform a critical factorization of NEEDLE, of length NEEDLE_LEN.
s/Peform/Perform/
> +static void *
> +two_way_short_needle (const unsigned char *haystack, size_t haystack_len,
> + const unsigned char *needle, size_t needle_len)
> +{
[...]
> +
> + /* Perform the search. Each iteration compares the right half
> + first. */
> + if (memcmp (needle, needle + period, suffix) == 0)
> + {
In order to test the code in both blocks following this test, you may
want to also test with a periodic needle in test-memmem.c:
{
const char input[] = "ABC ABCDAB ABCDABCDABDE";
const char *result = memmem (input, strlen (input), "ABCDABCD", 8);
ASSERT (result == input + 11);
}
This still won't give you full code coverage (but for the other corner
cases I'd probably need to go and read the description of the
algorithm ;-)
[...]
> diff --git a/tests/test-memmem.c b/tests/test-memmem.c
> index 976f293..e900e1c 100644
> --- a/tests/test-memmem.c
> +++ b/tests/test-memmem.c
> @@ -152,5 +152,32 @@ main (int argc, char *argv[])
> free (haystack);
> }
>
> + /* Check that a some long needles not present in a haystack can be
s/a some //
> + handled with sublinear speed. */
Nice work, BTW!
Cheers,
Ralf