Re: [Chicken-hackers] [PATCH] Adding iset to core and using it for Unicode-capable SRFI-14.

On Mon, Feb 4, 2013 at 12:17 AM, Peter Bex <address@hidden> wrote:

On Mon, Feb 04, 2013 at 12:02:44AM +0900, Alex Shinn wrote:
> iset manages integer sets. The proposed srfi-14 is a thin
> wrapper around iset that first translates chars to
> integers. We could alternatively remove the integer
> interface and just use chars for everything.

That would require an entirely different implementation, wouldn't it?
At least, iset being based on bit-vectors wouldn't be able to
store characters as-is.

Right now we have:

(define (iset-contains? iset n)
  (... <lookup> iset n ...))

(define (char-set-contains? cset ch)
  (iset-contains? cset (char->integer ch)))

It's not a big change, but we could remove the wrapper:

(define (char-set-contains? cset ch)
  (... <lookup> cset (char->integer ch) ...))

Not much actual code is removed, but we do get rid of
a whole unit and its exported bindings.
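To make the two layerings concrete, here is a minimal, self-contained sketch in portable R7RS Scheme. It is not the iset egg's code: the toy set is just a sorted list of inclusive (start . end) integer ranges, and every name in it (make-toy-iset, toy-iset-contains?, toy-char-set-contains?, toy-char-set-contains?/inlined) is hypothetical. It only shows where char->integer sits in each option.

```scheme
;; Hypothetical toy integer set: a sorted list of non-overlapping
;; inclusive (start . end) ranges. This is NOT the iset egg's
;; representation; it only illustrates how the layers fit together.
(define (make-toy-iset . ranges) ranges)

;; Integer interface: is N inside one of the ranges?
(define (toy-iset-contains? iset n)
  (let loop ((rs iset))
    (cond ((null? rs) #f)
          ((< n (caar rs)) #f)   ; ranges are sorted, so stop early
          ((<= n (cdar rs)) #t)  ; start <= n <= end
          (else (loop (cdr rs))))))

;; Option 1: char-set as a thin wrapper over the integer interface.
(define (toy-char-set-contains? cset ch)
  (toy-iset-contains? cset (char->integer ch)))

;; Option 2: no separate integer interface; the same lookup is
;; inlined and the char->integer conversion happens inside it.
(define (toy-char-set-contains?/inlined cset ch)
  (let ((n (char->integer ch)))
    (let loop ((rs cset))
      (cond ((null? rs) #f)
            ((< n (caar rs)) #f)
            ((<= n (cdar rs)) #t)
            (else (loop (cdr rs)))))))

;; Usage: ASCII letters as two ranges, A-Z and a-z.
(define letters (make-toy-iset '(65 . 90) '(97 . 122)))
(toy-char-set-contains? letters #\q)          ; => #t
(toy-char-set-contains?/inlined letters #\5)  ; => #f
```

Both options answer the same queries; the difference is only packaging. Option 1 keeps a reusable integer-set procedure that has to be exported from its own unit, while option 2 drops that unit and its bindings at the cost of the char-set code owning the lookup itself.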

> > Why did you choose to import the entire iset egg? It would
> > make more sense to migrate the cset stuff from irregex into
> > srfi-14 and change irregex to use srfi-14, wouldn't it?
> > This would mean less code overall.
> >
>
> That's why I asked in advance what people wanted
> (is library size, runtime memory, or speed more important?),
> and since the only response I got was "could you provide
> a patch", I sent what I think is the best option.

I guess most of us prefer speed and perhaps runtime memory,
but I just had no idea the code would be this large :)

> If we want to trim down the size, there's quite a lot that
> can be removed from iset. Almost all of the bit-vector
> operations can be removed, and because of the odd way
> I defined them, I believe they take up most of the
> space.

I think if we need to have most of iset in core, it would
make sense to embrace it and export the interface so that
users can use it for their own code as well. I'm less sure
about the bit-vector API, so I think it would be great if
that could be boiled down to a minimum.

I had a look at it but didn't immediately see a lot that could
be removed. Every bit-vector operation was eventually used somewhere
deep in some procedure that made its way into an iset procedure.
Maybe this is just my unfamiliarity with the iset code.

Of the boolean ops, only bit-vector-ior is needed.

The optimization code is also not needed. After trimming
the obvious things, the result was only 70k larger than
4.8.0.1 (I haven't checked what other diffs that includes).
I suspect merging the units would bring this under 50k.
I'm not sure how much would be reclaimed by using this for
irregex. There are tricks to bum the code for size, but I
don't have a lot of time to spend on this.

SRFI-14, alas, is a huge API; we can't really make it
_small_.

Alex

From:	Alex Shinn
Subject:	Re: [Chicken-hackers] [PATCH] Adding iset to core and using it for Unicode-capable SRFI-14.
Date:	Mon, 4 Feb 2013 00:44:27 +0900