mifluz-dev

[Mifluz-dev] Question re. word data


From: Steven J. DeRose
Subject: [Mifluz-dev] Question re. word data
Date: Sun, 27 Jan 2002 15:40:46 -0500

I've been looking at the Mifluz doc, and it looks quite nice. I'm wondering what it would take to enhance it to support storing the scope or extent of a token, rather than just a single integer.

The goal for me would be to make it able to do containment and structure queries on hierarchical data, particularly mbox files and XML. By storing a "word" record with start and end offset for each XML element, or each MIME mail message (and maybe each MIME header line), mifluz could find words only when they occur in a certain context.
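
To make the idea concrete, here is a minimal sketch in Python. It is not Mifluz code: the names `Extent` and `find_in_context` are hypothetical, and it assumes word occurrences are stored as integer offsets while each element or message is stored as a half-open (start, end) offset range.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    """Hypothetical 'extent' record: the span covered by one element or message."""
    start: int  # offset where the element begins (inclusive)
    end: int    # offset where the element ends (exclusive)


def find_in_context(word_offsets: List[int], extents: List[Extent]) -> List[int]:
    """Return the word offsets that fall inside at least one extent.

    This is a plain linear scan for clarity; a real index would keep both
    lists sorted and merge them instead.
    """
    return [
        pos
        for pos in word_offsets
        if any(ext.start <= pos < ext.end for ext in extents)
    ]


# Example data (made up): offsets of one word, and the extents of
# every "Subject" header line in an mbox file.
word_offsets = [3, 17, 42]
subject_extents = [Extent(0, 5), Extent(40, 45)]

hits = find_in_context(word_offsets, subject_extents)
print(hits)
```

With the made-up data above, only the occurrences at offsets 3 and 42 fall inside a Subject extent; the one at 17 is filtered out.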

A quick look suggests this would mainly involve adding a new record type besides DATA and STRING, and adding the necessary APIs to store, search, and retrieve such records.

I've built systems like this before that scaled into the 100s of MB per document, so I know most of the general constraints, but I don't know anything about the internals of Mifluz (yet). Does this sound like a feasible enhancement and approach?

Any advice appreciated.

Thanks!

--
Steve DeRose


