Re: [Koha-zebra] Re: [Koha-devel] playing with the zebra on cvs head


From: Mike Rylander
Subject: Re: [Koha-zebra] Re: [Koha-devel] playing with the zebra on cvs head
Date: Fri, 19 Aug 2005 11:51:17 +0000

On 8/19/05, Thomas D <address@hidden> wrote:
> If you were only using the leader, 000, that may be relatively trivial.
> However, even using 000 involves evaluating multiple values from positions
> 06 and 07 to determine whether printed material is a book or a serial.  I
> was working on some little Perl code a week or so ago to partially
> demonstrate the media type issue for your question from MARC 21
> bibliographic 000/06 /07 into 008, 006, 007, 245 $h $k, 300 $a $b $c $e
> depending upon the availability of the fields and subfields and how they are
> populated.  The UNIMARC equivalent would be 000/06 /07, 100, 105-140, 200
> $b, and 215.  The issue is ultimately much too complex for a simple test
> against indexes across all relevant fields and subfields at query time.
> 
> The media type issue becomes complex very quickly unless you only ever had
> one mechanised cataloguer creating all original records in a uniform manner,
> and then the issue would be a little less complex.

[snip]

> This is MARC; if you think something is simple, then you have not looked
> closely enough, considered carefully enough, or examined a sufficiently
> diverse set of records.  MARC is too complex for many purposes and not
> thorough enough for some desirable purposes.

This is actually the issue that I was getting at.  The problem isn't
indexing specific substrings from the leader or from 008, it's the
fact that in order to look up datapoint C you need to investigate the
existence and value of datapoints A and B.  As a simple example, there
are four positions that are involved, depending on format, in
discovering Form of Item (
http://www.oclc.org/bibformats/en/fixedfield/form.shtm ).  But worse
than that, at least one of those positions is used for something
different ( http://www.oclc.org/bibformats/en/fixedfield/conf.shtm )
depending on Type and Bibliographic Level.
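
To make that dependency concrete, here is a minimal Python sketch of
the lookup. It is an illustration under stated assumptions, not anyone's
actual implementation: the function names are hypothetical, the mapping
from Leader/06 and Leader/07 to a material type is deliberately
simplified to five MARC 21 layouts, and the 006 positions are not
handled at all.

```python
# Assumed, simplified mapping: which 008 position holds Form of Item
# for each material type (MARC 21 bibliographic, 0-based positions).
FORM_OF_ITEM_POS = {
    "books": 23,
    "continuing": 23,
    "music": 23,
    "maps": 29,
    "visual": 29,
}


def material_type(leader):
    """Datapoints A and B: Leader/06 (type) and Leader/07 (bib level)."""
    rec_type, bib_level = leader[6], leader[7]
    if rec_type == "a" and bib_level in ("b", "i", "s"):
        return "continuing"
    if rec_type in ("a", "t"):
        return "books"
    if rec_type in ("c", "d", "i", "j"):
        return "music"
    if rec_type in ("e", "f"):
        return "maps"
    if rec_type in ("g", "k", "o", "r"):
        return "visual"
    return None  # anything else is out of scope for this sketch


def form_of_item(leader, field008):
    """Datapoint C: only readable once the material type is known."""
    pos = FORM_OF_ITEM_POS.get(material_type(leader))
    if pos is None or len(field008) <= pos:
        return None
    return field008[pos]
```

Note that for a book record 008/29 is not Form of Item at all, so a
flat index on "008 position 29" would mix two unrelated datapoints.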

The only assumption I'm comfortable making right now is that, as
Thomas suggested, you would preprocess all the records to give them a
local use field that would store simplified versions of this
information.  If that's the case, my worry would be more with the
speed of inserts and updates than with searches.  Of course, the
overhead of this preprocessing most likely wouldn't be noticeable and
would be swamped by the standard Zebra insert and update overhead, so
it's not as much of a concern.
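
As a rough picture of what that preprocessing step might look like,
here is a small Python sketch. Everything in it is an assumption made
for illustration: the record is a plain dict rather than a real MARC
object, the local use tag "998" and the labels are hypothetical, and
the classification only looks at Leader/06 and Leader/07.

```python
LOCAL_TAG = "998"  # hypothetical local use field, chosen for this sketch


def simple_media_type(leader):
    """Collapse Leader/06 and Leader/07 into one coarse, assumed label."""
    rec_type, bib_level = leader[6], leader[7]
    if rec_type == "a" and bib_level == "s":
        return "serial"
    if rec_type == "a" and bib_level == "m":
        return "book"
    if rec_type == "g":
        return "projected"
    return "other"


def add_local_field(record):
    """Return a copy of the record with the simplified value attached.

    Any earlier copy of the local field is dropped first, so running
    the step again on update does not pile up duplicates.
    """
    fields = [(tag, value) for tag, value in record["fields"]
              if tag != LOCAL_TAG]
    fields.append((LOCAL_TAG, simple_media_type(record["leader"])))
    return {"leader": record["leader"], "fields": fields}
```

The work per record is a couple of character comparisons and a list
copy, which is the reason to expect it to be small next to the
indexing itself.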

Mike (Taylor), I know you're a busy man ;), but I would appreciate any
details you could give on this matter whenever you /do/ have time,
mostly to satisfy my own curiosity. TIA :)

> 
> The question then becomes will Zebra index any arbitrary set of data chosen
> for use in local use fields for retrieving in a precise manner?
> 

I think that, using exact rather than fulltext/stemmed/fuzzy
matching, the answer to that would be "yes."  But please correct me if
I'm wrong.

> 
> Thomas D

-- 
Mike Rylander
address@hidden
GPLS -- PINES Development
Database Developer
http://open-ils.org

> 
> Quoting Mike Taylor <address@hidden> :
> > ---------------- Beginning of the original message ------------------
> >
> > > Date: Thu, 18 Aug 2005 02:21:09 +0000
> > > From: Mike Rylander <address@hidden>
> > >
> > > Along these same lines, will the Zebra index be able to filter on
> > > parts of the MARC record's fixed fields?  As an example, one of the
> > > search requirements for Evergreen is to be able to limit a search to
> > > just books or just video recordings (type of record), or even to
> > > large print books (form of item).
> >
> > Have no fear -- this won't be a problem.
> >
> > No time to go into details now, but you can make plans on the
> > assumption that this kind of thing will work just fine.
> >
> >  _/|_
> > ___________________________________________________________________
> > /o ) \/  Mike Taylor  <address@hidden>
> > http://www.miketaylor.org.uk
> > )_v__/\  "Politicians, ad agencies, and other liars are prone
> > to using
> >        high-sounding, low-content, prose to back their points.
> > Heck,
> >        if people really understood what they were saying, they
> > might
> >        be in big trouble" -- Rheal Nadeau.
> >
> >
> > ------------------- End of the original message ---------------------
> 