Re: [Koha-zebra] Zebra and non-filing characters
From: Sebastian Hammer
Subject: Re: [Koha-zebra] Zebra and non-filing characters
Date: Thu, 29 Dec 2005 11:05:24 -0500
User-agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923)
Paul POULAIN wrote:
Sebastian Hammer wrote:
Joshua Ferraro wrote:
Hello everyone,
This is just generic question regarding Zebra's handling of
MARC non-filing characters. I know there is a 'stopwords'-like
function available using the 'map' directive:
map (^The\s) @
but I'm wondering whether Zebra is also capable of examining the
non-filing character specs within each MARC field to decide
whether to index or not to index ...
You mean using an indicator in the field to determine how many
characters to skip? To the best of my knowledge, this is not
supported at present, sorry.
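For readers unfamiliar with the scheme under discussion, here is a minimal sketch of indicator-driven skipping. It is not something Zebra does (as stated above, this is unsupported); it assumes the MARC21 convention that a single-digit indicator on the field gives the number of leading characters to ignore. The function name `filing_form` is hypothetical.

```python
def filing_form(title: str, nonfiling: str) -> str:
    """Drop the leading characters counted by a MARC21-style non-filing indicator.

    `nonfiling` is the raw indicator value: a single digit "0".."9",
    or anything else (for example a blank) when the cataloger left it unset.
    """
    # Treat a missing or non-digit indicator as "skip nothing".
    skip = int(nonfiling) if nonfiling in set("0123456789") else 0
    return title[skip:]


print(filing_form("The Hobbit", "4"))
print(filing_form("The Hobbit", " "))
```

With an indicator of 4 the article and its trailing space are skipped; with a blank indicator the title is indexed unchanged, which is the failure mode when the cataloger forgets to set it.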
It would really be a nice feature, at least for MARC-loving catalogers
(who still exist!)
What I don't like about that approach anyway is that it leaves it
ambiguous what happens when the user puts a leading article into a
search term... I think you'd be better off just configuring the system
to ignore the most common leading articles as described above.
Pro: it will work even if the cataloger forgets to set the indicator,
which makes the indicators less and less useful.
Con: MARC-loving catalogers will hate such behaviour, because there
are a few exceptions. I think I can predict the noise French catalogers
will make ;-)
But I think the issue with searching is pretty serious, though. I've
been noticing lately a few Z39.50 servers that will return zero hits for
a full-field search if the user forgets (or doesn't know) to remove any
leading article themselves. Now even for a MARC-fetishist, I think that is
just plain wrong. If you are going to eliminate leading articles from
searches, the least you can do is make it optional.
One way to do that with the dumb MARC21 character-skipping scheme would
be to generate two indexing entries for phrase indexes -- with and
without the offending leading article. That would fix searching, but it
would be a problem for sorting unless we were careful.
Browsing can also be a challenge.
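The two-entry idea can be sketched roughly as follows. This is only an illustration of the suggestion above, not Zebra's indexing code: the function names and the whitespace/case normalization are assumptions. It emits both phrase forms for searching but a single key for sorting, which is one way of being "careful" about the sort order.

```python
def normalize(text: str) -> str:
    """Lower-case and collapse whitespace (an assumed, simplistic normalization)."""
    return " ".join(text.lower().split())


def phrase_keys(title: str, nonfiling: int) -> tuple[list[str], str]:
    """Return (search keys, sort key) for a phrase index.

    Search keys contain the title with and without the leading article,
    so a query matches either way. The sort key is always the form
    without the article, so each record sorts in exactly one place.
    """
    full = normalize(title)
    stripped = normalize(title[nonfiling:])
    search_keys = [full]
    if stripped != full:
        search_keys.append(stripped)
    return search_keys, stripped


print(phrase_keys("The Hobbit", 4))
print(phrase_keys("Hobbit", 0))
```

Browsing is left open here: a browse list built from the search keys would show each such title twice, which is part of why it remains a challenge.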
My vote would be to start with the prefix-ignoring list, which in my
experience is enough to satisfy 99.9% of librarians, most of whom have
no clue about that feature of MARC21 anyway. Leave the other stuff as a
nice-to-have to be addressed at leisure at some point when we're
re-examining that part of the indexing logic anyway.
--Sebastian
It is true that this would require separate configuration for
different languages, but you probably wouldn't get around that
anyway, since many non-English-speaking countries use other record
formats than MARC21, and the use of indicators to control indexing is
not universal. The Danish MARC (cleverly named DANMARC) format, for
instance, uses a special character inside of the subfields to mark the
part which should not be indexed.
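A per-language prefix-ignoring list might look something like the sketch below. The language codes, the article lists, and the function name are illustrative assumptions, not a shipped configuration. Applying the same function to indexed phrases and to incoming search terms avoids the zero-hits problem when the user types the article.

```python
# Hypothetical per-language lists of leading articles (deliberately incomplete).
LEADING_ARTICLES = {
    "eng": ("the", "a", "an"),
    "fre": ("le", "la", "les", "l'", "un", "une", "des"),
}


def strip_leading_article(phrase: str, lang: str) -> str:
    """Remove one leading article for the given language, if present."""
    text = phrase.strip()
    lowered = text.lower()
    for article in LEADING_ARTICLES.get(lang, ()):
        # Elided articles such as "l'" attach directly to the next word;
        # all others must be followed by a space to count as a match.
        prefix = article if article.endswith("'") else article + " "
        if lowered.startswith(prefix) and len(text) > len(prefix):
            return text[len(prefix):].lstrip()
    return text


print(strip_leading_article("The Hobbit", "eng"))
print(strip_leading_article("Les Misérables", "fre"))
print(strip_leading_article("L'étranger", "fre"))
```

A plain list like this cannot handle titles where the first word only looks like an article, so those exceptions would still be indexed without it.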
In what is already developed in Koha 3.0, we will clearly have
UNIMARC-french, MARC21-english, and probably other MARC-language
flavours. So I agree with you.
Happy new year to everyone, with lots of free software & happiness!
--
Sebastian Hammer, Index Data
address@hidden www.indexdata.com
Ph: (603) 209-6853