From: paul POULAIN
Subject: Re: [Koha-devel] marc_word and searching
Date: Wed May 26 07:40:30 2004
User-agent: Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.6) Gecko/20040115
Stephen Hedges wrote:
> At what point does marc_word become so big and clunky that it becomes a liability instead of an asset? NPL's marc_word table is full of "junk" entries like "(pa." (picked up when an ISBN has "(pa.)" after it to denote a paperback) and other such MARC oddities. Ideally our stopword file should be expanded to catch all of this junk, but I haven't done that yet. Now we're talking about adding punctuation marks and single letters! I agree with Joshua that this is what should be done if we're going to depend on marc_word and expect meaningful search results. My question is: would it be more efficient to just use marc_subfield_table for these searches and forget about marc_word?
You're right, Stephen. I have another idea that could be coded quickly: in the MARC framework, we could add a checkbox called "do NOT index this subfield". If it is checked, the subfield would not be stored in marc_word (but would still be stored in marc_subfield_table).
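A minimal sketch of how the checkbox could be honoured when a record is saved, in Python with an in-memory SQLite database. The table layouts are simplified and hypothetical, the `no_index` column stands in for the proposed checkbox, and the `\w+` tokenizer is an assumption, not Koha's own word splitting. Every subfield still lands in marc_subfield_table; only flagged subfields are kept out of marc_word.

```python
import re
import sqlite3

# Hypothetical, heavily simplified tables; real Koha tables have more columns.
SCHEMA = """
CREATE TABLE marc_subfield_structure (
    tagfield TEXT,
    tagsubfield TEXT,
    no_index INTEGER DEFAULT 0  -- hypothetical "do NOT index this subfield" checkbox
);
CREATE TABLE marc_subfield_table (
    bibid INTEGER, tag TEXT, subfieldcode TEXT, subfieldvalue TEXT
);
CREATE TABLE marc_word (bibid INTEGER, tagsubfield TEXT, word TEXT);
"""


def non_indexed_subfields(db):
    """Return the (tag, subfield) pairs whose checkbox is ticked."""
    rows = db.execute(
        "SELECT tagfield, tagsubfield FROM marc_subfield_structure WHERE no_index = 1"
    )
    return set(rows)


def store_record(db, bibid, subfields):
    """Store every subfield, but add words to marc_word only for indexed ones."""
    skip = non_indexed_subfields(db)
    for tag, code, value in subfields:
        # The full subfield value is always kept in marc_subfield_table.
        db.execute(
            "INSERT INTO marc_subfield_table VALUES (?, ?, ?, ?)",
            (bibid, tag, code, value),
        )
        if (tag, code) in skip:
            continue  # checkbox ticked: nothing goes into marc_word
        # Assumed tokenizer: lowercase runs of word characters.
        for word in re.findall(r"\w+", value.lower()):
            db.execute(
                "INSERT INTO marc_word VALUES (?, ?, ?)", (bibid, tag + code, word)
            )
    db.commit()


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany(
    "INSERT INTO marc_subfield_structure VALUES (?, ?, ?)",
    [("020", "a", 1), ("245", "a", 0)],  # ISBN not indexed, title indexed
)
store_record(
    db, 1, [("020", "a", "0451524934 (pa.)"), ("245", "a", "Nineteen eighty-four")]
)
print([w for (w,) in db.execute("SELECT word FROM marc_word ORDER BY rowid")])
# prints ['nineteen', 'eighty', 'four']
```

With the ISBN subfield flagged, an entry like "(pa." never reaches marc_word, while the ISBN itself remains available in marc_subfield_table.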
(This also needs a script to clean up the existing database, which should be quite easy: foreach subfield in marc_subfield_structure { if checkbox checked { delete from marc_word where subfield = this one } } ...)

-- 
Paul POULAIN
Independent free software consultant
French-speaking Koha lead (free ILS, http://www.koha-fr.org)
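The one-off cleanup Paul describes could look roughly like this Python/SQLite sketch. It assumes a simplified, hypothetical layout in which `no_index` is the checkbox column and marc_word identifies subfields by a combined `tagsubfield` string such as "020a"; it also assumes a single framework and ignores any per-framework differences.

```python
import sqlite3

# Hypothetical, simplified tables; no_index stands in for the proposed checkbox.
SCHEMA = """
CREATE TABLE marc_subfield_structure (tagfield TEXT, tagsubfield TEXT, no_index INTEGER);
CREATE TABLE marc_word (bibid INTEGER, tagsubfield TEXT, word TEXT);
"""


def purge_non_indexed(db):
    """Delete marc_word rows for every subfield whose checkbox is ticked.

    marc_subfield_table is not touched, so the record data itself survives.
    Returns the number of marc_word rows deleted.
    """
    flagged = db.execute(
        "SELECT tagfield || tagsubfield FROM marc_subfield_structure WHERE no_index = 1"
    ).fetchall()
    deleted = 0
    for (tagsubfield,) in flagged:  # "foreach subfield ... if checkbox checked"
        cur = db.execute("DELETE FROM marc_word WHERE tagsubfield = ?", (tagsubfield,))
        deleted += cur.rowcount
    db.commit()
    return deleted


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany(
    "INSERT INTO marc_subfield_structure VALUES (?, ?, ?)",
    [("020", "a", 1), ("245", "a", 0)],
)
db.executemany(
    "INSERT INTO marc_word VALUES (?, ?, ?)",
    [(1, "020a", "pa"), (1, "020a", "0451524934"), (1, "245a", "nineteen")],
)
print(purge_non_indexed(db))  # prints 2
```

The loop mirrors the pseudocode one subfield at a time; a single `DELETE FROM marc_word WHERE tagsubfield IN (SELECT tagfield || tagsubfield FROM marc_subfield_structure WHERE no_index = 1)` would do the same work in one statement.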