[Koha-devel] MARC import issues
From: Stephen Hedges
Subject: [Koha-devel] MARC import issues
Date: Sun Aug 10 14:24:17 2003
I would like to update all the Koha developers on two issues that have come up
as NPL has been migrating to Koha. Both relate to the difference between the
way Koha stores bibliographic information and the way MARC (USMARC) records
structure the same information.
First a reminder of what those differences are. Koha subdivides bibliographic
information into three tables (basically): biblio, biblioitems, and items.
Biblio holds the basic information about the work -- title and author,
copyright date, that sort of thing. Biblioitems holds information about a
particular manifestation of the work -- item type, classification, actual
publication year (which may be different from the copyright year), etc. And
items stores information about individual copies of the particular
manifestation of the work -- price, barcode number, date acquired, etc. MARC,
on the other hand, currently subdivides bibliographic data into two areas:
"Bibliographic," which holds the information that Koha would put in biblio and
(some) biblioitems, and "Holdings," which has the information that Koha would
put in items and (some) biblioitems. It's the process of fitting two-part
information into a three-part database that leads to complications.
1. The process of importing MARC records results in one and only one row in
biblioitems for each row in biblio. That's because MARC makes an individual
record for each manifestation of a work, so when you import a MARC record you
are really importing each record into biblioitems, with a related row in biblio
to hold the information that cannot be mapped to biblioitems (author, title,
etc.). That, of course, is exactly backwards from the way Koha is designed to
work. The import works OK, but when you do a search, you get lots of duplicate
titles listed, because each printing or video or audio recording of a work has
its own row in biblio. That makes it very hard to decide which title listing
you want to look at more closely. Which "Gone with the wind" is the audio
recording?
MARC handles this problem by providing tag 245h for the "Medium" of the work,
surrounded by square brackets. (The title itself is in 245a.) Many MARC-based
library systems display this tag when showing the results of a search, so you
know that "Gone with the wind [audio recording]" is different from "Gone with
the wind [videorecording(DVD)]" or "Gone with the wind."
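A display along those lines could also be built from the two subfields at search-result time instead of being stored. The sketch below assumes the raw 245a and 245h strings are already in hand; the sub name `display_title` is hypothetical, and the punctuation handling only covers the common trailing ISBD marks (" /", " :", " ;", " =", ".").

```perl
use strict;
use warnings;

# Hypothetical helper: combine 245a (title) and 245h (medium) for display.
sub display_title {
    my ($title_a, $medium_h) = @_;
    # Drop trailing ISBD punctuation from 245a
    (my $title = $title_a) =~ s/\s*[\/:;=.]\s*$//;
    return $title unless defined $medium_h;
    # Keep only the bracketed medium, e.g. "[sound recording]"
    my ($medium) = $medium_h =~ /(\[[^\]]*\])/;
    return defined $medium ? "$title $medium" : $title;
}

print display_title('Gone with the wind /', '[sound recording] /'), "\n";
print display_title('Gone with the wind /', undef), "\n";
```

The first call prints "Gone with the wind [sound recording]" and the second prints plain "Gone with the wind", so the printed book and the recording are distinguishable in a result list.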
There are two solutions I can think of for this problem, neither of them very
satisfactory. One is to add a column to biblio to hold "medium" for the 245h
tag. That, of course, violates the whole philosophy of what the biblio table
is supposed to store. The other is to actually modify the title that is stored
in biblio. That's the workaround solution we are using at NPL. We
periodically run a crude but efficient script that handles the job:
use strict;
use warnings;
use C4::Context;

my $dbh = C4::Context->dbh;
my $sth_getformat = $dbh->prepare("SELECT bibid, subfieldvalue FROM
    marc_subfield_table WHERE tag = '245' AND subfieldcode = 'h'");
my $sth_gettitle = $dbh->prepare("SELECT title FROM biblio WHERE biblionumber = ?");
my $sth_put = $dbh->prepare("UPDATE biblio SET title = ? WHERE biblionumber = ?");
$sth_getformat->execute();
while (my ($bibid, $subfieldvalue) = $sth_getformat->fetchrow_array) {
    # assumes bibid and biblionumber are the same number in this database
    $sth_gettitle->execute($bibid);
    my ($title) = $sth_gettitle->fetchrow_array;
    $sth_gettitle->finish;
    # keep 245h up to its closing "]", dropping trailing " :" or " /";
    # skip rows that have no title or no bracketed medium
    next unless defined $title && $subfieldvalue =~ /(.+\])/;
    my $medium = $1;
    # skip titles that already carry the medium, so re-runs are harmless
    next if index($title, $medium) >= 0;
    $sth_put->execute("$title $medium", $bibid);
}
Could something similar be included as part of the MARC import process? It's
not elegant, but it does solve the problem. Or better yet, can anyone think of
a way to combine duplicate biblio rows into one biblio row? (Seems like this
would really screw up the relationships between tables.)
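As a first step toward combining duplicates, one could at least find the candidate groups: biblio rows whose normalized title and author match. The sketch below works on plain hashrefs such as `fetchrow_hashref` would return for `SELECT biblionumber, title, author FROM biblio`; the normalization rules and the sub names `norm_key` and `duplicate_groups` are assumptions. Actually merging would still mean repointing biblioitems and items to the surviving biblionumber, which is not shown.

```perl
use strict;
use warnings;

# Hypothetical: normalize a title or author string for duplicate matching
sub norm_key {
    my ($s) = @_;
    $s = defined $s ? lc $s : '';
    $s =~ s/\[[^\]]*\]//g;     # ignore an appended medium like "[sound recording]"
    $s =~ s/[[:punct:]]//g;    # drop punctuation
    $s =~ s/\s+/ /g;           # collapse whitespace
    $s =~ s/^ | $//g;          # trim
    return $s;
}

# Group biblionumbers by (title, author); return only groups with >1 row
sub duplicate_groups {
    my (@rows) = @_;
    my %groups;
    for my $row (@rows) {
        my $key = norm_key($row->{title}) . "\t" . norm_key($row->{author});
        push @{ $groups{$key} }, $row->{biblionumber};
    }
    return grep { @$_ > 1 } values %groups;
}
```

Each returned group is an array of biblionumbers that probably describe the same work, which a librarian could review before any rows are merged.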
2. While Koha stores the copyright date in biblio and the publication year in
biblioitems, MARC puts both in one tag (260c), which of course can only be
mapped to one Koha table.column. So currently the library importing their MARC
records into Koha has to decide which Koha table.column to fill, and then the
Koha Biblio.pm strips out the first date found in the 260c tag and puts it
there. This is not good, because: a) either the screens which display
biblio.copyrightdate or the screens which display biblioitems.publicationyear
are going to have nothing to display; b) we're losing information in the
import which could easily be retrieved; and c) it leads to inaccurate
information, because (in the US) if 260c has two dates, the first is always the
publication year and the second is the copyright date, and the current Koha
solution could end up putting the publication year in the copyright date
column.
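A small worked example of point c), using a typical US 260c value such as "1994, c1993." (an illustrative string, not taken from NPL's data). `first_date` approximates the current behavior described above (take the first four-digit year); `split_260c` applies the US convention of publication year first, copyright date second. Both sub names are hypothetical.

```perl
use strict;
use warnings;

# Roughly what happens now: the first 4-digit year wins
sub first_date {
    my ($c) = @_;
    return $c =~ /(\d{4})/ ? $1 : undef;
}

# US convention: first date = publication year, second = copyright date
sub split_260c {
    my ($c) = @_;
    return ($1, $2) if $c =~ /(\d{4}).+?(\d{4})/;
    return ($1, $1) if $c =~ /(\d{4})/;
    return (undef, undef);
}

my $c = '1994, c1993.';
my ($pub, $cpr) = split_260c($c);
print "first date only: ", first_date($c), "\n";
print "publicationyear: $pub, copyrightdate: $cpr\n";
```

If a library mapped 260c to biblio.copyrightdate, the first-date approach would store 1994 there, although the copyright date is 1993.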
Again, we periodically run a (crude) script at NPL to load both table.columns:
use strict;
use warnings;
use C4::Context;

my $dbh = C4::Context->dbh;
my $sth_get = $dbh->prepare("SELECT bibid, subfieldvalue FROM
    marc_subfield_table WHERE tag = '260' AND subfieldcode = 'c'");
my $sth_cprdate = $dbh->prepare("UPDATE biblio SET copyrightdate = ? WHERE
    biblionumber = ?");
my $sth_pubdate = $dbh->prepare("UPDATE biblioitems SET publicationyear = ?
    WHERE biblionumber = ?");
$sth_get->execute();
while (my ($bibid, $subfieldvalue) = $sth_get->fetchrow_array) {
    my ($pubdate, $cprdate);    # stay undef (NULL) if 260c has no date
    if ($subfieldvalue =~ /(\d{4}).+?(\d{4})/) {
        # two dates (US practice): publication year, then copyright date
        ($pubdate, $cprdate) = ($1, $2);
    } elsif ($subfieldvalue =~ /(\d{4})/) {
        # only one date (even in a long string like "[ca. 1993]"): use it for both
        ($pubdate, $cprdate) = ($1, $1);
    }
    # assumes bibid and biblionumber are the same number in this database
    $sth_cprdate->execute($cprdate, $bibid);
    $sth_pubdate->execute($pubdate, $bibid);
}
Again, could this somehow be worked into the MARC import process?
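One way both fixes might fit into the import is a single mapping step that turns the relevant MARC subfields into Koha column values before the biblio and biblioitems rows are written. This is a hypothetical sketch: `koha_columns_from_marc` is not an existing Koha function, it takes a plain hash of subfield values keyed like '245a' rather than any real Koha or MARC::Record API, and the sample values are illustrative.

```perl
use strict;
use warnings;

# Hypothetical import hook: MARC subfield values -> Koha column values.
# Input keys ('245a', '245h', '260c') are an assumed convention.
sub koha_columns_from_marc {
    my (%sf) = @_;
    my $title = defined $sf{'245a'} ? $sf{'245a'} : '';
    $title =~ s/\s*[\/:;=.]\s*$//;              # trailing ISBD punctuation
    if (defined $sf{'245h'} && $sf{'245h'} =~ /(\[[^\]]*\])/) {
        $title .= " $1";                         # append the medium
    }
    my ($pub, $cpr);                             # undef (NULL) if no date
    my $c = defined $sf{'260c'} ? $sf{'260c'} : '';
    if    ($c =~ /(\d{4}).+?(\d{4})/) { ($pub, $cpr) = ($1, $2) }
    elsif ($c =~ /(\d{4})/)           { ($pub, $cpr) = ($1, $1) }
    return {
        'biblio.title'                => $title,
        'biblio.copyrightdate'        => $cpr,
        'biblioitems.publicationyear' => $pub,
    };
}

my $cols = koha_columns_from_marc(
    '245a' => 'Gone with the wind /',
    '245h' => '[sound recording] /',
    '260c' => '1994, c1993.',
);
print "$_ = $cols->{$_}\n" for sort keys %$cols;
```

Keeping the mapping in one pure function would let it be tested on sample records before it touches the database.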
Stephen Hedges
Director, Nelsonville Public Library
95 W. Washington St., Nels.,OH 45764
(740) 753-2118 fax (740) 753-3543