By exploiting zebraidx's dumpdict option, I have been able to create an
Aspell dictionary and an accompanying lookup script, paving the way for
a "Did You Mean?" (alternative spelling) service against my Zebra
indexes. It is not perfect, but it is a decent start.
First is zebra2aspell.pl:
#!/usr/bin/perl
# zebra2aspell.pl - create an Aspell dictionary from a zebra index
# Eric Lease Morgan <address@hidden>
# October 5, 2006
# require
use strict;
# define the zebraidx and aspell binaries
my $ZEBRAIDX = '/usr/local/bin/zebraidx dumpdict';
my $ASPELL = '/usr/local/bin/aspell --lang=en create master /home/emorgan/idzebra-2.0.2/examples/marc21/aspell.dict';
# initialize input and output words
my @words;
# get the list of words from the index
open INPUT, "$ZEBRAIDX |" or die "Can't run $ZEBRAIDX: $!";
while ( <INPUT> ) {
    chomp;                      # get rid of trailing newline
    next if ( ! /^\d\d:\s/ );   # only look for word lines, not debugging
    s/^\d\d:\s+\d+\s//;         # remove "leader"
    s/\s-\d.*$//;               # remove "trailer"
    next if ( / / );            # no words containing spaces; why do they exist?
    next if ( /\W/ );           # no non-word characters
    next if ( /\d/ );           # no words containing digits
    next if ( ! $_ );           # skip lines left empty
    push @words, $_;
}
close INPUT;
# remove duplicates; from perl cookbook pg. 102
my %seen = ();
@words = grep { ! $seen{$_}++ } @words;
# build a list aspell can use
my $words;
foreach ( @words ) { $words .= $_ . "\n" }
# create a dictionary
open OUTPUT, "| $ASPELL" or die "Can't run $ASPELL: $!";
print OUTPUT $words;
close OUTPUT;
# done
exit;
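To try the filtering rules without a live index, the loop body can be pulled into a subroutine and fed a few lines by hand. This is only a sketch: extract_word() is a name made up here, and the sample lines are invented to match the regular expressions above, so the real dumpdict layout may well differ.

```perl
#!/usr/bin/perl
# sketch: the dumpdict line filter as a testable subroutine
use strict;
use warnings;

# return the bare word from one line, or undef if the line is rejected
sub extract_word {
    my ( $line ) = @_;
    chomp $line;
    return undef if $line !~ /^\d\d:\s/;   # not a word line
    $line =~ s/^\d\d:\s+\d+\s//;           # remove "leader"
    $line =~ s/\s-\d.*$//;                 # remove "trailer"
    return undef if $line =~ /\W/;         # spaces and other non-word characters
    return undef if $line =~ /\d/;         # digits
    return undef if $line eq '';           # nothing left
    return $line;
}

# invented sample lines (an assumption, not real zebraidx output)
my @sample = (
    "14:       3 history -1\n",
    "14:       1 history -7\n",
    "14:       2 world war -2\n",
    "14:       5 1984 -3\n",
    "some debugging output\n",
);

# filter, then remove duplicates with the same %seen idiom
my %seen;
my @words = grep { defined( $_ ) && ! $seen{$_}++ }
            map  { extract_word( $_ ) } @sample;
print "$_\n" foreach @words;
```

With these five lines only "history" survives: the duplicate, the phrase containing a space, the all-digit token, and the debugging line are all dropped.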
Next is lookup.pl. Usage: ./lookup.pl foobar
#!/usr/bin/perl
# lookup.pl - look up a word in an Aspell dictionary
# and return alternative spellings
# Eric Lease Morgan <address@hidden>
# October 5, 2006
# require
use Text::Aspell;
use strict;
# define
use constant DICTIONARY => './aspell.dict';
# get the query
my $query = $ARGV[0];
# branch accordingly
if ( ! $query ) { print "Usage: $0 word\n" }
else {
    # initialize dictionary
    my $dictionary = Text::Aspell->new;
    $dictionary->set_option( 'master', DICTIONARY );
    # get suggestions
    my @suggestions = $dictionary->suggest( $query );
    # display the suggestions
    print "Alternative spellings for $query:\n";
    foreach ( @suggestions ) { print $_, "\n" }
}
# done
exit;
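For an actual Did You Mean? service, one would probably offer alternatives only when the query is not already in the dictionary. Here is a rough sketch of that decision. The did_you_mean() helper and the StubSpeller package are made up for illustration; the stub stands in for a configured Text::Aspell object (which provides check() and suggest()) so the sketch runs without Aspell installed.

```perl
#!/usr/bin/perl
# sketch: decide whether to offer a "Did You Mean?" hint
use strict;
use warnings;

# $speller is anything with check() and suggest() methods, such as a
# configured Text::Aspell object; did_you_mean() is a made-up helper
sub did_you_mean {
    my ( $speller, $query, $max ) = @_;
    return () if $speller->check( $query );            # word is known
    my @suggestions = $speller->suggest( $query );
    $#suggestions = $max - 1 if @suggestions > $max;   # keep the top few
    return @suggestions;
}

# stand-in speller (hypothetical) with a two-word "dictionary"
package StubSpeller;
sub new     { return bless { words => [ 'history', 'histology' ] }, shift }
sub check   { my ( $self, $w ) = @_; return scalar grep { $_ eq $w } @{ $self->{words} } }
sub suggest { my ( $self ) = @_; return @{ $self->{words} } }

package main;
my $speller = StubSpeller->new;
my @hints = did_you_mean( $speller, 'histroy', 1 );
print "Did you mean: ", join( ', ', @hints ), "?\n" if @hints;
```

Swapping the stub for the $dictionary object from lookup.pl should be all that is needed, though I have not tried that here.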
--Eric "It Feels Good To Hack Again" Morgan
University Libraries of Notre Dame