koha-devel

[Koha-devel] dumpdict => something to investigate...


From: Paul POULAIN
Subject: [Koha-devel] dumpdict => something to investigate...
Date: Fri, 13 Oct 2006 15:04:56 +0200
User-agent: Thunderbird 1.5.0.7 (X11/20060909)

I'm forwarding a mail from Eric Lease Morgan about building a spelling dictionary from Zebra indexes. Sounds like an idea to investigate later...


Eric Lease Morgan wrote:

By exploiting zebraidx's dumpdict option I have been able to create an Aspell dictionary and accompanying lookup script paving the way for a Did You Mean? (alternative spelling) service against my zebra indexes. It is not perfect but a decent start.

First is zebra2aspell.pl:

  #!/usr/bin/perl

  # zebra2aspell.pl - create an Aspell dictionary from a zebra index

  # Eric Lease Morgan <address@hidden>
  # October 5, 2006

  # require
  use strict;
  use warnings;

  # define the zebraidx and aspell binaries
  my $ZEBRAIDX  = '/usr/local/bin/zebraidx dumpdict';
  my $ASPELL    = '/usr/local/bin/aspell --lang=en create master /home/emorgan/idzebra-2.0.2/examples/marc21/aspell.dict';

  # initialize input and output words
  my @words;

  # get the list of words from the index
  open INPUT, "$ZEBRAIDX |" or die "Can't run $ZEBRAIDX: $!\n";
  while ( <INPUT> ) {

      chomp;                     # get rid of trailing newline
      next if ( ! /^\d\d:\s/ );  # only look for word lines, not debugging
      s/^\d\d:\s+\d+\s//;        # remove "leader"
      s/\s-\d.*$//;              # remove "trailer"
      next if ( / / );           # no words containing spaces; why do they exist?
      next if ( /\W/ );          # no non-word characters
      next if ( /\d/ );          # no words containing digits
      next if ( ! $_ );          # skip empty strings
      push @words, $_;
  }
  close INPUT;

  # remove duplicates; from Perl Cookbook, p. 102
  my %seen = ();
  @words = grep { ! $seen{$_} ++ } @words;

  # build a list aspell can use
  my $words;
  foreach ( @words ) { $words .= $_ . "\n" }

  # create a dictionary
  open OUTPUT, "| $ASPELL" or die "Can't run $ASPELL: $!\n";
  print OUTPUT $words;
  close OUTPUT or die "aspell failed: $! $?\n";

  # done
  exit;
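
The per-line cleanup in zebra2aspell.pl can be pulled out into a small subroutine so the regular expressions can be tested without running zebraidx. What follows is a sketch of the same filters refactored that way. The subroutine names `clean_dumpdict_line` and `unique_words` are hypothetical, and the sample input lines are made up: the real `zebraidx dumpdict` output format is only inferred here from the script's own patterns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply zebra2aspell.pl's filters to one line of
# dumpdict output. Returns the bare word, or an empty list / undef if the
# line should be skipped.
sub clean_dumpdict_line {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^\d\d:\s/;   # word lines only, not debugging
    $line =~ s/^\d\d:\s+\d+\s//;         # remove "leader"
    $line =~ s/\s-\d.*$//;               # remove "trailer"
    return if $line eq '';               # nothing left
    return if $line =~ /\W/;             # spaces or other non-word characters
    return if $line =~ /\d/;             # words containing digits
    return $line;
}

# Hypothetical helper: drop duplicates, keeping first-seen order
# (the same %seen idiom the script uses).
sub unique_words {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Made-up sample lines; the real dumpdict output may differ.
my @lines = (
    "01: 5 library -12\n",
    "01: 3 library -7\n",
    "02: 9 two words -1\n",
    "debug: something else\n",
);
my @words = unique_words( grep { defined } map { clean_dumpdict_line($_) } @lines );
print "$_\n" for @words;
```

Keeping the filter as a pure function also makes it easy to try out extra rules (for example, a minimum word length) before rebuilding the Aspell dictionary.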


Next is lookup.pl. Usage: ./lookup.pl foobar

  #!/usr/bin/perl

  # lookup.pl - look up a word in an Aspell dictionary
  #             and return alternative spellings

  # Eric Lease Morgan <address@hidden>
  # October 5, 2006


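  # require
  use Text::Aspell;
  use strict;
  use warnings;

  # define; a relative path, so run this from the directory holding
  # the dictionary that zebra2aspell.pl created
  use constant DICTIONARY => './aspell.dict';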

  # get the query
  my $query = $ARGV[0];

  # branch accordingly
  if ( ! $query ) { print "Usage: $0 word\n" }
  else {
      # initialize dictionary
      my $dictionary = Text::Aspell->new or die "Can't create speller\n";
      $dictionary->set_option( 'master', DICTIONARY )
          or die $dictionary->errstr, "\n";

      # get suggestions
      my @suggestions = $dictionary->suggest( $query );

      # display the suggestions
      print "Alternative spellings for $query:\n";
      foreach ( @suggestions ) { print $_, "\n" }
  }

  # done
  exit;


--Eric "It Feels Good To Hack Again" Morgan
University Libraries of Notre Dame




_______________________________________________
Zebralist mailing list
address@hidden
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/zebralist




--
Paul POULAIN and Henri Damien LAURENT
Independent consultants
in free software and library science (http://www.koha-fr.org)
Tel: 04 91 31 45 19



