[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Octave-patch-tracker] [patch #8426] Speeding up the edit distance
From: |
Max G. |
Subject: |
[Octave-patch-tracker] [patch #8426] Speeding up the edit distance |
Date: |
Fri, 28 Mar 2014 17:06:16 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/33.0.1750.152 Chrome/33.0.1750.152 Safari/537.36 |
URL:
<http://savannah.gnu.org/patch/?8426>
Summary: Speeding up the edit distance
Project: GNU Octave
Submitted by: maxg
Submitted on: Fr 28 Mär 2014 17:06:15 GMT
Category: None
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Email:
Open/Closed: Open
Discussion Lock: Any
_______________________________________________________
Details:
Calculating the Levenshtein distance provided by the package "strings" is a
pain.
It is unbelievable slow, not only because it uses the algorithm of Wagner and
Fisher, but also because it does this using loops. That leads to calculation
times of 10 seconds for simple strings of length 200 and wastes plenty of
memory. See for example http://savannah.gnu.org/bugs/?41920 .
Therefore I implemented the algorithm of Berghel and Roach
(http://www.berghel.net/publications/asm/asm.php), who improve on Ukkonen
(http://www.sciencedirect.com/science/article/pii/S0019995885800462).
Great things first: Now it is possible to calculate
editdistance(repmat('AB',1,20000),repmat('BA',1,20000)
in less than 2 seconds.
The new version is fully compatible with the former version, meaning that
(i) it calculates a specific matrix when asked for
(ii) can apply some specific weights
That comes at the cost of
(i) a new optional input parameter
(ii) not being able to use the new algorithm all the time.
Depending on its preferences, the user now can decide to
(i) use the former algorithm
(ii) use the algorithm of Berghel and Roach, which is almost always better but
may have space issues as well
(iii) use a version with m*n runtime but linear memory usage.
I tested the correctness of the new algorithms by running the provided script
uncountable times. Please note, that this produces test which are almost worst
case for the algorithm by Berghel and Roach.
Please note further, that there exists a version of the algorithm used here,
that uses only O(dist) memory
(https://code.google.com/p/google-web-toolkit/source/browse/trunk/dev/core/src/com/google/gwt/dev/util/editdistance/ModifiedBerghelRoachEditDistance.java?r=8928).
However, I do not have the time to implement that one.
I really hope this helps and will be applied.
_______________________________________________________
File Attachments:
-------------------------------------------------------
Date: Fr 28 Mär 2014 17:06:15 GMT Name: editdistance.m Size: 6kB By: maxg
<http://savannah.gnu.org/patch/download.php?file_id=31070>
-------------------------------------------------------
Date: Fr 28 Mär 2014 17:06:15 GMT Name: testEditDistance.m Size: 614B By:
maxg
<http://savannah.gnu.org/patch/download.php?file_id=31071>
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/patch/?8426>
_______________________________________________
Nachricht gesendet von/durch Savannah
http://savannah.gnu.org/
- [Octave-patch-tracker] [patch #8426] Speeding up the edit distance,
Max G. <=