help-gnu-emacs

Re: Let ispell use multiple dictionaries stored in different files.


From: Hongyi Zhao
Subject: Re: Let ispell use multiple dictionaries stored in different files.
Date: Sun, 8 Aug 2021 12:16:38 +0800

On Sun, Aug 8, 2021 at 11:58 AM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > combine the wamerican-insane and Webster_s_Unabridged_3 into
> > one even bigger word list file, which includes 790592
> > entries at the moment:
> >
> > $ awk '!a[$0]++' /usr/share/dict/american-english-insane \
> >    Webster_s_Unabridged_3.txt |
> >    tee ~/american-english-insane-Webster_s_Unabridged | wc
> >  790592  892589 8649982
>
> Hm, is awk '!a[$0]++' faster than sort -u?

The awk command above preserves the order in which the words first
appear in the original word list files.

>   $ sort -u A B > AB
>   $ wc -l AB

The sort command, by contrast, reorders the lines. Keep in mind that
ispell and autocompletion tools/frameworks need an already well-sorted
word list file to achieve acceptable performance when the word list
file is huge; at least this is the case for the initialization stage
of Emacs's ispell. For this reason, I don't want to change the order
of the words given in the original word list files.
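
To make the difference concrete, here is a minimal sketch on two tiny,
made-up word lists (the file names A and B and their contents are only
placeholders, not the real dictionaries). It shows that awk keeps
first-occurrence order while sort -u reorders the lines:

```shell
# Two tiny, made-up word lists standing in for the real dictionaries.
tmpdir=$(mktemp -d)
printf 'pear\napple\npear\n' > "$tmpdir/A"
printf 'zebra\napple\nbanana\n' > "$tmpdir/B"

# awk prints a line only the first time it is seen, so the
# first-occurrence order across both files is preserved.
dedup_awk=$(awk '!a[$0]++' "$tmpdir/A" "$tmpdir/B")

# sort -u also drops duplicates, but it reorders the lines
# (LC_ALL=C is set here only to make the ordering predictable).
dedup_sort=$(LC_ALL=C sort -u "$tmpdir/A" "$tmpdir/B")

printf '%s\n' "$dedup_awk"    # pear, apple, zebra, banana (one per line)
printf '%s\n' "$dedup_sort"   # apple, banana, pear, zebra (one per line)

rm -rf "$tmpdir"
```

Note that the awk approach keeps every distinct line in memory, which
should be fine for a list of this size but is worth remembering for
much larger inputs.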

> Where did you find Webster_s_Unabridged_3.txt ?

I built this file myself, and I have given the specific steps to
create it in [1]:

$ sudo apt-get install python3-tk tix
# pyenv python environment for this operation:
$ pyenv shell datasci
$ pip install gobject PyGObject pyglossary
$ mkdir -p ~/.stardict/dic && cd $_
$ curl -O http://download.huzheng.org/bigdict/stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ tar xvf stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ cd stardict-Webster_s_Unabridged_3-2.4.2
$ pyglossary Webster_s_Unabridged_3.ifo Webster_s_Unabridged_3.csv
$ awk -F, 'NR > 8 {sub(/^["]/,"",$1);sub(/["]$/,"",$1);print $1}' \
    Webster_s_Unabridged_3.csv > Webster_s_Unabridged_3.txt
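
As a rough illustration of what the last awk step does, here is a
small sketch run on a made-up sample file. The 8 placeholder header
lines and the two entries are assumptions for demonstration only; the
real CSV written by pyglossary may look different:

```shell
# Made-up sample: 8 placeholder header lines, then two quoted entries.
sample=$(mktemp)
{
  for i in 1 2 3 4 5 6 7 8; do printf '#header %s\n' "$i"; done
  printf '"aardvark","a burrowing mammal"\n'
  printf '"abacus","a counting frame"\n'
} > "$sample"

# Skip the first 8 lines, split on commas, strip the leading and
# trailing double quote from the first field, and print it.
words=$(awk -F, 'NR > 8 {sub(/^["]/,"",$1);sub(/["]$/,"",$1);print $1}' "$sample")

printf '%s\n' "$words"    # aardvark, abacus (one per line)

rm -f "$sample"
```

Because the fields are split on every comma, a headword that itself
contains a comma would be cut at that comma, and a definition that
spans several lines would contribute extra lines to the output.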

[1] https://github.com/company-mode/company-mode/issues/1146#issuecomment-886172208

Best regards
-- 
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province


