help-gnu-emacs

Re: How to generate a wordlist for a document


From: Arnaldo Mandel
Subject: Re: How to generate a wordlist for a document
Date: Tue, 16 Aug 2011 17:27:44 -0300

On Tue, Aug 16, 2011 at 1:45 PM, Andreas Röhler <andreas.roehler@easy-emacs.de> wrote:
On 15.08.2011 23:20, Thorsten wrote:

Hi list,
how do I generate a word list for a document in Emacs (in my case a
multi-file LaTeX document)?
(By word list I mean a list of all the unique words in the document.)
Thanks for any hints
Thorsten

Hi,

I would first export everything to plain text.

Put it all into one file.

Then, inside Emacs, you could use something like this:

(defun wordlist (&optional beg end)
[...]

This is a bit too simplistic.  For instance, it would list words inside comments, macro parameters, and environment names.  Things can get really complicated.
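
To make the problem concrete, here is a minimal Python sketch of a naive collector (it is not the elided Emacs Lisp function quoted above): every run of ASCII letters counts as a word, and it is run directly on LaTeX source. The function name `naive_wordlist` and the sample document are hypothetical.

```python
import re

# Naive rule (an assumption for illustration): a word is any run of ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]+")


def naive_wordlist(text: str) -> list[str]:
    """Return the sorted unique lowercase words found anywhere in TEXT."""
    return sorted({w.lower() for w in WORD_RE.findall(text)})


# Hypothetical sample: markup, a comment, and two words of actual prose.
SAMPLE = r"""\documentclass{article}
\begin{document}
% TODO: rewrite this paragraph
Hello \emph{world}.
\end{document}
"""

if __name__ == "__main__":
    print(naive_wordlist(SAMPLE))
```

Note how `documentclass`, `begin`, `emph`, and the comment words `todo`, `rewrite`, `this` end up in the list alongside the only prose words, `hello` and `world`.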

There is a Perl script called texcount, which is part of many TeX distributions.  It embodies a lot of LaTeX knowledge in deciding what is and what is not a word, and its sheer size shows how hard the problem is.  As the name says, the program counts words.  However, with option -v1 it outputs a "cleaned-up" version of the text, tagged with ANSI color codes.  That output seems amenable to processing by code similar to what you propose, though with a more complex underlying automaton.
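
A rough idea of what a "more complex underlying automaton" could look like, as a Python sketch that works on the LaTeX source directly rather than on texcount's colored output (whose exact format is not reproduced here). It skips comments, control-sequence names, and the first braced argument of a small, assumed set of macros (`SKIP_ARG`). The names `prose_words`, `skip_group`, and `SKIP_ARG` are hypothetical, and texcount's actual rules are far more extensive.

```python
import re

# Assumed set of macros whose first {...} argument is markup, not prose.
SKIP_ARG = {"begin", "end", "documentclass", "usepackage",
            "label", "ref", "cite", "input", "include"}
# A control word (\name) or a control symbol such as \% or \{.
CONTROL_SEQ = re.compile(r"\\([A-Za-z]+)|\\.", re.DOTALL)


def skip_group(tex: str, i: int) -> int:
    """If tex[i] opens a {...} group, return the index just past its matching
    close brace (or len(tex) if unbalanced); otherwise return i unchanged."""
    if i >= len(tex) or tex[i] != "{":
        return i
    depth = 0
    while i < len(tex):
        if tex[i] == "\\":
            i += 2                        # skip escaped chars such as \{ and \}
            continue
        if tex[i] == "{":
            depth += 1
        elif tex[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return len(tex)


def prose_words(tex: str) -> list[str]:
    """Sorted unique lowercase words, skipping comments, macro names and the
    first argument of the macros in SKIP_ARG."""
    words, current = set(), []
    i = 0
    while i < len(tex):
        c = tex[i]
        if c.isalpha():                   # letters extend the current word
            current.append(c)
            i += 1
            continue
        if current:                       # anything else ends the word
            words.add("".join(current).lower())
            current = []
        if c == "%":                      # comment runs to end of line
            nl = tex.find("\n", i)
            i = len(tex) if nl == -1 else nl + 1
        elif c == "\\":
            m = CONTROL_SEQ.match(tex, i)
            if m is None:                 # lone backslash at end of input
                i += 1
            else:
                i = m.end()               # drop the macro name itself
                if m.group(1) in SKIP_ARG:
                    i = skip_group(tex, i)
        else:
            i += 1
    if current:
        words.add("".join(current).lower())
    return sorted(words)


SAMPLE = r"""\documentclass{article}
\begin{document}
% TODO: rewrite this paragraph
Hello \emph{world}.
\end{document}
"""

if __name__ == "__main__":
    print(prose_words(SAMPLE))
```

This still treats braces and macros as word separators, and it ignores optional `[...]` arguments, math mode, and spacing such as `\begin {document}`; each of those needs more states.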

Still, it also depends on what Thorsten means by a word.  For instance, texcount reports

\documentclass{article}
\begin{document}
\textsc{w}o\emph{r}\texttt{d}.
\end{document}

as containing 4 words, although it can reasonably be construed as a one-word text.
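
The two readings of the example above can be made concrete with two toy tokenizers, sketched in Python under assumed rules: in `words_split_at_markup`, macro names and braces separate words; in `words_ignoring_markup`, they are deleted so the letters run together. Both function names are hypothetical, and neither reproduces texcount's logic.

```python
import re

# Body of the example document above.
SAMPLE = r"\textsc{w}o\emph{r}\texttt{d}."
CONTROL_WORD = re.compile(r"\\[A-Za-z]+")
LETTERS = re.compile(r"[A-Za-z]+")


def words_split_at_markup(tex: str) -> list[str]:
    """Reading 1 (assumed): macro names and braces act as word separators."""
    spaced = CONTROL_WORD.sub(" ", tex).replace("{", " ").replace("}", " ")
    return LETTERS.findall(spaced)


def words_ignoring_markup(tex: str) -> list[str]:
    """Reading 2 (assumed): macro names and braces are invisible inside a word."""
    stripped = re.sub(r"[{}]", "", CONTROL_WORD.sub("", tex))
    return LETTERS.findall(stripped)


if __name__ == "__main__":
    print(words_split_at_markup(SAMPLE))   # ['w', 'o', 'r', 'd']
    print(words_ignoring_markup(SAMPLE))   # ['word']
```

The first reading yields 4 tokens, the same number texcount reports; the second yields the single word `word`, which is what the typeset output looks like.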

Arnaldo

