Re: Any faster way to find frequency of words?
From: Jean Louis
Subject: Re: Any faster way to find frequency of words?
Date: Sun, 9 May 2021 20:16:42 +0300
User-agent: Mutt/2.0.6 (2021-03-06)
* Eric Abrahamsen <eric@ericabrahamsen.net> [2021-05-09 17:57]:
> Jean Louis <bugs@gnu.support> writes:
>
> > I am interested if there is some better way for Emacs Lisp to find
> > frequency of words.
> >
> > Purpose is to create HTML clickable tag clouds similar to image tag
> > clouds. But I will invoke Perl from Emacs to generate it. For that, I
> > have to analyze the text first.
>
> Is there any particular improvement you're trying to make?
I am invoking Perl on the fly to produce a clickable HTML tag
cloud. Rewriting the Perl module in Emacs Lisp would be useful, but
boring and tiresome. For now, I would rather just call Perl on the fly.
As the HTML tags are created from text, I need nothing but alphabetic
characters. The function is invoked rarely.
It is also useful for generating tags for a particular text, which
helps me to curate WWW pages.
> I guess I'd suggest using Emacs syntax parsing functions, ie
> `forward-word' and `buffer-substring'. Then you can fine tune the
> definition of words using the local syntax table.
That is also an interesting approach: it could simply go over the
words and enter them into a list.
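A minimal sketch of that buffer-walking approach, assuming the text is
in the current buffer. The name `my-buffer-word-frequency' and its
MIN-LENGTH argument are hypothetical and not from this thread; what
counts as a word depends on the syntax table in effect.

```elisp
(defun my-buffer-word-frequency (&optional min-length)
  "Return a hash table mapping downcased words in the buffer to counts.
Words shorter than MIN-LENGTH (default 3) are skipped.
Hypothetical helper, for illustration only."
  (let ((hash (make-hash-table :test #'equal))
        (min-length (or min-length 3)))
    (save-excursion
      (goto-char (point-min))
      ;; `forward-word' returns nil once it cannot move over another word.
      (while (forward-word 1)
        (let* ((end (point))
               (beg (save-excursion (backward-word 1) (point)))
               (word (downcase (buffer-substring-no-properties beg end))))
          (when (>= (length word) min-length)
            (puthash word (1+ (gethash word hash 0)) hash)))))
    hash))
```

It could be tried on a string with
(with-temp-buffer (insert text) (my-buffer-word-frequency)).
Wrapping the call in `with-syntax-table' would be one way to fine-tune
what counts as a word.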
> > (mapc (lambda (word)
> >         (when (> (length word) 2)
> >           (let ((word (downcase word)))
> >             (if (numberp (gethash word hash))
> >                 (puthash word (1+ (gethash word hash)) hash)
> >               (puthash word 1 hash)))))
>
> While hash tables are probably best for very large texts, alists are
> nice because you can use place-setting with a default, simplifying the
> above to:
>
> (cl-incf (alist-get word frequency-alist 0 nil #'equal))
That gave me the idea to use the default argument of `gethash', so I
have now written it as below, with
(puthash word (1+ (gethash word hash 0)) hash). That is the result of
the brainstorming here...
(defun rcd-word-frequency (text &optional length)
  "Return word frequency as a hash table from TEXT.
Words of LENGTH characters or fewer (default 3) are discarded."
  (let* ((hash (make-hash-table :test 'equal))
         ;; `text-alphabetic-only' is defined elsewhere (not shown here).
         (text (text-alphabetic-only text))
         (length (or length 3))
         (words (split-string text " " t " "))
         (words (mapcar 'downcase words))
         ;; Replace words that are too short with nil, then drop the nils.
         (words (mapcar (lambda (word) (when (> (length word) length) word))
                        words))
         (words (delq nil words)))
    (mapc (lambda (word)
            ;; The third argument of `gethash' is the default for new words.
            (puthash word (1+ (gethash word hash 0)) hash))
          words)
    hash))
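For comparison, the `alist-get' idiom quoted above could be sketched as
follows. The function name `my-word-frequency-alist' is hypothetical,
and it takes an already split list of words. Note that it yields dotted
pairs (WORD . COUNT), and that every lookup scans the alist, so it
suits short texts better than long ones.

```elisp
(require 'cl-lib)

(defun my-word-frequency-alist (words)
  "Return an alist of (WORD . COUNT) for the list of strings WORDS.
Hypothetical helper illustrating the `alist-get' place idiom."
  (let (frequency-alist)
    (dolist (word words)
      ;; The third argument 0 is the default count for an unseen word;
      ;; the fifth argument #'equal compares string keys by content.
      (cl-incf (alist-get word frequency-alist 0 nil #'equal)))
    frequency-alist))
```

A count can then be read back with
(alist-get "ipsum" result 0 nil #'equal).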
I am not sure whether I should rather collect it into an alist. Maybe
I could collect it straight into a list ordered by frequency, like:
(("word" 9) ("another" 7) ("more" 3))
That is what I am doing here, to construct a string of the most
frequent tags:
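(defun rcd-word-frequency-string (text &optional length how-many-words)
  "Return the HOW-MANY-WORDS (default 20) most frequent words in TEXT.
The words are joined by spaces; LENGTH is passed to `rcd-word-frequency'."
  (let* ((words (rcd-word-frequency text length))
         ;; `hash-to-list' is defined elsewhere (not shown here); it is
         ;; expected to return a list of (WORD COUNT) elements.
         (words (hash-to-list words))
         (number (or how-many-words 20))
         (frequent (seq-sort (lambda (a b)
                               (> (cadr a) (cadr b)))
                             words)))
    (mapconcat #'car (seq-take frequent number) " ")))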
(rcd-word-frequency-string text nil 5) ⇒ "consectetur ipsum amet maecenas congue"
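Since `hash-to-list' is not shown in this message, here is a
self-contained sketch of the conversion and sorting step, assuming the
hash table maps words to counts. The name `my-hash-to-sorted-list' is
hypothetical and is only a stand-in for that step.

```elisp
(require 'seq)

(defun my-hash-to-sorted-list (hash)
  "Return the entries of HASH as a list of (KEY VALUE), highest VALUE first.
Hypothetical helper, for illustration only."
  (let (pairs)
    ;; Collect every entry as a two-element list.
    (maphash (lambda (key value) (push (list key value) pairs)) hash)
    ;; Sort by the count, largest first.
    (seq-sort (lambda (a b) (> (cadr a) (cadr b))) pairs)))
```

For a table holding "word" 9, "another" 7 and "more" 3, this returns
(("word" 9) ("another" 7) ("more" 3)).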
--
Jean
Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns
Sign an open letter in support of Richard M. Stallman
https://stallmansupport.org/
https://rms-support-letter.github.io/