help-gnu-emacs

Any faster way to find frequency of words?


From: Jean Louis
Subject: Any faster way to find frequency of words?
Date: Sun, 09 May 2021 17:38:05 +0300

I would like to know whether there is a better (faster) way to compute
word frequencies in Emacs Lisp.

The purpose is to create clickable HTML tag clouds, similar to image
tag clouds. I will invoke Perl from Emacs to generate the cloud itself,
but first I have to analyze the text.

(setq text "Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Donec a diam lectus. Sed sit amet ipsum mauris. Maecenas congue ligula
ac quam viverra nec consectetur ante hendrerit. Maecenas congue ligula
ac quam viverra nec consectetur ante hendrerit..")

(defun text-alphabetic-only (text)
  "Return alphabetic characters from TEXT."
  (replace-regexp-in-string "[^[:alpha:]]" " " text))

(defun word-frequency (text &optional length)
  "Return a hash table mapping each word in TEXT to its frequency.
Words are downcased before counting.  Words of LENGTH characters
or fewer are ignored; LENGTH defaults to 2."
  (let ((hash (make-hash-table :test 'equal))
        (length (or length 2)))
    ;; `dolist' returns HASH once every word has been counted.
    (dolist (word (split-string (text-alphabetic-only text) " " t) hash)
      (when (> (length word) length)
        (let ((word (downcase word)))
          ;; A missing key counts as 0, so one `puthash' covers both cases.
          (puthash word (1+ (gethash word hash 0)) hash))))))

(word-frequency text)
⇒ #s(hash-table size 65 test equal rehash-size 1.5
     rehash-threshold 0.8125
     data ("lorem" 1 "ipsum" 2 "dolor" 1 "sit" 2 "amet" 2
           "consectetur" 3 "adipiscing" 1 "elit" 1 "donec" 1
           "diam" 1 "lectus" 1 "sed" 1 "mauris" 1 "maecenas" 2
           "congue" 2 "ligula" 2 "quam" 2 "viverra" 2 "nec" 2
           "ante" 2 "hendrerit" 2))
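
One variant that might be worth measuring, as a sketch only: scan the
text in a temporary buffer with `re-search-forward', so that neither the
cleaned-up copy of the string nor the intermediate word list has to be
built. The names `my-word-frequency-scan' and `my-word-frequency-sorted'
are hypothetical, and whether this is actually faster is an assumption
that would need checking, for example with `benchmark-run'. The second
function turns the hash table into a list sorted by count, which is
probably the shape a tag cloud generator wants.

```elisp
;; Sketch under stated assumptions; both function names are made up
;; for illustration and are not part of the code above.

(defun my-word-frequency-scan (text &optional min-length)
  "Return a hash table mapping downcased words in TEXT to their counts.
Words shorter than MIN-LENGTH characters (default 3) are ignored."
  (let ((hash (make-hash-table :test 'equal))
        (min-length (or min-length 3)))
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      ;; Each match is one run of alphabetic characters, i.e. one word.
      (while (re-search-forward "[[:alpha:]]+" nil t)
        (when (>= (- (match-end 0) (match-beginning 0)) min-length)
          (let ((word (downcase (match-string 0))))
            ;; A missing key counts as 0.
            (puthash word (1+ (gethash word hash 0)) hash)))))
    hash))

(defun my-word-frequency-sorted (hash)
  "Return HASH as an alist of (WORD . COUNT), most frequent first."
  (let (alist)
    (maphash (lambda (word count) (push (cons word count) alist)) hash)
    (sort alist (lambda (a b) (> (cdr a) (cdr b))))))
```

With the sample text above,
(car (my-word-frequency-sorted (my-word-frequency-scan text)))
gives ("consectetur" . 3). The two approaches can be compared with
something like (benchmark-run 1000 (word-frequency text)) against
(benchmark-run 1000 (my-word-frequency-scan text)).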


