Re: Most used words in current buffer
From: Udyant Wig
Subject: Re: Most used words in current buffer
Date: Sun, 22 Jul 2018 23:49:01 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
On 07/22/2018 09:27 AM, Eric Abrahamsen wrote:
> As Stefan said, going character by character is going to be
> slow... But my example with `forward-word' collects a lot of cruft. So
> I would suggest doing what `forward-word' does internally and move by
> syntax. This also opens up the possibility of tweaking the behavior
> of your function (ie, what constitutes a word) by setting temporary
> syntax tables. Here's a word scanner that only picks up actual words
> (according to the default syntax table):
>
> (defun test-buffer (&optional f)
>   (let ((file (or f "/home/eric/org/hollowmountain.org"))
>         pnt lst)
>     (with-temp-buffer
>       (insert-file-contents file)
>       (goto-char (point-min))
>       (skip-syntax-forward "^w")
>       (setq pnt (point))
>       (while (and (null (eobp)) (skip-syntax-forward "w"))
>         (push (buffer-substring pnt (point)) lst)
>         (skip-syntax-forward "^w")
>         (setq pnt (point))))
>     (nreverse lst)))
Thank you for the idea! It did wonders for the running time. Here is my
adaptation of it to the counting code, followed by a sample timing.
---
(require 'cl-lib)

(defun buffer-most-used-words-4 (n)
  "Make a list of the N most used words in the current buffer."
  (let ((counts (make-hash-table :test #'equal))
        sorted-counts
        start
        end)
    (save-excursion
      (goto-char (point-min))
      ;; Skip everything that does not have word syntax.
      (skip-syntax-forward "^w")
      (setf start (point))
      (cl-loop until (eobp)
               do
               ;; Move over one word and count it.
               (skip-syntax-forward "w")
               (setf end (point))
               (cl-incf (gethash (buffer-substring-no-properties start end)
                                 counts 0))
               ;; Move to the start of the next word.
               (skip-syntax-forward "^w")
               (setf start (point))))
    (cl-loop for word being the hash-keys of counts
             using (hash-values count)
             do
             (push (list word count) sorted-counts)
             finally (setf sorted-counts (cl-sort sorted-counts #'>
                                                  :key #'cl-second)))
    ;; Guard against N being larger than the number of distinct words.
    (mapcar #'cl-first
            (cl-subseq sorted-counts 0 (min n (length sorted-counts))))))
---
Byte-compiled, this version takes about half the time that the previous,
character-by-character version took to process a 4.5 MB text file.
Averaged over ten runs on that file, it takes 2.75 seconds.
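For anyone wanting to reproduce an average like this, one possible approach is sketched below. The post does not say how the timing was taken; `my-average-seconds` is a hypothetical helper, and the sketch assumes the counting function has already been byte-compiled and the test file is visited in the current buffer.

```elisp
(defun my-average-seconds (runs thunk)
  "Call THUNK RUNS times and return the mean elapsed seconds per call.
Hypothetical helper for illustration.  It measures wall-clock time
with `float-time', so garbage collection is included in the result."
  (let ((start (float-time)))
    (dotimes (_ runs)
      (funcall thunk))
    (/ (- (float-time) start) runs)))
```

Usage: `(my-average-seconds 10 (lambda () (buffer-most-used-words-4 10)))`. Emacs also ships `benchmark-run-compiled`; `(benchmark-run-compiled 10 (buffer-most-used-words-4 10))` returns the total elapsed time, GC count and GC time for all ten runs, so divide the first element by 10 to get a comparable average.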
As for syntax tables, being able to decide what counts as a word or
other construct in a buffer could be very handy indeed. One application
beyond prose that comes to mind is counting the most used variables or
functions in a source file. There might be others, of course.
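A minimal sketch of the temporary-syntax-table idea from the quoted message, applied to counting identifiers in source code. The function names `my-count-words-with-table` and `my-identifier-syntax-table` are hypothetical, and making `-` and `_` word constituents is an assumption that suits Lisp-style names; other languages would need different entries.

```elisp
(require 'cl-lib)

(defun my-count-words-with-table (table)
  "Return a hash table mapping each word in the current buffer to its count.
Words are runs of characters with word syntax according to TABLE,
which is installed only for the duration of the scan."
  (let ((counts (make-hash-table :test #'equal)))
    (with-syntax-table table
      (save-excursion
        (goto-char (point-min))
        (skip-syntax-forward "^w")
        (while (not (eobp))
          (let ((start (point)))
            ;; Move over one word, as defined by TABLE, and count it.
            (skip-syntax-forward "w")
            (cl-incf (gethash (buffer-substring-no-properties start (point))
                              counts 0)))
          (skip-syntax-forward "^w"))))
    counts))

(defun my-identifier-syntax-table ()
  "Return a copy of the current syntax table with `-' and `_' as word
constituents, so a name like `buffer-most-used-words-4' is scanned
as a single word.  The buffer's own syntax table is not modified."
  (let ((table (copy-syntax-table (syntax-table))))
    (modify-syntax-entry ?- "w" table)
    (modify-syntax-entry ?_ "w" table)
    table))
```

Usage: `(my-count-words-with-table (my-identifier-syntax-table))`. Because `with-syntax-table` restores the original table when the scan finishes, the buffer's normal notion of a word is left untouched.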
Udyant Wig
--
We make our discoveries through our mistakes: we watch one another's
success: and where there is freedom to experiment there is hope to
improve.
-- Arthur Quiller-Couch