help-gnu-emacs

Re: Most used words in current buffer


From: Eric Abrahamsen
Subject: Re: Most used words in current buffer
Date: Sat, 21 Jul 2018 20:57:45 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

Udyant Wig <udyantw@gmail.com> writes:

> On 07/21/2018 09:45 PM, Eric Abrahamsen wrote:
>> Interesting... In general I think Emacs is highly optimized to use the
>> buffer as its textual data structure, more so than a string.
>> Particularly when the code is compiled (many of the text-movement
>> commands have opcodes). I made the following two commands to collect
>> words from a novel in an Org file, and the one that uses
>> `forward-word' and `buffer-substring' runs around twice as fast as the
>> `split-string'.
>>
>> Of course, they don't collect the same list of words! But even if you
>> add more code for trimming, etc., it will still likely be faster than
>> operating on a string.
>> [snip code]
>
> I have acted upon the advice (yours and Stefan Monnier's) to operate on
> the buffer directly using BUFFER-SUBSTRING.  Please see my follow up to
> Stefan's message.
>
> BUFFER-SUBSTRING did gain me (somewhat) better performance.

As Stefan said, going character by character is going to be slow... But
my example with `forward-word' collects a lot of cruft. So I would
suggest doing what `forward-word' does internally and moving by syntax.
This also opens up the possibility of tweaking the behavior of your
function (i.e., what constitutes a word) by setting temporary syntax
tables. Here's a word scanner that only picks up actual words (according
to the default syntax table):

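(defun test-buffer (&optional f)
  "Return a list of the words in file F, in order of appearance.
A word is a maximal run of word-syntax characters."
  (let ((file (or f "/home/eric/org/hollowmountain.org"))
        pnt lst)
    (with-temp-buffer
      (insert-file-contents file)
      (goto-char (point-min))
      ;; Skip any leading non-word characters.
      (skip-syntax-forward "^w")
      (while (not (eobp))
        (setq pnt (point))
        ;; Move over one word and collect it.
        (skip-syntax-forward "w")
        (push (buffer-substring pnt (point)) lst)
        ;; Skip to the start of the next word, or to end of buffer.
        (skip-syntax-forward "^w")))
    (nreverse lst)))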
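
To illustrate the temporary syntax table idea, here is a minimal sketch
of the same scanner taking a string instead of a file. The function name
`my-scan-words' and its arguments TEXT and EXTRA-WORD-CHARS are made up
for this example; it assumes you want certain extra characters (an
apostrophe, say) to count as part of a word while scanning.

```elisp
;; Hypothetical helper, not part of the code above: a sketch only.
(defun my-scan-words (text &optional extra-word-chars)
  "Return the words in the string TEXT, in order of appearance.
Each character in the string EXTRA-WORD-CHARS is treated as a word
constituent for the duration of the scan."
  (let ((table (make-syntax-table (standard-syntax-table)))
        pnt lst)
    ;; Give each extra character word syntax in our private table.
    (dolist (ch (string-to-list (or extra-word-chars "")))
      (modify-syntax-entry ch "w" table))
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      ;; The table is in effect only inside this form.
      (with-syntax-table table
        (skip-syntax-forward "^w")
        (while (not (eobp))
          (setq pnt (point))
          (skip-syntax-forward "w")
          (push (buffer-substring-no-properties pnt (point)) lst)
          (skip-syntax-forward "^w"))))
    (nreverse lst)))
```

With the standard syntax table the apostrophe is punctuation, so
(my-scan-words "don't stop") should give ("don" "t" "stop"), while
(my-scan-words "don't stop" "'") should give ("don't" "stop").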



