help-gnu-emacs

Re: Most used words in current buffer


From: Udyant Wig
Subject: Re: Most used words in current buffer
Date: Fri, 20 Jul 2018 18:48:39 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 07/20/2018 02:12 AM, Bob Proulx wrote:
> I don't know if the AVL package you used was implemented in elisp or
> in C or otherwise.  And even though I am a long time user of emacs I
> have never acquired the elisp skill to the same level as other
> languages and therefore can't comment on that part.  But I know that
> when people have implemented such data structures in Perl, the
> result has never been as fast and efficient as in a native C version.
> If so then that may easily account for performance differences.  And
> also the native implementation of "hashes" in awk, perl, python, ruby
> is quite optimized and very fast.  They have had years of eyes and
> tweaking upon them.

The AVL tree package comes with Emacs.  It bears a creation date of 10
May 1991 and copyright dates of 1995 and 2007-2015 on my system.

Now, it could be just that I have used trees in an ignorant way, or it
could be that it is as fast as it can go.  I hope it is the former as
that is something I can feasibly affect.

My issue in this general problem (of counting words) was this: I have to
store and operate on one kind of data during insertion (strings), and
another kind of data during retrieval (numbers); the keys are different
in the two cases.  I could not think of a way to do this nicely with a
tree.
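One common way around the two-key problem, sketched here in awk rather than elisp (so it is not the AVL-tree approach discussed above): count into a hash keyed by the word, then dump "count word" pairs and let `sort` re-key them on the number. The function name `top_words`, the sample sentence, and the whitespace-only tokenization are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the N most frequent words read from stdin.
# Tokenization is whitespace-only, so punctuation stays attached to words.
top_words() {
  awk '
    # Insertion phase: the key is the word (a string), the value its count.
    { for (i = 1; i <= NF; i++) count[tolower($i)]++ }
    # Retrieval phase: print "count word" so the number can become the key.
    END { for (w in count) print count[w], w }
  ' |
    sort -k1,1nr -k2,2 |  # highest count first, ties broken alphabetically
    head -n "$1"          # keep only the top N lines
}

printf '%s\n' 'the cat saw the dog and the dog saw the cat' | top_words 3
```

The hash handles the string-keyed insertion phase, and the numeric re-keying happens only once, at the end, which sidesteps the need for a single structure ordered by both keys.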

> Writing clear code that can be understood immediately by the entire
> range of programmer skill is important in my not so humble opinion.
> One shouldn't need to be a master experienced programmer to understand
> what has been written.  Therefore I usually use 'head' specifically
> for the clarity of it to everyone.  Seeing "head -n40" is not going to
> confuse anyone.  Therefore I usually use it instead of "sed 40q" even
> though I could remove 'head' entirely from my system if I were to
> uniformly implement one in terms of the other.  Clarity is more
> important.
>
> And before someone mentions performance let me remind you that we are
> talking shell scripts.  In a shell script clarity is more important
> than performance.  Always.  If the resulting shell script results in a
> performance problem then choosing a better algorithm will almost
> certainly be the better solution.  And if not, then choosing a
> different language more efficient at the task is next.

These are two paragraphs worth recollecting every so often.  Thank you.
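For readers who have not met the `sed 40q` idiom Bob mentions, a small sketch showing that it and `head -n40` produce the same output: sed prints each line by default, and the `40q` command quits after printing line 40. The `numbers` helper is a hypothetical stand-in for real input.

```shell
#!/bin/sh
# Hypothetical input generator: print the integers 1..N, one per line
# (awk is used instead of seq, which is not specified by POSIX).
numbers() {
  awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) print i }'
}

by_head=$(numbers 100 | head -n 40)  # "print the first 40 lines"
by_sed=$(numbers 100 | sed 40q)      # "print lines, quit after line 40"

if [ "$by_head" = "$by_sed" ]; then
  echo "same output"
fi
```

The output is identical; the `head` form simply states the intent without requiring the reader to know sed's default printing behavior.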

> I do expect some skill to be learned with 'awk' however.  It is so
> very useful that seeing "awk '{print$1}'" print the first field column
> should not be that confusing.  Nor should '{print$NF}', a
> common idiom for printing the last field.  (NF is the Number of Fields
> in the line that was split by whitespace.  $NF is therefore the last
> field.  If NF is 5 then $NF is saying $5 and therefore always the last
> field of the line.)  A little bit of awk learning pays back a large
> return on the investment.

Indeed.
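Bob's two awk idioms can be tried directly. A short sketch with made-up sample lines, showing `$1`, `NF`, and `$NF`, including lines of different lengths where `$NF` still selects the last field:

```shell
#!/bin/sh
# $1 is the first whitespace-separated field; NF is the field count,
# so $NF is the last field, whatever the line's length.
printf '%s\n' 'alpha beta gamma delta epsilon' | awk '{print $1}'   # alpha
printf '%s\n' 'alpha beta gamma delta epsilon' | awk '{print NF}'   # 5
printf '%s\n' 'alpha beta gamma delta epsilon' | awk '{print $NF}'  # epsilon

# Lines with different field counts: $NF still picks each line's last field.
printf 'a b c\nd e\n' | awk '{print $NF}'   # prints c, then e
```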

> Bob

Udyant Wig
-- 
We make our discoveries through our mistakes: we watch one another's
success: and where there is freedom to experiment there is hope to
improve.
                                -- Arthur Quiller-Couch


