bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Makes sort create random order


From: Frederik Eaton
Subject: Re: [PATCH] Makes sort create random order
Date: Thu, 27 Jan 2005 00:46:07 -0800
User-agent: Mutt/1.5.6+20040907i

This is in response to a mail I found in the archive from "Wed, 25 Aug
2004 16:14:15 +0200" that adds an -R option to sort randomly. I've
been seeking to add such functionality to sort for a while.

1. It looks like this isn't in 5.2.1 yet, has something been added to
cvs? Is something planned? Is somebody working on a patch?

2. Was this patch actually tested? It doesn't work for me, e.g.:

$ sort -R < /usr/share/dict/words | tail -n 15
more
palming
palindromes
muting
mutilation
palindrome's
mopes
palest
mister
palette
pall
moribund
mooted
morbid
mordant

I tried modifying sort in this way a long time ago, by having the key
comparison functions return random bits, and found that it gave
similar results. The reason is that gnu 'sort' uses mergesort - but
problems may arise with other algorithms as well. When you're merging
two lists, if you pick elements randomly from one list or the other
with equal probability, you'll find that after 'n' picks you'll be on
average sqrt(n) farther into one list or the other, so when you finish
one list, you still have about sqrt(n) leftover elements in the other
list. These end up consecutive in your result. After several merges,
the cumulative effect is that the tail of your list is not very well
sorted.

A better algorithm would be as follows: combine each key you want to
randomize with some salt (your "random seed"), and compute its hash.
(By the way, it would be best to be able to randomize on a per-key
basis) Then compare as you would other keys. It is hard to see how
this could give incorrect results. Furthermore, it has other desired
properties: reproducibility across systems, no dependence on libc's
rand(). Would the maintainers be receptive to a patch which
accomplished this if it followed the style guidelines etc.? I think a
feature like this would be quite useful to all manner of users, from
those involved in statistical investigations to those who want to
randomize their mp3's without installing a separate piece of software.

3. Is there a way to subscribe to this mailing list?

Thanks,

Frederik Eaton




reply via email to

[Prev in Thread] Current Thread [Next in Thread]