Re: [PATCH] Makes sort create random order


From: Paul Eggert
Subject: Re: [PATCH] Makes sort create random order
Date: Thu, 02 Sep 2004 22:23:30 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Paul Jarc writes:

>> Sort of, but not quite.
>
> I couldn't find the "not quite" part of your explanation.

Well, I tried.  :-)

>> "sort -rR" should output in the reverse order of "sort -R".
>
> Nit: they shouldn't expect that unless they also specify a seed.

Yes, of course.

> But sort -R can still provide this just by permuting the original
> input order, rather than the correct sort order.

I don't understand this claim.  If "sort -R" operates by permuting the
original input order, and then sorts the result, then it will generate
the same output as if it hadn't permuted anything (assuming there are
no ties).
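
To make that concrete, here is a minimal sketch in Python (an illustration only, not code from sort or from the patch; the input lines and the seed value are made up). Permuting the input and then sorting gives the same result as sorting the unpermuted input, provided there are no ties:

```python
import random

# Hypothetical input with no ties (all lines distinct).
lines = ["pear", "apple", "fig", "cherry"]

# Permute the original input order with an arbitrary, fixed seed.
shuffled = lines[:]
random.Random(1234).shuffle(shuffled)

# Sorting discards the input order, so the permutation has no effect.
assert sorted(shuffled) == sorted(lines)
```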

> we do:
> $ sort -R A > B
> $ sort -R --seed=deadbeef A > A1
> $ sort -R --seed=deadbeef A > A2
> $ sort -R --seed=deadbeef B > B1
> $ sort -R --seed=deadbeef B > B2
>
> Then we should expect that A1 and A2 have the same contents, and that
> B1 and B2 have the same contents.  But the TODO requirement would also
> ensure that A1/A2 have the same contents as B1/B2.

Yes, assuming no ties.
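
One way to get that property, sketched in Python under stated assumptions: order the lines by a hash of the seed combined with each line, so the result depends only on the set of lines and the seed, not on the input order. The helper name random_sort and the choice of SHA-256 are my own for illustration; this is not a description of how the patch works.

```python
import hashlib
import random

def random_sort(lines, seed):
    """Hypothetical helper: order lines by a seeded hash of each line."""
    def key(line):
        # The hash depends only on the seed and the line's contents.
        return hashlib.sha256(seed.encode() + b"\0" + line.encode()).digest()
    return sorted(lines, key=key)

A = ["pear", "apple", "fig", "cherry"]  # made-up input, no ties
B = A[:]
random.Random(7).shuffle(B)  # B stands in for some permutation of A

A1 = random_sort(A, "deadbeef")
A2 = random_sort(A, "deadbeef")
B1 = random_sort(B, "deadbeef")
B2 = random_sort(B, "deadbeef")

# Same seed, same set of lines: all four outputs agree.
assert A1 == A2 == B1 == B2
```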

> Is that really needed?

If it's not needed, then why is this relevant to "sort"?  You are
asking for a program that randomly permutes its input.  Then let's
design another program to do that, and not get bogged down with how
its features work together with "sort"'s existing zoo of options.

> I'm also not sure that clustering lines with equivalent sort keys is
> desirable.

Again, it depends on whether you want something relevant to the
collating order (i.e., a sort), or you want something that's
completely irrelevant (i.e., a permutation).  If the latter, then I
suspect we should be talking about a different tool.
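
The distinction can be sketched in Python as follows. Both function names are hypothetical, and taking the first whitespace-separated field as the sort key is an assumption made for the example. A random sort that hashes the key keeps lines with equal keys together; a plain permutation makes no such promise.

```python
import hashlib
import itertools
import random

def first_field(line):
    # Assumed sort key for this example: the first field of the line.
    return line.split()[0]

def hashed_key_sort(lines, seed):
    """Hypothetical random sort: order by a seeded hash of the key."""
    def key(line):
        return hashlib.sha256((seed + "\0" + first_field(line)).encode()).digest()
    return sorted(lines, key=key)

def shuffle_lines(lines, seed):
    """Hypothetical separate tool: permute the lines, ignoring keys."""
    out = lines[:]
    random.Random(seed).shuffle(out)
    return out

lines = ["a 1", "b 1", "a 2", "c 1", "b 2", "a 3"]

clustered = hashed_key_sort(lines, "deadbeef")
# Equal keys hash equally, so each key forms exactly one contiguous run.
runs = [k for k, _ in itertools.groupby(clustered, key=first_field)]
assert len(runs) == 3

permuted = shuffle_lines(lines, "deadbeef")
# The permutation keeps every line but need not cluster equal keys.
assert sorted(permuted) == sorted(lines)
```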



>>>>> This means that two different files, that happen to sort to the
>>>>> same output, should give the same output when randomized with
>>>>> the same SEED. Is that right? [*]
>>>>     if you sort a permutation of the same input file
>>>>     with the same --random-seed=SEED option twice, you'll get the same
>>>>     output. [**]
>> If two files sort to the same output, then they're permutations of
>> each other.  So [**] implies [*].  (The converse does not hold.  See
>> what I mean about the logic being tricky here?...)
>
> No, I think [*] implies [**] only.  [*] is the more general case
> placing a requirement on all permutations of the same input; [**] is
> the special case where the two files are the same permutation of the
> same input.

Ah, OK, I think I see the problem.  By [**] I meant that if you sort two
permutations of the same input file, and use the same random seed for
both sorts, you'll get the same output.  This is roughly the same as
[*], then.  I say "roughly" because it's not clear from either
statement what should be done with ties.



