bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

sort -R: a more-conservative approach


From: Paul Eggert
Subject: sort -R: a more-conservative approach
Date: Mon, 12 Dec 2005 14:52:39 -0800
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

After looking through the sort -R patch and noting the segfault it
caused on 64-bit systems
<http://lists.gnu.org/archive/html/bug-coreutils/2005-12/msg00137.html>
I thought I'd back off to a more-conservative approach for now.

2005-12-12  Paul Eggert  <address@hidden>

        * Version 6.0-cvs.

        Install a more-conservative approach for sort -R.  It's the
        same basic idea as the existing code, except it uses the full ISAAC
        approach (called the "more kosher" approach in the existing comments).
        This makes "sort -R" quite a bit slower (about a factor of 2 on my
        little tests involving 10000 lines on a 2.4 GHz P4), but I think it's
        better to be conservative here at first, and review any performance
        improvements carefully.
        * .x-sc_require_config_h: Add src/rand-isaac.c.
        * src/rand-isaac.h: Remove.  All uses now simply include rand-isaac.c.
        * src/Makefile.am (noinst_HEADERS): Remove rand-isaac.h.
        (shred_SOURCES, sort_SOURCES): Remove.
        (EXTRA_DIST): Add rand-isaac.c.
        * src/rand-isaac.c: Revert to what used to be in shred.c, without
        changing it to allow for varying numbers of words in the state.
        Alter so that we include rand-isaac.c directly rather than
        compiling it and linking to it.  Don't include config.h or
        system.h; that's the includer's responsibility.
        Omit functions that are specific to shred.
        (ISAAC_LOG, ISAAC_WORDS, ISAAC_BYTES, struct isaac_state, ind):
        (isaac_step, struct irand_state):
        Resurrect these, with the same defns that used to be in shred.c.
        (ISAAC_SIZE, isaac_new, isaac_copy): Remove.
        (isaac_refill, isaac_seed_start, isaac_seed_data, irand_init, irand32):
        static again.
        (struct isaac_state, isaac_refill, isaac_mix, isaac_init):
        (isaac_seed_start, isaac_seed_data, isaac_seed_finish, isaac_seed):
        (irand_init, irand32, irand_mod):
        Number of words is constant again.
        (struct irand_state, irand_init, irand32, irand_mod): Move to shred.c.
        * src/shred.c: Include rand-isaac.c rather than rand-isaac.h.
        * src/sort.c: Likewise.
        * src/shred.c (fillrand, dopass, main): Undo previous change.
        (struct irand_state, irand_init, irand32, irand_mod): Moved back here,
        from rand-isaac.c.
        * src/sort.c: Don't include md5.h; it wasn't needed.
        (struct keyfield): Rename random_hash to random, for consistency
        with the other member names.  All uses changed.
        (usage): Tweak wording to mention STRING for --seed option.
        (short_options): Rorder for consistency with other programs.
        (rand_state): Now a struct, not a pointer to one.  All uses changed.
        (HASH_WORDS, HASH_SIZE): Remove.
        (get_hash): Remove comments around resbuf size, since we can assume C89.
        Use a "more-kosher" (but slower) approach of invoking isaac_refill.
        (keycompare): Adjust to the new get_hash.
        Add a FIXME.
        (badfieldspec): Omit recently-introduced comment; it isn't needed.
        (main): Don't set need_random simply because gkey has it set; that
        doesn't necessarily mean we'll need random numbers.
        Redo seeding to match new get_hash approach.

Index: .x-sc_require_config_h
===================================================================
RCS file: /fetish/cu/.x-sc_require_config_h,v
retrieving revision 1.2
retrieving revision 1.3
diff -p -u -r1.2 -r1.3
--- .x-sc_require_config_h      24 Nov 2005 09:04:34 -0000      1.2
+++ .x-sc_require_config_h      12 Dec 2005 22:41:42 -0000      1.3
@@ -22,4 +22,5 @@
 ^src/ls-dir\.c$
 ^src/ls-ls\.c$
 ^src/ls-vdir\.c$
+^src/rand-isaac\.c$
 ^src/tac-pipe\.c$
Index: src/Makefile.am
===================================================================
RCS file: /fetish/cu/src/Makefile.am,v
retrieving revision 1.61
retrieving revision 1.63
diff -p -u -r1.61 -r1.63
--- src/Makefile.am     10 Dec 2005 22:54:31 -0000      1.61
+++ src/Makefile.am     12 Dec 2005 22:42:37 -0000      1.63
@@ -40,13 +40,12 @@ noinst_HEADERS = \
   dircolors.h \
   fs.h \
   ls.h \
-  rand-isaac.h \
   remove.h \
   system.h \
   wheel-size.h \
   wheel.h
 
-EXTRA_DIST = dcgen dircolors.hin tac-pipe.c \
+EXTRA_DIST = dcgen dircolors.hin rand-isaac.c tac-pipe.c \
   groups.sh wheel-gen.pl extract-magic
 CLEANFILES = $(SCRIPTS) su
 
@@ -175,8 +174,6 @@ vdir_SOURCES = ls.c ls-vdir.c
 ls_SOURCES = ls.c ls-ls.c
 chown_SOURCES = chown.c chown-core.c
 chgrp_SOURCES = chgrp.c chown-core.c
-shred_SOURCES = shred.c rand-isaac.c
-sort_SOURCES = sort.c rand-isaac.c
 
 mv_SOURCES = mv.c copy.c cp-hash.c remove.c
 rm_SOURCES = rm.c remove.c
Index: src/rand-isaac.c
===================================================================
RCS file: /fetish/cu/src/rand-isaac.c,v
retrieving revision 1.4
retrieving revision 1.6
diff -p -u -r1.4 -r1.6
--- src/rand-isaac.c    10 Dec 2005 22:10:53 -0000      1.4
+++ src/rand-isaac.c    12 Dec 2005 22:42:58 -0000      1.6
@@ -1,4 +1,4 @@
-/* rand-isaac.c - ISAAC random number generator
+/* Bob Jenkins's cryptographic random number generator, ISAAC.
 
    Copyright (C) 1999-2005 Free Software Foundation, Inc.
    Copyright (C) 1997, 1998, 1999 Colin Plumb.
@@ -21,13 +21,9 @@
 
 /*
  * --------------------------------------------------------------------
- *     Bob Jenkins' cryptographic random number generator, ISAAC.
- *     Hacked by Colin Plumb.
- *
- * We need a source of random numbers for some of the overwrite data
- * for shred. Cryptographically secure is desirable, but it's not
- * life-or-death so I can be a little bit experimental in the choice
- * of RNGs here.
+ * We need a source of random numbers for some data.
+ * Cryptographically secure is desirable, but it's not life-or-death
+ * so I can be a little bit experimental in the choice of RNGs here.
  *
  * This generator is based somewhat on RC4, but has analysis
  * <http://burtleburtle.net/bob/rand/isaacafa.html>
@@ -36,86 +32,72 @@
  * --------------------------------------------------------------------
  */
 
-
-#ifdef HAVE_CONFIG_H
-# include <config.h>
-#endif
-
-#include "system.h"
 #include "gethrxtime.h"
 
-#include "rand-isaac.h"
-
-#define ISAAC_SIZE(words) \
-    (sizeof (struct isaac_state) - \
-    (ISAAC_MAX_WORDS - words) * sizeof (uint32_t))
+/* Size of the state tables to use.  ISAAC_LOG should be at least 3,
+   and smaller values give less security.  */
+#define ISAAC_LOG 8
+#define ISAAC_WORDS (1 << ISAAC_LOG)
+#define ISAAC_BYTES (ISAAC_WORDS * sizeof (uint32_t))
 
-/*
- * Create a new state, optionally at the location specified. This
- * should be called to create/initialize each new isaac_state. 'words'
- * must be a power of 2, and should be at least 8. Smaller values give
- * less security.
- */
-extern struct isaac_state *
-isaac_new (struct isaac_state *s, int words)
-{
-  size_t w;
-  size_t ss = ISAAC_SIZE (words);
-  if (!s)
-    {
-      s = xmalloc (ss);
-    }
-  memset (s, 0, ss);
-  s->words = words;
-  s->log = 0;
-  for (w=1; w<words; w<<=1, s->log++) {}
-  return s;
-}
+/* RNG state variables */
+struct isaac_state
+  {
+    uint32_t mm[ISAAC_WORDS];  /* Main state array */
+    uint32_t iv[8];            /* Seeding initial vector */
+    uint32_t a, b, c;          /* Extra index variables */
+  };
+
+/* This index operation is more efficient on many processors */
+#define ind(mm, x) \
+  (* (uint32_t *) ((char *) (mm) \
+                  + ((x) & (ISAAC_WORDS - 1) * sizeof (uint32_t))))
 
 /*
- * Copy a state. Destination block must be at least as large as
- * source.
+ * The central step.  This uses two temporaries, x and y.  mm is the
+ * whole state array, while m is a pointer to the current word.  off is
+ * the offset from m to the word ISAAC_WORDS/2 words away in the mm array,
+ * i.e. +/- ISAAC_WORDS/2.
  */
-extern void
-isaac_copy (struct isaac_state *dst, struct isaac_state *src)
-{
-  memcpy(dst, src, ISAAC_SIZE (src->words));
-}
+#define isaac_step(mix, a, b, mm, m, off, r) \
+( \
+  a = ((a) ^ (mix)) + (m)[off], \
+  x = *(m), \
+  *(m) = y = ind (mm, x) + (a) + (b), \
+  *(r) = b = ind (mm, (y) >> ISAAC_LOG) + x \
+)
 
 /*
  * Refill the entire R array, and update S.
  */
-extern void
-isaac_refill (struct isaac_state *s, uint32_t r[/* s>-words */])
+static void
+isaac_refill (struct isaac_state *s, uint32_t r[/* ISAAC_WORDS */])
 {
   uint32_t a, b;               /* Caches of a and b */
   uint32_t x, y;               /* Temps needed by isaac_step macro */
   uint32_t *m = s->mm; /* Pointer into state array */
-  int w = s->words;
 
   a = s->a;
   b = s->b + (++s->c);
 
   do
     {
-      isaac_step (s, a << 13, a, b, m, w / 2, r);
-      isaac_step (s, a >> 6, a, b, m + 1, w / 2, r + 1);
-      isaac_step (s, a << 2, a, b, m + 2, w / 2, r + 2);
-      isaac_step (s, a >> 16, a, b, m + 3, w / 2, r + 3);
+      isaac_step (a << 13, a, b, s->mm, m, ISAAC_WORDS / 2, r);
+      isaac_step (a >> 6, a, b, s->mm, m + 1, ISAAC_WORDS / 2, r + 1);
+      isaac_step (a << 2, a, b, s->mm, m + 2, ISAAC_WORDS / 2, r + 2);
+      isaac_step (a >> 16, a, b, s->mm, m + 3, ISAAC_WORDS / 2, r + 3);
       r += 4;
     }
-  while ((m += 4) < s->mm + w / 2);
-
+  while ((m += 4) < s->mm + ISAAC_WORDS / 2);
   do
     {
-      isaac_step (s, a << 13, a, b, m, -w / 2, r);
-      isaac_step (s, a >> 6, a, b, m + 1, -w / 2, r + 1);
-      isaac_step (s, a << 2, a, b, m + 2, -w / 2, r + 2);
-      isaac_step (s, a >> 16, a, b, m + 3, -w / 2, r + 3);
+      isaac_step (a << 13, a, b, s->mm, m, -ISAAC_WORDS / 2, r);
+      isaac_step (a >> 6, a, b, s->mm, m + 1, -ISAAC_WORDS / 2, r + 1);
+      isaac_step (a << 2, a, b, s->mm, m + 2, -ISAAC_WORDS / 2, r + 2);
+      isaac_step (a >> 16, a, b, s->mm, m + 3, -ISAAC_WORDS / 2, r + 3);
       r += 4;
     }
-  while ((m += 4) < s->mm + w);
-
+  while ((m += 4) < s->mm + ISAAC_WORDS);
   s->a = a;
   s->b = b;
 }
@@ -137,7 +119,7 @@ isaac_refill (struct isaac_state *s, uin
 
 /* The basic ISAAC initialization pass.  */
 static void
-isaac_mix (struct isaac_state *s, uint32_t const seed[/* s->words */])
+isaac_mix (struct isaac_state *s, uint32_t const seed[/* ISAAC_WORDS */])
 {
   int i;
   uint32_t a = s->iv[0];
@@ -148,9 +130,8 @@ isaac_mix (struct isaac_state *s, uint32
   uint32_t f = s->iv[5];
   uint32_t g = s->iv[6];
   uint32_t h = s->iv[7];
-  uint32_t w = s->words;
 
-  for (i = 0; i < w; i += 8)
+  for (i = 0; i < ISAAC_WORDS; i += 8)
     {
       a += seed[i];
       b += seed[i + 1];
@@ -185,15 +166,15 @@ isaac_mix (struct isaac_state *s, uint32
 
 #if 0 /* Provided for reference only; not used in this code */
 /*
- * Initialize the ISAAC RNG with the given seed material. Its size
- * MUST be a multiple of s->words * sizeof (uint32_t), and may be
+ * Initialize the ISAAC RNG with the given seed material.
+ * Its size MUST be a multiple of ISAAC_BYTES, and may be
  * stored in the s->mm array.
  *
  * This is a generalization of the original ISAAC initialization code
- * to support larger seed sizes. For seed sizes of 0 and s->words *
- * sizeof (uint32_t), it is identical.
+ * to support larger seed sizes.  For seed sizes of 0 and ISAAC_BYTES,
+ * it is identical.
  */
-extern void
+static void
 isaac_init (struct isaac_state *s, uint32_t const *seed, size_t seedsize)
 {
   static uint32_t const iv[8] =
@@ -219,10 +200,10 @@ isaac_init (struct isaac_state *s, uint3
       /* First pass (as in reference ISAAC code) */
       isaac_mix (s, seed);
       /* Second and subsequent passes (extension to ISAAC) */
-      while (seedsize -= s->words * sizeof (uint32_t))
+      while (seedsize -= ISAAC_BYTES)
        {
-         seed += s->words;
-         for (i = 0; i < s->words; i++)
+         seed += ISAAC_WORDS;
+         for (i = 0; i < ISAAC_WORDS; i++)
            s->mm[i] += seed[i];
          isaac_mix (s, s->mm);
        }
@@ -230,7 +211,7 @@ isaac_init (struct isaac_state *s, uint3
   else
     {
       /* The no seed case (as in reference ISAAC code) */
-      for (i = 0; i < s->words; i++)
+      for (i = 0; i < ISAAC_WORDS; i++)
        s->mm[i] = 0;
     }
 
@@ -240,7 +221,7 @@ isaac_init (struct isaac_state *s, uint3
 #endif
 
 /* Start seeding an ISAAC structire */
-extern void
+static void
 isaac_seed_start (struct isaac_state *s)
 {
   static uint32_t const iv[8] =
@@ -260,15 +241,14 @@ isaac_seed_start (struct isaac_state *s)
   for (i = 0; i < 8; i++)
     s->iv[i] = iv[i];
 
-  /* used to do memset here to clear the state array, but now it's
-     done in isaac_new */
+  memset (s->mm, 0, sizeof s->mm);
 
   /* s->c gets used for a data pointer during the seeding phase */
   s->a = s->b = s->c = 0;
 }
 
 /* Add a buffer of seed material */
-extern void
+static void
 isaac_seed_data (struct isaac_state *s, void const *buffer, size_t size)
 {
   unsigned char const *buf = buffer;
@@ -276,7 +256,7 @@ isaac_seed_data (struct isaac_state *s, 
   size_t avail;
   size_t i;
 
-  avail = s->words - (size_t) s->c;    /* s->c is used as a write pointer */
+  avail = sizeof s->mm - (size_t) s->c;        /* s->c is used as a write 
pointer */
 
   /* Do any full buffers that are necessary */
   while (size > avail)
@@ -288,7 +268,7 @@ isaac_seed_data (struct isaac_state *s, 
       size -= avail;
       isaac_mix (s, s->mm);
       s->c = 0;
-      avail = s->words;
+      avail = sizeof s->mm;
     }
 
   /* And the final partial block */
@@ -300,7 +280,7 @@ isaac_seed_data (struct isaac_state *s, 
 
 
 /* End of seeding phase; get everything ready to produce output. */
-extern void
+static void
 isaac_seed_finish (struct isaac_state *s)
 {
   isaac_mix (s, s->mm);
@@ -314,7 +294,7 @@ isaac_seed_finish (struct isaac_state *s
  * Get seed material.  16 bytes (128 bits) is plenty, but if we have
  * /dev/urandom, we get 32 bytes = 256 bits for complete overkill.
  */
-extern void
+static void
 isaac_seed (struct isaac_state *s)
 {
   isaac_seed_start (s);
@@ -353,56 +333,3 @@ isaac_seed (struct isaac_state *s)
 
   isaac_seed_finish (s);
 }
-
-extern void
-irand_init (struct irand_state *r, struct isaac_state *s)
-{
-  r->numleft = 0;
-  r->s = s;
-  memset (r->r, 0, s->words * sizeof (uint32_t));
-}
-
-/*
- * We take from the end of the block deliberately, so if we need
- * only a small number of values, we choose the final ones which are
- * marginally better mixed than the initial ones.
- */
-extern uint32_t
-irand32 (struct irand_state *r)
-{
-  if (!r->numleft)
-    {
-      isaac_refill (r->s, r->r);
-      r->numleft = r->s->words;
-    }
-  return r->r[--r->numleft];
-}
-
-/*
- * Return a uniformly distributed random number between 0 and n,
- * inclusive.  Thus, the result is modulo n+1.
- *
- * Theory of operation: as x steps through every possible 32-bit number,
- * x % n takes each value at least 2^32 / n times (rounded down), but
- * the values less than 2^32 % n are taken one additional time.  Thus,
- * x % n is not perfectly uniform.  To fix this, the values of x less
- * than 2^32 % n are disallowed, and if the RNG produces one, we ask
- * for a new value.
- */
-extern uint32_t
-irand_mod (struct irand_state *r, uint32_t n)
-{
-  uint32_t x;
-  uint32_t lim;
-
-  if (!++n)
-    return irand32 (r);
-
-  lim = -n % n;                        /* == (2**32-n) % n == 2**32 % n */
-  do
-    {
-      x = irand32 (r);
-    }
-  while (x < lim);
-  return x % n;
-}
Index: src/shred.c
===================================================================
RCS file: /fetish/cu/src/shred.c,v
retrieving revision 1.119
retrieving revision 1.121
diff -p -u -r1.119 -r1.121
--- src/shred.c 10 Dec 2005 10:02:24 -0000      1.119
+++ src/shred.c 12 Dec 2005 22:43:16 -0000      1.121
@@ -108,7 +108,8 @@
 #include "inttostr.h"
 #include "quotearg.h"          /* For quotearg_colon */
 #include "quote.h"             /* For quotearg_colon */
-#include "rand-isaac.h"              /* Random number stuff */
+
+#include "rand-isaac.c"
 
 #define DEFAULT_PASSES 25      /* Default */
 
@@ -226,6 +227,66 @@ to be recovered later.\n\
   exit (status);
 }
 
+/* Single-word RNG built on top of ISAAC */
+struct irand_state
+{
+  uint32_t r[ISAAC_WORDS];
+  unsigned int numleft;
+  struct isaac_state *s;
+};
+
+static void
+irand_init (struct irand_state *r, struct isaac_state *s)
+{
+  r->numleft = 0;
+  r->s = s;
+}
+
+/*
+ * We take from the end of the block deliberately, so if we need
+ * only a small number of values, we choose the final ones which are
+ * marginally better mixed than the initial ones.
+ */
+static uint32_t
+irand32 (struct irand_state *r)
+{
+  if (!r->numleft)
+    {
+      isaac_refill (r->s, r->r);
+      r->numleft = ISAAC_WORDS;
+    }
+  return r->r[--r->numleft];
+}
+
+/*
+ * Return a uniformly distributed random number between 0 and n,
+ * inclusive.  Thus, the result is modulo n+1.
+ *
+ * Theory of operation: as x steps through every possible 32-bit number,
+ * x % n takes each value at least 2^32 / n times (rounded down), but
+ * the values less than 2^32 % n are taken one additional time.  Thus,
+ * x % n is not perfectly uniform.  To fix this, the values of x less
+ * than 2^32 % n are disallowed, and if the RNG produces one, we ask
+ * for a new value.
+ */
+static uint32_t
+irand_mod (struct irand_state *r, uint32_t n)
+{
+  uint32_t x;
+  uint32_t lim;
+
+  if (!++n)
+    return irand32 (r);
+
+  lim = -n % n;                        /* == (2**32-n) % n == 2**32 % n */
+  do
+    {
+      x = irand32 (r);
+    }
+  while (x < lim);
+  return x % n;
+}
+
 /*
  * Fill a buffer with a fixed pattern.
  *
@@ -255,19 +316,18 @@ fillpattern (int type, unsigned char *r,
 
 /*
  * Fill a buffer, R (of size SIZE_MAX), with random data.
- * SIZE is rounded UP to a multiple of s->words * sizeof (uint32_t).
+ * SIZE is rounded UP to a multiple of ISAAC_BYTES.
  */
 static void
 fillrand (struct isaac_state *s, uint32_t *r, size_t size_max, size_t size)
 {
-  size_t bytes = s->words * sizeof (uint32_t);
-  size = (size + bytes - 1) / bytes;
+  size = (size + ISAAC_BYTES - 1) / ISAAC_BYTES;
   assert (size <= size_max);
 
   while (size--)
     {
       isaac_refill (s, r);
-      r += s->words;
+      r += ISAAC_WORDS;
     }
 }
 
@@ -369,7 +429,7 @@ dopass (int fd, char const *qname, off_t
   size_t soff;                 /* Offset into buffer for next write */
   ssize_t ssize;               /* Return value from write */
   uint32_t *r;                 /* Fill pattern.  */
-  size_t rsize = 3 * MAX (s->words, 1024) * sizeof *r; /* Fill size.  */
+  size_t rsize = 3 * MAX (ISAAC_WORDS, 1024) * sizeof *r; /* Fill size.  */
   size_t ralign = lcm (getpagesize (), sizeof *r); /* Fill alignment.  */
   char pass_string[PASS_NAME_SIZE];    /* Name of current pass */
   bool write_error = false;
@@ -1103,7 +1163,6 @@ main (int argc, char **argv)
 
   atexit (close_stdout);
 
-  isaac_new (&s, ISAAC_MAX_WORDS);
   isaac_seed (&s);
 
   memset (&flags, 0, sizeof flags);
Index: src/sort.c
===================================================================
RCS file: /fetish/cu/src/sort.c,v
retrieving revision 1.332
retrieving revision 1.333
diff -p -u -r1.332 -r1.333
--- src/sort.c  10 Dec 2005 10:04:12 -0000      1.332
+++ src/sort.c  12 Dec 2005 22:09:27 -0000      1.333
@@ -30,10 +30,8 @@
 #include "error.h"
 #include "hard-locale.h"
 #include "inttostr.h"
-#include "md5.h"
 #include "physmem.h"
 #include "posixver.h"
-#include "rand-isaac.h"
 #include "quote.h"
 #include "stdlib--.h"
 #include "stdio--.h"
@@ -79,7 +77,7 @@ double strtod ();
 # define DEFAULT_TMPDIR "/tmp"
 #endif
 
-#define SORT_ISAAC_WORDS 8
+#include "rand-isaac.c"
 
 /* Exit statuses.  */
 enum
@@ -151,7 +149,7 @@ struct keyfield
   bool numeric;                        /* Flag for numeric comparison.  Handle
                                   strings of digits with optional decimal
                                   point, but no exponential notation. */
-  bool random_hash;            /* Shuffle by sorting on random hash of key */
+  bool random;                 /* Sort by random hash of key.  */
   bool general_numeric;                /* Flag for general, numeric comparison.
                                   Handle numbers in exponential notation. */
   bool month;                  /* Flag for comparison by month name. */
@@ -319,7 +317,7 @@ Other options:\n\
   -k, --key=POS1[,POS2]     start a key at POS1, end it at POS2 (origin 1)\n\
   -m, --merge               merge already sorted files; do not sort\n\
   -o, --output=FILE         write result to FILE instead of standard output\n\
-      --seed=STRING         use specified seed for random number generator\n\
+      --seed=STRING         seed random hash function with STRING\n\
   -s, --stable              stabilize sort by disabling last-resort 
comparison\n\
   -S, --buffer-size=SIZE    use SIZE for main memory buffer\n\
 "), stdout);
@@ -361,12 +359,15 @@ native byte values.\n\
   exit (status);
 }
 
-static char const short_options[] = "-bcdfgik:mMno:rRsS:t:T:uy:z";
+/* For long options that have no equivalent short option, use a
+   non-character as a pseudo short option, starting with CHAR_MAX + 1.  */
 enum
 {
   SEED_OPTION = CHAR_MAX + 1
 };
 
+static char const short_options[] = "-bcdfgik:mMno:rRsS:t:T:uy:z";
+
 static struct option const long_options[] =
 {
   {"ignore-leading-blanks", no_argument, NULL, 'b'},
@@ -1166,27 +1167,19 @@ getmonth (char const *month, size_t len)
   return 0;
 }
 
-static struct isaac_state *rand_state;
-
-#define HASH_WORDS 1
-#define HASH_SIZE (HASH_WORDS * sizeof (uint32_t))
+/* The ISAAC state resulting from the user-specified seed, or from
+   random data taken from the environment.  */
+static struct isaac_state rand_state;
 
+/* Set *RESULT to the ISAAC state resulting from applying TEXT (with
+   length LEN) to rand_state.  */
 static void
-get_hash (char const *text, size_t len, uint32_t resbuf[/*HASH_WORDS*/])
+get_hash (char const *text, size_t len, uint32_t resbuf[ISAAC_WORDS])
 {
-  struct isaac_state s;
-  int i;
-  isaac_copy (&s, rand_state);
+  struct isaac_state s = rand_state;
   isaac_seed_data (&s, text, len);
-  /* we can call isaac_seed_finish multiple times, but should only
-     call isaac_seed_start once */
   isaac_seed_finish (&s);
-
-  /* getting an irand_state and drawing random numbers would be more
-     kosher here, but also slower. so we just peek at the ISAAC state
-     array instead */
-  for (i = 0; i < HASH_WORDS; i++)
-    resbuf[i] = s.mm[s.words - 1 - i];
+  isaac_refill (&s, resbuf);
 }
 
 /* Compare two lines A and B trying every key in sequence until there
@@ -1217,15 +1210,27 @@ keycompare (const struct line *a, const 
 
       /* Actually compare the fields. */
 
-      if (key->random_hash)
+      if (key->random)
         {
-          uint32_t diga[HASH_WORDS];
-          uint32_t digb[HASH_WORDS];
+         int i;
+         uint32_t diga[ISAAC_WORDS];
+         uint32_t digb[ISAAC_WORDS];
           get_hash (texta, lena, diga);
           get_hash (textb, lenb, digb);
-          diff = memcmp (diga, digb, sizeof (diga));
-         if (diff)
-           goto not_equal;
+
+         /* Go backwards, since the last words are marginally better
+            mixed.  */
+         for (i = ISAAC_WORDS - 1; 0 <= i; i--)
+           if (diga[i] != digb[i])
+             {
+               diff = (diga[i] < digb[i] ? -1 : 1);
+               goto not_equal;
+             }
+
+         /* FIXME: Should break ties by rehashing with a different
+            random hashing function (and repeat until the tie is
+            broken) unless --seed was specified.  The probability of
+            this being needed should be infinitesimal.  */
         }
 
       if (key->numeric | key->general_numeric)
@@ -2038,7 +2043,6 @@ badfieldspec (char const *spec, char con
 {
   error (SORT_FAILURE, 0, _("%s: invalid field specification %s"),
         _(msgid), quote (spec));
-  /* added to satisfy compiler for NORETURN: */
   abort ();
 }
 
@@ -2132,7 +2136,7 @@ set_ordering (const char *s, struct keyf
          key->numeric = true;
          break;
        case 'R':
-         key->random_hash = true;
+         key->random = true;
          break;
        case 'r':
          key->reverse = true;
@@ -2162,8 +2166,8 @@ main (int argc, char **argv)
   int c = 0;
   bool checkonly = false;
   bool mergeonly = false;
-  char const *randseed = NULL;
-  bool need_rand = false;
+  char const *seed = NULL;
+  bool need_random = false;
   size_t nfiles = 0;
   bool posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL);
   bool obsolete_usage = (posix2_version () < 200112);
@@ -2242,7 +2246,7 @@ main (int argc, char **argv)
   gkey.sword = gkey.eword = SIZE_MAX;
   gkey.ignore = NULL;
   gkey.translate = NULL;
-  gkey.numeric = gkey.general_numeric = gkey.random_hash = false;
+  gkey.numeric = gkey.general_numeric = gkey.random = false;
   gkey.month = gkey.reverse = false;
   gkey.skipsblanks = gkey.skipeblanks = false;
 
@@ -2397,7 +2401,7 @@ main (int argc, char **argv)
          break;
 
         case SEED_OPTION:
-          randseed = optarg;
+         seed = optarg;
           break;
 
        case 's':
@@ -2474,16 +2478,14 @@ main (int argc, char **argv)
        }
     }
 
-  need_rand |= gkey.random_hash;
   /* Inheritance of global options to individual keys. */
   for (key = keylist; key; key = key->next)
     {
-      need_rand |= key->random_hash;
       if (! (key->ignore || key->translate
              || (key->skipsblanks | key->reverse
                  | key->skipeblanks | key->month | key->numeric
                  | key->general_numeric
-                 | key->random_hash)))
+                 | key->random)))
         {
           key->ignore = gkey.ignore;
           key->translate = gkey.translate;
@@ -2492,31 +2494,35 @@ main (int argc, char **argv)
           key->month = gkey.month;
           key->numeric = gkey.numeric;
           key->general_numeric = gkey.general_numeric;
-          key->random_hash = gkey.random_hash;
+          key->random = gkey.random;
           key->reverse = gkey.reverse;
         }
-    }
 
-  if (need_rand)
-    {
-      rand_state = isaac_new (NULL, SORT_ISAAC_WORDS);
-      if (randseed)
-        {
-          isaac_seed_start (rand_state);
-          isaac_seed_data (rand_state, randseed, strlen (randseed));
-          isaac_seed_finish (rand_state);
-        }
-      else
-        isaac_seed (rand_state);
+      need_random |= key->random;
     }
 
   if (!keylist && (gkey.ignore || gkey.translate
                   || (gkey.skipsblanks | gkey.skipeblanks | gkey.month
                       | gkey.numeric | gkey.general_numeric
-                       | gkey.random_hash)))
-    insertkey (&gkey);
+                       | gkey.random)))
+    {
+      insertkey (&gkey);
+      need_random |= gkey.random;
+    }
+
   reverse = gkey.reverse;
 
+  if (need_random)
+    {
+      if (seed)
+       {
+         isaac_seed_start (&rand_state);
+         isaac_seed_data (&rand_state, seed, strlen (seed));
+       }
+      else
+       isaac_seed (&rand_state);
+    }
+
   if (temp_dir_count == 0)
     {
       char const *tmp_dir = getenv ("TMPDIR");




reply via email to

[Prev in Thread] Current Thread [Next in Thread]