Re: [Help-bash] $RANDOM excludes value zero when called in a subscript
From: Greg Wooledge
Subject: Re: [Help-bash] $RANDOM excludes value zero when called in a subscript
Date: Tue, 6 Aug 2019 09:17:17 -0400
User-agent: Mutt/1.10.1 (2018-07-13)
On Tue, Aug 06, 2019 at 10:20:50AM +0200, Roger Price wrote:
> The test program I wrote may be found at http://rogerprice.org/randtest.sh .
> The anomaly also appears on Debian stretch BASH_VERSION=4.4.12(1)-release. Is
> this a bug or am I doing something wrong? - any hint would be very welcome.
First mistake is making us go to a web page to retrieve the script. It's
better just to attach it to the email. Your web site may go away in the
future, so someone reading this email thread in the archives may not be
able to see it.
Therefore I've attached it here, after "wget"ting it.
Second mistake:
declare a HIST1 # Histogram of $RANDOM
declare a HIST3 # Histogram of $RANDOM in sub-script
Pretty sure you meant to use "-a" there, not "a". Not that it matters,
really, since indexed arrays don't have to be pre-declared.
Third mistake: you're generating a program in /tmp using a static
filename. Really, it's two mistakes: one, using a static name in a
publicly writable directory, and two, expecting to be able to execute
a program in /tmp, which may or may not be mounted with the noexec
option.
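As a rough sketch of one way around both problems (the names tmpdir and
helper are made up for illustration, and this assumes mktemp(1) is
available), create a private directory with an unpredictable name and
run the generated file through the interpreter instead of executing it
directly:

```shell
#!/bin/bash
# Private scratch directory with an unpredictable name (assumes mktemp(1)).
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT          # clean up when the script exits

# "helper" is a hypothetical name for the generated sub-script.
helper=$tmpdir/helper.sh
printf '%s\n' '#!/bin/bash' 'echo "$RANDOM"' > "$helper"

# Passing the file to bash as an argument works even on a noexec mount,
# because the file itself never needs execute permission.
value=$(bash "$helper")
echo "helper printed: $value"
```

This only fixes the temp-file handling. It still starts a new bash for
every value, which is the real problem discussed further down.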
Fourth mistake:
LL=$(( $L + 1 )); HH=$(( $H - 1 )) # Additional indexes to histograms
Don't use $scalar syntax inside arithmetic contexts unless it's absolutely
necessary. Use the C-like syntax (variable name without leading $).
The same applies here:
for (( i=$L ; i<=$H ; i++ ))
and here:
do HIST1[$i]=0; HIST3[$i]=0
but you got it right here:
B=10; NB=$(( N / B ))
And while I'm at it, fifth mistake: don't use all-caps variable names.
At least you got "i" right.
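For comparison, here is a small sketch of the same kind of setup with
lowercase names and no "$" inside the arithmetic contexts. The variable
names (low, high, below, above, hist) are invented for this example and
are not taken from the original script:

```shell
#!/bin/bash
# Hypothetical bounds for the histogram indexes.
low=0 high=9

# Inside $(( )) and (( )), plain variable names are enough.
below=$(( low - 1 ))
above=$(( high + 1 ))

# Indexed arrays need no declaration; an array subscript is already
# an arithmetic context, so hist[i] works without "$".
hist=()
for (( i = low; i <= high; i++ )); do
  hist[i]=0
done

(( hist[3] += 1 ))
echo "${#hist[@]} buckets, bucket 3 holds ${hist[3]}"
```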
OK, now let's take a look at the actual meat of the issue.
R3=$( $SS ) # $RANDOM in sub-script
So, you're creating this file in /tmp whose filename is in the variable SS.
Then you execute it in a command substitution, and capture stdout in
the variable R3.
And all of this is done inside a tight loop.
Next, we need the content of the script whose filename is in SS.
Fortunately, that one is very simple:
#!/bin/bash
echo $RANDOM
So, you're executing this script repeatedly in a tight loop. You didn't
seed the random number generator, so it's going to use whatever its
fallback seeding strategy is, which is not documented, so I'll have to
look it up in the source code, but *usually* one would expect it to be
something like "use the current time, with second resolution".
The real stumper is that you didn't find *lots* more issues with the
randomness, using the default seeding of $RANDOM in a tight loop. I'm
amazed you're not just getting the same value every single time.
So, the only mystery left to uncover is "how does bash seed $RANDOM"?
A few greps lead me to this function, in variables.c:
static void
seedrand ()
{
  struct timeval tv;
  SHELL_VAR *v;

  gettimeofday (&tv, NULL);
#if 0
  v = find_variable ("BASH_VERSION");
  sbrand (tv.tv_sec ^ tv.tv_usec ^ getpid () ^ ((u_bits32_t)&v & 0x7fffffff));
#else
  sbrand (tv.tv_sec ^ tv.tv_usec ^ getpid ());
#endif
}
So, it takes the seconds field of the current time, the microseconds
field of the current time, and the current process ID, and XORs them
together to get the initial seed value.
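To get a feel for how closely related such seeds are, here is a small
sketch that redoes the XOR in shell arithmetic. All the numbers are
made-up stand-ins for tv.tv_sec, tv.tv_usec and getpid(), and seed_for
is a hypothetical helper, not anything in bash itself:

```shell
#!/bin/bash
# Made-up stand-ins for tv.tv_sec and tv.tv_usec.
sec=1565097437
usec=250000

# seed_for PID USEC: the same XOR that seedrand() performs.
seed_for() {
  echo $(( sec ^ $2 ^ $1 ))
}

# Two processes started in the same second, 900 microseconds apart,
# with consecutive PIDs.
a=$(seed_for 4000 "$usec")
b=$(seed_for 4001 "$(( usec + 900 ))")

echo "seed A: $a"
echo "seed B: $b"
echo "bits that differ: $(( a ^ b ))"
```

When both processes start within the same second, the seconds field
cancels out of the difference, so the two seeds differ only in the bits
contributed by the microseconds and the PID.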
Well, that's *slightly* better than what I expected. I guess it explains
why you didn't just get "9" every time.
However, if you actually *care* about randomness, and from the fact that
you tried to construct a histogram of $RANDOM's results, it seems you
do care at least a little, then you really need to understand the care
and feeding of a linear congruential pseudorandom number generator.
=========================================================================
Using a pseudorandom number generator:
(1) Seed the PRNG one time, and one time only. Do not reseed it partway
through.
(2) After seeding, continue using the same PRNG instance's output for all
values. Do not restart the program. Do not start multiple instances
of the program.
(3) With some PRNGs, the first few results may be of questionable
randomness. You want the PRNG to run for a while and hit its
stride. See rule 2.
=========================================================================
So, you can see that running a full bash process, which does its own
default seeding, in a tight loop (probably every instance has the same
second, a slightly different microsecond, and a sequentially incrementing
PID) violates *all* of these rules.
Whatever you're really doing, you need to design it so that it uses a
single instance of a PRNG, and retrieves all random values from that
same instance. This could mean that you run a coprocess whose sole
job is to provide random numbers. Or a full-blown child process
communicating through a named pipe. Or a full-blown service daemon
communicating through a TCP socket.
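A minimal sketch of the coprocess variant, assuming bash 4.0 or later
(for the coproc keyword). The names RNG and next_random and the seed
value are invented for this example:

```shell
#!/bin/bash
# One long-lived coprocess owns the PRNG. It answers every line it
# receives with one value from its own $RANDOM sequence.
coproc RNG {
  RANDOM=2019                     # seed once, inside the coprocess (arbitrary value)
  while read -r _; do
    echo "$RANDOM"
  done
}

# next_random VARNAME: ask the coprocess for one value and store it.
next_random() {
  echo >&"${RNG[1]}"              # send an empty request line
  read -r "$1" <&"${RNG[0]}"      # read the reply into the named variable
}

next_random first
next_random second
echo "$first $second"
# The coprocess sees end-of-file and exits when this script exits.
```

Every value comes from the same PRNG instance, so rules 1 and 2 hold
even if several parts of a larger script ask for numbers.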
Or, in the simplest cases, simply call $RANDOM directly from the *main*
process, instead of delegating the PRNG to a different process.
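A sketch of that simplest case: seed once, then take every value from
the main shell's own $RANDOM and build the histogram in the same
process. The seed, bucket count and number of draws are arbitrary
choices made for this example:

```shell
#!/bin/bash
RANDOM=12345                      # seed exactly once (arbitrary value)
buckets=10
draws=10000

# Start every bucket at zero.
hist=()
for (( i = 0; i < buckets; i++ )); do
  hist[i]=0
done

# All values come from this one shell's PRNG instance.
for (( n = 0; n < draws; n++ )); do
  r=$RANDOM
  (( hist[r % buckets] += 1 ))
done

# Print the histogram and check that no draw was lost.
total=0
for (( i = 0; i < buckets; i++ )); do
  printf 'bucket %d: %d\n' "$i" "${hist[i]}"
  total=$(( total + hist[i] ))
done
echo "total: $total"
```

Since 32768 is not a multiple of 10, the low buckets are very slightly
favored by the modulo. For a quick sanity check of the generator that
bias is negligible.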
Knowing more about your actual project and why you're trying to farm
out the $RANDOM stuff would help us help you design it properly.
Attachment: randtest.sh
Description: Bourne shell script