Randomizing should be easy, right? oh, well, maybe not..

A simple problem...

Let's say you have a regression test or fuzzy testing suite that relies on generating a random set of operations, and verifying their results (like ldap-torture).

You want this set operations to be reproducible, so if you find a bug, you can easily get to the exact same conditions that triggered it.

There are many ways to do this, but one simple way is to use one of many pseudo random generators, one that given the same starting seed generates the same sequence of random numbers. Example?

Let's look at perl:

# Seed the random number generator.
srand($seed);

# Generate 100 random numbers.
for (my $count = 0; $count < 100; $count++) {
  print rand() . "\n";
}

Given the same $seed, the sequence of random numbers will always be the same. Not surprising, right?

Now, let's go back to our original problem: you want your test to be reproducible, but still be random. Something you can do is get rid of $seed, and just call srand(). srand will return the seed generated, that you can helpfully print on the screen and reuse if you need to. The final code would look like:

if ($seed) {
  # Use an existing seed to reproduce a failing test.
  srand($seed);
} else {
  # Let srand pick a seed to start a newly randomized test.
  $seed = srand();
}

print "TO REPRODUCE TEST, USE SEED: " . $seed . "\n";

A broken solution...

Now, where is the problem? Well, the problem is that before perl 5.14 (~2011, in case you are wondering), srand() did not return the seed it set. Just doing $seed = srand() did not work.

I was debugging a piece of code I wrote a long time ago (2004), and here's what I was doing:

...
} else {
  my $seed = int(rand(~0));
  srand($seed);
}
...

Now, looks nice, doesn't it? rand() will automatically seed to some random value the first time it is called, which means rand() will produce a reasonable value (well, reasonable for non-crypto purposes, and with reasonable versions of perl), which I can then store, use as a seed, and be done with it.

But what about the ~0 in parenthesis? Well, if I just call rand() by itself, without parameters, the returned value is a number between 0 and 1. srand() takes an integer, so something like $seed = rand(); srand($seed) would always lead to seeding the prng with 0, not good at all.

According to the man page, rand($limit) instead will return a random between 0 and $limit. By using ~0 as a parameter I get the maximum integer that perl can represent, an integer entirely made of 1s in binary. On my laptop, this is 2^64 - 1, pretty high.

So: get a random number in the widest range I can possibly get, feed it to srand, print it on the screen, and have a reasonably ok reproducible random sequence at every run of my tool, right?

WRONG! What, why?

Well, turns out that there are (at least) 2 problems.

rand() returns floating points. As you may remember, floating numbers are made of a significand and an exponent. if you ask too large of a number, and convert to integer, there will be no entropy in the lower bits (results of the exponent).
... and well, srand only seems to be using the lower bits of a number to actually seed the prng.

Try yourself if you don't believe me:

my $count;
for ($count = 0; $count < 10; $count ++) {
  srand(); # This just seeds the prng to a non-great but ok value.
  $seed = int(rand(~0));
  srand($seed);
  print "SEED: $seed, NEXT: " . rand() . "\n";
}

Let's try to run this program:

$ perl /tmp/srand.pl
SEED: 1060381934447820800, NEXT: 0.559209994114102
SEED: 7074176055472357376, NEXT: 0.559209994114102
SEED: 1895145662064951296, NEXT: 0.559209994114102
SEED: 8633284823558979584, NEXT: 0.559209994114102
SEED: 18297293223351091200, NEXT: 0.559209994114102
SEED: 12078737747670532096, NEXT: 0.559209994114102
SEED: 11431093298324897792, NEXT: 0.559209994114102
SEED: 15164597111904862208, NEXT: 0.559209994114102
SEED: 11321558760259911680, NEXT: 0.559209994114102
SEED: 7997988821801172992, NEXT: 0.559209994114102

Note that despite the SEED being reasonably random and significantly different every time, the next random number generated is always the same. This means I'd get the same sequence, despite the different seeds. But are the seeds so different? Let's look at them in hex, let's add a sprintf("%x"):

perl /tmp/srand.pl 
SEED: 871400663299457024 c17d64951010000 NEXT: 0.559209994114102
SEED: 8131900614985711616 70da508651010000 NEXT: 0.559209994114102
SEED: 2174713905224417280 1e2e22c651010000 NEXT: 0.559209994114102
SEED: 18069664157141106688 fac457d051010000 NEXT: 0.559209994114102
SEED: 16751261221931515904 e8786f5451010000 NEXT: 0.559209994114102
SEED: 13819218582325755904 bfc7ba1551010000 NEXT: 0.559209994114102
SEED: 4060176321643347968 3858a4da51010000 NEXT: 0.559209994114102
SEED: 17907160938465787904 f88303ff51010000 NEXT: 0.559209994114102
SEED: 12784784062795546624 b16cad5e51010000 NEXT: 0.559209994114102
SEED: 13701418886505758720 be2537e251010000 NEXT: 0.559209994114102

Tadah! Note that the last 32 bits of the seeds are always the same! Now, let's assume for a second that srand() is only using the last 32 bits of a number for seeding. If I shift the number right by 1, with >>1, I should have one bit of entropy, right? and two different outcomes for the next rand? Let's try:

$ perl /tmp/srand.pl
SEED: 40252668253339648, NEXT: 0.365019015110196
SEED: 7239562128531685376, NEXT: 0.865019015110196
SEED: 1399728002052423680, NEXT: 0.365019015110196
SEED: 6143080778424156160, NEXT: 0.365019015110196
SEED: 1155420768230735872, NEXT: 0.865019015110196
SEED: 1359272858982842368, NEXT: 0.365019015110196
SEED: 8077490881973747712, NEXT: 0.865019015110196
SEED: 1391389776166289408, NEXT: 0.865019015110196
SEED: 3567395640554061824, NEXT: 0.865019015110196
SEED: 5663678882486714368, NEXT: 0.365019015110196

Seems like the theory is correct: I only obtain two different values.

So.. I wish that whatever srand() is doing was documented somewhere, the manual page makes no mention of srand only looking at the lowest 32 bits. And I feel naive for not having thought about float conversion to int, and well, very large numbers.

In fairness, this was almost 9 years ago, haven't used perl in a while, and well, have been spoiled by integers in python, which can have arbitrary length.

Cleaning up a CSS Getting back to use openldap