Generating Random Values

Why Randomness?

It is often helpful if values used for a program are not the same for every run of the program. This kind of thing occurs in lofty applications like wind-tunnel simulations, economic simulations, and weather forecasting as well as in more mundane applications like games. This semester we'll use [so-called pseudo-] random values to roll virtual dice, to shuffle a list of data (cards?), and even to speculate on a user's relatives!

How Random Is It?

Not very. Although some computers come with random number generators supported by hardware — in which properties of a physical phenomenon being measured inside the computer case are used to generate truly random values — this is very rare. Your typical computer only has the pseudo-random number generator (PRNG) provided by the compiler (or programmer). The numbers are only pretend (pseudo-) random because they are 'picked' by calculating the next one from the previous one. That means two things: they must start somewhere and there must be a formula. Since there is a formula, you can generate the same sequence of values over and over if you always start at the same place.

That's what happens if you just begin asking for random values — the compiler picks where to start and it will pick the same place each time the program is run. However, we have been given the srand function to change where the random sequence starts (think Start RANDom or Seed RANDom). All we have to do is come up with a way for it to always be given a different place each time the program is run. Hmm ... we need a different number each time the program is run ... each time we run it ... Hey, what about that time function? It reports seconds, as I recall. If the user is running our program multiple times in a second, they need to just get a life. (I'd say they have too much time on their hands, but that's just too bad a pun to even mention ...oops!)

So, we call the srand function with the output of the time function (which we give the nullptr constant each time we call it). What does srand give us? Nothing. It returns no values. All of its work is done inside cstdlib where it and rand reside (rand being the function that reports the next random number to us). How does that look? Something like this:

    srand(static_cast<unsigned>(time(nullptr)));

We normally place this in our main function just after the variable declarations. Never place it with the call to rand. Since billions of calculations are done each second, this could restart the random sequence many times in the same second — giving the same 'random' value over and over.

Alright, Already, Let's Get To It!

So, let's say you wanted to do something simple like roll a six-sided die (singular die, plural dice). We'd need random integers between 1 and 6 inclusive or [1..6] (recall the square brackets in math mean inclusive; many computer programmers use '..' instead of ',' when denoting ranges or sequences). And what will rand give us? Unfortunately, it gives us values between 0 (inclusive) and RAND_MAX (possibly included). That's obnoxious! First off, what's RAND_MAX? It is a constant that describes the end of the random sequence range. Unfortunately, reading the standard left some compiler writers with the impression that RAND_MAX should be included in the generated random sequence while others felt the random sequence should be capped by RAND_MAX and so it wouldn't be in the sequence. *shiver* Luckily we don't need to refer to it at all (generally).

How do we transform [0..RAND_MAX} (I'll just use a curly brace here to say it might be round or square ... *shrug*) into [1..6]? Let's start at the opposite end and work toward rand's output:

   [1..6]
...
   [0..RAND_MAX}

The first step is to get them to both start at the same place. Subtracting 1 from everything in our desired range gives us:

   [1..6]
          -1
   [0..5]
...
   [0..RAND_MAX}

Okay. Now how can we chop down a large positive integer to always be between 0 and 5 (inclusive). Hmm ... math operations ... results always between 0 and 5 ... integer math ... 0 to 5 ... Ooo! Hey! How about that modulo thingy? That was like remainders, right? Remainders are always between 0 and the divisor minus one (if it were ever the divisor, it would have divided in another time leaving a 0 remainder). Yeah! So, to get remainders between 0 and 5, we'd need to divide by 6 (the divisor minus one; so the divisor is the maximum value plus one). So now we have:

   [0..RAND_MAX}
                % 6
   [0..5]
                + 1
   [1..6]

Modulo the rand result and then shift to the right starting position by adding 1! Where do the modulo value and the shift value come from? They look like the minimum and maximum of the desired range. Let's try another example to make sure. Let's make random values between 5 and 10 inclusive:

    [5..10]
            -5
    [0..5]
...
    [0..RAND_MAX}

That means the modulo value for [5..10] is gonna be 6 too! Must not be the maximum. Where does it come from? A little investigation will lead you to see that the 6 comes from the number of integers that are in the desired range. There are 6 values in [1..6]. There are 6 values in [5..10]. If this holds, to get [8..10] we should be modulo'ing by 3, right? Let's see:

    [8..10]
            -8
    [0..2]
...
    [0..RAND_MAX}

Yup! There it is. So, how can we calculate the number of values in a range? Most will jump to maximum minus minimum, but that's not quite right. See: 6-1=5, 10-5=5, 10-8=2. In fact, it is always 1 short of correct. Weird. Wonder why..? It is called the fencepost problem. Look at this (terrible) drawing of a fence:

    |      |      |      |
    +------+------+------+
    |      |      |      |
    +------+------+------+
    |      |      |      |

To build a fence with three sections like this one, we need four posts. Without the 4^th post, the last section would fall off. So, the modulo value is actually the maximum minus the minimum plus one. But what's that to do with fences? Well, nothing really, but they are similar. Look at what the subtraction does:

     ||||||
   -      |
    --------
     |||||

It takes away the one from the six and so the one isn't there anymore. But we need to count the one and so we add it back in. Again:

    ||||||||||
  -   ||||||||
   ------------
    ||

We take 8 from 10 leaving only 2. But we didn't want to take away number 8 — just [1..7]. So, we need to put number 8 back in and that's what the plus one is all about.

(The same goes for reading assignments. When the teacher says read pages 577-580, they didn't just give you 3 pages to read, they gave you 4 pages to read: 577, 578, 579, 580. So 580 - 577 = 3 and then put 577 back in by adding 1: 3 + 1 = 4.)

Where does this long and seemingly winding discussion lead us? To a general formula:

    short max, min, num;
    // get max and min somehow
    num = rand() % (max - min + 1) + min;

To get a random number in [min..max], modulo by (max-min+1) and then shift by adding min. Does this only work on short? No, it should work on long as well (but since rand returns the evil int, it may not). Does this only work on positive numbers? Nope, it actually works with negatives and zero, too. Watch:

    [0..RAND_MAX}
                   % (-2 - -5 + 1)
    [0..3]
                   + -5
    [-5..-2]

Or:

    [0..RAND_MAX}
                   % (5 - -5 + 1)
    [0..10]
                   + -5
    [-5..5]

Oooo...Awww...Ohhh!

Random Integers ... phbbt

Random bool Values

What if we want random characters or random logical values? What about random decimal values? Those can all be handled, too. Let's start with the easy one: bool. To get random bool values, just do a little type-casting:

    bool rand_value;

    rand_value = static_cast<bool>(rand() % 2);

This modulo gives results of 0 or 1. Casting 0 gives false and casting 1 gives true. We don't even have to shift the range!

(Incidentally, if you want to weight true more heavily — make it more likely — you can just increase the modulo value. Only 0 will cast into false and so all the other results will turn into true. You can reverse this weighting, but it needs a little more knowledge about manipulating bool.)

Random char Values

characters are similar but with more typecasting. Let's say we wanted to get letters between 'A' and 'D' inclusive. That'd be:

    char grade;

    grade = static_cast<char> (rand() % ( static_cast<short>('D') -
                                          static_cast<short>('A') + 1 )
                               + static_cast<short>('A'));

Whew! Now that's a lot of casting! Basically, we are generating random ASCII values in the proper range and then casting those back to actual char's. The short casts are simply turning the letters into their ASCII numeric equivalents so we can do math with them.

(Don't forget that not all characters are letters. That data type includes single digits, spacing, punctuation, and others. Anything the user can type — plus '\0'. Unfortunately, only the digits, uppercase, and lowercase are placed together in the ASCII encoding scheme. The punctuation are intersperced as are the spacing characters. See your ASCII Appendix for the layout of the code.)

Random double Values

Finally to the floating point random values. Well, there are probably thousands of ways to generate random numbers with decimal places. Many are difficult or tricky. Many do a poor job. But we'll look at a few and see which might be best for different situations.

Fixed Precision

First, if you know that the numbers you are trying to generate are to have a specific number of digits past the decimal point, you can use this method:

    [0..RAND_MAX}
                  % 81
    [0..80]
                  + -42
    [-42..38]
                  /10.0
    [-4.2..3.8]

Note that the eventual values are floating point with one digit past the decimal point. Also note that we got there by generating random integers and then dividing by 10.0 (see the .0 on there!) to shift the decimal 1 position. (We couldn't divide by 10 because that would be integer division and we'd get just the quotient — losing the remainder.)

In code, then, this would be:

    double rand_val;
    double min, max;
    // assign/read/calculate min and max somehow
    rand_val = (rand() % (static_cast<long>(max*10) -
                          static_cast<long>(min*10) + 1)
                + static_cast<long>(min*10))/10.0;

Lots of type-casting going on. Also a couple of duplicated calculations ... maybe we could simplify a bit:

    double rand_val;
    double min, max;
    long min_whole, max_whole;
    // assign/read/calculate min and max somehow
    min_whole = static_cast<long>(min*10);
    max_whole = static_cast<long>(max*10);
    rand_val = (rand() % (max_whole - min_whole + 1) + min_whole)/10.0;

That's a little nicer. The type-casts on the assignments avoid a warning about lost precision during assignment. I chose long, btw, because it has the closest to double precision (10 digits to 15). It still might not work in some situations, but we'll see how to address that in the last method below.

Remember, that the *10 and /10.0 in this code refer to the fact that we wanted 1 digit past the decimal on all our random values. If you want 2 digits, you'd multiply by 100 and divide by 100.0.

As Precise As Possible

The next method is a little more general in that it doesn't assume you need a specific number of decimal digits. It simply produces random decimal values in the range you want. It also requires no type-casting! Here's the derivation:

    [0..RAND_MAX}
                   % RAND_MAX
    [0..RAND_MAX-1]
                   / (RAND_MAX - 1.0)
    [0..1]
                   * (max-min)
    [0..max-min]
                   + min
    [min..max]

Here we get an arbitrary random value between 0 and 1 (inclusive) by dividing [0..RAND_MAX-1] by RAND_MAX-1.0 (again, the .0 is terribly important!). Then we scale this range to [0..max-min] with a multiplication and finally shift it to the correct position by adding min (which may be negative and so shift left).

So, in code this would be:

    double rand_val;
    double min, max;
    // assign/read/calculate min and max somehow
    rand_val = rand() % RAND_MAX / (RAND_MAX - 1.0) * (max-min) + min;

And Then...

The next (and last) method requires decision making and so we'll defer it until we do branching later. But the previous method is actually the best (most generally applicable) of the three. The first being limited to situations where number of decimal places is important (fairly rare). We'll see the good/bad of the third method when we get there.

Random bool Values — Revisited

Now that we have a decent random double generator, we can utilize it to generate more fluid bool values. Our modulo-cast method above will only allow us to implement rational ratios within the range of rand's integer type. What if we want some arbitrary value like 63.457% true values (and 36.543% falses)? This kind of thing can be more difficult to deal with — but not with our latest double generator!

In fact, we don't even need the whole thing. We just need a random value within the range [0, 1]. After all, probabilities must be within this range, right? (Remember that 100% is mathematically 1.00. Likewise, 60% is 0.60. And so on...) So, we pick a probability at random and compare it with the desired probability that something should be true. Then we result in true depending on that comparison. Look at it more graphically:

    0%                          100%
    +----------------------------+
    |                 :          |
    +----------------------------+
                      ^
                      |____ p

In the space of all probabilities, p represents our desired percent chance of true values. As you can see from the diagram, exactly p percent of all probabilities lie to p's left within this space. Thus, to generate true p percent of the time, all we need do is code like this:

    double p;   // .63457 represents 63.457% true's

    bool rain_today = rand() % RAND_MAX / (RAND_MAX - 1.0) <= p;

Again, the rand code there will generate a random value between 0 and 1 in decimal form. We then see if this is one of the probabilities left of p within the space of all probabilities. If so, we generate true (the less-than-or-equal-to will be true). If not, we get false (this should happen 1-p percent of the time).

Beating the Spread

So I'm getting random numbers, but they aren't spreading out right for my desired range. In particular:

random longs are clumped at the beginning of the range
random doubles are spread out, but jump over/never hit large groups of values

To alleviate these problems, first realize that not all systems created RAND_MAX equal. On some systems it is as small as 32767! So to combat the clumping/spreading issues, we'll have to account for this potential limitation.

Fixing long Clumping

The simplest way to fix a clumping issue with a random long integer is to build it in digit groups. (Kinda the opposite of what we did with modulo that time.) For instance, if we wanted to create a random number between 100000 and 999999, we can just create two numbers between 000 and 999 and put them together:

    short top = rand() % 1000, bottom = rand() % 1000;
    long final = top * 1000L + bottom;

Note that I have used the L suffix on my multiplier to ensure a long integer result from a predominantly short integer calculation.

Of course, this isn't quite right. We wanted the final value to be between 100000 and 999999, but the one we've generated is between 0 and 999999. We just need to tweak the 'top' half calculation a little:

    short top = rand() % (999-100+1) + 100, bottom = rand() % 1000;
    long final = top * 1000L + bottom;

Now the values should come out just fine. If you need a different range of values, just change how many digits or how many groups. You can have up to four (4) digits in a group safely and portably. You can put together groups at least to nine (9) digits long — and some ten (10) digits in length.

Fixing double Spread

If you need hundreds of thousands of values to come out of a random double calculation and your RAND_MAX is only 32767, then you are out-of-luck. Well, it appears so at first. The values you'll generate won't clump like the long integer case — they will spread throughout the desired range. Unfortunately, they will spread widely — with long ranges of never picked values in between each sometimes picked value pair.

This is not as easy to fix as the clumping problem, but can be overcome with a bit of planning. What we'll need to do is break our initial desired range down into ranges of 30000 values or less and then roll a die to decide which of these ranges to randomly generate. Keep in mind that the fewer values in each of the ranges, the more precise the values you can generate can be.

Let's say you wanted values between 1e5 and 9.99999e5. We can make ourselves instead 300 ranges from 1e5 to 1.0333333e5, 1.0333333e5 to 1.0666666e5, etc.

It seems insurmountable at first, no? But it isn't! Now we'll just pick which of the 300 ranges we want our value to come from with a simple random integer pick:

    short range = rand() % 300;

Note that we've designated the first group to be group zero (0) and the last group is group 299.

Next generate a value in the smaller size of [0, 3.33333e3] to tell us where we are within our selected range:

    double slide = rand() % RAND_MAX / (RAND_MAX - 1.0) * 3.33333e3;

Now just put these pieces of information together:

    //             min    which    width      where
    //                    range               within
    double final = 1e5 + range * 3.33333e3 + slide;

See? As simple as pie! (Yes, pie is indeed hard. But...)

Example Usage

Here are the examples from lecture.