C++: Shooting yourself in the foot #4

written on Thu 07 February 2019

C++11 introduced a better way to generate random numbers than the immortal srand(time(NULL)) and rand() % N method. However, this family of classes can behave in unintuitive ways, especially when it comes to multi-platform programming.

TL;DR: before commenting, please at least read the Conclusion at the end of this blog post. Thanks.

Generators

The C++ way of generating random numbers is based on engine classes such as std::default_random_engine, which produce the actual pseudo-random numbers. Here's a small snippet showing how you can use this class to generate your own set of numbers:

#include <random>
#include <iostream>

int main() {
   std::default_random_engine e(12345);
   std::default_random_engine::result_type a, b, c;

   a = e();
   b = e();
   c = e();

   std::cout << a << " " << b << " " << c << "\n";
}

The number 12345 is a seed.

Update: Yes, you shouldn't use a constant seed when initializing your random number generator. If you want properly generated random numbers, make sure your seed is properly initialized, e.g. by using std::random_device. You should also be aware of which engine you're using. For example, std::mt19937 has a constant, std::mt19937::state_size, which holds the size of the internal state buffer counted in 32-bit words (624 for std::mt19937), not in bytes. To seed the whole state you need to feed that many words to the engine, e.g. through the std::seed_seq class. You also shouldn't use timestamps directly as your seed; use a good source of entropy instead. However, proper seeding of the engine is beyond the scope of this blog post, so I won't dig into it further. Since we're doing research here, I'll continue using constant seeds.

So the expectation is that for a given seed the generator should produce the same set of random values everywhere, but in reality a different set of numbers is generated when compiling this code under Visual Studio and GCC:

Compiler            Generated numbers
Visual Studio 2017  3992670690 3823185381 1358822685
GCC 8               207482415 1790989824 2035175616

This means we have to assume that default_random_engine produces different numbers with a different compiler. Why?

It turns out that default_random_engine is not an engine by itself; it's just a type alias (typedef) that points to an actual engine class, and the choice of that class is left to the implementation. In Visual Studio's standard library the alias resolves to the Mersenne Twister generator (std::mt19937), while in GCC's libstdc++ it resolves to a multiplicative congruential pseudo-random number generator (std::minstd_rand0). So what happens if we change our program to use one of those engines directly instead of the alias?

#include <random>
#include <iostream>

int main() {
   std::mt19937 e(12345);
   std::mt19937::result_type a, b, c;

   a = e();
   b = e();
   c = e();

   std::cout << a << " " << b << " " << c << "\n";
}

We'll get consistent results this time:

Compiler            Generated numbers
Visual Studio 2017  3992670690 3823185381 1358822685
GCC 8               3992670690 3823185381 1358822685

The same thing happens if you use std::minstd_rand0 instead of std::mt19937: the generated numbers are consistent between compilers, systems and standard libraries. That's a direct result of the C++ standard specifying the generator algorithms themselves, so they have to work the same way on every platform. Looking at the definitions of mt19937 and minstd_rand0, we can see that they are just specializations (typedefs with fixed parameters) of more general class templates: std::mersenne_twister_engine and std::linear_congruential_engine, respectively. You can check the following sections of the N3337 standard draft to see whether the engine you're using is mandated or not:

Generator engine                 Mandated by N3337?
std::mersenne_twister_engine     yes, 26.5.3.2, rand.eng.mers
std::linear_congruential_engine  yes, 26.5.3.1, rand.eng.lcong
std::discard_block_engine        yes, 26.5.4.2, rand.adapt.disc
std::subtract_with_carry_engine  yes, 26.5.3.3, rand.eng.sub
etc...

This seems to apply both to the actual engine classes and to the engine adaptor classes, which require some other engine to operate. So if you're using a concrete generator engine, your code should be safe when compiling on a different platform.

Distributions

The C++ library contains more facilities that deal with random numbers. What about the situation where you'd like to limit the result to, e.g., 4 possible outcomes? Previously you could write 10 + rand() % 5 to get a random number from the 10-14 range. With C++11 you can use a distribution class, which is more flexible than that.

To generate A, B, C or D outcomes according to this probability table:

Outcome  Probability of happening
A        41%
B        9%
C        40%
D        10%

we can use this code snippet:

#include <random>
#include <iostream>

int main() {
    std::mt19937 generator(200);
    std::discrete_distribution<int> distribution({41, 9, 40, 10});

    for(int i = 0; i < 16; i++) {
        char a = 'A' + distribution(generator);
        std::cout << a;
    }

    std::cout << "\n";
    return 0;
}

Let's try to run it on different platforms.

Platform            Result
MSYS2 GCC 8         BAACCABCCDCACCAA
Linux GCC 8         BAACCABCCDCACCAA
Visual Studio 2017  BAACCABCCDCACCAA

It appears we're getting consistent results across different compilers. Let's try a different distribution class:

#include <random>
#include <iostream>

int main() {
    std::mt19937 generator(200);
    std::uniform_int_distribution<int> distribution(0, 3);

    for(int i = 0; i < 16; i++) {
        char a = 'A' + distribution(generator);
        std::cout << a;
    }

    std::cout << "\n";
    return 0;
}

and here's our results table:

Platform            Result
MSYS2 GCC 8         DBAACBBCDDAABBDC
Linux GCC 8         DBAACBBCDDAABBDC
Linux Clang 7       DBAACBBCDDAABBDC
Visual Studio 2017  CBAACDADBCDBBDCC

No such luck here. Why are the results the same for std::discrete_distribution but different for std::uniform_int_distribution?

Maybe N3337 can shed some light on this mystery:

26.5.8.1, 3. The algorithms for producing each of the specified distributions are implementation-defined.

This point effectively rules out writing multi-platform code that relies on distribution classes for reproducible results. It's pure luck that some distributions produce consistent results while others differ. What's more, there's no guarantee that the results of a distribution will stay consistent on the same platform in the future. So, if you want your results to be reproducible, don't use C++'s distribution classes.

VS extensions in generators

It appears that the C++ standard actually allows implementations to add extensions to its classes. There is one interesting point in N3337:

1.4.8 -- A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any well-formed program. Implementations are required to diagnose programs that use such extensions that are ill-formed according to this International Standard. Having done so, however, they can compile and execute such programs.

One example of such an extension is std::random_device. The standard doesn't require it to generate cryptographically secure numbers, but the Microsoft implementation does produce them. If you want your code to be portable, you need to assume this class produces plain random numbers that are unsuited for encryption or digital signing.

VS extensions in distributions

Another extension which may introduce more incompatibilities seems to exist in Microsoft's implementation of the distribution classes. Under Visual Studio it's possible to call a distribution object that is const, like in this small program:

#include <random>
#include <iostream>

int main() {
    std::mt19937 generator(200);
    const std::discrete_distribution<int> distribution({41, 9, 40, 10});

    for(int i = 0; i < 16; i++) {
        char a = 'A' + distribution(generator);
        std::cout << a;
    }

    std::cout << "\n";
    return 0;
}

The problem is that the standard does not declare operator() as a const member function, so it shouldn't be possible to call operator() on a const object. However, under VS this program will compile and run just fine, apparently because Visual Studio relies on the "implementations can have extensions" clause of the standard here. GCC, which seems to stick to the C++ standard a little more closely, won't compile this program. This means that code written with GCC will compile fine under VS, but code written with VS may not compile under GCC.

An interesting supplement to this incompatibility is the MSDN documentation for std::discrete_distribution. According to MSDN, this code shouldn't compile, because there's no const specifier in the declaration of operator(). Yet it does. With Visual Studio, prepare to be able to do impossible things.

Of course, if you stick to Visual Studio only, this will never be a problem. But if you want your code to be portable, you need to keep your distribution objects non-const.

Conclusion

If you want your random values to be consistent between different platforms, use a specific generator engine like std::mt19937 or std::minstd_rand0 instead of default_random_engine. Do not use the distribution classes, because their algorithms are not mandated by the standard, and your standard library is free to use whatever distribution algorithms it likes. Random numbers produced through any distribution class are not guaranteed to be reproducible, so you may want to avoid this family of classes.

Don't mark distribution objects as const, because calling operator() on a const distribution object is a Microsoft extension of the standard. Your code won't compile on other platforms.

Don't assume std::random_device produces numbers that are cryptographically secure. The Microsoft implementation does, but the standard doesn't require it, so other platforms may not produce secure numbers.

This entry was tagged on #c++ and #rant