C++: Shooting yourself in the foot #4
C++11 introduced a better way to generate random numbers than the immortal srand(time(NULL)) and rand() % N method. However, this family of classes can sometimes behave unintuitively, especially in multi-platform programming.
TL;DR: before commenting, please at least read the Conclusion from the end of this blog post. Thanks.
Generators
The C++ way of generating random numbers is based on engine classes such as std::default_random_engine, which produce the actual pseudo-random numbers. Here's a small snippet showing how you can use this class to generate your own set of numbers:
#include <random>
#include <iostream>
int main() {
    std::default_random_engine e(12345);  // engine seeded with a constant
    std::default_random_engine::result_type a, b, c;
    a = e();  // each call yields the next number in the sequence
    b = e();
    c = e();
    std::cout << a << " " << b << " " << c << "\n";
}
The number 12345 is a seed.
Update: Yes, you shouldn't use a constant seed when initializing your random number generator. If you want properly generated random numbers, make sure the seed comes from a good source, e.g. std::random_device. You should also be aware of which engine you're using. For example, std::mt19937 has a constant, std::mt19937::state_size, which holds the size of its internal state buffer in words (624 32-bit words). To seed the engine fully, you need to feed it that many words, e.g. through the std::seed_seq class. You also shouldn't use raw timestamps as your seed; use a good source of entropy instead. However, proper seeding of the engine is beyond the scope of this blog post, so I won't dig into it further. Since we're doing research here, I'll continue using constant seeds.
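For readers who do want a non-constant seed, here is a minimal sketch of the approach mentioned in the update. The helper name make_seeded_mt19937 is hypothetical, and the sketch assumes std::random_device is a usable entropy source on your platform, which the standard does not guarantee.

```cpp
#include <algorithm>
#include <array>
#include <functional>
#include <random>

// Hypothetical helper (not from the post): builds a std::mt19937 whose whole
// state is seeded from std::random_device instead of one constant value.
inline std::mt19937 make_seeded_mt19937() {
    std::random_device rd;
    // state_size counts 32-bit words (624 for mt19937), not bytes.
    std::array<std::random_device::result_type, std::mt19937::state_size> words;
    std::generate(words.begin(), words.end(), std::ref(rd));
    // seed_seq mixes the collected words before they reach the engine.
    std::seed_seq seq(words.begin(), words.end());
    return std::mt19937(seq);
}
```

Calling make_seeded_mt19937() once at startup and reusing the returned engine is usually enough; creating a new engine for every number would defeat the purpose.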
So the expectation is that a given seed always produces the same sequence of random values. In reality, compiling this code under Visual Studio and GCC yields different numbers:
Compiler | Generated numbers |
---|---|
Visual Studio 2017 | 3992670690 3823185381 1358822685 |
GCC 8 | 207482415 1790989824 2035175616 |
This means we have to assume that default_random_engine produces different random numbers on different compilers. Why?
It turns out that default_random_engine is not an engine by itself; it's just a type alias that points to an implementation-defined engine class. On Visual Studio, the underlying algorithm is the Mersenne Twister generator (std::mt19937), while GCC's standard library picks a multiplicative congruential pseudo-random number generator (std::minstd_rand0). So what happens if we change our program to use one of those engines directly instead of the alias?
#include <random>
#include <iostream>
int main() {
    std::mt19937 e(12345);  // concrete engine instead of the alias
    std::mt19937::result_type a, b, c;
    a = e();
    b = e();
    c = e();
    std::cout << a << " " << b << " " << c << "\n";
}
We'll get consistent results this time:
Compiler | Generated numbers |
---|---|
Visual Studio 2017 | 3992670690 3823185381 1358822685 |
GCC 8 | 3992670690 3823185381 1358822685 |
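As an aside, you can ask the compiler which engine the default_random_engine alias resolves to on your toolchain. The sketch below is only an illustration: the helper name default_engine_name is hypothetical, and the list of candidates is an assumption that may not cover every standard library.

```cpp
#include <random>
#include <string>
#include <type_traits>

// Hypothetical helper: names the engine that std::default_random_engine
// aliases on the current standard library, for a few common candidates.
inline std::string default_engine_name() {
    if (std::is_same<std::default_random_engine, std::mt19937>::value)
        return "std::mt19937";
    if (std::is_same<std::default_random_engine, std::minstd_rand0>::value)
        return "std::minstd_rand0";
    if (std::is_same<std::default_random_engine, std::minstd_rand>::value)
        return "std::minstd_rand";
    return "something else";
}
```

Printing the returned string is handy for diagnostics, but code that needs reproducible numbers should name the engine explicitly rather than branch on this result.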
Using std::minstd_rand0 instead of std::mt19937 gives the same outcome: the generated numbers are consistent between compilers, systems and standard libraries. That's because the C++ standard specifies the generator algorithms themselves, so they have to work the same way on each platform. Looking at the definitions of mt19937 and minstd_rand0, we can see that they are just specializations of more general class templates: std::mersenne_twister_engine and std::linear_congruential_engine, respectively. You can check these sections of the N3337 draft standard to see whether the engine you're using is mandated or not:
Generator engine | Mandated by N3337? |
---|---|
std::mersenne_twister_engine | yes, 26.5.3.2, rand.eng.mers |
std::linear_congruential_engine | yes, 26.5.3.1, rand.eng.lcong |
std::discard_block_engine | yes, 26.5.4.2, rand.adapt.disc |
std::subtract_with_carry_engine | yes, 26.5.3.3, rand.eng.sub |
etc. | |
This seems to apply both to the actual engine classes and to the engine adaptor classes, which require some other engine to operate. So if you're using a concrete number generator engine, your code should be safe when compiled on a different platform.
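The standard goes one step further and states the value that the 10000th consecutive call of a default-constructed engine must return for each predefined engine type. The sketch below checks three of them; the helper names nth_output and engines_match_standard are hypothetical, while the expected constants are the ones listed in the standard for mt19937, minstd_rand0 and minstd_rand.

```cpp
#include <random>

// Hypothetical helper: returns the n-th value (1-based, n >= 1) produced by
// a default-constructed engine of type Engine.
template <class Engine>
typename Engine::result_type nth_output(unsigned long long n) {
    Engine e;          // default-constructed, so it uses the engine's default seed
    e.discard(n - 1);  // skip the first n-1 values
    return e();
}

// Hypothetical helper: true when the library's engines produce the values
// the standard requires for the 10000th invocation.
inline bool engines_match_standard() {
    return nth_output<std::mt19937>(10000) == 4123659995u
        && nth_output<std::minstd_rand0>(10000) == 1043618065u
        && nth_output<std::minstd_rand>(10000) == 399268537u;
}
```

A check like this could live in a unit test, as a cheap guard that a new toolchain still produces the sequences your code relies on.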
Distributions
The C++ library contains more facilities that deal with random numbers. What about a situation where you'd like to limit the result to, e.g., 4 possible outcomes? Before, you could write 10 + rand() % 5 to get a random number from the 10-14 range. But with C++11 you can use a distribution class, which is more flexible than that.
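As a rough sketch, the C++11 counterpart of 10 + rand() % 5 could look like this. The helper name draw_10_to_14 is hypothetical and exists only to keep the example self-contained.

```cpp
#include <random>
#include <vector>

// Hypothetical helper: draws `count` integers from the closed range [10, 14],
// the C++11 counterpart of 10 + rand() % 5.
inline std::vector<int> draw_10_to_14(unsigned seed, int count) {
    std::mt19937 generator(seed);
    // Both bounds are inclusive.
    std::uniform_int_distribution<int> distribution(10, 14);
    std::vector<int> values;
    for (int i = 0; i < count; i++)
        values.push_back(distribution(generator));
    return values;
}
```

Unlike the modulo trick, the distribution takes its range as explicit bounds, so the intent is visible at the point of construction.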
To generate A, B, C or D outcomes according to this probability table:
Outcome | Probability of happening |
---|---|
A | 41% |
B | 9% |
C | 40% |
D | 10% |
we can use this code snippet:
#include <random>
#include <iostream>
int main() {
    std::mt19937 generator(200);
    // weights for outcomes A, B, C and D
    std::discrete_distribution<int> distribution({41, 9, 40, 10});
    for(int i = 0; i < 16; i++) {
        char a = 'A' + distribution(generator);  // index 0..3 mapped to 'A'..'D'
        std::cout << a;
    }
    std::cout << "\n";
    return 0;
}
Let's try to run it on different platforms.
Platform | Result |
---|---|
MSYS2 GCC 8 | BAACCABCCDCACCAA |
Linux GCC 8 | BAACCABCCDCACCAA |
Visual Studio 2017 | BAACCABCCDCACCAA |
It appears we're getting consistent results across different compilers. Let's try a different distribution:
#include <random>
#include <iostream>
int main() {
    std::mt19937 generator(200);
    // four equally likely outcomes, 0..3 inclusive
    std::uniform_int_distribution<int> distribution(0, 3);
    for(int i = 0; i < 16; i++) {
        char a = 'A' + distribution(generator);
        std::cout << a;
    }
    std::cout << "\n";
    return 0;
}
and here's our results table:
Platform | Result |
---|---|
MSYS2 GCC 8 | DBAACBBCDDAABBDC |
Linux GCC 8 | DBAACBBCDDAABBDC |
Linux Clang 7 | DBAACBBCDDAABBDC |
Visual Studio 2017 | CBAACDADBCDBBDCC |
No such luck here. Why are the results the same with std::discrete_distribution but different with std::uniform_int_distribution?
Maybe N3337 can shed some light on this mystery:
26.5.8.1, 3. The algorithms for producing each of the specified distributions are implementation-defined.
This point effectively closes off our way to produce multi-platform code that uses the distribution classes. It's pure luck that some distributions produce consistent results while others differ. What's more, there's no guarantee that the results of a distribution will stay consistent on the same platform in the future. So, if you want your results to be reproducible, don't use C++'s distribution classes.
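If you still need a bounded or weighted result that is reproducible, one option is to map the raw engine output yourself. The sketch below is only one possible approach, not a replacement for the standard distributions in general: the helper names portable_uniform and portable_pick are hypothetical, the code assumes std::mt19937 (whose outputs are 32-bit values), and it uses plain rejection sampling with integer arithmetic.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: uniform integer in [lo, hi] (requires lo <= hi), built
// only from integer arithmetic on raw std::mt19937 output, so the mapping
// does not depend on the library's distribution algorithm.
inline std::uint32_t portable_uniform(std::mt19937& gen,
                                      std::uint32_t lo, std::uint32_t hi) {
    const std::uint64_t span  = std::uint64_t(1) << 32;      // count of possible outputs
    const std::uint64_t range = std::uint64_t(hi) - lo + 1;  // count of outcomes
    const std::uint64_t limit = span - span % range;         // largest multiple of range
    std::uint64_t x;
    do {
        x = gen();  // values at or above limit would bias the result, so redraw
    } while (x >= limit);
    return lo + static_cast<std::uint32_t>(x % range);
}

// Hypothetical helper: picks index 0..3 (A..D) using the 41/9/40/10 weights
// from the probability table in this post.
inline int portable_pick(std::mt19937& gen) {
    static const std::uint32_t weights[4] = {41, 9, 40, 10};
    std::uint32_t r = portable_uniform(gen, 0, 99);
    for (int i = 0; i < 4; i++) {
        if (r < weights[i]) return i;
        r -= weights[i];
    }
    return 3;  // not reached while the weights sum to 100
}
```

With the seed 12345 and the three raw mt19937 outputs listed in the Generators section, the first three calls to portable_pick should yield D, C and C on every platform, because only the standardized engine and integer arithmetic are involved.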
VS extensions in generators
It appears that the C++ standard actually allows implementations to add various extensions to its classes. There is one interesting point in N3337:
1.4.8 -- A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any well-formed program. Implementations are required to diagnose programs that use such extensions that are ill-formed according to this International Standard. Having done so, however, they can compile and execute such programs.
One example of an implementation going beyond what the standard requires is std::random_device. The standard doesn't require it to generate cryptographically secure numbers, but the Microsoft implementation does. If you want your code to be portable, though, you need to assume this class produces plain random numbers, unsuited for encryption or digital signing.
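The only hint the standard interface gives about the quality of std::random_device is its entropy() member. The sketch below simply reads it; the helper name reported_entropy is hypothetical, and the value is an estimate that some implementations are known to report inaccurately, so treat it as a diagnostic rather than a security guarantee.

```cpp
#include <random>

// Hypothetical helper: returns the entropy estimate that std::random_device
// reports on the current platform.
inline double reported_entropy() {
    std::random_device rd;
    // 0.0 means the implementation claims no real entropy (deterministic source).
    return rd.entropy();
}
```

For anything security-related, a dedicated cryptographic API of your platform is a safer choice than relying on this number.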
VS extensions in distributions
Another extension that may introduce incompatibilities seems to exist in Microsoft's implementation of the distribution classes. Under Visual Studio it's possible to make a distribution object const, like in this small program:
#include <random>
#include <iostream>
int main() {
    std::mt19937 generator(200);
    // const distribution object: accepted by Visual Studio, rejected by GCC
    const std::discrete_distribution<int> distribution({41, 9, 40, 10});
    for(int i = 0; i < 16; i++) {
        char a = 'A' + distribution(generator);
        std::cout << a;
    }
    std::cout << "\n";
    return 0;
}
The problem is that the standard does not declare operator() as a const member function, so it shouldn't be possible to call operator() on a const object. However, under VS this program will compile and run just fine, apparently because Visual Studio relies on the "implementations can use extensions" kludge from the standard here. GCC, which seems to stick to the C++ standard a little more closely, won't compile this program. This means that code written with GCC will compile fine under VS, but code taken from VS may not compile under GCC.
An interesting supplement to this incompatibility is the MSDN documentation for std::discrete_distribution. According to MSDN, this code shouldn't compile, because there's no const specifier in the declaration of operator(). Yet it does. With Visual Studio, prepare to be able to do impossible things.
Of course, if you stick to Visual Studio only, it will never be a problem. But if you want your code to be portable, you need to keep your distribution objects non-const.
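If you need a const interface anyway, for example a class whose drawing method is const, one portable option is to keep the engine and the distribution as mutable members. This is a sketch of that idea, not something taken from either standard library; the class name OutcomePicker is hypothetical.

```cpp
#include <random>

// Hypothetical class (not from the post): exposes a const next() while the
// engine and the distribution themselves stay non-const via `mutable`.
class OutcomePicker {
public:
    explicit OutcomePicker(unsigned seed)
        : generator_(seed), distribution_({41, 9, 40, 10}) {}

    // Callable on a const OutcomePicker without relying on a const operator().
    char next() const {
        return static_cast<char>('A' + distribution_(generator_));
    }

private:
    mutable std::mt19937 generator_;
    mutable std::discrete_distribution<int> distribution_;
};
```

Keep in mind that next() still changes internal state, so a const OutcomePicker shared between threads would need its own synchronization.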
Conclusion
If you want your random values to be consistent between different platforms, use a specific generator engine like std::mt19937 or std::minstd_rand0 instead of default_random_engine. Do not use the distribution classes, because their algorithms are not mandated by the standard and your standard library is free to use whatever distribution algorithms it likes. Numbers produced through any distribution are not guaranteed to be reproducible across platforms, so you may want to avoid this family of classes.
Don't mark distribution objects as const, because calling a const distribution object is a Microsoft extension to the standard. Your code may not compile on other platforms.
Don't assume std::random_device produces numbers that are cryptographically secure. The Microsoft implementation does, but the standard doesn't require it, so other platforms may not produce secure numbers.