C++: Shooting yourself in the foot #3
Everyone says that programming is complicated. Is it? Well, actually it is, especially if you're using your programming language in a non-recommended way. Take file input/output for example.
In C++ there are several ways to perform file I/O: std::ifstream is one example. Many frameworks, such as Qt, offer classes like QFile that allow the programmer to read and write files. But sometimes we want pure speed and hand-crafted optimization. Well, who is better at those things than the C programmer?
In C, we can use fopen(3) to open the file, fread(3) to read from it, and fwrite(3) to write to it. This family of functions uses internal buffers, so we don't risk performing syscalls more often than necessary.
So let's write a simple program that opens a read/write stream on a file named hello.
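#include <cstdio>
#include <cstdint>
#include <cassert>

int main() {
    uint8_t buf[] = { 1, 2, 3, 4, 5, 6 };
    uint8_t buf2[256] = {};

    // "wb+" creates/truncates the file and opens it for both writing and reading
    FILE* fw = fopen("hello", "wb+");
    assert(fw != nullptr);

    fwrite(buf, sizeof(buf), 1, fw);   // write our 6 bytes
    fread(buf2, sizeof(buf2), 1, fw);  // read right after the write, no fflush/fseek in between
    fclose(fw);
    return 0;
}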
This program should write 6 bytes into the hello file. Then it should try to read some bytes from the same file, but most probably the read will fail, since obviously we're at the end of the file and there's nothing to read. That is, unless the fopen family of functions keeps two separate positions, one for reading and one for writing, in which case we should read back what we've just written. Well, let's see what happens. Oh, and let's use Windows and Visual Studio! ;)
Wait a minute...
~ $ cat hello | xxd
00000000: 0102 0304 0506 0000 e020 d56a fc01 0000 ......... .j....
00000010: 7300 7900 7300 3600 3400 5c00 6800 6f00 s.y.s.6.4.\.h.o.
00000020: 6d00 6500 5c00 6100 6e00 7400 6500 6b00 m.e.\.a.n.t.e.k.
00000030: 5c00 6400 6500 7600 5c00 7400 6500 7300 \.d.e.v.\.t.e.s.
00000040: 7400 7300 5c00 6600 6f00 7000 6500 6e00 t.s.\.f.o.p.e.n.
00000050: 5f00 7700 7200 6900 7400 6500 5c00 6800 _.w.r.i.t.e.\.h.
00000060: 6500 6c00 6c00 6f00 0000 0000 0000 0000 e.l.l.o.........
00000070: 0000 0000 0000 0000 0f00 000f 7878 0000 ............xx..
00000080: e020 d56a fc01 0000 5001 d46a fc01 0000 . .j....P..j....
00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000100: 0000 0000 0000 ......
What the hell is going on? We wanted to write 6 bytes, but our new file is 0x106 (262) bytes long, which is exactly our 6 bytes plus 256, the size of buf2. And most of it is stuff we never wrote. How can fread(3) write anything at all?
Are other platforms affected?
Linux - nope:
└─[0] <> cat hello | xxd
00000000: 0102 0304 0506 ......
OpenBSD - nope:
-> % cat hello | xxd
00000000: 0102 0304 0506 ......
FreeBSD - nope:
[antek@xxx ~/data/dev/tests/fopen_fwrite]$ cat hello | xxd
00000000: 0102 0304 0506 ......
So it's clearly Windows' fault. Right? Once again: nope!
What we see here is actually documented behavior. Let's see what the POSIX documentation for fopen(3) has to say:
When a file is opened with update mode ('+' as the second or third character in the mode argument), both input and output may be performed on the associated stream. However, the application shall ensure that output is not directly followed by input without an intervening call to fflush() or to a file positioning function (fseek(), fsetpos(), or rewind()), and input is not directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.
So calling fread right after fwrite is undefined behavior. If you've ever wondered what "undefined behavior" actually means, this is it: a read operation can write to a file! Reading this POSIX text, it almost sounds like a feature, as if someone did actual work to make this combination of calls behave exactly like that.
So Linux, FreeBSD and OpenBSD happen to handle this case gracefully, while Windows doesn't, and it isn't required to. The ISO C standard has the same rule for update-mode streams, so the burden is on the program, not on the C runtime, and everything's fine (kind of).
The moral of the story is: don't use legacy stuff when programming in C++ and we're all going to be OK.