C++: Shooting yourself in the foot #1
Hi,
Recently I was browsing some code, and I saw some construct that maybe is not very obvious to everyone. The code was located inside an important function, which took a hash in the argument and performed some action based on the contents of this hash.
Pseudocode:
typedef process_t {
// ...
uint8_t hash[20];
};
void perform(int pid, uint8_t hash[20]) {
for(const process_t& proc: process_list) {
if(!memcmp(hash, proc.hash, sizeof(hash))) {
perform_on_proc(proc);
}
}
}
// ...
More or less: the code gathered information about all processes currently
running in the system, and calculated some hash value for each process. Then,
the perform()
function would be called with some hash in the argument. This
function would search for specific process that had the requested hash, and
performed some action on this process.
So what's wrong with this function? Let's find out! You can see very clearly that something is wrong in at least two places. First, the function's mangled type gets converted to something different that was written in the source code:
$ nm test.exe | grep test | c++filt
00000000004015b0 T test(int*)
(where it should have been test(unsigned char [20])
or something similar),
but also the disassembly is very clear on this topic:
0000000000407ce0 <main>:
407ce0: 48 81 ec 28 08 00 00 sub $0x828,%rsp
407ce7: e8 64 9a ff ff callq 401750 <__main>
407cec: 48 8d 4c 24 20 lea 0x20(%rsp),%rcx
407cf1: e8 ba 98 ff ff callq 4015b0 <test(int*)>
407cf6: 31 c0 xor %eax,%eax
407cf8: 48 81 c4 28 08 00 00 add $0x828,%rsp
407cff: c3 retq
As we can already see in offset 0x407cec
, %rcx
register contains some stack
address: our table, which was allocated on the stack. Moreover, the %rcx
address, which is a pointer, not a copy of the whole array, will be passed to
the test()
function as the first argument (I'm testing this on Visual
Studio's C++ compiler, so the ABI generated in this executable is subject to
Microsoft's ABI parameter passing convention, you can read about this
here).
But why it's a pointer, not a table, you might ask? Since the declaration of
the hash
variable inside the argument list is so clear, why it's silently
converted to a pointer type without the user knowing what is happening? Well,
the C++ specs are quite clear on this subject:
8.3.5 Functions
[...]
- [...] After determining the type of each parameter, any parameter of type “array of T” or “function returning T” is adjusted to be “pointer to T” or “pointer to function returning T,” respectively. [...]
I read it as:
Here is some logic, but for arrays in function arguments let's just skip this entire logic, and use a different thing instead. Also, treat is as a feature so that some old VAX code will compile without problems.
Since we now know that the test()
function receives a pointer, not a table,
the fact that sizeof(hash)
will return 8
(if the binary is compiled in
64-bit mode) instead of 20
shouldn't be a surprise. This means that the
memcmp()
compares only 8 bytes of both hashes instead of full 20 bytes, which
is a clear bug. Changing this to sizeof(proc.hash)
fixes the problem... but
the distaste still remains ;).
The proper way in C++ to pass an array as an argument, so that it will be interpeted as an array later in the function is to pass a reference to this array, like this:
#include <cstdio>
void test(int (&arr)[4]) {
printf("size of arr: %zd\n", sizeof(arr));
}
int main() {
int arr[4] = { 0, 1, 2, 3 };
test(arr);
return 0;
}
because this way the "8.3.5.5 kludge" won't be triggered. Or simply use
std::array
or std::vector
. Or... switch to
Rust ;)
PS. In case you're wondering: this applies to C as well. C++ folks introduced this because they apparently didn't want to diverge from C that much.
G.