http://anadoxin.org/blog

C++: Shooting yourself in the foot #1

Sat, 20 January 2018 :: #cpp :: #rant

Hi,

Recently I was browsing some code, and I saw some construct that maybe is not very obvious to everyone. The code was located inside an important function, which took a hash in the argument and performed some action based on the contents of this hash.

Pseudocode:

typedef process_t {
    // ...
    uint8_t hash[20];
};

void perform(int pid, uint8_t hash[20]) {
    for(const process_t& proc: process_list) {
        if(!memcmp(hash, proc.hash, sizeof(hash))) {
            perform_on_proc(proc);
        }
    }
}

// ...

More or less: the code gathered information about all processes currently running in the system, and calculated some hash value for each process. Then, the perform() function would be called with some hash in the argument. This function would search for specific process that had the requested hash, and performed some action on this process.

So what's wrong with this function? Let's find out! You can see very clearly that something is wrong in at least two places. First, the function's mangled type gets converted to something different that was written in the source code:

$ nm test.exe | grep test | c++filt
00000000004015b0 T test(int*)

(where it should have been test(unsigned char [20]) or something similar), but also the disassembly is very clear on this topic:

0000000000407ce0 <main>:
  407ce0:	48 81 ec 28 08 00 00 	sub    $0x828,%rsp
  407ce7:	e8 64 9a ff ff       	callq  401750 <__main>
  407cec:	48 8d 4c 24 20       	lea    0x20(%rsp),%rcx
  407cf1:	e8 ba 98 ff ff       	callq  4015b0 <test(int*)>
  407cf6:	31 c0                	xor    %eax,%eax
  407cf8:	48 81 c4 28 08 00 00 	add    $0x828,%rsp
  407cff:	c3                   	retq

As we can already see in offset 0x407cec, %rcx register contains some stack address: our table, which was allocated on the stack. Moreover, the %rcx address, which is a pointer, not a copy of the whole array, will be passed to the test() function as the first argument (I'm testing this on Visual Studio's C++ compiler, so the ABI generated in this executable is subject to Microsoft's ABI parameter passing convention, you can read about this here).

But why it's a pointer, not a table, you might ask? Since the declaration of the hash variable inside the argument list is so clear, why it's silently converted to a pointer type without the user knowing what is happening? Well, the C++ specs are quite clear on this subject:

8.3.5 Functions

[...]

  1. [...] After determining the type of each parameter, any parameter of type “array of T” or “function returning T” is adjusted to be “pointer to T” or “pointer to function returning T,” respectively. [...]

I read it as:

Here is some logic, but for arrays in function arguments let's just skip this entire logic, and use a different thing instead. Also, treat is as a feature so that some old VAX code will compile without problems.

Since we now know that the test() function receives a pointer, not a table, the fact that sizeof(hash) will return 8 (if the binary is compiled in 64-bit mode) instead of 20 shouldn't be a surprise. This means that the memcmp() compares only 8 bytes of both hashes instead of full 20 bytes, which is a clear bug. Changing this to sizeof(proc.hash) fixes the problem... but the distaste still remains ;).

The proper way in C++ to pass an array as an argument, so that it will be interpeted as an array later in the function is to pass a reference to this array, like this:

#include <cstdio>

void test(int (&arr)[4]) {
    printf("size of arr: %zd\n", sizeof(arr));
}

int main() {
    int arr[4] = { 0, 1, 2, 3 };
    test(arr);
    return 0;
}

because this way the "8.3.5.5 kludge" won't be triggered. Or simply use std::array or std::vector. Or... switch to Rust ;)

PS. In case you're wondering: this applies to C as well. C++ folks introduced this because they apparently didn't want to diverge from C that much.

G.