http://anadoxin.org/blog

Control over symbol exports in GCC

Thu, 30 October 2014 :: #cpp :: #linux

When creating shared objects, keep in mind that the longer the list of exported symbols, the more time the dynamic linker needs during the loading process.

A lot has been written about the rules to follow when creating a shared object. A great paper by the glibc maintainer, Ulrich Drepper, covers most of the topics. You can find it with your favourite search engine (try searching for "How to write shared libraries by Ulrich Drepper").

One of the important points is restricting the set of symbols that a shared object exports. When you create an ELF shared object (by convention, a file named lib*.so), all symbols are public by default and therefore end up in the object's export symbol table. Exporting every symbol is not optimal. The export table should contain only the API of the shared object, because nobody should ever use the internal structures that the remaining symbols describe. Exported symbols are often considered part of the Application Binary Interface (ABI), which should stay compatible between library versions. Having to maintain the ABI of an internal structure effectively rules out any practical modification of it.

Normally, when you create a shared object on Linux, all of its symbols are exported by default. Consider this example:

#include <iostream>
using namespace std;

void function1() {
	cout << "hi\n";
}

void entry_point() {
	function1();
}

The example shows a shared object that contains an API function (that is, a function designed to be called from a different module, an entry point to the shared library's functionality). This entry_point API function uses a helper function, function1, to perform some task. Let's compile this shared object:

$ g++ code.cpp -o libcode.so -shared -fPIC

and let's glance at the export table:

$ nm -CD libcode.so | grep " T "
00000000000007c8 T entry_point()
00000000000007ac T function1()
0000000000000868 T _fini
0000000000000668 T _init

We have 4 symbols inside the text segment of libcode.so. _fini and _init are automatically added exported symbols that relate to initialization and finalization of the object. The other two symbols are ours. Our goal is to build a shared library that puts entry_point in the export table while leaving function1 out of it.

There are two methods for doing this: the -fvisibility compiler option and a linker (ld) version script.

Using the `-fvisibility' compiler option

The compiler flag -fvisibility=hidden is designed for exactly this kind of problem. With this option, the user tells the compiler to mark all symbols as hidden by default, which keeps them out of the export table:

$ g++ code.cpp -o libcode.so -shared -fPIC -fvisibility=hidden
$ nm -CD libcode.so | grep " T "
00000000000007e8 T _fini
00000000000005f8 T _init

But we also want to include the entry_point function in the table, because it's our API function. This is why we need to change the source code a little bit:

#include <iostream>
using namespace std;

void function1() {
	cout << "hi\n";
}

__attribute__ ((visibility ("default"))) void entry_point() {
	function1();
}

After compilation, our goal is met:

$ g++ code.cpp -o libcode.so -shared -fPIC -fvisibility=hidden
$ nm -CD libcode.so | grep " T "
0000000000000778 T entry_point()
0000000000000818 T _fini
0000000000000628 T _init

The __attribute__ ((visibility ("default"))) is very similar to __declspec(dllexport), which can be found in Microsoft's compilers. In fact, GCC accepts that syntax as well when targeting Windows. The attribute overrides the -fvisibility=hidden compiler option for this one symbol only and makes it public. That's why it's included in the export table.
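Because the two annotations play the same role, projects often wrap them in a single macro. The following is only a sketch of that idea: the LIBCODE_API macro name is made up for this post, and the functions return the text instead of printing it, just to make the result easy to check.

```cpp
// code.cpp: variant of the example above using a hypothetical LIBCODE_API macro.
#include <string>

#if defined(_WIN32)
#  define LIBCODE_API __declspec(dllexport)
#elif defined(__GNUC__)
#  define LIBCODE_API __attribute__ ((visibility ("default")))
#else
#  define LIBCODE_API  // unknown compiler: fall back to its default behaviour
#endif

// Internal helper: not annotated, so with -fvisibility=hidden
// it stays out of the export table.
std::string function1() {
	return "hi\n";
}

// Public API: annotated, so it is exported even with -fvisibility=hidden.
LIBCODE_API std::string entry_point() {
	return function1();
}
```

The build command stays the same as before (g++ code.cpp -o libcode.so -shared -fPIC -fvisibility=hidden).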

The problem with static libraries used by a shared library

However, there is one problem with this approach. If our shared library is large enough, it may use other libraries that are statically linked. Let's consider another example.

Let's suppose we have a static library, libutil.a, that is statically linked into our shared library, libcode.so.

Here is our util.cpp file, which is the source of the static library:

#include <iostream>
using namespace std;

void util_function() {
	cout << "hello from util function\n";
}

Let's compile it, and build a static library from this code:

$ g++ util.cpp -o util.o -c -fPIC
$ ar r libutil.a util.o

We now have the libutil.a static library, which can be used in our shared object. Let's modify the shared object to include a reference to the code of libutil.a (without it, libutil.a would be dropped in the linking process):

#include <iostream>
using namespace std;

extern void util_function();

void function1() {
	util_function();
	cout << "hi\n";
}

__attribute__ ((visibility ("default"))) void entry_point() {
	function1();
}

Let's compile our shared library and see its export table:

$ g++ code.cpp libutil.a -o libcode.so -shared -fPIC -fvisibility=hidden
$ nm -CD libcode.so | grep " T "
00000000000007ed T entry_point()
0000000000000858 T util_function()
0000000000000918 T _fini
0000000000000688 T _init

As you can see, util_function is included in the list, but we don't want it there. How to fix this?

One solution is to add -fvisibility=hidden to the build process of the libutil.a library, but sometimes we don't have control over it. What then?
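For the case where we do own the sources of the static library, the same effect can also be requested from inside the source file, with GCC's visibility pragma. This is only a sketch: here util_function returns its text instead of printing it, which is a change made for this illustration so the result is easy to check.

```cpp
// util.cpp: sketch for the case where we control the sources of libutil.a.
#include <string>

// Everything declared between push and pop gets hidden visibility,
// as if this part of the file was compiled with -fvisibility=hidden.
#pragma GCC visibility push(hidden)

std::string util_function() {
	return "hello from util function\n";
}

#pragma GCC visibility pop
```

A hidden symbol can still be called from other object files that end up in the same shared object, so code.cpp can keep using util_function. It just doesn't appear in the export table of libcode.so.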

Linker version scripts to the rescue.

Using an `ld' linker version script

We need to get rid of the util_function symbol during the linking process, because we may not have any control over the compilation of libutil.a library.

Fortunately, ld supports export table filtering (and much more) through a "version script", which defines what to include and what to skip. It even supports wildcards, which may also help.

So, let's skip the -fvisibility=hidden option for now, and build our shared library like this:

$ g++ code.cpp libutil.a -o libcode.so -shared -fPIC

Then, create a text file, libcode.version, with this content:

CODEABI_1.0 {
	global: *entry_point*;
	local: *;
};

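This version file tells the linker that all symbols (*) should be considered local (that is: hidden), and that all symbols matching the wildcard *entry_point* should be considered global (so, visible). The wildcards around the name are needed because the pattern is matched against the mangled C++ name, which for our function is _Z11entry_pointv.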

The version script is passed to the linker through the g++ driver:

$ g++ code.cpp libutil.a -o libcode.so -shared -fPIC -Wl,--version-script=libcode.version

Verification:

$ nm -CD libcode.so | grep " T "
000000000000074d T entry_point()

It even stripped out the _init and _fini symbols. Job done!

The problem with this approach is that it gets awkward in more complicated scenarios, like filtering only some of the symbols that use C++ templates. The mangled names of template-based symbols in C++ can easily grow to a few hundred characters, which makes the patterns hard to write. Once you start using functions from std::, you'll know what I mean.
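One way to sidestep name mangling for the API itself is to give the entry points C linkage. This is a sketch under that assumption, not something the examples above require: entry_point returns an int status here purely for illustration, and function1 is made static to show that symbols with internal linkage never reach the export table at all.

```cpp
// code.cpp: sketch of an entry point with C linkage.
#include <cstdio>

// Internal linkage: this function is not a candidate for export,
// regardless of compiler flags or version scripts.
static int function1() {
	// std::puts appends the newline, so this prints "hi\n".
	return std::puts("hi") < 0 ? -1 : 0;
}

// C linkage: the symbol is named exactly "entry_point" (no mangling),
// so a version script can list it without wildcards:
//     global: entry_point;
extern "C" int entry_point() {
	return function1();
}
```

This only helps with the functions you choose to expose. Template-heavy symbols that must stay C++ still need patterns.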

Please do some reading about the linker's version scripts, because they allow you to do some really cool things, like symbol versioning!