ABI: problems of C++ programs compatibility at the binary interface level
Introduction
The C++ language standard strictly defines the semantics of all language constructs. However, it does not specify how these constructs must be implemented at the binary level (in object code). As there is no common application binary interface (ABI) standard, C++ compiler vendors have come up with their own ABI variations. As a result, problems can arise when object files produced by different compilers are linked together. These problems are usually caused by differences in the memory layout of compiler-generated data structures, in calling conventions, or in name mangling.
Morpher uses the LLVM-GCC compiler, which complies with the Generic C++ ABI (often also called the “Itanium C++ ABI”, since it was initially developed for the Itanium processor architecture). The Generic C++ binary interface specification was developed jointly by CodeSourcery, Compaq, EDG, HP, IBM, Intel, Red Hat and SGI. The following compilers comply with the Generic C++ ABI: GCC (from version 3.x upwards); Clang and llvm-gcc; the Linux versions of the Intel and HP compilers; and compilers from ARM. The Generic C++ ABI defines the following:
- Layout of both built-in and user-defined types and compiler-generated data structures in memory and peculiarities of handling them.
- POD types layout (the “plain old” C data)
- Non-POD types layout (user-defined C++ types which can support dynamic polymorphism)
- Layout of virtual function tables
- State of the virtual function tables during the object creation process
- Peculiarities of memory allocation for an array using operator “new”
- Initialization of guard variables, which control the initialization of function-level static variables and static class members.
- Layout of structures used to implement run-time type information (RTTI).
- Special aspects of the RTTI implementation, for example, dynamic_cast<T>(v) algorithm.
- Details of how virtual and non-virtual functions are called, and the behavior of constructors and destructors.
- Exception handling mechanisms.
- Behaviour at the linking stage:
- Name mangling (i.e., encoding) of external names (“external” means being visible outside the object file where they occur)
- Vague linkage rules. In some cases, some entities can be defined in several object files; however, in the final program, only one copy should be preserved. For example, it can happen with out-of-line functions (inline functions which the compiler has decided not to inline), virtual function tables, typeinfo information, and instantiated template classes.
- Details of the unwind table layout. Unwind tables are used for unwinding the stack during the process of exception handling.
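The guard variables mentioned in the list above cannot be seen from portable code, but their effect can. The following is a minimal sketch, not part of the ABI text: the names Counted, instance and count_after_three_calls are hypothetical and chosen only for this illustration. It shows that a function-level static is constructed exactly once, which is the behaviour a Generic C++ ABI compiler typically enforces with a hidden guard variable and the runtime calls __cxa_guard_acquire and __cxa_guard_release.

```cpp
// Hypothetical type that counts how many times it has been constructed.
struct Counted {
    static int constructions;
    Counted() { ++constructions; }
};
int Counted::constructions = 0;

// The function-level static below is initialized on the first call only.
// Under the Generic C++ ABI the compiler typically protects it with a
// hidden guard variable checked on every entry to the function.
Counted& instance() {
    static Counted c;
    return c;
}

// Calls instance() three times and reports the number of constructions.
int count_after_three_calls() {
    instance();
    instance();
    instance();
    return Counted::constructions;
}
```

Calling count_after_three_calls() returns 1: the second and third calls find the guard already set and skip the constructor.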
Unfortunately, not all compiler vendors are in a hurry to adopt a common ABI: the Microsoft compilers, for instance, use their own C++ ABI. Because Morpher produces binary object files, users of compilers that do not follow the Generic C++ ABI may run into compatibility problems.
If it is not possible to rebuild the conflicting libraries using Morpher or a Morpher-compatible compiler, then one should access them via C wrappers, since the binary compatibility problems for C are not as severe.
Declaring functions with extern "C" avoids the incompatible name mangling schemes that different C++ compilers use: external names are then formed according to the simple, well-defined C rules.
Let us take the following function as an example:
void my(int i, char c, float x);
After mangling as defined by the Generic C++ ABI, the external name of the function is “_Z2myicf”. The mangled name encodes the function name (prefixed by its length), the types of the function arguments and, when there is one, the namespace that the function resides in. Other compilers may form external names differently.
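For void my(int i, char c, float x) the Generic C++ ABI encoding is _Z2myicf: the prefix _Z, the length-prefixed name 2my, then i, c and f for the parameter types. A quick way to check such names on GCC or Clang is the abi::__cxa_demangle function from <cxxabi.h>. The sketch below assumes one of those toolchains; the helper name demangle is hypothetical.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Generic C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Returns the demangled form of a Generic C++ ABI mangled name,
// or an empty string if the name cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return std::string();
    std::string result(text);
    std::free(text);  // the buffer is allocated with malloc
    return result;
}
```

With this helper, demangle("_Z2myicf") yields the string "my(int, char, float)". A compiler that uses a different mangling scheme would look for a different symbol, and the link would fail.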
Thus, in some cases a C wrapper can be written on top of C++ library code using the same compiler. Let’s discuss the possible options to solve the problem (taken from reference [3]).
To call an external C function in C++ code, it’s enough to describe its prototype in the following way:
extern "C" void my_c_function(int i, char c, float x);
If multiple functions are used, you can shorten it a little bit:
extern "C" {
void my_c_function1(int i);
void my_c_function2(char c);
void my_c_function3(float x);
/* ... */
}
To make a C++ function accessible for linking with C code, declare it in the following way:
extern "C" void my_cpp_function(int i);
And then implement it in one of the modules:
void my_cpp_function(int i) { /* DO SOMETHING */ }
Now we can use this function in a C program (or, in our case, in the C wrapper).
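When the same declaration has to be read by both a C compiler and a C++ compiler, a common pattern is to wrap the extern "C" braces in a check for the predefined __cplusplus macro, because extern "C" itself is not valid C. The sketch below keeps the header part and the implementation in one file for brevity; the function name my_cpp_twice is hypothetical.

```cpp
/* --- content that would normally live in a shared header --- */
#ifdef __cplusplus
extern "C" {          /* seen only by C++ compilers */
#endif

int my_cpp_twice(int i);   /* hypothetical function exported with a C name */

#ifdef __cplusplus
}
#endif

/* --- C++ implementation file --- */
// The definition inherits C linkage from the declaration above,
// so its external name is not mangled.
int my_cpp_twice(int i) { return i * 2; }
```

A C translation unit that includes the header part sees an ordinary C prototype, while the C++ side still implements the function with full access to C++ features.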
You must not access the objects of a non-compatible library directly, but you can define C interfaces for them. Just remember the implicit this argument: when you call foo.bar(x), the address of foo is passed as well. The next example assumes that the FOO_PROXY macro is defined when the wrapper is compiled:
/* Foo.h: include it both in the C++ module that uses it and in the C wrapper */
#ifndef FOO_H
#define FOO_H
#ifdef FOO_PROXY
class Foo {
public:
Foo();
int bar1(int);
char bar2(char);
float bar3(float);
private:
void bar0();
int m_;
};
#else
typedef struct Foo Foo;
#endif
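#ifdef __cplusplus
extern "C" {   /* extern "C" is not valid C, so hide it from C compilers */
#endif
extern int c_Foo_proxy_bar1(Foo*, int);
extern char c_Foo_proxy_bar2(Foo*, char);
extern float c_Foo_proxy_bar3(Foo*, float);
#ifdef __cplusplus
}
#endif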
#endif
The wrapper code exposes the following C interface:
// Foo.cpp
#include "Foo.h"
//implementation of class methods
Foo::Foo() : m_(0) {}
int Foo::bar1(int i) { /* DO SOMETHING */ return i; }
char Foo::bar2(char c) { /* DO SOMETHING */ return c; }
float Foo::bar3(float x) { /* DO SOMETHING */ return x; }
void Foo::bar0() { /* DO SOMETHING */ }
// public interface exposed to C:
int c_Foo_proxy_bar1(Foo *fooThis, int i) { return fooThis->bar1(i); }
char c_Foo_proxy_bar2(Foo *fooThis, char c) { return fooThis->bar2(c); }
float c_Foo_proxy_bar3(Foo *fooThis, float x) { return fooThis->bar3(x); }
The following code uses the wrapper to manipulate objects from the non-compatible library:
//main.cpp
#include "Foo.h"
#include "etc..."
void c_manipulator(Foo* fooThis){
/* Manipulating... */
if (c_Foo_proxy_bar1(fooThis, rand())) {
c_Foo_proxy_bar2(fooThis, 'X');
} else {
c_Foo_proxy_bar3(fooThis, 3.14);
}
}
int main()
{
...
// obtaining a pointer to the Foo object
c_manipulator(&foo); // use
}
There should be no serious problems when using POD data. However, more complex cases can occur: instances of classes that contain virtual functions, or of classes derived from them. Accessing such objects directly means stepping down to the lowest level and going into the details of a particular ABI, which is a very difficult task. Fortunately, pointers can help: the object can stay behind a pointer that only the wrapper dereferences.
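One way to let pointers help is the opaque handle pattern: the client never sees the class layout, only a pointer obtained from and released through C functions. The following is a minimal single-file sketch under that assumption. The class Counter and the functions counter_create, counter_add and counter_destroy are hypothetical names, not part of any library discussed here.

```cpp
// C++ side: an ordinary class that never crosses the boundary by value.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    int add(int delta) { value_ += delta; return value_; }
private:
    int value_;
};

// C interface: callers handle only a pointer and plain C types.
// A C client would declare the type as: typedef struct Counter Counter;
// Note: this sketch lets a failed allocation throw; a production wrapper
// would have to catch exceptions before they reach the C boundary.
extern "C" {
    Counter* counter_create(int start) { return new Counter(start); }
    int counter_add(Counter* self, int delta) { return self->add(delta); }
    void counter_destroy(Counter* self) { delete self; }
}
```

A client calls counter_create(40), passes the returned pointer to counter_add, and finally hands it back to counter_destroy. Because construction and destruction happen inside the wrapper, the client compiler never needs to know the size or layout of the object.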
It is not hard to imagine that at some point in the future all compiler vendors will adopt a single common C++ ABI standard and this problem will cease to be relevant. Since the Generic C++ ABI is supported by the largest number of compilers (Morpher included), it is the first candidate to become that common standard.
Appendix: An example of the many possible binary-level representations
To illustrate the details that binary compatibility depends on, let’s consider what the compiler-generated structures look like for the following source code:
#include <iostream>
struct A {
int a;
virtual void f() {std::cout << "A:f()" << std::endl;}
virtual void g() {std::cout << "A:g()" << std::endl;}
virtual void h() {std::cout << "A:h()" << std::endl;}
};
struct B:public virtual A {
int b;
void f() {std::cout << "B:f()" << std::endl;}
};
struct C:public B {
int c;
void f() {std::cout << "C:f()" << std::endl;}
void g() {std::cout << "C:g()" << std::endl;}
virtual void k() {std::cout << "C:k()" << std::endl;}
};
int main() {
A oa;
A *poa = &oa;
poa->f(); poa->g(); poa->h();
C oc;
A *poa_in_c = &oc;
poa_in_c->f(); poa_in_c->g(); poa_in_c->h();
B *pob_in_c = &oc;
pob_in_c->f(); pob_in_c->g(); pob_in_c->h();
return 0;
}
The output is as follows:
A:f() // poa->f();
A:g() // poa->g();
A:h() // poa->h();
C:f() // poa_in_c->f();
C:g() // poa_in_c->g();
A:h() // poa_in_c->h();
C:f() // pob_in_c->f();
C:g() // pob_in_c->g();
A:h() // pob_in_c->h();
The following binary structures will be created for an x64-compatible system:
Without getting into details, let’s take a quick look at some aspects of the Generic C++ ABI approach to object layout.
- Every object, unless it belongs to a class without virtual functions, contains at its very beginning (at this) a pointer to the upper part of the virtual function table (vtbl).
- The virtual function table vtbl is divided into two parts: upper and lower. The upper part (marked in white in the Figure) contains pointers to virtual function entry points. The lower part (marked in gray), which extends towards lower addresses, contains:
- A pointer to the RTTI record
- The offset from the current this to the this of the enclosing most-derived object (if there is none, this field equals 0)
- The offsets of the virtual base subobjects from the current this.
- A thunk adjusts the this argument of a virtual function before jumping to it (look for the thunk entries among the vtbl elements); the corresponding vtbl slot points to the thunk rather than to the function itself. It is worth noting that in some other implementations the vtbl contains not only the function address but also a correction to this, and the virtual call sequence applies the adjustment. With thunks, the upper part of the vtbl holds only pointers.
- Every class with virtual base classes has its own vtbl. It contains the addresses of overridden and newly defined virtual functions, as well as copies of the virtual tables of each virtual base class together with thunks for the overridden functions. The offsets of the virtual base subobjects from the current this are also stored here.
- And so on.
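Two of the points above can be observed from ordinary code without reading the vtbl directly. The sketch below reuses the A, B, C hierarchy with empty function bodies; the helper names offset_of_virtual_base and recovers_complete_object are hypothetical. The first helper measures where the virtual base A lives inside a C object, which is the value a Generic C++ ABI compiler looks up through the virtual base offset in the vtbl. The second relies on dynamic_cast<void*>, which the language defines to return the address of the most derived object and which Generic C++ ABI compilers implement using the offset-to-top entry.

```cpp
#include <cstddef>

struct A { int a = 0; virtual void f() {} virtual void g() {} virtual void h() {} };
struct B : public virtual A { int b = 0; void f() override {} };
struct C : public B {
    int c = 0;
    void f() override {}
    void g() override {}
    virtual void k() {}
};

// Byte distance from the start of the complete object to its A subobject.
std::ptrdiff_t offset_of_virtual_base(C& obj) {
    A* pa = &obj;  // the conversion locates the virtual base at run time
    return reinterpret_cast<char*>(pa) - reinterpret_cast<char*>(&obj);
}

// True if the complete object can be recovered from a pointer to its A part.
bool recovers_complete_object(C& obj) {
    A* pa = &obj;
    return dynamic_cast<void*>(pa) == static_cast<void*>(&obj);
}
```

Under the Generic C++ ABI the A subobject is placed after the non-virtual parts of B and C, so the offset is nonzero, and the exact value is one of the layout details that another ABI is free to choose differently.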
It is obvious that there are many different possible implementations for each part of the C++ object model. In which order should the virtual function addresses be placed in the table? In the order they are declared? Or sorted by function name? If we sort, do we take the names as they are or after mangling? In UPPERCASE, perhaps? We could swap the rtti and top_offset fields. What if we kept the rtti field in the object itself? Should we move the offsets of the virtual base subobjects into the object itself? Does it make sense to perform the this adjustment explicitly rather than through a thunk? What if we swapped the upper and lower parts? Or got rid of the vtable completely?
It makes no sense to require compatibility at binary level from different software developers when a strict common ABI standard does not exist. Generic C++ ABI is an attempt to create such a standard.

