GCC compatibility: inline namespaces and ABI tags
Keeping libraries binary-compatible with old versions is hard. Recently, GCC
was in the unenviable situation of having to switch its std::string
implementation. [1] GCC used inline namespaces and ABI tags to minimize the
extent of breakage and to ensure that old and new versions could only be
combined in a safe way. Here, we'll have a look at those mitigation
techniques: what they are and how they work.
First, some background: GCC used to have a copy-on-write implementation for
std::string. However, C++11 does not allow this anymore because of new
iterator and reference invalidation rules. So, GCC 5.1 introduced a new
implementation of std::string. The new version was not binary-compatible with
the old one: exchanging strings between old (pre-5.1) and new code would
crash. We say the application binary interface (ABI) changed.
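To check which of the two implementations a given translation unit is compiled against, one option is to look at a libstdc++ configuration macro. The sketch below assumes libstdc++ 5.1 or later, where `_GLIBCXX_USE_CXX11_ABI` is defined once any standard header has been included. The macro is not mentioned in this post, and the function name `string_abi` is made up for illustration.

```cpp
#include <string>  // any libstdc++ header pulls in the configuration macros

// Reports which std::string implementation this translation unit uses.
// Assumption: the standard library is libstdc++; other standard libraries
// (and libstdc++ before 5.1) do not define the macro at all.
const char* string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    return "new";      // std::__cxx11::basic_string
#elif defined(_GLIBCXX_USE_CXX11_ABI)
    return "old";      // copy-on-write std::basic_string
#else
    return "unknown";  // macro not available
#endif
}
```

Printing `string_abi()` from each component of a program is a quick way to spot a mix of "old" and "new" code.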
To understand the consequences of this, let's look at several scenarios:
- A program uses only code compiled with GCC < 5.1: only old strings, works
- A program uses only code compiled with GCC >= 5.1: only new strings, works
- A program mixes code compiled with different GCC versions: both types of strings exist in the program, it could crash.
Let's dig into the last bullet point. When would it crash? Whenever
"new" code accesses an "old" string or vice versa. Here are some examples
where f() is compiled with an older GCC version and called from "new" code:
void f(int i); // (a) safe, no std::string involved
void f(const std::string& s); // (b) will crash when f accesses s
std::string f(); // (c) will crash when the returned string is used
Given how common std::string is, such crashes would happen frequently if
"old" and "new" code were combined. Unfortunately, it's surprisingly easy to
end up with a program with some pre-5.1 parts and some newer ones. It's
sufficient to link a "new" executable to an "old" library. Given the bad
consequences, the GCC developers needed to solve this.
The solution is to change the symbol names of the GCC 5.1 std::string and
all functions using it. Because the linker uses symbol names to resolve
function calls into libraries, this causes link-time errors for cases (b)
and (c), while (a) still works. That is exactly the intended behavior.
How could this be done? The mechanism that converts C++ names into symbol names
is called name mangling. The generated symbol names contain information
about namespaces, function names, argument types, etc. Putting the new
std::string into a separate namespace would change the symbol names of all
functions accepting std::string as argument. But that's crazy: std::string
needs to be in namespace std, right?
The solution for this is inline namespaces. All classes, functions, and templates declared in an inline namespace are automatically imported into the parent namespace. Their mangled name still references the original location, though.
Here's what GCC 5.1 does:
namespace std {
  inline namespace __cxx11 {
    template<typename _CharT, ...> class basic_string;
    typedef basic_string<char> string;
  }
}
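The import behavior is easy to try out on a small scale. The following is a minimal sketch with made-up names (`lib`, `v2`, `Widget`, `widget_type_name`). It assumes GCC or Clang on a platform using the Itanium C++ ABI, where `typeid(...).name()` returns the mangled type name.

```cpp
#include <string>
#include <type_traits>
#include <typeinfo>

namespace lib {        // hypothetical library namespace
inline namespace v2 {  // hypothetical inline "version" namespace
struct Widget {};
}
}

// Both spellings name the same type: Widget is visible in lib directly.
static_assert(std::is_same<lib::Widget, lib::v2::Widget>::value,
              "inline namespace members are visible in the parent");

// Returns the implementation-defined type name. With GCC and Clang this is
// the mangled name, which still records the inline namespace v2.
std::string widget_type_name() {
    return typeid(lib::Widget).name();
}
```

User code keeps writing `lib::Widget`, while the name returned by `widget_type_name()` contains the `v2` component. This is the same effect that lets `std::string` live in `std::__cxx11` as far as the linker is concerned.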
Looking at f(const std::string& s), what would the symbol names be
when compiled with older and newer GCC versions?
| | Older GCC | GCC 5.1 |
|---|---|---|
| Symbol name | _Z1fRKSs | _Z1fRKNSt7[...] |
| Decoded symbol name [2] | f(std::[...] | f(std::[...] |
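Symbol names like these can also be decoded programmatically. The sketch below uses `abi::__cxa_demangle` from `<cxxabi.h>`, which GCC and Clang provide. The helper name `demangle` is made up. The old-ABI symbol is the one from the table. The full new-ABI symbol is an assumption: it is written out by hand from the Itanium mangling rules, so verify it with c++filt or nm on a real object file.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang)
#include <cstdlib>
#include <string>

// Demangles an Itanium-ABI symbol name; returns an empty string on failure.
std::string demangle(const char* symbol) {
    int status = 0;
    char* text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return std::string();
    }
    std::string result(text);
    std::free(text);  // the buffer is allocated with malloc
    return result;
}

// Old ABI: "Ss" is the mangling abbreviation for std::string.
const char* const kOldSymbol = "_Z1fRKSs";

// New ABI (assumed, hand-written): the type is spelled out in full and
// includes the __cxx11 inline namespace.
const char* const kNewSymbol =
    "_Z1fRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE";
```

Calling `demangle(kOldSymbol)` and `demangle(kNewSymbol)` shows that only the second decoded name mentions `__cxx11`, which is why the linker treats the two functions as unrelated.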
Indeed, the symbol names are different and the linker would give an error if
GCC 5.1 code called f(const std::string &s) from a library compiled
with an older GCC version. This solves the compatibility problem for all
functions taking std::string as argument.
One problem remains: case (c) from above, std::string f(). The return
type of an ordinary (non-template) function is not part of its mangled
name. [3] Thus, the symbol name wouldn't change for functions that return
std::string, which could still lead to runtime crashes.
GCC 5.1 solves this with the use of ABI tags. From the documentation:
The abi_tag attribute can be applied to a function, variable, or class declaration. It modifies the mangled name of the entity to incorporate the tag name, in order to distinguish the function or class from an earlier version with a different ABI [...]
When a type involving an ABI tag is used as the type of a variable or return type of a function where that tag is not already present in the signature of the function, the tag is automatically applied to the variable or function.
GCC applies the attribute __attribute__((__abi_tag__("cxx11"))) to the
std::__cxx11 namespace. This affects all classes therein, including the new
version of std::string. The ABI tag propagates to all functions that
return a string and changes their symbol name.
Let's look at the symbol names for std::string f() for different compiler
versions:
| | Older GCC | GCC 5.1 |
|---|---|---|
| Symbol name | _Z1fv | _Z1fB5cxx11v |
| Decoded symbol name [2] | f() | f[abi:cxx11]() |
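The same mechanism is available to user code. Here is a minimal sketch, assuming GCC or Clang. The tag `"v2"` and the names `Widget`, `make`, `value_of`, and `widget_type_name` are made up for illustration, and the symbol names in the comments are what the mangling rules should produce rather than captured tool output.

```cpp
#include <string>
#include <typeinfo>

// Hypothetical class carrying a hypothetical ABI tag "v2".
struct __attribute__((abi_tag("v2"))) Widget {
    int value;
};

// The tag does not occur in the (empty) parameter list, so the compiler
// applies it to the function itself; the symbol should be _Z4makeB2v2v.
Widget make() { return Widget{42}; }

// Here the tagged type is already part of the parameter list, so the
// function name itself gets no additional tag.
int value_of(const Widget& w) { return w.value; }

// With GCC and Clang, the mangled type name includes the tag.
std::string widget_type_name() { return typeid(Widget).name(); }
```

Running nm on the resulting object file and piping the output through c++filt should show `make[abi:v2]()`, analogous to `f[abi:cxx11]()` in the table.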
Again, the symbol name is different, and the "cxx11" ABI tag is applied to
std::string f() with GCC 5.1. This completes the second part of GCC's solution
for the std::string ABI change.
These two methods to induce symbol name changes for anything using
std::string make the migration path to GCC 5.1 much easier. If the program
links correctly, it should work and be safe from difficult-to-debug runtime
errors.
There would be more to discuss about the GCC 5.1 changes, such as how
libstdc++.so still exports the old std::string implementation for
compatibility. But this post has gone on too long already, so let's leave it at
that :-)
[1] GCC 5.1 also introduced a new version of std::list. It is handled analogously to std::string.
[2] Decoded using c++filt from the binutils package.
[3] The return value type doesn't need to be part of the symbol name because it doesn't get used for overload resolution.