Removing or obfuscating RTTI type name strings

Hi there,

I’m hitting a rather difficult problem. I have to compile with RTTI data structures generated because, even though I am not using dynamic_cast or typeid in my application code, I am linking and using a library that does use dynamic_cast. Therefore my code will crash if I compile with -fno-rtti.

The problem, then, is that the size of my code is greatly increased and, more importantly, critical information is being leaked into the resulting application binary by the RTTI type name strings that are generated. Unlike normal symbols, these cannot be stripped from the executable.

Therefore I would like to change the clang compiler to either replace all the type name strings with a single “?” string (this would be best) or apply something like a rot-x encryption to the complete string (I would rather not do this, since these strings are literally hundreds of characters long given that the types are complex template types).

My suggestion would be that I would attempt to add a -fno-rtti-names parameter. If this is of interest to the general clang community I would be happy to submit a patch for consideration, but at the very least I need something for my own purposes.

This brings me to my request. I would be very grateful if someone here might be able to direct me into the right place for making such a change.

Looking at the source code, there is an ItaniumRTTIBuilder::BuildTypeInfo(…) function in CodeGen/ItaniumCXXABI.cpp (see https://github.com/llvm/llvm-project/blob/fe177a1773e4f88dde1aa37d34a0d3f8cb582f14/clang/lib/CodeGen/ItaniumCXXABI.cpp#L3730). In there, the first thing it does is lay down a field for the mangled type name. My guess is that it should be possible to substitute the line

llvm::GlobalVariable *TypeName = GetAddrOfTypeName(Ty, Linkage);

with something that generates a static string “?” and returns the address of that. Then it will build the table pointing at this string, I am guessing.

Is this a feasible approach, or will this break loads of things elsewhere in the compiler and/or the C++ runtime? I am not interested in a run-time ability to get the mangled (or otherwise) name of the class, so if replacing this string has no effect on, for example, the correct functioning of dynamic_cast or typeid and only means that std::type_info::name returns a bogus value, then I’m happy with that.

Many thanks for any and all suggestions regarding this.

Cheers,
Andy

The address and (sometimes) contents of the _ZTS type_info name are used for type_info equality comparisons. The implementation of dynamic_cast internally uses type_info comparisons to find the destination type within the source type’s type_info tree. So changing the contents of the string to be non-unique may lead to problems, especially if the ABI rule in question results in the use of strcmp. You could perhaps instead consider replacing the string contents with something like a hash of the mangled name of the type (though be aware that the ABI library will want to interpret it as a nul-terminated byte string).
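To make the failure mode concrete, here is a minimal sketch of the two comparison rules described above. It is not the code of any real runtime: FakeTypeInfo, equal_by_address and equal_by_contents are hypothetical stand-ins, and "3Foo" / "3Bar" are simply the Itanium-mangled names of two made-up classes Foo and Bar.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for std::type_info: only the name pointer matters here.
struct FakeTypeInfo {
    const char* name;  // plays the role of the pointer to the _ZTS string
};

// Address-only rule: usable when each type has exactly one name string.
bool equal_by_address(const FakeTypeInfo& a, const FakeTypeInfo& b) {
    assert(a.name && b.name);
    return a.name == b.name;
}

// Address-then-contents rule: falls back to strcmp when the addresses differ.
bool equal_by_contents(const FakeTypeInfo& a, const FakeTypeInfo& b) {
    assert(a.name && b.name);
    return a.name == b.name || std::strcmp(a.name, b.name) == 0;
}

// Two distinct types with their real mangled names.
static const char kNameFoo[] = "3Foo";
static const char kNameBar[] = "3Bar";
const FakeTypeInfo foo_real{kNameFoo};
const FakeTypeInfo bar_real{kNameBar};

// The same two types after each name has been replaced with "?".
// Separate arrays keep the addresses distinct, as separate copies would.
static const char kStrippedFoo[] = "?";
static const char kStrippedBar[] = "?";
const FakeTypeInfo foo_stripped{kStrippedFoo};
const FakeTypeInfo bar_stripped{kStrippedBar};
```

With the real names, both rules report Foo and Bar as different. With the stripped names, equal_by_address still tells them apart, but equal_by_contents reports them as the same type, which is the situation in which a dynamic_cast could match the wrong class. If every type_info pointed at one shared “?” string, even the address-only rule would report equality.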

Symbol hashing (while preserving the nul-terminated-ness, as you say, so using a base64 encoding of the hash or something like that) might overlap with ideas that get thrown around from time to time for reducing symbol name length generally, both in the DWARF and in the ELF symbol tables. The latter would mostly or only apply if you’re OK with an ABI break, or possibly a floating ABI (i.e. build all your C++ code with this exact compiler, no prebuilt libraries, etc.). So it’s not /exactly/ the same thing, but there might be enough overlap to benefit from some common machinery or a family of options. I hadn’t actually thought about the RTTI side of things. I should check that in more detail; it is perhaps another source of redundant names and a potential size benefit of this overall direction.
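As a rough sketch of what such a hashed name could look like, the following turns a mangled name into a fixed-length printable string. Everything here is an assumption for illustration: fnv1a64 and obfuscated_type_name are hypothetical helpers, not clang functions, FNV-1a is just a simple stand-in for whatever hash would really be chosen, and hex is used instead of base64 to keep the code short.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a over the bytes of a nul-terminated string.
std::uint64_t fnv1a64(const char* s) {
    assert(s != nullptr);
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (; *s != '\0'; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hypothetical helper: replacement text for a type name string.
// The result contains only hex digits, so it has no embedded nul bytes
// and stays usable as an ordinary nul-terminated C string.
std::string obfuscated_type_name(const char* mangled) {
    static const char kHex[] = "0123456789abcdef";
    std::uint64_t h = fnv1a64(mangled);
    std::string out(16, '0');
    for (int i = 15; i >= 0; --i) {
        out[i] = kHex[h & 0xF];  // lowest nibble goes to the rightmost digit
        h >>= 4;
    }
    return out;
}
```

For example, obfuscated_type_name("3Bar") and obfuscated_type_name("3Baz") give two different 16-character strings, so a strcmp-based comparison would still tell those types apart. A 64-bit hash can collide in principle, and a collision would make two types compare equal, so a wider or cryptographic hash may be the safer choice in practice.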

  • Dave

The GCC runtime specifically does a string comparison, even if that
explicitly breaks things like hidden visibility.

Joerg