RFC: Supporting ELF symbol aliases via GlobalAlias GEPs

Hi Everyone,

Chris suggested[1] I should ask for feedback as to whether this is a desired feature before I put too much effort into it, so here goes:

I would like to be able to export a symbol that is inside an LLVM structure. This is possible on ELF targets[2], and the attached proof-of-concept patch to AsmWriter makes it work (although in a hackish way that I am NOT suggesting be committed as-is).

With this patch, you can compile this:

%0 = type { i32, i32 }
@structure = global %0 { i32 0, i32 1 }
@element1 = alias getelementptr( %0* @structure, i32 0, i32 1)

To this:

     .subsections_via_symbols
     .section __DATA,__data
     .align 3
_structure: ## @structure
     .space 4
     .long 1 ## 0x1

     .globl _element1
     .set _element1, _structure+4
     .size _element1, 4
     .type _element1,@object

The element1 symbol is an i32* pointing to element 1 in the structure (the one emitted by .long 1).
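
As a rough illustration of where the +4 in the .set directive comes from, the constant GEP resolves to the byte offset of the second field. The sketch below mirrors %0 with a plain C++ struct; the names Structure, Element0, Element1 and element1Offset are hypothetical stand-ins, and it assumes the usual layout where a 32-bit integer is 4 bytes with no padding between the fields.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the IR type %0 = type { i32, i32 }.
struct Structure {
  std::int32_t Element0;
  std::int32_t Element1;
};

// Byte offset that getelementptr(%0* @structure, i32 0, i32 1) folds to
// on a target with this layout: the offset of the second field.
constexpr std::size_t element1Offset() {
  return offsetof(Structure, Element1);
}

// Size of the aliased element, matching the ".size _element1, 4" directive.
constexpr std::size_t element1Size() {
  return sizeof(std::int32_t);
}
```

On such a target, element1Offset() gives the addend used in "_structure+4" and element1Size() gives the value passed to .size.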

There are really two questions here:

1) Do we want to be able to generate this kind of output at all? (I do!)
2) If we do, do we want to use the global alias initialised with a constant GEP to do it, or provide some other mechanism?

David

[1] http://llvm.org/bugs/show_bug.cgi?id=4739#c24
[2] I know it's not possible on Mach-O (well, it is for internal symbols, just not for ones exported via the symbol table) - does anyone know if PE allows it?

llvm.diff (1.3 KB)

I've attached a less-hackish implementation of this. This includes the following modifications:

- A getSupportsOverlappingAliases() method on TargetMachine, which returns whether the target supports multiple symbols pointing into the same object. This returns false in the superclass and needs to be explicitly overridden for each target to enable it.

- An implementation of this method in X86TargetMachine which returns true for ELF targets.

- printObjectType() in AsmPrinter. Currently all of the subclasses hard-code this (e.g. ".type " + name + ", @object") when emitting global variables. I've implemented this for X86ATTAsmPrinter, and will add the same for other classes as required.

- PrintGlobalOffsetAlias(), also in AsmPrinter. This outputs a GlobalAlias which is a GEP to a GlobalVariable, if the target supports it.

This is a proof-of-concept implementation which, pending review, I'd like to commit as a work-in-progress and then work on adding support for more targets. It should be relatively trivial to add this for other ELF targets; just override getSupportsOverlappingAliases() in the relevant TargetMachine subclass to return true and copy the code out of the PrintGlobalVariable() method in the AsmPrinter subclass to implement printObjectType().
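
The opt-in pattern described above can be sketched roughly as follows. This is not the code from the attached patch: TargetMachineSketch, X86TargetMachineSketch, the IsELF flag and canEmitOffsetAlias() are hypothetical stand-ins for the real LLVM classes. Only the method name getSupportsOverlappingAliases() and its default of false are taken from the description.

```cpp
// Hypothetical stand-in for TargetMachine: offset aliases are off by default.
class TargetMachineSketch {
public:
  virtual ~TargetMachineSketch() = default;

  // Targets must explicitly override this to opt in.
  virtual bool getSupportsOverlappingAliases() const { return false; }
};

// Hypothetical stand-in for X86TargetMachine: opts in only for ELF.
class X86TargetMachineSketch : public TargetMachineSketch {
  bool IsELF; // assumed flag; the real class derives this from the subtarget

public:
  explicit X86TargetMachineSketch(bool IsELF) : IsELF(IsELF) {}

  bool getSupportsOverlappingAliases() const override { return IsELF; }
};

// The printer side would consult the target before emitting a GEP alias.
bool canEmitOffsetAlias(const TargetMachineSketch &TM) {
  return TM.getSupportsOverlappingAliases();
}
```

Defaulting to false means a target that has not been audited keeps its current behaviour, and each new target enables the feature with a one-line override.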

David

llvm.diff (6.87 KB)

Hello David,

This feature sounds reasonable to me. Please update the patch to
include a feature test, and LangRef.html changes as necessary,
including a mention that the feature depends on the target.

Here are a few misc. comments on the patch itself:

+ /// isTargetElf - returns true if the target is ELF.
+ virtual bool isTargetElf() const { return true; }

This does not belong in TargetMachine.h.

+ if ((GEP = dyn_cast<ConstantExpr>(I->getAliasee()))
+ && (GEP->getOpcode() == Instruction::GetElementPtr)) {

Please follow LLVM style.

+ assert(PrintGlobalOffsetAlias(cast<GlobalAlias>(I)) &&
+ "Target doesn't support offset aliases.");

This won't call PrintGlobalOffsetAlias when asserts are disabled.
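
One common way to avoid this is to make the call unconditionally and assert only on its stored result. A minimal sketch, where printOffsetAliasStub(), Calls and emitAlias() are hypothetical stand-ins rather than code from the patch:

```cpp
#include <cassert>

// Hypothetical stand-in for PrintGlobalOffsetAlias(): counts its calls and
// reports success.
static int Calls = 0;
static bool printOffsetAliasStub() {
  ++Calls;
  return true;
}

// The call happens outside the assert, so it still runs when NDEBUG is
// defined and the assert expands to nothing.
static void emitAlias() {
  bool Printed = printOffsetAliasStub();
  assert(Printed && "Target doesn't support offset aliases.");
  (void)Printed; // avoid an unused-variable warning in NDEBUG builds
}
```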

Thanks,

Dan

Hi David,

Even if this works on Linux/ELF, do you know whether it is "officially" supported?

Using aliases to point to the interior of objects seems like something
that could be very likely to break, but I don't know anything about
how ELF encodes aliases.

- Daniel

Hi Daniel,

I've tested it on Linux and FreeBSD and it works on both. I just tested on PowerPC/OpenBSD[1] and it seems that the OpenBSD loader does, indeed, break this. On further investigation it turns out that the OpenBSD loader doesn't work properly with aliases at all, including the kind emitted from a GlobalAlias.

I've not tried on NetBSD or Solaris. I'd be interested to know if this works on Windows (I would expect that it would, purely by accident, given the crazy way DLL relocations work on Windows, but I've no idea if it actually does).

It would be nice to get some idea of where this actually works before I proceed. The ELF specification doesn't actually contain the word 'alias' anywhere, but it seems to be assumed that symbols are allowed to be aliased. As far as I can see, this is no more likely to break than aliases pointing to the start of symbols - on platforms where the loader relocates entire segments together it works, on platforms where they relocate individual symbols independently it breaks.

David

[1] LLVM doesn't seem to correctly output external globals at all on this platform - they lack the .size and .type directives, which is a bit odd because I remember seeing code for generating them in the PowerPCAsmPrinter class. I'll have a look at what's going on there later.