Adding support to LLVM for data & code layout (needed by GHC)

Hi All,

The GHC developers would like to add support to llvm to enable the
order that code and data are laid out in, in the resulting assembly
code produced by llvm to be defined by the user. The reason we would
like to have this feature is explained in the blog post on GHC's use
of llvm here: http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html,
specifically under the title, 'Problems with backend'.

Basically we want to be able to produce code using llvm that looks like this:

.text
    .align 4,0x90
    .long _ZCMain_main_srt-(_ZCMain_main_info)+0
    .long 0
    .long 196630
.globl _ZCMain_main_info
_ZCMain_main_info:
.Lcg6:
    leal -12(%ebp),%eax
    cmpl 84(%ebx),%eax
    [...]

So in the above code we can access the code for the function
'_ZCMain_main_info' and the metadata for it need by the runtime with
just the one label. At the moment llvm just outputs all global
variables at the end.

It seems to me that there are three slightly different ways to support
this in llvm:

1) Have llvm preserve order of data and code from input file when in
the same section

2) Use a new special '@llvm.foo' variable that takes a list of
functions and globals. Order they appear in the array is the order
they should be output in and as one contiguous block.

3) Have llvm be specifically aware about the desire to associate some
global variable with a function. So a function definition could
include taking a global variable as an attribute. llvm would then
output the function and variable together like in the code above.

I was thinking that the first option is the easiest, both for llvm and
its users. My simple idea was to just somehow store the order that
functions and globals are read in by AsmParser or created in by using
the API. You could use a list to do this or just give each
global/function a number representing its order that could be sorted
on. When it comes for AsmPrinter to write out the module it does so in
order. Any new functions or globals created by optimisations are
simply added to the end of the sort order. This would produce the
above code but also with a label for the data, like this:

.text
    .align 4,0x90
_ZCMain_main_info_table:
    .long _ZCMain_main_srt-(_ZCMain_main_info)+0
    .long 0
    .long 196630
.globl _ZCMain_main_info
_ZCMain_main_info:
.Lcg6:
    leal -12(%ebp),%eax
    cmpl 84(%ebx),%eax

The problem could be optimisations though. this is an area I'm not
very knowledgeable in so please point out any issues. The main problem
I can think of are inlining and dead code removal. Inlining I believe
should be OK as long as it doesn't remove the original function since
we need that label present to access the data before it. I wouldn't
think this will happen though since the function will be accessed both
as a tail call to it and using pointer arithmetic with subtraction to
get the data before it. The pointer arithmetic would stop llvm
removing it. The other issue is dead code removal removing the data
('_ZCMain_main_info_table') since there are no references to it. That
can be easily fixed using @llvm.used. The biggest problem with this
approach is that it limits the optimisations llvm can do on this code.
If the third approach was taken for example llvm could optimise more
aggressively and specifically for the situation. I'm also not exactly
sure how link time optimisation would figure into this at the moment
so perhaps that's a big issue.

So thoughts, criticisms, alternative suggestions please.

Cheers,
David

Whichever way is chosen, the ability to reorder and intermingle functions
and data arbitrarily is interesting to more than just GHC. In particular, I
would like to point out the efforts by Mozilla to make Firefox startup
faster, which essentially came down to reordering stuff in the executables
so that everything is ordered by the sequence of accesses during program
startup. This means that programs can be read sequentially from the front
to the end, thus reducing I/O latency.

Tools for automating this process would probably benefit from being able
to specify the layout this way.

Sebastian

I dislike this approach; implicit requirements are bad. Many clients don't care
about the order in which variables are emitted, and indeed GHC doesn't care
either outside of a very narrow range of constraints.

It seems to me that a module property (or special global value) holding a list of
ordering lists would be reasonably appropriate. Constraints: values can only
appear in a single list, values in a list must be definitions, and (for now) values
in a list should not have merging linkage.

John.

FWIW this would also help in implementing -fno-toplevel-reorder, which
is needed if llvm/clang wants to build (e)glibc. A list holding the
order of module level assembly and functions should suffice.

Best regards,
--Edwin

Let me point out that projects using standard toolchain (e.g.
binutils) can already reorder code and data pretty much arbitrary
using sections and linker scripts.
I think it's still attractive to have reordering in LLVM to be
independent from external toolchain. This will allow reordering in JIT
and other interesting things.

I agree with John that special global with ordered list looks like the
clean approach. However, for the specific need in GHC I'm not sure
that this is enough.
What it's trying to do is placing a global variable (a struct) and a
function next to each other, so that it can use pointer arithmetic to
go from one address to another.
So, say, we take the address of the struct, add it's size and call
into resulting address. Sufficiently smart optimizer can spot that we
dereference a pointer pointing outside of the object. This probably
has undefined semantics and can be replaced with unreachable?
It's also a missed opportunity for optimizations -- if optimizer knew
where outside-of-the-struct pointer is really going it could make
direct call instead of indirect -- however, I don't know if this is a
big deal.

Eugene

Let me point out that projects using standard toolchain (e.g.
binutils) can already reorder code and data pretty much arbitrary
using sections and linker scripts.
I think it's still attractive to have reordering in LLVM to be
independent from external toolchain. This will allow reordering in JIT
and other interesting things.

I agree with John that special global with ordered list looks like the
clean approach. However, for the specific need in GHC I'm not sure
that this is enough.
What it's trying to do is placing a global variable (a struct) and a
function next to each other, so that it can use pointer arithmetic to
go from one address to another.
So, say, we take the address of the struct, add it's size and call
into resulting address. Sufficiently smart optimizer can spot that we
dereference a pointer pointing outside of the object. This probably
has undefined semantics and can be replaced with unreachable?
It's also a missed opportunity for optimizations -- if optimizer knew
where outside-of-the-struct pointer is really going it could make
direct call instead of indirect -- however, I don't know if this is a
big deal.

If I understand, you want more than just ordering within a section,
you want to override the section of a global to be in the text (or
equivalent) section. Is the global structure constant? It seems a
way to splat constants directly before/after/in a function and refer
to them as a global would be useful for all manner of side-tables.
Being able to just order output of globals and functions make picking
sections for them fairly hackish.

Andrew

Yes, the global structure is constant. This is indeed a side-table.
Overriding section of global to be in text is simple -- LLVM already
supports section attribute on globals and functions. However we also
need a specific ordering in text.
With some extra work this ordering can be achieved with gnu linker (I
posted example implementation earlier) without any changes to LLVM, so
the main points for having a direct support for side-tables in LLVM is
1) Not relying on external tool
2) Optimizer issues

Eugene

Yes, the global structure is constant. This is indeed a side-table.
Overriding section of global to be in text is simple -- LLVM already
supports section attribute on globals and functions. However we also
need a specific ordering in text.
With some extra work this ordering can be achieved with gnu linker (I
posted example implementation earlier) without any changes to LLVM, so
the main points for having a direct support for side-tables in LLVM is
1) Not relying on external tool
2) Optimizer issues

My argument amounts to express side tables as side tables in the IR
rather than as an ordering on globals. I think that would simplify
the backend (a side table is something you discover form the function
rather than having to check another global). Also, if well specified,
I think you could allow basic block labels into structures which makes
them more interesting for other uses.

Andrew

Its good to see that a feature of this nature would be useful to a
whole range of people, I wasn't aware of that.

My argument amounts to express side tables as side tables in the IR
rather than as an ordering on globals. I think that would simplify
the backend (a side table is something you discover form the function
rather than having to check another global). Also, if well specified,
I think you could allow basic block labels into structures which makes
them more interesting for other uses.

Sure. I wasn't set on the third approach I suggested, which is to have
them expressed as side tables in the IR as I didn't realise other
users would be interested in them so I didn't think it would be
appropriate to add new language constructs for one user. I don't think
it would simpler to implement in the backend though and this approach
would need changes to the frontend, so a lot more work.

What I am hoping someone may be able to give a answer to though is
what issues there may be if the second approach was taken (using the
special glob var)? Would the optimiser be tempted at some point to
replace a load instruction to an unknown address created by a negative
offset from a function with unreachable for example as Eugene
suggested may be possible?

Also, what are you gaining going with the third approach? I guess the
optimiser could do things like constant propogation using the third
approach but not the second although I think thats unlikely do give
much benefit in the kind of code GHC produces but there is everyone
else to think of :).

Thanks for all the responses though, I'm going to start playing around
with some code and see what happens.

Its good to see that a feature of this nature would be useful to a
whole range of people, I wasn't aware of that.

My argument amounts to express side tables as side tables in the IR
rather than as an ordering on globals. I think that would simplify
the backend (a side table is something you discover form the function
rather than having to check another global). Also, if well specified,
I think you could allow basic block labels into structures which makes
them more interesting for other uses.

Sure. I wasn't set on the third approach I suggested, which is to have
them expressed as side tables in the IR as I didn't realise other
users would be interested in them so I didn't think it would be
appropriate to add new language constructs for one user. I don't think
it would simpler to implement in the backend though and this approach
would need changes to the frontend, so a lot more work.

The backend already can sort of do this with the GCMetadataPrinter.
Generalizing that to arbitrary side tables might be easier than adding
a new construct (granted sidetables might not replace the ability to
output assembly by that class, but they might do a lot of the heavy
lifting). Since GC lowering happens on the IR level (from the docs I
looked at, I haven't personally dealt with GC yet), it maybe possible
to do a lot of lowering to generalized tables rather than complex
GCMetadataPrinter implementations. This is just speculation on my
part though. This is one of the reasons I thought labels in the
constant structs could be handy. Perhaps a general side table
representation in the backend could be used by EH too?

Andrew

Hi all,

Just wanted to report that I've found a second way to achieve
data/code layout (the first being the linker script that Eugene
mentioned).

The key is that gnu as supports a feature called subsections.

http://sourceware.org/binutils/docs-2.20/as/Sub_002dSections.html#Sub_002dSections

The way this works is that you can put stuff into a section like
'.text 2', where 2 is a subsection of .text When run, 'as' orders the
subsections. So all you need to do is arrange for the sidetable to be
in section '.text n' and the code in section '.text n+1'. Each
sidetable and its code goes in its own subsection. The nice thing is,
this is purely a gnu as feature. When it compiles the assembly to
object code, the subsections aren't present in the object code, so you
don't get 100's of sections that take up space and slow down linking.

There is one complication though. LLVM (and GCC as well) don't support
subsections. While you can define what section globals and functions
are in, this doesn't support defining the subsection. If you say to
LLVM, put function f in section "text 12", it produces assembly like:

.section text 12,"rw" @progbits
f:
[..]

Which causes gas to spit out a syntax error. Gas only allows using
subsections through a very defined syntax, so it needs to be:

.text 12
f:
  [...]

We can convert between them though with just a simple regex.

We are going to use this approach for the moment in GHC, we've tested
it and its working great so far. I prefer this method over the linker
script as implementing the linker script approach would affect all the
backends GHC supports while this approach is contained to the LLVM
backend.

I'm still planning on adding support to LLVM for supporting side
tables in some manner so we can just depend on pure LLVM.

Cheers,
David

Subsections is a very good idea. You can even do without
post-processing by using carefully crafted section names, e.g.

__attribute__((section(".text,\"ax\",@progbits\n\t.subsection 1 #")))
void foo()
{
}

(Note that you need ".subsection n" commands on ELF targets and
".section name, n" commands on COFF targets; seems that the latter was
supported on all targets in old versions of gas, but not any longer).

Eugene

hehe cool, this is great news. Thanks for letting me know.

David

(Note that you need ".subsection n" commands on ELF targets and
".section name, n" commands on COFF targets; seems that the latter was
supported on all targets in old versions of gas, but not any longer).

Btw, will this work on Mach-O?

There is one complication though. LLVM (and GCC as well) don't support
subsections. While you can define what section globals and functions
are in, this doesn't support defining the subsection. If you say to
LLVM, put function f in section "text 12", it produces assembly like:

.section text 12,"rw" @progbits

This seems easy to fix during the asmprinting. E.g. if section name is
an integer from 0 till 8192 => emit as an subsection. Side q: what
will you do when you run out of subsections?

There is one complication though. LLVM (and GCC as well) don't support
subsections. While you can define what section globals and functions
are in, this doesn't support defining the subsection. If you say to
LLVM, put function f in section "text 12", it produces assembly like:

.section text 12,"rw" @progbits

This seems easy to fix during the asmprinting. E.g. if section name is
an integer from 0 till 8192 => emit as an subsection. Side q: what
will you do when you run out of subsections?

It seems easy to fix for functions, but for globals you already have
to overwrite their section in LLVM so the section won't be just an
integer.

Andrew

I have no idea how gnu toolchain works on Mach-O platforms. My guess
is that it goes via COFF path here, because the other path is
ELF-specific.

As Andrew already said, for the table we need both section and
subsection. To solve the problem with running out, we can put each
function into a separate section (C++ compilers were doing this for a
while) and only use 2 subsections per section: 0 for the table and 1
for function.

Eugene

Hi,

Does anyone know whether subsections are specific to the gnu assembler
or whether they are supported by other assemblers, such as masm?

Or put another way, will this limit the assembly output to the gnu
toolchain?

Cheers,
Sam

There is one problem with the section name used here, 'llvm-as'
doesn't support it. LLVM itself does, so if you compile the above with
clang then it works fine. If you try to use that section name in a .ll
file and call one of the tools it fails as the parser doesn't support
escaping quotes. It also doesn't support interpreting '\n' as a new
line and outputs each character into the assembly file. Anyway you can
get around this by using a section name like:

".text;.subsection 1 #"

instead. If your using the LLVM API then this isn't a problem.

David

Well, post-processing then.