Extend LLVM to fix global addresses

Hi all

It would be nice to add support for placing globals at fixed addresses in memory.

For example, low level driver code tends to contain things like this

*(int*)0x00001000

which is horrible.

Alternatively, people use linker scripts and assembly hacks to put symbols at fixed addresses.

I propose we add first-class support for this in the compiler. Like addrspace(#), we should have something like fixedaddr(#) to put a symbol at the specified address (the naming is open to suggestions).

Support for this would be needed in the following places:
- Clang: the syntax is open to suggestions, particularly if anyone knows of existing compilers with syntax for this
- IR: "fixedaddr" as above, as a new attribute on globals (and probably on functions, if there's a need to bind functions to fixed addresses too)
- Object files: this might be possible within current formats like ELF by doing horrible things with relocations, but we'll have to experiment
- Linker: like object file support, but it would additionally be nice to improve linkers to the point where we can avoid using linker scripts to do this

Comments are very welcome.

Thanks,
Pete

It would be nice to add support for placing globals at fixed addresses in memory.

I don't know. From my experience, the usefulness is very, very limited.
As in: drivers are about the only thing that can make use of it.

For example, low level driver code tends to contain things like this

*(int*)0x00001000

which is horrible.

The drivers I have seen can't work that way because they are written to
cover more than one specific machine. As soon as you go anywhere near an
IO abstraction, this doesn't apply anymore. As such, I think this
feature primarily helps making ugly, unportable code differently ugly,
unportable code.

Joerg

It would be nice to add support for placing globals at fixed addresses in memory.

I don't know. From my experience, the usefulness is very, very limited.
As in: drivers are about the only thing that can make use of it.

Yeah, it would mostly be drivers, but there are also plenty of embedded processors out there
with minimal/no linkers. In those situations the developer tends to have to hack the assembly
output from the compiler to lay out the globals manually.

For example, low level driver code tends to contain things like this

*(int*)0x00001000

which is horrible.

The drivers I have seen can't work that way because they are written to
cover more than one specific machine. As soon as you go anywhere near an
IO abstraction, this doesn't apply anymore. As such, I think this
feature primarily helps making ugly, unportable code differently ugly,
unportable code.

Agreed on the ugliness, but at least it keeps it all in one place instead of defining the globals
in C but then having to also maintain a linker script somewhere else to define their addresses.

(resent due to mailing list breakage)

It would be nice to add support for placing globals at fixed addresses in memory.

I don't know. From my experience, the usefulness is very, very limited.
As in: drivers are about the only thing that can make use of it.

I agree that most code does not profit from specifying fixed addresses for objects.
Nonetheless, there is code which does. Obvious examples:
- boot-loaders
- kernels
- drivers
- JITted code interacting with the runtime
- code working with specialized address spaces

For example, low level driver code tends to contain things like this

*(int*)0x00001000

which is horrible.

The drivers I have seen can't work that way because they are written to
cover more than one specific machine. As soon as you go anywhere near an
IO abstraction, this doesn't apply anymore. As such, I think this
feature primarily helps making ugly, unportable code differently ugly,
unportable code.

Your argument appears to be that this feature would have little use to you.
Noted, I guess. We've spoken to several people who do write drivers and
other code like the above, and they seem pretty enthusiastic about the
idea. If you have input about how best to design this, at any of the levels
Peter spelled out, that would be interesting.

John.

We've spoken to several people who do write drivers and
other code like the above, and they seem pretty enthusiastic about the
idea. If you have input about how best to design this, at any of the levels
Peter spelled out, that would be interesting.

I'm curious what the particular benefits/differences would be between
this language feature & the existing solution (casting constant
integer values to pointers). This wasn't clear to me from the original
post.

Perhaps some (even straw man) example code?

- David
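
A straw-man sketch of the existing idiom may help frame the question. The register names, offsets, and the REG_BASE stand-in below are assumptions made for illustration. On real hardware REG_BASE would be a literal address (0x00001000 in the original post); here it points at an ordinary array so the code can run on a hosted system.

```c
#include <stdint.h>

/* Stand-in for a device register block so the sketch runs on a hosted
 * system. On real hardware REG_BASE would be a literal such as 0x00001000. */
static uint32_t fake_device[4];
#define REG_BASE ((uintptr_t)fake_device)

/* Hypothetical register offsets, in bytes from REG_BASE. */
#define REG_CTRL   0x0u
#define REG_STATUS 0x4u

/* The idiom under discussion: cast an integer address to a volatile
 * pointer and dereference it. volatile keeps the compiler from merging
 * or dropping the accesses. */
#define REG32(off) (*(volatile uint32_t *)(REG_BASE + (off)))

/* Set the (hypothetical) enable bit with a read-modify-write. */
void device_enable(void)
{
    REG32(REG_CTRL) |= 1u;
}

/* Read the (hypothetical) status register. */
uint32_t device_status(void)
{
    return REG32(REG_STATUS);
}
```

Nothing here tells the compiler or the linker that a named object lives at that address, which is the gap the proposal is aimed at.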

The best case I can think of is embedded developers needing to lay out functions or globals in memory.
Currently they would have to resort to a linker script or assembly hacks for this.

But anything which avoids the horrible int* cast has to be a good thing. For one thing, the cast would cause alias analysis a lot of pain.

One thing that comes immediately to mind is that the optimizer would see
loads and stores as affecting a specific global variable and therefore
not aliasing other globals and the heap.

Dereferencing a constant pointer also only really works if you're
memory-mapping. If you actually need to place data at a fixed address,
it's not good enough, and you'll end up needing some crazy linker magic.
One of our long-term goals is to reduce the need for that.

John.
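
The difference between accessing a fixed address and placing data there can be sketched on a hosted POSIX system. Everything named below (FIXED_ADDR, FIXED_SIZE, map_fixed_region, FIXED_WORD) is hypothetical, and the address is an arbitrary choice assumed to be free in the process. The cast on its own reserves nothing; something else has to make the memory exist first, here a call to mmap() with the address passed as a hint.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical fixed address, assumed to be unused by the process. */
#define FIXED_ADDR ((uintptr_t)0x10000000u)
#define FIXED_SIZE ((size_t)4096)

/* Ask the OS to back FIXED_ADDR with anonymous memory. Returns 0 on
 * success, -1 if the mapping failed or landed somewhere else (without
 * MAP_FIXED the address is only a hint). */
int map_fixed_region(void)
{
    /* The sketch assumes the address is aligned to the mapping size. */
    assert(FIXED_ADDR % FIXED_SIZE == 0);

    void *p = mmap((void *)FIXED_ADDR, FIXED_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if ((uintptr_t)p != FIXED_ADDR) {
        munmap(p, FIXED_SIZE);
        return -1;
    }
    return 0;
}

/* Only valid after map_fixed_region() has succeeded: the cast itself
 * neither reserves nor initializes anything. */
#define FIXED_WORD (*(volatile uint32_t *)FIXED_ADDR)
```

On a bare-metal target there is no mmap() to fall back on, so the equivalent step ends up in a linker script.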

The best case I can think of is embedded developers needing to lay out functions or globals in memory.
Currently they would have to resort to a linker script or assembly hacks for this.

In this case you're describing a situation that isn't the
integer-constant-cast-to-pointer one. Ah, like John's recent reply:
wanting to place certain variables into fixed memory locations & let
the compiler do the initialization, etc.

For functions I realize that'd have to be compiler supported, but for
global variables what would the practical difference be between "place
this variable at this address" and "take a pointer to this constant
address & write my constant data in there"? Is that something that
cannot be done with the latter technique?

But anything which avoids the horrible int* cast has to be a good thing.

I'm just not sure I see it as so problematic that it deserves a
language feature - what actual problems does it cause developers? Used
fairly explicitly & in the places that require it, it doesn't seem
terribly intrusive (so as to corrupt otherwise clean code) or error
prone.

For one thing, the cast would cause alias analysis a lot of pain.

Hmm, I suppose - I'm not sufficiently familiar with that to know how
this would adversely affect such logic nor what optimizations might
occur on it. Wouldn't the usual use case here (for drivers, etc)
require the pointer to be volatile anyway, to avoid the compiler
optimizing away, say, a repeated read that was intentionally designed
to read distinct values due to the hardware changing the values in
memory?

[apologies if these are somewhat naive questions & feel free to carry
on without my discussion, of course]

- David

The best case I can think of is embedded developers needing to lay out functions or globals in memory.
Currently they would have to resort to a linker script or assembly hacks for this.

This proposal also requires extending object file formats and linkers,
and this impacts a lot of people, so it would want a pretty
compelling motivation.

But anything which avoids the horrible int* cast has to be a good thing. For one thing, the cast would cause alias analysis a lot of pain.

Actually, it wouldn't cause very much pain, if any, in alias analysis.

LLVM IR already assumes that programmers can't "guess" what the
address of the stack, global variables, or heap will be. An integer
constant cast to a pointer is assumed to be non-aliasing with
"regular" objects in memory. This property is of utmost importance,
because a pass like mem2reg relies on it to know that the allocas
it wants to promote to registers don't have their addresses
secretly taken, for example.

To support global variables at fixed addresses, you'd probably
need to make an exception to this rule, to allow these variables
to be accessed via their address values directly, which would
defeat much of the remaining benefit. These variables would
always be implicitly escaped.

I don't mean to shoot you down, but there don't seem to be any
compelling motivations for this feature.

Dan
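
In C terms, the assumption Dan describes might look roughly like the sketch below. This is only an analogy to the IR-level rule. The function name and the reg parameter are made up for illustration, and in driver code reg would be produced from an integer constant rather than passed in.

```c
#include <stdint.h>

/* `counter` never has its address taken, so a store through an unrelated
 * pointer cannot legally modify it. The optimizer is free to keep it in a
 * register (what mem2reg does to the alloca) and fold the return value. */
int local_survives_store(volatile uint32_t *reg)
{
    int counter = 1;
    *reg = 0xFFFFFFFFu;   /* store through a pointer unrelated to counter */
    return counter;       /* still 1, whatever reg points at */
}
```

A global pinned to a known address would break this reasoning for that global, since any integer constant cast to a pointer could then legitimately reach it.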

In some circumstances, it may be helpful to treat memory used for memory-mapped I/O devices as special memory objects. We did this in the SVA-OS work, in which we wanted to know which pieces of memory are used for memory-mapped I/O and which are used for regular memory objects. This, in turn, could be used to figure out which memory objects should be accessible for DMA (they are reachable from I/O memory objects in the points-to graph) and which load/store instructions (regular loads and stores or special I/O loads/stores) were accessing which type of memory.

In other words, it may be desirable to tell the difference between an arbitrary inttoptr cast and a pointer to a memory-mapped I/O object.

One way to do this is to have the kernel/driver use an I/O allocator function. This function returns a pointer (like malloc()) but takes as input a physical memory address and size; the returned pointer is a pointer to memory-mapped I/O that can be accessed by dereferencing the pointer, either using a volatile load/store or some special I/O function.

Interestingly enough, the Linux 2.4 kernel had such a function; it remapped physical memory into the kernel address space and returned a pointer to it.

So, instead of adding the globals feature originally suggested in this thread, one could simply add an intrinsic or just recognize some function as the "I/O memory allocator" the same way that malloc() is recognized as a heap allocator today. Just make it take a constant integer and a size, and it returns a pointer to the memory in question.

-- John T.

P.S. The relevant SVA-OS paper is at http://llvm.org/pubs/2009-08-12-UsenixSecurity-SafeSVAOS.html
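
A minimal sketch of what such an I/O allocator interface could look like follows. The name io_map, the simulated physical window, and its base and size are all assumptions made so the example runs on a hosted system; a real kernel would remap physical memory rather than index into an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated physical window so the sketch runs on a hosted system. */
#define SIM_PHYS_BASE ((uintptr_t)0x1000u)
#define SIM_PHYS_SIZE ((size_t)256)
static volatile uint8_t sim_phys[SIM_PHYS_SIZE];

/* Hypothetical I/O allocator: takes a physical address and a size and
 * returns a pointer to that range, or NULL if the range lies outside
 * the simulated window. */
volatile void *io_map(uintptr_t phys, size_t size)
{
    if (phys < SIM_PHYS_BASE || size > SIM_PHYS_SIZE)
        return NULL;
    if (phys - SIM_PHYS_BASE > SIM_PHYS_SIZE - size)
        return NULL;
    return &sim_phys[phys - SIM_PHYS_BASE];
}
```

Because every I/O pointer originates from one recognizable call, an analysis can tell it apart from an arbitrary integer cast, much as it recognizes the result of malloc() as heap memory.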

Dan Gohman <gohman@apple.com> writes:

But anything which avoids the horrible int* cast has to be a good
thing. For one thing, the cast would cause alias analysis a lot of pain.

Actually, it wouldn't cause very much pain, if any, in alias analysis.

LLVM IR already assumes that programmers can't "guess" what the
address of the stack, global variables, or heap will be. An integer
constant cast to a pointer is assumed to be non-aliasing with
"regular" objects in memory.

I don't mean to shoot you down, but there don't seem to be any
compelling motivations for this feature.

I agree. We make use of this property to turn some pretty ugly frontend
pointer arithmetic using integers into somewhat less ugly GEP operations
which LLVM can understand.

So ptrtoint/inttoptr isn't really a problem as long as you know it's
well-behaved.

                             -Dave
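
The kind of rewrite Dave mentions can be illustrated loosely in C. His frontend is not shown in the thread, so the two functions below are hypothetical: the first does pointer arithmetic through integers (the ptrtoint/inttoptr style), the second expresses the same address as typed indexing, which is roughly what a GEP encodes.

```c
#include <stddef.h>
#include <stdint.h>

/* Integer-style addressing: convert the base pointer to an integer,
 * add a byte offset, and convert back. */
int32_t *element_via_integer(int32_t *base, size_t index)
{
    uintptr_t addr = (uintptr_t)base + index * sizeof(int32_t);
    return (int32_t *)addr;
}

/* The same address expressed as typed indexing. */
int32_t *element_via_index(int32_t *base, size_t index)
{
    return &base[index];
}
```

Both functions yield the same address for an in-bounds index, but the second form keeps the link to the base object visible to the optimizer.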