Proposal: arbitrary relocations in constant global initializers

Hi,

I’d like to make this proposal for extending the Constant hierarchy with
a mechanism for introducing custom relocations in global initializers. This
could also be seen as a first step towards adding a “bag-of-bytes with
relocations” representation for global initializers.

Problem

In order to implement control flow integrity for indirect function calls, we
would like to add a set of constructs to the IR that ultimately allow for a
jump table similar to that described for IFCC in [1] to be expressed. Ideally
the additions should be minimal and general-purpose enough to allow them to
be used for other purposes.

IFCC, the previous attempt to teach LLVM to emit jump tables, was removed
for complicating how functions are emitted, in particular requiring a
subtarget-specific instruction emitter available in subtarget-independent
code. However, the form of a jump table entry is generally well known to
whichever component of the compiler is creating the jump table (for example, it
needs to know the size of each entry, and therefore the specific instructions
used), and we can therefore simplify things greatly by not considering jump
tables as consisting of instructions, but rather known strings of bytes in
the .text section with a relocation pointing to the function address. For
example, on x86:

$ cat tc.ll
declare void @foo()

define void @bar() {
  tail call void @foo()
  ret void
}
$ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 |~/src/llvm-build-rel/bin/llvm-objdump -d -r -
<stdin>: file format ELF64-x86-64

Disassembly of section .text:
bar:
       0: e9 00 00 00 00 jmp 0 <bar+5>
    0000000000000001: R_X86_64_PC32 foo-4-P

Or on ARM:

$ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 -mtriple=armv7-unknown-linux |~/src/llvm-build-rel/bin/llvm-objdump -d -r -

<stdin>: file format ELF32-arm-little

Disassembly of section .text:
bar:
       0: fe ff ff ea b #-8 <bar>
      00000000: R_ARM_JUMP24 foo
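
As a sanity check on the two encodings above, here is a minimal Python sketch (not part of the proposal) that decodes the bytes printed by llvm-objdump. The helper names are made up for illustration. It assumes only the standard x86 "jmp rel32" layout (opcode 0xe9 plus a signed 32-bit displacement from the next instruction) and the ARM "B" layout (condition in the top 4 bits, signed 24-bit word offset relative to PC, where PC reads as the instruction address plus 8).

```python
import struct

def decode_x86_jmp(insn: bytes, addr: int) -> int:
    """Return the branch target of a 5-byte x86 `jmp rel32` placed at addr."""
    opcode, rel32 = struct.unpack("<Bi", insn)
    assert opcode == 0xE9, "not a jmp rel32"
    return addr + 5 + rel32          # displacement counts from the next instruction

def decode_arm_b(insn: bytes, addr: int):
    """Return (cond, pc_relative_offset, target) for an ARM `b` placed at addr."""
    (word,) = struct.unpack("<I", insn)
    cond = word >> 28
    assert (word >> 24) & 0xF == 0xA, "not a B instruction"
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:             # sign-extend the 24-bit field
        imm24 -= 1 << 24
    offset = imm24 * 4               # field is in units of 4-byte words
    return cond, offset, addr + 8 + offset

x86_target = decode_x86_jmp(bytes.fromhex("e900000000"), 0)
arm_cond, arm_offset, arm_target = decode_arm_b(bytes.fromhex("feffffea"), 0)
```

With the bytes from the listings, the x86 placeholder points at bar+5 and the ARM placeholder has offset -8, i.e. it points back at the instruction itself, which matches the disassembly.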

How can we represent such jump table entries in IR? One way that almost
works on x86 is to attach a constant to a function using either prefix data
or prologue data, or to place a GlobalVariable in the .text section using
the section attribute. The constant would use ConstantExpr arithmetic to
produce the required PC32 relocation:

define void @bar() prefix <{ i8, i32, i8, i8, i8 }> <{ i8 -23, i32 trunc (i64 add (i64 sub (i64 ptrtoint (void ()* @foo to i64), i64 ptrtoint (void ()* @bar to i64)), i64 3) to i32), i8 -52, i8 -52, i8 -52 }> {
  ...
}

However, this is awkward, and can’t be used to represent an ARM jump table
entry. (It also isn’t quite right; PC32 can trigger the creation of a
PLT entry, which doesn’t entirely match what the ConstantExpr arithmetic
is doing.)
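
To see why the "+ 3" in the ConstantExpr is the right displacement, here is a small Python sketch of the arithmetic. It is an illustration only: the function names and addresses are invented, and it assumes the 8 bytes of prefix data sit immediately before the entry point of @bar, so the jmp starts at bar-8 and the instruction after it starts at bar-3.

```python
import struct

def prefix_jump_bytes(foo: int, bar: int) -> bytes:
    """Bytes of <{ i8 -23, i32 trunc(foo - bar + 3), i8 -52, i8 -52, i8 -52 }>."""
    rel32 = (foo - bar + 3) & 0xFFFFFFFF       # trunc i64 -> i32
    # i8 -23 is 0xe9 (jmp rel32); i8 -52 is 0xcc (int3 padding)
    return struct.pack("<BIBBB", 0xE9, rel32, 0xCC, 0xCC, 0xCC)

def jump_target(data: bytes, bar: int) -> int:
    """Where the jmp in the prefix data lands, given the address of @bar."""
    opcode, rel32 = struct.unpack_from("<Bi", data)
    assert opcode == 0xE9
    jmp_addr = bar - len(data)                 # prefix data ends where @bar begins
    return jmp_addr + 5 + rel32                # rel32 counts from the next instruction

foo, bar = 0x401000, 0x400100                  # hypothetical addresses
data = prefix_jump_bytes(foo, bar)
landed = jump_target(data, bar)
```

The next instruction is at bar-3, so bar-3 plus (foo-bar+3) lands on foo, for forward and backward jumps alike.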

Design

A relocation can be seen as having three inputs: the relocation type (on
Mach-O this also includes a pcrel flag), the target, and the addend. So
let’s define a relocation constant like this:

iNN reloc relocation_type (ptr target, iNN addend)

where iNN is some integer type, and ptr is some pointer type. For example,
an ARM jump table entry might look like this:

i32 reloc 0x1d (void ()* @foo, i32 0xeafffffe) ; R_ARM_JUMP24 = 0x1d

There is no error checking for this; if you use the wrong integer type for
a particular relocation, things will break and you get to keep both pieces.

At the asm level, we would add a single directive, ".reloc", whose syntax
would look like this when targeting ELF and COFF:

.reloc size relocation_type target addend

or this when targeting Mach-O:

.reloc size relocation_type pcrel target addend

The code generator would emit this directive when emitting a reloc in a
constant initializer. (Note that this means that reloc constants would only
be supported with the integrated assembler.)

For example, the ARM JUMP24 relocation would look like this:

.reloc 4 0x1d foo 0xeafffffe
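
To make the mapping between the two proposed spellings concrete, here is a hedged Python sketch that models a reloc constant as (type, target, addend) and prints it both as the IR constant and as the ELF/COFF form of the directive. The class and method names are hypothetical, the Mach-O pcrel field is left out, and the syntax is only the one proposed in this message, not an existing assembler feature.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelocConstant:
    bits: int          # width of the iNN type, e.g. 32
    rel_type: int      # raw relocation number, e.g. 0x1d for R_ARM_JUMP24
    target_type: str   # IR pointer type of the target, e.g. "void ()*"
    target: str        # symbol name without the leading '@'
    addend: int        # raw addend bits

    def to_ir(self) -> str:
        """Spell the constant in the proposed IR syntax."""
        return (f"i{self.bits} reloc {self.rel_type:#x} "
                f"({self.target_type} @{self.target}, i{self.bits} {self.addend:#x})")

    def to_asm(self) -> str:
        """Spell the constant as the proposed ELF/COFF directive (size in bytes)."""
        return f".reloc {self.bits // 8} {self.rel_type:#x} {self.target} {self.addend:#x}"

entry = RelocConstant(32, 0x1D, "void ()*", "foo", 0xEAFFFFFE)
ir_text = entry.to_ir()
asm_text = entry.to_asm()
```

For the ARM jump table entry this reproduces the two example lines given above.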

We would need to add some mechanism for the assembler to evaluate relocations
in case the symbol is locally defined and not exported. For that reason,
we can start with a small set of supported "internal" relocations and expand
as needed.

What about constant propagation?

We do not want reloc constants to appear in functions' IR, or to be propagated
out of global initializers that use them. The simplest solution to this
problem is to only allow reloc constants in constant initializers where we
cannot/do not currently perform constant propagation, i.e. function prologue
data, prefix data and constants with weak linkage. This could be enforced
by the verifier. Later we can consider relaxing this constraint as needed.

Other uses

Relocation constants could be used for other purposes by frontends. For
example, a frontend may need to represent some other kind of custom/specific
instruction sequence in IR, or to create arbitrary kinds of references between
objects where that may be beneficial (for example, -fsanitize=function may
use this facility to create GOTOFF relocations in function prologues to
avoid creating dynamic relocations in the .text section to fix PR17633).

Thanks,

Now with the correct list.

This is pr10368.

Do we really need to support hard-coded relocation numbers? It looks like
the examples above can be represented as constant expressions:

(sub (add (ptrtoint @foo) 0xeafffffe) cur_pos)

no?

I'm not sure if this would be sufficient. The R_ARM_JUMP24 relocation
on ARM has specific semantics to implement ARM/Thumb interworking; see
Documentation – Arm Developer
Note that R_ARM_CALL has the same operation but different semantics.
I suppose that we could try looking at the addend to decide which relocation
to use, but this would mean adding more complexity to the assembler (along
with any pattern matching that would need to be done). It seems simpler,
both conceptually and in the implementation, for the client to directly say
what it wants in the object file.

There's also the point that if @foo is defined outside the current linkage
unit, or refers to a Thumb function, the above expression in a constant
initializer would refer to the function's PLT entry or a shim, but in a
function it would refer to the function's actual address, so the evaluation
of this expression would depend on whether it was constant folded. (Although
on the other hand we might just declare that by using such a constant in a
global initializer that may be constant folded the client is asserting that
it doesn't care which address is used.)

> Why do you need to be able to avoid them showing up in function
> bodies? It would be unusual but valid to pass the above value as an
> argument to a function.

This was part of the proposal mainly for the constant folding reasons mentioned
above, but if we did go with a reloc expression we'd need to encode the
original constant address in the reloc for PC-relative expressions, which
wouldn't be necessary if we disallow it.

Thanks,

I am pretty sure there is a use for some target-specific expressions; my
concerns are:
* Using a target-specific expression when it could be represented in a
target-independent way (possibly a bit more verbose).
* Using the raw relocation values instead of something like
thumb_addr_delta. With named expressions, the semantics of each constant
expression are still documented in the language reference.

>> Why do you need to be able to avoid them showing up in function
>> bodies? It would be unusual but valid to pass the above value as an
>> argument to a function.
>
> This was part of the proposal mainly for the constant folding reasons mentioned
> above, but if we did go with a reloc expression we'd need to encode the
> original constant address in the reloc for PC-relative expressions, which
> wouldn't be necessary if we disallow it.

Seems better to make it explicit IMHO.

BTW, about the assembly change: Please check what the binutils guys
think of it. We do have extensions, but it is nice to at least let
them know so that we don't end up with two independent solutions in
the future.

Cheers,
Rafael

> I am pretty sure there is use for some target specific expressions, my
> concerns are
> * Using a target specific expression when it could be represented in a
> target independent way (possibly a bit more verbose).

Well I don't think there's a target independent way to write an R_ARM_JUMP24
relocation, as there's no way to represent the PLT entry or interworking
shim in IR.

> * Using the raw relocation values, instead of something like
> thumb_addr_delta. With this the semantics of each constant expression
> are still documented in the language reference.

I guess there are two ways we can go here:

1) expose the raw relocation values
2) introduce new specific ConstantExpr subtypes for the target-specific things we need

In this case I think we should do one or the other; I don't really think it's
worth adding a half measure of flexibility (e.g. providing a way to specify
the addend of a R_ARM_JUMP24 when it will pretty much always be the same).

I like option 1 because it's more general purpose and ultimately less of an
impedance mismatch between what the client wants and what appears in the
object file, and we can solve the documentation problem with reference to
the object file format documentation, but it would require our documentation
to depend on sometimes poorly documented object file formats.

Option 2 could look something like this (produces the same bytes as "b
some_label" in every object format when targeting ARM, or "b.w some_label"
when targeting Thumb):

i32 arm_b (void ()* @some_label)

and that would be easy to document on its own. The downside is that it's
pretty specific to my use case, but maybe that's ok.

Option 2 seems like it would be less implementation work, and doesn't
require any changes to the assembly format (and ultimately could be
upgraded to option 1 later if needed), so maybe it's best to start with that.

>> Why do you need to be able to avoid them showing up in function
>> bodies? It would be unusual but valid to pass the above value as an
>> argument to a function.
>
> This was part of the proposal mainly for the constant folding reasons mentioned
> above, but if we did go with a reloc expression we'd need to encode the
> original constant address in the reloc for PC-relative expressions, which
> wouldn't be necessary if we disallow it.

> Seems better to make it explicit IMHO.

Okay, but if we do introduce a new constant kind, there doesn't seem to be
much point in teaching the backend to lower it in a function, other than
for completeness. If we can avoid having to do that, that seems preferable.

> BTW, about the assembly change: Please check what the binutils guys
> think of it. We do have extensions, but it is nice to at least let
> them know so that we don't end up with two independent solutions in
> the future.

Yes, if I ultimately go with option 1.

Thanks,

I've tried implementing some of the alternatives mentioned in this
thread, and so far I like this syntax the most:

i32 reloc (29, void ()* @f, 3925868544)
; 29 = 0x1d = R_ARM_JUMP24
; 3925868544 = 0xea000000

Note the zeroes in the relocated data instead of 0xfffffe in the
original proposal. This is aligned with the way LLVM emits relocations
in the backend, and avoids encoding the addend in a
relocation-specific way in the IR. Instead, the addend can be
specified in the second argument with the regular IR expressions, like
the following:

@w = internal global [3 x i32]
   [i32 reloc (29, void ()* @f, 3925868544),
    i32 reloc (29, [3 x i32]* @w, 3925868544),
    i32 reloc (29, i32* getelementptr (i32, i32* bitcast ([3 x i32]* @w to i32*), i32 1), 3925868544)
   ], align 4

We also get relocations for elements 1 and 2 of @w optimized out for
free. If the "addend" (i.e. the third arg of reloc) was specified as
0xeafffffe, the backend would have had to decode this value first.
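
To illustrate the relationship between the two spellings, here is a Python sketch of how a zeroed template plus a symbolic addend could be turned into the in-place bytes. This is an assumption-laden illustration rather than what the backend actually does: the function name is invented, and it assumes the usual ARM convention that the 24-bit field of a B instruction holds (addend - 8) / 4, the 8 accounting for the PC bias.

```python
def fold_addend_into_b(template: int, addend: int) -> int:
    """Place a byte addend into the imm24 field of an ARM B instruction word."""
    pc_bias = 8                       # PC reads as instruction address + 8
    offset = addend - pc_bias
    assert offset % 4 == 0, "branch offsets are multiples of 4 bytes"
    imm24 = (offset >> 2) & 0xFFFFFF  # two's complement, truncated to 24 bits
    return (template & 0xFF000000) | imm24

template = 3925868544                 # the decimal constant used in the IR above
word_for_f = fold_addend_into_b(template, 0)       # target f + 0
word_for_w_plus_4 = fold_addend_into_b(template, 4)  # target w + 4
```

Under that assumption an addend of 0 gives back the 0xeafffffe of the original proposal, while w + 4 gives 0xeaffffff; going the other way is the decoding step the backend would otherwise need.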

On the other hand, it is possible for a constant expression in the IR
to be lowered to something that is not a valid relocation target, and
it is hard to detect this problem at the IR level.

Also, separating the addend from the section data allows the backend
to choose between .rel and .rela representations.

> Note the zeroes in the relocated data instead of 0xfffffe in the
> original proposal. This is aligned with the way LLVM emits relocations
> in the backend, and avoids encoding the addend in a
> relocation-specific way in the IR.

I am confused by this statement. If the zeros aren't what appear in the
object file, it seems rather relocation specific to me.

> Instead, the addend can be
> specified in the second argument with the regular IR expressions, like
> the following:
>
> @w = internal global [3 x i32]
>    [i32 reloc (29, void ()* @f, 3925868544),
>     i32 reloc (29, [3 x i32]* @w, 3925868544),
>     i32 reloc (29, i32* getelementptr (i32, i32* bitcast ([3 x i32]*
> @w to i32*), i32 1), 3925868544)
> ], align 4
>
> we also get relocations for elements 1 and 2 of @w optimized out for
> free. If the "addend" (i.e. the third arg of reloc) was specified as
> 0xeafffffe, the backend would have had to decode this value first.

I think it may be ok to allow non-global constants as the second operand
(the utility of this feature being the ability to freely RAUW a global
without worrying about reloc constants).

This doesn't necessarily need to act as an alternative means of specifying
an addend, though. Instead, the backend could synthesise local symbols to
act as relocation targets. For example, your example would conceptually
translate to:

@w = internal global [3 x i32]
   [i32 reloc (29, void ()* @f, 3925868544),
    i32 reloc (29, [3 x i32]* @w, 3925868544),
    i32 reloc (29, i32* @dummy, 3925868544)
   ], align 4

@dummy = internal alias i32* getelementptr (i32, i32* bitcast ([3 x i32]*
@w to i32*), i32 1)

This way, you save yourself from needing to worry about manipulating
addends in the backend; the linker will take care of it for you.

> On the other hand, it is possible for a constant expression in the IR
> to be lowered to something that is not a valid relocation target, and
> it is hard to detect this problem at the IR level.

Right, this is of course a problem we already have for aliasees and
constant initializers.

> Also, separating the addend from the section data allows the backend
> to choose between .rel and .rela representations.

Do you have an example of a rela relocation which uses both r_addend and
the underlying value in the object file?

Peter

> I am confused by this statement. If the zeros aren't what appear in the
> object file, it seems rather relocation specific to me.

These bytes will always be zeroes, which makes them not relocation specific.
Object file contents, on the other hand, are relocation specific. In
particular, the constant 0xfffffe is the R_ARM_JUMP24 encoding of a zero
offset (from the start of the current instruction).

Somehow I find this IR representation very natural - you've got data
bytes for anything that's not relocated, and the target expression
(possibly including addend).

> This way, you save yourself from needing to worry about manipulating addends
> in the backend, the linker will take care of it for you.

That's no worry at all, AsmPrinter::lowerConstant evaluates both
constant expressions to MCExpr: w + 4.
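
The folding described here can be pictured with a toy Python model. It is only a sketch of the idea, not of AsmPrinter::lowerConstant itself: the tuple encoding of constant expressions and the fold function are invented for illustration, and only the handful of expression kinds used in this thread are handled.

```python
def fold(expr):
    """Reduce a toy constant expression to (symbol, byte_offset)."""
    kind = expr[0]
    if kind == "global":                 # ("global", name)
        return expr[1], 0
    if kind in ("bitcast", "alias"):     # both just forward to their operand
        return fold(expr[1])
    if kind == "gep":                    # ("gep", element_size, base, index)
        _, element_size, base, index = expr
        symbol, offset = fold(base)
        return symbol, offset + element_size * index
    raise ValueError(f"unsupported expression kind: {kind}")

w = ("global", "w")
# getelementptr (i32, i32* bitcast ([3 x i32]* @w to i32*), i32 1)
gep_form = ("gep", 4, ("bitcast", w), 1)
# @dummy = internal alias ... the same getelementptr
alias_form = ("alias", gep_form)

folded_gep = fold(gep_form)
folded_alias = fold(alias_form)
```

Both spellings reduce to the symbol w with a byte offset of 4, which is the "w + 4" mentioned above.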

Do you suggest we use this to limit "reloc" to accepting only
GlobalValue as the second argument instead of an arbitrary Constant?

> Do you have an example of a rela relocation which uses both r_addend and the
> underlying value in the object file?

The point of .rela is to allow addends that do not fit into the
underlying value. Such addends cannot be expressed as the third
argument of reloc() either, and IMHO the middle end should not worry
about such details.
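
For readers less familiar with the two formats, here is a short Python sketch of the 32-bit ELF relocation records, using the standard Elf32_Rel and Elf32_Rela layouts (r_offset, r_info, and for RELA a signed r_addend). The helper names and the symbol index are invented for illustration. The point is only that a REL entry has nowhere to put the addend except the section bytes, while a RELA entry carries it separately.

```python
import struct

R_ARM_JUMP24 = 29

def elf32_r_info(symbol_index: int, rel_type: int) -> int:
    """ELF32 r_info: symbol index in the upper 24 bits, type in the low 8."""
    return (symbol_index << 8) | (rel_type & 0xFF)

def pack_rel(offset: int, symbol_index: int, rel_type: int) -> bytes:
    """Elf32_Rel: the addend lives in the section data being relocated."""
    return struct.pack("<II", offset, elf32_r_info(symbol_index, rel_type))

def pack_rela(offset: int, symbol_index: int, rel_type: int, addend: int) -> bytes:
    """Elf32_Rela: the addend is stored explicitly in the record."""
    return struct.pack("<IIi", offset, elf32_r_info(symbol_index, rel_type), addend)

rel_entry = pack_rel(0, 1, R_ARM_JUMP24)          # symbol index 1 is arbitrary
rela_entry = pack_rela(0, 1, R_ARM_JUMP24, -8)
```

The REL record is 8 bytes and the RELA record 12; the extra 4 bytes are the explicit addend.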

> These bytes will always be zeroes, which makes them not relocation specific.
> Object file contents, on the other hand, are relocation specific. In
> particular the constant 0xfffffe is ARM_JUMP24 encoding for zero
> offset (from the start of the current instruction).
>
> Somehow I find this IR representation very natural - you've got data
> bytes for anything that's not relocated, and the target expression
> (possibly including addend).

My point is that the addend mangling between the IR and the object file
would be relocation specific.

What happens if I want to start using some new type of relocation? Will I
need to teach the MC layer about it?

> That's no worry at all, AsmPrinter::lowerConstant evaluates both
> constant expressions to MCExpr: w + 4.

But you still need to worry about how "w + 4" is represented in the object
file.

> Do you suggest we use this to limit "reloc" to accepting only
> GlobalValue as the second argument instead of an arbitrary Constant?

No, we would accept your example and conceptually translate it into my
example.

> The point of .rela is to allow addends that do not fit into the
> underlying value. Such addends can not be expressed as the third
> argument of reloc(), either. And IMHO the middleend should not worry
> about such details.

Something has to worry about them at some point. If a frontend/pass is
creating relocations, then it will need to know at least vaguely which
addend it wants. If that's the case, we can make it the single component
responsible for worrying about the whole addend, rather than the
responsibility being diffuse over a number of components.

Regarding width, I believe that no object format we support uses an addend
width wider than 64 bits, so we can just use a uint64_t.

Peter

> My point is that the addend mangling between the IR and the object file
> would be relocation specific.
>
> What happens if I want to start using some new type of relocation? Will I
> need to teach the MC layer about it?

Yes. MC needs to know if it is pc-relative or not, at least. What's
the benefit in bypassing MC completely?

>> That's no worry at all, AsmPrinter::lowerConstant evaluates both
>> constant expressions to MCExpr: w + 4.
>
> But you still need to worry about how "w + 4" is represented in the object
> file.

It's a relocation with target "w" and addend "4".

With my proposal, the frontend/middle end controls section data
indirectly: the actual final section data does not appear as an IR
constant, but we can still get whatever constant we want. In exchange,
this representation is better for optimizations (instead of a magic
constant 0xfffffe you have a transparent expression w+4). To optimize
the 0xfffffe representation, the backend would need to decode the
constant, which is new code that has to be written for every relocation
you'd like to use in reloc(). And if we allow such optimizations, we may
end up with section bytes that differ from the reloc() constant anyway.

Note that either way the frontend/middleend knows the size of the
relocated object (jump table entry).

> Something has to worry about them at some point. If a frontend/pass is
> creating relocations, then it will need to know at least vaguely which
> addend it wants. If that's the case, we can make it the single component
> responsible for worrying about the whole addend, rather than the
> responsibility being diffuse over a number of components.
>
> Regarding width, I believe that no object format we support uses an addend
> width wider than 64 bits, so we can just use a uint64_t.

You mean as a fourth argument to reloc()?

> Yes. MC needs to know if it is pc-relative or not, at least.

Why?

> What's the benefit in bypassing MC completely?

To reduce complexity. Rather than teaching MC about every relocation to be
used with reloc, you can just teach the component that produces the reloc.

>> >
>> >>
>> >> Instead, the addend can be
>> >> specified in the second argument with the regular IR expressions,
like
>> >> the following:
>> >>
>> >> @w = internal global [3 x i32]
>> >> [i32 reloc (29, void ()* @f, 3925868544),
>> >> i32 reloc (29, [3 x i32]* @w, 3925868544),
>> >> i32 reloc (29, i32* getelementptr (i32, i32* bitcast ([3 x i32]*
>> >> @w to i32*), i32 1), 3925868544)
>> >> ], align 4
>> >>
>> >>
>> >>
>> >> we also get relocations for elements 1 and 2 of @w optimized out for
>> >> free. If the "addend" (i.e. the third arg of reloc) was specified as
>> >> 0xeafffffe, the backend would have had to decode this value first.
>> >
>> >
>> > I think it may be ok to allow non-global constants as the second
>> > operand (the utility of this feature being the ability to freely RAUW
>> > a global without worrying about reloc constants).
>> >
>> > This doesn't necessarily need to act as an alternative means of
>> > specifying an addend, though. Instead, the backend could synthesise
>> > local symbols to act as relocation targets. For example, your example
>> > would conceptually translate to:
>> >
>> > @w = internal global [3 x i32]
>> > [i32 reloc (29, void ()* @f, 3925868544),
>> > i32 reloc (29, [3 x i32]* @w, 3925868544),
>> > i32 reloc (29, i32* @dummy, 3925868544)]
>> >
>> > @dummy = internal alias i32* getelementptr (i32, i32* bitcast ([3 x
>> > i32]* @w to i32*), i32 1)
>> >
>> > This way, you save yourself from needing to worry about manipulating
>> > addends in the backend, the linker will take care of it for you.
>>
>> That's no worry at all, AsmPrinter::lowerConstant evaluates both
>> constant expressions to MCExpr: w + 4.
>
>
> But you still need to worry about how "w + 4" is represented in the
> object file.

It's a relocation with target "w" and addend "4".

Someone needs to implement how to apply the addend 4 to the addend
0xea000000. That's what I meant by manipulating addends. You could do that
by relying on MC to do it (your proposal), or you can rely on the linker to
do it (my proposal).

With my proposal, the frontend/middleend controls section data
indirectly, meaning the actual final section data does not appear as
an IR constant, but we can still get whatever constant we want. On the
other hand, this representation is better for optimizations (instead
of a magic constant 0xfffffe you have a transparent expression w+4).

To optimize 0xfffffe representation, the backend would need to decode
the constant, which is the new code that has to be written for any
relocation you'd like to use in reloc(). And if we allow such
optimizations, we may end up with section bytes that are different
from the reloc() constant anyway.

Reloc constants are not meant to be "optimized", they are a means of
communicating from the compiler to the linker.

Note that either way the frontend/middleend knows the size of the
relocated object (jump table entry).

>
>>
>> Do you suggest we use this to limit "reloc" to accepting only
>> GlobalValue as the second argument instead of an arbitrary Constant?
>
>
> No, we would accept your example and conceptually translate it into my
> example.
>>
>>
>> >
>> >> On the other hand, it is possible for a constant expression in the IR
>> >> to be lowered to something that is not a valid relocation target, and
>> >> it is hard to detect this problem at the IR level.
>> >
>> >
>> > Right, this is of course a problem we already have for aliasees and
>> > constant
>> > initializers.
>> >
>> >>
>> >> Also, separating the addend from the section data allows the backend
>> >> to choose between .rel and .rela representations.
>> >
>> >
>> > Do you have an example of a rela relocation which uses both r_addend
>> > and the underlying value in the object file?
>>
>> The point of .rela is to allow addends that do not fit into the
>> underlying value. Such addends can not be expressed as the third
>> argument of reloc(), either. And IMHO the middleend should not worry
>> about such details.
>
>
> Something has to worry about them at some point. If a frontend/pass is
> creating relocations, then it will need to know at least vaguely which
> addend it wants. If that's the case, we can make it the single component
> responsible for worrying about the whole addend, rather than the
> responsibility being diffuse over a number of components.
>
> Regarding width, I believe that no object format we support uses an
> addend width wider than 64 bits, so we can just use a uint64_t.

You mean as a fourth argument to reloc()?

No, I mean the third argument.

Peter

>>
>> >>
>> >> I've tried implementing some of the alternatives mentioned in this
>> >> thread, and so far I like this syntax the most:
>> >>
>> >> i32 reloc (29, void ()* @f, 3925868544)
>> >> ; 29 = 0x1d = R_ARM_JUMP24
>> >> ; 3925868544 = 0xea000000
>> >>
>> >> Note the zeroes in the relocated data instead of 0xfffffe in the
>> >> original proposal. This is aligned with the way LLVM emits
>> >> relocations
>> >> in the backend, and avoids encoding the addend in a
>> >> relocation-specific way in the IR.
>> >
>> >
>> > I am confused by this statement. If the zeros aren't what appear in
>> > the
>> > object file, it seems rather relocation specific to me.
>>
>> These bytes will always be zeroes, which makes them not relocation
>> specific.
>> Object file contents, on the other hand, are relocation specific. In
>> particular the constant 0xfffffe is ARM_JUMP24 encoding for zero
>> offset (from the start of the current instruction).
>>
>> Somehow I find this IR representation very natural - you've got data
>> bytes for anything that's not relocated, and the target expression
>> (possibly including addend).
>
>
> My point is that the addend mangling between the IR and the object file
> would be relocation specific.
>
> What happens if I want to start using some new type of relocation? Will
> I
> need to teach the MC layer about it?

Yes. MC needs to know if it is pc-relative or not, at least.

Why?

What's
the benefit in bypassing MC completely?

To reduce complexity. Rather than teaching MC about every relocation to be
used with reloc, you can just teach the component that produces the reloc.

>
>> >
>> >>
>> >> Instead, the addend can be
>> >> specified in the second argument with the regular IR expressions,
>> >> like
>> >> the following:
>> >>
>> >> @w = internal global [3 x i32]
>> >> [i32 reloc (29, void ()* @f, 3925868544),
>> >> i32 reloc (29, [3 x i32]* @w, 3925868544),
>> >> i32 reloc (29, i32* getelementptr (i32, i32* bitcast ([3 x i32]*
>> >> @w to i32*), i32 1), 3925868544)
>> >> ], align 4
>> >>
>> >>
>> >>
>> >> we also get relocations for elements 1 and 2 of @w optimized out for
>> >>
>> >> free. If the "addend" (i.e. the third arg of reloc) was specified as
>> >>
>> >> 0xeafffffe, the backend would have had to decode this value first.
>> >
>> >
>> > I think it may be ok to allow non-global constants as the second
>> > operand
>> > (the utility of this feature being the ability to freely RAUW a
>> > global
>> > without worrying about reloc constants).
>> >
>> > This doesn't necessarily need to act as an alternative means of
>> > specifying
>> > an addend, though. Instead, the backend could synthesise local
>> > symbols
>> > to
>> > act as relocation targets. For example, your example would
>> > conceptually
>> > translate to:
>> >
>> > @w = internal global [3 x i32]
>> > [i32 reloc (29, void ()* @f, 3925868544),
>> > i32 reloc (29, [3 x i32]* @w, 3925868544),
>> > i32 reloc (29, i32* @dummy, 3925868544)]
>> >
>> > @dummy = internal alias i32* getelementptr (i32, i32* bitcast ([3 x
>> > i32]* @w
>> > to i32*), i32 1)
>> >
>> > This way, you save yourself from needing to worry about manipulating
>> > addends
>> > in the backend, the linker will take care of it for you.
>>
>> That's no worry at all, AsmPrinter::lowerConstant evaluates both
>> constant expressions to MCExpr: w + 4.
>
>
> But you still need to worry about how "w + 4" is represented in the
> object
> file.

It's a relocation with target "w" and addend "4".

Someone needs to implement how to apply the addend 4 to the addend
0xea000000. That's what I meant by manipulating addends. You could do that
by relying on MC to do it (your proposal), or you can rely on the linker to
do it (my proposal).

It's already implemented in the same code that emits regular branches
on ARM (and 0xea000000 is not an addend; it's section data with space
(zeroes) for the target- and relocation-specific encoding of the
addend).
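
A minimal sketch of that distinction, assuming the standard ARM branch encoding (signed 24-bit field counted in 4-byte units, relative to the instruction address plus 8). The helper name encode_arm_jump24 is made up for illustration and is not an LLVM or MC API, and no range checking is attempted:

```python
def encode_arm_jump24(section_data: int, target: int, place: int) -> int:
    """Fill the low 24 bits of an ARM B instruction word.

    section_data supplies the fixed top byte (0xea in the thread's example);
    the low 24 bits are the space left for the relocation-specific encoding.
    """
    # The branch offset is measured from the instruction address plus 8.
    offset = target - (place + 8)
    if offset % 4 != 0:
        raise ValueError("offset must be a multiple of 4")
    imm24 = (offset >> 2) & 0x00FFFFFF
    return (section_data & 0xFF000000) | imm24


SECTION_DATA = 3925868544  # 0xea000000, the third reloc() argument

# A branch to the start of the current instruction gives the 0xfffffe field
# mentioned earlier in the thread.
self_branch = encode_arm_jump24(SECTION_DATA, target=0x1000, place=0x1000)
print(hex(self_branch))
```

The IR-level constant carries only the 0xea byte and zeroes; the 0xfffffe field is produced when the target expression is encoded.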

With my proposal, the frontend/middleend controls section data
indirectly, meaning the actual final section data does not appear as
an IR constant, but we can still get whatever constant we want. On the
other hand, this representation is better for optimizations (instead
of a magic constant 0xfffffe you have a transparent expression w+4).

To optimize 0xfffffe representation, the backend would need to decode
the constant, which is the new code that has to be written for any
relocation you'd like to use in reloc(). And if we allow such
optimizations, we may end up with section bytes that are different
from the reloc() constant anyway.

Reloc constants are not meant to be "optimized", they are a means of
communicating from the compiler to the linker.

Why not? We can have both. We still control output bytes pretty well,
and we get a nice optimization in the case when jump offset can be
calculated by the compiler and the relocation can be omitted.
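
As a rough worked example of that optimization, the sketch below lowers the three-entry @w table from earlier in the thread. It assumes the table is emitted at section offset 0, that each entry is 4 bytes, and that only references to "w" itself can be resolved without a relocation; lower_table and its tuple layout are hypothetical, not existing MC code:

```python
def lower_table(entries, table_symbol, entry_size=4, section_data=0xEA000000):
    """Return (words, relocations) for a list of (symbol, addend) targets."""
    words, relocs = [], []
    for index, (symbol, addend) in enumerate(entries):
        place = index * entry_size
        if symbol == table_symbol:
            # Target lives in the table itself, so the offset is known now
            # and no relocation needs to be emitted.
            offset = addend - (place + 8)
        else:
            # Unknown target: keep the relocation and store only the addend
            # together with the -8 PC bias in the 24-bit field.
            offset = addend - 8
            relocs.append((place, "R_ARM_JUMP24", symbol))
        words.append(section_data | ((offset >> 2) & 0x00FFFFFF))
    return words, relocs


# @f, @w, and @w + 4 (the getelementptr with index 1 on i32 elements).
words, relocs = lower_table([("f", 0), ("w", 0), ("w", 4)], table_symbol="w")
print([hex(w) for w in words], relocs)
```

Only the entry that refers to @f keeps a relocation; the two entries that point back into @w become plain data.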

Note that either way the frontend/middleend knows the size of the
relocated object (jump table entry).

>
>>
>> Do you suggest we use this to limit "reloc" to accepting only
>> GlobalValue as the second argument instead of an arbitrary Constant?
>
>
> No, we would accept your example and conceptually translate it into my
> example.
>>
>>
>> >
>> >> On the other hand, it is possible for a constant expression in the
>> >> IR
>> >> to be lowered to something that is not a valid relocation target,
>> >> and
>> >> it is hard to detect this problem at the IR level.
>> >
>> >
>> > Right, this is of course a problem we already have for aliasees and
>> > constant
>> > initializers.
>> >
>> >>
>> >> Also, separating the addend from the section data allows the backend
>> >> to choose between .rel and .rela representations.
>> >
>> >
>> > Do you have an example of a rela relocation which uses both r_addend
>> > and
>> > the
>> > underlying value in the object file?
>>
>> The point of .rela is to allow addends that do not fit into the
>> underlying value. Such addends can not be expressed as the third
>> argument of reloc(), either. And IMHO the middleend should not worry
>> about such details.
>
>
> Something has to worry about them at some point. If a frontend/pass is
> creating relocations, then it will need to know at least vaguely which
> addend it wants. If that's the case, we can make it the single component
> responsible for worrying about the whole addend, rather than the
> responsibility being diffuse over a number of components.
>
> Regarding width, I believe that no object format we support uses an
> addend
> width wider than 64 bits, so we can just use a uint64_t.

You mean as a fourth argument to reloc()?

No, I mean the third argument.

The third argument is not an addend. It's section data. For
R_ARM_JUMP24, for example, the relocation size is 3 bytes, without
0xea. Rela exists exactly for the case where both can not fit in a
fixed size space.
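
To make the size argument concrete, here is a small sketch assuming a little-endian ARM target and a signed 24-bit field that counts 4-byte units. The names split_word and fits_in_place are illustrative only:

```python
import struct

SECTION_DATA = 0xEA000000  # third reloc() argument from the thread
FIELD_BITS = 24            # assumed width of the relocated field


def split_word(word: int):
    """Split a little-endian 32-bit word into (relocated bytes, fixed byte)."""
    raw = struct.pack("<I", word)
    return raw[:3], raw[3:]


def fits_in_place(addend: int) -> bool:
    """True if a byte addend can be stored in the 24-bit field."""
    if addend % 4 != 0:
        return False
    units = addend // 4
    return -(1 << (FIELD_BITS - 1)) <= units < (1 << (FIELD_BITS - 1))


relocated, fixed = split_word(SECTION_DATA)
print(relocated.hex(), fixed.hex())
print(fits_in_place(4), fits_in_place(1 << 26))
```

An addend such as 4 fits in the three relocated bytes, while a sufficiently large one does not, which is the case a separate r_addend field covers.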

>
>
>>
>> >>
>> >> >>
>> >> >> I've tried implementing some of the alternatives mentioned in this
>> >> >> thread, and so far I like this syntax the most:
>> >> >>
>> >> >> i32 reloc (29, void ()* @f, 3925868544)
>> >> >> ; 29 = 0x1d = R_ARM_JUMP24
>> >> >> ; 3925868544 = 0xea000000
>> >> >>
>> >> >> Note the zeroes in the relocated data instead of 0xfffffe in the
>> >> >> original proposal. This is aligned with the way LLVM emits
>> >> >> relocations
>> >> >> in the backend, and avoids encoding the addend in a
>> >> >> relocation-specific way in the IR.
>> >> >
>> >> >
>> >> > I am confused by this statement. If the zeros aren't what appear in
>> >> > the
>> >> > object file, it seems rather relocation specific to me.
>> >>
>> >> These bytes will always be zeroes, which makes them not relocation
>> >> specific.
>> >> Object file contents, on the other hand, are relocation specific. In
>> >> particular the constant 0xfffffe is ARM_JUMP24 encoding for zero
>> >> offset (from the start of the current instruction).
>> >>
>> >> Somehow I find this IR representation very natural - you've got data
>> >> bytes for anything that's not relocated, and the target expression
>> >> (possibly including addend).
>> >
>> >
>> > My point is that the addend mangling between the IR and the object
>> > file would be relocation specific.
>> >
>> > What happens if I want to start using some new type of relocation?
>> > Will I need to teach the MC layer about it?
>>
>> Yes. MC needs to know if it is pc-relative or not, at least.
>
>
> Why?
>
>>
>> What's the benefit in bypassing MC completely?
>
>
> To reduce complexity. Rather than teaching MC about every relocation to
> be used with reloc, you can just teach the component that produces the
> reloc.
>
>> >
>> >> >
>> >> >>
>> >> >> Instead, the addend can be
>> >> >> specified in the second argument with the regular IR expressions,
>> >> >> like
>> >> >> the following:
>> >> >>
>> >> >> @w = internal global [3 x i32]
>> >> >> [i32 reloc (29, void ()* @f, 3925868544),
>> >> >> i32 reloc (29, [3 x i32]* @w, 3925868544),
>> >> >> i32 reloc (29, i32* getelementptr (i32, i32* bitcast ([3 x i32]*
>> >> >> @w to i32*), i32 1), 3925868544)
>> >> >> ], align 4
>> >> >>
>> >> >>
>> >> >>
>> >> >> we also get relocations for elements 1 and 2 of @w optimized out
>> >> >> for free. If the "addend" (i.e. the third arg of reloc) was
>> >> >> specified as 0xeafffffe, the backend would have had to decode this
>> >> >> value first.
>> >> >
>> >> >
>> >> > I think it may be ok to allow non-global constants as the second
>> >> > operand
>> >> > (the utility of this feature being the ability to freely RAUW a
>> >> > global
>> >> > without worrying about reloc constants).
>> >> >
>> >> > This doesn't necessarily need to act as an alternative means of
>> >> > specifying
>> >> > an addend, though. Instead, the backend could synthesise local
>> >> > symbols
>> >> > to
>> >> > act as relocation targets. For example, your example would
>> >> > conceptually
>> >> > translate to:
>> >> >
>> >> > @w = internal global [3 x i32]
>> >> > [i32 reloc (29, void ()* @f, 3925868544),
>> >> > i32 reloc (29, [3 x i32]* @w, 3925868544),
>> >> > i32 reloc (29, i32* @dummy, 3925868544)]
>> >> >
>> >> > @dummy = internal alias i32* getelementptr (i32, i32* bitcast ([3 x
>> >> > i32]* @w
>> >> > to i32*), i32 1)
>> >> >
>> >> > This way, you save yourself from needing to worry about
>> >> > manipulating addends in the backend, the linker will take care of
>> >> > it for you.
>> >>
>> >> That's no worry at all, AsmPrinter::lowerConstant evaluates both
>> >> constant expressions to MCExpr: w + 4.
>> >
>> >
>> > But you still need to worry about how "w + 4" is represented in the
>> > object
>> > file.
>>
>> It's a relocation with target "w" and addend "4".
>
>
> Someone needs to implement how to apply the addend 4 to the addend
> 0xea000000. That's what I meant by manipulating addends. You could do
> that by relying on MC to do it (your proposal), or you can rely on the
> linker to do it (my proposal).

It's already implemented in the same code that emits regular branches
on ARM (and 0xea000000 is not an addend; it's section data with space
(zeroes) for the target- and relocation-specific encoding of the
addend).

Okay, in your proposal it's "section data".

>> With my proposal, the frontend/middleend controls section data
>> indirectly, meaning the actual final section data does not appear as
>> an IR constant, but we can still get whatever constant we want. On the
>> other hand, this representation is better for optimizations (instead
>> of a magic constant 0xfffffe you have a transparent expression w+4).
>>
>> To optimize 0xfffffe representation, the backend would need to decode
>> the constant, which is the new code that has to be written for any
>> relocation you'd like to use in reloc(). And if we allow such
>> optimizations, we may end up with section bytes that are different
>> from the reloc() constant anyway.
>
>
> Reloc constants are not meant to be "optimized", they are a means of
> communicating from the compiler to the linker.

Why not? We can have both. We still control output bytes pretty well,
and we get a nice optimization in the case when jump offset can be
calculated by the compiler and the relocation can be omitted.

That optimization is at the cost of complexity in MC, and it doesn't really
matter in the end because (1) these constants are rare and (2) the final
linked executable or DSO will have resolved the relocations anyway.

>> Note that either way the frontend/middleend knows the size of the
>> relocated object (jump table entry).
>>
>> >
>> >>
>> >> Do you suggest we use this to limit "reloc" to accepting only
>> >> GlobalValue as the second argument instead of an arbitrary Constant?
>> >
>> >
>> > No, we would accept your example and conceptually translate it into my
>> > example.
>> >>
>> >>
>> >> >
>> >> >> On the other hand, it is possible for a constant expression in the
>> >> >> IR
>> >> >> to be lowered to something that is not a valid relocation target,
>> >> >> and
>> >> >> it is hard to detect this problem at the IR level.
>> >> >
>> >> >
>> >> > Right, this is of course a problem we already have for aliasees and
>> >> > constant
>> >> > initializers.
>> >> >
>> >> >>
>> >> >> Also, separating the addend from the section data allows the
>> >> >> backend to choose between .rel and .rela representations.
>> >> >
>> >> >
>> >> > Do you have an example of a rela relocation which uses both
>> >> > r_addend and the underlying value in the object file?
>> >>
>> >> The point of .rela is to allow addends that do not fit into the
>> >> underlying value. Such addends can not be expressed as the third
>> >> argument of reloc(), either. And IMHO the middleend should not worry
>> >> about such details.
>> >
>> >
>> > Something has to worry about them at some point. If a frontend/pass is
>> > creating relocations, then it will need to know at least vaguely which
>> > addend it wants. If that's the case, we can make it the single
>> > component responsible for worrying about the whole addend, rather than
>> > the responsibility being diffuse over a number of components.
>> >
>> > Regarding width, I believe that no object format we support uses an
>> > addend
>> > width wider than 64 bits, so we can just use a uint64_t.
>>
>> You mean as a fourth argument to reloc()?
>
>
> No, I mean the third argument.

The third argument is not an addend. It's section data. For
R_ARM_JUMP24, for example, the relocation size is 3 bytes, without
0xea. Rela exists exactly for the case where both can not fit in a
fixed size space.

My proposal would be to store it inline or in r_addend. That makes it an
addend (I know that it isn't always added to directly, but that's what ELF
calls it).
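
One way to picture "inline or in r_addend" is the sketch below. It assumes the producer of the reloc supplies the complete value, and that the only per-format choice is where that value is stored (ELF on ARM uses REL sections, ELF on x86-64 uses RELA). The function place_addend is hypothetical and only mirrors the byte patterns in the llvm-objdump output at the top of the thread:

```python
def place_addend(addend: int, width: int, use_rela: bool):
    """Return (section bytes, r_addend) for a relocated field of `width` bytes."""
    if use_rela:
        # RELA: the section bytes stay zero and the addend goes in r_addend.
        return bytes(width), addend
    # REL: the addend is stored inline, little-endian, wrapped to the width.
    inline = (addend % (1 << (8 * width))).to_bytes(width, "little")
    return inline, None


# ARM, REL: the whole word 0xeafffffe is stored in the section.
arm_bytes, arm_r_addend = place_addend(0xEAFFFFFE, 4, use_rela=False)

# x86-64, RELA: the four bytes after the e9 opcode stay zero, r_addend is -4.
x86_bytes, x86_r_addend = place_addend(-4, 4, use_rela=True)

print(arm_bytes.hex(), arm_r_addend, x86_bytes.hex(), x86_r_addend)
```

In this model nothing between the producer and the object writer has to decode or re-encode the value; the trade-off is that the producer must know the relocation's encoding.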

Peter

To the right list this time.

Hi Peter,

Coming back to this now.

IFCC, the previous attempt to teach LLVM to emit jump tables, was removed
for complicating how functions are emitted, in particular requiring a
subtarget-specific instruction emitter available in subtarget-independent
code. However, the form of a jump table entry is generally well known to

In general I think we can handle the subtarget specific aspect in the
same way that we handle module level inline assembly. Anything at that
object file level needs to be generic enough for the STI we create there
anyhow and should work for your needs in creating a jump table.

How would you create your jump tables if you were able to generate code
in this fashion?

Under that approach we could in principle imagine a new type of
GlobalObject that would represent a jump table and that would hold
reference to its entries as "operands". The asm printer could then use some
target-specific callback to turn those operand references into jump table
entries with EmitInstruction and an STI created like the inline asm STI.

However, I'm not sure this would be the best way of doing things. It
would require more backend machinery than strictly necessary, and there
are other reasons against it (below).

Alternately, (though I'm not a huge fan) we could create them using
inline assembly as a workaround to get this aspect of your code moving
forward.

Agree that if we can't gain consensus here this would be an uncontroversial
first step.

I would very much like to avoid doing things like encoding relocation
entries into the IR - it seems to be the wrong level to handle that type of
target specific information. I worry that it will create issues with the
folk that are trying to move us to a level where we can delete the IR at
code generation time as well. I've added Jim since I think his team is
looking into that. We might want an MIR level ability to encode jump
tables/constants.

I'd argue that target specific information at this level is to a certain
extent reasonable because it is not much different to other
target/object-specific constructs such as intrinsics, linkage (to a certain
extent) and visibility. i.e. your frontend needs to choose an appropriate
linkage/visibility for a global so at some level it needs to be aware of
the object format.

Although I am sceptical that emitting global value initialisers via MI
would be beneficial, I think that if we do do that, there's already a
substantial representational surface area (e.g. the different types of
ConstantExpr that already exist) that would need an MI representation. As
far as reloc goes it seems like it would be similar to "just another" sort
of ConstantExpr that would need an MI representation; I don't really see
how it would be less or more difficult to handle than other kinds of
ConstantExpr. In fact, I suspect the lowering to MC would be trivial if we
implement something like the ".reloc" directive.

Of course if we went with something like a new type of GlobalObject we
would need an MI-level modelling for that as well. If we wanted to fix
PR17633 or do something else that would otherwise require reloc, we'd need
some separate way of modelling that. It just seems like more code and more
burden overall.

Peter

I just want to point out that this thread is somewhat fragmented and very long.

Also, the subject “arbitrary relocations in constant global initializers” seems to have nothing to do with jump tables at first glance.

I understand that it may actually have something to do with them, but I would never have guessed that was the problem you were setting out to solve based on the subject or the first many pages of email.

I would suggest starting a fresh thread with a more brief and focused RFC, ideally with the problem clearly identified in addition to the proposed solution.

My 2 cents.
-Chandler