[RFC] Section Declarations in LLVM IR

Hi all,

I’d like to propose that LLVM IR have a mechanism to describe sections in a more explicit way than we can today.

Currently, we provide an attribute called “section” on GlobalVariables and Functions. This attribute will choose which section the Value will end up in.
However, it does not describe the attributes of the section.
Without a way of describing the section, we try to infer the section’s attribute from the first Value from that section that MC comes across.

This means that if the first value is constant, the rest of the values
will end up in .rodata even if the intention was for them to be mutable.
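
To make the failure mode concrete, here is a small Python sketch of "first value wins" inference. It is only a model, not MC's actual code: the `Global` record, `infer_section_kinds`, and the kind strings are names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Global:
    name: str
    section: str
    is_constant: bool


def infer_section_kinds(globals_in_order):
    """Give each section the kind implied by the first global seen in it."""
    kinds = {}
    for g in globals_in_order:
        # Only the first global in a section decides; later ones are ignored.
        kinds.setdefault(g.section, "read-only" if g.is_constant else "writable")
    return kinds


module = [
    Global("table", ".foo", is_constant=True),
    Global("counter", ".foo", is_constant=False),  # meant to be mutable
]
kinds = infer_section_kinds(module)
kinds_reversed = infer_section_kinds(list(reversed(module)))
```

In this model the result depends purely on emission order: with `table` first, `.foo` is read-only and `counter` lands in it anyway; with the order reversed, the same section comes out writable.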

Equally problematic is our inability to verify the appropriate use of sections, consider the following:

  • One global value is defined to be thread_local and in section “.foo”
  • Another value is not defined to be thread_local and in section “.foo”

The IR verifier does not catch this nonsensical arrangement of IR.
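
A check along these lines could be sketched as follows. This is a Python model under assumed names (`GlobalVar`, `find_tls_conflicts`), not the verifier's real API; it only shows that the check needs nothing more than grouping globals by section.

```python
from collections import defaultdict
from typing import NamedTuple


class GlobalVar(NamedTuple):
    name: str
    section: str
    thread_local: bool


def find_tls_conflicts(globals_):
    """Return sections that mix thread_local and non-thread_local globals."""
    flags_by_section = defaultdict(set)
    for g in globals_:
        flags_by_section[g.section].add(g.thread_local)
    # A section is inconsistent if both True and False were recorded for it.
    return sorted(s for s, flags in flags_by_section.items() if len(flags) > 1)


conflicts = find_tls_conflicts([
    GlobalVar("a", ".foo", thread_local=True),
    GlobalVar("b", ".foo", thread_local=False),
    GlobalVar("c", ".bar", thread_local=False),
])
```

Here `conflicts` contains only `.foo`, the section from the example above.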

Further motivation stems from being able to represent the MS ABI’s RTTI data:

  • A single COMDAT section is created which holds both the RTTI data and the vftable for a type.
  • If there is no RTTI data, the section will start with just the vftable.
  • The entire section is marked with a linkage that tells the linker to pick the largest.
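
The "pick the largest" rule can be modelled in a few lines of Python. This is only a sketch of the selection rule, not a linker: the object file names and the byte sizes (which assume 4-byte pointers) are made up for illustration.

```python
def pick_largest(candidates):
    """Pick one COMDAT candidate from (object_file, size_in_bytes) pairs."""
    return max(candidates, key=lambda c: c[1])


# Hypothetical inputs for one COMDAT key:
#   a.obj was built with RTTI: locator pointer (4 bytes) + one vftable slot (4 bytes)
#   b.obj was built without RTTI: just the one vftable slot (4 bytes)
chosen = pick_largest([("b.obj", 4), ("a.obj", 8)])
```

The copy that carries the RTTI data is the larger one, so it wins regardless of the order in which the objects are seen.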

I think LLVM needs new IR to represent these semantics properly.

I propose that we do the following:

  • Sections are represented at module scope and have an identifier that starts with ‘$’
  • Sections have linkage, all Values inside of a section must agree with the section’s linkage.
  • Sections don’t have visibility, Values may disagree with one another about how visible they are.
  • Sections have attributes annotating what semantics they provide (read, write, execute, etc.)

A concrete example of a const variable inside of a read-only section:

$.my_section = appending read
@my_var = constant float 1.0, section $.my_section, align 4

The following is how I imagine MS RTTI would look if we had this IR construct:

$.vdata_for_type = pick_largest read
@my_rtti_for_type = pick_largest unnamed_addr constant %rtti_ptr_ty @rtti_complete_object_locator, section $.vdata_for_type, align 4
@vftable_for_type = pick_largest unnamed_addr constant [1 x i8*] [i8* bitcast (void (%struct.S*)* @"\01?fun@S@@UAEXXZ" to i8*)], section $.vdata_for_type, align 4

Attached is a patch to the LangRef.

Thanks for reading!

SectionIRLangRef.patch (2.19 KB)

> Hi all,
>
> I'd like to propose that LLVM IR have a mechanism to describe sections in a
> more explicit way than we can today.
>
> Currently, we provide an attribute called "section" on GlobalVariables and
> Functions. This attribute will choose which section the Value will end up
> in.
> However, it does not describe the attributes of the section.
> Without a way of describing the section, we try to infer the section's
> attribute from the first Value from that section that MC comes across.
>
> This means that if the first value is constant, the rest of the values
> will end up in .rodata even if the intention was for them to be mutable.
>
> Equally problematic is our inability to verify the appropriate use of
> sections, consider the following:
>
> - One global value is defined to be thread_local and in section ".foo"
> - Another value is *not* defined to be thread_local and in section ".foo"
>
> The IR verifier does not catch this nonsensical arrangement of IR.
>
> Further motivation stems from being able to represent the MS ABI's RTTI
> data:

> - A single COMDAT section is created which holds both the RTTI data and the
> vftable for a type.
> - If there is no RTTI data, the section will start with just the vftable.
> - The entire section is marked with a linkage that tells the linker to
> pick the largest.

> I think LLVM needs new IR to represent these semantics properly.
>
> I propose that we do the following:
>
> - Sections are represented at module scope and have an identifier that
> starts with '$'

OK until here.

> - Sections have linkage, all Values inside of a section must agree with the
> section's linkage.

This seems way too restrictive and I don't think it maps to what
object files actually do. We should be able to for example do

$foo = section "bar", ..... comdat "zed", select_largest, etc

@baz = private ... section $foo
@bah = linkonce_odr alias baz, offset 4

This is fully general, I think. In particular, the above example
represents a section name bar, in a comdat represented by the symbol
zed. The contents of the section is that of @baz and the only visible
symbol in the section is bah, which is at offset 4.

I think we can simply require that every global object in a section
that is a comdat must be isDiscardableIfUnused.
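
As a rough Python sketch of that rule: the `DISCARDABLE` set below is my approximation of which linkages `isDiscardableIfUnused` accepts, and the helper name and data layout are invented for illustration, so treat it as an assumption rather than the real predicate.

```python
# Assumed approximation of isDiscardableIfUnused: local and linkonce linkages.
DISCARDABLE = {"private", "internal", "linkonce", "linkonce_odr"}


def non_discardable_in_comdat(section_is_comdat, members):
    """Return names of globals that would violate the proposed rule.

    members is a list of (name, linkage) pairs for one section.
    """
    if not section_is_comdat:
        return []  # non-comdat sections may hold any mix of globals
    return [name for name, linkage in members if linkage not in DISCARDABLE]


ok = non_discardable_in_comdat(True, [("baz", "private"), ("bah", "linkonce_odr")])
bad = non_discardable_in_comdat(True, [("baz", "private"), ("qux", "external")])
```

The `@baz` / `@bah` example above passes the check; adding an `external` global (the hypothetical `qux`) to the same comdat section would be rejected.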

This should also be usable to represent sections that are not comdats,
and those can have any mix of global values as they do now. We just
gain the ability to define more information about the section. In
fact, I would probably suggest getting this in first, for making the
patches incremental.

> - Sections don't have visibility, Values may disagree with one another about
> how visible they are.
> - Sections have attributes annotating what semantics they provide (read,
> write, execute, etc.)

Both good.

> A concrete example of a const variable inside of a read-only section:
>
> $.my_section = appending read
> @my_var = constant float 1.0, section $.my_section, align 4
>
> The following is how I imagine MS RTTI would look if we had this IR
> construct:
>
> $.vdata_for_type = pick_largest read
> @my_rtti_for_type = pick_largest unnamed_addr constant %rtti_ptr_ty
> @rtti_complete_object_locator, section $.vdata_for_type, align 4
> @vftable_for_type = pick_largest unnamed_addr constant [1 x i8*] [i8*
> bitcast (void (%struct.S*)* @"\01?fun@S@@UAEXXZ" to i8*)], section
> $.vdata_for_type, align 4

Part of the problem is that it seems that the order is important. We
really should not require that at the llvm IR level. The above global
values could be output in any order.

> Attached is a patch to the LangRef.
>
> Thanks for reading!

Thanks for working on this! It is an excellent step towards getting
better comdat support in LLVM!

Cheers,
Rafael

> I think LLVM needs new IR to represent these semantics properly.

Cool! This proposal makes a lot of sense to me.

> - Sections have linkage, all Values inside of a section must agree with
> the section's linkage.

> This seems way too restrictive and I don't think it maps to what
> object files actually do. We should be able to for example do
>
> $foo = section "bar", ..... comdat "zed", select_largest, etc
>
> @baz = private ... section $foo
> @bah = linkonce_odr alias baz, offset 4
>
> This is fully general, I think. In particular, the above example
> represents a section name bar, in a comdat represented by the symbol
> zed. The contents of the section is that of @baz and the only visible
> symbol in the section is bah, which is at offset 4.

I like this proposal. Any reason to use an explicit offset rather than
allow GEPs into aliases?

> The following is how I imagine MS RTTI would look if we had this IR
> construct:
>
> $.vdata_for_type = pick_largest read
> @my_rtti_for_type = pick_largest unnamed_addr constant %rtti_ptr_ty
> @rtti_complete_object_locator, section $.vdata_for_type, align 4
> @vftable_for_type = pick_largest unnamed_addr constant [1 x i8*] [i8*
> bitcast (void (%struct.S*)* @"\01?fun@S@@UAEXXZ" to i8*)], section
> $.vdata_for_type, align 4

> Part of the problem is that it seems that the order is important. We
> really should not require that at the llvm IR level. The above global
> values could be output in any order.

Yeah, let's not rely on order of the IR.

> I like this proposal. Any reason to use an explicit offset rather than
> allow GEPs into aliases?

Part of PR10367. A constant GEP is more general than what an alias can
actually represent. I started to work on it, only to find that MC would
misassemble the generated code, which is what I have been fixing the
last few days.

Cheers,
Rafael

Just a heads up, I’m planning on sending patches for this soon.

Awesome. I think I fixed all the blocking MC issues and I am now
coding PR10367, so we should have all the parts in place for Windows
RTTI soon :)

My current idea for global alias is to split GlobalValue into
GlobalName (or GlobalAddress?) and GlobalValue. GlobalAlias would
inherit from GlobalName and not have an alignment or section of its
own. With that, it then becomes easy to say that a GlobalAlias is just
an offset into a GlobalValue with some different information (linkage,
visibility, etc).
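
A rough model of that split, written in Python only to show the shape of the hierarchy. The class names follow the paragraph above; the field names, the `aliasee` link, and `address_of` are assumptions of mine, not an existing LLVM API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlobalName:
    # Properties every named global has, aliases included.
    name: str
    linkage: str
    visibility: str = "default"


@dataclass
class GlobalValue(GlobalName):
    # Only real definitions carry a section and an alignment.
    section: str = ""
    alignment: int = 1


@dataclass
class GlobalAlias(GlobalName):
    # An alias is an offset into a GlobalValue plus its own linkage/visibility.
    aliasee: Optional[GlobalValue] = None
    offset: int = 0


def address_of(alias, layout):
    """Resolve an alias given a {global name: base address} layout."""
    return layout[alias.aliasee.name] + alias.offset


baz = GlobalValue("baz", "private", section="bar", alignment=4)
bah = GlobalAlias("bah", "linkonce_odr", aliasee=baz, offset=4)
addr = address_of(bah, {"baz": 0x1000})
```

With `baz` placed at the made-up address `0x1000`, `bah` resolves to `0x1004`, and the alias object has no `section` or `alignment` field of its own.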

Cheers,
Rafael

> Just a heads up, I'm planning on sending patches for this soon.

> Awesome. I think I fixed all the blocking MC issues and I am now
> coding PR10367, so we should have all the parts in place for Windows
> RTTI soon :)

Nice!

> My current idea for global alias is to split GlobalValue into
> GlobalName (or GlobalAddress?) and GlobalValue. GlobalAlias would
> inherit from GlobalName and not have an alignment or section of its
> own. With that, it then becomes easy to say that a GlobalAlias is just
> an offset into a GlobalValue with some different information (linkage,
> visibility, etc).

I'm worried that optimizations won't know about the offset portion of an
alias. Do you think it's worth having something like a GlobalOffset that
is equivalent to an alias with an offset? If we add an offset to
GlobalAlias, most optimizations won't know about it and will have bugs. On
the other hand, it's nice to have fewer IL constructs.

> My current idea for global alias is to split GlobalValue into
> GlobalName (or GlobalAddress?) and GlobalValue. GlobalAlias would
> inherit from GlobalName and not have an alignment or section of its
> own. With that, it then becomes easy to say that a GlobalAlias is just
> an offset into a GlobalValue with some different information (linkage,
> visibility, etc).
>
> I'm worried that optimizations won't know about the offset portion of an
> alias. Do you think it's worth having something like a GlobalOffset that is
> equivalent to an alias with an offset? If we add an offset to GlobalAlias,
> most optimizations won't know about it and will have bugs. On the other
> hand, it's nice to have fewer IL constructs.

That is an interesting idea. I guess we could make it a C++ level
thing only. At the IR level it would still be just an alias, but when
reading it in we could create a different C++ type to ease the
transition. I am not sure whether it is needed right now; I will
probably give it a try and see.

Cheers,
Rafael