RFC - a proposal to support additional symbol metadata in ELF object files in the ARM compiler

snidertm · April 30, 2019, 3:17pm

Hello All,

In ARM embedded applications, there are some compilers that support useful function and variable attributes that help the compiler communicate information about symbols to downstream object consumers (i.e. linkers).

One such attribute is the “location” attribute. This attribute can be applied to a global or local static data object or a function to indicate to the linker that the definition of the data object or function should be placed at a specific address in memory.

For example, in the following code:

#include <stdio.h>

extern int a;

int a __attribute__((location(0x1000))) = 4;

struct bstruct
{
    int f1;
    int f2;
};

struct bstruct b __attribute__((location(0x1004))) = {10, 12};

double c __attribute__((location(0x1010))) = 1.0;

char d[4] __attribute__((location(0x2000))) = {1, 2, 3, 4};

void foo(double x) __attribute__((location(0x4000)));

void foo(double x) { printf("%f\n", x); }

A location attribute has been applied to several data objects and the function “foo.” The compiler would then encode information into the compiled object file that tells the downstream linker about these memory placement constraints on the data objects and function.

Without extending the ELF object format, how would this work?

I propose to encode metadata information about a symbol in special absolute symbols, “__sym_attr_metadata.<int>”, that the linker can recognize when scanning the symbol table for an incoming object file. In an ELF symbol table entry:

typedef struct {
       Elf32_Word st_name;
       Elf32_Addr st_value;
       Elf32_Word st_size;
       unsigned char st_info;
       unsigned char st_other;
       Elf32_Half st_shndx;
} Elf32_Sym;

typedef struct {
       Elf64_Word st_name;
       unsigned char st_info;
       unsigned char st_other;
       Elf64_Half st_shndx;
       Elf64_Addr st_value;
       Elf64_Xword st_size;
} Elf64_Sym;

The st_size and st_value fields could be used to represent attribute information about a given symbol:

The st_size field can be split into an attribute ID and a symbol index for the symbol that the attribute applies to:
- attribute ID: bits 0..7
- symbol index: bits 8..31
The st_value field can contain the value associated with the attribute (e.g. the address argument of a location attribute).
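
The bit split described above can be sketched in C as follows. This is only an illustration of the proposed 32-bit st_size layout; the helper names (sym_attr_pack, sym_attr_id, sym_attr_index) and the enum constants are assumptions made for this example and are not part of the proposal or of any existing toolchain.

```c
#include <stdint.h>

/* Attribute kinds from the proposal's starting set: used (1), location (2).
   The constant names are hypothetical. */
enum { SYM_ATTR_USED = 1, SYM_ATTR_LOCATION = 2 };

/* Pack an attribute ID (bits 0..7) and a symbol index (bits 8..31)
   into a 32-bit st_size value. Index bits above 24 are discarded. */
static uint32_t sym_attr_pack(uint8_t attr_id, uint32_t sym_index)
{
    return ((sym_index & 0x00FFFFFFu) << 8) | attr_id;
}

/* Extract the attribute ID from bits 0..7. */
static uint8_t sym_attr_id(uint32_t st_size)
{
    return (uint8_t)(st_size & 0xFFu);
}

/* Extract the symbol index from bits 8..31. */
static uint32_t sym_attr_index(uint32_t st_size)
{
    return st_size >> 8;
}
```

With this layout, a location attribute on the symbol at index 5 would be stored as st_size = sym_attr_pack(SYM_ATTR_LOCATION, 5), with the address carried separately in st_value. The 24-bit index field limits the scheme to about 16 million symbols per object file.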

If the compiler is generating assembly code, a new directive similar to the .eabi_attribute can be used:

        .symbol_attribute <symbol name>, <attribute kind>, <attribute value>

Where:

symbol name - will unambiguously identify the symbol that the attribute/value pair applies to
attribute kind - is an unsigned integer between 1 and 255 that specifies the kind of attribute to be applied to the symbol
- I propose a starting base set of 2 attribute IDs: used (1), location (2)
- the compiler will emit the integer constant that identifies the attribute kind
attribute value - a value that is appropriate for the specified attribute kind

Thoughts? Comments? Concerns?

The anticipated next steps would be to add support for the location attribute and update the ARM/ELF LLVM back-end to support encoding the used attribute with the new mechanism.

~ Todd Snider

Code Generation Tools Group

Texas Instruments Incorporated

Peter_Smith · April 30, 2019, 3:51pm

Hello Todd,

Thanks for bringing this up, I've got a few comments for you based on
the implementation of a similar attribute in another Embedded Compiler
(Documentation – Arm Developer).
In that case it was __attribute__((at(address))) but the name is not
that important.

The communication with the linker in that case was via section name
and not symbol, from memory at(<address>) translated to a section name
of .ARM.__at_<address>. For us this had some advantages:
- We could use __attribute__((section(".ARM.__at_<address>"))) when
the compiler didn't support the attribute, it also needed no support
in the assembler. This wasn't ideal as it is nice to be able to use
expressions for the address, but it gets you most of the way there.
- In practice you'd likely need a separate section for each variable
to avoid problems at link time. For example if you had two variables
with non-contiguous locations you'd most likely not want these in the
same section so this mapped quite well to something similar to
__attribute__((section(name))).
- We did find some properties of __attribute__((section("name")))
inconvenient, especially that variables would come out as SHT_PROGBITS
when in many cases the user wanted SHT_NOBITS (memory mapped
peripheral), we had our custom attribute fix that.
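
As a rough illustration of the section-name convention described above, a linker could recover the address from the name alone. The sketch below assumes the name has the form .ARM.__at_<address> as recalled in this post, and that the address is written as a C-style integer literal; the function name parse_at_section is hypothetical and does not correspond to any real linker API.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 and stores the address if `name` looks like
   ".ARM.__at_<address>", otherwise returns 0.
   The accepted address syntax (hex with 0x, or decimal) is an assumption. */
static int parse_at_section(const char *name, uint64_t *addr)
{
    static const char prefix[] = ".ARM.__at_";
    const size_t n = sizeof prefix - 1;
    char *end;
    unsigned long long v;

    if (strncmp(name, prefix, n) != 0)
        return 0;
    /* Require a digit so that signs and whitespace are rejected. */
    if (!isdigit((unsigned char)name[n]))
        return 0;
    errno = 0;
    v = strtoull(name + n, &end, 0);   /* base 0 accepts 0x... or decimal */
    if (errno != 0 || *end != '\0')
        return 0;
    *addr = (uint64_t)v;
    return 1;
}
```

A name such as .ARM.__at_0x1000 would yield the address 0x1000, while an ordinary section name such as .data is rejected.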

If you used a section name rather than a symbol then you may not need
any backend changes and it would generalise over all ELF targets.
Linker support is another question entirely though.

Peter

snidertm · April 30, 2019, 6:11pm

Hi Peter,

Thanks for the response.

If we set aside the discussion of the relationship between sections and the application of the "location" or "at" attribute for a moment, do you have any objections to the proposed method of encoding metadata information about symbols (whether they are associated with actual data objects, functions, or sections) in the ELF object file?

There are other use cases that would benefit from this encoding method besides the location attribute. The used attribute is an example. There are likely to be others.

I would agree with you that applying a "location" or "at" attribute to a data object or function definition must require that the compiler generate the definition of the applicable data object or function into its own section, and that section may only contain the definition of that data object or function.

One of the advantages of attaching the location attribute information to the symbol is that if the symbol is associated with a common data object, then the location attribute ends up being applied to the definition that the common symbol resolves to.

~ Todd

pcc · April 30, 2019, 6:30pm

Hi Todd,

In your proposal, you’re storing the symbol index in the st_size field in the symbol table. One of the main problems with this sort of approach is that tools such as objcopy will reorder the symbols in the symbol table, which will invalidate any stored indexes. This is one of the reasons why I designed address-significance tables (which contain symbol indexes) to try to detect cases where a tool such as objcopy has manipulated the object file.

An alternative approach would be to represent the symbol attribute as a section containing:

- The attribute data.
- A relocation pointing to the symbol with the attribute.

Objcopy et al already know how to rewrite relocation sections, so this works out quite well. This is the approach that I’m taking in https://reviews.llvm.org/D60242 to associate partition names with symbols.
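
To make the section-plus-relocation idea concrete, here is a minimal sketch in C. The record layout (SymAttrRecord) is purely hypothetical and is not the format used in D60242; Rel32 mirrors the two fields of Elf32_Rel, and the r_info helpers follow the standard ELF32 encoding (symbol index in the upper 24 bits, type in the low 8) but are defined locally so the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* ELF32-style r_info helpers, defined locally for this sketch. */
#define R_SYM(info)       ((info) >> 8)
#define R_INFO(sym, type) (((uint32_t)(sym) << 8) | (uint8_t)(type))

/* Hypothetical fixed-size record stored in a metadata section. */
typedef struct {
    uint32_t kind;   /* e.g. 2 = location */
    uint32_t value;  /* e.g. the requested address */
} SymAttrRecord;

/* Same fields as Elf32_Rel. */
typedef struct {
    uint32_t r_offset;  /* byte offset of the record in the section */
    uint32_t r_info;    /* symbol index and relocation type */
} Rel32;

/* Return the symbol index that record `record_index` applies to,
   or -1 if no relocation targets that record. */
static long record_symbol(const Rel32 *rels, size_t nrels, size_t record_index)
{
    uint32_t off = (uint32_t)(record_index * sizeof(SymAttrRecord));
    for (size_t i = 0; i < nrels; i++)
        if (rels[i].r_offset == off)
            return (long)R_SYM(rels[i].r_info);
    return -1;
}

/* What an objcopy-like tool already does when it reorders the symbol
   table: rewrite each relocation's symbol index via an old-to-new map. */
static void remap_symbols(Rel32 *rels, size_t nrels, const uint32_t *old_to_new)
{
    for (size_t i = 0; i < nrels; i++) {
        uint32_t type = rels[i].r_info & 0xFFu;
        rels[i].r_info = R_INFO(old_to_new[R_SYM(rels[i].r_info)], type);
    }
}
```

The point of the design is that the record itself never stores a symbol index. Only the relocation does, and existing tools already keep relocations consistent when symbols move.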

Thanks,
Peter

jh7370 · May 1, 2019, 9:42am

Adding this sort of information to a section feels like a more natural way to me than trying to use special symbols. A question that is probably worth thinking about is whether this new section would encode just the new location data you require or whether it should be extensible for other metadata. The latter approach has the advantage of reducing the number of symbol references required should we continue to add more data like this, but it leads to the need to handle varying amounts of data per symbol entry which is not so nice.

Peter_Smith · May 1, 2019, 10:09am

> Hi Peter,
>
> Thanks for the response.
>
> If we set aside the discussion of the relationship between sections and the application of the "location" or "at" attribute for a moment, do you have any objections to the proposed method of encoding metadata information about symbols (whether they are associated with actual data objects, functions, or sections) in the ELF object file?
>
> There are other use cases that would benefit from this encoding method besides the location attribute. The used attribute is an example. There are likely to be others.

I think that some problems map naturally to symbol, section,
relocation or even a separate custom metadata section. As Peter
Collingbourne points out, there can be problems with tools like
objcopy when they encounter conventions they don't understand. I think
that the scheme as outlined may not be the best fit.

> I would agree with you that applying a "location" or "at" attribute to a data object or function definition must require that the compiler generate the definition of the applicable data object or function into its own section, and that section may only contain the definition of that data object or function.
>
> One of the advantages of attaching the location attribute information to the symbol is that if the symbol is associated with a common data object, then the location attribute ends up being applied to the definition that the common symbol resolves to.

In that particular case I'd ask whether it is worth supporting common
symbols with this attribute. For example, if I write something like int foo
__attribute__((location(0x1000))) in one file, and int foo
__attribute__((location(0x2000))) in another file, I'd probably want
that to be a multiple definition error rather than having the linker
silently resolve it to one of the two locations, or
I'd want the linker to tell me about clashing metadata. When int foo
__attribute__((section("name"))); is used the definition is not common
even if -fcommon is in use; perhaps that would be a better model for
__attribute__((location(<address>)))?
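
The "tell me about clashing metadata" option could look roughly like the check below. This is a sketch only: the LocationRequest structure and the function find_location_clash are hypothetical, and a real linker would perform this during symbol resolution rather than over a flat array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One location request as seen by the linker (hypothetical model). */
typedef struct {
    const char *symbol;  /* symbol name */
    const char *object;  /* object file that made the request */
    uint32_t address;    /* requested placement address */
} LocationRequest;

/* Return the index of the first request that names the same symbol as
   an earlier request but asks for a different address, or -1 if the
   requests are consistent. */
static long find_location_clash(const LocationRequest *reqs, size_t n)
{
    for (size_t i = 1; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (strcmp(reqs[i].symbol, reqs[j].symbol) == 0 &&
                reqs[i].address != reqs[j].address)
                return (long)i;
    return -1;
}
```

For the two-file example above, the second request for foo (at 0x2000) would be reported, and the object field gives the linker what it needs to name the offending file in a diagnostic. Treating every duplicate as a multiple definition error, regardless of address, would be the stricter alternative.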

A possible weakness of using absolute metadata symbols is with comdat
groups. Although I suspect the number of sensible use cases of
__attribute__((location(<address>))) and templates is low, it is
possible that something similar to the contrived example below could
occur:
template <class T>
struct Foo {
    T bar;
    T foo() __attribute__((location(0x1000))) { return bar; }
};

A linker would have to know which absolute metadata symbols to select
and which to ignore, a solvable problem, but it would be good to have
a way to have metadata drop out naturally through the group selection
process. Without common symbols it could be possible to define the
metadata symbols relative to the section. It could also be possible to
have a metadata section included in the group.

Peter

christof_douma-arm · May 1, 2019, 12:22pm

Hi Snider.

As you and Peter mentioned, there are indeed toolchains that allow location placement from within the C/C++ source code, using attributes or similar. I always wonder if such an extension is worth the effort. There are downsides, like the non-standard ways of communicating this information to the linker and having different places that control the location of things (linker and compiler sources). I would love to understand more of what is problematic in the more common approaches to placement that are already available.

The conceptual model I follow is that the C/C++ source describes the semantics of the program, and the linker sources (LD scripts or similar, depending on the toolchain in use) describe the placement of the program on the system/device. This gives rise to two common ways for placement that are used a lot that work without any non-standard extensions:

* Define a variable in C/C++ in a dedicated section that a linker can move individually ('section' attribute in the compiler, and regular section placement in the linker).
* Define a symbol in the linker at a certain place and use an extern declaration in C/C++. At this point you can either take the address of it (commonly used) or use it as a regular object (less common).

I am very interested to hear what the weaknesses in these methods are, to understand the need for a 'location' attribute.

Thanks,
Christof

Finkel_Hal_J · May 1, 2019, 2:03pm

I like the idea of these fixed-location variables being defined as
actual global variables. The optimizer can actually reason about them
that way. The common alternative that I've seen is that programmers
don't generate variables at all, but rather, do something like this:

#define DEV_DATA (*((volatile unsigned long *)(0x2000A000)))

and the optimizer needs to make very pessimistic assumptions about the
aliasing, etc. in this case. However, in the end, do we actually want
symbols that the linker resolves? Or do we want the immediate address?
Would the latter be more efficient?

Having to define sections for each of these variables and then maintain
the location mappings in a linker script can be annoying -- on the other
hand, if you target multiple systems for which the addresses might be
different then having the locations in a separate file might be best anyway.

What I don't understand about this proposal is how general it is. How
much of what is specified in a linker script can be specified this way?
Do we really just want a way to embed linker-script fragments into an
object file?

-Hal

Peter_Smith · May 1, 2019, 3:27pm

I suspect that clang/llvm will be agnostic with respect to what can be
done in the linker. In effect the linker is given the instruction to
place a section at a particular address and it is up to the linker to
work out how to do that or error if it can't.

The majority of the cases I've seen this used for are memory mapped
peripheral registers that typically live way outside the normal memory
map covered by the linker script. These cases are not too difficult to
handle as the linker can generate its own fragment of linker script
(or equivalent) from the Input Section. The more difficult case is
where the location is in the middle of an existing OutputSection and
this can involve changes to the linker's layout to flow non-location
sections around it, this is a fertile source of corner case bugs. How
much or little of this to support might be best left to the linker.
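
To give a feel for the "flow around" case, here is a deliberately simple sketch of placing floating sections in order while skipping over fixed-address sections inside the same region. Everything here is an assumption made for illustration (the Sec type, the function place_floating, and the in-order greedy policy); it ignores alignment and does not reflect how any particular linker lays out an OutputSection.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;  /* fixed: requested address; floating: filled in here */
    uint32_t size;  /* size in bytes */
} Sec;

/* Assign addresses to floating sections in order, starting at `base`
   and skipping over fixed sections (which must be sorted by address
   and must not overlap each other). Returns 0 on success, or -1 if a
   section would extend past `limit`. */
static int place_floating(Sec *floating, size_t nfloat,
                          const Sec *fixed, size_t nfixed,
                          uint32_t base, uint32_t limit)
{
    uint64_t cursor = base;  /* 64-bit to avoid wraparound in the sums */
    size_t f = 0;

    for (size_t i = 0; i < nfloat; i++) {
        /* Move past every fixed section the candidate range would hit. */
        while (f < nfixed && fixed[f].addr < cursor + floating[i].size) {
            uint64_t end = (uint64_t)fixed[f].addr + fixed[f].size;
            if (end > cursor)
                cursor = end;
            f++;
        }
        if (cursor + floating[i].size > limit)
            return -1;
        floating[i].addr = (uint32_t)cursor;
        cursor += floating[i].size;
    }
    return 0;
}
```

Even this toy version shows where corner cases come from: a floating section that does not fit before a fixed one leaves a gap behind it, and an in-order policy never goes back to fill that gap with a later, smaller section.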

Embedding linker script fragments is an interesting idea, and could
mean that any linker that supports GNU linker scripts could use the
feature. I think that there would be a number of challenges:
- Precedence of section selectors, i.e. how to stop an earlier linker
script pattern from matching the location, I guess a tempname style
section name might help, although wildcards might pick it up.
- The linker script fragment would need to not clash with an existing
OutputSection. I think that this could work for memory mapped
peripherals but it wouldn't for some of the other use cases that a
linker might want to support.
- Embedded ELF linkers may not support GNU Linker Script syntax.
Although custom targets could change the linker script format as they
see fit.

Will be interesting to hear what use cases Todd had in mind.

Peter

snidertm · May 3, 2019, 2:42pm

Our motivation for the "location" or "at" attribute is really as simple as allowing the user to avoid having to mess with a linker command file.

A user working on an application for an embedded processor may have special hardware features on their board (I/O ports, peripheral registers as Peter mentioned, etc.) that they know must reside at a specific memory address.

The location attribute makes it easy for the user to express to the linker a constraint on the placement of an object without having to manage the placement themselves in the linker command file.

~ Todd

snidertm · May 3, 2019, 3:23pm

Peter,

Thanks for the response. The idea of using a section for symbol metadata with an associated relocation table is intriguing.

It would be a straightforward representation of the data and it has the benefit that downstream object consumers (especially editors) already know how to renumber symbol indices in relocation table entries.

What gives me pause is that this would go against the normal paradigm of what a relocation table is used for. In most, if not all, other cases a relocation entry is associated with a slice of data in a section and is used to patch that slice of data at link time or dynamic load time. In this case, the relocation table becomes a simple container for symbol indices.

I am still weighing the approach of encoding metadata in special absolute symbols against your suggestion. I understand that updating LLVM’s objcopy and strip to renumber the symbol indices embedded in the absolute symbols would be part of the implementation. I need to understand what effort will be involved in making that update before I decide on one approach or the other.

James,

I’d prefer to make the encoding mechanism extensible for other metadata. I think there are other use cases that make a shared encoding method sensible.

~ Todd

jyknight · May 3, 2019, 8:26pm

The need to place an extern symbol at a particular fixed address can already be done just by emitting an absolute symbol. This works today, no object-file modifications needed. The source-level attribute isn’t really necessary either, although having it does make things marginally nicer. (Without it, you can just emit “.globl a; a = 0x1000” assembly, either in module-level inline-asm, or a separate assembly file).

But the new functionality provided by this proposed extension is the allowance for placing initialized data at a fixed address. That seems like a rather strange requirement to me. You don’t need (and, generally can’t even reasonably HAVE) pre-initialized data for something like a memory-mapped peripheral register. Perhaps you could say why this would be a widely useful feature for the embedded processors you’re concerned about?

The one case I’m aware of where fixed-placement initialized data is useful is when setting the “fuses” on an embedded CPU. The fuses are probably not actually in accessible memory at all. But, from the point-of-view of the flash programming system if you write flash data to a particular address, it will write to the config fuses instead. Expressing the fuse configuration as initialized data in the code, rather than separate metadata, can be convenient. But, for that, an ELF extension isn’t needed – you only have one of those, and it’s specified by the platform, which can simply provide the required linker config.

jyknight · May 3, 2019, 9:35pm

It should result in an object file with a global absolute symbol. E.g. (here I’m building on x86-64 linux):

$ echo '.globl sym; sym = 0x600' | as -o /tmp/x1.o
$ nm /tmp/x1.o
0000000000000600 A sym

Compiling a binary that uses this, for demonstration:
$ printf $'extern int sym; int main() { sym = 5; }' | clang -c -xc - -o /tmp/x2.o

$ clang -o /tmp/x /tmp/x1.o /tmp/x2.o

And, hey, let’s run it and see it crash…

$ gdb /tmp/x
…
(gdb) run

Starting program: /tmp/x

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400486 in main ()
(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x600

(gdb) x/i $pc
=> 0x400486 <main+6>: movl $0x5,0x600

Yep, crashed writing to 0x600, the invalid address we expected.

snidertm · May 6, 2019, 1:10pm

James,

What you are doing in your example is tricking the compiler into believing that it is dealing with a real int object that has actual space allocated to it in x2.o, but sym is not defined as a real data object in x1.o.

Thanks, but that doesn’t really address my use case. I still contend that associating a placement address with an actual data object (whether it be initialized or not) or function, where the symbol is defined in a section containing the definition of the data object or function, is a useful feature for customers.

~ Todd

jyknight · May 6, 2019, 5:38pm

I don’t think it’s a “trick” at all – it’s just a definition of the symbol at an absolute address. That’s what absolute symbols are for. (You can also use “.size sym, 4” “.type sym, object”, if you want to let the linker know that the symbol refers to 4 bytes of data. I’m not sure if that’s part of your concern about it not being real?)

For the use-case of accessing memory-mapped peripheral registers, this functionality seems already sufficient. Do you disagree?

However, if you have a requirement to place initialized data at a fixed address, then this pre-existing functionality does not address that requirement. But, I’m not sure what use-cases you’re thinking of where this is a requirement. Can you talk about what you have in mind?

Rui_Ueyama · May 7, 2019, 6:42am

I have the same question as James has. It seems to me that you can name any address using an absolute symbol, and that should suffice to handle memory-mapped peripherals and such. If you really need to define data (whether it’s in .data or .bss) or a function at a fixed memory address, that’s not something you can do with absolute symbols (but you can do with linker scripts), but is this what you really want?

snidertm · May 8, 2019, 6:53pm

James, Rui,

If we are only talking about addressable hardware registers, peripherals, etc., then the absolute address symbol is one way to facilitate access to a symbol associated with a specific address.

And yes, I would agree that data or code can be placed at a specific address using linker scripts, but it is not the most user-friendly of solutions.

Consider a simple example, say ex1.c contains:

int xyz __attribute__((section(".bss:xyz"))) = 10;

The compiler will emit the definition of xyz into the section “.bss:xyz”. In the linker script, something like this can be added to dictate the placement:

.special_bss: { ex1.o(.bss:xyz); } > 0x1000

This is straightforward, but now there is a coupling between the application’s source code and the linker script.

I think the location attribute gives a developer a cleaner, more concise means of expressing a placement constraint on a piece of code or data.

Even applying the location attribute to the above example shows this:

int xyz __attribute__((location(0x1000))) = 10;

In addition to the definition of xyz in its own section, the compiler will emit metadata that the linker understands as a specific placement instruction for xyz’s section. No edit of a linker script is needed.

The use cases where I’ve seen the location attribute be particularly helpful are instances where code in ROM or a boot loader needs to access code or data at a particular address. For example, a boot routine in ROM may have security requirements for code and data that is loaded into FLASH memory and may have hardcoded addresses that it accesses to perform the security check. The code that is to be loaded into that FLASH memory can use location attributes to reserve space for the data objects at the specific addresses that the boot routine needs to access. In this instance using location attributes helps to reduce the maintenance that a developer may otherwise have to do with linker scripts.
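The boot-routine scenario can be simulated on a host machine. What follows is only a minimal sketch under stated assumptions: the flash array, the header layout, the offset, the magic value, and both function names are hypothetical, and a byte array stands in for real flash. On a real target, the location attribute discussed in this thread would be one way to request that the header be placed at the address the ROM expects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values: none of these come from a real device. */
#define FLASH_SIZE      0x4000u
#define BOOT_HDR_OFFSET 0x1000u      /* address hardcoded in the ROM routine */
#define BOOT_HDR_MAGIC  0xB007C0DEu

/* Hypothetical header that the ROM routine validates. */
struct boot_header {
    uint32_t magic;
    uint32_t image_len;
};

/* Host-side stand-in for the contents of flash memory. */
static uint8_t flash[FLASH_SIZE];

/* What the loaded image must provide: a header at the fixed offset. */
void image_install_header(uint32_t image_len)
{
    assert(image_len <= FLASH_SIZE);
    struct boot_header hdr = { BOOT_HDR_MAGIC, image_len };
    memcpy(&flash[BOOT_HDR_OFFSET], &hdr, sizeof hdr);
}

/* What the ROM routine does: read from its hardcoded offset and validate.
   Returns 1 if the header looks valid, 0 otherwise. */
int rom_check_image(void)
{
    struct boot_header hdr;
    memcpy(&hdr, &flash[BOOT_HDR_OFFSET], sizeof hdr);
    return hdr.magic == BOOT_HDR_MAGIC && hdr.image_len <= FLASH_SIZE;
}
```

The check only succeeds when the header sits exactly at BOOT_HDR_OFFSET, which is the coupling between the ROM code and the loaded image that either a linker script entry or a placement attribute has to satisfy.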

I mentioned earlier in this thread that a motivation for the location attribute is to allow the user to avoid messing with a linker command file or script. While the location attribute does not provide new functionality, I am arguing that the location attribute provides enough value in terms of usability improvements vs. existing methods to justify adding support for it.

~ Todd

Rui_Ueyama · May 9, 2019, 12:02pm

In the above scenario I believe you will end up having to write a linker script anyway. If you write a program that resides in flash memory at a specific location, I think the layout of the entire program, not only of some specific data, needs to be specified. There might be a scenario in which you don’t care how other parts of your program are located in memory, but what if the address of the flash memory collides with the default layout? What is the expected behavior?

There are many tricky scenarios for which I do not know the expected behavior:

- If a user attempts to locate a function at 0x1000, data at 0x2000, another function at 0x3000, and more data at 0x4000, should we create four segments, one for each function and data object?
- What if a specified location collides with other data’s specified location?
- What if a specified location collides with the default layout?
- What if a user attempts to put data and a function on the same page?

I think if you can just say “place this piece of data at address 0xXXXX” and everything automagically works, it’s great, but putting a piece of data at a specific location has a global effect on how other pieces of data and functions are laid out, so it looks like that kind of directive underspecifies what we actually want.
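Two of the collision questions above can be made concrete with a small check. This is a minimal sketch and not taken from any existing linker: the loc_request record, the two helper names, and the 4 KiB page size are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size, for illustration only. */
#define PAGE_SIZE 0x1000u

/* Hypothetical record for one location request. */
struct loc_request {
    const char *name;
    uint64_t addr;   /* requested start address */
    uint64_t size;   /* size in bytes, must be nonzero */
};

/* True if the half-open ranges [addr, addr + size) intersect. */
bool requests_overlap(const struct loc_request *a, const struct loc_request *b)
{
    assert(a->size > 0 && b->size > 0);
    return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}

/* True if any byte of a lies on the same page as any byte of b. */
bool requests_share_page(const struct loc_request *a, const struct loc_request *b)
{
    assert(a->size > 0 && b->size > 0);
    uint64_t a_first = a->addr / PAGE_SIZE;
    uint64_t a_last  = (a->addr + a->size - 1) / PAGE_SIZE;
    uint64_t b_first = b->addr / PAGE_SIZE;
    uint64_t b_last  = (b->addr + b->size - 1) / PAGE_SIZE;
    return a_first <= b_last && b_first <= a_last;
}
```

Detecting these cases is the easy part. What a linker should do once it has detected one (report an error, split segments, or move something else) is the policy question that remains open.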

Peter_Smith · May 9, 2019, 1:24pm

I think you are right to point out that the implementation in the linker, particularly when the desired address overlaps with sections covered by the existing layout, is going to have a lot of edge cases to deal with. I think that this attribute mostly, possibly only, makes sense in an embedded (non-paged) environment with physical addresses.

At the Clang/LLVM level the underspecification may be a strength. Many embedded toolchains that use a compiler based on clang have a proprietary linker that may not use ld.bfd-style linker scripts, or may have extensions to them (for example http://software-dl.ti.com/ccs/esd/documents/sdto_cgt_Linker-Command-File-Primer.html), so it will be difficult to come up with a linker-portable description of how the attribute will interact with the linker script (or equivalent).

I think that this attribute could work if it were left up to the linker/toolchain to define its own rules for how it interprets the metadata. For example, some possible design decisions:
- No support at all: error or ignore on encountering the metadata.
- Non-overlapping only, for example memory-mapped peripherals, and error if there is a clash with any other section.
- Explicit placement only for overlapping: you have to name the output section in the linker script, and the linker will attempt to lay out the input sections within that output section to honour the location request.
- Automatic placement into the most appropriate output section, and error if the linker can't honour the location request.

snidertm · May 9, 2019, 1:26pm

> In the above scenario I believe you will end up having to write a linker script anyway.

That’s right, there will be a base linker script to describe the memory layout of the target architecture and dictate generally where code and data sections go.

What the location attribute helps you avoid is having to edit the linker script to account for a specific data object that requires specific placement, possibly in the middle of other data sections.

The linker would be responsible for entertaining placement requests from input metadata as well as instructions from the linker command file. Typically, a linker will try to honor more constrained placement requests before more general ones (e.g. a specific address placement request would be handled before a request to place a section in a region of memory). The linker should be able to arbitrate the scenarios that you call out below, generating an error diagnostic when placement instructions conflict or cannot be honored.

> If a user attempt to locate a function at 0x1000, data at 0x2000, another function at 0x3000, and another data at 0x4000. Should we create four segments for each function and data?

Yes, the linker would take into account all four specific placement requests, try to honor them, then place other code and data according to linker script guidelines around them.
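The ordering described here (fixed addresses first, everything else around them, an error on conflict) can be sketched as a toy two-pass placer. This is an illustration under stated assumptions, not how any particular linker works: the section record, the place_sections name, the single flat memory region, and the simple gap-filling strategy, which never goes back to an earlier gap, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical section record. addr is an input for fixed sections
   and an output for floating ones. */
struct section {
    const char *name;
    uint64_t addr;
    uint64_t size;
};

static int cmp_addr(const void *pa, const void *pb)
{
    const struct section *a = pa, *b = pb;
    return (a->addr > b->addr) - (a->addr < b->addr);
}

/* Places sections in the region [base, limit).
   Returns 0 on success, -1 if a request conflicts or does not fit. */
int place_sections(uint64_t base, uint64_t limit,
                   struct section *fixed, size_t nfixed,
                   struct section *floating, size_t nfloating)
{
    assert(base <= limit);

    /* Pass 1: validate the most constrained requests first. */
    qsort(fixed, nfixed, sizeof *fixed, cmp_addr);
    for (size_t i = 0; i < nfixed; i++) {
        uint64_t end = fixed[i].addr + fixed[i].size;
        if (fixed[i].addr < base || end > limit)
            return -1;                      /* outside the region */
        if (i + 1 < nfixed && end > fixed[i + 1].addr)
            return -1;                      /* two fixed requests collide */
    }

    /* Pass 2: fill the gaps between fixed sections, moving forward only. */
    uint64_t cursor = base;
    size_t next = 0;                        /* next fixed section after cursor */
    for (size_t j = 0; j < nfloating; j++) {
        for (;;) {
            uint64_t gap_end = next < nfixed ? fixed[next].addr : limit;
            if (gap_end - cursor >= floating[j].size) {
                floating[j].addr = cursor;
                cursor += floating[j].size;
                break;
            }
            if (next == nfixed)
                return -1;                  /* region is full */
            cursor = fixed[next].addr + fixed[next].size;
            next++;
        }
    }
    return 0;
}
```

With the four fixed requests from the question (0x1000, 0x2000, 0x3000, 0x4000) in a region of 0x0 to 0x5000, a floating section of 0xF00 bytes lands at 0x0, and a following one of 0x200 bytes no longer fits below 0x1000, so it is placed after the first fixed section. Two fixed requests that overlap make the function return -1, which corresponds to the error diagnostic mentioned above.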

I am not advocating that the use of a location attribute would eliminate the need for a linker command file, but it does help the user to express specific placement for specific pieces of code or data when they need to. If a user doesn’t require a piece of data or code to exist at a specific address, then they ought not attach a location attribute to it.

~ Todd
