grouping global variables by alignment: safe to do at LLVM level, or only at Clang level?

Dear all,

Howdy! ;-) This is Abe of the Samsung Austin R&D Center's compilers team.

As an early part of a project that should help performance on some CPUs, I would like to cause global variables [at least from C and C++ programs] to be grouped together according to their alignment needs, which should help to slightly reduce RAM requirements in some cases. IMO this should be done at "-O3" and higher levels of optimization [maybe "-O2" and higher?] and at "-Oz" [and maybe also at "-Os"]. This change is likely to break some poorly-written programs which rely on undefined behavior by assuming that consecutively-{declared/defined} global variables are allocated consecutively in RAM, so I propose to allow those programs to go on compiling to code that works as the author expected when there is no strong optimization imperative for either RAM savings or performance.
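To make the RAM argument concrete, here is a minimal sketch (not LLVM code) that compares the space taken by a set of globals laid out in declaration order against the same set grouped by alignment. The names `GlobalInfo`, `alignUp`, `layoutSize` and `groupByAlignment` are hypothetical, and the model assumes every alignment is a power of two and every size is a multiple of its alignment, as is the case for C and C++ object types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one global: size and required alignment, in bytes.
struct GlobalInfo {
    std::size_t size;
    std::size_t align;  // assumed to be a nonzero power of two
};

// Round `offset` up to the next multiple of `align`.
inline std::size_t alignUp(std::size_t offset, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0 && "alignment must be a power of two");
    return (offset + align - 1) & ~(align - 1);
}

// Total bytes used when the globals are placed back to back in the given order,
// inserting padding wherever the next global needs stricter alignment.
inline std::size_t layoutSize(const std::vector<GlobalInfo>& globals) {
    std::size_t offset = 0;
    for (const GlobalInfo& g : globals) {
        offset = alignUp(offset, g.align) + g.size;
    }
    return offset;
}

// The same globals grouped by alignment, strictest alignment first.
// stable_sort keeps the original relative order inside each group.
inline std::vector<GlobalInfo> groupByAlignment(std::vector<GlobalInfo> globals) {
    std::stable_sort(globals.begin(), globals.end(),
                     [](const GlobalInfo& a, const GlobalInfo& b) { return a.align > b.align; });
    return globals;
}
```

For the sequence char, 8-byte integer, char, 4-byte integer (sizes and alignments 1, 8, 1, 4), `layoutSize` gives 24 bytes in declaration order and 14 bytes after `groupByAlignment`, because the padding before the 8-byte and 4-byte objects disappears.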

My question this time is this: is re-ordering globals at the LLVM level both possible [without massive rewriting of LLVM] and safe? I don't know whether or not it's safe because perhaps some languages -- unlike C and C++ -- _explicitly_ allow programmers to make the assumption that consecutively-written globals at the source-code level map to consecutively-allocated globals at run-time. If so, and if any such language is using LLVM as a back-end, then changing that behavior at the LLVM level with _any_ optimization flag may break programs that _are_ well-formed according to the rules of their respective source language[s].

Regards,

Abe

Hi Abe,

I don’t see why you couldn’t make a frontend query in the pass manager to selectively enable such a globals reordering/aligning pass. If this is a discouraged practice noted in the docs somewhere and LLVM’s goal is to be truly frontend/platform agnostic, please do correct me. AFAIK, the pass manager already selects some optimizations based on optimization level (-O2, -O3, etc.), so extending this shouldn’t be a challenge.

Regards,

Kevin

Intel WOS

Dear Kevin,

Thanks for your quick and thoughtful reply.

Please tell me whether or not I have understood you correctly: I think what you are saying is that it might be possible to do what I'm talking about at the LLVM level, have it turned off by default so it's safe, have Clang turn it on when the optimization level says to do so [since AFAIK all Clang-supported languages allow compilers to re-order globals], and let other front ends turn it on if their respective language specifications also allow compilers to re-order globals.

If that's a correct understanding, then I think that's a great idea. Thanks either way, i.e. even if I _have_ misunderstood. ;-)

Regards,

Abe

Yes.

It would be nice if there were a status matrix to track which optimizations aren’t enabled by default for other frontends that could possibly benefit from them.

I do not believe that we guarantee any order of globals. Unless someone else points out a case where we do, we should not introduce extra complexity to provide a choice here.

You may want to take a look at the unnamed_addr flag on globals. It may be relevant here. (*may*)

Philip

First, I think it's completely safe as far as LLVM is concerned (and
have no sympathy for any C or C++ developers who might claim to rely
on the order either). In fact it looks like GCC and Clang already do
it differently so the risk is pretty tiny too.

For LTO it's probably fairly simple to do (sort by alignment during MC
emission would be my first stab). In the general case, it's really an
issue for the linker to resolve though (as the only program that
actually gets to see all globals). Probably with the help of
"-fdata-sections" on ELF targets. But it doesn't look like either of
the GNU linkers supports it.

Cheers.

Tim.

My question this time is this: is re-ordering globals at the LLVM level both possible

Isn’t this something a linker could/should do?

[without massive rewriting of LLVM] and safe? I don't know whether or not it's safe because perhaps some languages -- unlike C and C++ -- _explicitly_ allow programmers to make the assumption that consecutively-written globals at the source-code level map to consecutively-allocated globals at run-time. If so, and if any such language is using LLVM as a back-end, then changing that behavior at the LLVM level with _any_ optimization flag may break programs that _are_ well-formed according to the rules of their respective source language[s].

Such a programming language should not use separate LLVM global variables if it needs to provide such a guarantee; it should probably use something more like a single global array when lowering to LLVM IR.
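As a source-level analogue of that suggestion, here is a hedged sketch in C++: instead of two separate globals whose relative placement is unspecified, the data lives in one aggregate whose member order the language does define. The names `OrderedGlobals`, `ordered_globals`, `gapBetweenArrays` and `checkOrder` are hypothetical. As far as I know Clang lowers such an object to a single LLVM global of struct type, so nothing that reorders whole globals can separate the two arrays. Zero padding between the arrays is an assumption that holds on common ABIs, not a language guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names, for illustration only.
// Non-static data members with the same access control are allocated in
// declaration order, so `second` always sits at a higher address than `first`.
struct OrderedGlobals {
    int first[4];
    int second[4];
};

// One object in the program, rather than two independent globals.
OrderedGlobals ordered_globals = {};

// Bytes of padding between the end of `first` and the start of `second`.
// Expected to be 0 on common ABIs; this is an assumption, so it is computed
// rather than taken for granted.
inline std::size_t gapBetweenArrays() {
    return offsetof(OrderedGlobals, second)
         - (offsetof(OrderedGlobals, first) + sizeof(OrderedGlobals::first));
}

// Checks the ordering that the language does guarantee.
inline void checkOrder() {
    assert(offsetof(OrderedGlobals, second) > offsetof(OrderedGlobals, first));
}
```

A frontend for a language that promises consecutive placement could emit the equivalent aggregate (or one array) directly in IR, which keeps the promise without asking LLVM for any ordering guarantee between distinct globals.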

My question this time is this: is re-ordering globals at the LLVM level both possible [...]

Isn’t this something a linker could/should do?

AFAIK & TTBOMK, a compiler should do it [if allowed by the language] and a linker should not mess it up. I don't know that a linker is allowed to make assumptions about code not traversing labels in its data accesses. After all, a compiler may have assigned labels to e.g. the start of a "struct" instance and to a specific part of that same struct. That doesn't mean the linker is free to move the middle of the struct somewhere else far from the start of the struct.

IOW, a linker is required to handle the code no matter what the source language was and what its specification says about implementations being allowed to re-order things and whether or not the language allows the programmer to _validly_ assume that when two arrays of identical element type are created back-to-back, the second array immediately follows the first, thus effectively creating a single array with its element count being the sum of the two element counts of the as-written arrays.

In ELF land, it's not, which is why you pass -fdata-sections. That
option makes Clang put every global into its own section, and the
linker *is* allowed to assume accesses can't cross section boundaries.

In MachO land, the linker is allowed to make that assumption if the
object file has been marked with a .subsections_via_symbols directive
(which it is by default on all Darwin targets).

It really should be the linker's job in the general case, both for
optimal code and to avoid duplicating effort.

Tim.

My question this time is this: is re-ordering globals at the LLVM level both possible [...]

Isn’t this something a linker could/should do?

AFAIK & TTBOMK, a compiler should do it [if allowed by the language]

The compiler only sees one translation unit at a time, while the linker sees all of them. It is not clear to me why you think the compiler should handle it?

Such a programming language should not use separate LLVM global variables if it needs to provide such a guarantee;
it should probably use something more like a single global array when lowering to LLVM IR.

Agreed, but I can't [en]force that. That would need to be decided by the LLVM steering committee or something like that, then documented publicly.

I’m saying that it is *already* the current state of affairs: AFAIK we don’t provide any guarantee about what you describe.

We have a problem where some globals are lowered to an immediate address in the DAG. It would be nice if we had some fixed order for the globals before then. Right now the total allocated size at the end depends on the use order: block by block, and in inverse order within each block.

-Matt

I’m saying that it is *already* the current state of affairs: AFAIK we don’t provide any guarantee about what you describe.

We already do some quite invasive things to real-world globals that
would violate such assumptions. We increase their alignment beyond ABI
requirements if we think it's helpful; the GlobalMerge pass sorts
globals before shoving them all under one symbol (in increasing order
of size, for some reason that escapes me right now).

Tim.

My apologies, but I don't think I really understood that statement. Could anybody please expand/explain?

Regards,

Abe

The first allocated global has address 0. If a 16-byte object is allocated, the next object’s address will be the constant 16.
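A minimal sketch of that scheme, assuming addresses are handed out as constants starting at 0 in whatever order the objects are first encountered. This is not the actual backend code; `LocalObject`, `Assignment` and `assignAddresses` are hypothetical names, and alignments are assumed to be powers of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one object to be placed: size and alignment in bytes.
struct LocalObject {
    std::size_t size;
    std::size_t align;  // assumed to be a nonzero power of two
};

// Result of the assignment: one constant address per object, plus the total used.
struct Assignment {
    std::vector<std::size_t> addresses;
    std::size_t total;
};

// Hands out addresses starting at 0, in the order the objects are passed in
// (standing in for "order of first use").
inline Assignment assignAddresses(const std::vector<LocalObject>& inUseOrder) {
    Assignment result{{}, 0};
    for (const LocalObject& obj : inUseOrder) {
        assert(obj.align != 0 && (obj.align & (obj.align - 1)) == 0);
        // Round the running offset up to this object's alignment.
        result.total = (result.total + obj.align - 1) & ~(obj.align - 1);
        result.addresses.push_back(result.total);
        result.total += obj.size;
    }
    return result;
}
```

With a 16-byte, 16-aligned object used first and a 4-byte object second, the addresses are 0 and 16 and the total is 20 bytes. With the use order swapped, the addresses are again 0 and 16 but the total grows to 32 bytes, which is why a fixed order chosen before addresses become immediates would help.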

-Matt

Arriving late to this thread, but to add to what others have said: reordering or sorting data items based on alignment is straightforward to do in a linker plugin. An example can be found in this test (part of the gold linker test suite):

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gold/testsuite/plugin_layout_with_alignment.sh;h=c5f07aec287e87de955dd924efefabd56e8d6213;hb=HEAD

The source for the plugin is at

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gold/testsuite/plugin_section_alignment.cc;h=6f64bdc6e8687bc5cfb61109503844d9608ee757;hb=HEAD

Than