A question on register allocation

My use case is to store a global value permanently in a register (say
xmm15) using LLVM.

The following piece of code works as expected in GCC, but LLVM does not
support it and fails with "fatal error: error in backend:
Invalid register name global variable".

Can any expert here provide me a solution to this in Clang/LLVM?

Code:
#include <stdio.h>
volatile register int A asm ("xmm15");
int main() {
    A = 100;
    return A;
}

I don’t think clang supports global register variables in general although ‘sp’ appears to work from clang-3.7 onwards. However it doesn’t appear to understand ‘r13’ on arm which is equivalent.

You can always do something like

asm ("movq %0, %%xmm15"::"x"(100):"xmm15");

although it’s not strictly equivalent.

int main() {
    asm ("movq %0, %%xmm15"::"x"(100):"xmm15");
    return 0;
}

will generate

movl $0, -4(%rbp)
movss .LCPI0_0(%rip), %xmm0
#APP
movq %xmm0, %xmm15
#NO_APP
xorl %eax, %eax
popq %rbp
retq

So you could use that as a replacement every time ‘A’ is written, though I appreciate it is not the same thing. The above also lacks the context of what you’re actually trying to do in your application, so it’s possible that better (and faster) solutions exist!
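
One way to package that replacement, as a minimal sketch only: route every read and write of ‘A’ through two small helpers. The names `a_set`, `a_get` and `a_roundtrip` are made up for illustration, the x86-64 branch assumes GCC/Clang extended asm, and nothing here stops other code (or any library you call) from clobbering xmm15 between the two calls.

```c
#include <assert.h>

#if defined(__x86_64__)
/* Hypothetical helpers: keep the value in xmm15 by hand. */
static inline void a_set(int v) {
    /* movd copies a 32-bit general-purpose register into the low lane of xmm15. */
    __asm__ volatile("movd %0, %%xmm15" : : "r"(v) : "xmm15");
}
static inline int a_get(void) {
    int v;
    /* Read the low 32 bits of xmm15 back into a general-purpose register. */
    __asm__ volatile("movd %%xmm15, %0" : "=r"(v));
    return v;
}
#else
/* Other targets: fall back to an ordinary global so the sketch still builds. */
static int a_storage;
static inline void a_set(int v) { a_storage = v; }
static inline int a_get(void) { return a_storage; }
#endif

/* Write a value and read it straight back, with no other code in between. */
static int a_roundtrip(int v) {
    a_set(v);
    int got = a_get();
    assert(got == v);
    return got;
}
```

The round trip only holds because nothing runs between the write and the read; across arbitrary calls the value can be lost, which is exactly the limitation described above.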

I think you have got my question wrong (Or perhaps my writing was bad).

I want that register (say xmm15) to hold value 100 throughout the
execution of program.
And LLVM should lock down xmm15 for me and should always allocate some
other register for other functions.

The solution provided applies only to context of main() and is not
generic if my source code grows bigger.

Clang doesn’t support global register variables like GCC afaik so it needs to be emulated if really needed by forcing the register to the intended value.

In general it can be clobbered by whatever else but such is the way without backend support.

But someone more knowledgeable about Clang’s backend may provide a better insight.

OK. I am also willing to change the source code (clang and/or LLVM) to
get the desired behavior.
So if some expert can point me to right areas in source code, it would
be great!

Your problem here is that you also need to “remove” your chosen register from the list of registers the compiler may use for other purposes. As far as I understand, there’s no infrastructure to “remove this register ALWAYS”, so you’ll most likely need to make a fair number of changes. I’m not familiar with this particular bit of code, but it’s obviously an ugly or difficult enough problem that the “make Linux compile with Clang” effort solved it by removing the use of global registers, except for the stack pointer (which is OK, since the compiler will not use that register for anything other than the stack pointer).

You can make the register reserved (requires changing the LLVM backend) and then you can use the named global register support added in http://reviews.llvm.org/D3261.

Jacques

Not really. That change was meant *only* for the stack pointers and
will not be accepted upstream for anything else. Don't waste your
time. :)

As an alternative, you can use *local* named registers when using
inline ASM. Something like:

int foo() {
  register int A asm("xmm15");
  asm("mov %0, %0" : "=A" : "A"); // whatever the syntax, I can't remember
  return A;
}

The asm block is a trick to tell Clang to honour the register
allocation. AFAICR, without it, Clang will accept it but will not
honour it. You may be lucky.
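
For reference, here is a sketch of the local-named-register trick with the operand syntax filled in. It is an illustration under assumptions, not the exact code meant above: it pins a general-purpose register (r12) rather than xmm15, the function names are invented, and the x86-64 branch assumes GCC/Clang extended asm. The register choice is only guaranteed at the point where the variable is used as an asm operand.

```c
#include <assert.h>

/* Adds one to x while forcing the operand into a chosen register. */
static long add_one_pinned(long x) {
#if defined(__x86_64__)
    /* Local register variable: honoured where v appears as an asm operand. */
    register long v __asm__("r12") = x;
    /* "+r" makes v both input and output, so the asm sees it in r12. */
    __asm__("addq $1, %0" : "+r"(v) : : "cc");
    return v;
#else
    return x + 1; /* other targets: plain C fallback */
#endif
}

/* Usage: the result matches plain C arithmetic. */
static void add_one_pinned_demo(void) {
    assert(add_one_pinned(41) == 42);
}
```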

The reason for all these restrictions is that named registers place the
allocator in FUBAR mode, and we'd rather have it working well before
we start breaking things. It's very likely that this will never be
accepted, though.

cheers,
--renato

I don’t think it’d actually be terribly difficult: every target already has a list of target-specific reserved registers that is set up by TargetRegisterInfo::getReservedRegs(const MachineFunction &). Merging that builtin set with a set of “extra” reserved registers attached to a MachineFunction seems like it’d be easy enough.

That seems sufficient to remove the register from normal register allocation, but of course if the target hard-codes the use of a register, or some asm clobbers the register, or whatever, this wouldn’t prevent it from being clobbered. That caveat applies to GCC’s feature as well (see docs).
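
The idea can be modelled in a few lines. This is a toy sketch, not LLVM code: the 16-register file, the bitmask representation and every function name are assumptions made up for illustration. It only shows that a user-reserved set can be merged into a target-reserved set and then skipped by a naive allocator.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 16 };

/* Toy stand-in for a target's built-in reserved set (say, a stack pointer in slot 4). */
static uint32_t target_reserved(void) { return 1u << 4; }

/* Effective reserved set: the target's set merged with user-requested registers. */
static uint32_t reserved_regs(uint32_t user_reserved) {
    return target_reserved() | user_reserved;
}

/* Pick the lowest-numbered register that is neither reserved nor in use; -1 if none. */
static int pick_reg(uint32_t reserved, uint32_t in_use) {
    for (int r = 0; r < NUM_REGS; ++r) {
        uint32_t bit = 1u << r;
        if (!(reserved & bit) && !(in_use & bit))
            return r;
    }
    return -1;
}

/* Usage: once the user reserves register 0, the allocator skips it. */
static void reserve_demo(void) {
    uint32_t reserved = reserved_regs(1u << 0);
    assert(pick_reg(reserved, 0) == 1);
}
```

As in the caveat above, this only keeps the allocator away from the register; hard-coded uses and asm clobbers are outside what such a set can control.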

Of note for the use-case given here: there are no callee-saved xmm registers in the sysV ABI. So, unless you re-compile everything linked into your binary with the reserved register (including libc, dynamic linker, compiler runtime support library, etc.) and validate that no asm code in any of those clobbers it either, you can’t be guaranteed that the value will always be preserved – even if clang has this feature.

James

> I don't think it'd actually be terribly difficult: every target already has
> a list of target-specific reserved registers that is set up by
> TargetRegisterInfo::getReservedRegs(const MachineFunction &). Merging that
> builtin set with a set of "extra" reserved registers attached to a
> MachineFunction seems like it'd be easy enough.

It's not about the implementation, but the side effects. There have
been enough discussions on this topic already, as a quick search on the
mailing list will show. One may be able to persuade people to accept
it, but the argument above is the first that everyone uses when the
threads start (I have used it, too, myself), and it never works.

So, unless someone comes with a rock solid investigation on all the
side effects on the multiple register allocators and the side effects
on code generation and low-level optimizations for all affected
targets, people will look away. :)

> Of note for the use-case given here: there are no callee-saved xmm registers
> in the sysV ABI. So, unless you re-compile *everything* linked into your
> binary with the reserved register (including libc, dynamic linker, compiler
> runtime support library, etc.) and validate that no asm code in any of those
> clobbers it either, you can't be guaranteed that the value will always be
> preserved -- even if clang has this feature.

Precisely. :)

To be clear: I don’t think this feature actually makes sense for any normal userland code. The list of caveats is so large that probably the only place it might make sense at all is in a kernel. And since apparently the linux kernel decided not to use it, it’s probably not worth implementing, anyways.

That said, I still don’t actually think there’s any real complication here, even after briefly scanning a couple recent threads. A user-reserved register isn’t fundamentally any different than any other target-reserved register. Said list is already computed separately per-MachineFunction. This has basically nothing to do with the multiple varied register allocators, and requires no additional functionality in them. Obviously, reserving registers will make the generated code worse due to fewer registers being available, and if you reserve too many registers, you’ll have a Bad Time. Don’t do that.

I really think it’d be very nearly trivial to implement something which is at the same level of usefulness as GCC’s version of this. It’s just that that level of usefulness is pretty low.

> To be clear: I don't think this feature actually makes sense for any normal
> userland code. The list of caveats is so large that probably the only place
> it might make sense at all is in a kernel.

Unfortunately, unlike some car manufacturers, we can't selectively
enable / disable features based on what's using the compiler. :)

> And since apparently the linux
> kernel decided not to use it, it's probably not worth implementing, anyways.

It isn't. No one else uses that, either.

cheers,
--renato

Just for the record, I'm seeing another use case beyond operating systems: Virtual machines, runtimes for languages that do not put much value on an easy C interface.

That's still a pretty short list of use cases.
It could be pretty big for each use case. I guess operating system coders tend to have a specialist for each architecture anyway, and can do better than a compiler could; I'm not so sure for VM and runtime coders.

OT3H it may be that to make good use of a register, you need to know so much about the architecture that you always can do better than the compiler anyway. Is that a reasonable assumption?

I’m not sure it is a reasonable assumption. There’s often some benefit in ensuring that a particular register holds a known value in some custom language code. I believe that either the Haskell or Erlang calling conventions have this property - some registers are designated as having special language semantics and must be both preserved across calls and not clobbered by a call. The FFI code is responsible for ensuring that these register values are saved across calls to foreign code and set on calls from foreign code.

Even in C, it’s useful to be able to reserve these sometimes. For example, being able to reserve a register for a stack canary (and have the run-time linker set up interworking stubs that reset it from a global when calling out to code that doesn’t support this). The SafeStack work would also have benefitted from being able to reserve a register to hold the location of the other stack.

I’m not convinced that the C builtins are particularly useful, but there are a number of cases where the LLVM IR intrinsics would be.

Just because you want to reserve one register for some frequently-accessed global state doesn’t mean that you don’t want to use all of the rest of the LLVM infrastructure.

David

> Just for the record, I'm seeing another use case beyond operating systems:
> Virtual machines, runtimes for languages that do not put much value on an
> easy C interface.

IIRC, VMs use the "platform register", which is a specially reserved
register allowed by the ABIs of different targets, in certain cases.

> It could be pretty big for each use case. I guess operating system coders
> tend to have a specialist for each architecture anyway, and can do better
> than a compiler could; I'm not so sure for VM and runtime coders.

That excuse died a long time ago. When people tell you they can do
better than the compiler, it's normally for one very special corner
case, not for the whole program. Global named registers affect the
whole program and can upset the compiler in unpredictable ways. My
reaction when people say "it should be simple" about register
allocation is the same as when people say "quantum mechanics is
easy"...

There are already three well supported ways to be better than the
compiler at a local level: assembly files, inline assembly, intrinsics
in C code. None of them require special registers to be globally
allocated. All of them are well supported in LLVM and GNU toolchains.

As David said, the only *real* use for global named registers is for
registers that have a specific meaning throughout the program: the
stack pointer being the only one that has a useful meaning. You don't
want to be changing the PC that easily, but if you really do, doing so
in inline asm is perfectly valid.

Specific platforms have flags to reserve specific registers. For
instance, on ARM you can use -ffixed-r9, then use r9 in
local named registers and be sure that you're the only one updating
it. However, if the platform you're running on requires that register
itself (I think Darwin does), then your code isn't portable, as expected.
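
A rough sketch of how that might look, under loud assumptions: the ARM branch assumes a 32-bit ARM target where every object in the program is built with -ffixed-r9, and it moves values in and out of r9 with plain inline asm rather than a local named register. The names `ctx_set`, `ctx_get` and `ctx_demo` are invented for illustration. On any other target the code falls back to an ordinary variable so it still builds.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__arm__)
/* Assumes -ffixed-r9 everywhere, so the compiler never allocates r9 itself. */
static inline void ctx_set(uintptr_t v) {
    __asm__ volatile("mov r9, %0" : : "r"(v));
}
static inline uintptr_t ctx_get(void) {
    uintptr_t v;
    __asm__ volatile("mov %0, r9" : "=r"(v));
    return v;
}
#else
/* Other targets: keep the context in memory instead of a reserved register. */
static uintptr_t ctx_slot;
static inline void ctx_set(uintptr_t v) { ctx_slot = v; }
static inline uintptr_t ctx_get(void) { return ctx_slot; }
#endif

/* Usage: store a context pointer once, read it back wherever it is needed. */
static void ctx_demo(void) {
    static int context;
    ctx_set((uintptr_t)&context);
    assert(ctx_get() == (uintptr_t)&context);
}
```

Code built without the flag, or a platform that claims r9 for its own use, can still overwrite the register, which is the portability problem mentioned above.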

Assuming you can write more optimal code than the compiler for
specific functionality is ok. Assuming you can change how objects are
built and make it work regardless of the platform or environment is not.

cheers,
--renato

> As David said, the only *real* use for global named registers is for
> registers that have a specific meaning throughout the program: the
> stack pointer being the only one that has a useful meaning. You don't
> want to be changing the PC that easily, but if you really do, doing so
> in inline asm is perfectly valid.

It doesn't change your point but just for completeness, some targets do have other useful global registers. For example, MIPS uses the otherwise unused 'global pointer' as a cheap global variable. IIRC, it's a pointer to the thread context.

Ah, yes, the thread pointer. I think some systems use R9 on ARM for that, too.

This is simplifying things too much. The classic use case for a global
named register is to hold an extremely hot thread local variable. This
is primarily from a time when (a) thread local variables didn't exist and
(b) the overhead typically associated with them on some platforms was too
high. Consider older ARM or MIPS systems, where any access to the thread
register would involve a trap. Things like Lisp engines often have a
global context pointer which is easily hot enough to justify burning a
register on it.

Joerg

Hi Joerg,

Daniel also pointed out this usage. :)

My point was: it’s a very limited usage.

AFAIK, the only part of the kernel that had it was arch/arm, and that was removed because of Clang back then.

Embedded or RTOS usage is different. But the tools are usually different, too.

Cheers,
Renato