Named register variables GNU-style

Folks,

I just had a discussion about __builtin_stack_pointer in the GCC list,
and there were a number of arguments against it, and it got me
thinking I didn't have strong arguments against GNU's named register
extension. Does anyone remember the arguments for not implementing
that extension?

My view is that making it an intrinsic (say @llvm.register(name))
would have the exact same semantics as __builtin_<register_name> has,
in that it'll be carried down all the way to SelectionDAG and be just
a CopyFromReg from that reg's name.

The fact that they remain as intrinsics will guarantee that they will
last until SelectionDAG and won't be commoned up or heavily modified. I'm
not sure how to make Clang do that, but it shouldn't be too hard to
short-circuit the asm handler if we're dealing with a
declaration/instantiation and "register" is a specifier of the type.

The argument supporting the builtins is that, in the case of the stack
pointer, it's not target-specific, thus avoiding ifdefs. The
counter-argument is that most uses of the named register extension are
already target-specific (together with everything around them), so that
extra value is very limited. Also, since kernel and library code
(heavy users of named registers) will have to support old compilers,
this will *have* to be ifdef'd anyway.
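To make the ifdef point concrete, here is a minimal sketch of what portable code would still need even if a builtin existed. It assumes a GCC-compatible compiler; `__builtin_stack_pointer` is only the builtin proposed in this thread and is assumed not to be available, `stack_pointer_or_approx` is a hypothetical helper, and the fallback returns the frame address, which is close to, but not the same as, the stack pointer. `__has_builtin` is provided by Clang and by GCC 10 and later; the guard covers compilers that predate it.

```c
#include <stdint.h>

/* Older compilers have no __has_builtin at all, so treat every builtin
   as absent there. */
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

/* Hypothetical helper: use the proposed builtin when the compiler has it,
   otherwise fall back to the current frame address (an approximation of
   the stack pointer, not the stack pointer itself). */
static inline uintptr_t stack_pointer_or_approx(void)
{
#if __has_builtin(__builtin_stack_pointer)
    return (uintptr_t)__builtin_stack_pointer();
#else
    return (uintptr_t)__builtin_frame_address(0);
#endif
}
```

Even with the builtin in place, the `#else` branch has to stay for as long as older GCC and Clang releases are supported.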

The arguments against builtins are that named register is more
generic, is already in use for more than the stack pointer and is
reasonably straightforward to both understand and implement.

Neither builtins nor named registers give you the guarantees that
you would like from such a high-level construct, and users are already
aware that this is the case, so we don't have to worry that much about
it.

Also, reading back some comments, it seems that the biggest concern
was that inline asm wasn't really a first-class citizen in the LLVM
back-end, but I think that has changed with MC, right?

My questions are:

1. Were the initial concerns dealt with by the introduction of MC?

2. Is there any remaining argument against named registers that is
stronger than the ones supporting them?

3. Is my draft for implementing named registers acceptable?

cheers,
--renato

In clang or llvm? I think we can implement it in clang by lowering
them with a similar trick that we do for local ones.

For local register variables, clang just keeps a note that it has to
add a constraint when it creates an inline assembly.
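For reference, a local register variable in GNU C looks like the sketch below. It assumes GCC or Clang and only takes the asm path on x86-64, where `r10` has no single-letter operand constraint, so a register variable is the usual way to pin an operand there; `add_one_in_r10` is a hypothetical name. GCC only guarantees the register binding when the variable is used as an extended-asm operand, which is the case that maps onto a constraint.

```c
/*
 * Sketch (assumes GCC or Clang): a local register variable used to pin an
 * extended-asm operand to a specific register. The asm path is only taken
 * on x86-64, where r10 has no single-letter operand constraint.
 */
static long add_one_in_r10(long x)
{
#if defined(__x86_64__) && defined(__LP64__) && defined(__GNUC__)
    /* Binding v to r10 is only guaranteed where v is an asm operand,
       which is exactly how it is used here. */
    register long v __asm__("r10") = x;
    __asm__("addq $1, %0" : "+r"(v) : : "cc");
    return v;
#else
    /* Portable fallback for every other target. */
    return x + 1;
#endif
}
```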

For global ones, it should also codegen every use outside inline asm as
an llvm intrinsic call (llvm.read_register/llvm.write_register, for example).

This is not exactly the semantics gcc uses since the register would
still be allocatable, but should cover 99% of the uses, including
reading the stack pointer in the kernel.

I don't think we should implement this directly in LLVM, since it
introduces the really odd notion that reading a value is
observable. For example, is it legal to move the read of rsp out of a
loop? By using an intrinsic at the llvm level we trivially represent
and preserve all the reads and writes from the source program.

Cheers,
Rafael

For global ones, it should also codegen every use outside inline asm as
an llvm intrinsic call (llvm.read_register/llvm.write_register, for example).

That's my idea, yes. I'm not sure how Clang would transform the named
registers into the intrinsic, but something along the lines of:

i8* @SP = "SP";

define void @step() nounwind {
entry:
  %0 = call i32 @llvm.read_register(i8* @SP)
  %1 = add i32 %0, i32 4
  call void @llvm.write_register(i8* @SP, %1)
}

declare void @llvm.write_register(i8*, i32) nounwind readnone
declare i32 @llvm.read_register(i8*) nounwind readnone

This is not exactly the semantics gcc uses since the register would
still be allocatable, but should cover 99% of the uses, including
reading the stack pointer in the kernel.

It seems that the semantics is to avoid PCS registers, or they will be
clobbered...

Nevertheless, we can reserve the register on demand, as we already do
with R9, for instance.

For example, is it legal to move the read of rsp out of a
loop?

No. It should be a volatile read/write.

By using an intrinsic at the llvm level we trivially represent
and preserve all the reads and writes from the source program.

Exactly!

cheers,
--renato

That's my idea, yes. I'm not sure how Clang would transform the named
registers into the intrinsic, but something along the lines of:

i8* @SP = "SP";

define void @step() nounwind {
entry:
  %0 = call i32 @llvm.read_register(i8* @SP)
  %1 = add i32 %0, i32 4
  call void @llvm.write_register(i8* @SP, %1)
}

declare void @llvm.write_register(i8*, i32) nounwind readnone
declare i32 @llvm.read_register(i8*) nounwind readnone

I would not produce any llvm global for it. So some insanity like

register long a asm("rsp");
long f(long x) {
  long ret = a;
  a = x;
  return ret;
}

would compile to

define i64 @f(i64 %x) {
  %ret = call i64 @llvm.read_register("rsp");
  call void @llvm.write_register("rsp", i64 %x)
  ret %ret
}
declare void @llvm.write_register(i8*, i64)
declare i64 @llvm.read_register(i8*)

This is not exactly the semantics gcc uses since the register would
still be allocatable, but should cover 99% of the uses, including
reading the stack pointer in the kernel.

Global Reg Vars (Using the GNU Compiler Collection (GCC))

It seems that the semantics is to avoid PCS registers, or they will be
clobbered...

Yes, it is really odd. It says "Global register variables reserve
registers throughout the program.", which is obviously not the case
since not all compile units might see it.

For example, is it legal to move the read of rsp out of a
loop?

No. It should be a volatile read/write.

Agreed. With the intrinsic the semantics are easy to represent.

Cheers,
Rafael

From: "Renato Golin" <renato.golin@linaro.org>
To: "LLVM Dev" <llvmdev@cs.uiuc.edu>, "Clang Dev" <cfe-dev@cs.uiuc.edu>
Sent: Thursday, March 27, 2014 8:55:47 AM
Subject: [LLVMdev] Named register variables GNU-style

Folks,

I just had a discussion about __builtin_stack_pointer in the GCC
list,
and there were a number of arguments against it,

Can you summarize?

and it got me
thinking I didn't have strong arguments against GNU's named register
extension. Does anyone remember the arguments for not implementing
that extension?

My view is that making it an intrinsic (say @llvm.register(name))
would have the exact same semantics as __builtin_<register_name> has,
in that it'll be carried down all the way to SelectionDAG and be just
a CopyFromReg from that reg's name.

I think this also would be a nice feature to have, and fairly straightforward to implement.

That having been said, are there not cases where only the backend knows in what register the stack pointer is held? A sophisticated backend might even spill the stack pointer during some portions of the function to create a range in which it was allocatable, and I certainly would not want to preclude such an implementation.

-Hal

From: "Rafael Espíndola" <rafael.espindola@gmail.com>
To: "Renato Golin" <renato.golin@linaro.org>
Cc: "Clang Dev" <cfe-dev@cs.uiuc.edu>, "LLVM Dev" <llvmdev@cs.uiuc.edu>
Sent: Thursday, March 27, 2014 11:30:46 AM
Subject: Re: [cfe-dev] [LLVMdev] Named register variables GNU-style

> That's my idea, yes. I'm not sure how Clang would transform the
> named
> registers into the intrinsic, but something along the lines of:
>
> i8* @SP = "SP";
>
> define void @step() nounwind {
> entry:
> %0 = call i32 @llvm.read_register(i8* @SP)
> %1 = add i32 %0, i32 4
> call void @llvm.write_register(i8* @SP, %1)
> }
>
> declare void @llvm.write_register(i8*, i32) nounwind readnone
> declare i32 @llvm.read_register(i8*) nounwind readnone

I would not produce any llvm global for it. So some insanity like

register long a asm("rsp");
long f(long x) {
  long ret = a;
  a = x;
  return ret;
}

would compile to

define i64 @f(i64 %x) {
  %ret = call i64 @llvm.read_register("rsp");
  call void @llvm.write_register("rsp", i64 %x)
  ret %ret
}
declare void @llvm.write_register(i8*, i64)
declare i64 @llvm.read_register(i8*)

+1

-Hal

That's my idea, yes. I'm not sure how Clang would transform the named
registers into the intrinsic, but something along the lines of:

i8* @SP = "SP";

define void @step() nounwind {
entry:
  %0 = call i32 @llvm.read_register(i8* @SP)
  %1 = add i32 %0, i32 4
  call void @llvm.write_register(i8* @SP, %1)
}

declare void @llvm.write_register(i8*, i32) nounwind readnone
declare i32 @llvm.read_register(i8*) nounwind readnone

I would not produce any llvm global for it. So some insanity like

register long a asm("rsp");
long f(long x) {
  long ret = a;
  a = x;
  return ret;
}

would compile to

define i64 @f(i64 %x) {
  %ret = call i64 @llvm.read_register("rsp");
  call void @llvm.write_register("rsp", i64 %x)
  ret %ret
}
declare void @llvm.write_register(i8*, i64)
declare i64 @llvm.read_register(i8*)

That was actually my first idea, but I got confused on the implementation. :)

I'll try it again.

Cheers,
Renato

I would not produce any llvm global for it. So some insanity like

  %ret = call i64 @llvm.read_register("rsp");

+1

Aren't you going to need some kind of "private unnamed_addr" thing
just to make syntactically valid IR?

Tim.

Aren't you going to need some kind of "private unnamed_addr" thing
just to make syntactically valid IR?

I guess the best would probably be a metadata string, that way we are
sure we don't actually output anything.

Cheers,
Rafael

Is there any sane reason to actually implement it?

Are there any cases when inline asm would work well enough? We have the __builtin_stack_pointer now, which is somewhat questionable[1], however this should not serve a precedent to implement further "extensions" like this. Argument against it..? "Why?"

-Krzysztof

[1] As someone has pointed out, the stack pointer builtin only makes sense on targets that actually have a "stack pointer". Not all architectures do, and even on those that do, the register could be used for other purposes in some cases. I'm guessing that the main use scenarios come from OS kernels or device drivers, but even there inline asm would likely suffice.
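For comparison, the inline-asm route for the stack-pointer case can be sketched as below. It assumes GCC or Clang and covers only 64-bit x86-64 and AArch64; `read_stack_pointer` is a hypothetical helper, and on other targets it falls back to `__builtin_frame_address(0)`, which is not the stack pointer. The `volatile` qualifier keeps the compiler from deleting the read or hoisting it out of a loop, which is the same property asked of the intrinsic earlier in the thread.

```c
#include <stdint.h>

/*
 * Sketch (assumes GCC or Clang): read the current stack pointer with
 * extended inline asm. Only x86-64 and AArch64 (LP64) are handled; elsewhere
 * the frame address is returned instead, which is near, but not equal to, SP.
 * volatile: the read must not be deleted or hoisted out of a loop.
 */
static inline uintptr_t read_stack_pointer(void)
{
    uintptr_t sp;
#if defined(__x86_64__) && defined(__LP64__)
    __asm__ __volatile__("movq %%rsp, %0" : "=r"(sp));
#elif defined(__aarch64__) && defined(__LP64__)
    __asm__ __volatile__("mov %0, sp" : "=r"(sp));
#else
    sp = (uintptr_t)__builtin_frame_address(0);
#endif
    return sp;
}
```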

Is there any sane reason to actually implement it?

Bare metal code is never sane, but that's not an excuse not to write
it. C is a very complete language, especially modern variations like
C11, but there's still a lot that can't be done in C and for good
reasons. Kernel code and optimal libraries can stretch the compiler a
lot more than user code, and often enough, there's simply no way to
represent ideas in C. One of these examples is unwind code.

Then you ask...

Are there any cases when inline asm would work well enough?

Well, assembly is very powerful, but it's also very hard to
understand what's going on. Inline assembly is an extension to the
language, and because of that, different compilers implement it in
different ways. Most of GCC's implementation is hidden in layers upon
layers of legacy code that not many people dare to touch, but that
compiles code that can't stop being compiled, nor easily migrated (for
technical and legal reasons).

The interaction between inline assembly and C is, therefore, not easy
to implement, and it's even harder to get it "right" (ie. similar to
the "other" compiler). The more you can write in C, the better for
*compilers*, and minimising the exposure by using register variables
is actually not a bad idea.

But mostly, the __builtin_stack_frame is, in essence, a special case
of the generic pattern of named registers, and gives us the same level
of guarantees, so, in the end, there isn't much *technical* difference
in doing one and the other, with the added benefit that named
registers are already widely used and you won't need to add ifdefs for
new compilers.

We have the
__builtin_stack_pointer now, which is somewhat questionable[1], however this
should not serve a precedent to implement further "extensions" like this.

No, we don't. Not yet. ;)

I was about to commit when I thought I should get a point of view from
all sides, including (and especially) GCC.

The first issue here is that, if GCC doesn't "buy in", having a
__builtin_stack_pointer in Clang is as good as not having it.

The second issue is that, if we're going to implement an extension,
and there is already a technically equivalent solution in existence,
there's no reason to create a different one.

Lastly, and (much) more importantly, their technical arguments were
significantly better than mine.

Argument against it..? "Why?"

Why? Because the less asm and more C code we have, the better. Because
inline asm semantics is a lot more obtuse than named registers.
Because builtins are just special cases of named registers. Because
target-independence is not possible in that kind of code.

Some arguments...

** With builtins, we don't need ifdefs, if all compilers implement it.
-- Yes, we do, for old compilers (GCC and Clang), and kernel/glibc
will need it for decades to come

** Builtins make the code target-independent
-- No they don't, only the register part. The surrounding code is
highly target-specific. That's because you only need that level of
detail in code that is very target-specific, and that kind of code is
normally in separate files.

** Builtins remove the need for named registers
-- No they don't. glibc uses other registers for special purpose code.

** Builtins would work on all targets
-- No they won't. What do you do on targets where the compiler doesn't
implement the builtin because there is no equivalent of a stack
pointer?

Other arguments in favour of named registers:

-- Without this extension, it is e.g. hard to force some input or output
arguments of inline asm into specific machine registers; the only other way
is to use constraints, but most targets don't have constraints for every
specific register.

-- The guarantees that a builtin would give us are very little more
than the ones given by named registers, and not enough to justify the
implementation of yet another extension that other compilers won't
implement.

You see, I'm not a big fan of extensions, nor do I think we should
blindly follow GCC on whatever they do, but in this particular case,
we don't have a strong enough case to make. Even taking the legacy
argument off the table, the technical arguments comparing named
registers and specially crafted builtins are still in favour of named
registers, at least IMHO.

cheers,
--renato

Is there any sane reason to actually implement it?

Bare metal code is never sane, but that's not an excuse not to write
it. C is a very complete language, especially modern variations like
C11, but there's still a lot that can't be done in C and for good
reasons. Kernel code and optimal libraries can stretch the compiler a
lot more than user code, and often enough, there's simply no way to
represent ideas in C. One of these examples is unwind code.

That's all fine, but I'm not sure how this supports having named register builtins. The problem is that once we implement a feature, it may be impossible to get rid of it, should it turn out to be a flop. Do we understand all intended and unintended consequences of implementing this?

Then you ask...

Are there any cases when inline asm would work well enough?

Sorry, that was meant to be "would not work well enough".

Well, assembly is very powerful, but it's also very hard to
understand what's going on. Inline assembly is an extension to the
language, and because of that, different compilers implement it in
different ways.

True, but I know of only one other compiler that allows register variables (and it's not even a C compiler). The portability argument is not a very strong one here, at least as I understand it.

Most of GCC's implementation is hidden in layers upon
layers of legacy code that not many people dare to touch, but that
compiles code that can't stop being compiled, nor easily migrated (for
technical and legal reasons).

The interaction between inline assembly and C is, therefore, not easy
to implement, and it's even harder to get it "right" (ie. similar to
the "other" compiler). The more you can write in C, the better for
*compilers*, and minimising the exposure by using register variables
is actually not a bad idea.

I'm not sure if I'm following your argument. The code that uses inline asm that cannot be easily migrated will likely not be rewritten to use named registers. Specifically, for those reasons, we need to be implementation-compatible with GCC when it comes to inline asm. So, whether we like it or not, we have to have that part working.

Maybe I'm missing something, but in the previous comments, you mention __builtin_<register_name>, and in the examples given by others, the register name is given as a string. Also, there is some example where the register name is associated with a variable via "asm". All these options are not exactly equivalent, but they all come with some issues.

If a register name is given via a string, as in "register long a asm("rsp")", who will check the type of "a"? On PowerPC, "fpr0" is a floating point register. Would it be legal to have "uint64_t a asm("fpr0")"? How about "float a asm("fpr0")"? What's funny here is that fpr0 is 64 bits long and is incapable of holding a 32-bit IEEE value. If you load a 32-bit fp value into it, it will be automatically extended to 64 bits. With VSX things are different, and if I remember correctly, it is now possible to have single-precision values in some set of registers. Do you want the front-end to deal with all this? Actually, what's even funnier is that the official PPC assembler syntax does not define "fpr0". It's just 0, and the meaning of it depends on where it's placed. But I digress...

Another example: on Hexagon you can use pairs of registers, for example r15:14. Some instructions can use 64-bit values given in even-odd pairs like that. At the same time, you can use r14 and r15 separately, and both of them will be aliased with the r15:14...

But mostly, the __builtin_stack_frame is, in essence, a special case
of the generic pattern of named registers, and gives us the same level
of guarantees, so, in the end, there isn't much *technical* difference
in doing one and the other, with the added benefit that named
registers are already widely used and you won't need to add ifdefs for
new compilers.

__builtin_frame_pointer specifies a register via the functionality, not by name. In that sense, it is actually something more general than named registers. While many architectures have something like "frame pointer", not a lot of them have "rax".

Argument against it..? "Why?"

Why? Because the less asm and more C code we have, the better.

Yes, but making it "more C" by keeping direct uses of registers and only changing how that is accomplished, is, ahem, akin to porting code from C to C++ by changing the file name from .c to .cpp.

Because
inline asm semantics is a lot more obtuse than named registers.

I'm not sure if that's true. The compiler still needs to conform to the user's desire to have some value in R3, even if everything except R3 would be a better choice.

Because builtins are just special cases of named registers. Because
target-independence is not possible in that kind of code.

Ok, so by now I'm confused. What exactly are we discussing here:
1. unsigned a asm("rax") : make a be an alias for "rax",
2. a = __builtin_register("rax") : copy "rax" to a,
3. a = __builtin_rax() : copy "rax" to a?

Is it about which of the above is superior to others?

As you can see, I don't like any of those, but some of them I like even less than others. :)

The option 3 makes it easier to type-check, since each target can define its own set of builtins with proper types. On the other hand, what do you expect to get? The value of "rax" at that particular place in the code? How would you control what is being loaded into rax?

-Krzysztof

> That's my idea, yes. I'm not sure how Clang would transform the named
> registers into the intrinsic, but something along the lines of:
>
> i8* @SP = "SP";
>
> define void @step() nounwind {
> entry:
> %0 = call i32 @llvm.read_register(i8* @SP)
> %1 = add i32 %0, i32 4
> call void @llvm.write_register(i8* @SP, %1)
> }
>
> declare void @llvm.write_register(i8*, i32) nounwind readnone
> declare i32 @llvm.read_register(i8*) nounwind readnone

I would not produce any llvm global for it. So some insanity like

register long a asm("rsp");
long f(long x) {
  long ret = a;
  a = x;
  return ret;
}

would compile to

define i64 @f(i64 %x) {
  %ret = call i64 @llvm.read_register("rsp");
  call void @llvm.write_register("rsp", i64 %x)
  ret %ret
}
declare void @llvm.write_register(i8*, i64)
declare i64 @llvm.read_register(i8*)

I don't think that works. Per the GCC documentation, a global register
variable reserves the register entirely for use with that name in a
translation unit. We don't seem to want exactly that model, but the
approach you're suggesting doesn't seem to capture the semantics. For
instance:

register long a asm("r12");
void f(long n) {
  n *= 3;
  a += n;
}

... could do the wrong thing if the multiplication happens to use r12.

The IR in question has no mention that the register cannot be
reserved. In fact we do that already in the ARM back-end, reserving
the R9 for special purposes. We could very well reserve the register
in question to not be used. Some of us also mentioned that we should
reserve the register, and I think there's nothing stopping us from
reserving the registers on a compilation unit (module) level for
global named registers.

The problem here is that you can't reserve all registers. On ARM,
R0~R3, SP, LR and PC (and sometimes R9 or R11) cannot be fully
reserved, as they are part of the PCS/execution model, or are reserved
already. GCC docs state that "it's not safe" to assume those registers
will be reserved, which has the same effect. The stack pointer can
still be safely used for reading, for example, as is the case of
unwinding.

cheers,
--renato

I think you're misunderstanding... I'm not proposing named register
builtins, just named registers, like GCC.

cheers,
--renato

Well, my main point is that as long as users can access individual registers, the code won't be much closer to C, regardless what mechanism they will use. Since we have to have working inline asm, I'm not sure if having an extra set of builtins will really help a lot. By inserting uses of physical registers in the code the user may impede register allocation and scheduling. The user already has means to accomplish all that and more.

-K

I don't think that works. Per the GCC documentation, a global register
variable reserves the register entirely for use with that name in a
translation unit. We don't seem to want exactly that model, but the approach
you're suggesting doesn't seem to capture the semantics. For instance:

register long a asm("r12");
void f(long n) {
  n *= 3;
  a += n;
}

... could do the wrong thing if the multiplication happens to use r12.

Correct. The proposed solution would work for all non-allocatable
registers. We should probably still error out if someone tries to use an
allocatable one.

Cheers,
Rafael

AFAIK, GCC reserves the allocatable registers. If we're going to do
this we'd have to be as close as possible to the current behaviour to
avoid surprises.

--renato

Hi Rafael,

I don't see the risk here. Why would that risk emitting any output?

I can see that the @llvm.annotation intrinsics use metadata for text,
but they discard them on use. It may be simpler to get an MDNode and
convert it into text, but if the intrinsic is *always*
converted-or-fail, I don't understand how it could output any unwanted
values.

cheers,
--renato

This has been the long standing historical objection to the feature. It is
a *really* invasive change to the register allocator to plumb this kind of
register reservation through it. Worse, the semantics for it being
inherently translation-unit based become deeply confusing in LLVM due to
the potential for (partial) LTO.