Help with register allocation for undef inputs to inline asm

(Since this is about inline asm I'm sending it to cfe-dev, though it
also involves LLVM's logic.)

I've been looking at this bug report:
https://bugs.llvm.org/show_bug.cgi?id=50647 ("Incorrect code generation for ARM with inline assembly")
(The initial report is for ARM, but I've found it applies to any architecture.)

The problem is that an undef input to an inline asm statement isn't
assigned its own register, so it overlaps a second input value. Here's
a minimal version of it:
void func(unsigned long long n)
{
    unsigned long long b;
    n = 99;

    __asm__ volatile (
        "add %[_b], %[_b], %[_b] \n\t" // Assigned register X
        "add %[_n], %[_n], %[_n] \n\t" // Also assigned register X
        :
        : [_n] "r" (n), [_b] "r" (b)
    );
}


This produces an inline asm statement in IR where the input for "b" is undef.
tail call void asm sideeffect "add $1, $1, $1 \0A\09add $0, $0, $0 \0A\09", "r,r"(i64 99, i64 undef) #2, !dbg !22, !srcloc !23

This makes sense, and I can see intuitively why you wouldn't assign a
unique register to an undef value. It has no value, after all; it could
be anything, including the same value as the other input. I tracked
this decision down to somewhere in the VirtRegRewriter pass, but I
haven't been able to pin down the exact place yet.

My question is:
Would making an exception here for the inline asm case make sense? Or
is this an instance of "undef in, undef out" that we would be happy to
keep? (FWIW, gcc does assign unique registers in this case.)

Thanks,
David Spickett.

Ignore undef for a moment. The first question to ask here is: is it legal for two different inputs to an inline asm to be assigned the same register? Consider, for example:

void func()
{
    unsigned long long b = 99, n = 99;
    __asm__ volatile ("# %0, %1" :: "r"(n), "r"(b));
}

The answer is yes: we can assign them to the same register, since they have the same value. Clang and gcc agree on this.
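To make this concrete, here is a minimal sketch of an asm whose two plain "r" inputs may legally share a register. It assumes GCC or Clang targeting x86-64 or AArch64 (other targets fall back to plain C), and the function name `sum_inputs` is hypothetical. The asm only reads its inputs and writes a separate output in a single instruction, so it stays correct whichever registers the compiler picks.

```c
/* Hypothetical helper: adds two values passed as plain "r" inputs.
   The asm only reads n and b and writes a separate output in a single
   instruction, so it is correct even if the compiler assigns n and b
   (or out) the same register. */
static unsigned long long sum_inputs(unsigned long long n,
                                     unsigned long long b)
{
    unsigned long long out;
#if defined(__x86_64__)
    __asm__ ("lea (%1,%2), %0" : "=r"(out) : "r"(n), "r"(b));
#elif defined(__aarch64__)
    __asm__ ("add %0, %1, %2" : "=r"(out) : "r"(n), "r"(b));
#else
    out = n + b; /* plain C fallback for other targets */
#endif
    return out;
}
```

The asm in the original report is different: it writes to registers it only declared as inputs, which is only safe if every input has a register to itself.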

Now consider the case of an undef input. Given the behavior we've established, you're basically asking, "can we add an exception that undef is never equal to 99?".

-Eli

Thanks! The example really helps make this clearer.


Reading the LLVM Language Reference Manual, I see:
"These examples show the crucial difference between an undefined value
and undefined behavior. An undefined value (like ‘undef’) is allowed
to have an arbitrary bit-pattern."

So it's valid for us to say that undef could be 99 and use the same
register. Equally, we could say that undef could be != 99 and assign
different registers.

But the main issue here is the assumption (in the source code) that
different inputs always get different registers, which, as you've
shown, is not true, undef or otherwise.

So no exception is needed for the undef case; this is working as expected.


Forgive me, but isn't that asm just specifying inputs? There are no outputs, so the allocator doesn't know anything is being clobbered?

Indeed, if I change it to:
    __asm__ volatile (
        "add %[_b], %[_b], %[_b] \n\t"
        "add %[_n], %[_n], %[_n] \n\t"
        : [_n] "+r" (n), [_b] "+r" (b)
        :
    );

I get different registers for _n and _b.
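A self-contained sketch of the same fix, assuming GCC or Clang on AArch64 (the three-operand `add` from the original example) or x86-64 (two-operand `add`), with a plain C fallback elsewhere; the wrapper name `double_both` is hypothetical. Because both operands are now outputs, they must occupy distinct registers for the duration of the asm. `volatile` is left off here because the asm has no effect beyond its outputs.

```c
/* Hypothetical helper: doubles *n_io and *b_io in place using "+r".
   Each operand is both read and written, and since both are outputs
   the compiler must give them distinct registers. */
static void double_both(unsigned long long *n_io, unsigned long long *b_io)
{
    unsigned long long n = *n_io, b = *b_io;
#if defined(__aarch64__)
    __asm__ ("add %[_b], %[_b], %[_b] \n\t"
             "add %[_n], %[_n], %[_n] \n\t"
             : [_n] "+r" (n), [_b] "+r" (b));
#elif defined(__x86_64__)
    /* x86-64 add is two-operand and sets flags, hence the "cc" clobber */
    __asm__ ("add %[_b], %[_b] \n\t"
             "add %[_n], %[_n] \n\t"
             : [_n] "+r" (n), [_b] "+r" (b)
             :
             : "cc");
#else
    n += n; /* plain C fallback for other targets */
    b += b;
#endif
    *n_io = n;
    *b_io = b;
}
```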


Yes, that asm only specifies inputs. I wasn't aware of the "+"
constraint modifier, but that looks like the proper way to do this.

"Operands using the ‘+’ constraint modifier count as two operands
(that is, both as input and output) towards the total maximum of 30
operands per asm statement."
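One way to see why a "+" operand counts twice: it behaves roughly like a separate output plus an input tied to that output with a matching constraint ("0" refers to operand 0). A minimal sketch, assuming GCC or Clang on x86-64 or AArch64 with a plain C fallback elsewhere; the name `double_matching` is hypothetical.

```c
/* Hypothetical helper: a "+r" operand written out as two operands,
   an output (%0) and an input tied to it by the matching constraint "0",
   so both must live in the same register. */
static unsigned long long double_matching(unsigned long long x)
{
#if defined(__x86_64__)
    __asm__ ("add %0, %0" : "=r"(x) : "0"(x) : "cc");
#elif defined(__aarch64__)
    __asm__ ("add %0, %0, %0" : "=r"(x) : "0"(x));
#else
    x += x; /* plain C fallback for other targets */
#endif
    return x;
}
```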

And you're right: without the clobber info you're going to have issues
if this asm is anywhere other than at the end of a void-returning
function (which it was in the original report). Using "+" solves both
issues, thanks!


Great! A couple of other points:

a) Be aware of the '&' constraint modifier: it means the output is written *before* all the inputs have been read. It's an 'early clobber' operand, so it cannot reside in the same register as any input.
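A minimal sketch of where '&' matters, assuming GCC or Clang on x86-64 or AArch64 with a plain C fallback elsewhere; the name `sum_early_clobber` is hypothetical. The first instruction writes `out` before the second one reads `b`, so without '&' the compiler could place `out` in the same register as `b`, and the `mov` would overwrite `b` before it is used.

```c
/* Hypothetical helper: out = a + b, computed as "copy a, then add b".
   out is written by the first instruction before b is read by the
   second, so it is marked early-clobber ("=&r") and cannot share a
   register with a or b. */
static unsigned long long sum_early_clobber(unsigned long long a,
                                            unsigned long long b)
{
    unsigned long long out;
#if defined(__x86_64__)
    __asm__ ("mov %1, %0\n\t"
             "add %2, %0"
             : "=&r"(out) : "r"(a), "r"(b) : "cc");
#elif defined(__aarch64__)
    __asm__ ("mov %0, %1\n\t"
             "add %0, %0, %2"
             : "=&r"(out) : "r"(a), "r"(b));
#else
    out = a + b; /* plain C fallback for other targets */
#endif
    return out;
}
```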

b) In your example, 'volatile' could inhibit optimization, and there's nothing volatile about the actual asm. Of course, the code you derived this from may have hidden side effects; asms of the "set magic_reg, %[val]" kind generally do need it.