Troubling promotion of return value to Integer ...

In this thread I’m trying to merge two email threads into one, because both discuss the same problem that’s troubling us and I would like to reach a conclusion on what would be the best approach.

To minimize the size of this thread I only mention the subject of the other two threads:

(1) [LLVMdev] Integer promotion of return node operand (initiated by Sachin)

(2) [LLVMdev] trouble with 32-bit promotion of return value (initiated by myself)

To summarize:

Evan has replied to thread (1) and suggested to add a calling convention and check for it in visitRet

Dan replied to thread (2) and suggested to add a new field to the TargetLowering class, and then let each target specify the minimum integer type to promote return types to

Either way, I think we all agree that the root of the problem is the FIXME in SelectionDAGLowering::visitRet()

But what is the preferred method?

Thanks

Ali

Either way, I think we all agree that the root of the problem is the
FIXME in SelectionDAGLowering::visitRet()

What I would like to know is if the promotion of return values to
integer is specified by the C language standard, or if it is specified
by the (target specific) ABI.

Ciao,

Duncan.

In this thread I’m trying to merge two email threads into one, because both discuss the same problem that’s troubling us and I would like to reach a conclusion on what would be the best approach.
To minimize the size of this thread I only mention the subject of the other two threads:
(1) [LLVMdev] Integer promotion of return node operand (initiated by Sachin)
(2) [LLVMdev] trouble with 32-bit promotion of return value (initiated by myself)

To summarize:
Evan has replied to thread (1) and suggested to add a calling convention and check for it in visitRet
Dan replied to thread (2) and suggested to add a new field to the TargetLowering class, and then let each target specify the minimum integer type to promote return types to

Either way, I think we all agree that the root of the problem is the FIXME in SelectionDAGLowering::visitRet()

But what is the preferred method?

Both. :slight_smile:

If you are using a non-C ABI, then you may not need to do the promotion at all. However, if you do need to do the promotion, you need to know the type of “int”. It’s not entirely clear to me what is the best way to accomplish. To me, I don’t think TargetLowering is the right place to add the hook. How about TargetData.h ? There is a getIntPtrType(), perhaps we need a getIntType()?

Evan

If you are using a non-C ABI, then you may not need to do the promotion at all. However, if you do need to do the ….

Our port is for C language, however, on an 8-bit platform priorities shift a little bit.

Really the natural size of int must be 8 bit, however, 8 bit is too small for most calculations so we have to choose 16, but there is no 16-bit register available. And that is where the whole problem starts because promoting the return of char to int becomes too expensive; so we don’t want to do it.

Being able to turn the promotion off is going to solve our problem, but currently there is no hook for that in target specific classes.

Ali.

Hi Evan,

If you are using a non-C ABI, then you may not need to do the
promotion at all. However, if you do need to do the promotion, you
need to know the type of "int".

how does this work for parameters. Suppose I have an i1 parameter.
The AMD64 ABI says it is passed in one of %rsi, %rdx, %rcx, %r8 and
%r9 registers. In the case of _Bool (i1) it says that the upper 63
bits will be zero. How does this work? I thought i1 would be legalized
to i8 (which is legal, right?). Due to being marked zext, the upper
7 bits of the i8 will be zero. But where does it get extended to
an i64 with upper 63 bits zero?

Thanks,

Duncan.

Hi Ali,

I think there are two sides to this issue. The ultimate problem is that the code generator has a builtin assumption that sizeof(int) = 32 bits. IIRC, this comes into play in two places:

  1. Arguments to calls are known to be sign/zero extended to int in certain cases.
  2. The return value of functions is implicitly promoted.

Lets talk about #2 first, because it is the simplest case. Here the front-end types the function as (f.e.) returning i8, and then has the code generator insert the extension to (for example) i32. The attribute on the function tells the code generator whether to sign or zero extend it.

To me, the solution to this is to have the C front-end insert an explicit extension of the appropriate type to the right size. This would mean that the code generator would now never insert the extension. Since the C front-end knows the size of int, on your target, you could have it extend to 16 or 8 bits, as appropriate. This is also good, because the extension is exposed to the mid-level optimizer, allowing it to be eliminated more aggressively.

The bad thing about doing this is that, on the caller side, we lose information. With the current approach, the caller knows that the value is (for example) 8 bits but sign or zero extended as appropriate. This allows the caller side to do optimizations based on this fact. If all return values were i32, we wouldn’t know that the top 24 bits were sign/zero extended. I think we can handle this by adding some new attributes, but this is marginally ugly.

What do you think?

-Chris

To me, the solution to this is to have the C front-end insert an
explicit extension of the appropriate type to the right size.

Opinion:

The front end should be responsible for introducing all necessary
promotions, sign extensions, or zero extensions. The absence of these
operations where required is a type error in the emitted IR, and should
be treated as such. The job of the back end is to reject ill-typed input
as malformed, not correct it. Therefore, the code generator should not
be inserting these promotions.

As a matter of implementation, an explicitly run decorating IR pass
might be a fine implementation that avoids invasive changes in current
front ends.

If all return values were i32, we wouldn't know that the top 24 bits
were sign/zero extended. I think we can handle this by adding some
new attributes, but this is marginally ugly.

The real issue here is that there is a divergence between the logical
type and the representation type, the back end needs to know both for
different purposes, and within the constraints of the current IR (absent
new annotations) information about one or the other is lost.

Is this a more general issue, or are annotations sufficient?

shap

To me, the solution to this is to have the C front-end insert an
explicit extension of the appropriate type to the right size.

Opinion:

The front end should be responsible for introducing all necessary
promotions, sign extensions, or zero extensions. The absence of these
operations where required is a type error in the emitted IR, and should
be treated as such. The job of the back end is to reject ill-typed input
as malformed, not correct it. Therefore, the code generator should not
be inserting these promotions.

Actually, the code generator only accepts well formed code. The front-end is responsible for rejecting invalid code.

If all return values were i32, we wouldn't know that the top 24 bits
were sign/zero extended. I think we can handle this by adding some
new attributes, but this is marginally ugly.

The real issue here is that there is a divergence between the logical
type and the representation type, the back end needs to know both for
different purposes, and within the constraints of the current IR (absent
new annotations) information about one or the other is lost.

This is a representational issue, not a philisophical one. The question is where to expose the operation, not whether to do so. The sign/zero extension must be done.

-Chris

Hi Evan,

If you are using a non-C ABI, then you may not need to do the
promotion at all. However, if you do need to do the promotion, you
need to know the type of "int".

how does this work for parameters. Suppose I have an i1 parameter.
The AMD64 ABI says it is passed in one of %rsi, %rdx, %rcx, %r8 and
%r9 registers. In the case of _Bool (i1) it says that the upper 63
bits will be zero. How does this work? I thought i1 would be legalized
to i8 (which is legal, right?). Due to being marked zext, the upper
7 bits of the i8 will be zero. But where does it get extended to
an i64 with upper 63 bits zero?

X86ISelLowering will promote the i8 parameters to i32.

Evan

Chris,

I did not quite understand the “The bad thing about …” part of your argument. I’m not sure which of the two scenarios are you comparing (promoting in FrontEnd vs BackEnd) or (promotion to int vs i32)

Regardless, I agree with you and shap that promotions should take place in the FrontEnd and this is my reason:

The standard allows calling a function without prototype and assumes “int” return value; and I realize that this is the primary reason why the return value is being promoted. This ties this promotion to int type and not the size of any register in target, which in turn makes it a language issue rather than code generation issue; hence FrontEnd must take care of it.

Now for our port, things are different. In most (sane) ports, the target provides an integer class for int… however, things are different in our (insane) port. We only have an 8-bit data bus but we also want 16-bit int type. So this promotion will make char return values (which are used heavily in many application in 8-bit environment) quite inefficient. So we need a way to turn off this promotion all together and deal with default function prototype in a different way (perhaps issue a diagnostic …)

Cheers

Ali

Chris,
I did not quite understand the “The bad thing about …” part of your argument. I’m not sure which of the two scenarios are you comparing (promoting in FrontEnd vs BackEnd) or (promotion to int vs i32)

Assume we make the change I suggested. This means that this C function (on x86-32):

char foo() {
char c = …
return c;
}

would compile to something like:

define i32 @foo() {

%tmp = sext i8 … to i32
ret i32 %tmp
}

Since “char” is sign extend to 32 bits on this target, there would be an explicit sign extension. This is goodness. The problem is that now you have a caller:


char bar() {
char x = foo()
return char;
}

This would get compiled (by the front-end to something like this:

define i32 @bar() {
; call
%tmp = call i32 @foo()
%x = trunc i32 %tmp to i8

; return
%tmp2 = sext i8 to i32
ret i32 %tmp2
}

The problem with this is that we now have a trunc+sext instruction that cannot be eliminated. The loss here is that while foo returns an i32 value, we lost the fact that the top 25 bits all have the same value (i.e. that it is a sign extended 8-bit value). Because of this, we can’t eliminate the trunc+sext pair in bar, pessimizing code (and in a very common case!). I am just saying that we should have some attribute (or something) on foo that indicates its result is actually sign extended, allowing the optimizer to zap the trunc+sext in bar.

The standard allows calling a function without prototype and assumes “int” return value; and I realize that this is the primary reason why the return value is being promoted. This ties this promotion to int type and not the size of any register in target, which in turn makes it a language issue rather than code generation issue; hence FrontEnd must take care of it.

Right, I agree this should be fixed.

Now for our port, things are different. In most (sane) ports, the target provides an integer class for int… however, things are different in our (insane) port. We only have an 8-bit data bus but we also want 16-bit int type. So this promotion will make char return values (which are used heavily in many application in 8-bit environment) quite inefficient. So we need a way to turn off this promotion all together and deal with default function prototype in a different way (perhaps issue a diagnostic …)

Sure. The other good thing about exposing this extensions in the IR is that interprocedural optimizations can eliminate them. This gives the optimizer the ability to actually eliminate the extensions and truncations completely.

-Chris

Chris:

In an attempt to avoid adding distraction, I withheld the following
thought earlier. It may be equivalent to your annotation idea, but I
want to throw it out in case it provokes something useful.

Extension and promotion is a peculiar case. If an

   %x = sext i8 %y to i32

has already been performed, then there is no harm (in the sense that no
value is changed) by executing it redundantly on the low field of the
result register, as in:

   %z = sext i8 %x to i32

I'm giving an input %x that is known to be an i32 here intentionally.

This prompts me to the following thought, which may not be helpful.
Perhaps we might introduce something like the following two
pseudo-instructions in the IR:

   resext sign extend on a quantity that the front end asserts has
            already been sign extended.

   rezext zero extend on a quantity that the front end asserts has
            already been zero extended.

These instructions serve to advise the caller-side optimizer that the
required extend has been done, which tells us what we need to know for
optimizations. If they inadvertently make it to the back end, they
peephole away as a NO-OP.

Since we are already trusting the front end to introduce the correct
extension in the callee, I don't see any harm in trusting it to advise
us that it did so at the caller return site. The outcome is certainly no
worse than a front end that simply generates IR that is wrong, and
omission of the advisory instruction merely results in loss of an
optimization opportunity rather than any error of computational
semantics.

Like I say, not sure if this is helpful.

shap

In an attempt to avoid adding distraction, I withheld the following
thought earlier. It may be equivalent to your annotation idea, but I
want to throw it out in case it provokes something useful.

No problem, ideas are welcome.

I'm giving an input %x that is known to be an i32 here intentionally.

This prompts me to the following thought, which may not be helpful.
Perhaps we might introduce something like the following two
pseudo-instructions in the IR:

  resext sign extend on a quantity that the front end asserts has
           already been sign extended.

  rezext zero extend on a quantity that the front end asserts has
           already been zero extended.

These instructions serve to advise the caller-side optimizer that the
required extend has been done, which tells us what we need to know for
optimizations. If they inadvertently make it to the back end, they
peephole away as a NO-OP.

Ok, so these are basically "copies with assertions". There are a couple of similar proposals for things like this (for example, with pointers you might want to make some assertion w.r.t. aliasing).

The downside of this sort of approach (vs attributes) is that it increases the size of the every caller of the function. The advantage is that it is more explicit, potentially more general, and doesn't require new attributes for each size. OTOH, some of these advantages go away if this gets extended for other properties: we end up having a copy with attributes on it :slight_smile:

Since we are already trusting the front end to introduce the correct
extension in the callee, I don't see any harm in trusting it to advise
us that it did so at the caller return site.

Right. If it's part of the ABI, it is fair (and important!) for the optimizer to be able to use this.

The outcome is certainly no
worse than a front end that simply generates IR that is wrong, and
omission of the advisory instruction merely results in loss of an
optimization opportunity rather than any error of computational
semantics.

Yep.

-Chris

Agreed, with one caveat:

I don't understand how attributes work at this point, but if the
use-case is specific to each caller, the attribute will need to be
attached separately to each call point, and the difference in space for
such cases vs "copies with assertions" may not turn out to be very big.

A general observation I would make -- this is coming from other systems
that I have worked on and does not reflect any understanding of LLVM
internals -- is that annotation schemes are often a source of error in
programs. Perhaps I have gotten them wrong when I have attempted them,
but what I have found in the past is that I end up with a lot of code
where I have a primary switch-like construct on the main object type (in
this case I imagine a switch on IR instruction), followed by a bunch of
special cases that look for applicable annotations within each case.

What I have observed to happen when code is structured this way is that
two types of coding failures become common:

  1. Failures of case analysis, where some annotation is applicable in
     several places or states, but is missing in some subset of these.
     These can be hard to locate.

  2. Failures of update, of the form "I caught 4 of the 5 places that
     needed this update."

  3. What one really ends up with, in effect, is a situation where the
     *real* instruction (from a semantic perspective) was the alleged
     instruction plus the semantically important annotation, but the
     representational encoding of the real instruction has been made
     awkward to deal with by splitting it into multiple substructures.

That is: they are "robust to error", in the sense that with high
likelihood errors will be introduced by maintenance over time.

Because of this, I have come to feel in my own code that
operation-specific annotations are often better dealt with by new
operations that can be inserted directly into the primary switch. I can
always run multiple cases into the same switch block, and the compiler
will help me notice which ones I forgot. I try to reserve use of
annotations and attributes for situations where:

  1. The annotation/attribute is advisory or non-semantic, or

  2. The annotation/attribute is almost universally applicable, and is
     therefore not likely to need independent consideration at
     multiple points in a long switch-like construct. Instead
     it gets dealt with at one place, either at top or bottom of
     the switch.

It's a trade-off, and I see no hard and fast rule that can be extracted
here, but my personal bias is to err in favor of robustness under
maintenance rather than in favor of reduced memory consumption.

shap

>> If you are using a non-C ABI, then you may not need to do the
>> promotion at all. However, if you do need to do the promotion, you
>> need to know the type of "int".
>
> how does this work for parameters. Suppose I have an i1 parameter.
> The AMD64 ABI says it is passed in one of %rsi, %rdx, %rcx, %r8 and
> %r9 registers. In the case of _Bool (i1) it says that the upper 63
> bits will be zero. How does this work? I thought i1 would be
> legalized
> to i8 (which is legal, right?). Due to being marked zext, the upper
> 7 bits of the i8 will be zero. But where does it get extended to
> an i64 with upper 63 bits zero?

X86ISelLowering will promote the i8 parameters to i32.

OK, so why doesn't it work the same for return values?

Ciao,

Duncan.

To me, this “optimization” problem doesn’t seem to be a consequence of doing the promotion differently than what is currently being done in llvm.

Anyways, I don’t agree that the information is completely lost. Shouldn’t an llvm pass be able to figure out if the truncated value is coming from a function call, and since it is aware of calling convention, it should be able to deduce that tmp2 can really take its value from function call?

So the pass will modify

define i32 @bar() {

; call

%tmp = call i32 @foo()

%x = trunc i32 %tmp to i8

; return

%tmp2 = sext i8 to i32

ret i32 %tmp2

}

To

define i32 @bar() {

; call

%tmp = call i32 @foo()

%tmp2 = %tmp

%x = trunc i32 %tmp to i8

; return

ret i32 %tmp2

}

Meanwhile, the FrontEnd is consistent with the calling convention and produce the correct IR with necessary promotions.

Ali

It could be. That would be one way to fix (at least part of the fix) it.

Evan

No, because the callee and caller may be in different translation units.

-Chris

But they both follow the same calling convention. There are two possibilities in the caller:

  1. Call node returns sizeof(int) or larger: in this case there is no truncation.

  2. Call node returns smaller than sizeof(int): in this case callee always has to return an int so it has to consistently either sign extend or zero extend because int is either signed or unsigned consistently for that port. Assuming that caller and callee follow the same calling convention, caller always knows (hence the llvm pass knows) that, if the return value of the called function is being truncated, it is because of return value promotion. It also knows the calling convention, so it knows whether it is sign or zero extended. This is regardless of translation unit where the callee is in.

Correction:

The analysis I made regarding the callers knowledge of sign/zero extension of return value is flawed. So I take it back.

Never the less, I don’t see how adding attributes would resolve this problem either.

Ali