Soft-float

Hi,

I tried out the new soft-float support from the mainline.

Overall, it looks very nice and pretty clean. It is now extremely easy
to add the soft-float support for your target. Just do not call
addRegisterClass() for your FP types and they will be expanded into
libcalls.

But there are several minor things that would be still nice to have:

a) It is not possible to express that:
   - f32 and f64 are both illegal and therefore are mapped to integers
   - but only f64 is emulated on the target and there are no f32
arithmetic libcalls available (which is the case on our target)

To make it possible, f32 should be always promoted first to f64 and
then an f64 operation should be applied.

I see a small problem here with the current code, since f32 should be
promoted to the illegal type f64. It might require some special-case
handling eventually. For example, what should be the result
getTypeToTransformTo(f32)? On the one hand, it is f64. On the other
hand it is illegal.
           
b) LLVM currently uses hard-wired library function names for
  FP libcalls (and ALL other libcalls as well). It would be nice if
they would be customizable, since some targets (e.g. some embedded
systems) have some existing naming conventions that are to be followed
and cannot be changed. For example, on our embedded target all libcalls
for soft-float and some integer operations have very target-specific
names for historical reasons.

TargetLowering class specific for a target could be extended to handle
that and this would require only minor changes.
              
c) LLVM libcalls currently pass their parameters on stack. But on some
embedded systems FP support routines expect parameters on specific
registers.
   At the moment, SelectionDAGLegalize::ExpandLibCall() explicitly uses
CallingConv::C, but it could be made customizable by introducing a
special libcall calling convention or even better by allowing the
target specific lowering of libcalls. Actually it can be combined with
the solution for (b). In this case target-specific lowering can take
care about names of libcalls and also how they handle their parameters
and return values.
              
d) Would it be possible with current implementation of soft-float
support to map f32/f64 to integer types smaller than i32, e.g. to i16?
I have the impression that it is not necessarily the case, since it
would require that f64 is split into 4 parts.
   This question is more about a theoretical possibility. At the moment
my embedded target supports i32 registers. But some embedded systems
are still only 16bit, which means that they would need something like
this.
   I'm wondering, how easy or difficult would it be to support such a
mapping to any integer type?

My impression is that (b) and (c) are very easy to implement, but (a)
and (d) could be more chellenging.

Evan, I guess you are the most qualified person to judge about this,
since you implemented the new soft-float support.

What do you think about these extension proposals?

-Roman

Overall, it looks very nice and pretty clean. It is now extremely easy
to add the soft-float support for your target. Just do not call
addRegisterClass() for your FP types and they will be expanded into
libcalls.

Great.

a) It is not possible to express that:
  - f32 and f64 are both illegal and therefore are mapped to integers
  - but only f64 is emulated on the target and there are no f32
arithmetic libcalls available (which is the case on our target)

To make it possible, f32 should be always promoted first to f64 and
then an f64 operation should be applied.

Ok. This shouldn't be the default behavior (because it will generate significantly less efficient code for targets that have both) but we should support this.

I see a small problem here with the current code, since f32 should be
promoted to the illegal type f64. It might require some special-case
handling eventually. For example, what should be the result
getTypeToTransformTo(f32)? On the one hand, it is f64. On the other
hand it is illegal.

I think getTypeToTransformTo(f32) should return f64. That would be recursively expanded to i64, then to 2x i32 if needed.

I haven't looked at the mechanics required to get this working, but it shouldn't be too ugly.

b) LLVM currently uses hard-wired library function names for
FP libcalls (and ALL other libcalls as well). It would be nice if
they would be customizable, since some targets (e.g. some embedded
systems) have some existing naming conventions that are to be followed
and cannot be changed. For example, on our embedded target all libcalls
for soft-float and some integer operations have very target-specific
names for historical reasons.

TargetLowering class specific for a target could be extended to handle
that and this would require only minor changes.

Yes, TargetLowering would be a natural place to put this. Patches welcome :slight_smile:

c) LLVM libcalls currently pass their parameters on stack. But on some
embedded systems FP support routines expect parameters on specific
registers.
  At the moment, SelectionDAGLegalize::ExpandLibCall() explicitly uses
CallingConv::C, but it could be made customizable by introducing a
special libcall calling convention or even better by allowing the
target specific lowering of libcalls. Actually it can be combined with
the solution for (b). In this case target-specific lowering can take
care about names of libcalls and also how they handle their parameters
and return values.

Yep, TargetLowering can have a pair for each libcall: a function name and a calling convention to use. Patches welcome :slight_smile:

d) Would it be possible with current implementation of soft-float
support to map f32/f64 to integer types smaller than i32, e.g. to i16?
I have the impression that it is not necessarily the case, since it
would require that f64 is split into 4 parts.

Yes, this should be fine.

  This question is more about a theoretical possibility. At the moment
my embedded target supports i32 registers. But some embedded systems
are still only 16bit, which means that they would need something like
this.
  I'm wondering, how easy or difficult would it be to support such a
mapping to any integer type?

It should be transparently handled by the framework. Basically, you'd get:

f32 -> f64 -> i64 -> 2x i32 -> 4x i16

If you don't add a register class for i32 or i64, but you do have one for i16, legalize will already 'expand' them for you.

Note that we don't have any 16-bit targets in CVS, so there may be minor bugs, but the framework is all there. Duraid was working on a 16-bit port at one point (and reported a few bugs, which were fixed) but it was never contributed.

-Chris

d) Would it be possible with current implementation of soft-float
support to map f32/f64 to integer types smaller than i32, e.g. to i16?
I have the impression that it is not necessarily the case, since it
would require that f64 is split into 4 parts.

Yes, this should be fine.

  This question is more about a theoretical possibility. At the moment
my embedded target supports i32 registers. But some embedded systems
are still only 16bit, which means that they would need something like
this.
  I'm wondering, how easy or difficult would it be to support such a
mapping to any integer type?

It should be transparently handled by the framework. Basically, you'd
get:

f32 -> f64 -> i64 -> 2x i32 -> 4x i16

If you don't add a register class for i32 or i64, but you do have one for
i16, legalize will already 'expand' them for you.

This will probably require a slightly more extensive patch to legalizer. The current mechanism assumes either 1->1 or 1->2 expansion. It also assumes the result of expansion are of legal types. That means, you will have to either 1) modify ExpandOp() to handle cases which need to be recursively expanded or 2) modify it to return a vector of SDOperand's. Solution one is what I would pursue. It's not done simply because there isn't a need for it right now. :slight_smile:

Evan

>> d) Would it be possible with current implementation of soft-float
>> support to map f32/f64 to integer types smaller than i32, e.g. to

>> i16?
>> I have the impression that it is not necessarily the case, since
it would require that f64 is split into 4 parts.
>
> Yes, this should be fine.
>
>> This question is more about a theoretical possibility. At the
>> moment my embedded target supports i32 registers. But some

embedded systems are still only 16bit, which means that they would
need something likethis.

>> I'm wondering, how easy or difficult would it be to support such
a mapping to any integer type?
>
> It should be transparently handled by the framework. Basically,
you'd
> get:
>
> f32 -> f64 -> i64 -> 2x i32 -> 4x i16
>
> If you don't add a register class for i32 or i64, but you do have
> one for i16, legalize will already 'expand' them for you.
>

This will probably require a slightly more extensive patch to
legalizer. The current mechanism assumes either 1->1 or 1->2
expansion.

Exactly. This is what I meant with "more chellenging":wink: It is assumed
at several places that 1->1 or 2->2 expanstions are taking place. A
generic case is not handled yet.

It also assumes the result of expansion are of legal
types.

Yes. And this is also a reason why it is not too obvious how to handle
f32->f64 promotion and later f64->i64 expansion on targets that support
only f64 soft-floats.

Chris Lattner wrote:

That would be recursively expanded to i64, then to 2x i32 if needed.

I tried to set getTypeToTransformTo(f32) to return f64, even when f64
is illegal type. But this recursive expansion does not take place with
the current legalizer implementation. Currently, it is assumed that the
result of getTypeToTransformTo() is a legal type. For example,
CreateRegForValue tries to create a register of such a promoted type
and fails in the above mentioned case.

Evan wrote:

That means, you will have to either 1) modify ExpandOp() to
handle cases which need to be recursively expanded or 2) modify it to

return a vector of SDOperand's. Solution one is what I would pursue.

Agreed. I also feel that some sort of recursive expansion is required.

I also have a feeling that getTypeToTransformTo(MVT::ValueType) should
probably also recurse until it finds a type T where
getTypeToTransformTo(T) = T, i.e. it finds a legal type. This would
almost solve the issue with f32->f64 promotion where both FP types are
illegal. The only concern here is that in this case
getTypeToTransformTo(MVT::f32) would return MVT::i64 and therefore the
information about the fact that it should first be promoted to f64 is
lost. The problem is that getTypeToTransformTo() is used for two
"different" goals: to tell which type to use for register mapping and
to tell which type to use for promotions/expansions for the sake of
"type system correctness". May be it would even make sense to have two
different mappings because of this? One mapping will be used for
allocation of virtual registers and the like and would always return a
legal type and the other will be used just as getTypeToTransformTo() in
LegalizeOp(), ExpandOp() and PromoteOp() and can return also illegal
types?

It's not done simply because there isn't a need for it right now. :slight_smile:

Since I have this need, I'll try to find a solution for this issue and
to provide a patch.

-Roman

This will probably require a slightly more extensive patch to
legalizer. The current mechanism assumes either 1->1 or 1->2
expansion.

Exactly. This is what I meant with "more chellenging":wink: It is assumed
at several places that 1->1 or 2->2 expanstions are taking place. A
generic case is not handled yet.

It also assumes the result of expansion are of legal
types.

Yes. And this is also a reason why it is not too obvious how to handle
f32->f64 promotion and later f64->i64 expansion on targets that support
only f64 soft-floats.

Chris Lattner wrote:

That would be recursively expanded to i64, then to 2x i32 if needed.

I tried to set getTypeToTransformTo(f32) to return f64, even when f64
is illegal type. But this recursive expansion does not take place with
the current legalizer implementation. Currently, it is assumed that the
result of getTypeToTransformTo() is a legal type. For example,
CreateRegForValue tries to create a register of such a promoted type
and fails in the above mentioned case.

All of the issues can be solved by adding the logic to recursively expand operands. They shouldn't be too complicated.

Evan wrote:

That means, you will have to either 1) modify ExpandOp() to
handle cases which need to be recursively expanded or 2) modify it to

return a vector of SDOperand's. Solution one is what I would pursue.

Agreed. I also feel that some sort of recursive expansion is required.

I also have a feeling that getTypeToTransformTo(MVT::ValueType) should
probably also recurse until it finds a type T where
getTypeToTransformTo(T) = T, i.e. it finds a legal type. This would
almost solve the issue with f32->f64 promotion where both FP types are
illegal. The only concern here is that in this case
getTypeToTransformTo(MVT::f32) would return MVT::i64 and therefore the
information about the fact that it should first be promoted to f64 is
lost. The problem is that getTypeToTransformTo() is used for two
"different" goals: to tell which type to use for register mapping and
to tell which type to use for promotions/expansions for the sake of
"type system correctness". May be it would even make sense to have two
different mappings because of this? One mapping will be used for
allocation of virtual registers and the like and would always return a
legal type and the other will be used just as getTypeToTransformTo() in
LegalizeOp(), ExpandOp() and PromoteOp() and can return also illegal
types?

No need to change getTypeToTransformTo(). There is a getTypeToExpandTo() that is expand the type recursively until it find a legal type.

It's not done simply because there isn't a need for it right now. :slight_smile:

Since I have this need, I'll try to find a solution for this issue and
to provide a patch.

Great! There are a few spots where ExpandOp() are called recursively. It would be nice to remove those and use the general expansion facility instead.

Evan

Hi,

I was working on extending soft-float support for handling expansion of
i64 and f64 into i16, i.e. on supporting the expansion of long values
of illegal types into more then two parts. For that I modified
SelectionDAGLowering::getValue() and several other functions.

This seems to work now on my test-cases, but while testing it I ran
into a different problem. I have the impression that I found a bug in
the linear-scan register allocator (local and simple allocators work
just fine on the same code). The problem is that linear scan loops for
ever under certain conditions. May be I overlook something in my target
specific code, but I think that it is wrong in any case if a register
allocator loops for ever.

I attach a part of the log file produced with "llc -debug" where you
can see reg.alloc related bits. Register naming convention is:
rbN - for 8bit registers
rwN - for 16bit registers
rxN - for 32bit registers

I also attach the C source file and the LLVM assembler file for your
convenience.

My personal understanding of what is going on is that it is due to the
incorrect joining of live intervals. If I disable intervals joining by
using --join-liveintervals=false, everything works fine.
According to the log file, what happens during joining is the
following:
1) some of the fixed registers intervals are merged with some virtual
registers intervals
2) later there is a need to spill one of the allocated registers, but
since all joined intervals are FIXED intervals now due to (1), they
cannot be spilled. Therefore, the register allocator loops for ever.

I would be grateful, if someone would confirm that this is a bug. And
of course, it would be very nice if one of the RegAlloc Gurus could fix
it :wink:

-Roman

RegAlloc.txt (17.7 KB)

RegAllocTest.c (121 Bytes)

RegAllocTest.ll (1.15 KB)

following:
1) some of the fixed registers intervals are merged with some virtual
registers intervals
2) later there is a need to spill one of the allocated registers, but
since all joined intervals are FIXED intervals now due to (1), they
cannot be spilled. Therefore, the register allocator loops for ever.

I would be grateful, if someone would confirm that this is a bug. And
of course, it would be very nice if one of the RegAlloc Gurus could fix
it :wink:

This is likely a bug, probably PR711.

Unfortunately, this isn't super easy to fix, I don't have plans to do so in the near future...

-Chris

> following:
> 1) some of the fixed registers intervals are merged with some
virtual
> registers intervals
> 2) later there is a need to spill one of the allocated registers,
but
> since all joined intervals are FIXED intervals now due to (1), they
> cannot be spilled. Therefore, the register allocator loops for
ever.
>
> I would be grateful, if someone would confirm that this is a bug.
And
> of course, it would be very nice if one of the RegAlloc Gurus could
fix
> it :wink:

This is likely a bug, probably PR711.

Unfortunately, this isn't super easy to fix, I don't have plans to do
so in the near future...

OK. I looked at the PR711 at http://llvm.org/bugs/show_bug.cgi?id=711
Indeed, it sounds as the same bug.

Two questions:
1) At least, it would be better if LLVM would crash on an assertion
instead of running for ever in such situations. I think this can be
easily detected, since this is a case where nothing could be spilled.

2) You write in PR711:

This is due to the coallescer coallescing virtregs with both EAX and
EDX, which makes them unavailable to satisfy spills, causing the RA to
run out of registers. We want to coallesce physregs when possible,
but we cannot pin them in the spiller:
we have to be able to >uncoallesce them.

First of all, I totally agree with "we have to be able to uncoallesce
them". Linear Scan already does a backtracking in some situations. I'm
wondering, if it is very difficult to implement the logic, where when
it detects that it runs for ever (i.e. nothing can be spilled for some
reason) it basically backtracks completely, does coallescing again but
ignoring some selected virtual/physical registers (or any attempts to
coalesce with physregs) and tries to allocate the whole function again?
Alternatively, may be it would be possible to rerun the linear-scan
pass without interval joining on a given function? This will probably
produce a worse code, but it is better then crashing or looping for
ever.

this isn't super easy to fix

OK. Do you see any further problems that I have not mentioned above?
Can you elaborate a bit?

-Roman

This is likely a bug, probably PR711.

Unfortunately, this isn't super easy to fix, I don't have plans to do
so in the near future...

OK. I looked at the PR711 at 711 – [linscan] Coallescing physregs with virtregs should allow spilling the virtreg portion
Indeed, it sounds as the same bug.

Two questions:
1) At least, it would be better if LLVM would crash on an assertion
instead of running for ever in such situations. I think this can be
easily detected, since this is a case where nothing could be spilled.

Yes, this is certainly desirable.

2) You write in PR711:

This is due to the coallescer coallescing virtregs with both EAX and
EDX, which makes them unavailable to satisfy spills, causing the RA to
run out of registers. We want to coallesce physregs when possible,
but we cannot pin them in the spiller:
we have to be able to >uncoallesce them.

First of all, I totally agree with "we have to be able to uncoallesce
them". Linear Scan already does a backtracking in some situations. I'm
wondering, if it is very difficult to implement the logic, where when
it detects that it runs for ever (i.e. nothing can be spilled for some
reason) it basically backtracks completely, does coallescing again but
ignoring some selected virtual/physical registers (or any attempts to
coalesce with physregs) and tries to allocate the whole function again?
Alternatively, may be it would be possible to rerun the linear-scan
pass without interval joining on a given function? This will probably
produce a worse code, but it is better then crashing or looping for
ever.

Without being able to detect this situation, we can't recover from it. Because coallescing and linscan are two different passes, we can't really undo both of them. Once coallsecing is done, it's done.

this isn't super easy to fi

OK. Do you see any further problems that I have not mentioned above?
Can you elaborate a bit?

Fixing this properly requires changing the way we represent coallesced registers and how the RA acts on it. In particular, right now, when we coallesce a physreg with a virtreg, we completely lose information about what the code looked like before the coallescing occurred. Instead of treating coallescing like this, it would be better to treat coallescing with a physreg as a hint that we would prefer the virtreg to be allocated to the physreg, not as a command.

-Chris