[GlobalISel] Narrowing uneven/non-pow-2 types

Hi all,

recently when working with GlobalISel we have often encountered cases in the legalizer where instructions could not be narrowed because the narrowing code relies on G_UNMERGE_VALUES and therefore requires the source type to be a multiple of the narrow type. Often times these instructions can be widened without any problem to a fitting type.

This has us writing legalization rules like `.widenScalarToNextPow2(0, /*MinSize*/ 32).maxScalar(0, s32)` instead of the much simpler `.clampScalar(0, s32 ,s32)`.

Although this works and has the desired effect, we feel like that such a rule requires internal knowledge of the legalizer, which can change at any point in the future. Ideally we would only want to say `clampScalar` and let the legalizer handle the rest, even if it requires extending the type before it can be narrowed.

We were now wondering if such a behavior in the legalizer is even desired? I.e. that an instruction is extended to an easier-to-handle type even though narrowing was requested (as long as the narrowing is done in the end)?

Would be great to get some feedback so we know whether we can start adding support for some instructions or if we just need to rewrite some of our legalization rules using the internal knowledge.

Regards,

Dominik

Hi all,

recently when working with GlobalISel we have often encountered cases in the legalizer where instructions could not be narrowed because the narrowing code relies on G_UNMERGE_VALUES and therefore requires the source type to be a multiple of the narrow type. Often times these instructions can be widened without any problem to a fitting type.

This has us writing legalization rules like `.widenScalarToNextPow2(0, /*MinSize*/ 32).maxScalar(0, s32)` instead of the much simpler `.clampScalar(0, s32 ,s32)`.

Although this works and has the desired effect, we feel like that such a rule requires internal knowledge of the legalizer, which can change at any point in the future. Ideally we would only want to say `clampScalar` and let the legalizer handle the rest, even if it requires extending the type before it can be narrowed.

Being able to specify the order of specific legalization steps is pretty useful, and one area I found frustrating in SelectionDAG. For example, we do use custom lowering in some places to hack around the order that operation expansion and legalization is done. In this particular case, another helper may be useful, since forcing a power of 2 and bounding the size are two distinct constraints.

We were now wondering if such a behavior in the legalizer is even desired? I.e. that an instruction is extended to an easier-to-handle type even though narrowing was requested (as long as the narrowing is done in the end)?

I would need more context on this question, and what specifically you are hitting. The current set of implemented legalization rules is fairly incomplete, and not all operations use a consistent strategy

-Matt

Hi Matt,

thanks for responding. I left a couple of comments down below.

Hi all,

recently when working with GlobalISel we have often encountered cases in the legalizer where instructions could not be narrowed because the narrowing code relies on G_UNMERGE_VALUES and therefore requires the source type to be a multiple of the narrow type. Often times these instructions can be widened without any problem to a fitting type.

This has us writing legalization rules like `.widenScalarToNextPow2(0, /*MinSize*/ 32).maxScalar(0, s32)` instead of the much simpler `.clampScalar(0, s32 ,s32)`.

Although this works and has the desired effect, we feel like that such a rule requires internal knowledge of the legalizer, which can change at any point in the future. Ideally we would only want to say `clampScalar` and let the legalizer handle the rest, even if it requires extending the type before it can be narrowed.

Being able to specify the order of specific legalization steps is pretty useful, and one area I found frustrating in SelectionDAG. For example, we do use custom lowering in some places to hack around the order that operation expansion and legalization is done. In this particular case, another helper may be useful, since forcing a power of 2 and bounding the size are two distinct constraints.

Hm, I see. That makes a lot of sense. What could such a helper look like? I was thinking about something along the lines of clampScalarWithWiden or narrowScalarWithWiden (if you have better suggestions, I would highly appreciate it) which would be a short-hand for a widen to the next multiple of the narrow type and narrowScalar.

My main concern here is that if at one point in the future narrowing for an uneven type suddenly gets supported, I would rather have the legalizer skip the widen, which would not work with such a helper, as far as I'm aware.

I guess what I would actually prefer is a helper which first tries to narrow and if it doesn't work uses widen before trying again.

We were now wondering if such a behavior in the legalizer is even desired? I.e. that an instruction is extended to an easier-to-handle type even though narrowing was requested (as long as the narrowing is done in the end)?

I would need more context on this question, and what specifically you are hitting. The current set of implemented legalization rules is fairly incomplete, and not all operations use a consistent strategy

Basically we have an architecture which only works on 32-bit types and we are starting to hit many of the edge cases. For example this particular question arose because we are seeing the following LLVM-IR, which we cannot legalize with our current legalization rules:

 %6 = zext i32 %5 to i33
 %7 = zext i32 %0 to i33
 %8 = mul i33 %6, %7
 %9 = lshr i33 %8, 1
 %10 = trunc i33 %9 to i32

getActionDefinitionsBuilder(G_MUL)
.legalFor({s32})
.clampScalar(0, s32, s32);

getActionDefinitionsBuilder(G_LSHR)
.legalFor({{s32, s32}})
.clampScalar(1, s32, s32)
.clampScalar(0, s32, s32);

We would be able to legalize the above code if we just add a widenScalarToNextPow2 before the clampScalar in both cases. However we only know that this will work because we looked at the narrowing code and we are now having an internal discussion if our legalization rules should rely on such implementation details. Furthermore, this then raises the question if other instructions also need this widen-before-narrow pattern because a straight narrow action would not work, even though we're not seeing any actual failures due to them so far.

I hope that clarifies the context of this problem.

-Matt

Regards,

Dominik

Hi Dominik,

I'm way behind on llvm-dev at the moment but I'm slowly catching up and came across your post.

Hi all,

recently when working with GlobalISel we have often encountered cases in the legalizer where instructions could not be narrowed because the narrowing code relies on G_UNMERGE_VALUES and therefore requires the source type to be a multiple of the narrow type. Often times these instructions can be widened without any problem to a fitting type.

This has us writing legalization rules like `.widenScalarToNextPow2(0, /*MinSize*/ 32).maxScalar(0, s32)` instead of the much simpler `.clampScalar(0, s32 ,s32)`.

Although this works and has the desired effect, we feel like that such a rule requires internal knowledge of the legalizer, which can change at any point in the future. Ideally we would only want to say `clampScalar` and let the legalizer handle the rest, even if it requires extending the type before it can be narrowed.

I agree that this is exploiting internal knowledge. clampScalar() (and the underlying narrowScalar()) is supposed to do the right thing by itself. It can be useful to direct the legalizer to make something less-legal before converging on a legal result but in this case I think you're working around a bug in narrowScalar().

It sounds like you're hitting cases where narrowScalar either gives up or emits invalid code. Either way, it's a bug as (for the scalar case from your example) it's intended to be able to handle any scalar to any smaller scalar. It's technically valid for narrowScalar to emit an intermediate widen using a different operation. That's something that has to be done very carefully though because of the risk of infinite loops where the ruleset and the implementation conflict. That said, I'd expect it to be safe w.r.t infinite loops if the widen is done with G_ANYEXT because G_ANYEXT/G_TRUNC have some mandatory legality and we already rely on being able to freely widen/narrow with them.

We were now wondering if such a behavior in the legalizer is even desired? I.e. that an instruction is extended to an easier-to-handle type even though narrowing was requested (as long as the narrowing is done in the end)?

The big danger of this is that you end up with infinite loops in the ruleset. For the case in your example you're safe as long as later rules never produce a non-power-of-2 type.

Hi Matt,

thanks for responding. I left a couple of comments down below.

Hi all,

recently when working with GlobalISel we have often encountered cases in the legalizer where instructions could not be narrowed because the narrowing code relies on G_UNMERGE_VALUES and therefore requires the source type to be a multiple of the narrow type. Often times these instructions can be widened without any problem to a fitting type.

This has us writing legalization rules like `.widenScalarToNextPow2(0, /*MinSize*/ 32).maxScalar(0, s32)` instead of the much simpler `.clampScalar(0, s32 ,s32)`.

Although this works and has the desired effect, we feel like that such a rule requires internal knowledge of the legalizer, which can change at any point in the future. Ideally we would only want to say `clampScalar` and let the legalizer handle the rest, even if it requires extending the type before it can be narrowed.

Being able to specify the order of specific legalization steps is pretty useful, and one area I found frustrating in SelectionDAG. For example, we do use custom lowering in some places to hack around the order that operation expansion and legalization is done. In this particular case, another helper may be useful, since forcing a power of 2 and bounding the size are two distinct constraints.

Hm, I see. That makes a lot of sense. What could such a helper look like? I was thinking about something along the lines of clampScalarWithWiden or narrowScalarWithWiden (if you have better suggestions, I would highly appreciate it) which would be a short-hand for a widen to the next multiple of the narrow type and narrowScalar.

This sounds like it's really just blessing the workaround which I'm not really keen on as a solution. I'd rather narrowScalar() was taught better ways to implement s33->s32.

Hi Matt,

thanks for responding. I left a couple of comments down below.

Hi all,

recently when working with GlobalISel we have often encountered cases in the legalizer where instructions could not be narrowed because the narrowing code relies on G_UNMERGE_VALUES and therefore requires the source type to be a multiple of the narrow type. Often times these instructions can be widened without any problem to a fitting type.

This has us writing legalization rules like `.widenScalarToNextPow2(0, /*MinSize*/ 32).maxScalar(0, s32)` instead of the much simpler `.clampScalar(0, s32 ,s32)`.

Although this works and has the desired effect, we feel like that such a rule requires internal knowledge of the legalizer, which can change at any point in the future. Ideally we would only want to say `clampScalar` and let the legalizer handle the rest, even if it requires extending the type before it can be narrowed.

Being able to specify the order of specific legalization steps is pretty useful, and one area I found frustrating in SelectionDAG. For example, we do use custom lowering in some places to hack around the order that operation expansion and legalization is done. In this particular case, another helper may be useful, since forcing a power of 2 and bounding the size are two distinct constraints.

Hm, I see. That makes a lot of sense. What could such a helper look like? I was thinking about something along the lines of clampScalarWithWiden or narrowScalarWithWiden (if you have better suggestions, I would highly appreciate it) which would be a short-hand for a widen to the next multiple of the narrow type and narrowScalar.

This sounds like it's really just blessing the workaround which I'm not really keen on as a solution. I'd rather narrowScalar() was taught better ways to implement s33->s32.

Shortly after the last email in this thread I came across the extractParts and insertParts helper which use unmerge/extract to extract the requested size plus any left-overs (and vice-versa). I think that most of the instructions we care about with our backend could be rewritten to use them. We already have a couple of them (mainly arithmetics) on our TODO list but unfortunately I have other things to take care of first.