ReduceLoadWidth, DAGCombiner and non 8bit loads/extloads question.

I’m curious about this code in ReduceLoadWidth (and in DAGCombiner in general):

if (LegalOperations && !TLI.isLoadExtLegal(ExtType, ExtVT))
return SDValue();

LegalOperations is false for the first pre-legalize pass and true for the post-legalize pass. The first pass is target-independent yes? So that makes sense.

The issue we are having is this: we don’t support 8 bit loads and we don’t support 8 bit extloads, so we end up with LD1 with zext after either the first pass or the second pass (depending on the test case). If we add the TargetLowering callback method setLoadExtAction(ISD::ZEXTLOAD, MVT::i8, Expand) then it crashes during legalization and if we don’t have that in then it crashes during instruction selection.

There are two ways to fix this:

  1. Add the setLoadExtAction AND comment out ‘LegalOperations &&’ in the conditional. (this solves the problem)

  2. Create a custom expand to undo the optimization added by ReduceLoadWidth.

The 2nd approach seems more in line with what LLVM infrastructure wants but it seems silly to have to undo an optimization?

Essentially, we have some bit packing structures and the code is trying to get the upper bits. The initial dag generates an LD2 with srl (which makes sense, it’s what we want). The DAGCombiner then goes in and changes that LD2 with srl to an LD1 zextload, which we don’t support.

Why is LegalOperations really needed here? What is the purpose and point of this? It seems you could eliminate this and be all the better for it.

Thanks.

I'm curious about this code in ReduceLoadWidth (and in DAGCombiner in
general):

if (LegalOperations && !TLI.isLoadExtLegal(ExtType, ExtVT))
   return SDValue();

LegalOperations is false for the first pre-legalize pass and true for the
post-legalize pass. The first pass is target-independent yes? So that makes
sense.

The issue we are having is this: we don't support 8 bit loads and we don't
support 8 bit extloads, so we end up with LD1 with zext after either the
first pass or the second pass (depending on the test case). If we add the
TargetLowering callback method setLoadExtAction(ISD::ZEXTLOAD, MVT::i8,
Expand) then it crashes during legalization

This part is surprising. What happens? This seems to me like the
correct solution.

I'm guessing it's because there's no good way to legalize 8 bit loads?
I have no idea what happens when those are illegal, as I'd expect
them to be always available?

Here's a cheap hack though: I see that ReduceLoadWidth is predicated on the
TLI "shouldReduceLoadWidth" hook. Did you try overriding that to
avoid creating 8 bit loads?

and if we don't have that in
then it crashes during instruction selection.

There are two ways to fix this:

1) Add the setLoadExtAction AND comment out 'LegalOperations &&' in the
conditional. (this solves the problem)

2) Create a custom expand to undo the optimization added by ReduceLoadWidth.

The 2nd approach seems more in line with what LLVM infrastructure wants but
it seems silly to have to undo an optimization?

Essentially, we have some bit packing structures and the code is trying to
get the upper bits. The initial dag generates an LD2 with srl (which makes
sense, it's what we want). The DAGCombiner then goes in and changes that LD2
with srl to an LD1 zextload, which we don't support.

Why is LegalOperations really needed here? What is the purpose and point of
this? It seems you could eliminate this and be all the better for it.

FWIW I somewhat agree, and believe this is a common "problem": we
eagerly generate obviously-illegal nodes, because we're before
legalisation, so it's OK, right? Except, it's sometimes hard to
recover from. Most of the time, it's a good thing, precisely because
it catches patterns that would be disturbed by legalization.

Did you try running the integration tests - at least X86 - after
changing the condition?

-Ahmed

  1. It’s crashing because LD1 is produced due to LegalOperations=false in pre-legalize pass. Then Legalization does not know how to handle it so it asserts on a default case. I don’t know if it’s a reasonable expectation or not but we do not have support for it. I have not tried overriding shouldReduceLoadWidth.

  2. I see, that makes sense to some degree, I’m curious if you can provide an example? It doesn’t seem good to generate something pre-legalize (target-independent) that you can’t then handle when you find that it’s illegal in the very next step that is legalization. I’m guessing, from what you and others have said, that it’s just expected that every machine will support 8-bit extloads and if not then the target should handle the undoing of the target-independent generated ‘optimization’ via LegalizeDAG and TargetLowering callback methods?

  3. I have not tried running it on any other target, sorry, I suppose I should but I didn’t want to take the time if I’m not following the right path here.

Thanks.

We’d love to hear what the community thinks would be the ‘best’ solution. We are trying not to change any ‘core’ code.

Ahmed,

It looks like shouldReduceLoadWidth is the best solution, since we can create override and just return false if the new size is 8, this should avoid generating LD1 in this case. I’m also seeing LD1 being generated by other opt functions in DAGCombiner, so I’m not sure this is a full solution for us, unless they are also generating LD1 via ReduceLoadWidth, I’m not sure yet.

Though we are not yet using the version of LLVM that has the shouldReduceLoadWidth virtual function.

These LD1 extloads are generated independent of ReduceLoadWidth (ie by other DAGCombiner functions) so shouldReduceLoadWidth alone does not solve this issue it seems?

+Chandler, Hal, Owen, who - among others - know much more than I do.

1) It's crashing because LD1 is produced due to LegalOperations=false in
pre-legalize pass. Then Legalization does not know how to handle it so it
asserts on a default case.

Yes, and where, and on what, does the assert fire? 8-bit load
legalization I assume?

I don't know if it's a reasonable expectation or
not but we do not have support for it. I have not tried overriding
shouldReduceLoadWidth.

2) I see, that makes sense to some degree, I'm curious if you can provide an
example? It doesn't seem good to generate something pre-legalize
(target-independent) that you can't then handle when you find that it's
illegal in the very next step that is legalization.

I believe the general idea is one of separation of concerns: the way
I see it, the DAGCombiner's job is to canonicalize so that other
(perhaps target-specific) transformations only need to handle the
canonical representation. Usually, that means optimizing code: stuff
like e.g. (sub x, x) shouldn't be special-cased in each target (at
least not on ISD nodes ;))
The legalizer (or rather, the various legalizers) later does its own
job, legalizing, after which all code must be "legal". The further
rounds of DAGCombining just maintain that invariant.

So really, this is a legalization problem: for instance, is it not
possible for the SelectionDAGBuilder to generate 8-bit loads as well?
In this specific case, I'd rather we avoid the problem altogether, and
would be fine with not doing this kind of transform if the resulting
loads is illegal (and one might argue for the same in all the other
"!LegalOperations" combines).

But the actual solution seems to be with the load legalization: I
wouldn't be surprised if many other spots assumed 8-bit loads are
legal (as you noticed after disabling ReduceLoadWidth?), and this is
your real problem.

Take this with a grain of salt though, I only ever look at
bog-standard X86-ish targets. The cc'd people might have a better
argument.

-Ahmed

Ahmed,

Yes, this is the case, I’m sure many other ‘spots’ in DAGCombiner use this same check or use a similar check with LegalOperations. It just seems like bad form to have core code that generates an illegal node that legalization cannot seem to handle, unless I’m missing something, which is entirely possible. Potentially we are using the wrong LegalAction, though each I’ve tried breaks at different points so I don’t think that’s it.

Yes, it is breaking during the legalize phase, depending on which TargetLowering callback method we use. For example, Custom will let it through to instructions selection, which it breaks at the that phase, otherwise I believe it breaks during legalization. If we use Expand instead, the assert during Legalize is: “EXTLOAD should always be supported”. I don’t really understand that message :slight_smile:

The pre-legalization canonicalization makes sense to me but either 1) legalize should be able to handle everything or 2) nodes that aren’t going to be legal anyways should not be produced (though this obviously contradicts the term pre-legalize).

The SelectionDAGBuilder does not generate 8 bit loads nor 8 bit exloads.

There are other places we could fix this issue, accepting 8 bit extloads and then expanding it back further down the pipe, but this does seem a bit hacky.

Thanks.

Ahmed,

Yes, this is the case, I'm sure many other 'spots' in DAGCombiner use this
same check or use a similar check with LegalOperations. It just seems like
bad form to have core code that generates an illegal node that legalization
cannot seem to handle, unless I'm missing something, which is entirely
possible. Potentially we are using the wrong LegalAction, though each I've
tried breaks at different points so I don't think that's it.

Yes, it is breaking during the legalize phase, depending on which
TargetLowering callback method we use. For example, Custom will let it
through to instructions selection, which it breaks at the that phase,
otherwise I believe it breaks during legalization. If we use Expand instead,
the assert during Legalize is: "EXTLOAD should always be supported". I don't
really understand that message :slight_smile:

Keep in mind "EXTLOAD" usually means "load, possibly followed by an
extension". So, the "EXT" part is probably irrelevant here, if that's
what's bugging you :wink:

The pre-legalization canonicalization makes sense to me but either 1)
legalize should be able to handle everything or 2) nodes that aren't going
to be legal anyways should not be produced (though this obviously
contradicts the term pre-legalize).

  The SelectionDAGBuilder does not generate 8 bit loads nor 8 bit exloads.

There are other places we could fix this issue, accepting 8 bit extloads
and then expanding it back further down the pipe, but this does seem a bit
hacky.

Is there even a general way to do this? Say a 2-byte load, from an
address right before a page boundary, (or the end of your address
space, if pages don't make sense on your target), was transformed into
a 1-byte load, from the last address. You can't just turn it into a
2-byte load, you need to also adjust the pointer. If you want to just
legalize a 1-byte loads into 2-byte, you need to know whether to do it
"up" or "down", no? If you just have a 1-byte load with no
information on the address, you can't do that? Or am I missing
something?

For a concrete example, consider:

    (and (load i16* %p), 0xFF00)

turning into:

    (load i8* (%p + 1))

How do you legalize that? The obvious thing would be:

    (and (load i16* (%p + 1)), 0xFF)

But if %p is dynamically (~(uintptr_t)0), that's a bad idea.

So, in some cases you'd need to turn it back into

    (and (load i16* ((%p + 1) - 1)), 0xFF00)

But by the time you're legalizing, you lost the "some cases" information.

That's what the assert is saying, and the core problem with the
legalization, no? What does it even mean to legalize a 1-byte load?

-Ahmed

Nevermind, grepping around shows this is specifically about
ISD::EXTLOAD, in LegalizeLoadOps (LegalizeDAG.cpp).

There's some code above, with an "isTypeLegal(SrcVT)" check, that
tries to turn an EXTLOAD into LOAD+[SZ]EXT. I'm guessing that on your
target, both the EXTLOAD from i8 and the i8 type are illegal?

In that case, again, I don't know how one could legalize this.

-Ahmed

Ahmed,

Yes, we do not have an 8 bit type and do not support 8 bit loads/extloads.

For your first post, I imagine that anything that the DAGCombiner does it could undo EXCEPT deciding to opt to a type that is not allowed, but that’s the design choice made by LLVM: to allow illegal operations AND types during the pre-legalize phase, yes? So this is either a bug (design flaw) or maybe someone more knowledgeable about a way to get around this without having to change core code that cannot be put into off. release will chime in. :slight_smile:

Yes, via Expand the assert is issued during LegalizeLoadOps (since it’s trying to legalize the 8 bit extload because the hardware does not support it).

The EXT part is not bugging me, I suppose it’s possible that the DAGCombiner could optimize to just an 8-bit load, but we’ve only seen this opt if it’s going to an extload, this is why I keep mentioning it. 8 bit loads (NON_EXTLOAD) is also illegal on our machine. We do not support any type of load that is 8 bit. Does LLVM also feel that 8 bit NON_EXTLOAD should always be supported also? For example, the Legalize can expand an EXTLOAD to NON_EXTLOAD and EXT but that would not help either, since we don’t support 8 bit loads.

I would just like to know the cleanest possible solution to this issue within LLVM infrastructure, assuming it’s possible with the current ‘core’ code?

Thanks.

Ahmed,

Yes, we do not have an 8 bit type and do not support 8 bit loads/extloads.

  For your first post, I imagine that anything that the DAGCombiner does it
could undo EXCEPT deciding to opt to a type that is not allowed,

No, I think the SelectionDAG legalization should be able to "undo" any
illegal type as well. For loads/stores, this usually means making the
memory size bigger. For instance, an i4 store would be legalized to
use i8, and that's what getStoreSize is for. LLVM assumes 8bit
byte-addressing though, so that's fine. Again, you can't really
change the number of bytes addressed though, so you can't do much
here.

but that's
the design choice made by LLVM: to allow illegal operations AND types during
the pre-legalize phase, yes

In general, yes.

So this is either a bug (design flaw) or maybe
someone more knowledgeable about a way to get around this without having to
change core code that cannot be put into off. release will chime in. :slight_smile:

Yes, via Expand the assert is issued during LegalizeLoadOps (since it's
trying to legalize the 8 bit extload because the hardware does not support
it).

The EXT part is not bugging me, I suppose it's possible that the
DAGCombiner could optimize to just an 8-bit load, but we've only seen this
opt if it's going to an extload, this is why I keep mentioning it. 8 bit
loads (NON_EXTLOAD) is also illegal on our machine. We do not support any
type of load that is 8 bit. Does LLVM also feel that 8 bit NON_EXTLOAD
should always be supported also? For example, the Legalize can expand an
EXTLOAD to NON_EXTLOAD and EXT but that would not help either, since we
don't support 8 bit loads.

I would just like to know the cleanest possible solution to this issue
within LLVM infrastructure, assuming it's possible with the current 'core'
code?

Well, I can come up with a bunch of hacks to avoid messing with upstream code =)

If you have an answer to the "how do you legalize 8-bit loads?"
problem I mentioned (perhaps MMOs provide enough information), then
the easiest way would be to just do so in your target. Perhaps a
target-specific DAGCombine would work, or marking loads as "Custom".

Another solution would be to immediately DAGCombine (again,
target-specific code, which IIRC run before target-independent
DAGCombines) loads into target-specific loads. This means you'd
sacrifice some of the target-independent combines on ISD::LOAD, but
you can avoid that by turning your target-specific XXXISD::LOAD nodes
into target-independent counterparts in another DAGCombine, right
after legalization.

You can also play whack-a-mole with such DAGCombines (there isn't that
much of them on LOAD/STOREs, other operations should be easy to
promote/expand) and submit patches; I don't think we can justify
keeping them as is if we clearly don't support the result.

-Ahmed

The load width reduction also happens in TargetLowering::SimplifySetCC, but for some reason is not guarded by the hook.

We had a similar problem in the Hexagon backend (but for a different reason) and we had to disable the shortening of loads in both places.

-Krzysztof

Thanks for the reply:

So should LLVM continue to assume 8-bit byte addressing? It would be nice, not only to us but potential future machines, to have a permanent fix to this assumption? This sounds reasonable yes?

Marking them as Custom in XXXISelLowering still produces error, the pre-legalize phase is still going to opt to LD1 since it’s not caring about target-specifics. Then, again, we’ll be stuck trying to undo the addressing.

I see what you mean about the DAGCombiner, that’s a two part solution, first turning them into target-specific before any opts in pre-legalize and then back before any opts in post-legalize. Certainly a potential solution, though we may lose some optimization opportunities, even on valid loads, as you mention.

The ultimate issue here, for us, is that LLVM makes the assumption that every machine is going to support 8-bit byte-addressing. I’m not sure this is a solid or even reasonable assumption going forward?

Possibly, we could just pseudo support it and expand it later, but the addressing would still be an issue, I believe.

Thanks.

Thanks for the reply:

So should LLVM continue to assume 8-bit byte addressing? It would be nice,
not only to us but potential future machines, to have a permanent fix to
this assumption? This sounds reasonable yes?

Marking them as Custom in XXXISelLowering still produces error, the
pre-legalize phase is still going to opt to LD1 since it's not caring about
target-specifics. Then, again, we'll be stuck trying to undo the addressing.

I see what you mean about the DAGCombiner, that's a two part solution, first
turning them into target-specific before any opts in pre-legalize and then
back before any opts in post-legalize. Certainly a potential solution,
though we may lose some optimization opportunities, even on valid loads, as
you mention.

The ultimate issue here, for us, is that LLVM makes the assumption that
every machine is going to support 8-bit byte-addressing. I'm not sure this
is a solid or even reasonable assumption going forward?

Short version: Patches welcome :wink: Long version: in general, if
something is not supported yet, it means no one needed it, so no one
put the work into it. If you're willing to do so, I'd approve of
fixing the various parts that assume this (if it's only DAGCombines
that's fine). Then there's the problem of how to test this in-tree
when no in-tree target has these problems (Krzysztof mentions Hexagon,
do you both have the same problem?). But that's another conversation!

Possibly, we could just pseudo support it and expand it later, but the
addressing would still be an issue, I believe.

You might want to look into MachineMemOperands, which tracks the
memory object a load/store points into. It could help with the
expansion.

-Ahmed

Ahmed,

Ok, thanks. I’m not sure at this point the extend of LLVM’s dependence on the requirement of 8-bit byte-addressing and how extensive that patch would have to be.

If we made the changes, we would keep it local to us until at which time we could add our target, I would presume anyways, though that’s not exactly my call, though even so I would think the more important issue for testing in-tree would be to make sure it didn’t break any existing tests on existing targets but yes, that too. :slight_smile:

Thanks again.

Hi Ryan,

We maintain internally a number of patches (towards trunk) to support 16-bit bytes in LLVM (not Clang). I can provide them upon request. Disclaimer: They work for us. Some are good, others ugly or incomplete.

/Patrik Hägglund