GlobalISel legalization artifact legalization

Hi,

I’m trying to handle some vector operations with splitting/scalarization and keep running into similar sorts of issues which are making me question the intended function of the various legalization operations (particularly G_MERGE_VALUES/G_UNMERGE_VALUES, but also G_EXTRACT/G_INSERT and conversion instructions) and what the contract between the legalizer and selector actually is.

For scalar values, things seem clearer, but I’m still confused. The AArch64 selector code seems content to allow strange sized values in the source of G_EXTRACT/G_SEXT etc., and then just sets the register class, as the type information isn’t really needed anymore. However, it also defines a more restricted set of legal types, but the legalizations are not implemented. Similarly, quite a lot of legalization rules are defined for merge/unmerge (an ~88 line block), but again none of these legalizations seem to actually be implemented. For example, it specifies that vector types with < 8-bit elements should be scalarized.

For vectors it’s less clear to me what to do. For example, I’ve looked at implementing widenScalar for G_UNMERGE_VALUES. This in turn ends up defined as an extend of the vector type, which in turn will be implemented with another G_UNMERGE_VALUES which needs to be split, so it’s a legalization loop producing an infinite number of instructions. It’s also not really clear to me what it means to scalarize a G_UNMERGE_VALUES, other than to rewrite the def instruction of the source to produce scalar values. Rewriting as a series of extract_vector_elt on the source just runs into the same issue that nothing is actually dealing with the problematic vector type. Is this something the legalizer should really be doing? For AMDGPU essentially any vector type needs to be eliminated (especially for something like an s1 vector), but it’s not clear to me how this should really happen if legalization just keeps moving the vector type to a different unmerge source.
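
As a rough illustration of the loop described above, here is a toy worklist model. The rewrite rules, the `RULES` table and the `legalize` function are all hypothetical and only mimic the shape of the problem; they are not LLVM's actual LegalizerHelper logic. Widening the unmerge introduces a vector extend, and scalarizing that extend reintroduces the original unmerge, so the worklist never drains.

```python
from collections import deque

# Hypothetical rewrite rules keyed by (opcode, source type). Anything not
# listed is treated as legal. These are NOT LLVM's real legalization rules.
RULES = {
    # widenScalar on the unmerge: extend the source vector first,
    # then unmerge the wider vector.
    ("G_UNMERGE_VALUES", "<2 x s1>"): [
        ("G_ANYEXT", "<2 x s1>"),
        ("G_UNMERGE_VALUES", "<2 x s32>"),
    ],
    # Scalarizing the vector extend needs the elements of the narrow source,
    # which brings back the unmerge we started with.
    ("G_ANYEXT", "<2 x s1>"): [
        ("G_UNMERGE_VALUES", "<2 x s1>"),
        ("G_ANYEXT", "s1"),
        ("G_ANYEXT", "s1"),
        ("G_BUILD_VECTOR", "s32"),
    ],
}


def legalize(start, max_steps=100):
    """Run the toy worklist; return (converged, instructions_created)."""
    worklist = deque([start])
    created = 0
    while worklist:
        if created >= max_steps:
            return False, created  # gave up: the rules keep feeding each other
        inst = worklist.popleft()
        replacement = RULES.get(inst)
        if replacement is None:
            continue  # already legal in this toy model
        created += len(replacement)
        worklist.extend(replacement)
    return True, created
```

Calling `legalize(("G_UNMERGE_VALUES", "<2 x s1>"))` hits the step cap instead of converging, while a start instruction with no rule (for example the `<2 x s32>` unmerge) converges immediately.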

My questions are:

  1. What is AArch64 doing with all of its legalization rules when they don’t seem to actually be implemented?
  2. Is the target selector supposed to be responsible for handling G_UNMERGE_VALUES/G_MERGE_VALUES/G_EXTRACT/G_INSERT/G_SEXT/G_ZEXT/G_ANYEXT for any strange type source, assuming the extra high bits over the LLT type are undefined or appropriately extended?
  3. For vectors, is any arbitrary vector only supposed to be legalized to a certain number of elements, and the target is supposed to treat this as N registers with possibly undefined high bits in NumRegBits - TypeScalarSize?
  4. Should the legalizer for some of these operations really be triggering legalization of the def instruction of the input instead?

-Matt

+Daniel & Justin

Hi Matt,

Hi,

I’m trying to handle some vector operations with splitting/scalarization and keep running into similar sorts of issues which are making me question the intended function of the various legalization operations (particularly G_MERGE_VALUES/G_UNMERGE_VALUES, but also G_EXTRACT/G_INSERT and conversion instructions) and what the contract between the legalizer and selector actually is.

For scalar values, things seem clearer, but I’m still confused. The AArch64 selector code seems content to allow strange sized values in the source of G_EXTRACT/G_SEXT etc., and then just sets the register class, as the type information isn’t really needed anymore. However, it also defines a more restricted set of legal types, but the legalizations are not implemented. Similarly, quite a lot of legalization rules are defined for merge/unmerge (an ~88 line block), but again none of these legalizations seem to actually be implemented. For example, it specifies that vector types with < 8-bit elements should be scalarized.

You’re right, at the moment there is a mismatch between what the legalizer says is legal and what we actually support in terms of instruction selection. While it would be nice if the AArch64 implementation were a gold standard that other targets could use as a practical reference, we aren’t there yet. For G_MERGE/UNMERGE specifically, some of those rules are vestigial remnants of a more powerful merge/unmerge op that I intend to clean up.

For vectors it’s less clear to me what to do. For example, I’ve looked at implementing widenScalar for G_UNMERGE_VALUES. This in turn ends up defined as an extend of the vector type, which in turn will be implemented with another G_UNMERGE_VALUES which needs to be split, so it’s a legalization loop producing an infinite number of instructions.

I’m not sure widenScalar of a vector G_UNMERGE_VALUES really makes sense. The source operand is a vector type after all. It seems you’re looking to scalarize this operation, so I think fewerElementsVector is the more appropriate choice.

It’s also not really clear to me what it means to scalarize a G_UNMERGE_VALUES, other than to rewrite the def instruction of the source to produce scalar values. Rewriting as a series of extract_vector_elt on the source just runs into the same issue that nothing is actually dealing with the problematic vector type. Is this something the legalizer should really be doing? For AMDGPU essentially any vector type needs to be eliminated (especially for something like an s1 vector), but it’s not clear to me how this should really happen if legalization just keeps moving the vector type to a different unmerge source.

In theory, once the legalizer has finished, every generic instruction entering the isel phase should be individually selectable. G_UNMERGE etc. are mostly legalizer artifacts, and we expect them to be cleaned up most of the time by the legalizer artifact combiner; however, if they aren’t, then the target must still be able to select them independently. We document a second meaning of “legal” on this (now outdated) page: https://llvm.org/docs/GlobalISel.html#legalizer
Specifically: "a legal instruction is defined as selectable & operating on vregs that can be loaded and stored – if necessary, the target can select a G_LOAD/G_STORE of each gvreg operand."
So the expectation is that in the worst case, the target should use loads and stores to implement those operations.
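
To make the load/store fallback concrete, here is a minimal sketch of an unmerge done through memory. It is a toy model under stated assumptions: a little-endian layout, byte-sized elements, and a hypothetical helper name `unmerge_via_stack`; it does not correspond to any existing LLVM function.

```python
def unmerge_via_stack(value, total_bits, elt_bits):
    """Split `value` (total_bits wide) into elt_bits-wide pieces via memory."""
    # Memory is byte-addressable, so this sketch only covers byte-sized pieces.
    assert elt_bits % 8 == 0 and total_bits % elt_bits == 0
    # "G_STORE" of the whole source value to a stack slot (little-endian).
    slot = value.to_bytes(total_bits // 8, "little")
    step = elt_bits // 8
    # One "G_LOAD" per result; the first result holds the lowest bits.
    return [int.from_bytes(slot[i:i + step], "little")
            for i in range(0, len(slot), step)]
```

For example, `unmerge_via_stack(0x44332211, 32, 8)` yields the four bytes from low to high. Note that the sketch cannot express sub-byte elements such as s1, which would need some widened in-memory representation.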

My questions are:

  1. What is AArch64 doing with all of its legalization rules when they don’t seem to actually be implemented?
  2. Is the target selector supposed to be responsible for handling G_UNMERGE_VALUES/G_MERGE_VALUES/G_EXTRACT/G_INSERT/G_SEXT/G_ZEXT/G_ANYEXT for any strange type source, assuming the extra high bits over the LLT type are undefined or appropriately extended?

To reiterate the earlier point, there is a contract between the legalizer and selector that legal instructions should be selectable in some way, but that’s not currently done in the AArch64 backend (we will be addressing that this year, among other things).

  3. For vectors, is any arbitrary vector only supposed to be legalized to a certain number of elements, and the target is supposed to treat this as N registers with possibly undefined high bits in NumRegBits - TypeScalarSize?

I don’t think there should be cases where scalarization results in undefined bits. Maybe an example would help?

  4. Should the legalizer for some of these operations really be triggering legalization of the def instruction of the input instead?

In general I don’t think that’s reliable, as we can’t always guarantee to be able to see through copies or other operations that obscure the def.

Hope that begins to clear up the issues.

Thanks,
Amara

+Daniel & Justin

Hi Matt,

Hi,

I’m trying to handle some vector operations with splitting/scalarization and keep running into similar sorts of issues which are making me question the intended function of the various legalization operations (particularly G_MERGE_VALUES/G_UNMERGE_VALUES, but also G_EXTRACT/G_INSERT and conversion instructions) and what the contract between the legalizer and selector actually is.

For scalar values, things seem clearer, but I’m still confused. The AArch64 selector code seems content to allow strange sized values in the source of G_EXTRACT/G_SEXT etc., and then just sets the register class, as the type information isn’t really needed anymore. However, it also defines a more restricted set of legal types, but the legalizations are not implemented. Similarly, quite a lot of legalization rules are defined for merge/unmerge (an ~88 line block), but again none of these legalizations seem to actually be implemented. For example, it specifies that vector types with < 8-bit elements should be scalarized.

You’re right, at the moment there is a mismatch between what the legalizer says is legal and what we actually support in terms of instruction selection. While it would be nice if the AArch64 implementation were a gold standard that other targets could use as a practical reference, we aren’t there yet. For G_MERGE/UNMERGE specifically, some of those rules are vestigial remnants of a more powerful merge/unmerge op that I intend to clean up.

For vectors it’s less clear to me what to do. For example, I’ve looked at implementing widenScalar for G_UNMERGE_VALUES. This in turn ends up defined as an extend of the vector type, which in turn will be implemented with another G_UNMERGE_VALUES which needs to be split, so it’s a legalization loop producing an infinite number of instructions.

I’m not sure widenScalar of a vector G_UNMERGE_VALUES really makes sense. The source operand is a vector type after all. It seems you’re looking to scalarize this operation, so I think fewerElementsVector is the more appropriate choice.

Scalarizing was what I originally intended, but I just hit the same problem in a different way. I don’t see what the end result is supposed to be without somehow modifying the source value, or doing something overly complicated. I could replace it with a build_vector, which in turn will just be implemented in terms of other unmerges. I suppose I could bitcast to an equivalent scalar integer and then use a series of G_EXTRACT, which I think the combiner would have a difficult time figuring out.

Basically, I don’t see what the expected legalization end result is for scalarization, given that G_UNMERGE_VALUES is supposed to be used for vector splitting.
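
For reference, the bitcast-then-G_EXTRACT alternative mentioned above can be sketched with plain integers. This is a toy model: `g_extract` and `scalarize_by_extract` are hypothetical names that only mimic the bit-level meaning of the generic opcodes, assuming element 0 occupies the lowest bits of the bitcast scalar.

```python
def g_extract(src, offset, width):
    """Model of G_EXTRACT: take `width` bits of `src` starting at bit `offset`."""
    return (src >> offset) & ((1 << width) - 1)


def scalarize_by_extract(src_bits, num_elts, elt_bits):
    """Treat the vector as one wide integer and extract each element from it."""
    return [g_extract(src_bits, i * elt_bits, elt_bits)
            for i in range(num_elts)]
```

A `<4 x s16>` value viewed as an s64, such as `0x0004000300020001`, comes apart into `[1, 2, 3, 4]`. The same arithmetic works for s1 elements, but it still leaves the wide scalar as an operand that something has to legalize.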

It’s also not really clear to me what it means to scalarize a G_UNMERGE_VALUES, other than to rewrite the def instruction of the source to produce scalar values. Rewriting as a series of extract_vector_elt on the source just runs into the same issue that nothing is actually dealing with the problematic vector type. Is this something the legalizer should really be doing? For AMDGPU essentially any vector type needs to be eliminated (especially for something like an s1 vector), but it’s not clear to me how this should really happen if legalization just keeps moving the vector type to a different unmerge source.

In theory, once the legalizer has finished, every generic instruction entering the isel phase should be individually selectable. G_UNMERGE etc. are mostly legalizer artifacts, and we expect them to be cleaned up most of the time by the legalizer artifact combiner; however, if they aren’t, then the target must still be able to select them independently. We document a second meaning of “legal” on this (now outdated) page: https://llvm.org/docs/GlobalISel.html#legalizer
Specifically: "a legal instruction is defined as selectable & operating on vregs that can be loaded and stored – if necessary, the target can select a G_LOAD/G_STORE of each gvreg operand."
So the expectation is that in the worst case, the target should use loads and stores to implement those operations.

My questions are:

  1. What is AArch64 doing with all of its legalization rules when they don’t seem to actually be implemented?
  2. Is the target selector supposed to be responsible for handling G_UNMERGE_VALUES/G_MERGE_VALUES/G_EXTRACT/G_INSERT/G_SEXT/G_ZEXT/G_ANYEXT for any strange type source, assuming the extra high bits over the LLT type are undefined or appropriately extended?

To reiterate the earlier point, there is a contract between the legalizer and selector that legal instructions should be selectable in some way, but that’s not currently done in the AArch64 backend (we will be addressing that this year, among other things).

  3. For vectors, is any arbitrary vector only supposed to be legalized to a certain number of elements, and the target is supposed to treat this as N registers with possibly undefined high bits in NumRegBits - TypeScalarSize?

I don’t think there should be cases where scalarization results in undefined bits. Maybe an example would help?

If I wanted G_UNMERGE_VALUES <2 x s1> to be legal on AMDGPU, I could implement this as subregister extracts from two 32-bit registers, and treat the high bits as undefined. Otherwise, something needs to be done during legalization to turn these into vectors with 32-bit elements.
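
One possible reading of that idea, as a toy sketch: each s1 element lives in bit 0 of its own 32-bit register, the upper 31 bits hold arbitrary garbage, and every consumer masks before use. The helper names (`make_s1_reg`, `unmerge_v2s1`, `read_s1`) are hypothetical, and the register layout is an assumption made for illustration, not a description of what AMDGPU does.

```python
def make_s1_reg(bit, garbage):
    """Build a 32-bit register whose bit 0 is `bit`; the rest is garbage."""
    return (garbage & 0xFFFFFFFE) | (bit & 1)


def unmerge_v2s1(reg_lo, reg_hi):
    """Unmerge of <2 x s1> held in two registers: just hand each one back."""
    # No bits are moved; the high 31 bits of each result stay undefined.
    return reg_lo, reg_hi


def read_s1(reg):
    """A consumer of an s1 value must ignore everything above bit 0."""
    return reg & 1
```

With `a, b = unmerge_v2s1(make_s1_reg(1, 0xDEADBEEF), make_s1_reg(0, 0xFFFFFFFF))`, the raw register values are not 1 and 0, but `read_s1(a)` and `read_s1(b)` still recover 1 and 0. The cost of this scheme is that every consumer has to agree on the convention.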