Mlir-s390x-linux failure

I have noticed that the mlir-s390x-linux buildbot job has been failing for the past 8 days.

Apparently, this change did not produce the expected output.

It looks like a formatting issue with a constant type, but I’m not sure why this happens only on the s390x platform. Does anyone have any idea?

# | <<<<<<
# |              .
# |              .
# |              .
# |             29: @dense_i64_vector = internal global <3 x i64> <i64 1, i64 2, i64 3> 
# |             30: @splat_i64_vector = internal global <3 x i64> <i64 42, i64 42, i64 42> 
# |             31: @dense_float_vector_2d = internal global [2 x <2 x float>] [<2 x float> <float 1.000000e+00, float 2.000000e+00>, <2 x float> <float 3.000000e+00, float 4.000000e+00>] 
# |             32: @splat_float_vector_2d = internal global [2 x <2 x float>] [<2 x float> <float 4.200000e+01, float 4.200000e+01>, <2 x float> <float 4.200000e+01, float 4.200000e+01>] 
# |             33: @dense_float_vector_3d = internal global [2 x [2 x <2 x float>]] [[2 x <2 x float>] [<2 x float> <float 1.000000e+00, float 2.000000e+00>, <2 x float> <float 3.000000e+00, float 4.000000e+00>], [2 x <2 x float>] [<2 x float> <float 5.000000e+00, float 6.000000e+00>, <2 x float> <float 7.000000e+00, float 8.000000e+00>]] 
# |             34: @splat_float_vector_3d = internal global [2 x [2 x <2 x float>]] [[2 x <2 x float>] [<2 x float> <float 4.200000e+01, float 4.200000e+01>, <2 x float> <float 4.200000e+01, float 4.200000e+01>], [2 x <2 x float>] [<2 x float> <float 4.200000e+01, float 4.200000e+01>, <2 x float> <float 4.200000e+01, float 4.200000e+01>]] 
# | check:104'0                                                                                                                                                                                                                                                                                                                                      X error: no match found
# |             35: @dense_resource_tensor_constant = internal constant [5 x float] [float 0x38834A07C0000000, float 0x3B067057C0000000, float 0xC79DD757C0000000, float 0x47475E17C0000000, float 0x39D3BDC7C0000000] 
# | check:104'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | check:104'1     ?                                                                                                                                                                                                   possible intended match
# |             36: @dense_resource_vector_constant = internal constant <5 x float> <float 0x38834A07C0000000, float 0x3B067057C0000000, float 0xC79DD757C0000000, float 0x47475E17C0000000, float 0x39D3BDC7C0000000> 
# | check:104'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             37: @dense_resource_multidim_tensor_constant = internal constant [1 x [2 x [2 x float]]] [[2 x [2 x float]] [[2 x float] [float 0x429476A7C0000000, float 0xC2D81667C0000000], [2 x float] [float 0x42BA3457A0000000, float 0xC4BA5767C0000000]]] 
# | check:104'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             38: @dense_resource_multidim_vector_constant = internal constant [1 x [2 x <2 x float>]] [[2 x <2 x float>] [<2 x float> <float 0x429476A7C0000000, float 0xC2D81667C0000000>, <2 x float> <float 0x42BA3457A0000000, float 0xC4BA5767C0000000>]] 
# | check:104'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             39: @private = private global i32 42 
# | check:104'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             40: @internal = internal global i32 42 
# | check:104'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |              .
# |              .
# |              .
# | >>>>>>

I thought that with the initial change there, folks were going to exclude s390 (the only different-endian system there).


The test needs to be excluded. It hard-codes an LE binary blob and validates that it decodes correctly, on a BE system. That attribute is always host byte order, so the test is invalid on BE.

(Most devs don’t have access to a BE system, so this tends to become de facto low priority, since people are reduced to guessing whether a fix will be OK.)

We should not be excluding tests on BE hosts, unless they actually involve executing code in the test.

If a test is using some sort of “default” target, we should instead fix the target to something specific. Otherwise, the failure likely indicates a real bug in the code.


We have transformations that load a large constant array blob (mmap style) and interpret the data. The problem is that the test encodes the data as a blob that is endianness-dependent.

I have been looking at LLVM bitcode and didn’t find an abstraction for constant array serialization; how does LLVM deal with this?

It is also an open question whether we want to: this is IMHO meant for cases where producer and consumer agree on a layout, and hence on interpretation. The serialization and interpretation are intended to be cheap (raw dump out and mmap in). Options to address it add conversions everywhere (or command-line flags to parsing). And I recall we have a pending one on the bytecode side for this too.

Another option (re test coverage) is to simulate the corresponding producer: write everything in LE, have a tool that converts when on a BE machine, and then the test runs on both.

Assuming we agree that this is a valid design for what upstream should provide in the bytecode, that does not mean it has to be done without safety (the tooling should be able to be set up to encode the assumptions, detect inconsistencies, and diagnose them).

There has to be an attribute that works the way Jacques describes, and this one seems to be it. We can add an additional endianness indicator to it (native, big, little), but what is done with that has to be up to the consumer. I propose we document that.

This doesn’t preclude having another attribute that does conversions but I’m not aware of a use case for it that isn’t better handled by some higher level thing.

We can split these specific tests that bake an LE byte string into lit tests so that they run only in an appropriate context…

That makes sense yes. I think the proposal was to have it somewhere on the outer module rather than per instance.

The problem is that the test encodes the data as a blob that is endianness-dependent

Based on the endianness of what? It’s fine to generate different code for little-endian targets vs. big-endian targets; it’s only a problem when you can’t build for a little-endian target on a big-endian host.

It looks like the test in question only cares about the endianness of the target, so if you specify a target, that should solve the issue.


The feature here is storing a constant array of, let’s say, “float32” as a “blob” in the bytecode so that it can be memory-mapped. This is meant to be used for multiple MB (or GB) of data which needs zero-overhead access.
The compiler internally “bitcasts” the char * to a float * and treats this as the constant float32 array.

So it’s not even a matter of “target”: it’s a matter of the blob having the same endianness as the host running the compiler.

An equivalent in LLVM would be: serializing LLVM bitcode with a constant array on a little-endian host, and then deserializing this bitcode on a big-endian host: you’d then want the code to do, for example, constant folding on this data (so the compiler needs to interpret the bytes as floats).

The endianness of the data in the blob. I still believe this attribute only makes sense for native-endian cases, but I was trying to acknowledge that having an indicator of what that was may help tooling make a more informed diagnosis. The problem is that, in the limit, you don’t just generate a little bit of low-level code to munge something like this: you would either introduce a tool or something at the tip of a pipeline that does the conversion. Such a thing inherently involves large memory movements and introduces a copy, versus preserving the “it may be mapped memory” characteristic. By the time you get to LLVM code generation, the best we can do is error/warn if the native byte order differs from what the attribute claims: fixing that is a job for the higher-level system, far in advance.

(Edit: if we converge on this, I think we can add an endianness indicator, and we should definitely document what this is and the hazards of using it for something else.)

If I’m understanding correctly, the primary purpose of the data is that it’s supposed to be passed through to the target.

There are basically two possible approaches here. One is something like LLVM’s ConstantDataSequential. In memory, this is native endianness; we byteswap when we read or write the data.

The other approach is along the lines of the way we parse ELF object files. Basically every field in an ELF object is native endianness, and we do zero-copy parsing of ELF. The way we handle this in LLVM is that we have two ELF parsers: a little-endian ELF parser, and a big-endian ELF parser, and we choose the appropriate parser depending on the file. We use templates from Endian.h to abstract this out: each field is marked with its endianness, and the necessary byte-swapping happens when the field is accessed.

In the context of MLIR, I expect it makes sense to do something similar to what we do for ELF: a given MLIR file only makes sense for one target, so it makes sense to store data in the endianness of that target. If the compiler needs to access the data, it can translate as needed.

(If we need to modify LLVM to make MLIR->LLVM translation more efficient in this context, it’s something we can look into.)


Thanks - It is always tricky talking about “MLIR” because it is not one thing but many, but in this case, I think the analogy holds in principle in that artifacts that include these attributes have to match in terms of byte order between producer and consumer. Like an ELF file, they are optimized for the case where producer and consumer can directly “splat structs” and portable systems can have additional tooling to interop with them in either arrangement.

As an example, if a program is extracted from a live PyTorch session and emitted into an MLIR module, its DenseResourceElementsAttr blobs will be backed by pointers into parameter memory of the hosting process. We may not even serialize it prior to compiling but can and often do for various workflows. But there are limits about the expectation of portability of such an artifact. Since these attributes are often used to capture extra-large data, there is already an entire ecosystem of tools for data munging all of this: since these systems are often running at the limit of what can be held in memory (or even on disk), conversions/copies have to be done explicitly at system boundaries where you know you have the resources to do so, and the raw data is typically splatted directly into accelerator memory for operation.

So I’ll maintain that DenseResourceElementsAttr is a tool for reinterpret casting raw blobs as arrays (currently of primitives but can be more compound types soon) – with all of the limitations and cautions that entails. We use it to build compilers and tools that are more portable than that but a complete implementation of that is their job, not the job of the attribute.

I think this issue is possibly three things:

  • A single blob-based test of this can’t apply to LE and BE simultaneously. It can be split/ignored, etc. on the other platform.
  • It would be a nice feature addition to add annotations to the attribute (or at the blob level – which we may already have, since the “alignment” field could do double duty as a BOM) to describe its byte order provenance when known. This can be used for diagnostics, and higher-level tools could use it in their conversion flows.
  • All of this needs to be documented and people cautioned about the limits. We already have an attribute, DenseElementsAttr, which “plays nicely” from a portability standpoint. We use DenseResourceElementsAttr when it is important to sacrifice portability for direct access to the backing blob.

The tricky part here is that “MLIR” is not “the compiler”. It houses the pieces of many compilers, and good ones should have an approach for dealing with mismatches that arise from the use of such an attribute.

Looking at the code, I suspect the simplest thing to do here is actually to split the attribute into two different attributes: a big-endian attribute, and a little-endian attribute. So, for example, a DenseI16ArrayBEAttr contains big-endian data, a DenseI16ArrayLEAttr contains little-endian data. When we construct the attribute, the code constructing it chooses whichever one is appropriate. Serialization/deserialization just read/write the raw data. And for C++ code accessing the data, we can add friendly wrappers around the raw data where it’s appropriate.

This is similar to the way the ELF header handling I mentioned before works: we have different classes depending on how the data is represented.


The test in question is not even in a binary format; it’s text. It doesn’t make sense to say it can only be parsed on little-endian hosts.

I think you meant to be talking about the DenseResourceElementsAttr? That is different from the *ArrayAttrs.

I’m not sure I agree, from the perspective of what DenseResourceElementsAttr was designed to do. The simplest thing is to have a test that runs on LE with a binary blob that is LE-encoded, and a test that runs on BE with a blob that is BE-encoded.

If we’re talking about “splitting attributes”, that can also be accomplished by adding a field to the attribute to discriminate the endianness of the blob, no? (or carrying this at the blob level)

It’s encoded in ASM, yes, but that is how we test things. In this case, we can have a blob of data attached to the ASM by hex-encoding it as text.

Yes, sorry.

Well, I guess, but that’s really confusing.

If we do go this way, probably we want to at least change the spelling of the attribute in the text serialization to indicate the endianness.

That’s the same thing in some sense? Subclassing allows adding endian-specific methods, which would probably be helpful.

From the perspective of the use case, I think it is the least confusing thing: this attribute exists to provide bitcast access to a blob of data. If you want to run the test successfully on BE, you need a test that includes a blob suitable for BE. Maybe there’s another way to write the test in question…

Yes, we can change the text format based on attribute fields.

It just has implications across multiple layers of APIs to split classes, etc. If we do it, I’d prefer a lightweight mechanism over something highly structural.

Thanks! I was expecting this to be in the bitcode reader/writer instead of in the accessor, but actually I’m not sure where to find this.
It seems like I’m reading contradictory things in the code, like the comment on this API:

But also:

I find little reference to “endian” in Constants.h or Constants.cpp; am I missing something? Maybe you can point to where the endianness is abstracted for these?

It’s not that simple, I believe: the file can be parsed fine; the problem is how you interpret the in-memory data afterwards.
Our textual format and our bytecode should behave the same as much as possible; I wouldn’t expect the textual format to provide more portability than our bytecode.

I am not sure why we can’t provide a better system here. I would even argue that, without more ambition, maybe DenseResourceElementsAttr as-is does not belong in MLIR in-tree: there is not much magic to it, and folks could handle unsafe deployment in their own codebases.
MLIR-provided “batteries” should IMO be safer than that, and maybe that means a more boilerplate-oriented API to be used from within the compiler, with raw access only at the boundaries.

For example, in the case at hand: it’s not clear to me why a fully safe system couldn’t be provided for constants threaded through to LLVM.