[resending so the message is smaller]
I think that teaching the optimizer about big-Endian lane ordering would have been better.
It’s certainly arguable. Even in hindsight I’m glad we didn’t - that’s the approach GCC took and they’ve been fixing subtle bugs in their vectorizer ever since.
Inserting the REV after every LDR
We only do this conceptually. In most cases REVs cancel out, and we have the LD1 instruction which is LDR+REV. With enough peepholes there’s really no need for code to run slower.
Given what’s been done, should we update the LangRef.
Potentially, yes. I hadn’t realised quite how strongly worded it was with respect to this.
James
From: "James Molloy" <james@jamesmolloy.co.uk>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "Quentin Colombet" <qcolombet@apple.com>
Sent: Wednesday, January 13, 2016 9:54:26 AM
Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection> I think that teaching the optimizer about big-Endian lane ordering
> would have been better.It's certainly arguable. Even in hindsight I'm glad we didn't -
that's the approach GCC took and they've been fixing subtle bugs in
their vectorizer ever since.> Inserting the REV after every LDR
We only do this conceptually. In most cases REVs cancel out, and we
have the LD1 instruction which is LDR+REV. With enough peepholes
there's really no need for code to run slower.> Given what's been done, should we update the LangRef.
Potentially, yes. I hadn't realised quite how strongly worded it was
with respect to this.
Please do
-Hal
From: "James Molloy" <james@jamesmolloy.co.uk>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "Quentin Colombet" <qcolombet@apple.com>
Sent: Wednesday, January 13, 2016 9:54:26 AM
Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selectionI think that teaching the optimizer about big-Endian lane ordering
would have been better.It's certainly arguable. Even in hindsight I'm glad we didn't -
that's the approach GCC took and they've been fixing subtle bugs in
their vectorizer ever since.Inserting the REV after every LDR
We only do this conceptually. In most cases REVs cancel out, and we
have the LD1 instruction which is LDR+REV. With enough peepholes
there's really no need for code to run slower.Given what's been done, should we update the LangRef.
Potentially, yes. I hadn't realised quite how strongly worded it was
with respect to this.Please do
I'm not sure changing bitcast is the right place. Since the bitcast is representing the in-register value (which doesn't change), maybe we should define it as part of the load/store instead? That's essentially what's going on; we're converting from a canonical register form to a variety of memory forms. (Right?)
(Right?)
Uh no, the register content explicitly does change We insert REV instructions (byteswap) on each bitcast. Bitcasts can be merged and elided etc, but conceptually there’s a register content change on every bitcast.
James
Ok. Then we need to change the LangRef as suggested. Given this is a rather important semantic change, I think you need to send a top level RFC to the list. A couple of points that will need clarified: - Does this only apply to vector types? It definitely doesn’t apply between pointer types today. What about integer, floating point, and FCAs? - Is combining two casts into one a legal operation? I think it is so far, but we need to explicitly state that. - Do we have a predicate for identifying no-op casts that can be freely removed/combined? - Is coercing a load to the type it’s immediately bitcast to legal under this model?
Ok. Then we need to change the LangRef as suggested. Given this is a rather important semantic change, I think you need to send a top level RFC to the list.
FWIW, I don’t think this is a semantic change to LLVM-IR itself. I think it’s more clearing up the misconception that LLVM-IR semantics also apply to SelectionDAG’s operations. That said, I do think it’s important to mention this in LangRef since it’s very easy to make this mistake and very few targets need to worry about the distinction.
To explain why I don’t think this is a semantic change to LLVM-IR, let’s consider this example from earlier:
%0 = load <4 x i32> %x
%1 = bitcast <4 x i32> %0 to <2 x i64>
store <2 x i64> %1, <2 x i64>* %y
In LLVM-IR terms, if the value of %0 is:
%0 = 0x00112233_44556677_8899aabb_ccddeeff
then the value of %1 is:
%1 = 0x0011223344556677_8899aabbccddeeff
which agrees with the store/load and the ‘no bits change’ statements in LangRef.
However, the mapping of these bits to physical register bits is not consistent between types:
Physreg(%0) = 0xccddeeff_8899aabb_44556677_00112233
Physreg(%1) = 0x8899aabbccddeeff_0011223344556677
Essentially, I’m saying that BitCastInst and ISD::BITCAST have slightly different semantics because of their different domains. The former is working on an abstract representation of the values where both statements in LangRef are true, but the latter is closer to the target where the ‘no bits change’ statement ceases to be true in some cases.
A couple of points that will need clarified:
- Does this only apply to vector types? It definitely doesn’t apply between pointer types today. What about integer, floating point, and FCAs?
I’ve only seen it for vector types so far but in theory it could happen for other types. I’d expect FCAs to encounter it since the physical registers may contain padding that isn’t present in the LLVM-IR representation and the placement and amount of padding will depend on the exact FCA.
I can think of cases where address space casts can encounter the same problem but that’s already been covered in LangRef (“It can be a no-op cast or a complex value modification, depending on the target and the address space pair.”).
Does anyone use FCAs directly? Most targets seem to convert them to same-sized integers or bitcast an FCA* to i8*.
- Is combining two casts into one a legal operation? I think it is so far, but we need to explicitly state that.
Yes, A->B->C and A->C are equivalent.
- Do we have a predicate for identifying no-op casts that can be freely removed/combined?
James mentioned one in CGP but I haven’t been able to find it. I don’t think it’s necessary to have one at the LLVM-IR level but we do need one in the backends. I remember adding one to the backend but I can’t find that either so I think I’m remembering one of my patches from before I split MSA’s registers into type-specific classes.
- Is coercing a load to the type it’s immediately bitcast to legal under this model?
Yes.
Hi,
I’ve given a bit of misinformation here and have caused some confusion. After talking with Tim and Mehdi last night on IRC, I need to correct what I said above to fall more in line with what Daniel is saying. If any of the below contradicts what I’ve said already, please accept my apologies. This version should be right.
The behaviour of the code generator for big-endian NEON and MIPS is derived from the fact that we did not want to change IR semantics at all. A fundamental property that we do not want to break is memory round-tripping:
%1 = load <4 x i32>, %p32
%2 = bitcast <4 x i32> %1 to <2 x i64>
store <2 x i64> %2, (bitcast %p32 to <2 x i64>*)
The value of memory before and after the store MUST NOT change (contrary to what I said in an earlier post, I know).
So in fact everything you can do in IR is valid. There are no changes to IR semantics in the slightest. However, when it comes to generating code from the IR, there are new rules:
- Loads and stores are selected to be special loads and stores that do some transform from a canonical form in memory to a type-specific form in register.
- Because bitcasts are load/store pairs in semantic, they must behave as if a store then load was done. Specifically (bitcast TyA to TyB) must transform TyA → canonical form → TyB, as a store then load would. Therefore bitcasts are not no-ops during code generation (but behave as if they are from an IR perspective!).
The reason this works neatly in IR is due to the IR’s type system. In order to change type, a cast must be inserted or a memory round trip. There is no other way. However in SDAG, things break down a bit. SDAG is more weakly typed, and bitconverts are often simply removed. We need that not to happen. Bitconverts are not no-ops.
Daniel’s explanation of physical register mapping was excellent so I’m not going to repeat that.
I apologise for the confusion and misinformation. This is quite a complex topic and takes a bit of mind bending for me to understand, and it was a long time ago.
James
This explanation makes a lot more sense to me. I think it would make sense to document this mental model, but I agree that this interpretation does not seem to require changes to the IR semantics.
Just to check, this implies that DSE is legal right?
Philip
From: "Philip Reames" <listmail@philipreames.com>
To: "James Molloy" <james@jamesmolloy.co.uk>, "Daniel Sanders" <Daniel.Sanders@imgtec.com>, "Hal Finkel"
<hfinkel@anl.gov>
Cc: llvm-dev@lists.llvm.org
Sent: Thursday, January 14, 2016 3:48:37 PM
Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selectionThis explanation makes a lot more sense to me. I think it would make
sense to document this mental model, but I agree that this
interpretation does not seem to require changes to the IR semantics.
The semantics, no. But we still may want to update the language reference. It says, "It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2." And, what we've learned, is that this second sentence does not always imply the first (the bits might, in fact, change).
-Hal
Hi,
“It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2.”
I think a simple “as-if” in there should be sufficient;
“It is always a no-op cast because it acts as if no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2.”
What do you think?
James
Hi,
I think we just need to draw attention to the fact that other IR’s may vary.
I’m thinking we should add something like this to the ISD::BITCAST doxygen comment:
This is subtly different from the bitcast instruction from LLVM-IR since this node may change the bits
in the register. For example, this occurs on big-endian NEON and big-endian MSA where the layout
of the bits in the register depends on the vector type and this node acts as a shuffle operation for
some vector type combinations.
And have LangRef say something like:
The conversion is done as if the value had been stored to memory and read back as type ty2. This is equivalent to a no-op cast where no bits change with this conversion.
… caution::
This equivalence does not necessarily apply to other IR’s in LLVM. See ISD::BITCAST for an example.
The ‘… caution::’ should render in the same way as the ‘Rationale’ box in http://llvm.org/docs/LangRef.html#volatile-memory-accesses.
From: "James Molloy" <james@jamesmolloy.co.uk>
To: "Hal Finkel" <hfinkel@anl.gov>, "Philip Reames" <listmail@philipreames.com>
Cc: llvm-dev@lists.llvm.org, "Daniel Sanders" <Daniel.Sanders@imgtec.com>
Sent: Friday, January 15, 2016 2:45:32 AM
Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selectionHi,
> "It is always a no-op cast because no bits change with this
> conversion. The conversion is done as if the value had been stored
> to memory and read back as type ty2."I think a simple "as-if" in there should be sufficient;
"It is always a no-op cast because it acts as if no bits change with
this conversion. The conversion is done as if the value had been
stored to memory and read back as type ty2."What do you think?
I think this sounds confusing (and, regardless, we always get to apply 'as if'). I see you're point, however, that any changes in the bits are unobservable at the IR level. Is it true that int -> floating-point -> int bitcasts round-trip cleanly for all possible values on all hardware? I was under the impression that this was not true. I think that the best solution might just be to delete the first sentence.
-Hal
Hi,
I think we just need to draw attention to the fact that other IR’s may vary.
I’m thinking we should add something like this to the ISD::BITCAST doxygen comment:
This is subtly different from the bitcast instruction from LLVM-IR since this node may change the bits
in the register. For example, this occurs on big-endian NEON and big-endian MSA where the layout
of the bits in the register depends on the vector type and this node acts as a shuffle operation for
some vector type combinations.