Reaching definitions on Machine IR post register allocation

Hi,

Just to clarify I am looking for a whole machine function analysis not just something restricted to within a machine basic block.

Thanks.

Regards,

Venu.

Hexagon has RDF that does exactly that. At the moment it's under lib/Target/Hexagon, but it meant to be target-independent. It won't work with X86 due to a known issue related to register units, but it should work fine for other targets. See https://reviews.llvm.org/D29295 about moving it to lib/CodeGen.

-Krzysztof

Hi Krzysztof,

I did look at the other link you have mentioned in your reply but did not quite understand the register units issue. If it is not too difficult, can you briefly summarize what the issue was?

Thanks.

Regards,
Venu.

RDF needs to know when an assignment to a register is overwritten by another assignment, or by a sequence of assignments. This is needed to determine whether the original assignment is still live or not. RDF uses register units to represent the building blocks of registers, and assumes that if all units of a register are overwritten, then the original value of that register is completely overwritten. This assumption is true for all targets (that I have tested it with) except X86. On X86, registers AX and EAX both have only one register unit, but when you assign a value to AX, the upper half of EAX is preserved (that is, the original value of EAX is not completely overwritten).

-Krzysztof

Hi Krzysztof,

Thanks for your explanation - I think I understand the issue now.

I do know if this is a hack, but, in reaching definitions, when we check if a definition of AX kills a previous definition of EAX, can we not check the actual physical register? In this case since EAX and AX are different (although I understand AX is a sub-register of EAX), can we not conclude that the definition of AX does not kill EAX? Since, in reaching definitions, it is conservative if we over-estimate the set of reaching definitions (but not unsafe), should not this strategy be adequate to avoid incorrect optimizations?

Or, is there an optimization that can behave incorrectly with an overestimated set of reaching definitions?

Regards,
Venu.

It would be much easier to add extra reg units to EAX, EBX, etc. to represent the upper half of the register. That would fix the issue the right way (and would actually make sense without the context of RDF).

Regarding the sub-register check---consider this case:

   AX = ...
   ...
   AL = ... // AX partially overwritten
   AH = ... // AX completely overwritten

Each of the assignments to AL/AH does not overwrite AX by itself, but together they do. This would be very difficult to detect if we added such special treatment of sub-registers that you propose.

I could try to make a case for the extra register units to get them added.

-Krzysztof

Hi Krzysztof,

Thanks for your reply.

I agree that adding extra register units for x86 would be the right way to fix this. Do you know if there is a plan to fix this?

The case that you have pointed out involving partial writes to a register together completely killing a wider register is interesting and did not occur to me. But, for reaching definitions, even if you do not detect this case, shouldn't this be safe (although conservative) since you would only end up over-estimating the set of reaching definitions at a point?

But for other scenarios such as copy propagation, you may need to detect this scenario because otherwise you could assume that a copy reaches a use point (when it actually does not) and you may end up propagating the copy when it is incorrect to do so.

Regards,
Venu.

Hi Venu,

Hi Krzysztof,

Thanks for your reply.

I agree that adding extra register units for x86 would be the right way to fix this. Do you know if there is a plan to fix this?

No concrete plan, no. We've been thinking about for quite some time now, but never got at it.

Cheers,
-Quentin

I can definitely look into it.

-Krzysztof

A quick check reveals that adding extra subregisters, or extra aliases changes the register pressure sets. Instead of the current 30, there are 90. This causes differences in register pressure calculation, which in turn causes the scheduler to do slightly different things.

-Krzysztof

Hi Krzysztof,

I am returning to this thread after a gap in time. I took some time to study the code in RDFGraph.cpp in the Hexagon directory.

I have a few questions. I hope you will not find it too much of a bother to answer them.

1) Does use node only support a single reaching def? If the RDF graph is constructed post-RA, the code is no longer in SSA form and is it not possible that multiple reaching defs reach a use node? For example, the same physical register RAX may be defined in the if branch and the else branch and can reach a common use node resulting in multiple reaching defs for a use node. How is this handled?
2) I am unable to locate the code that handles re-definition of the same register. I though "clobber" refers to such situations but the isClobbering() function which controls the setting of the "Clobbering" attribute seems to mainly handle function calls and not the re-definitions of registers through other types of instructions. How are such re-definitions handled?
3) My mental picture of reaching definitions analysis is the classical iterative data flow analysis approach which involves using the union operator as the join operator for the data flow information, but I do not see anything in the code that corresponds to such a notion. I am obviously way off the mark here, so, can you briefly explain the idea behind the reaching definitions analysis in the RDF code?
4) Assuming I have somehow fixed the register units issue for x86 (which you mentioned in a previous reply) in the TableGen code by adding extra register units for x86, I am not sure how the preservation of the previous defined value through a partial over-write by a sub-register definition is handled in the RDF code. I looked at the isPreserving() function but that function seems to check whether the instruction is predicated or not which is a little confusing to me since the x86 case can create preservation of bits through non-predicated partial over-writes. So, how is the x86 situation handled?

I realize that I have asked too many questions, but, I hope you can briefly provide me some clarifications.

Thanks.

Regards,
Venugopal Raghavan.

Hi Raghavan,
Thanks for asking the questions, it's not a problem at all.

The RDF graph simulates SSA. It does so by having "artificial" PHI nodes that account for multiple reaching defs.

Consider this example (for Hexagon). The CFG is a simple triangle which has a split block (BB#0), which either goes to the side block (BB#1), or to the join block (BB#2). The register R0 is defined in BB#0 and then redefined in BB#1. Both definitions can reach the use of R0 in BB#2:

Hi Krzysztof,

Thanks a lot for taking the time to write a detailed explanation. I think I understand things better now.

I am trying to see if I can use RDF for X86 assuming I can add more register units for X86 so that the partial re-definition issue you pointed earlier is fixed. I am not yet sure whether it is easy or not, but I have identified a portion of code in TableGen which I can modify to do this.

Once I have done this I plan to use the RDG graph construction and RDF copy propagation code to try to propagate copies in the code after RA.

I am hoping that this plan would work.

Regards,

Venu.

Hi Venu,

FWIW, I have a pass that does copy propagation after RA [1] (currently only within a basic block) that should be enabled some time in the not-too-distant future. It has been reviewed and accepted, but I'm currently working on getting a slight change to the MachineOperand representation [2] that should make the copy propagation change much simpler. I believe this change to MachineOperand (or something like it) would also be needed for any pass that wants to rename registers after RA (unless the renaming is done right after RA when virtual registers are still present, which is what my current patch does, and is the source of complexity that I'm trying to eliminate).

[1] https://reviews.llvm.org/D30751
[2] https://reviews.llvm.org/D39400 D39400 WIP: [MachineOperand][MIR] Add isRenamable to MachineOperand.

Hi Geoff/Krzyssztof,

Wouldn't the isRenamable() change be required even for the RDF based copy propagation? Maybe Hexagon does not impose ABI/ISA restrictions which require specific registers to be used in specific contexts.

Also, if Geoff's copy propagation pass is invoked post-RA wouldn't it need to handle the x86 ISA feature which allows 8 bit/16 bit values to be moved into a register without over-writing the other bits in the register being defined? I am assuming this needs to be handled for this pass to work for x86. Is this part of Geoff's patch?

I have seen cases where we need copy propagation across basic blocks. Given that RDF works on the whole function, would it not be a good idea to just use that with the X86 partial-definition issue being handled?

Considering everything, would the following not be a good approach?

1) Use the isRenamable() change for MachineOperand which allows us to adhere to ABI/ISA constraints
2) Fix the partial register definition issues as seen in x86
3) Try to get the RDF based analysis and copy propagation and other passes built on RDF to work for all targets including x86 (we get the advantage of passes which can work on the whole machine function assuming that there are no scalability issues by doing so).

I myself implemented a rudimentary copy propagation pass which tries to copy propagate on the whole machine function but was put off from doing further work on this because of the X86 partial definition issue mentioned above to which Krzysztof drew my attention.

Regards,
Venu.

RDF has its own tests to detect the ability to rename any given register. RefNodes that have "Fixed" flag are those where the register cannot be changed. Having the isRenamable flag in MachineOperand would help remove these tests.

-Krzysztof

Hi,

For a test case that I ran I am seeing something in the RDF graph that I do not quite understand. I think there is an data flow edge that is missing but most likely I am wrong.

The relevant portion of IR looks like this:

BB#0:

%R10 = MOVSX64rr32 %EDX; dbg:FastBoard.cpp:186:26 @[ FastBoard.cpp:1938:21 ]

.

.

.

TEST32rr %ESI, %R8D, %EFLAGS; dbg:FastBoard.cpp:1940:10

JNE_1 <BB#1>, %EFLAGS<imp-use,kill>; dbg:FastBoard.cpp:1940:9

BB#1:

%CL = COPY %R9B, %ECX<imp-use,kill>, %ECX; dbg:FastBoard.cpp:1938:38

.

.

.

CMP32mr %RDI, 4, %RSI, 1776, %noreg, %ECX, %EFLAGS; mem:LD4%arrayidx.i49 dbg:FastBoard.cpp:1947:26

The relevant portion of the RDF graph that is constructed is shown below:

BB#0:

s3: MOV32r0 [d4(,d50,u245):, d5!(,d7,):]

.

.

.

BB#1:

s48: COPY [d49(d4,):, d50(d4,d184,u246):d49, u51(d11):u23, u52(d4):]

.

.

.

s61: CMP32mr [d62!(d58,d80,u205):, u63(+d194):u55, u64(d57):, u65(d50):]

This is a test case I was interested in because it shows the partial register redefinition scenario in X86 for which more register units needed to be added. I have a hacky fix for this in TableGen code which adds more units for X86 registers. I am not completely sure that my fix is correct and I was trying to test it.

You can see that register ECX is defined by s3 above and is used in the compare instruction which is s61. In between there is a partial re-definition of the lower 8 bits of ECX through the definition of register CL in s48. This definition of CL should leave the high order bits defined by s3 untouched.

Now consider the reaching definition for the use ref node u65 in s61. It has the d50 def ref node as a reaching definition. My question is: shouldn’t we also be able to infer that the d49 ref node is a reaching def node for u65? Starting from u65 there does not seem to be a way to reach d49.

I debugged the code in X86RDFGraph.cpp. It appears that when we start from u65, we hit the def ref node d50 which covers u65 since both involve ECX. Hence there is no need to look for other reaching defs.

The node d50 itself is created as a def node because ECX appears as an implicit def in the COPY instruction in the machine IR and the code in the function buildStmt() processes such implicit defs and creates def nodes for them in the RDF graph.

I understand that the virtual SSA nature of the RDF graph does not allow multiple reaching defs to be specified at a use ref node, but then how does one handle situations such as this where we need to identify all reaching defs for a use node? Or, is this situation being created because my hacky fix for the X86 partial redefinition case is not doing the right thing?

Thanks.

Regards,

Venu.

Hi Venu,

This is happening because there is an implicit def of ECX on the COPY instruction. This was an issue on Hexagon as well. Let me give you some background. There are two kinds of implicit defs (and implicit uses, but I'll refer only to defs for brevity):
(1) Those that indicate that some physical register (that is not an operand) is modified by a given instruction (EFLAGS is a good example on X86).
(2) Those that are there for the purpose of tracking liveness, such as the one on the COPY instruction here.

The first kind is strictly necessary (or otherwise, some physical register modifications would not be reflected in the code), but the second kind is only there to assist with liveness tracking: if register liveness was no longer tracked, these implicit defs/uses could be removed and everything would be fine. The first kind of implicit defs would have to stay regardless of whether we're still interested in liveness or not.

Having the second kind of implicit defs around will pessimize many things (like scheduling, for example), because the general algorithms that detect dependency between instructions will take the implicit defs into account. For example, instructions modifying AL and AH will appear to be dependent if they have implicit defs of AX/EAX, even though AL and AH do not overlap. This was a big problem on Hexagon, and the original RDF implementation ignored the second kind of implicit defs/uses, only paying attention to the first. Later on, subregister liveness tracking was enabled on Hexagon, which got rid of the second kind of implicit defs/uses altogether. After that, RDF was changed to no longer ignore any implicit operands: this happened in r300369.

Since X86 does not use subregister liveness tracking, it still has the extra implicit operands. I don't know whether it has any special treatment of these to avoid the dependency issues, but the presence of these will reduce the precision of RDF. You can try to revert r300369 and see if it helps. It won't hurt Hexagon, so if helps X86, we could consider bringing that code back.

-Krzysztof

Hi Krzysztof,

Thanks for the reply.

Since a data flow edge is missing, I thought that it could be a correctness issue and not just a precision issue. But, on second thoughts, maybe it isn’t because the edge from the implicit def to the use is present and the register associated with this edge covers the register associated with the missing edge.

I will try to revert the patch you mentioned and check if the precision issue is fixed. I am seeing some other failure in a test case after RDF copy propagation for X86 code and I was attributing it to this missing edge, but, I guess there is some other issue which I will try to debug.

Thanks.

Regards,
Venu.

Hi Krzysztof,

In one of your earlier emails in this thread you mentioned that you had some changes which add extra aliases for subregisters. Did you mean for X86? And is it extra register units that you added or aliases?

I tried adding extra register units for X86 through some changes in CodeGenRegisters.cpp in TableGen but I am seeing a runtime error in one of my test cases possibly due to the fact that my change is incorrect or incomplete. Apart from adding extra register units, do you need to do anything else?

If you have the changes you made, I would be grateful if you can share it.

Thanks.

Regards,
Venu.