Some of the optimizations that the first DAG combine performs are
counterproductive for our 8-bit target. For example, in:
// I dropped the types because they are irrelevant.
// Excuse me for changing the syntax...
store %tmp1, %var
%tmp2 = load %var
%tmp4 = add %tmp3, %tmp2
Since the load is the only user of %var and since %var has just been
stored to, the combiner assumes that %tmp1 is alive, so it goes ahead
and removes the load and does:
store %tmp1, %var
%tmp4 = add %tmp3, %tmp1
This is great for architectures that have more than one register
because it is likely that the value of %tmp1 is already in a physical
register, hence saving an instruction. However, for our 8-bit
architecture with only one register, this kind of assumption will just
result in extra overhead because "add" operates only on memory, so we
have to generate more instructions to store %tmp1 to memory and then use
that memory location for "add". Without the optimization, we could
just use %var and everything would work out just fine.
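To make the rewrite described above concrete, here is a minimal sketch of store-to-load forwarding over a toy instruction list. It is not the DAG combiner's code; the `Inst` struct and `forwardStoreToLoad` are hypothetical names invented for this illustration, and the pass only handles a load that immediately follows a store to the same address.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy three-address instruction. Hypothetical; not LLVM's data structures.
struct Inst {
    std::string op;   // "store", "load", or "add"
    std::string dst;  // result name (empty for a store)
    std::string a;    // store: value   | load: address | add: lhs
    std::string b;    // store: address | load: unused  | add: rhs
};

// If a load immediately follows a store to the same address, drop the
// load and make later users read the stored value instead.
std::vector<Inst> forwardStoreToLoad(const std::vector<Inst>& in) {
    std::vector<Inst> out;
    std::map<std::string, std::string> rename;  // load result -> stored value
    for (Inst i : in) {
        // Apply earlier renames to this instruction's operands.
        auto ra = rename.find(i.a);
        if (ra != rename.end()) i.a = ra->second;
        auto rb = rename.find(i.b);
        if (rb != rename.end()) i.b = rb->second;

        if (i.op == "load" && !out.empty() &&
            out.back().op == "store" && out.back().b == i.a) {
            rename[i.dst] = out.back().a;  // e.g. %tmp2 -> %tmp1
            continue;                      // the load is now redundant
        }
        out.push_back(i);
    }
    return out;
}
```

Feeding it the three instructions from the example leaves the store in place and rewrites the add to use %tmp1 directly, which is exactly the form that costs extra on a memory-operand, single-register target.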
So I propose adding a bit mask and a method to the TargetLowering class
so that targets can individually select some of the optimizations to be
turned off.
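A rough sketch of what such a mask could look like is below. Every name in it (`CombineOpt`, `TargetLoweringSketch`, `disableCombine`, `isCombineEnabled`) is hypothetical and made up for illustration; none of it is existing LLVM API, and the real TargetLowering class is far larger.

```cpp
#include <cstdint>

// Hypothetical list of combiner optimizations, one bit each.
enum CombineOpt : std::uint32_t {
    ForwardStoreToLoad = 1u << 0,
    FoldConstantAdd    = 1u << 1,
};

// Stand-in for the target lowering class; an assumption for this sketch.
class TargetLoweringSketch {
    std::uint32_t DisabledCombines = 0;  // set bit = optimization turned off

public:
    // A target would call this from its constructor.
    void disableCombine(CombineOpt Opt) { DisabledCombines |= Opt; }

    // The combiner would check this before applying an optimization.
    bool isCombineEnabled(CombineOpt Opt) const {
        return (DisabledCombines & Opt) == 0;
    }
};
```

With this shape, everything stays enabled by default, so existing targets would see no change unless they opt out of a specific combine.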
Thoughts?
Alireza Moshtaghi
Senior Software Engineer
Development Systems, Microchip Technology
I'd find this useful too.
Disabling this optimization in the DAG combiner isn't going to
eliminate the problem; instcombine, GVN, and maybe even others also
happen to perform this optimization. You may find it more effective
to look for ways for codegen to recover in these kinds of situations.
Dan
I can’t think of any workaround. This optimization eliminates so much information that recovering it would take a lot of processing, and it may not be possible to recover the lost information in all cases.
Besides, why does the generic part of llvm have to force an optimization that is counterproductive for some targets?
If there are other phases that do the same optimization, I think we should also be able to disable them in those phases as well.
A.
I agree. The problem in this case is that combine1 is messing things up before we get a chance to do anything. There also seems to be no way of doing target-specific things like TLI.PerformDAGCombine before combine1.
How about adding a call to TLI.PerformDAGCombine() in DAGCombiner::combine() as the first thing (before each node is visited)? We have one after the visit, so that will make it before and after.
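The ordering being suggested can be sketched with a toy combiner. This is not DAGCombiner's code: `Node`, `Stage`, and `CombinerSketch` are hypothetical, and the sketch assumes the existing target hook only runs when the generic visit made no change.

```cpp
#include <functional>
#include <string>

// Toy node; a stand-in for a DAG node. Hypothetical.
struct Node {
    std::string op;
};

// Which stage (if any) handled the node.
enum class Stage { None, TargetPre, Generic, TargetPost };

struct CombinerSketch {
    // Each callback returns true if it rewrote the node.
    std::function<bool(Node&)> targetCombine;  // stands in for the target hook
    std::function<bool(Node&)> genericVisit;   // stands in for the generic visit

    Stage combine(Node& n) const {
        // Proposed: give the target the first chance at the node.
        if (targetCombine && targetCombine(n)) return Stage::TargetPre;
        // Generic combines run next.
        if (genericVisit && genericVisit(n)) return Stage::Generic;
        // Assumed existing behavior: target hook after the visit.
        if (targetCombine && targetCombine(n)) return Stage::TargetPost;
        return Stage::None;
    }
};
```

With the early call in place, a target that claims the load would prevent the generic visit from ever seeing it, which is the effect being asked for.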
The code sequence:
store %tmp1, %var
%tmp4 = add %tmp3, %tmp1
can happen even if you eliminate the specific dag combine in question. The real solution lies elsewhere. To me, this seems more like a register allocation problem.
Evan
Remember, our target does not have registers like ordinary processors do. So register allocation is really not going to do much for us. What we have to do is to exploit the existing opportunities in the source code and try to generate code based on such opportunities. The dag combination in question is one such opportunity that is being destroyed by the optimization.
You may be right that this problem could be addressed in register allocation, but I’m not sure how. Could you shed some light on what you mean?
Thanks,
> Remember, our target does not have registers like ordinary processors do. So register allocation is really not going to do much for us. What we have to do is to exploit the existing opportunities in the source code and try to generate code based on such opportunities. The dag combination in question is one such opportunity that is being destroyed by the optimization.
That’s a very fragile system. Other optimization passes can easily create code that causes more than one value to be live at a time.
> You may be right that this problem could be addressed in register allocation, but I’m not sure how. Could you shed some light on what you mean?
I don’t think the existing register allocator is going to be a good fit. I don’t know enough about your architecture, but it sounds like an accumulator-based machine. I would search for literature on register allocation for that type of architecture.
Evan