Heap Exhaustion during 'DAGCombiner::Run'

Martin_J_O_Riordan · February 25, 2018, 2:36pm

Hi LLVM-Devs,

I am in the process of updating our out-of-tree implementation from v5.0 to
v6.0 RC3, and while it builds and mostly runs, I am having trouble with a
small number of tests where the 'WorklistMap' in 'DAGCombiner::Run' never
becomes empty. This is resulting in a runaway state of continuous heap
allocation until the process exhausts all system memory.

But I can't get a handle on why it is doing this, and it is not obvious to
me that the changes between v5.0 and v6.0 RC3 invalidate our implementation
in a way that might cause this. The only time I see our code entered is
when lowering is called for vector element insert by 'LegalizeOp'. Does
anybody have an advice on how I should approach debugging this?

Thanks,

MartinO

Nirav_Dave · March 1, 2018, 8:44pm

Martin:

I suspect this is an issue with post-DAG legalization store merging in the DAGCombiner. If you have a custom lowered type the DAGCombiner may end up merging a set of stores and immediately splitting them up in legalization. You should be able to disable this pass universally by overriding mergeStoresAfterLegalization() or conditionally for cases that shouldn’t match with canMergeStoresTo.

You should able able to verify by finding the loop of nodes considered with “-debug” on.

-Nirav

Martin_J_O_Riordan · March 5, 2018, 5:13pm

Thanks for this advice Nirav. I have only just returned to working on this after being side-tracked to other tasks.

All the best,

MartinO

Martin_J_O_Riordan · March 6, 2018, 9:04pm

We discovered what is happening.

SDAGCombiner essentially looks at various combinations of nodes to do with vectors, and when it can, it creates a vector shuffle. The problem is, that our vector shuffle lowering builds new trees with vector element, or vector sub-vector insert sequences. The generic DAGCombiner, reconstructs these into a new shuffle, and so the loop continues - we reduce it, and DAGCombiner re-abstracts it.

Our shuffle lowering produces (produced) very optimal code sequences for our target, and has not been changed significantly since LLVM v3.4; but changes between v5.0 and v6.0 have introduced this DAG reduction dependency loop.

Is there any advice to Out-of-Tree implementations about how to re-write their lowering code for shuffle so as to avoid this kind of infinite dependency coupling?

Thanks,

MartinO

Nirav_Dave · March 6, 2018, 9:15pm

Martin:

It sounds like you are doing is more akin to shuffle selection than fusion and therefore it’s a better fit for instruction selection than DAGCombining. Try movign it to ISelDAGToDAG’s Select (or potentially PreprocessISelDAG).

Th

-Nirav

nemanjai · March 6, 2018, 10:29pm

We do something very similar in the PPC back end. The ISA has a general vector permute (vperm) that any shuffle can be lowered to. But there are also a bunch of specialized permutes that don’t require materializing a permute control vector. So what we actually do is a custom legalization for shuffles where we will lower shuffles to various PPC-specific ISD nodes, etc.

Topic		Replies	Views
LoopVectorizer: shufflevectors LLVM Dev List Archives	3	77	September 4, 2018
DAGCombiner, chains and mergeConsecutiveStores? LLVM Dev List Archives	1	286	January 9, 2023
llvm-dev Digest, Vol 166, Issue 22 LLVM Dev List Archives	1	82	April 9, 2018
LLVM X86 backend combineIncDecVector's transform LLVM Dev List Archives	6	86	August 31, 2019
DAGCombiner::MergeConsecutiveStores LLVM Dev List Archives	3	60	February 16, 2015

Heap Exhaustion during 'DAGCombiner::Run'

Related Topics