Leaks in PBQPBuilderWithCoalescing::build ?

Hi Lang,

In PBQPBuilderWithCoalescing::build, around line 360, we have code looking like:

PBQP::Vector newCosts(g.getNodeCosts(node));

addPhysRegCoalesce(newCosts, pregOpt, cBenefit);

g.setNodeCosts(node, newCosts);

I suspect the leak occurs around the setNodeCosts method, and I have trouble understanding how it handles the case where the node already has costs.

It seems to me that:

  • we make of copy of the node’s costs (probably because someone else can refer to it ?)

  • we modify the copy

  • we set the node’s new costs. But what is supposed to happen there, especially in the CostAllocator part ?

Could you shed some light there ?

Thanks,

Arnaud

Hi Arnaud,

You’re right: newCosts should be a copy of the node’s costs, as it may be modified. When we call setNodeCost’s, the PBQP::Vector’s assignment operator should free the old costs, but I may have misimplemented that (or forgotten it entirely).

Cheers,
Lang.

Hi Arnaud,

Out of interest - what makes you suspect the vector cost changes, rather than the matrix ones? Is the tool indicating a leak due to those lines?

The memory operations for the vector costs should be fairly straightforward, but the matrix operations use a value-pool system behind the scenes to reduce the PBQP allocator’s memory overhead. The extra complexity of that makes it a prime suspect for memory leaks.

Cheers,
Lang.

While I’m not sure where the leak is, using some pre-canned memory management might help…

Attached is a patch that changes this allocation to use shared_ptr, perhaps it’ll address the bug?

(ideally we shouldn’t need the intrusive ref counting (std::enable_shared_from_this) but instead have a weak_set that has std::weak_ptr in it & implicitly removes elements as they become null (probably on a harvesting schedule, rather than with a direct callback as is currently implemented))​

pbqp_leak.diff

Oooh. Neat. Thanks Dave. Please go ahead and commit that.

Arnaud - I have no idea whether Dave’s patch will help with this bug, but it’s certainly worth testing.

  • Lang.

FTR, similar leak was recently reported by LSan on our bootstrap bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/4506/steps/check-llvm%20asan/logs/stdio

Thanks Alexey!

Arnaud - apologies for misleading you: It looks like the vectors are pooled too.

Hopefully Dave’s patch fixes this up, otherwise I’ll try to take a look at this soon.

  • Lang.

Oooh. Neat. Thanks Dave. Please go ahead and commit that.

Committed in 217563.

Running the PBQP test without my change under valgrind didn't show any
memory leak, but hopefully with Arnaud's change and valgrind he can see the
leak and then verify that my change addresses it... *fingers crossed*

I confirm the shared_ptr change fixes the issue. I will recommit the aarch64/pbqp test then.

Thanks David !

Excellent. Thanks again Dave. And thanks for the testing Arnaud.

For the curious: I went back to look for the leak and it’s obvious with hindsight. PoolRef’s destructor was deleting the PoolEntry when the ref-count hit zero, but the assignment operator was not. This was an oversight left over from when I moved responsibility for deleting PoolEntries from PoolEntry itself out to PoolRef. It’s all moot now though: Dave’s solution fixes this, and is much cleaner to boot.

  • Lang.