Testcases where GVN uses too much memory?

I've heard a few times, "GVN uses too much memory." The real fix
is probably a rewrite of some sort, but that's not what this email
is about.

I have a few patches that should *incrementally* reduce its memory
usage. Keyword being "should", because I haven't observed an
improvement... in fact, I haven't seen it using much memory at all.

Does anyone have a specific testcase where GVN uses gobs of memory?

I heard a rumour that gambitc can repro this, but I don't have time
right now to look for and isolate a testcase. If someone else has
one handy, I'd appreciate if you could send it across.

0001-GVN-Avoid-BumpPtrAllocator-leak-in-performPRE.patch (4.44 KB)

0002-GVN-Don-t-shuffle-entries-in-replaceInLeaderTable.patch (1.95 KB)

0003-GVN-Optimize-Expression-for-size.patch (8.04 KB)

0004-GVN-Change-from-a-linked-list-to-a-vector.patch (8.15 KB)

Hi Duncan,

Each element in `LeaderTable` was an intrusively linked list of
`LeaderTableEntry`, with each element of the list costing 24B (assuming
64-bit pointers).

  - Remove the link from `LeaderTableEntry`, slimming it down to 16B.

  - Replace the linked list with a hand-rolled vector, with size and
    capacity each represented using `uint32_t`.

If the list has a single entry, it still costs 24B and does not require
a dereference. If it has more than one element, it mallocs an array and
costs 24B plus 16B per entry.

Since the entries will be next to each other in memory, we get a
locality savings here.

I don’t follow the logic here. My recollection is that the common case is a single element in the list, for which the data structure is already optimized: the head element (not a pointer, but the actual element) is stored in-line with the LeaderTable itself, requiring no additional allocation at all.
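For concreteness, here is a rough sketch of the two layouts being compared, assuming 64-bit pointers. The names and field order follow the descriptions in this thread, not the actual sources, so treat it as illustrative only:

```c++
// Illustrative sketch only; names follow this thread, not the real code.
#include <cstdint>
class Value;
class BasicBlock;

// Existing scheme: an intrusive singly linked list.  The head entry is
// stored in-line in LeaderTable, so a one-element list needs no extra
// allocation; each additional element costs a separate 24B node.
struct OldLeaderTableEntry {
  Value *Val;                // 8B
  const BasicBlock *BB;      // 8B
  OldLeaderTableEntry *Next; // 8B -> 24B per element
};

// Proposed scheme: drop the link and keep 16B entries in a hand-rolled
// vector whose size and capacity are 32-bit.
struct NewLeaderTableEntry {
  Value *Val;           // 8B
  const BasicBlock *BB; // 8B -> 16B per element
};
struct LeaderList {
  union {
    NewLeaderTableEntry Single;   // single-entry case, stored in-line
    NewLeaderTableEntry *Entries; // 2+ entries: one malloc of 16B each
  };
  uint32_t Size;
  uint32_t Capacity; // the in-line header stays 24B either way
};
```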

—Owen

Right, it's already optimized for the common case. My goal was to leave
the common case about the same, but improve the rare case. When GVN
"blows up", is it just a really big common case, or are entries in
`LeaderTable` with 3+ elements dominating the memory footprint?

After this patch:

  - The memory footprint of the common case is unchanged: if there's
    only one element, then it's still 24B stored in-line with no
    additional allocation. However, runtime speed is probably slower
    because of the branch to check where the storage is. (How much
    slower?)

  - In the rare two element case, there's one allocation of 32B instead
    of one allocation of 24B. Depending on the malloc, this might be
    equivalent, or might be more expensive. Runtime speed to visit them
    should be dominated by the single dereference in both cases.

  - In the rare 3+ element case, there's 1 x N*16B allocation instead of
    (N-1) x 24B allocations. This is where the patch might add value
    (both memory footprint and runtime speed).

But it's speculative enough that it needs supporting data.
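To make the crossover point explicit, here is a quick back-of-the-envelope sketch of the heap cost in each case. It just restates the numbers above (and assumes the vector allocates exactly N slots, with no over-allocation); these are not measurements:

```c++
#include <cstddef>

// Heap bytes for an N-element leader list with 64-bit pointers; the 24B of
// in-line storage is excluded because it is the same before and after.
constexpr std::size_t oldHeapBytes(std::size_t N) { return N <= 1 ? 0 : (N - 1) * 24; }
constexpr std::size_t newHeapBytes(std::size_t N) { return N <= 1 ? 0 : N * 16; }

static_assert(oldHeapBytes(2) == 24 && newHeapBytes(2) == 32,
              "two elements: new layout is slightly larger");
static_assert(oldHeapBytes(3) == 48 && newHeapBytes(3) == 48,
              "three elements: break-even");
static_assert(oldHeapBytes(8) == 168 && newHeapBytes(8) == 128,
              "longer lists: new layout wins");
```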

I'm not even sure that `LeaderTable` is a problem (having no data!), but
if it is, it wouldn't be hard to shave 8B off of the static storage,
helping even the common case. `Value` is always word-aligned, so we
could steal its LSB to indicate short vs. long mode. In long mode, the
rest of the `Value` pointer can be used for size and capacity, and the
`BasicBlock` pointer can point to external storage.
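In case it helps, a very rough sketch of that tagging idea, assuming a 64-bit `uintptr_t` and that `Value *` is at least two-byte aligned so the low bit is free. None of this is actual GVN code, and the particular bit split (31-bit size, 32-bit capacity) is just one possible choice:

```c++
#include <cassert>
#include <cstdint>
class Value;
class BasicBlock;
struct NewLeaderTableEntry; // the 16B {Value *, BasicBlock *} entry from above

// Two words of static storage per value number instead of three.
struct TaggedLeaderList {
  uintptr_t ValOrSizes;  // short mode: Value*;  long mode: tag | size | capacity
  uintptr_t BBOrStorage; // short mode: BasicBlock*;  long mode: external array

  bool isLong() const { return ValOrSizes & 1; }

  // Short mode: exactly one (Value, BasicBlock) pair stored in-line.
  Value *singleValue() const {
    assert(!isLong());
    return reinterpret_cast<Value *>(ValOrSizes);
  }
  const BasicBlock *singleBB() const {
    assert(!isLong());
    return reinterpret_cast<const BasicBlock *>(BBOrStorage);
  }

  // Long mode: a 31-bit size and 32-bit capacity share the first word with
  // the tag bit; the second word points at the heap-allocated entry array.
  uint32_t size() const {
    assert(isLong());
    return uint32_t((ValOrSizes >> 1) & 0x7fffffffu);
  }
  uint32_t capacity() const {
    assert(isLong());
    return uint32_t(ValOrSizes >> 32);
  }
  NewLeaderTableEntry *storage() const {
    assert(isLong());
    return reinterpret_cast<NewLeaderTableEntry *>(BBOrStorage);
  }
};
```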

I agree. It seems like it has the potential to be worthwhile, but we need at the very least a histogram of the frequencies of various list lengths. Ideally, we want to measure both the memory footprint and the runtime impact of this change.
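As a rough illustration, that histogram could be gathered with throwaway instrumentation along these lines, assuming the pre-patch layout described earlier in the thread (a `DenseMap` from value number to an in-line head entry chained through `Next`); the names come from this thread and are not checked against the current tree:

```c++
#include <cstdint>
#include <map>
#include "llvm/ADT/DenseMap.h"
#include "llvm/Support/raw_ostream.h"

class Value;
class BasicBlock;
struct LeaderTableEntry {
  Value *Val;
  const BasicBlock *BB;
  LeaderTableEntry *Next; // intrusive link, pre-patch layout
};

// Sketch only: tally how many leader lists have each length.
static void dumpLeaderListHistogram(
    const llvm::DenseMap<uint32_t, LeaderTableEntry> &LeaderTable) {
  std::map<unsigned, unsigned> Histogram; // list length -> number of lists
  for (const auto &KV : LeaderTable) {
    unsigned Len = 0;
    for (const LeaderTableEntry *E = &KV.second; E; E = E->Next)
      ++Len;
    ++Histogram[Len];
  }
  for (const auto &Bucket : Histogram)
    llvm::errs() << "length " << Bucket.first << ": " << Bucket.second
                 << " lists\n";
}
```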

—Owen