GlobalISel match table

The GlobalISel match table is a sequence of 64-bit entries. For AMDGPU, there are about 261,000 entries; for PowerPC, about 32,000.

A significant amount of memory could be saved by reducing the entries to 32 bits. This would require reworking the matcher to consume 32-bit entries, and then reworking the commands that carry 64-bit immediates to assemble two 32-bit entries into one 64-bit integer.

I won't be surprised if there are compelling reasons to keep the entries at 64 bits. I'd like to hear those reasons.

Hi Paul,

The GlobalISel match table is a sequence of 64-bit entries. For AMDGPU, there are about 261,000 entries; for PowerPC, about 32,000.

A significant amount of memory could be saved by reducing the entries to 32 bits. This would require reworking the matcher to consume 32-bit entries, and then reworking the commands that carry 64-bit immediates to assemble two 32-bit entries into one 64-bit integer.

I was thinking the same thing recently.

By comparison, SelectionDAG’s table goes further still and is byte-based; it would be interesting to make the comparison with that as well. Going to byte granularity surely saves even more space, which is itself a performance benefit, but it may also have a performance cost due to misalignment. (SelectionDAG also uses variable-length encodings for integers in places, which seems a more dubious choice.)

Cheers,
Nicolai

I won’t be surprised if there are compelling reasons to keep the entries at 64 bits. I’d like to hear those reasons.

Most of the reasons boil down to not having the time needed to implement something better.

There is one bit I’m aware of that snowballs a bit when using uint32_t or smaller. Labels and JumpTargets are currently absolute, which allows them to be recorded in a simple lookup table during the first pass and encoded on the second. JumpTargets will need to become relative (and know where they are in the table) to cope with a large ruleset and reduced range, but that also means you have to start measuring distances between two points in the table.

For fixed-size commands that’s not too bad, but if the range drops below the minimum then you also have to go to variable-length commands (or waste space on padding). That in turn means you need to determine the encoded size of commands on the first pass (including JumpTargets, whose encoding is unknown until the label is seen later) so that the labels know their position for the second pass when we encode the table. We’ll have to go from the two-pass approach to a relaxation one.

Indeed, this is certainly a complication. If we go to uint32_t, then we could use table indexes for those commands, correct? Surely no table will ever have more than 4 billion entries.

In anticipation of working on this, can you tell me what is the easiest way to test such changes? I haven't gotten my head around the testing procedures yet.

Looking through the commands, AFAICT going to uint32_t should be safe except for GIM_CheckLiteralInt and GIR_AddImm, which will need some changes to store their 64-bit immediates as two 32-bit halves.

For testing, making sure llvm-lit and test-suite don’t change behaviour will be the main one, but there are a couple of other things that can make you fairly confident without that:

  • Continue using uint64_t in TableGen and convert to 32-bit at the last moment, then assert that there was no loss of information in the cast.
  • Diff the state machine table before and after. Labels and JumpTargets aside, it should be almost identical, with just the changes to GIM_CheckLiteralInt and GIR_AddImm. The Labels and JumpTargets will be thrown off by the extra elements, but it’s possible to script the corrections to end up with matching tables. For example, in vim I’d probably use macros to find and increment all the Labels/JumpTargets following a GIM_CheckLiteralInt/GIR_AddImm.