Auto-generate the memory folding tables

Hi LLVM devs

As time goes on, manually adding instructions to the X86 memory folding tables has become a significant burden when supporting new ISA extensions, so I'd like to automate the generation of these tables and replace the existing hand-written tables in X86InstrFoldTables.cpp.
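For context, each row in these tables pairs a register-form opcode with its memory-form counterpart plus some TB_* flags. A simplified view of the entry type, based on X86InstrFoldTables.h (treat the field details as approximate; they have shifted between versions):

// Sorted on KeyOp so the lookup helpers can binary-search the tables.
struct X86MemoryFoldTableEntry {
  uint16_t KeyOp;  // opcode being looked up (register form when folding,
                   // memory form when unfolding)
  uint16_t DstOp;  // the paired opcode to rewrite to
  uint16_t Flags;  // TB_FOLDED_LOAD, TB_FOLDED_STORE, TB_NO_REVERSE,
                   // TB_NO_FORWARD, alignment requirements, ...
  bool operator<(const X86MemoryFoldTableEntry &RHS) const {
    return KeyOp < RHS.KeyOp;
  }
  friend bool operator<(const X86MemoryFoldTableEntry &TE, unsigned Opcode) {
    return TE.KeyOp < Opcode;
  }
};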

I noticed the following comment, added in commit 614f1924716f ([X86] Add comment about the sorting of the memory folding tables added in r335501):

// We also have a tablegen emitter that tries to autogenerate these tables
// by comparing encoding information. This can be enabled by passing
// X86_GEN_FOLD_TABLES=ON to cmake which will produce X86GenFoldTables.inc
// in the build area. There are currently some bugs in the autogenerated table
// that require a manual review to copy them from the autogenerated table into
// this table. It is unclear if we will ever be able to fully automate this
// because as new instructions are added into holes in the X86 opcode map they
// potentially pair up with old instructions and create new entries in the
// tables that would be incorrect. The manual review process allows us a chance
// to catch these before they become observable bugs.

The comment was added in 2018 and there is no Phabricator link for it, so I don't know where to find more details about it.

Knowing what these bugs are would be very useful for enabling the automation by default.

Could anyone provide some information about this? Any other suggestions are also welcome.

Best

Shengchen

Here’s a sampling.

These entries from X86InstrFoldTables.cpp have a different version in the generated table:

{ X86::MMX_MOVD64from64rr, X86::MMX_MOVD64from64rm, TB_FOLDED_STORE | TB_NO_REVERSE },
{ X86::MMX_MOVD64grr, X86::MMX_MOVD64mr, TB_FOLDED_STORE | TB_NO_REVERSE },
{ X86::MOVPQIto64rr, X86::MOVPQI2QImr, TB_FOLDED_STORE | TB_NO_REVERSE },

{ X86::VMOVPQIto64Zrr, X86::VMOVPQI2QIZmr, TB_FOLDED_STORE | TB_NO_REVERSE },
{ X86::VMOVPQIto64rr, X86::VMOVPQI2QImr, TB_FOLDED_STORE | TB_NO_REVERSE },
{ X86::MMX_MOVD64to64rr, X86::MMX_MOVQ64rm, 0 },
{ X86::MOV64toPQIrr, X86::MOVQI2PQIrm, TB_NO_REVERSE },

These entries are missing from the generated table:

{ X86::MOV64toSDrr, X86::MOV64mr, TB_FOLDED_STORE | TB_NO_REVERSE | 0 },
{ X86::MOVDI2SSrr, X86::MOV32mr, TB_FOLDED_STORE | TB_NO_REVERSE | 0 },
{ X86::MOVSDto64rr, X86::MOVSDmr, TB_FOLDED_STORE | TB_NO_REVERSE | 0 },
{ X86::MOVSS2DIrr, X86::MOVSSmr, TB_FOLDED_STORE | 0 },

{ X86::TCRETURNri, X86::TCRETURNmi, TB_FOLDED_LOAD | TB_NO_FORWARD },
{ X86::TCRETURNri64, X86::TCRETURNmi64, TB_FOLDED_LOAD | TB_NO_FORWARD },
{ X86::VMOV64toSDZrr, X86::MOV64mr, TB_FOLDED_STORE | TB_NO_REVERSE },
{ X86::VMOV64toSDrr, X86::MOV64mr, TB_FOLDED_STORE | TB_NO_REVERSE },
{ X86::VMOVDI2SSZrr, X86::MOV32mr, TB_FOLDED_STORE | TB_NO_REVERSE },
{ X86::VMOVDI2SSrr, X86::MOV32mr, TB_FOLDED_STORE | TB_NO_REVERSE },
{ X86::MOV64toSDrr, X86::MOVSDrm_alt, TB_NO_REVERSE },
{ X86::MOVDI2SSrr, X86::MOVSSrm_alt, 0 },
{ X86::MOVSDrr, X86::MOVLPDrm, TB_NO_REVERSE },

This entry in the generated table is clearly wrong:
{ X86::SENDUIPI, X86::VMXON, TB_FOLDED_LOAD | 0 },

I think these entries in the generated table might be wrong, or at least the B case, because the register spill size is 16 bits but the load is only 8 bits.

{ X86::KMOVBkk, X86::KMOVBkm, 0 },
{ X86::KMOVDkk, X86::KMOVDkm, 0 },
{ X86::KMOVQkk, X86::KMOVQkm, 0 },
{ X86::KMOVWkk, X86::KMOVWkm, 0 },

The VPEXPAND entries don't have TB_NO_REVERSE in the generated table:

{ X86::VPEXPANDBZ128rr, X86::VPEXPANDBZ128rm, TB_NO_REVERSE },
{ X86::VPEXPANDBZ256rr, X86::VPEXPANDBZ256rm, TB_NO_REVERSE },
{ X86::VPEXPANDBZrr, X86::VPEXPANDBZrm, TB_NO_REVERSE },
{ X86::VPEXPANDDZ128rr, X86::VPEXPANDDZ128rm, TB_NO_REVERSE },
{ X86::VPEXPANDDZ256rr, X86::VPEXPANDDZ256rm, TB_NO_REVERSE },
{ X86::VPEXPANDDZrr, X86::VPEXPANDDZrm, TB_NO_REVERSE },
{ X86::VPEXPANDQZ128rr, X86::VPEXPANDQZ128rm, TB_NO_REVERSE },
{ X86::VPEXPANDQZ256rr, X86::VPEXPANDQZ256rm, TB_NO_REVERSE },
{ X86::VPEXPANDQZrr, X86::VPEXPANDQZrm, TB_NO_REVERSE },
{ X86::VPEXPANDWZ128rr, X86::VPEXPANDWZ128rm, TB_NO_REVERSE },
{ X86::VPEXPANDWZ256rr, X86::VPEXPANDWZ256rm, TB_NO_REVERSE },
{ X86::VPEXPANDWZrr, X86::VPEXPANDWZrm, TB_NO_REVERSE },

INSERTPS is in the generated table but not in X86InstrFoldTables.cpp, and I think that's intentional because the memory form doesn't have the Count_S field:
{ X86::INSERTPSrr, X86::INSERTPSrm, TB_NO_REVERSE },

Masked instructions are generated without TB_NO_REVERSE. You can’t unfold a load from a masked instruction because the unfold will produce an unmasked load. The TB_NO_REVERSE prevents that.
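If the generator grows a rule for this, it might look like the following in X86FoldTablesEmitter.cpp (a sketch only; the CannotUnfold member and the exact record-field name are my reading of the emitter and disassembler tables, so double-check them):

// Masked (EVEX {k}) register forms must not be unfolded: the unfolded load
// would be unmasked and could fault on lanes the mask disabled.
const Record *RegRec = RegInst->TheDef;
if (RegRec->getValueAsBit("hasEVEX_K"))
  Result.CannotUnfold = true; // printed as TB_NO_REVERSE in the .inc file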

I should also point out that autogenerating the table without also autogenerating the tests widens the test coverage hole we already have on these tables.

~Craig

Thanks, Craig! The information is very useful. I think the coverage hole of this table is a minor issue; we don't need a test for each entry in the table because it's for optimization. Enumerating every case that could be optimized in tests seems a little rigid, and I believe we only need to test the correctness of the design, e.g., TB_NO_REVERSE, TB_FOLDED_LOAD, TB_NO_FORWARD.

I guess my big concern is how we find out when new rules need to be added to the generator to create correct tables. There are multiple different bugs already in just the examples I gave. I didn’t go through all of the tables.

I think the memory folding/unfolding tables should not require unique keys; that is, one-to-many mappings should be allowed. We could define a priority for mappings that share a key, e.g.:

{ X86::MMX_MOVD64grr, X86::MMX_MOVD64mr, TB_FOLDED_STORE },
{ X86::MMX_MOVQ64rr, X86::MMX_MOVQ64mr, TB_FOLDED_STORE },

Both entries could be inserted into the folding table, and if we prefer to unfold X86::MMX_MOVD64mr to X86::MMX_MOVD64grr, we could put X86::MMX_MOVD64grr before X86::MMX_MOVQ64rr in the unfolding table:

{ X86::MMX_MOVD64mr, X86::MMX_MOVD64grr },
{ X86::MMX_MOVQ64mr, X86::MMX_MOVQ64rr },

I’m confused. All 4 of the instruction names in this example are unique. What’s the conflict?

{ X86::MMX_MOVD64grr, X86::MMX_MOVD64mr, TB_FOLDED_STORE },
{ X86::MMX_MOVQ64rr, X86::MMX_MOVQ64mr, TB_FOLDED_STORE },

TB_NO_REVERSE and TB_NO_FORWARD already serve as the priority mechanism.

Ah, I may have fainted from hunger yesterday :smiling_face_with_tear:

Let me check the details of TB_NO_REVERSE and TB_NO_FORWARD.

TB_NO_REVERSE prevents insertion into the reverse table. So that can be used to guarantee uniqueness on that table.

TB_NO_FORWARD only suppresses lookups; it isn't factored into the uniqueness checks for the forward table. That's fixable by replacing llvm::lower_bound with equal_range and getting rid of the adjacent_find checks.
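Roughly like this, reusing the existing heterogeneous operator< on the entry type (a sketch equivalent to the equal_range idea, not a tested patch):

static const X86MemoryFoldTableEntry *
lookupFoldTableImpl(ArrayRef<X86MemoryFoldTableEntry> Table, unsigned RegOp) {
  // Find the first entry with KeyOp >= RegOp, then scan every entry sharing
  // the key instead of asserting uniqueness with adjacent_find.
  auto I = llvm::lower_bound(Table, RegOp);
  for (; I != Table.end() && I->KeyOp == RegOp; ++I)
    if (!(I->Flags & TB_NO_FORWARD)) // skipped for folding, kept for unfolding
      return &*I;                    // earlier entries take priority
  return nullptr;
}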

Here's my plan:

  1. Gradually close the gap between the auto-generated folding table and the hand-written one.
  2. When the two are identical, I will use the auto-generated table by default and add a lit test to check each entry of the table (something like llvm/utils/update_llc_test_checks.py would be added).

That way we can prevent possible regressions.

@topperc I noticed that you added TB_NO_REVERSE to some folding table entries where the register form uses the REX prefix, but the memory form does not.
rG9f2f12700960

I think we can do better now because it is possible to know whether the target has the 64-bit feature, etc.
Could we check the predicates for these entries in the llvm::lookup*Table functions? (I noticed that we already do something similar with CheckVEXInstPredicate in X86GenEVEX2VEXTables.inc.)
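Something like this is what I have in mind (the STI overload and checkFoldPredicate are hypothetical; the generator would emit the predicate check the way it emits CheckVEXInstPredicate today):

const X86MemoryFoldTableEntry *
llvm::lookupUnfoldTable(unsigned MemOp, const X86Subtarget &STI) {
  const X86MemoryFoldTableEntry *Entry = lookupUnfoldTable(MemOp);
  // Filter out entries whose register form needs a feature the subtarget
  // lacks (e.g. a REX-encoded form outside 64-bit mode) instead of dropping
  // them from the table with TB_NO_REVERSE.
  if (Entry && !checkFoldPredicate(Entry->DstOp, STI)) // hypothetical hook
    return nullptr;
  return Entry;
}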

I think that’s doable, but as noted in that patch there’s no reason to ever unfold any of those instructions. Keeping TB_NO_REVERSE saves space in the unfold table.

Could we mark all instructions that only store a register or only load into a register as TB_NO_REVERSE through the generator?

Good idea! Let me do some experiments.
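A first cut in X86FoldTablesEmitter.cpp might be (same caveat as above about the member names; isMoveReg, mayLoad, and mayStore are existing tablegen/CodeGenInstruction bits):

// A pure register<->memory move gains nothing from unfolding: the unfolded
// pair would be the same load/store plus a register-to-register move, so
// mark the entry TB_NO_REVERSE up front.
if (RegInst->TheDef->getValueAsBit("isMoveReg") &&
    (MemInst->mayLoad || MemInst->mayStore))
  Result.CannotUnfold = true;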

Should instructions like MOVUPDmr and MOVUPSmr be unfoldable? The isMoveReg bit is set to 1 for MOVUPDrr and MOVUPSrr.

I don't know of any places where we unfold a store instruction, so I don't think it matters whether MOVUPDmr and MOVUPSmr are unfoldable or not.

What about pairs like these?

X86::VMOVAPDrr,            X86::VMOVAPDrm,
X86::VMOVAPSYrr,           X86::VMOVAPSYrm

X86::VMOVAPDrm and X86::VMOVAPSYrm are not store instructions but are MOV-like instructions.

I don't think there's a good reason to unfold them. I think the unfolds are in MachineLICM, to pull loads out of loops, and in InstrEmitter, when it needs to copy an EFLAGS-producing instruction.

For MachineLICM, unfolding VMOVAPDrm would leave VMOVAPDrr inside the loop, which doesn't make sense.

The InstrEmitter case only happens on instructions that write EFLAGS, I think. If the EFLAGS from one node are used twice and get clobbered between the two users, the producer will get copied so it can recreate the EFLAGS result. If the EFLAGS producer has a folded load, we need to unfold it first so that the load doesn't get copied. This case doesn't apply to VMOVAPDrm or any instruction that just loads a value into a register.