Instruction Constraints Question

We've discovered a problem in the AVX2 gather patterns in X86InstrSSE.td.

According to the AVX2 manual, no two of the destination register, vector
index register and mask register can be the same. The patterns in
X86InstrSSE.td are missing this constraint and it's possible to generate
an illegal instruction.

It doesn't look like TableGen supports Constraints beyond EARLY_CLOBBER
and TIED_TO. We would need to add a constraint such as "$dst != $src1,
$dst != $mask, $src1 != $mask" to the current patterns to enforce the
rules.

Is there another mechanism to support a constraint like this or is
hacking TableGen the best way to do it? If the latter, does anyone have
a sense of how difficult this would be to implement?

                       -David
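To make the rule above concrete, here is a minimal sketch of the legality check the constraint would have to enforce. The function name and the register-name strings are hypothetical and used only for illustration; this is not code from LLVM or TableGen.

```python
def gather_operands_legal(dst: str, index: str, mask: str) -> bool:
    """Return True when dst, index and mask name three distinct registers.

    Hypothetical helper: it models only the "no two may be the same" rule
    quoted from the AVX2 manual, nothing else about the instruction.
    """
    # A set drops duplicates, so three members means all operands differ.
    return len({dst, index, mask}) == 3


# An assignment the constraint must allow:
legal = gather_operands_legal("ymm0", "ymm1", "ymm2")
# An assignment the constraint must forbid (dst reuses the index register):
illegal = gather_operands_legal("ymm1", "ymm1", "ymm2")
```

A pattern without the constraint gives the register allocator no reason to avoid the second assignment.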

It doesn't look like TableGen supports Constraints beyond EARLY_CLOBBER
and TIED_TO. We would need to add a constraint such as "$dst != $src1,
$dst != $mask, $src1 != $mask" to the current patterns to enforce the
rules.

You can emulate such constraints via early clobbering. Just mark dst as
early clobber.

Is there another mechanism to support a constraint like this or is
hacking TableGen the best way to do it? If the latter, does anyone have
a sense of how difficult this would be to implement?

It's not a tablegen problem, the regalloc part is the hardest.

It doesn't look like TableGen supports Constraints beyond EARLY_CLOBBER
and TIED_TO. We would need to add a constraint such as "$dst != $src1,
$dst != $mask, $src1 != $mask" to the current patterns to enforce the
rules.

You can emulate such constraints via early clobbering. Just mark dst as
early clobber.

How would that (or any early clobbering) enforce $src1 != $mask? Or is
it a fortuitous side-effect of implementation?

Tim.

In this case, $src1 is also the destination register. A masked gather will merge the conditionally selected elements into the input vector.

-Cameron

How would you emulate src1 != mask via early clobbing?

-Krzysztof

Ah, I misread Tim's question. Still, what if two input registers must be different?

-Krzysztof

Anton Korobeynikov <anton@korobeynikov.info> writes:

It doesn't look like TableGen supports Constraints beyond EARLY_CLOBBER
and TIED_TO. We would need to add a constraint such as "$dst != $src1,
$dst != $mask, $src1 != $mask" to the current patterns to enforce the
rules.

You can emulate such constraints via early clobbering. Just mark dst as
early clobber.

Actually, I've always wondered what early clobbering is. Can you
explain it?

I don't think that it will work for the src1 != mask constraint, though,
right?

Is there another mechanism to support a constraint like this or is
hacking TableGen the best way to do it? If the latter, does anyone have
a sense of how difficult this would be to implement?

It's not a tablegen problem, the regalloc part is the hardest.

Sure, but we'll want to express the constraint in TableGen, I think.

Any hints on a direction for regalloc? How are such constraints
modeled in the current iteration of the register allocator?

Thanks!

                        -David

Cameron McInally <cameron.mcinally@nyu.edu> writes:

How would that (or any early clobbering) enforce $src1 != $mask? Or is
it a fortuitous side-effect of implementation?

In this case, $src1 is also the destination register. A masked gather will
merge the conditionally selected elements into the input vector.

Oh yes, that's right. Thanks for the reminder, Cameron. :)

                       -David

It's an output operand that is written before all the input operands are read.

The register allocator makes sure that early clobber outputs are never allocated the same register as any input operands.

/jakob
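The effect described above can be illustrated with a toy model of output-register selection. This is only a sketch under simplifying assumptions: the function, its parameters and the register names are hypothetical, and it does not reflect how LLVM's register allocator is actually implemented.

```python
def allocate_output(free_regs, input_regs, inputs_die_here, early_clobber):
    """Pick a register for an instruction's single output (toy model).

    free_regs:       registers holding no live value, in preference order
    input_regs:      registers holding this instruction's inputs
    inputs_die_here: True if the inputs are not needed after the instruction
    early_clobber:   True if the output is written before the inputs are read
    """
    candidates = list(free_regs)
    if inputs_die_here and not early_clobber:
        # A normal output is written after the inputs are read, so the
        # register of a dying input may be reused for the result.
        candidates = list(input_regs) + candidates
    if not candidates:
        raise RuntimeError("no register available in this toy model")
    return candidates[0]


# Without early clobber the output may land on an input register.
normal = allocate_output(["ymm2"], ["ymm0", "ymm1"], True, False)
# With early clobber the input registers are never candidates.
clobber = allocate_output(["ymm2"], ["ymm0", "ymm1"], True, True)
```

In this model, marking the output as early clobber is what keeps it apart from every input, which is the property the gather destination needs.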

Jakob Stoklund Olesen <stoklund@2pi.dk> writes:

Actually, I've always wondered what early clobbering is. Can you
explain it?

It's an output operand that is written before all the input operands
are read.

The register allocator makes sure that early clobber outputs are never
allocated the same register as any input operands.

Aha! Thanks very much for that explanation Jakob!

                           -David

Sorry, I was looking at the gather pattern and not at Dave’s example. EARLY_CLOBBER should be sufficient.

For a masked gather, all source operands are live entering the instruction. No problem there.

For an unmasked gather, the mask and index vector are live entering the instruction. The input vector can be undefined though, which can lead to the index vector register being reused as the destination. Marking the destination as EARLY_CLOBBER should sort that out.

-Cameron