Let's get rid of neverHasSideEffects

All,

TableGen likes to infer the MCID::UnmodeledSideEffects flag from an instruction's pattern. When an instruction doesn't have a pattern, it is assumed to have side effects.

It's possible to override this behavior by setting neverHasSideEffects = 1.

It was originally the intention that most instructions have patterns, but that's not the way it worked out. It is often more convenient to use def : Pat<>, and sometimes custom instruction selection is required.

As a result, many instructions are defined without a pattern, and we often forget to set neverHasSideEffects = 1.

$ grep -c UnmodeledSideEffects lib/Target/*/*GenInstrInfo.inc
lib/Target/ARM/ARMGenInstrInfo.inc:727
lib/Target/X86/X86GenInstrInfo.inc:920

I don't think more than half of those UnmodeledSideEffects flags should be there.

I want to stop inferring instruction properties from patterns in TableGen. It has become very hard to read instruction definitions when some properties are inferred, but only half the time.

We can still use any patterns to emit TableGen warnings. If an instruction has a sideeffecting pattern, but hasSideEffects isn't set, TableGen should complain.

I can't just kill off the neverHasSideEffects flag, that could cause miscompilations by clearing the UnmodeledSideEffects bit on instructions that should have it.

Any suggestions for a clean transition? I was thinking about first adding a -Winfer flag that warns whenever TableGen infers a property that wasn't already set in the .td file.

/jakob

All,

TableGen likes to infer the MCID::UnmodeledSideEffects flag from an
instruction's pattern. When an instruction doesn't have a pattern, it is
assumed to have side effects.

It's possible to override this behavior by setting neverHasSideEffects = 1.

It was originally the intention that most instructions have patterns, but
that's not the way it worked out. It is often more convenient to use def :
Pat<>, and sometimes custom instruction selection is required.

As a result, many instructions are defined without a pattern, and we often
forget to set neverHasSideEffects = 1.

$ grep -c UnmodeledSideEffects lib/Target/*/*GenInstrInfo.inc
lib/Target/ARM/ARMGenInstrInfo.inc:727
lib/Target/X86/X86GenInstrInfo.inc:920

I don't think more than half of those UnmodeledSideEffects flags should be
there.

I want to stop inferring instruction properties from patterns in TableGen.
It has become very hard to read instruction definitions when some
properties are inferred, but only half the time.

We can still use any patterns to emit TableGen warnings. If an instruction
has a sideeffecting pattern, but hasSideEffects isn't set, TableGen should
complain.

For what little it's worth (as I rarely touch the target td files), I like
this plan. =]

I just wanted to comment on the migration bit:

I can't just kill off the neverHasSideEffects flag, that could cause
miscompilations by clearing the UnmodeledSideEffects bit on instructions
that should have it.

Why not mass-update all the td files in the tree to explicitly set the flag
exactly as it is being inferred today, then kill the inference. The
behavior is the same, but now explicit, and we can go through and remove
all the places that make no sense.

This email (or a more clearly announce-y email) can update folks w/
out-of-tree targets.

Add the warning in the same go so that people immediately get complaints
from tablegen until they update.

Add a big red note to the release notes for those with out-of-tree targets
that don't track trunk.

I'd personally like to see a moratorium on changes to tablegen until someone writes a reference manual for it that covers the application specific backends for code generation.

All,

TableGen likes to infer the MCID::UnmodeledSideEffects flag from an instruction's pattern. When an instruction doesn't have a pattern, it is assumed to have side effects.

It's possible to override this behavior by setting neverHasSideEffects = 1.

It was originally the intention that most instructions have patterns, but that's not the way it worked out. It is often more convenient to use def : Pat<>, and sometimes custom instruction selection is required.

As a result, many instructions are defined without a pattern, and we often forget to set neverHasSideEffects = 1.

$ grep -c UnmodeledSideEffects lib/Target/*/*GenInstrInfo.inc
lib/Target/ARM/ARMGenInstrInfo.inc:727
lib/Target/X86/X86GenInstrInfo.inc:920

I don't think more than half of those UnmodeledSideEffects flags should be there.

I want to stop inferring instruction properties from patterns in TableGen. It has become very hard to read instruction definitions when some properties are inferred, but only half the time.

Yes, please. This does get a bit more complicated for properties like mayLoad, mayStore and such, but I still like requiring them to be explicit, personally. When there's a pattern on the instruction, we can check for semantic mismatches and throw an error if we find one.

We can still use any patterns to emit TableGen warnings. If an instruction has a sideeffecting pattern, but hasSideEffects isn't set, TableGen should complain.

I can't just kill off the neverHasSideEffects flag, that could cause miscompilations by clearing the UnmodeledSideEffects bit on instructions that should have it.

Any suggestions for a clean transition? I was thinking about first adding a -Winfer flag that warns whenever TableGen infers a property that wasn't already set in the .td file.

I like that. Possibly with the addition that we can filter by a specific property. -Winfer=neverHasSideEffects, e.g., would only show when that specific property is inferred.

Beyond that, I don't see an alternative to an audit of the instructions that get flagged by such a warning. :frowning:

-Jim

All,

TableGen likes to infer the MCID::UnmodeledSideEffects flag from an instruction's pattern. When an instruction doesn't have a pattern, it is assumed to have side effects.

Hi Jakob,

I don't understand what you're saying. Are you proposing that all properties (may load, store, side effects) be explicitly added to all instructions, and the pattern only be used to produce warnings?

Won't this hugely bloat the .td files?

-Chris

All,

TableGen likes to infer the MCID::UnmodeledSideEffects flag from an instruction's pattern. When an instruction doesn't have a pattern, it is assumed to have side effects.

Hi Jakob,

I don't understand what you're saying. Are you proposing that all properties (may load, store, side effects) be explicitly added to all instructions, and the pattern only be used to produce warnings?

Won't this hugely bloat the .td files?

-Chris

For mips16, I have NO patterns in instructions.

The instructions all stand by themselves independent of how they are used in code generation.

The patterns are all separated.

I think this is better from the point of view of data abstraction and information hiding.

I don't want to know about the details of instruction layout when I'm thinking about the higher level use of the instruction.

All,

TableGen likes to infer the MCID::UnmodeledSideEffects flag from an instruction's pattern. When an instruction doesn't have a pattern, it is assumed to have side effects.

Hi Jakob,

I don't understand what you're saying. Are you proposing that all properties (may load, store, side effects) be explicitly added to all instructions, and the pattern only be used to produce warnings?

Yes.

The side effect inference is worse than the load/store inference, but features like this work best when they work all the time. If all instructions had patterns, and we could accurately infer properties, it wouldn't be a problem.

Won't this hugely bloat the .td files?

Not really, TableGen has a fairly convenient syntax for bulk flagging:

let mayLoad = 1 in {
let Defs = [AL,EFLAGS,AX], Uses = [AX] in
def DIV8m : I<0xF6, MRM6m, (outs), (ins i8mem:$src), // AX/[mem8] = AL,AH
               "div{b}\t$src", [], IIC_DIV8_MEM>;
let Defs = [AX,DX,EFLAGS], Uses = [AX,DX] in
def DIV16m : I<0xF7, MRM6m, (outs), (ins i16mem:$src), // DX:AX/[mem16] = AX,DX
               "div{w}\t$src", [], IIC_DIV16>, OpSize;
let Defs = [EAX,EDX,EFLAGS], Uses = [EAX,EDX] in // EDX:EAX/[mem32] = EAX,EDX
def DIV32m : I<0xF7, MRM6m, (outs), (ins i32mem:$src),
               "div{l}\t$src", [], IIC_DIV32>;
// RDX:RAX/[mem64] = RAX,RDX
let Defs = [RAX,RDX,EFLAGS], Uses = [RAX,RDX] in
def DIV64m : RI<0xF7, MRM6m, (outs), (ins i64mem:$src),
                "div{q}\t$src", [], IIC_DIV64>;
}

It is a complete coincidence that these instructions happen to be missing neverHasSideEffects = 1 :wink:

Often, you can set the mayLoad flag as part of a multiclass when you define the *rm variant. X86 is the worst case because so many instructions can fold loads and stores. You can handle most of them by setting the mayLoad and mayStore flags in the big ArithBinOp multiclasses.

You can see the problem in the big ArithBinOp_RF multiclass in X86InstrArithmetic.td. It's not obvious at a glance which of those instructions have patterns, and as a result we forgot to set neverHasSideEffects on the *_REV and *8i8 variants. I don't think it would detract to add "let mayLoad = 1 in {…}" around the *rm groups etc.

Or you could set the flags on the various BinOpRMW super classes.

/jakob

This would work for the mayLoad / mayStore flags. You’ll get a bunch of warnings on your loads and stores.

For hasSideEffects, I will be changing what happens when you do nothing (i.e., set neither neverHasSideEffects nor hasSideEffects). We need a tool that can point out those instructions so they can be audited, as Jim suggested.

I think we should make sure that such a tool is available at least in the the next release.

/jakob

I don't understand what you're saying. Are you proposing that all properties (may load, store, side effects) be explicitly added to all instructions, and the pattern only be used to produce warnings?

Yes.

The side effect inference is worse than the load/store inference, but features like this work best when they work all the time. If all instructions had patterns, and we could accurately infer properties, it wouldn't be a problem.

Ok, I agree with that. I think it's crazy that we still can't write a pattern for copy/move instructions, so we have to smatter attributes on them. What are the other cases that we can't write a pattern for them?

I agree that it is sometimes convenient to write Pat<> patterns instead of putting a pattern on an instruction: why not make tblgen also infer predicates from Pat patterns that expand to a single instruction?

Won't this hugely bloat the .td files?

Not really, TableGen has a fairly convenient syntax for bulk flagging:

let mayLoad = 1 in {
let Defs = [AL,EFLAGS,AX], Uses = [AX] in
def DIV8m : I<0xF6, MRM6m, (outs), (ins i8mem:$src), // AX/[mem8] = AL,AH
              "div{b}\t$src", [], IIC_DIV8_MEM>;
let Defs = [AX,DX,EFLAGS], Uses = [AX,DX] in
def DIV16m : I<0xF7, MRM6m, (outs), (ins i16mem:$src), // DX:AX/[mem16] = AX,DX
              "div{w}\t$src", [], IIC_DIV16>, OpSize;
let Defs = [EAX,EDX,EFLAGS], Uses = [EAX,EDX] in // EDX:EAX/[mem32] = EAX,EDX
def DIV32m : I<0xF7, MRM6m, (outs), (ins i32mem:$src),
              "div{l}\t$src", [], IIC_DIV32>;
// RDX:RAX/[mem64] = RAX,RDX
let Defs = [RAX,RDX,EFLAGS], Uses = [RAX,RDX] in
def DIV64m : RI<0xF7, MRM6m, (outs), (ins i64mem:$src),
               "div{q}\t$src", [], IIC_DIV64>;
}

It is a complete coincidence that these instructions happen to be missing neverHasSideEffects = 1 :wink:

It's also a serious tblgen deficiency that we can't write patterns for these. Look at the terrible code in X86ISelDAGToDAG.cpp that is required to match these.

Often, you can set the mayLoad flag as part of a multiclass when you define the *rm variant. X86 is the worst case because so many instructions can fold loads and stores. You can handle most of them by setting the mayLoad and mayStore flags in the big ArithBinOp multiclasses.

You can see the problem in the big ArithBinOp_RF multiclass in X86InstrArithmetic.td. It's not obvious at a glance which of those instructions have patterns, and as a result we forgot to set neverHasSideEffects on the *_REV and *8i8 variants. I don't think it would detract to add "let mayLoad = 1 in {…}" around the *rm groups etc.

Personally, I don't like the direction of making everything be redundantly specified with "let" clauses and in the patterns. I agree that it is a problem that we're not inferring from Pat<> patterns, and that not all instructions are expressible with patterns, but I'd rather we solve *those* problems than throw out inference.

-Chris

This proposal would certainly make my life easier by having an easier to maintain insn table.

For instance, we had to create two store insn classes: one which defined mayStore and one that didn't. The former was to be used when an insn had no pattern and the latter, when mayStore was implied in the pattern.

If only TableGen wouldn't warn about non-conflicting attributes at least...

I don't understand what you're saying. Are you proposing that all properties (may load, store, side effects) be explicitly added to all instructions, and the pattern only be used to produce warnings?

Yes.

The side effect inference is worse than the load/store inference, but features like this work best when they work all the time. If all instructions had patterns, and we could accurately infer properties, it wouldn't be a problem.

Ok, I agree with that. I think it's crazy that we still can't write a pattern for copy/move instructions, so we have to smatter attributes on them. What are the other cases that we can't write a pattern for them?

Off the top of my head, I'm sure Jim and Owen could pile on:

- Copy/move instructions. We specifically don't want isel to emit these instructions. We prefer a generic COPY which can be recognized quickly and completely by SSA passes and coalescing/regalloc. The only attribute that needs to be set on a copy/move instruction is neverHasSideEffects - precisely the default I want to change.

- Big chunks of X86InstrSystem.td. Some of these have intrinsics, some don't. We want accurate attributes on all of them.

- Assembly-only instructions. Some instructions have variants that can be written in assembly, but that isel would never produce.

- Disassembly-only instructions. Some variants can't even be written in assembly, but a disassembler can decode it, and we want accurate attributes for binary analysis.

- Instructions that are produced by later codegen passes. Opcodes for branch relaxation, cbz formation on ARM, the 27 variants of xmm loads and stores on x86, etc.

- The non-pseudo versions of the x87 instruction set.

- Instructions that produce non-existant types. ARM's VLD1D3 instruction class loads 3 consecutive D-registers. There is no v3f64 or v3i64 type, and I don't think we want there to be.

- Instructions that produce illegal types. ARM's VLD1D4 instruction class could be modeled as a v4f64 load, but we don't support other operations on that type, and it wouldn't make sense to make it legal. If I put a pattern on such an instruction, I would rather have TableGen tell me that the pattern will never be able to match.

- Instructions that don't map 1-1 to their intrinsics. Some NEON intrinsics expand to two instructions.

- Instructions that read or write specific physregs, like DIV and MUL. Clearly a TableGen defect, as you mentioned.

- Instructions that define multiple values. Another TableGen shortcoming.

In many of these cases, we don't ever want isel emitting the instructions. I think adding disabled patterns to such instructions as a way of specifying properties would be backwards.

I agree that it is sometimes convenient to write Pat<> patterns instead of putting a pattern on an instruction: why not make tblgen also infer predicates from Pat patterns that expand to a single instruction?

I would love to treat Pat<> and instruction patterns equally. Keep reading.

Personally, I don't like the direction of making everything be redundantly specified with "let" clauses and in the patterns. I agree that it is a problem that we're not inferring from Pat<> patterns, and that not all instructions are expressible with patterns, but I'd rather we solve *those* problems than throw out inference.

We need to fix the current approach. It appears capricious to someone who isn't closely familiar, and that is most of us. Two features are directly harmful: Inference silently falls back to plain guessing when it fails, and when it works, you get a warning about any redundantly specified properties.

I would compare it to a clang warning: "'x' is initialized, but I can prove that the value is always overwritten before being used." You remove the initializer on x to get rid of the warning. When the code later changes to read the uninitialized value, clang silently guesses that you wanted x to be 7. We would drive Doug out of town with pitchforks if clang did something like that.

We could improve the inference coverage to include Pat<>'s, and I think we should. However, it would only add caprice. Somebody changes a single-instruction Pat<> to a multi-instruction pattern, and we expect him to know that he needs to decorate the instruction with attributes? That's not a reasonable expectation.

A tolerable inference approach is possible, but it absolutely needs to:
- Don't guess when it fails, but complain loudly if some attributes are not given explicitly, and
- Permit and verify explicitly set attributes when it does work.

I don't want to do that because I think we can do better. I want to:

- Explicitly specify all instruction properties that differ from the sane defaults (not a load, not a store, no side effects). Exploiting the regularity in an instruction set to do this easily is actually one of TableGen's strong sides.

- Use the current inference code to emit warnings instead. This would significantly improve TableGen diagnostics.

- Analyze single- and multi-instruction Pat<>'s the same way instruction patterns are analyzed. Notice how easy it is to analyze multi-instruction patterns compared to the near-impossible task of using them for inference.

This has real benefits. If you factor your instruction classes well, you can tag entire classes of instructions with mayLoad/mayStore very easily. Obscure opcodes that would never get a pattern are likely to be tagged correctly, and you get free verification of the instructions that do have patterns. Any kind of pattern.

It also improves readability by making in-instruction patterns and Pat<>'s equals, and by treating all instruction properties equally. There will be much fewer surprises for the casual visitor.

/jakob

After discussing this, we have reached a workable compromise. Here's what we'll do:

The inferred instruction properties will become tristate and default to Unset:

  bit hasSideEffects = ?;
  bit mayLoad = ?;
  bit mayStore = ?;

TableGen will attempt to infer these properties from any instruction patterns.

If inference fails and a property is unset, TableGen will issue an error instead of guessing.

If inference succeeds and a property is set to an inconsistent value, TableGen will issue an error.

If inference succeeds and a property is set to a consistent value, TableGen will happily and silently go about its business.

This will also handle migration for out-of-tree targets. They can replace 'neverHasSideEffects = 1' with 'hasSideEffects = 0', and any instructions without patterns will be pointed out for audit.

/jakob

Sounds great, thanks Jakob. Notably, this should also remove unnecessary complexity in the .td files that stems from tblgen *refusing* instructions that have an explicitly set property that it was also able to infer. I'm also in favor of making tblgen stop rejecting things like "(add imm, reg)" which are non-canonical, but come up with multi-patterns. It is better to have isel ignore the pattern than to have it generate an error, forcing crazy factoring of patterns to avoid the error.

-Chris

That warning has been removed now, and you should be able to clean up your classes.

I also filed PR13693 for you. It's a bug to lower atomic_load to an instruction without mayStore - atomic loads can't be reordered.

/jakob

Jakob,

That's great! I'll add this as part of my efforts refactoring the Hexagon insn table.

BTW, I think that I came about your way modifying it. I think that only after going a little further some of your remarks made about my initial approach. I'm using an approach that doesn't hinder multi-classes anymore.

Thanks,