Bug prone ODS builder

Hi all,

There is an op in the RTL dialect that is defined like this, note the required $lowBit attribute:

def ExtractOp : RTLOp<"extract", [NoSideEffect]> {
  let summary = "Extract a range of bits into a smaller value, lowBit "
                "specifies the lowest bit included.";

  let arguments = (ins RTLIntegerType:$input, I32Attr:$lowBit);
  let results = (outs RTLIntegerType:$result);

  let assemblyFormat =
    "$input `from` $lowBit attr-dict `:` functional-type($input, $result)";

  let hasFolder = true;
  let verifier = "return ::verifyExtractOp(*this);";
}

ODS is generating this pile of build methods:

  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type result,
      ::mlir::Value input, ::mlir::IntegerAttr lowBit);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes,
       ::mlir::Value input, ::mlir::IntegerAttr lowBit);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type result,
       ::mlir::Value input, uint32_t lowBit);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes,
       ::mlir::Value input, uint32_t lowBit);
  static void build(::mlir::OpBuilder &, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes,
       ::mlir::ValueRange operands,
       ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {});

We had a bug where some code accidentally forgot the attribute. It was:

Value replacement = rewriter.create<ExtractOp>(
    op.getLoc(), resType, extract.input());

instead of:

Value replacement = rewriter.create<ExtractOp>(
    op.getLoc(), resType, extract.input(), extract.lowBit());

This is super scary to me - this should be a static compiler error.

The question: Is there a reason that ODS is generating the = {} on the attribute set for the last builder? This seems like it should only happen on ops that have no required attributes.

Another less important question, but why generate the TypeRange version of the builder for something that has known to have a single result type? This seems like just code size bloat, anyone that has a type range could extract the single type and pass it instead.

-Chris

This is one of the “generic” build signatures ODS generates for all ops and it is useful when we need some genericity in op creation to avoid dealing with OperationState directly. IIRC, this form is used for one-to-one op conversions in std-to-llvm, but there are likely other uses. I suppose the reason for doing this was that we have many ops with no arguments and, given how the builder API is organized (you know my issues with it :wink: ), one has to pass ArrayRef<NamedAttribute>() in such cases leading to annoying verbosity (since we just forward the arguments through a sequence of variadic templates, it is impossible to deduce the type for {} which we would have otherwise used). It was introduced in ⚙ D83517 [MLIR] Change ODS collective params build method to provide an empty default value for named attributes so maybe @jurahul can tell us more. That being said, I agree that it’s better not to have this for ops that are known to have mandatory attributes.

When we want to (re)create ops generically, it is much easier to just do builder.create<TemplateParameter>(loc, original.getResultTypes(), ...) than try to extract one result based on TemplateParameter being a single-result op.

As @ftynse said, I added the default value to eliminate verbosity of using the “generic/collective params” versions of the builder (lot of the calls in out-of-tree code like in tensorflow used ArrayRef<NamedAttribute>() as the last argument and this helped eliminate that). I can definitely explore what happens if we limit this to ops without any mandatory attributes (i.e, atleast one attribute that not optional and does not have a default value).

+1, the others are special forms which would require special handling in code that would otherwise be general.

This is caught by verification though, isn’t it? So any/all unit test exercising this path would catch it.

I hit this exact same issue the recent days as I was doing large scale changes internally in the TensorFlow dialect.

My intuition right now is that the underlying problem is to be mixing the ODS “safe” API which exposes the requirements of ODS, with the “generic” API which is “unsafe” in nature (regardless of the default value).
I’d be in favor of looking into making these using different entry points entirely so that the creation call is unambiguous in which API is targeted.

+1. It looks to me that we should at least drop the = {} for ops that have attributes defined in the grammar.

I think this is a great idea. I think we should drop the ={} from ops with required attributes, because there really is no usecase for it :-).

I think that Mehdi’s point is also very good. Instead of build being the generic thing, we can have a new buildGeneric<X> thing. This would also push the code bloat with type range stuff out of the way.

-Chris