Ease of defining new types

Thanks for iterating and driving this! This is a really integral piece of infra.

For these you could actually just use normal tablegen, I did the same with interfaces which have the same need to use the raw C++ type:

let contents = (ins "int":$widthOfSomething, "SimpleTypeA":$exampleTdType, ...);

If you are going to have a possible mnemonic, I would likely make it optional. Opt-out could be fine. There are classes of types/attributes that don’t have a leading mnemonic.

Slight nit, but I prefer that boolean type values in tablegen use bit and are prefixed with something like has(e.g. hasCustomParsePrint). Otherwise it can be ambiguous as to whether this is an overridable code block, or a flag.

One unfortunate aspect is that this prevents members from being merged together, unless tablegen did that automatically? For IntegerType it likely wouldn’t matter that much, but it is something to seriously consider for attributes/types with a large number of instances. We should encourage more optimized representations when possible.

Before post edit: After looking further, it seems like you allow for providing a custom storage class. That should cover the cases I was concerned about, thanks.

How do you intend to handle reference parameters should as ArrayRef/StringRef? Given their lifetime, the data needs to be copied into the allocator when the storage type is constructed. I may have missed it, but do you require that the user provide the construction function for the storage when tablegen is generating it? I think that would be a very reasonable thing to require given the many different situations that can arise.

I would structure this similarly to how OpBuilders are structured for operations. More specifically, it would be nice if the user could enumerate the different signatures for the verify method that they want and optionally provide a code block.

What about providing a code block instead? You could then effectively generate the block of code that
dispatches between types in Dialect::parse/print(Attribute|Type). This would also match the behavior of the corresponding tablgen Op methods.

I don’t think I need to see the generated C++ at this point. I can follow along(I think) with where you are going, and it looks good. I mostly have questions about various corner cases and what exactly tablegen is responsible for generating. IMO if we can generate a majority of the boilerplate and leave it up to the user to define the rest(e.g. storage class construction methods) that would already be a huge win.

– River

Sure thing! I’m still not promising any code to implement this, but that’d be the easier part vs fleshing out the tablegen syntax/semantics. This would make my life easier, though, so I’ll probably do it.

let contents = (ins "int":$widthOfSomething, "SimpleTypeA":$exampleTdType, ...);

I didn’t know this syntax existed! What would the type (in the class) be for contents? Is this just the literal for a class (which ins is)?

If you are going to have a possible mnemonic, I would likely make it optional. Opt-out could be fine. There are classes of types/attributes that don’t have a leading mnemonic.

I’ll do that. Having the mnenomic default name almost never makes sense IME.

One unfortunate aspect is that this prevents members from being merged together, unless tablegen did that automatically?

I don’t understand ‘merged together’.

How do you intend to handle reference parameters should as ArrayRef/StringRef? Given their lifetime, the data needs to be copied into the allocator when the storage type is constructed. I may have missed it, but do you require that the user provide the construction function for the storage when tablegen is generating it? I think that would be a very reasonable thing to require given the many different situations that can arise.

Shit. I’m not used to C++ – I’m used to garbage collected languages. I’d like to be able to auto-generate the construction method in the common case as this is one of the PITA boilerplates. ArrayRefs and StringRefs are fairly common IME. Got any ideas? I’m thinking adding a per-element flag indicating that an allocation is necessary for this copy. If so, how would I modify the above let contents = ... syntax to support this? I’ll add an option to let the user provide a custom construction function.

I would structure this similarly to how OpBuilder s are structured for operations. More specifically, it would be nice if the user could enumerate the different signatures for the verify method that they want and optionally provide a code block.

Are you talking about this sort of thing?

  let builders = [OpBuilder<
    "OpBuilder &builder, OperationState &result, MemRefType memrefType", [{
       result.types.push_back(memrefType);
     }]>,
    OpBuilder<
    "OpBuilder &builder, OperationState &result, MemRefType memrefType, " #
    "ValueRange operands, IntegerAttr alignment = IntegerAttr()", [{
       result.addOperands(operands);
       result.types.push_back(memrefType);
       if (alignment)
         result.addAttribute(getAlignmentAttrName(), alignment);
     }]>];

If so, I personally don’t think this saves much while adding complexity. I also don’t like having too much C++ code in tablegen (since it’s not compatible with my syntax highlighter and code completion). But since it’s an existing paradigm, there’s significant value in continuing that. I am concerned about trying to jam too much into this first version. Would you object to keeping the way it is in my proposal and adding your suggestion in subsequent versions?

What about providing a code block instead? You could then effectively generate the block of code that dispatches between types in Dialect::parse/print(Attribute|Type). This would also match the behavior of the corresponding tablgen Op methods.

Good idea, though I would also like to support generating just the declaration so users can implement it in a C++ file for complex cases. Also, not generating anything should be supported. Null for nothing, an empty code block for just the decl, and a non-empty for the code block for the def? Or should I provide an explicit option to control this behavior? Actually, this could be a convention for this sort of thing (the custom construction function above being a good example).

~John

No worries, and no pressure. A lot of stuff that I’ve done is just because it annoyed me enough.

It’s just a dag. This is how the arguments for interface methods are defined(and used with the example I gave):

Sorry about that. Merged here refers to things like bitpacking. For example, IntegerType has two elements: width and signedness semantics. IntegerType has a maximum width, meaning that it doesn’t need the full 32-bits of an unsigned so we pack the signedness in the unused bits.

I think I agree with you here in practice, but I haven’t seen what we would be able to generate. The main win that this has over just using extraClassDecls, is that if you don’t need anything else in the extraClassDecls you don’t have to define it. For operations that is extremely useful, for attributes/types its not likely going to be that much of a win.

Right now we just have a convention of calling a user defined method, e.g. verify(*this)/parse(*this), to allow for defining in the .cpp file. That ends up being fine for the most part because you can put it into the base tablegen class and be done with it. I don’t really have that big of a preference here though, whatever ends up being the easiest/cleanest.

This has always been one of the sticking points when I’ve thought about this. I haven’t given it enough thought to have a great suggestion. This is also something that can be solved afterwards though, given that auto-generating the other things is a pretty big win IMO.

– River

I thought about this over my OOO, and I think we can just allow for type elements to optionally define a code block that should be used during allocation. This would let the ArrayRef/StringRef elements specify let allocator = "$allocator.copyInto($self)".

– River

Good idea – this would also allow container fields which require allocations for stuff they contain via custom function calls. Given that it should be on a per-member basis, I’m thinking:

class TypeBase;
def members : TypeBase;
def member : TypeBase;
...
    let contents = (
        members
        "int":$widthOfSomething,   // Simple case
        (member "ArrayRef<Type>":$types, [{$allocator.copyInto($self)}]),
        (member "ComplexCppClass":$bizarreField, [{$self.allocate($allocator)}])
    );

Am I using dag properly? I don’t really understand what purpose the dag operator serves.

While we’re talking about it, what do you think of ‘contents’? I’d prefer a more specific word, but nothing comes to mind. Naming stuff is hard.

~John

Yeah, that is how you can structure a dag. The operator is only really useful if you need it to, e.g. when declaring pattern matches the root is often a specific operation type.

For this I think we could let the user define a TypeMember<> or just use the string for simple types. This is similar to one the classes you had originally:

class TypeMember<string type> {
  code allocator = ?;
  string cppType = type;
}
class ArrayRefMember<string type> : TypeMember<type> {
  let allocator = "$allocator.copyInto($self)";
}

def MyType : ... {
  let members = (ins
    "int":$widthOfSomething
    ArrayRefMember<"Type">:$types
  );
}

I don’t really have a good suggestion, contents, members, elements, fields all seem fine.

– River

Hi River-

Sorry about the radio silence. I was diverted to a different (less fun) project for the last six weeks.

Here’s my latest proposal. I’ll be working on implementing it (if I’m not diverted again) this week and next. I think it incorporates all your feedback. I kept the code blocks for print/parse since I’d like to accommodate very simple print/parse functions, which I suspect are common.

// RUN: mlir-tblgen -gen-op-defs -I %S/../../include %s | FileCheck %s

include "mlir/IR/OpBase.td"
include "mlir/Dialect/StandardOps/IR/Ops.td"

// *****
// This would go in OpBase.td

// Define a new type belonging to a dialect and called 'name'
class TypeDef<Dialect dialect, string name> {
    // Name of storage class to generate or use
    string storageClass = name # "Storage";
    // Namespace (withing dialect c++ namespace) in which the storage class resides
    string storageNamespace = "detail";
    // Should we generate the storage class? (Or use an existing one?)
    int genStorageClass = 1;
    // Should we generate the storage class constructor?
    int hasStorageCustomConstructor = 0;

    // This is the list of fields in the storage class (and list of parameters
    // in the creation functions). If empty, don't use or generate a storage class
    dag members;

    // Customization settings

    // If null, don't generate any methods.
    // If an empty code block, generate just the declarations.
    // If a non-empty code block, just use that code as the definition code.
    code printer;
    code parser;
    // Use the lowercased name as the keyword for parsing/printing. Only used in
    // generated parse/print methods
    string mnemonic = ?;

    // If set, generate accessors for each TypeMember
    int generateAccessors = 1;
    // Generate the verifyConstructionInvariants declaration and getChecked method
    int verifyInvariantsDecl = 0;
    // Extra code to include in the class declaration
    code extraDecls;
}

// 'Members' should be subclasses of this or simple strings (which is a
// shorthand for TypeMember<"C++Type">)
class TypeMember<string type> {
  code allocator = ?;
  string cppType = type;
}

// For standard ArrayRefs, which require allocation
class ArrayRefMember<string type> : TypeMember<type> {
  let allocator = [{$allocator.copyInto($self)}];
}

// For classes which require allocation and have their own allocateInto method
class SelfAllocationMember<string type> : TypeMember<type> {
  let allocator = [{$self.allocateInto($allocator)}];
}

// ************
// Test defs
def Test_Dialect: Dialect {
    let name = "TestDialect";
}

// Base class for other typedefs. Provides dialact-specific defaults
class TestType<string name> : TypeDef<Test_Dialect, name> { }

// SimpleTypeA is a simple type. A storage class will not be generated.
def SimpleTypeA : TestType<"SimpleA"> { }

// A more complex parameterized type
def CompoundTypeA : TestType<"CompoundA"> {
    // Override the default mnemonic
    let mnemonic = "cmpnd_a";

    // What types do we contain?
    let members = (
        ins
        // A standard c++ int
        "int":$widthOfSomething,
        // The simple type defined above
        "SimpleTypeA": $exampleTdType,
        // Some C++ type
        "SomeCppStruct": $exampleCppType
    );
}

// Base class for standard types
class StdType<string name> : TypeDef<StandardOps_Dialect, name> { }

def IndexType : StdType<"IndexType"> {
    let mnemonic = "index";
}

def IntegerType : StdType<"IntegerType"> {
    let parser = [{}];
    let printer = [{}];
    let mnemonic = "int";
    let verifyInvariantsDecl = 1;
    let members = (
        ins
        "SignednessSemantics":$signedness, 
        "unsigned":$width
    );
    
    let extraDecls = [{
  /// Signedness semantics.
  enum SignednessSemantics {
    Signless, /// No signedness semantics
    Signed,   /// Signed integer
    Unsigned, /// Unsigned integer
  };

  /// This extra function is necessary since it doesn't include signedness
  static IntegerType getChecked(unsigned width, Location location);

  /// Return true if this is a signless integer type.
  bool isSignless() const { return getSignedness() == Signless; }
  /// Return true if this is a signed integer type.
  bool isSigned() const { return getSignedness() == Signed; }
  /// Return true if this is an unsigned integer type.
  bool isUnsigned() const { return getSignedness() == Unsigned; }

    }];
}

def TupleType : StdType<"TupleType"> {
    let mnemonic = "tuple";
    let members = (
        ins
        "int":$widthOfSomething,
        ArrayRefMember<"Type">:$types,
        SelfAllocationMember<"ComplexCppClass">:$bizarreField
    );

    let extraDecls = [{
  /// Accumulate the types contained in this tuple and tuples nested within it.
  /// Note that this only flattens nested tuples, not any other container type,
  /// e.g. a tuple<i32, tensor<i32>, tuple<f32, tuple<i64>>> is flattened to
  /// (i32, tensor<i32>, f32, i64)
  void getFlattenedTypes(SmallVectorImpl<Type> &types);

  /// Return the number of held types.
  size_t size() const;

  /// Iterate over the held elements.
  using iterator = ArrayRef<Type>::iterator;
  iterator begin() const { return getTypes().begin(); }
  iterator end() const { return getTypes().end(); }

  /// Return the element type at index 'index'.
  Type getType(size_t index) const {
    assert(index < size() && "invalid index for tuple type");
    return getTypes()[index];
  }
    }];
}

This is, of course, subject to changes during implementation since I’ll doubtless discover things which don’t make sense or that are infeasible for a first take.

~John

No worries, it happens to everyone.

All of this looks great. +1 from me.
– River

Excellent!

Before I start, I’d like to write up and have reviewed a list of classes which I’d modify/create just to make sure I understand the internals properly. Are you willing to review that also?

~John

Of course. You can put me as a reviewer for anything related to this.

– River

Thanks!

This actually looks relatively straightforward:

  • Create TypeDef and TypeMember classes to wrap the llvm::Records for convenience
  • The classes in Format.h will handle the proper substitutions in the code blocks
  • Plain old functions to emit the code (using formatv when appropriate) and GenRegistration to register the generation functions.

I think I’ll need 2 generators:

  • The header generator (decls)
  • The cpp file generator (defs). It’ll have a few #ifdef sections for:
    • The code for the TypeDefs
    • The parser dispatch snippet
    • The printer dispatch snippet
    • The list of types for addTypes<…>();

Am I right?

I also see a rather useful helper class IfDefScope in OpDefinitionsGen.cpp which I’d like to use. Should I put it into a header file in the same directory?

The generator will also need to know which dialect it is generating for in case there are typedefs from multiple dialects in the record set. Should I piggyback on GenDialect’s --dialect? (And put the decl in a header file in the same directory?)

In terms of CMake support, should I add an mlir_tablegen() to the add_mlir_dialect function or create a new one?

Are there some other useful classes you’d recommend I use? Have I missed something?

~John

You could probably get away with just two, and just wrap the parser/printer methods in a function with a known name(similarly to how the DRR creates a populateWithGenerated method to add autogenerated patterns to a pattern list. One for the list of types, and one for everything else.

Yeah, this is a useful utility that we should just make available in a header.

I would use the same option name, but probably keep the logic separate. Given that the type specifies which dialect it will be a part of, the handling would likely be slightly different anyways.

Given that a dialect may or may not define types/attributes, I could go either way. I’m fine with just adding it there though.

I think you’ve covered most of the ones that will be useful here. Two things that should probably be explicitly mentioned again are: generating documentation, and making sure this also works for attributes. For documentation, we should have the common summary and description fields which can be used to autogen documentation just like we do for everything else. For attributes, the underlying storage and specification mechanism is pretty much exactly the same as types so when doing this for types keep in mind which parts you would need to add an if/else for when adding support for attributes.

Looks good.

– River

I think I’m like 70% done with a draft patch EXCEPT for writing the unit tests. I’ve been testing against my MLIR project.

How should I write the tests? I see that FileCheck is used and I’ll do that. Does MLIR have any checks that the code being produced compiles and makes sense?

Also, what is required to support attributes? I’ve only ever written types. Is it literally just this?

  • Type::TypeBase -> Attribute::AttrBase
  • TypeStorageAllocator -> AttributeStorageAllocator
  • addTypes -> addAttributes

Yeah, FileCheck is generally used. When I’m writing tablegen things I generally end up writing two types of tests: testing the actual tablegen backend (error handling, etc.), and testing that the generated thing actually works as intended in MLIR. The first of which generally involves executing the tablegen backend with the right generator, i.e. what the build system normally does. The second of which generally adds uses the Test dialect to replicate what a user would do for their own dialect. An example of this would be the declarative assembly format, which has tests both for the tablegen backend and the usage on actual operations(which are defined in the Test Dialect).

Aside from what you’ve listed, the only other real difference is that all attributes have an associated Type. E.g., IntegerAttr has an IntegerType, FloatAttr has a FloatType, etc. If an attribute doesn’t explicitly provide its type, it defaults to NoneType. Realistically, the Attribute def would provide some way to construct the type; either from its field members, or statically. This could just be a format string, that provides format specifiers for accessing the field members or MLIRContext. Attribute support should be added after types, so the exact form that would take can wait until after Type support lands.

Also, I just remembering another missing aspect in that Attributes and Types now support adding Traits and Interfaces(identical in nature to those on operations). Doesn’t need to be in the initial patch, but it is an aspect of Attributes/Types.

– River

I’ve made significant progress, which ordinarily I would share at this point but my OSS contribution request has yet to make it through the Microsoft bureaucracy. Hopefully sometime next week we can start the review. I’ve got the tests working, and I think I’m mostly code complete except for the documentation generation and code documentation.

One aspect I’ve had to get creative on is automated generation of parsers/printers which appear to tblgen like strings. The solution I’ve come up with is to use templates and specialization to create parsers/printers for common data types (int, float, mlir::Type, ArrayRef, StringRef) then require fields with non-standard types to either define their own template specialization of use a custom printer/parser for the whole type.

  template<typename T, typename Enable = void>
  struct parse {
    static ParseResult go(MLIRContext* ctxt, DialectAsmParser& parser, llvm::BumpPtrAllocator& alloc, StringRef typeStr, T& result);
  };

  template <typename T>
  using enable_if_integral_type = typename std::enable_if<
                                    std::is_integral<T>::value &&
                                    is_not_type<T, bool>::value >::type;
  template<typename T>
  struct parse<T, enable_if_integral_type<T>> {
    static ParseResult go(MLIRContext* ctxt, DialectAsmParser& parser, llvm::BumpPtrAllocator& alloc, StringRef typeStr, T& result) {
      return parser.parseInteger(result);
    }
  };

so the generated parser would look like:

Type CompoundAType::parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser) {
  llvm::BumpPtrAllocator allocator;
  if (parser.parseLess()) return Type();
  int widthOfSomething;
  if (::mlir::tblgen::parser_helpers::parse<int>::go(ctxt, parser, allocator, "int", widthOfSomething))
    return Type();
  if (parser.parseComma()) return Type();
  SimpleAType exampleTdType;
  if (::mlir::tblgen::parser_helpers::parse<SimpleAType>::go(ctxt, parser, allocator, "SimpleAType", exampleTdType))
    return Type();
  if (parser.parseComma()) return Type();
  float f;
  if (::mlir::tblgen::parser_helpers::parse<float>::go(ctxt, parser, allocator, "float", f))
    return Type();
  if (parser.parseComma()) return Type();
  double d;
  if (::mlir::tblgen::parser_helpers::parse<double>::go(ctxt, parser, allocator, "double", d))
    return Type();
  if (parser.parseComma()) return Type();
  ArrayRef<int> arrayOfInts;
  if (::mlir::tblgen::parser_helpers::parse<ArrayRef<int>>::go(ctxt, parser, allocator, "ArrayRef<int>", arrayOfInts))
    return Type();
  if (parser.parseComma()) return Type();
  ArrayRef<Type> arrayOfTypes;
  if (::mlir::tblgen::parser_helpers::parse<ArrayRef<Type>>::go(ctxt, parser, allocator, "ArrayRef<Type>", arrayOfTypes))
    return Type();
  if (parser.parseComma()) return Type();
  StringRef simpleString;
  if (::mlir::tblgen::parser_helpers::parse<StringRef>::go(ctxt, parser, allocator, "StringRef", simpleString))
    return Type();
  if (parser.parseComma()) return Type();
  ArrayRef<StringRef> arrayOfStrings;
  if (::mlir::tblgen::parser_helpers::parse<ArrayRef<StringRef>>::go(ctxt, parser, allocator, "ArrayRef<StringRef>", arrayOfStrings))
    return Type();
  if (parser.parseGreater()) return Type();
  return get(ctxt, widthOfSomething, exampleTdType, f, d, arrayOfInts, arrayOfTypes, simpleString, arrayOfStrings);
}

The problem I’ve run into with this approach is memory allocation in the parsers. Say one of a TypeDef’s fields is ArrayRef<x>. What you’d ordinarily do is have a SmallVector in the parse function, populate it appropriately, then let it go out of scope after you call the get() function, which will properly handle the re-allocation in the storage class. (Right?) This approach, however, doesn’t compose well since it requires the top-level parser function to know about the ArrayRef and indeed all the memory allocation requirements of all the recursive types. (e.g. ArrayRef<StringRef> would require the top-level parse function to provide memory for all of the strings.) What should I do? Right now, I’m just doing a heap allocation and leaking memory. The obvious solution is to use an LLVM Allocator, yes?

  template<typename T>
  struct parse<T, enable_if_arrayref<T>> {
    using inner_t = get_indexable_type<T>;

    static ParseResult go(MLIRContext* ctxt, DialectAsmParser& parser, llvm::BumpPtrAllocator& alloc, StringRef typeStr, ArrayRef<inner_t>& result) {
      std::vector<inner_t>* members = new std::vector<inner_t>();
      if (parser.parseLSquare()) return mlir::failure();
        if (failed(parser.parseOptionalRSquare())) {
        do {
          inner_t member;// = std::declval<inner_t>();
          parse<inner_t>::go(ctxt, parser, alloc, typeStr, member);
          members->push_back(member);
        } while (succeeded(parser.parseOptionalComma()));
        if (parser.parseRSquare()) return mlir::failure();
      }
      result = ArrayRef<inner_t>(*members);
      return mlir::success();
    }
  };

I tried the allocator solution but ran into problems. Figured I’d ask for feedback here before I sank more time into it. Am I overthinking this and trying to create too general of a solution?

As I wrote this, I came up with a solution: put the storage in the parse struct then make the methods non-static (As usual, writing about the problem gets me thinking clearly about the problem.) So I’m now looking for feedback on the overall parsing technique (with templates).

~John

I’ve gotten the final approval from Microsoft for this and all future contributions. I’ll be submitting a patch later today. You don’t have to try to parse the above.

~John

1 Like

https://reviews.llvm.org/D86904

Haven’t finished the FileCheck definition tests or doc generation support and there are a few other rough edges. Should be more-or-less ready for review.

Of course now that ‘members’ is scattered throughout the code, I’m realizing that ‘parameters’ (as in parametric types) would have been more appropriate!

FYI: I’m at significantly reduced capacity due to an injury. Possibly for several weeks. Dunno how long it’ll take me to wrap this up or respond to comments.

Oww, take care @jdd!