I’ve made significant progress, which I would ordinarily share at this point, but my OSS contribution request has yet to make it through the Microsoft bureaucracy. Hopefully we can start the review sometime next week. I’ve got the tests working, and I think I’m mostly code complete except for documentation generation and code documentation.
One aspect I’ve had to get creative on is automated generation of parsers/printers for fields whose types appear to tblgen as plain strings. The solution I’ve come up with is to use templates and specialization to create parsers/printers for common data types (int, float, mlir::Type, ArrayRef, StringRef), then require fields with non-standard types to either define their own template specialization or use a custom printer/parser for the whole type.
template<typename T, typename Enable = void>
struct parse {
  static ParseResult go(MLIRContext* ctxt, DialectAsmParser& parser, llvm::BumpPtrAllocator& alloc, StringRef typeStr, T& result);
};
template <typename T>
using enable_if_integral_type = typename std::enable_if<
  std::is_integral<T>::value &&
  is_not_type<T, bool>::value >::type;
template<typename T>
struct parse<T, enable_if_integral_type<T>> {
  static ParseResult go(MLIRContext* ctxt, DialectAsmParser& parser, llvm::BumpPtrAllocator& alloc, StringRef typeStr, T& result) {
    return parser.parseInteger(result);
  }
};
So the generated parser would look like:
Type CompoundAType::parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser) {
  llvm::BumpPtrAllocator allocator;
  if (parser.parseLess()) return Type();
  int widthOfSomething;
  if (::mlir::tblgen::parser_helpers::parse<int>::go(ctxt, parser, allocator, "int", widthOfSomething))
    return Type();
  if (parser.parseComma()) return Type();
  SimpleAType exampleTdType;
  if (::mlir::tblgen::parser_helpers::parse<SimpleAType>::go(ctxt, parser, allocator, "SimpleAType", exampleTdType))
    return Type();
  if (parser.parseComma()) return Type();
  float f;
  if (::mlir::tblgen::parser_helpers::parse<float>::go(ctxt, parser, allocator, "float", f))
    return Type();
  if (parser.parseComma()) return Type();
  double d;
  if (::mlir::tblgen::parser_helpers::parse<double>::go(ctxt, parser, allocator, "double", d))
    return Type();
  if (parser.parseComma()) return Type();
  ArrayRef<int> arrayOfInts;
  if (::mlir::tblgen::parser_helpers::parse<ArrayRef<int>>::go(ctxt, parser, allocator, "ArrayRef<int>", arrayOfInts))
    return Type();
  if (parser.parseComma()) return Type();
  ArrayRef<Type> arrayOfTypes;
  if (::mlir::tblgen::parser_helpers::parse<ArrayRef<Type>>::go(ctxt, parser, allocator, "ArrayRef<Type>", arrayOfTypes))
    return Type();
  if (parser.parseComma()) return Type();
  StringRef simpleString;
  if (::mlir::tblgen::parser_helpers::parse<StringRef>::go(ctxt, parser, allocator, "StringRef", simpleString))
    return Type();
  if (parser.parseComma()) return Type();
  ArrayRef<StringRef> arrayOfStrings;
  if (::mlir::tblgen::parser_helpers::parse<ArrayRef<StringRef>>::go(ctxt, parser, allocator, "ArrayRef<StringRef>", arrayOfStrings))
    return Type();
  if (parser.parseGreater()) return Type();
  return get(ctxt, widthOfSomething, exampleTdType, f, d, arrayOfInts, arrayOfTypes, simpleString, arrayOfStrings);
}
The problem I’ve run into with this approach is memory allocation in the parsers. Say one of a TypeDef’s fields is ArrayRef<x>. What you’d ordinarily do is have a SmallVector in the parse function, populate it appropriately, then let it go out of scope after you call the get() function, which will properly handle the re-allocation in the storage class. (Right?) This approach, however, doesn’t compose well since it requires the top-level parser function to know about the ArrayRef and indeed all the memory allocation requirements of all the recursive types. (e.g. ArrayRef<StringRef> would require the top-level parse function to provide memory for all of the strings.) What should I do? Right now, I’m just doing a heap allocation and leaking memory. The obvious solution is to use an LLVM Allocator, yes?
template<typename T>
struct parse<T, enable_if_arrayref<T>> {
  using inner_t = get_indexable_type<T>;
  static ParseResult go(MLIRContext* ctxt, DialectAsmParser& parser, llvm::BumpPtrAllocator& alloc, StringRef typeStr, ArrayRef<inner_t>& result) {
    // FIXME: heap-allocated and never freed so that the returned ArrayRef stays valid.
    std::vector<inner_t>* members = new std::vector<inner_t>();
    if (parser.parseLSquare()) return mlir::failure();
    if (failed(parser.parseOptionalRSquare())) {
      do {
        inner_t member;
        // Propagate element parse failures instead of silently ignoring them.
        if (parse<inner_t>::go(ctxt, parser, alloc, typeStr, member))
          return mlir::failure();
        members->push_back(member);
      } while (succeeded(parser.parseOptionalComma()));
      if (parser.parseRSquare()) return mlir::failure();
    }
    result = ArrayRef<inner_t>(*members);
    return mlir::success();
  }
};
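For comparison, here is a minimal standalone sketch of the allocator idea, written without any MLIR or LLVM dependencies so it can be compiled on its own. Everything in it is hypothetical and for illustration only: Arena is a toy stand-in for llvm::BumpPtrAllocator, ArrayView stands in for ArrayRef, and parseIntList stands in for the ArrayRef specialization of parse. The point is only that the temporary vector can die at the end of the helper as long as the elements are first copied into memory owned by the caller's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <type_traits>
#include <vector>

// Toy stand-in for ArrayRef: a non-owning pointer + length pair.
template <typename T>
struct ArrayView {
  const T* data = nullptr;
  std::size_t size = 0;
};

// Toy stand-in for llvm::BumpPtrAllocator: owns every block it hands out and
// releases all of them when the arena itself is destroyed.
class Arena {
public:
  template <typename T>
  const T* copyArray(const std::vector<T>& src) {
    static_assert(std::is_trivially_destructible<T>::value,
                  "the arena never runs destructors");
    if (src.empty()) return nullptr;
    blocks.emplace_back(new unsigned char[src.size() * sizeof(T)]);
    T* out = reinterpret_cast<T*>(blocks.back().get());
    std::uninitialized_copy(src.begin(), src.end(), out);
    return out;
  }

private:
  std::vector<std::unique_ptr<unsigned char[]>> blocks;
};

// Parses "[1, 2, 3]". Returns false on malformed input. On success the view
// points into arena-owned memory, not into the local scratch vector.
bool parseIntList(std::istream& in, Arena& arena, ArrayView<int>& result) {
  char c;
  if (!(in >> c) || c != '[') return false;
  std::vector<int> scratch;  // temporary; destroyed when this function returns
  if ((in >> std::ws).peek() == ']') {
    in.get();  // empty list
  } else {
    for (;;) {
      int value;
      if (!(in >> value)) return false;
      scratch.push_back(value);
      if (!(in >> c)) return false;
      if (c == ']') break;
      if (c != ',') return false;
    }
  }
  result = {arena.copyArray(scratch), scratch.size()};
  return true;
}
```

In this sketch the top-level parse function only has to create one arena and keep it alive until after get() has copied the fields into the storage class; it needs no knowledge of which fields are arrays or how deeply they nest.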
I tried the allocator solution but ran into problems. Figured I’d ask for feedback here before I sank more time into it. Am I overthinking this and trying to create too general a solution?
As I wrote this, I came up with a solution: put the storage in the parse struct, then make the methods non-static. (As usual, writing about the problem gets me thinking clearly about it.) So I’m now looking for feedback on the overall parsing technique (with templates).
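A rough standalone sketch of what "storage in the parse struct" could look like, again without MLIR so it compiles by itself. The names FieldParser and ArrayView are hypothetical stand-ins for the parse struct and ArrayRef, and a std::istream stands in for DialectAsmParser. It assumes one parser object per field, kept alive until the parsed values have been copied elsewhere. Note the one subtlety for nested arrays: each element gets its own element parser, held in a std::deque so that earlier parsers (and the storage they own) never move.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <sstream>
#include <vector>

// Toy stand-in for ArrayRef: a non-owning pointer + length pair.
template <typename T>
struct ArrayView {
  const T* data = nullptr;
  std::size_t size = 0;
};

// Hypothetical primary template; only the specializations are defined.
template <typename T>
struct FieldParser;

// Scalars need no backing storage.
template <>
struct FieldParser<int> {
  bool go(std::istream& in, int& result) {
    return static_cast<bool>(in >> result);
  }
};

// Arrays: the parser object owns the memory behind the view it hands out,
// so the view stays valid for as long as the parser object is alive.
// Intended to be used once per field; a second call would invalidate the
// view returned by the first.
template <typename T>
struct FieldParser<ArrayView<T>> {
  bool go(std::istream& in, ArrayView<T>& result) {
    char c;
    if (!(in >> c) || c != '[') return false;
    if ((in >> std::ws).peek() == ']') {
      in.get();  // empty list
    } else {
      for (;;) {
        // One parser per element: nested views must not share storage.
        elementParsers.emplace_back();
        T value;
        if (!elementParsers.back().go(in, value)) return false;
        storage.push_back(value);
        if (!(in >> c)) return false;
        if (c == ']') break;
        if (c != ',') return false;
      }
    }
    result = {storage.data(), storage.size()};
    return true;
  }

private:
  std::deque<FieldParser<T>> elementParsers;  // deque: growth never moves elements
  std::vector<T> storage;                     // backing memory for the view
};
```

With this shape the generated parse function would declare one parser object per field as a local variable, call go() on each, and pass the results to get() before the parser objects go out of scope.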
~John