I’ve been mulling over what it would take to have a parser generator that could take something like an ANTLR grammar and generate the lexer and parser bits for a dialect. This is probably a lot more than what you’re thinking about, though.
For the record, my default is to shy away from copypasta, but if we can articulate and record in this forum a reason why this is a good solution, then that’s fine for me as well. I just want to make sure that the discussion takes place and is codified.
I don’t think that splitting this out and pretending it is reusable is a good idea - too much of it is specific to decisions in the MLIR syntax. I would much rather see a little parser generator framework for defining grammars and generating a lexer/parser from a declarative specification.
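To make that proposal a bit more concrete, here is a minimal sketch of what the lexer half of such a declarative framework could look like. This is purely illustrative (the spec format and names are hypothetical, and it uses Python for brevity rather than anything MLIR-shaped): a table of token rules is compiled into a lexer function, so a dialect would only declare its tokens rather than hand-write lexing code.

```python
import re

# Hypothetical declarative token spec: (name, regex) pairs, tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ARROW",  r"->"),
    ("SKIP",   r"\s+"),
]

def make_lexer(spec):
    """Compile the declarative spec into a single lexer function."""
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in spec))
    def lex(text):
        # Walk the input, yielding (token-kind, text) pairs and
        # dropping whitespace matched by the SKIP rule.
        for m in pattern.finditer(text):
            if m.lastgroup != "SKIP":
                yield (m.lastgroup, m.group())
    return lex

lex = make_lexer(TOKEN_SPEC)
print(list(lex("x -> 42")))
# [('IDENT', 'x'), ('ARROW', '->'), ('NUMBER', '42')]
```

A real framework would pair this with grammar rules that generate the parser the same way, but the shape of the idea is the same: the dialect supplies data, the framework supplies the machinery.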
That said, the existing MLIR syntax is very hackable if you are flexible about the details. Defining a dialect with custom parser rules can already do much of what you want. I have toyed with the idea of having a flag that changes the default dialect from “std” to something else, which would make this even more interesting.
It’s only difficult if IR/ depends on the generated thing, which isn’t the case here or for most things. As an example, the dynamic pattern rewriter work that is upcoming will depend on several different dialects, which is a much larger dependency than IR/.
+1. The Parser has so many MLIR-specific decisions and assumptions baked in that it doesn’t really make sense to expose. The custom assembly format parser wouldn’t benefit at all from exposing it. The one thing that might benefit is exposing parts of the lexer, but even that is mostly just the base, as the allowed tokens and error handling differ.