I am writing a tutorial about PDL/PDLL and I wanted to get a clear answer about why it was decided to interpret PDL at runtime rather than generate C++ code and compile in advance.
I recall someone telling me it was more efficient, but I don’t remember why. I also want to understand which of those efficiency benefits are actually implemented in the bytecode interpreter, versus what could be done but hasn’t been done yet. (The bytecode interpreter is 2k+ lines and it’s hard for me to spot where the optimizations are happening.)
It’s been a long time since I’ve looked at it, but iirc, the three primary benefits of having an interpreter like this are:
1. The bytecode is implemented to coalesce match automata across patterns, so checks shared by several patterns run once rather than once per pattern (see the sketch after this list). That would be very difficult to do in hand-written code. It could be done by a C++ code emitter, but the emitted code would not look like “normal” patterns; it would have to target some kind of matching automaton, just as the bytecode side does.
2. For large pattern libraries, most ways of generating code and building it into the compiler carry a significant binary-size cost. A domain-specific representation like the bytecode is almost always easier to keep small.
3. Patterns can be built and evaluated at runtime, or apart from the compiler build.
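To make the first point concrete, here is a deliberately simplified, hypothetical sketch (plain C++ with made-up op names and structures, not the MLIR API) of what a fused matcher does: the check shared by both patterns is evaluated once, and only then does the automaton branch into pattern-specific checks.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an operation; not an MLIR type.
struct Op {
  std::string name;
  std::vector<Op *> operandDefs; // defining op of each operand, for brevity
};

enum MatchResult { kNoMatch, kPatternA, kPatternB };

// Pattern A: rooted at arith.addi whose first operand comes from arith.subi.
// Pattern B: rooted at arith.addi whose second operand comes from arith.constant.
// A fused matcher tests the shared "root is arith.addi" predicate once and
// then branches, instead of running two independent matchers that each
// re-test the root.
MatchResult matchFused(const Op &root) {
  if (root.name != "arith.addi") // shared prefix of both patterns
    return kNoMatch;
  if (!root.operandDefs.empty() && root.operandDefs[0]->name == "arith.subi")
    return kPatternA;
  if (root.operandDefs.size() > 1 &&
      root.operandDefs[1]->name == "arith.constant")
    return kPatternB;
  return kNoMatch;
}
```

Roughly speaking, the PDL lowering builds this kind of shared predicate tree over however many patterns are registered and encodes it as bytecode, whereas hand-written C++ patterns each re-run their own root and operand checks independently.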
The first could be done with some form of C++ code generation (though it would be quite different from the approach used by hand-written patterns). The second and third become increasingly difficult to do in a pure build-time process.
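For the third point, here is a minimal sketch of what runtime evaluation can look like, assuming a reasonably recent MLIR built with PDL support (exact headers and names such as `applyPatternsAndFoldGreedily` shift across versions): the patterns arrive as PDL IR text, get parsed and compiled to bytecode when the pattern set is frozen, and are then interpreted by the rewrite driver, with no C++ generated or compiled along the way.

```cpp
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Parser/Parser.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

using namespace mlir;

// `pdlText` is assumed to hold a module of `pdl` dialect patterns, e.g. the
// output of mlir-pdll on a .pdll file, loaded as data at tool runtime rather
// than compiled into the tool. Assumes the pdl dialect is registered in the
// target's context.
LogicalResult applyPdlPatterns(ModuleOp target, StringRef pdlText) {
  MLIRContext *ctx = target.getContext();

  // Parse the PDL patterns at runtime; no C++ compilation step is involved.
  OwningOpRef<ModuleOp> pdlModule = parseSourceString<ModuleOp>(pdlText, ctx);
  if (!pdlModule)
    return failure();

  // Hand the PDL module to the pattern driver. It is compiled to the match
  // bytecode when the pattern set is frozen, then interpreted during rewrite.
  PDLPatternModule pdlPatterns(std::move(pdlModule));
  RewritePatternSet patterns(std::move(pdlPatterns));
  return applyPatternsAndFoldGreedily(target, std::move(patterns));
}
```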
As an aside, it is because PDL makes these very specific design decisions (which are neither good nor bad, but not exclusive of other approaches) that I wish it had been decoupled more from the core IR infra. There tend to be many approaches to this kind of thing over time. Even though it is “built in”, I view it as an optional/opinionated library rather than anything like the one true way to do this kind of thing.