SSA and CFG reconstruction from a register-based bytecode

Hi folks, I’m trying to convert a bytecode into an MLIR dialect.
The bytecode is not in SSA form and is register-based, e.g.:

r1 = 42
r2 = 15
r1 = add r1, r2
r2 = call foobar(r1)
return r2

What comes to mind is to convert it into a “raw” dialect, e.g.:

raw.constant {value = 42, dst = r1}
raw.constant {value = 15, dst = r2}
raw.add {dst = r1, lhs = r1, rhs = r2}
raw.call {name = "foobar", dst = r2, argv = [r1]}
raw.return {src = r2}

Then run some hand-crafted passes and analyses to reconstruct CFG and then SSA (all in terms of the “raw” dialect), and then lower the “raw” dialect into the one I actually want to use for the fun stuff.

I’m curious if there are some components I can reuse or if there is a better approach that I’m missing?

Any hints would be highly appreciated!

Cheers,
Alex.

I’d suggest something like:

%1, %2 = mc.reg("r1", "r2")
mc.add (%1, %1, %2)
mc.call (%r2, %r1) {args=1, returns=1}
mc.return(%1)

This keeps most of the benefit of value references without too much extra machinery. I’d put effort into some of the basic machinery in the LLVM machine code layer, like validation of liveins/liveouts/killed values.

Hi,

What is the register-based bytecode called?
Converting bytecode to cfg and ssa can be very difficult to do depending on what bytecode it is.

Kind Regards

James

@stephenneuendorffer are you referring to an existing dialect here? Or is it a different form of the “raw” dialect?
Could you please expand on the benefits of this approach? I must be missing something as I don’t yet see how it would help.

Thank you!

In this case, it’s bytecode from mRuby VM (not to be confused with the MRI Ruby bytecode, which is different).
@jcdutton do you have any references (or keywords even) describing the problems of CFG/SSA construction for bytecode? As far as I can see, mRuby’s bytecode is “trivially” convertible in those forms, but I could be missing something :slight_smile:

Thank you!