[RFC] New `stack` dialect

I’m not sure the intent of this proposal was to model the local-variable stacks of programs; it seems aimed instead at stack-based programming models like Java bytecode or WebAssembly. Both are important things to model, but as you illustrated, local-variable stacks need random access to the stack while stack-based programming models do not, so I think they may call for two different abstractions, since they are not at the same level of abstraction.

I think a general-purpose alloca-like mechanism would be a nice fit for your use case, as a lazy stack approach like the one you describe may cause issues in loops, for example (unless I’m missing something you’d want to do). But that is probably a discussion for another thread.

Maybe I should post this as a separate RFC, but it was an idea that occurred to me while reading through this one. I haven’t thought it through in depth yet, but I still think the feature would be useful for language developers.
To avoid stack overflow inside loops, a lowering pass would need to hoist the allocation out of the loop. This would be a required step when lowering the stack dialect into the llvm dialect.

For example,

```mlir
...
scf.for ... {
  stack.push %some_value
  ...
}
```

would need to be translated into something like

```mlir
entry:
  ...
  %c4 = llvm.mlir.constant(4 : i64) : i64
  %ptr = llvm.alloca %c4 x i32 : (i64) -> !llvm.ptr

cond(%arg : index):
  ...

loop_body:
  llvm.store %some_value, %ptr : i32, !llvm.ptr
```

However, from the point of view of a compiler writer using the stack dialect, a push inside a loop (or any other scoped operation) would simply appear to be freed when its scope terminates. Handling the actual allocations and hoisting would be done under the hood.
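As a sketch of what “under the hood” could mean: a pass can compute the peak stack growth of a loop body and hoist a single allocation of that size out of the loop. The toy Python version below assumes a straight-line body and illustrative op names; none of this is real MLIR API.

```python
def max_stack_depth(body):
    """Peak net stack growth of a body, given as a list of op names.
    A lowering pass could hoist one allocation of this many slots out
    of the enclosing loop instead of allocating on every iteration."""
    depth = peak = 0
    for op in body:
        if op == "stack.push":
            depth += 1
        elif op == "stack.pop":
            depth -= 1
        peak = max(peak, depth)
    # Note: if depth != 0 here, the body is unbalanced and the stack
    # grows every iteration -- hoisting alone cannot fix that case.
    return peak

body = ["stack.push", "stack.push", "stack.pop", "stack.push"]
print(max_stack_depth(body))  # 2
```

A real pass would also have to verify that the body is balanced (net growth zero per iteration) before hoisting, which is exactly the scoping guarantee described above.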

Indeed the kind of stack I’m trying to model with this proposal seems different from the one described by @jwkim98, which is related to the automatic allocation of space for local variables in languages like C.

As mentioned by @Moxinilian my intention is to model languages where no random access is allowed to the stack.

My latest thoughts on this are the following:
A dialect intended as the last stage of code generation is going to be very different from a dialect intended as an intermediate representation.

What I mean by this is that compiling an SSA-form module down to stack-based instructions has fundamentally different requirements than doing the opposite, which is what my proposal was originally targeting.

So what I am saying is that maybe we could split this into two.
A “stack machine language” has two sides: one side is the semantics of the instructions, separate from their stack behavior. An example of this would be to model the JVM instruction athrow as jvm.athrow : (jvm.objectref) -> (), without the detail of it actually getting its operand from the stack.

The other side is the language that is actually interpreted by the machine, in which all instructions push and pop from the stack, like the one I presented in my initial example. In this case, the operation would look like jvm.athrow : () -> ().
A user would need to provide a dialect for modelling this side if they wanted to lower their SSA form into this language.
This dialect would need to implement some form of interface that allows a generic pass to translate SSA operations into stack operations, and to translate the stack operations into the generic stack dialect + the language dialect without stack operations.
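As a rough illustration of what that interface could provide: a per-op table of stack effects (how many values an instruction pops and pushes) is enough to mechanically split each fused instruction into explicit pops, a stack-free SSA op, and pushes. A minimal Python sketch, using a toy tuple encoding; the table, names, and encoding are all illustrative, not real MLIR APIs.

```python
# Hypothetical stack-effect table, standing in for the interface:
# op name -> (values popped, values pushed)
STACK_EFFECT = {
    "jvm.athrow": (1, 0),  # pops one objectref, pushes nothing
    "jvm.iadd":   (2, 1),  # pops two ints, pushes their sum
}

def unfuse(ops):
    """Split fused stack-machine ops into explicit stack.pop ops, a
    stack-free SSA op (op, operands, results), and stack.push ops.
    Fresh SSA names are generated as %t0, %t1, ..."""
    out, counter = [], 0
    for op in ops:
        pops, pushes = STACK_EFFECT[op]
        operands = []
        for _ in range(pops):
            v = f"%t{counter}"; counter += 1
            out.append(("stack.pop", [v]))   # materialize the operand
            operands.append(v)
        results = []
        for _ in range(pushes):
            r = f"%t{counter}"; counter += 1
            results.append(r)
        out.append((op, operands, results))  # the stack-free op
        for r in results:
            out.append(("stack.push", [r]))  # results go back on the stack
    return out

print(unfuse(["jvm.athrow"]))
# [('stack.pop', ['%t0']), ('jvm.athrow', ['%t0'], [])]
```

This glosses over operand ordering conventions (which popped value maps to which operand) and multi-result ops, which a real interface would have to pin down.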

I feel like the second dialect is redundant somehow? If one has a real Java athrow bytecode operation and wants to import it into MLIR, wouldn’t it make sense to directly emit an SSA-based operation + stack dialect operations? Have you seen or do you expect a use for a higher-level dialect where the stack operations are merged with the actual logic operation?

The “analysis workflow” would look something like this:

  1. You import the binary into fused operations
  2. A pass converts the fused operations into stack + regular SSA operations
  3. The rewrites described in the first post eliminate the stack operations and leave a clean SSA module

If you don’t need anything else, you can actually just ignore the first two steps, not provide a fused dialect and just emit stack + SSA directly.
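Step 3 above, restricted to a single straight-line block, can be sketched as an abstract interpretation of the stack: each pop is replaced by the SSA value that was last pushed. Python here for brevity; the tuple encoding and op names are illustrative, not real MLIR APIs, and the real rewrites would also have to handle control flow across blocks.

```python
def eliminate_stack_ops(block):
    """Eliminate stack.push/stack.pop from a straight-line block of
    (op, args) tuples, forwarding pushed SSA values to the uses of
    the corresponding pop results."""
    stack, subst, out = [], {}, []
    for op, args in block:
        # Rewrite operands that were produced by an eliminated pop.
        args = [subst.get(a, a) for a in args]
        if op == "stack.push":
            stack.append(args[0])          # remember the pushed SSA value
        elif op == "stack.pop":
            subst[args[0]] = stack.pop()   # pop result := last pushed value
        else:
            out.append((op, args))
    return out

block = [
    ("stack.push", ["%x"]),
    ("stack.pop",  ["%a"]),
    ("jvm.athrow", ["%a"]),
]
print(eliminate_stack_ops(block))  # [('jvm.athrow', ['%x'])]
```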

The issue comes when you want to turn SSA into a stack language for codegen purposes without a fused dialect: you would be stuck pattern-matching against raw pushes and pops, or against the SSA directly. It’s possible, but it would be nice to have a way to optimize the stack form in MLIR directly.


Are you considering lowering your dialect towards LLVM, or is it intended only for stack-based VMs? Your proposal looks reasonable for modeling stack-based languages such as WebAssembly or the JVM, as you mentioned, but I cannot think of a straightforward way to lower it towards LLVM so it can run natively.

If you’re trying to lower this into LLVM, then we would need some logic for converting stack pushes & pops back into SSA-based operations, where we can model the local variable array as an alloca operation.

I intended the stack dialect to be purely temporary: once you have gotten rid of it via the rewrites and your module is in standard SSA form, you can do whatever you want with it, including lowering it to LLVM. The point is, I was not imagining this dialect being lowered to LLVM directly.

This RFC seems to be having a nice discussion of the topic in general, which is great. When it comes to actually contributing code, I too would find it better to evaluate a concrete proposal as part of a real implementation or prototype, versus just looking at a piece in isolation.

These mid-level dialects have a tendency not to be as universally applicable as generic infrastructure as is thought at the outset (i.e., when you want them, you want to build them precisely for the thing you are solving), or they are widely applicable in theory but very complicated to build generically in detail. A concrete use as part of a real implementation can shed a lot of light on that.

I think this is also why some here are asking “does this lower to LLVM?” It doesn’t have to, but working that out in detail is a well-traveled path that puts a lot of concrete design pressure on a concept.

Our team was working on a short-term experimental project where we built MLIR from Java bytecode and performed simple IPO analysis to infer the read-only method property on this representation.

We implemented special bytecode dialects:

  • ByteCodeDialect (each Java bytecode instruction is mapped to the appropriate dialect operation)
  • ByteCodeToStackMachine (a dialect that introduces explicit stack operations)

And then we created a lowering for the Java bytecode dialect that

  • Introduces explicit control flow;
  • Replaces stack and local operations with SSA;

This representation preserves all the semantics of the original bytecode, but it’s much easier to work with. So on this SSA-form MLIR representation we were able to infer some trivial properties of the translated Java methods, with potential for other new kinds of analysis.
