[RFC] New `stack` dialect

I’m not sure the intent of this proposal was to model the local-variable stacks of programs; it seems aimed instead at stack-based programming models like Java bytecode or WebAssembly. Both are important things to model, but as you illustrated, local-variable stacks need random access to the stack while stack-based programming models do not, so I think they may call for two different abstractions, since they are not at the same level of abstraction.

I think a general-purpose alloca-like mechanism would be a nice fit for your use case, as a lazy stack approach like the one you describe may cause issues in loops, for example (unless I’m missing something you’d want to do). But that is probably a discussion for another thread.

Maybe I should post this as a separate RFC, but it was an idea that occurred to me while reading through this one. I haven’t thought it through in depth yet, but I still think the feature would be useful for language developers.
To avoid stack overflow inside loops, a lowering pass would need to hoist the allocation out of the loop. This would be a required step when lowering the stack dialect into the llvm dialect.

For example,

```mlir
...
scf.for ... {
  stack.push %some_value
  ...
}
```

would need to be translated into something like

```mlir
entry:
  ...
  %c4 = llvm.mlir.constant(4 : i64) : i64
  %ptr = llvm.alloca %c4 x i32 : (i64) -> !llvm.ptr

cond(%arg : index):
  ...

loop_body:
  llvm.store %some_value, %ptr : i32, !llvm.ptr
```

However, from the point of view of a compiler writer using the stack dialect, a push inside a loop (or any other scoped operation) would simply appear to be freed when its scope terminates. Handling the actual allocations and hoisting would be done under the hood.
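As a sketch of what “under the hood” could mean: a pass can compute the peak stack growth of a loop body and hoist a single allocation of that size out of the loop. The toy Python version below assumes a straight-line body and illustrative op names; none of this is real MLIR API.

```python
def max_stack_depth(body):
    """Peak net stack growth of a body, given as a list of op names.
    A lowering pass could hoist one allocation of this many slots out
    of the enclosing loop instead of allocating on every iteration."""
    depth = peak = 0
    for op in body:
        if op == "stack.push":
            depth += 1
        elif op == "stack.pop":
            depth -= 1
        peak = max(peak, depth)
    # Note: if depth != 0 here, the body is unbalanced and the stack
    # grows every iteration -- hoisting alone cannot fix that case.
    return peak

body = ["stack.push", "stack.push", "stack.pop", "stack.push"]
print(max_stack_depth(body))  # 2
```

A real pass would also have to verify that the body is balanced (net growth zero per iteration) before hoisting, which is exactly the scoping guarantee described above.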

Indeed the kind of stack I’m trying to model with this proposal seems different from the one described by @jwkim98, which is related to the automatic allocation of space for local variables in languages like C.

As mentioned by @Moxinilian my intention is to model languages where no random access is allowed to the stack.

My latest thoughts on this are the following:
A dialect intended as the last stage of code generation is going to be very different from a dialect intended as an intermediate representation.

What I mean by this is that compiling an SSA-form module down to stack-based instructions has fundamentally different requirements than doing the opposite, which is what my proposal was originally targeting.

So what I am saying is that maybe we could split this into two.
A “stack machine language” has two sides: one side is the semantics of the instructions, separate from their stack behavior. An example of this would be to model the JVM instruction athrow as jvm.athrow : (jvm.objectref) -> (), without the detail of it actually getting its operand from the stack.

The other side is the language that is actually interpreted by the machine, in which all instructions push and pop from the stack, like the one I presented in my initial example. In this case, the operation would look like jvm.athrow : () -> ().
A user would need to provide a dialect for modelling this side if they wanted to lower their SSA form into this language.
This dialect would need to implement some form of interface that allows a generic pass to translate SSA operations into stack operations, and to translate the stack operations into the generic stack dialect + the language dialect without stack operations.
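As a rough illustration of what that interface could provide: a per-op table of stack effects (how many values an instruction pops and pushes) is enough to mechanically split each fused instruction into explicit pops, a stack-free SSA op, and pushes. A minimal Python sketch, using a toy tuple encoding; the table, names, and encoding are all illustrative, not real MLIR APIs.

```python
# Hypothetical stack-effect table, standing in for the interface:
# op name -> (values popped, values pushed)
STACK_EFFECT = {
    "jvm.athrow": (1, 0),  # pops one objectref, pushes nothing
    "jvm.iadd":   (2, 1),  # pops two ints, pushes their sum
}

def unfuse(ops):
    """Split fused stack-machine ops into explicit stack.pop ops, a
    stack-free SSA op (op, operands, results), and stack.push ops.
    Fresh SSA names are generated as %t0, %t1, ..."""
    out, counter = [], 0
    for op in ops:
        pops, pushes = STACK_EFFECT[op]
        operands = []
        for _ in range(pops):
            v = f"%t{counter}"; counter += 1
            out.append(("stack.pop", [v]))   # materialize the operand
            operands.append(v)
        results = []
        for _ in range(pushes):
            r = f"%t{counter}"; counter += 1
            results.append(r)
        out.append((op, operands, results))  # the stack-free op
        for r in results:
            out.append(("stack.push", [r]))  # results go back on the stack
    return out

print(unfuse(["jvm.athrow"]))
# [('stack.pop', ['%t0']), ('jvm.athrow', ['%t0'], [])]
```

This glosses over operand ordering conventions (which popped value maps to which operand) and multi-result ops, which a real interface would have to pin down.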

I feel like the second dialect is redundant somehow? If one has a real Java athrow bytecode operation and wants to import it into MLIR, wouldn’t it make sense to directly emit an SSA-based operation + stack dialect operations? Have you seen or do you expect a use for a higher-level dialect where the stack operations are merged with the actual logic operation?

The “analysis workflow” would look something like this:

  1. You import the binary into fused operations
  2. A pass converts the fused operations into stack + regular SSA operations
  3. The rewrites described in the first post eliminate the stack operations and leave a clean SSA module

If you don’t need anything else, you can actually just ignore the first two steps, not provide a fused dialect and just emit stack + SSA directly.
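Step 3 above, restricted to a single straight-line block, can be sketched as an abstract interpretation of the stack: each pop is replaced by the SSA value that was last pushed. Python here for brevity; the tuple encoding and op names are illustrative, not real MLIR APIs, and the real rewrites would also have to handle control flow across blocks.

```python
def eliminate_stack_ops(block):
    """Eliminate stack.push/stack.pop from a straight-line block of
    (op, args) tuples, forwarding pushed SSA values to the uses of
    the corresponding pop results."""
    stack, subst, out = [], {}, []
    for op, args in block:
        # Rewrite operands that were produced by an eliminated pop.
        args = [subst.get(a, a) for a in args]
        if op == "stack.push":
            stack.append(args[0])          # remember the pushed SSA value
        elif op == "stack.pop":
            subst[args[0]] = stack.pop()   # pop result := last pushed value
        else:
            out.append((op, args))
    return out

block = [
    ("stack.push", ["%x"]),
    ("stack.pop",  ["%a"]),
    ("jvm.athrow", ["%a"]),
]
print(eliminate_stack_ops(block))  # [('jvm.athrow', ['%x'])]
```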

The issue comes when you want to turn SSA into a stack language for codegen purposes without a fused dialect: you would be stuck pattern-matching against raw pushes and pops, or against the SSA directly. It’s possible, but it would be nice to have a way to optimize the stack form in MLIR directly.


Are you considering lowering your dialect towards LLVM, or is it intended only for stack-based VMs? Your proposal looks reasonable for modeling stack-based languages such as WebAssembly or the JVM, as you mentioned, but I cannot think of a straightforward way to lower it towards LLVM so it can run natively.

If you’re trying to lower this into LLVM, then we would need some logic for converting stack pushes & pops back into SSA-based operations, where we can model the local variable array as an alloca operation.

I intended the stack dialect to be purely temporary: once you have gotten rid of it via the rewrites and your module is in standard SSA form, you can do whatever you want with it, including lowering it to LLVM. The point is, I was not imagining this dialect being lowered to LLVM directly.

This RFC seems to be having a nice discussion of the topic in general, which is great. When it comes to actually contributing code, I too would find it better to evaluate a concrete proposal as part of a real implementation or prototype, versus just looking at a piece in isolation.

These mid-level dialects have a tendency not to be as universally applicable as generic infrastructure as is thought at the outset (i.e., when you want them, you want to build them precisely for the thing you are solving), or they are widely applicable in theory but very complicated to build generically in detail. A concrete use as part of a real implementation can shed a lot of light on that.

I think this is also why some here are asking “does this lower to LLVM?” It doesn’t have to, but working that out in detail is a well-traveled path that puts a lot of concrete design pressure on a concept.

Our team was working on a short-term experimental project where we built MLIR from Java bytecode and performed simple IPO analysis to infer the read-only method property on this representation.

We implemented special bytecode dialects:

  • ByteCodeDialect (each Java bytecode instruction is mapped to the appropriate dialect operation)
  • ByteCodeToStackMachine (a dialect that introduces explicit stack operations)

And then we created a lowering for the Java bytecode dialect that

  • Introduces explicit control flow;
  • Replaces stack and local operations with SSA;

This representation preserves all the semantics of the original bytecode, but it’s much easier to work with. So on this SSA-form MLIR representation we were able to infer some trivial properties of the translated Java methods, with potential for other new kinds of analysis.
