FIFO channel in CIRCT

Hi, everyone! I’m looking for something like a FIFO channel in the CIRCT project.
First, I found something in the ESI dialect. I’m not sure I understand the two ESI ops esi.wrap.vr and esi.unwrap.vr correctly. Can we think of them as sending/getting a value to/from an ESI channel? If so, is it possible to send/get values to/from a single channel multiple times? For example, what if I want to get two values from channel %0 and then, after running other ops, send three values to channel %1?
Second, I saw the BufferOp of the handshake dialect, which seems to be what I’m after. But the question stays the same: can one send/get multiple values to/from it?

Sorry for the delayed reply.

These two ops [un]wrap channels into wires – specifically valid/ready/data handshake wires. The concept of sending and receiving messages is an imperative notion. These ops are hardware/structural.
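
To make the valid/ready wording concrete, here is a minimal Python sketch (a behavioral model, not CIRCT code) of how a transfer on such a channel is usually defined: a data word moves in exactly those cycles where valid and ready are both high. The function name `transfers` and the per-cycle trace format are made up for illustration.

```python
def transfers(trace):
    """Return the data words that cross a valid/ready channel.

    `trace` is a list of (valid, ready, data) tuples, one per clock cycle.
    A word is transferred only in cycles where both valid and ready are high.
    """
    return [data for valid, ready, data in trace if valid and ready]


# Four cycles: the producer holds 7 until the consumer is ready, then sends 9.
trace = [
    (True, False, 7),     # valid but not ready: no transfer, producer holds 7
    (True, True, 7),      # both high: 7 is transferred
    (False, True, None),  # ready but nothing valid: no transfer
    (True, True, 9),      # both high: 9 is transferred
]
print(transfers(trace))
```

Nothing here "sends" or "receives" as a step in a program; the wires are simply sampled every cycle, which is the structural view described above.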

This is true of nearly all the dialects in CIRCT at the moment. You can’t think of (most of) them in procedural terms.

So if I want a FIFO buffer to fulfill my design requirements for a dataflow model, I need to handcraft it.

The bufferop in the handshake dialect should work for you, but I’m not an expert in the handshake (dynamic dataflow) dialect. @mortbopet ?

What I’m trying to understand is this. In a KPN model, a process should be able to pull/push data from/to a FIFO channel multiple times (different data each time, not replicas made with ForkOp). But with handshake I don’t see any way to do that. So basically I need to handcraft a FIFO component from scratch?

CIRCT doesn’t (yet) explicitly model processes, so the idea of pulling and pushing data to a FIFO as a procedural op doesn’t make sense there. In the handshake dialect (being a dynamic dataflow model), FIFOs are implicit between ops and usually they’re not necessary for correctness. (If I’m understanding correctly.) A pass automatically determines where they are necessary and inserts bufferops. Pulling from and pushing to the bufferops is implicit (in handshake).

Yes, I’ve seen the insertions of BufferOp in the debug info.

Another question, if this is implicit: how does it realize multiple pulls and pushes? Say I have something like this:

mydialect.func @add(%arg0 : !mydialect.channel<i32>) -> !mydialect.channel<i32>
{
  %0 = the first data in the channel
  %1 = the second data in the channel
  %2 = arith.addi %0, %1 : i32
  somehow push %2 into output channel
}

In this pseudocode snippet, we want to take two different values from the channel. How could the handshake dialect do something like this?

The handshake dialect doesn’t look like a Process Network abstraction to me if a buffer can be pulled from or pushed to only once inside one handshake.func.

It seems like maybe your mental model of the Handshake IR is not quite right. The IR inside the body of a handshake.func represents a dataflow graph, where tokens flow through infinitely sized buffers according to valid/ready handshakes. There is no imperative operation to “push” to or “pull” from a buffer. Pushing and pulling happen constantly, according to the valid/ready semantics, which are implicit in the handshake IR. This is more like hardware IRs than software IRs: values flow through the circuit, and there is no von Neumann machine stepping through each op in the body of a function.

To get to your specific example, and hopefully help push towards how this could work in handshake, here’s a slightly different example:

handshake.func @add(%arg0: i32, %arg1: i32) -> i32 {
  %0 = arith.addi %arg0, %arg1 : i32
  return %0 : i32
}

At this level of representation, the IR models two input channels of infinite size. One item from each channel is pulled and added together, according to the ready/valid handshake semantics. In this case, addi is a unit rate actor, so it will pull values when both arg0 and arg1 are valid, the output channel is ready, etc.
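
As a rough illustration of that firing rule, here is a small Python model (not CIRCT code; `run_add` is a made-up name) of a unit-rate add: it fires only while both input channels hold a token, and each firing consumes one token from each input. The output is modeled as unbounded, so it is always ready.

```python
from collections import deque


def run_add(chan0, chan1):
    """Fire a unit-rate add for as long as both inputs hold a token.

    Returns the produced tokens plus whatever is left waiting on each input.
    """
    in0, in1 = deque(chan0), deque(chan1)
    out = []
    while in0 and in1:  # both inputs are "valid"
        # One firing consumes exactly one token from each input channel.
        out.append(in0.popleft() + in1.popleft())
    return out, list(in0), list(in1)


produced, left0, left1 = run_add([1, 2, 3], [10, 20])
print(produced, left0, left1)
```

The leftover token on the first input just waits; the actor never fires on a single valid input.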

You can run circt-opt -handshake-insert-buffers to make buffering explicit in the IR, since infinite channels do not exist in hardware:

handshake.func @add(%arg0: i32, %arg1: i32) -> i32 {
  %0 = buffer [2] seq %arg1 : i32
  %1 = buffer [2] seq %arg0 : i32
  %2 = arith.addi %1, %0 : i32
  %3 = buffer [2] seq %2 : i32
  return %3 : i32
}

There are flags to control how buffers are materialized, and this is an interesting area of research.
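
To show what a finite buffer changes, here is a hedged Python sketch of a two-slot FIFO with backpressure, assuming the `[2]` above denotes two slots. It is a simplified cycle model, not how the buffer op actually lowers: the names `BoundedFifo` and `simulate` are made up, and the model lets a pop and a push happen in the same cycle.

```python
from collections import deque


class BoundedFifo:
    """A FIFO with a fixed number of slots and valid/ready-style status."""

    def __init__(self, slots):
        self.slots = slots
        self.items = deque()

    def ready(self):
        # Can accept a new token only while a slot is free.
        return len(self.items) < self.slots

    def valid(self):
        # Can offer a token only while it holds at least one.
        return len(self.items) > 0


def simulate(values, sink_ready, slots=2):
    """Push `values` through the FIFO, one loop iteration per cycle.

    sink_ready[i] says whether the consumer accepts a token in cycle i.
    Returns the tokens received and the number of cycles the producer stalled.
    """
    fifo = BoundedFifo(slots)
    pending = deque(values)
    received = []
    stalls = 0
    for ready in sink_ready:
        # Consumer side: a token leaves when the FIFO is valid and the sink is ready.
        if fifo.valid() and ready:
            received.append(fifo.items.popleft())
        # Producer side: a token enters when a slot is free, otherwise it stalls.
        if pending:
            if fifo.ready():
                fifo.items.append(pending.popleft())
            else:
                stalls += 1
    return received, stalls


print(simulate([1, 2, 3, 4], [False, False, False, True, True, True, True, True]))
```

With the sink blocked for the first three cycles, the buffer fills after two tokens and the producer stalls once; order is preserved and nothing is dropped.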

To finally attempt to answer your question, what you really want is the ability to split one handshake channel of values into two channels which contain every other value from the source channel. So if such a hypothetical operation existed, you could write:

handshake.func @add(%arg0: i32) -> i32 {
  %0, %1 = handshake.split_every 1 %arg0 : (i32, i32)
  %2 = arith.addi %0, %1 : i32
  return %2 : i32
}

Handshake has a merge which does sort of the opposite: (nondeterministically) merge values from multiple channels into one. What you want is sort of that in reverse: split every n values from one channel into multiple channels (deterministically).
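
A behavioral sketch of what such a deterministic split could do, in Python rather than MLIR. This is only one possible reading of the hypothetical op: `split_round_robin` and `add_pairs` are made-up names, and the round-robin order is an assumption.

```python
def split_round_robin(stream, n_outputs=2):
    """Deal tokens from one stream to n output streams in a fixed rotation."""
    outputs = [[] for _ in range(n_outputs)]
    for i, token in enumerate(stream):
        outputs[i % n_outputs].append(token)
    return outputs


def add_pairs(stream):
    """Split one stream into two, then add the streams element-wise."""
    first, second = split_round_robin(stream, 2)
    # zip stops at the shorter stream, so an unpaired trailing token is not consumed.
    return [a + b for a, b in zip(first, second)]


print(split_round_robin([1, 2, 3, 4, 5]))
print(add_pairs([1, 2, 3, 4, 5]))
```

Because the rotation is fixed, the same input stream always yields the same pairs, unlike a nondeterministic merge.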

One approach here might be to add such an op, define its semantics, and implement a lowering into hardware dialects. Another approach might be to change how your dialect represents this kind of thing, so that it can explicitly lower into handshake dialect IR that has two channels like the first example I gave.


Thanks for your explanation! For now I’m pinning my hopes on a lowering into FIRRTL instead of handshake :wink:

Feel free to use whatever dialect(s) in CIRCT seem appropriate, but I would suggest looking at the HW, Comb, and Seq dialects if you plan to lower directly to a hardware representation. The FIRRTL dialect has made some choices based on how Chisel and the original FIRRTL compiler worked, and if you are coming into CIRCT from a fresh project, you might find it simpler to work with HW, Comb, and Seq directly. Is there something specific about FIRRTL that makes it attractive?

In my dialect, there are pull/push ops like I described above. So the idea is to implement the FIFO channel myself, since handshake is not really what I want. Then I saw Chisel3’s Queue module and realized I could easily get an existing FIFO by reusing the FIRRTL code it generates. Once I lower my dialect into FIRRTL, I can use firtool to generate the Verilog code. Besides, I think FIRRTL has a big community, so I may get answers to whatever confuses me. I wouldn’t say there is anything really special about it, but right now it looks like a pretty good approach that I could benefit from.

I’m glad to hear you’re interested in something like this! This was part of the original idea we had, which was to define another new operation (say, “handshake.process”) which would encapsulate a sequential region of code, along with new operations that would exist within the sequential region to read and write individual values from a stream. To my knowledge, nobody has actually done this.
My hope is that this would not require a new dialect to describe the handshake-style stream connections, but instead would focus on new operations that compose neatly with the existing ones.
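
Purely as an illustration of the kind of sequential region I mean, here is a Python sketch. Nothing like this exists in CIRCT today, `add_process` is a made-up name, and a generator only stands in for blocking stream reads and writes.

```python
_MISSING = object()  # sentinel marking "no more tokens on the stream"


def add_process(stream):
    """Sequential process body: read two tokens, write their sum, repeat."""
    tokens = iter(stream)
    for first in tokens:                # first read from the stream
        second = next(tokens, _MISSING) # second read from the same stream
        if second is _MISSING:
            return                      # unpaired trailing token: nothing to write
        yield first + second            # one write to the output stream


print(list(add_process([1, 2, 3, 4])))
```

The point is that the body reads the same stream twice per iteration, which is exactly what the pure dataflow ops cannot express directly.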

For inspiration, you might want to look at:
https://ptolemy.berkeley.edu/books/Systems/
and:
https://kastner.ucsd.edu/hlsbook/

This might be a good time to mention that the dfg dialect Jiahong is helping lower to hardware is part of our EVEREST efforts, and that we have strong arguments for why that one in particular should remain separate from handshake. However, its lowering successors don’t, and if a lowering through handshake can be achieved using some minimal extension, that could be a path we take.
In short, I don’t think that handshake can or should take on the responsibility of being ‘the’ KPN dialect, considering its current design. But I do agree that we’d want that lowering in particular.

@stephenneuendorffer FYI, EVEREST is due to release a video on dfg and how it is consumed by PDM’s hardware generator soon. I had planned on bringing this up with you then, considering we might be talking about the TUD AIE flow soon.

I’m looking forward to hearing more :slight_smile:.