Latency-Insensitive Finite State Machine

I am new in CIRCT. In calyx dialect, I encountered the concept latency-insensitive FSM which I am not familiar with. I have tried my best to search it on the internet but found nothing helpful. Can someone here provide some papers or reading materials about how latency-insensitive FSM is implemented?

I think the latency-insensitive FSM’s implementation is helped with centralized FSM controller and done signals just like Fig2c in the paper that proposed calyx. Is this Correct?

Hey, Calyx author here. By default, components and groups execute in a latency-insensitive manner, that is, we do not assume anything about how many cycles they take to finish executing (i.e., their latency).

Because of this, when Calyx lowers control operators like seq, par, if, etc., it needs to generate finite state machines (FSMs) that do not make assumptions about the latency of the groups executing within. Because of this, the FSM transitions purely based on the done signals of the group. For example, we if have:

seq { one; two; three; } 

We generate an FSM that looks like:

// activations
one[go] = fsm.in == 0 ? 1;
two[go] = fsm.in == 1 ? 1;
three[go] = fsm.in == 2 ? 1;
// transitions
fsm.in = fsm.out == 0 & one[done] ? 1;
fsm.in = fsm.out == 1 & two[done] ? 2;
fsm.in = fsm.out == 2 & threedone] ? 3;

Note how the FSM transitions need to wait for the group to signal that it is done before it can move to the next state. This is what we mean by latency-insensitive FSMs.

In contrast, if we know exactly how many cycles each group in the seq takes, we can generate a latency-sensitive FSM:

seq {
  A; // one cycle
  B; // two cycles
  C; // two cycles
}

We generate:

fsm.in = fsm.out + 1; // increment fsm every cycle
A[go] = fsm.out == 0;
B[go] = fsm.out > 0 & fsm.out <= 2;
C[go] = fsm.oit > 2 & fsm.out <= 4;

Hopefully this answers your question but feel free to ask more!

1 Like

Perfect! Another question to make it sure:
If one FSM state contains more than one group executing in parallel, will the latency of this state depend on the longest latency of these parallel groups? For example:

If we have:

seq { 
     par{one;another;}
     two; 
     three; 
} 

We will get a FSM like this:

// activations
one[go] = fsm.in == 0 ? 1;
another[go] = fsm.in == 0 ? 1;
two[go] = fsm.in == 1 ? 1;
three[go] = fsm.in == 2 ? 1;
// transitions
fsm.in = fsm.out == 0 & one[done] & another[done]? 1;
fsm.in = fsm.out == 1 & two[done] ? 2;
fsm.in = fsm.out == 2 & threedone] ? 3;

The first control step’s latency depends on the longer latency of one and another. Is this correct?

Yup, par implements fork-join parallelism so the par block is not done until all the groups within it have executed. In this case, if one takes 1 cycle and another takes 15 cycles, the par block will take 15 cycles to execute. two is not allowed to execute in the meantime.

The best way to figure out what the generated FSM looks like is using the native compiler. You can run the following command to dump out the representation for the latency-insenstive FSM generated for any Calyx program:

cargo run -- <file> -p tdcc -x tdcc:dump-fsm

Thanks! I’ll try it.