Integrate logic synthesis and optimisation capabilities

Hi there,

I’m thinking of putting my logic synthesis and optimization knowledge to good use by integrating some techniques into CIRCT. I believe that adding such capabilities is very much in scope for the project, given the willingness to use ABC as a library to map and optimize the comb dialect.

I see two possible paths to get this project going:

  1. Using mockturtle as a library, implement a translation between the CIRCT comb dialect and mockturtle’s in-memory logic network format. (I’m extremely biased towards using mockturtle instead of ABC because I was deeply involved in its development, but I have done my share of contributions to ABC as well.)

  2. Implement the techniques to work directly with CIRCT IR. (I’m quite curious to see how these would compare against using the libraries, which have highly specialized data structures)

Either way, I think the first step requires the ability to isolate the combinational parts of a hw.module and bit-blast them. Then we can either pass them to a library or apply the techniques directly.

So, I wanted to know if there are people already working in these directions. If yes, I can help. Otherwise, I can kickstart this, in which case I would like to know if someone is willing to guide me a bit. (Basically, I would need someone to help me align my design decisions with how things are done in CIRCT and point me in the right directions with respect to the codebase.)


As far as I know, no efforts have begun in this space, but there have been some discussions and general interest. I’m curious what others think, but path 1. might be a good place to start. We have examples of building CIRCT with optional dependencies on third party C++ libraries (e.g. OR tools), so this might be the fastest path to get a PoC going. This could also shed some light on what path 2. might look like. I’m curious what information would be needed in the IR, what new IRs for this part of the stack could look like, etc. I can volunteer some spare time to help guide this effort, and I bet others will be interested as well.

I too would be very interested in this! I agree that option 1 would be the best way forward in the short term (though I’d also be curious what an IR amenable to logic optimization would look like). I’d be happy to provide guidance as well!

As for ABC vs Mockturtle, I’ve heard since that conversation that ABC is somewhat difficult to use as a library. Mockturtle is far easier (as I’m told). And it seems to be comparable in results (based on some extremely limited internal testing).

As for the output IR, what we’ve discussed in the past is different dialects for device primitives, e.g. Intel ALMs, standard cell libraries, etc. For FPGAs (what I care about), I think a vendor-agnostic kLUT dialect (much like ABC/Mockturtle have?) is a good starting place.

Thank you both for the replies.

I agree that the first path will be the fastest way to get a PoC.

I’m curious what information would be needed in the IR, what new IRs for this part of the stack could look like, etc.

My really high-level idea is to start by creating a pass that isolates the combinational parts of a hw.module that a user is interested in optimizing. I suppose these combinational parts will use the set (or a subset) of Comb operations. In order to isolate these parts and allow parallel optimization, we could create a structural operation, say comb_circuit, which has a region into which we put these combinational operations.

When I mentioned the possibility of the interesting parts being composed of a subset of Comb operations, it is because a user might not want to optimize high-level operations, such as comb.add, together with the random logic (basically the control/glue logic), since their structure would disappear.

Once we are done isolating the parts, we need to do bit blasting. We could do this during the translation from the IR to the in-memory representation, but I’m not sure it wouldn’t be easier to do it in the IR.

Then we do the optimization and translate things back. The result will be either an IR without its high-level structure (basically a sea of gates) or, if we map things to LUTs, a bunch of LUT operations. (I suppose we will need to introduce a LUT operation, e.g., comb.lut, which takes as an attribute a string that defines the truth table, say 0x8 (AND-2), and takes as operands an appropriate number of i1 inputs.)

Keep in mind that this is really a sketch off the top of my head. I would be interested in knowing which direction previous discussions on this took. Also, it would be nice to have a few examples of not-so-trivial modules represented using CIRCT dialects.


I love this idea @boschmitt! There’s been a lot of interest for a long time in experimenting with some form of logic synthesis and optimization, and also in lowering comb ops like add/mul to a simpler form for analysis and synthesis.

I like your suggestion of introducing a new region op to encapsulate the comb logic we’d want to lower and optimize. That way we split the concerns of finding parts of the design we want to optimize/synthesize from the mechanics of actually doing that. And it would also allow us to be more selective on what we synthesize – maybe we only want to have parts of the circuit, or a subhierarchy, run through logic synthesis. (DUT vs. testbench code.)

Also :+1: on having a dedicated op that represents a piece of logic by its truth table. I’m pretty sure we could leverage that to speed up simulations in some cases, by turning logic that is nasty to run on x86 into a simple lookup table.

Very excited! Let me know if I can help :slight_smile:

We have talked about having pure non-cyclic data-flow regions. I’ve been encouraging this to happen for a while. Graph regions (hw.module) are a bit hard to analyze with classic SSA algorithms. The various procedural regions have annoying side-effect related ordering which would be nice to avoid. This discussion is sometimes framed in terms of what the middle-dialect would be that lowering passes through before hitting hw/comb/sv and at which point we would do most transformations.

On library usage, there are obvious license questions, but beyond that, anything pulling in an external library probably has to be optional (conditionally compiled). This isn’t a huge deal, but the major functionality shouldn’t impose external requirements.

I don’t know if LUT operations should go in comb, or in a synthesis dialect. I suspect we’d quickly want target-specific operations.

I think @fabianschuiki 's arc dialect work does a bunch of the clustering and data-flow graph isolation you want.

A good way forward is to give a talk at a weekly design meeting (Wednesdays at 1 CST).

It’s been a while since I worked in EDA/logic synthesis so bear with me as I’m trying to get some idea of where open source projects in this space are at this point. I’m trying to get a handle on MLIR/CIRCT as it seems important in this space. It looks like a lot of progress has been made but tying things together seems to still be an issue…

As for the IR, I recently found out about FIRRTL, which apparently came out of Chisel. Is that something that’s relevant to this discussion? Apparently CIRCT can already output FIRRTL? Are all of the FIRRTL tools in Scala (which would give mockturtle the edge here, as it’s already in C++)? Are there any comparisons of FIRRTL to mockturtle’s format?

I was apparently dead wrong on the comparable results. Here are some results from two internal benchmarks. There are some additional passes being run before and after the logic optimization (I’m not at liberty to give further details other than this is a non-CIRCT project). This is after running Quartus synthesis.

             tc1                       tc2
             ALMs            Regs      ALMs             Regs
Baseline     9734 (100.00%)  16745     85061 (100.00%)  142722
Mockturtle   9539 ( 98.00%)  16479     82907 ( 97.50%)  142651
ABC          8982 ( 92.30%)  14764     81892 ( 96.30%)  138848

So I would encourage this new subsystem to be pluggable, though it might be the case that ABC and MockTurtle are sufficiently different that the only common interface points are the input comb dialect (plus the region op you are proposing) and the LUT op you propose for the output.

I know nothing about logic synthesis, ABC, or MockTurtle.


I think this and the last Discourse topics encapsulate all the relevant discussion. (Other than the interest voiced in meetings.)

There are plenty of large open-source Chisel designs which could be used as benchmarks. Unfortunately, the only large designs I have in non-FIRRTL dialects are all internal/proprietary.

I also don’t have a strong opinion about this, but if it’s a general k-LUT op, I would lean towards the comb dialect. It is the most generic combinational operation, after all! Target-specific operations should go in target-specific dialects. I’d be curious to see how many target-specific transforms could be done purely based on the generic LUT op. I’m pretty sure we could do LUT packing (e.g. into Intel FPGA ALMs) just from generic LUTs, but it could easily be the case that we get better results if the logic optimizer knows about ALM/LAB carry chains (which would live in a target-specific dialect), for instance.