A Xilinx Primitives Dialect

I have been experimenting with adding support for Xilinx primitives to CIRCT. Do people here think there is much value in bringing it into CIRCT? I think it would make it tremendously easier to write dialects and passes that use Xilinx primitives directly.

My experimental branch allows CIRCT to compile the following IR:

hw.module @Foo(%a: i1) -> (%o: i1) {
  // LUT6 is a xilinx primitive for a single-output LUT
  %1 = xilinxPrimitives.LUT6 %a, %a, %a, %a, %a, %a  { INIT=0x0 } : i1
  // LUT6_2 is a xilinx primitive for a 2-output LUT...
  %2, %3 = xilinxPrimitives.LUT6_2 %a, %a, %a, %a, %1, %1  { INIT=0x8000000000000000 } : (i1, i1)
  hw.output %1 : i1
}

into Verilog instantiations of FPGA primitives:

% ./bin/circt-opt --lower-xilinx-primitives-to-hw ./a.mlir | ./bin/circt-translate -export-verilog 
// external module LUT6_2

// external module LUT6

module Foo(
  input  a,
  output o);

  wire prim_1_O5;	// <stdin>:6:30
  wire prim_1_O6;	// <stdin>:6:30
  wire prim_0_O;	// <stdin>:5:17

  LUT6 #(
    .INIT(64'd0)
  ) prim_0 (	// <stdin>:5:17
    .I0 (a),
    .I1 (a),
    .I2 (a),
    .I3 (a),
    .I4 (a),
    .I5 (a),
    .O  (prim_0_O)
  );
  LUT6_2 #(
    .INIT(64'd9223372036854775808)
  ) prim_1 (	// <stdin>:6:30
    .I0 (a),
    .I1 (a),
    .I2 (a),
    .I3 (a),
    .I4 (prim_0_O),	// <stdin>:5:17
    .I5 (prim_0_O),	// <stdin>:5:17
    .O5 (prim_1_O5),
    .O6 (prim_1_O6)
  );
  assign o = prim_0_O;	// <stdin>:5:17, :7:5
endmodule

Note that the pass implicitly inserts hw.module.extern declarations!
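For readers wondering where the INIT values in the example come from, here is a minimal sketch. It assumes the usual Xilinx convention that a LUT6 output is INIT indexed by the inputs read as a 6-bit address {I5, ..., I0}. The helper name lut6_init is made up for illustration and is not part of the branch.

```python
def lut6_init(fn):
    """Build a 64-bit LUT6 INIT value from a Boolean function of (i0, ..., i5).

    Assumption: output O = INIT[{I5, I4, I3, I2, I1, I0}], i.e. bit `addr` of
    INIT holds the output for the input combination whose binary encoding is addr.
    """
    init = 0
    for addr in range(64):
        bits = [(addr >> k) & 1 for k in range(6)]  # bits[k] is the value of input Ik
        if fn(*bits):
            init |= 1 << addr
    return init


# A 6-input AND: only the all-ones address (63) produces a 1.
and6 = lut6_init(lambda i0, i1, i2, i3, i4, i5: i0 & i1 & i2 & i3 & i4 & i5)
print(hex(and6))       # 0x8000000000000000
print(f"64'd{and6}")   # 64'd9223372036854775808
```

Under that assumption, the INIT=0x8000000000000000 used in the IR above is a 6-input AND, and the decimal form matches the 64'd9223372036854775808 parameter in the emitted Verilog.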

The steps from “specification” to (hopefully) synthesizable Verilog:

  1. Get the open-source Xilinx Unisim Verilog files (Apache license). The Verilog modules that simulate Xilinx primitives are annotated with `celldefine.
  2. Write a simple Verilog parser that picks out the relevant information (parameters, I/O ports). The parser generates an ODS file for the XilinxPrimitives dialect, i.e. XilinxRawPrimitives.td, and a C++ file that describes all the I/O ports, i.e. XilinxRawPrimitives.cpp.
  3. Create a pass that lowers xilinxPrimitives::{LUT3, LUT4, …} into simple hw.instance ops (thank you MLIR for this dynamism).
  4. Generate Verilog as usual. In fact, in my experiments, I did not have to modify ExportVerilog.cpp.
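To give a feel for step 2, here is a rough sketch of the extraction. It is not the parser from the branch. It assumes ANSI-style port declarations without net types (so "input wire x" is not handled), and SAMPLE is a simplified, made-up stand-in for a Unisim file. The names parse_primitives, MODULE_RE, PARAM_RE and PORT_RE are all hypothetical.

```python
import re

# Simplified, hypothetical stand-in for a `celldefine-annotated Unisim module.
SAMPLE = """
`celldefine
module LUT2 #(
  parameter [3:0] INIT = 4'h0
)(
  output O,
  input I0,
  input I1
);
endmodule
`endcelldefine
"""

# Only modules that directly follow a `celldefine directive are considered.
MODULE_RE = re.compile(r"`celldefine\s+module\s+(\w+)(.*?)endmodule", re.S)
PARAM_RE = re.compile(r"parameter\s+(?:\[(\d+):(\d+)\]\s*)?(\w+)\s*=")
PORT_RE = re.compile(r"\b(input|output|inout)\s+(?:\[(\d+):(\d+)\]\s*)?(\w+)")


def parse_primitives(text):
    """Return {module: {"params": [(name, width)], "ports": [(dir, name, width)]}}."""
    prims = {}
    for name, body in MODULE_RE.findall(text):
        # Parameters without an explicit range get width None.
        params = [(p, int(hi) - int(lo) + 1 if hi else None)
                  for hi, lo, p in PARAM_RE.findall(body)]
        # Ports without an explicit range are single-bit.
        ports = [(d, n, int(hi) - int(lo) + 1 if hi else 1)
                 for d, hi, lo, n in PORT_RE.findall(body)]
        prims[name] = {"params": params, "ports": ports}
    return prims


prims = parse_primitives(SAMPLE)
print(prims["LUT2"])
```

A table like this is roughly what a generator would need in order to emit one op definition and one port description per primitive. A real parser would also have to deal with `ifdef blocks, non-ANSI port lists and net types.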

Notably, there is no need for anyone to manually transcribe anything from pdf → MLIR :slight_smile: I suspect this approach is not hard to extend to other FPGA vendors.

There are still some pending issues from my experimentation.

  • On my laptop, libCIRCTXilinxPrimitives.a is 110MB. I think this can be halved by sharing the verifier, but I can’t see it going much lower easily.
  • Compiling the generated files is slow.
  • I haven’t tested this with xsim yet.
  • For reasons I have not investigated, mlir-tablegen incorrectly handles def PCIE_3_0:... and generates a 3_0 class instead of PCIE_3_0.
  • The HW dialect requires all instantiations to have names. This doesn’t really mean much for primitives.
  • I don’t know if there are performance implications to having 300+ ops. A small number of them (PCIe blocks, GTH…) have >30 I/O ports. I would probably just filter down to a small set of operations to start with…
  • I am unsure whether adding a partial Verilog parser is a good idea; maybe it would make sense to clean up the files, put them in a repo somewhere else, and be done with it (they will probably never change).

Hey, that’s pretty cool! I’m curious if the LLHD input path would make the parsing path more manageable? I imagine that over time, the details of the tablegen spec might change and it would be easier to regenerate this from scratch rather than munge the generated XilinxRawPrimitives.td

Super cool!

Yeah, we’ve talked about this in the past (Representing device data), but you’re the first to push on it! Is there a particular use-case you’re targeting?

The second problem is figuring out the encodings of the configuration RAM (INIT) bits! Then we could theoretically write a mapper. We’ve always said we want to replace as much of commercial EDA tooling as possible.

We had a talk from Chris Lavin at Xilinx a while back (CIRCT Open Design Meeting - Zoom) during which he discussed the FPGA Interchange Project (SymbiFlow/fpga-interchange-schema (github.com)). There appears to be a schema intended to encode device primitive and routing data.

There may be a licensing issue if we want to put this in CIRCT mainline. The source data are Apache 2 but CIRCT is licensed under Apache 2.0 with LLVM Exceptions. Presumably we wouldn’t be committing the source data directly, but if it’s automatically converted… I don’t know what the license would end up being.

I don’t follow. Which names specifically?

+1. Tablegen (ODS) is intended to make hand-writing operations easier and breaking changes are not unheard of. If you want to try to reduce the size of your dialect library, you can try skipping tablegen by generating the C++ code directly – there’s more flexibility there, though I don’t know how much tablegen fat there is to squeeze.

I’m sure this is something that is easily solved. :slight_smile:

Unless I’m missing something, I don’t think LLHD has a Verilog parser checked into llvm/circt. (There is one in moore, but that requires pulling in a dependency.) I think either depending on something stable like verilog-perl or adding a Verilog parser to the tree makes sense. If no one thinks that checking (a small amount of!) Perl code into CIRCT is a code smell, I’ll probably start hacking on that.

My vague goal is to be able to provide enough infrastructure to start hacking on tooling to map with coarse placer annotations (think pblocks, or even clock region in Xilinx) and let the router do the rest, much like what you are suggesting in the linked discourse discussion.

Instead of LUT6 the_lut(.a(a), .b(a)...), we really only need LUT6 (.a(a), .b(b), ...). But the HW dialect doesn’t support that (or at least I think it doesn’t).

That’s no different from any other instance in SystemVerilog. In my experience, >80% of the time the instance names are technically unnecessary and just for human readability. They’re only necessary when digging into the instance hierarchy and even then CIRCT could autogenerate them and whatever instance hierarchy code poking is associated with the instance names.

I would eventually like to write a router as well! Intel, however, doesn’t provide the necessary wire info to do this and Intel is my primary target.

A thought here if you’re concerned about size of the resulting tool: you might be able to compile your dialect (and presumably conversion which uses it) as a shared object, then dynamically load it (like if the tool determines that it is necessary) at runtime. MLIR does have the ability to register dialects and passes dynamically…