c-like language implementation using llvm

Hi,

I have a project where I am creating a language which is very similar to C, but has some extensions. Among the most important are:

  1. A new 1-bit data-type
  2. Some gates: NOT, AND, etc… that act on arrays of this 1-bit type. (Think like a hardware description language.) These are like external function calls, which don’t have an implementation for now.

I don’t have a particular target, and would like to translate to the LLVM IR format for now. I want to leverage the power of LLVM SSA transformation/optimization framework. I would like to know your opinion about my approach:

  1. I have already added a new type to clang’s lexer/parser for my high-level language. Do I need to add a new type to LLVM IR as well? I read some warnings about this being difficult to implement. As far as I understand, there is structural typing in LLVM. Can I just utilize the i1 type? Is the i1 type already used for something, and thus might create a conflict?

  2. Can I use LLVM’s intrinsic functions for my gates, something like @llvm.NOT(i1 %val)? Would the fact that the implementation of these external functions is not defined, create a problem in translation down to IR? I have used Clang’s “built-ins” in the front-end and that seems to do the job at the parser level.

Thanks,
Ali

PS. Just to clarify again, this is not intended as an addition to LLVM’s codebase. It’s solely for my own purpose.

Can I just utilize the i1 type? Is the
i1 type already used for something, and thus might create a conflict?

I think you are very confused. LLVM's types are meant to be used to
represent *your* program :slight_smile: They can't be "already used".

2) Can I use LLVM's intrinsic functions for my gates, something like
@llvm.NOT(i1 %val)? Would the fact that the implementation of these external
functions is not defined, create a problem in translation down to IR? I have
used Clang's "built-ins" in the front-end and that seems to do the job at
the parser level.

Again, I think you are confused. Just use the operations that are
built into LLVM, such as
<LLVM Language Reference Manual — LLVM 18.0.0git documentation. One
slight point of confusion is that LLVM tries to be minimal so there is
no NOT instruction; `NOT X` is represented as `XOR X, -1`. You
probably won't need to add an intrinsic unless you are doing something
_very_ exotic.

-- Sean Silva

Hi Sean,

Can I just utilize the i1 type? Is the
i1 type already used for something, and thus might create a conflict?

I think you are very confused. LLVM’s types are meant to be used to
represent your program :slight_smile: They can’t be “already used”.

I am keeping all the types of C, and adding my new types. I thought that, for example, I can’t map my new type to i32 because that’s used for C integers. That’s what I meant by already used. Am I missing something?

  1. Can I use LLVM’s intrinsic functions for my gates, something like
    @llvm.NOT(i1 %val)? Would the fact that the implementation of these external
    functions is not defined, create a problem in translation down to IR? I have
    used Clang’s “built-ins” in the front-end and that seems to do the job at
    the parser level.

Again, I think you are confused. Just use the operations that are
built into LLVM, such as
<http://llvm.org/docs/LangRef.html#bitwise-binary-operations>. One
slight point of confusion is that LLVM tries to be minimal so there is
no NOT instruction; NOT X is represented as XOR X, -1. You
probably won’t need to add an intrinsic unless you are doing something
very exotic.

That’s a good point. However simple binary operations such as XOR are not the only ones I’ll be using. I have the Toffoli Gate for example which takes in 3 operands:
http://en.wikipedia.org/wiki/Toffoli_gate
I guess I can use intrinsics for these things? If I do, can I leave its behavior undefined? I would like it to appear just with the name toffoli in the IR.

Thanks,
Ali

I am keeping all the types of C, and adding my new types. I thought that,
for example, I can't map my new type to i32 because that's used for C
integers. That's what I meant by already used. Am I missing something?

If the type behaves like an i32 then use i32. For example, both signed
and unsigned int get lowered to i32.

That's a good point. However simple binary operations such as XOR are not
the only ones I'll be using. I have the Toffoli Gate for example which takes
in 3 operands:
Toffoli gate - Wikipedia
I guess I can use intrinsics for these things? If I do, can I leave its
behavior undefined? I would like it to appear just with the name toffoli in
the IR.

Just create the equivalent IR for the operation that the gate does;
that will give LLVM the ability to optimize the operations. If you use
an intrinsic, then LLVM won't be able to optimize the IR unless you
change the optimization passes to recognize the intrinsic (and even
then, the optimizations probably won't be as good).

-- Sean Silva

Hi there,

I asked this a few days ago and Sean gave a useful answer. Going back to the second question below, I want to clarify something.

Your hardware-level instruction set doesn’t need to be exactly represented in the LLVM IR. For example, when lowering C to LLVM IR, Clang converts ~x to xor %x, -1, this is then pattern matched during instruction selection to turn it into a hardware-level NOT opcode. You can do the same for the toffoli gate: generate equivalent logical operations in the instructions that LLVM supports, and then pattern match Toffoli gates during instruction selection.

From looking at Wikipedia, I see that Toffoli gate is just c XOR (a AND b). By expanding it to this, LLVM will automatically be able to apply its full arsenal of optimizations to optimize the program. For example, it will automatically simplify Toffoli(a,b,c) XOR Toffoli(b,a,c) to 0. If you add a new toffoli intrinsic or instruction, then you will have to teach LLVM to do these kinds of optimizations (and there are lots of them).

– Sean Silva