Questions about LLVM IR encoding

Hi all, I am new to LLVM (and to compilers in general), and I am currently working on adapting LLVM IR to the M5 simulator so that we can measure the improvement offered by a novel architecture we are designing.
Put simply, in case you are not familiar with M5: my aim is to understand how LLVM IR is interpreted and encoded, and then to implement that within the M5 framework.
I have read the LLVM documentation, but I still have the following questions:

  1. Since IR is target-independent, how can it be encoded into bit-code executable files without any target-specific information?
  2. The “bitstream container format” described in the documentation is XML-like, and I wonder how LLVM translates it into an executable format such as bit-code.
    I ran several tests as the documentation suggests, but learned nothing that helps my work.
    The result of my experiment is attached.
    It is a simple add program, and I cannot relate the llvm-bcanalyzer dump of the .bc file back to the original IR.
    For example, the records
    <INST_STORE2 op0=6 op1=1 op2=3 op3=0/>
    <INST_STORE2 op0=5 op1=3 op2=3 op3=0/>
    actually correspond to
    store i32 1, i32* %a, align 4
    store i32 2, i32* %b, align 4
    in the .ll file.
    Although the opcodes clearly match between the two forms, I do not understand what op0, op1, etc. in the llvm-bcanalyzer output encode.
    What do they mean? The documentation does not explain this clearly, or have I missed something?

Could anybody please lend me a hand?
Thank you very much!

Hi 张乐,

> Hi all, I am new to LLVM (even the field of compiler) and currently I am engaged
> in the work of adapting LLVM IR to M5 simulator to observe the enhancement of
> the novel architecture we design.
> Simply speaking if you know little about M5, my aim is to know how LLVM IR is
> interpreted and encoded, then try to implement it in the framework of M5.
> I have read the LLVM documents, yet I still have some questions as follows:
> 1. As IR is target-independent, how can we encode them into bit-code executable
> files without specific targets' information?

I don't know what you mean by a "bit-code executable". However, if you want to
run bitcode (e.g. by compiling it to a native program with llc, or by using the
JIT or interpreter via lli), then at that point you need to specify the target
if you have not done so already (the target can be recorded in the bitcode).

> 2. The "bitstream container format" as the document refers, is XML-like, and I
> wonder how does LLVM translate it into executable format such as bit-code.

You should have no need to deal directly with the on-disk bitcode format. Again
I am not sure what you mean when you call bit-code an "executable format".

Ciao, Duncan.

张乐 <yueguoguo1024@gmail.com> writes:

> Hi all, I am new to LLVM (even the field of compiler) and currently I am engaged in the work of adapting LLVM IR to M5 simulator to observe
> the enhancement of the novel architecture we design.
> Simply speaking if you know little about M5, my aim is to know how LLVM IR is interpreted and encoded, then try to implement it in the
> framework of M5.
> I have read the LLVM documents, yet I still have some questions as follows:
>
> 1. As IR is target-independent, how can we encode them into bit-code executable files without specific targets' information?
> 2. The "bitstream container format" as the document refers, is XML-like, and I wonder how does LLVM translate it into executable format such
> as bit-code.

If I'm understanding you correctly, you want to treat LLVM IR as an ISA
and simulate that in M5, writing a new M5 target to execute LLVM IR in
bitcode format. Is that right?

I don't think that's a good option. While it can theoretically be done,
you're going to miss very important machine aspects such as:

- limited register set
- alignment issues
- calling sequence
- instruction/text size

There are others, but all of the above can impact performance in major
ways. Not modeling them is going to give you very inaccurate results.
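
To make the alignment point concrete, here is a small C sketch (not taken
from M5 or LLVM; the struct and function names are made up for illustration).
It asks the compiler where it places an int that follows a char. The answer
comes from the target's ABI, which is exactly the kind of detail a simulator
that runs the IR directly never sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: a 1-byte tag followed by an int and a trailing byte. */
struct sample {
    char tag;
    int value;
    char tail;
};

/* Probe struct: the offset of 'v' equals the alignment the target ABI
 * requires for int when it directly follows a single char. */
struct align_probe {
    char c;
    int v;
};

static size_t int_alignment(void)
{
    return offsetof(struct align_probe, v);
}

static size_t value_offset(void)
{
    return offsetof(struct sample, value);
}

/* Print the layout chosen for this target; numbers differ between ABIs. */
static void print_layout(void)
{
    /* The compiler must place 'value' on a suitably aligned address. */
    assert(value_offset() % int_alignment() == 0);
    printf("alignment of int : %zu\n", int_alignment());
    printf("offset of value  : %zu\n", value_offset());
    printf("sizeof(sample)   : %zu\n", sizeof(struct sample));
}
```

Calling print_layout() from main on two different targets is a quick way to
see how much layout, and therefore memory traffic, is decided by the target
rather than by the source program.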

If you can express your architectural enhancement in one of the existing
targets, I would do that. At the very worst you can create new
instructions (say, for x86) and insert them into the asm stream with raw
.byte directives in the .s file. I've done that many times in the past.
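
A minimal sketch of the raw .byte trick, written as GCC/Clang inline assembly
in C rather than as a hand-edited .s file (that substitution is mine). The
"new instruction" here is an assumption for illustration only: it reuses the
one-byte x86 NOP encoding (0x90) so that the code runs on real hardware. With
a modified simulator you would put your own encoding in its place. The
function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Run a computation with one raw, hand-encoded instruction in the middle.
 * 0x90 is the x86 NOP; it stands in for a custom opcode that only the
 * simulator would understand. */
static int run_with_raw_insn(int x)
{
#if defined(__x86_64__) || defined(__i386__)
    /* The assembler copies this byte into the instruction stream verbatim.
     * The "+r"(x) operand keeps x live in a register across the statement. */
    __asm__ volatile(".byte 0x90" : "+r"(x));
#endif
    /* On other targets the raw byte is skipped and only the add remains. */
    return x + 1;
}

/* Self-check: the raw byte must not disturb the surrounding code. */
static void check_raw_insn(void)
{
    int result = run_with_raw_insn(41);
    assert(result == 42);
    printf("result after raw instruction: %d\n", result);
}
```

The compiler and the rest of the toolchain know nothing about the new
instruction; only the simulator (or the hardware) has to decode the bytes,
which is what keeps this approach cheap to try.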

Another option is to create a new Target in LLVM and define the ISA you
want to evaluate. This is a lot more work as it requires a new Target
in LLVM and a new implementation for M5 but it gives you more
flexibility and makes future changes easier.

I would go with one of the existing Targets if possible and add any new
instructions you need. That's going to give you the most realistic
results.

                          -Dave