How to map binary code to LLVM IR

Hi,

    I have some applications that have been compiled into LLVM IR and then linked into executable programs. I have some static information obtained from analysing the LLVM IR, and some dynamic information, such as which binary branch is taken, from a hardware sampler. I am wondering whether there are ways to map binary code back to LLVM IR. The only way I know is to use debug info, since both LLVM IR and binary code can be mapped to source code. I feel this method is not precise. Are there other methods?

    Thanks a lot!

    Best,

                                                                                                     Linhai

Hi Linhai,


I think you should use debug info; however, you aren't obliged to use the debug
info coming from the original source code: you could create artificial debug
info from the IR file. For example, for each instruction in the IR file, attach
debug info saying that the "file" is the IR file itself, the "function" is the
function the instruction is in, and the "line" is the line within the IR file
on which the instruction appears.
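The bookkeeping behind this idea can be sketched in a few lines: build a table mapping each instruction in a textual .ll dump to its own line number, which is exactly the "line" the artificial debug info would point at. This is a crude illustration I wrote for this post, not a real .ll parser or part of any LLVM pass.

```python
# Hypothetical sketch of the "artificial debug info" idea: map each
# instruction in a textual .ll file to its own line number, so a later
# pass could emit !dbg locations that point at the IR file itself.
# The line-classification here is deliberately naive.

def ir_line_table(ll_text):
    """Return (function, line, instruction) triples for an .ll dump."""
    table = []
    current_fn = None
    for lineno, line in enumerate(ll_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("define"):
            # e.g. "define i32 @foo(i32 %x) {"  ->  "foo"
            current_fn = stripped.split("@", 1)[1].split("(", 1)[0]
        elif stripped == "}":
            current_fn = None
        elif current_fn and stripped and not stripped.endswith(":"):
            table.append((current_fn, lineno, stripped))
    return table

example = """\
define i32 @foo(i32 %x) {
entry:
  %y = add i32 %x, 1
  ret i32 %y
}
"""
print(ir_line_table(example))
# → [('foo', 3, '%y = add i32 %x, 1'), ('foo', 4, 'ret i32 %y')]
```

A real implementation would attach these lines as DILocation metadata via DIBuilder instead of printing a table, but the mapping itself is the same.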

Ciao, Duncan.

Hi all,

I'm just starting out with LLVM (although I've been observing its evolution since that first release some years ago). :)

I would like to develop a backend for a generic assembly-like language, called NAC (N-Address Code). More info on NAC can be found here:
NAC (N-address code) programming language reference (HTML)
http://www.nkavvadias.com/hercules/nac-refman.pdf (PDF)

You can consider NAC similar to an LLVM subset for hardware synthesis. Although NAC was developed independently, certain decisions taken when designing it may or may not differ from other textual IRs like LLVM or PTX.

I have a number of questions:

1) It seems to me that either the C backend or NVPTX is a possible starting point as a template backend. However, only NVPTX makes use of TableGen, which removes some burden from the backend developer. I also think that a TableGen-based backend like NVPTX is easier to maintain in the long term. What do you think?

2) NAC uses the following statement form (for any given statement):

dst1, dst2, ..., dstm <= op src1, src2, ..., srcn;

which expresses an operation op with n source operands and m destination operands. Do you think that TableGen supports such a form, or should I sanitize it first?

3) The NAC memory model uses separate address spaces per array. A general-use stack/heap might also be supported. Should I use dot directives to declare each address space in use for a given translation unit/module?

4) How much can I get out of TableGen? Which C++ source files cannot be generated from .td files and always have to be coded by hand?
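As an aside on question 2, the statement form can be exercised with a tiny throwaway parser (illustrative Python I wrote for this sketch, not part of NAC's actual tooling):

```python
# A throwaway parser for the NAC statement form
#   dst1, ..., dstm <= op src1, ..., srcn;
# Purely illustrative; not part of NAC's real toolchain.

def parse_nac(stmt):
    """Split a NAC statement into (destinations, op, sources)."""
    stmt = stmt.strip().rstrip(";")
    lhs, rhs = stmt.split("<=")
    dsts = [d.strip() for d in lhs.split(",")]
    op, _, rest = rhs.strip().partition(" ")
    srcs = [s.strip() for s in rest.split(",")] if rest.strip() else []
    return dsts, op, srcs

print(parse_nac("q, r <= div a, b;"))
# → (['q', 'r'], 'div', ['a', 'b'])
```

The multi-destination left-hand side is the part that diverges from LLVM's single-result instructions, which is presumably what TableGen's `(outs ...)` lists would have to absorb.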

In the end, source code for the LLVM->NAC backend and a NAC interpreter will be released, probably as third-party BSD-licensed tools.

Best regards
Nikolaos Kavvadias


Hi Nikolaos,

The C backend (which is, by the way, deprecated) is the exception rather
than the rule: most targets use TableGen extensively (look at all the
lib/Target/**/*.td).

As for your other questions, I'm no expert, but you might want to take
a look at the available tutorials
(http://llvm.org/docs/WritingAnLLVMBackend.html and
http://jonathan2251.github.com/lbd/).

-- Ahmed Bougacha

Hi Nikolaos,

It really depends on your goals and the abstractions provided by the target
IR. For NVPTX, we use a traditional LLVM back-end approach (using TableGen
and the SelectionDAG/MachineInstr/MC infrastructure) because PTX is
low-level enough to be treated as an assembly language. There are some
higher-level features available in the language, but we do not use those.
If your IR is more like LLVM IR, then it may make sense to do some kind of
direct translation from LLVM IR to your IR and bypass the traditional
codegen approach.

Another point to consider is how much optimization you want. For PTX, we
have to walk a fine line compared to other back-ends because PTX is not the
machine ISA. A separate assembler is used to convert to the real machine
ISA, and that assembler performs optimizations of its own! We want to
optimize the PTX we generate, but we also do not want to prevent
optimizations that may be performed by the assembler. If your IR is meant
for optimization passes, then it may not make sense to perform all of the
back-end optimizations available through LLVM, but just use the middle-end
optimizations performed on LLVM IR directly.
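The direct-translation route can be pictured with a toy sketch: already-parsed IR instructions (the triples below stand in for a real walk over llvm::Instruction) are printed as NAC-style statements, skipping SelectionDAG/MachineInstr entirely. All names here are illustrative, not real code from either project.

```python
# Toy illustration of translating (pre-parsed) LLVM-IR-like instructions
# straight into NAC-style "dst <= op srcs;" statements, bypassing the
# traditional codegen pipeline. The input triples stand in for a walk
# over real IR; everything here is hypothetical.

def llvm_to_nac(instrs):
    lines = []
    for dst, op, srcs in instrs:
        lines.append(f"{dst} <= {op} {', '.join(srcs)};")
    return "\n".join(lines)

ir = [
    ("%y", "add", ["%x", "1"]),
    ("%z", "mul", ["%y", "%y"]),
]
print(llvm_to_nac(ir))
# %y <= add %x, 1;
# %z <= mul %y, %y;
```

In this scheme the optimization burden stays with opt's middle-end passes, run before translation, exactly as suggested above.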

Hi Ahmed,

> The C backend (that is by the way deprecated) is the exception rather
> than the rule: most targets use TableGen extensively (look at all the
> lib/Target/**/*.td).

Thanks for the tip; I'll remove it from my short list, then.

> As for your other questions, I'm no expert, but you might want to take
> a look at the available tutorials
> (http://llvm.org/docs/WritingAnLLVMBackend.html and
> http://jonathan2251.github.com/lbd/).

I also had an eye on this tutorial. It goes into extensive detail on a backend for the "CPU0" architecture, which appears to be a reasonable sample RISC.

PTX is also interesting: it is a close fit because it is a virtual machine target, like NAC (N-Address Code).

Best regards
Nikolaos Kavvadias

Hi Justin,

> It really depends on your goals and the abstractions provided by the target
> IR. For NVPTX, we use a traditional LLVM back-end approach (using TableGen
> and the SelectionDAG/MachineInstr/MC infrastructure) because PTX is
> low-level enough to be treated as an assembly language.

I consider NAC (N-Address Code) to be comparably low-level to PTX (AFAICS from the PTX ver. 3.1 PDF).

> If your IR is more like LLVM IR, then it may make sense to do some kind of
> direct translation from LLVM IR to your IR and bypass the traditional
> codegen approach.

It seems that NAC bears more resemblance to PTX than to LLVM. That said, there is e.g. a PHI instruction that works in the same way as in LLVM.

I gave some thought to writing an external translator, but I'm afraid it would be obsoleted if some dramatic change happened to the LLVM infrastructure. A backend would be more maintainable in the longer term.

> For PTX, we have to walk a fine line compared to other back-ends because
> PTX is not the machine ISA.

Similarly, NAC is not a specific ISA but a generic, assembly-like, input to a high-level synthesis process, mostly involving CDFG mapping, operation scheduling, state assignments, resource allocation and RTL code generation. The result is a hardware-only (non-programmable) architecture that implements the input program.
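One step of that flow, operation scheduling, can be sketched as ASAP (as-soon-as-possible) scheduling over a toy dataflow graph: each operation lands in the earliest control step after all of its predecessors. This is an illustrative Python fragment I wrote for this sketch, not the actual synthesis backend.

```python
# ASAP scheduling over a toy dataflow graph: each op is placed one
# control step after the latest of its predecessors; ops with no
# predecessors go in step 1. Illustrative only.

def asap_schedule(deps):
    """deps maps each op to the ops it depends on; returns op -> control step."""
    step = {}
    def visit(op):
        if op not in step:
            step[op] = 1 + max((visit(p) for p in deps.get(op, ())), default=0)
        return step[op]
    for op in deps:
        visit(op)
    return step

g = {"mul1": [], "mul2": [], "add1": ["mul1", "mul2"], "sub1": ["add1"]}
print(asap_schedule(g))
# → {'mul1': 1, 'mul2': 1, 'add1': 2, 'sub1': 3}
```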

Regarding machine-independent optimizations, I would basically use whatever is supported by LLVM's "opt". Target-dependent optimizations do happen, but as part of the high-level synthesis backend.

> If your IR is meant for optimization passes, then it may not make sense
> to perform all of the back-end optimizations available through LLVM, but
> just use the middle-end optimizations performed on LLVM IR directly.

I agree; this is my intention. The IR is meant as an LLVM simplification and subset for hardware synthesis.

Best regards
Nikolaos Kavvadias

Hi Justin,

I just came across your presentation regarding the LLVM 3.0 (?) PTX backend:
http://llvm.org/devmtg/2011-11/Holewinski_PTXBackend.pdf

Is this the NVPTX backend or the predecessor (in PTX? directory) backend?

Best regards
Nikolaos Kavvadias

It's the predecessor (removed in 3.1), but most of the concepts are the
same. The short version of the history is that I developed the original
PTX back-end with a few other people (PTX back-end in 3.0/3.1) and it was
later replaced by the NVIDIA version of the back-end around May of 2012.

The NVIDIA version is a much more complete version, as it is actually used
in production compilers as opposed to being a small university project. :)