Creating a tablegen backend

Hello.

I would like to create a new backend for tablegen that produces code in
a generic assembly language (not for a real processor). The documentation
page for this says "ToDo", but I think what I want to do is something similar
to CodeGenDAGPatterns::ParseInstructions() in CodeGenDAGPatterns.cpp.
Is this even vaguely correct? Any pointers would be appreciated.

I'm specifically interested in translating X86 into this other language, but
getting other processors for free in the process would be great.
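For orientation, a TableGen backend is, roughly, a function that walks the records TableGen has parsed and prints generated source. In LLVM itself that is C++ code working on a `RecordKeeper` (for example via `getAllDerivedDefinitions("Instruction")`). The sketch below models the same shape in Python, with plain dicts standing in for records. The record names and the simplified `Pattern` values are illustrative assumptions, not copied from the X86 .td files: real X86 patterns are more involved (they also describe the EFLAGS result, for instance), and many real instruction records carry an empty pattern list.

```python
# A toy model of a TableGen backend: walk parsed records (plain dicts here,
# standing in for TableGen Record objects) and emit generated source text.
# Record names, fields and patterns are simplified, illustrative assumptions.

RECORDS = [
    {"name": "ADD32rr", "superclasses": ["Instruction"],
     "Pattern": ("set", "$dst", ("add", "$src1", "$src2"))},
    {"name": "SUB32rr", "superclasses": ["Instruction"],
     "Pattern": ("set", "$dst", ("sub", "$src1", "$src2"))},
    {"name": "NOOP", "superclasses": ["Instruction"], "Pattern": None},
    {"name": "GR32", "superclasses": ["RegisterClass"]},
]


def derived_definitions(records, class_name):
    """Return the records deriving from class_name (similar in spirit to
    RecordKeeper::getAllDerivedDefinitions)."""
    return [r for r in records if class_name in r["superclasses"]]


def format_dag(node):
    """Print a nested tuple in TableGen-like (op arg, arg) dag syntax."""
    if isinstance(node, str):
        return node
    op, *args = node
    return "(" + op + " " + ", ".join(format_dag(a) for a in args) + ")"


def emit_semantics_table(records):
    """Emit one generated table entry per instruction that has a pattern."""
    lines = ["// Generated by a toy backend: instruction -> pattern"]
    for rec in derived_definitions(records, "Instruction"):
        pattern = rec.get("Pattern")
        if pattern is None:
            continue  # no semantics recorded; must be translated by hand
        lines.append(f'{{"{rec["name"]}", "{format_dag(pattern)}"}},')
    return "\n".join(lines)


print(emit_semantics_table(RECORDS))
```

Records without a pattern are skipped on purpose: those are exactly the instructions whose semantics the .td files do not record, so they would need hand-written translations.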

Durward,
Why not use LLVM-IR as your 'generic assembly'?

Micah

Why not use LLVM-IR as your 'generic assembly'?

I should have been more specific about my goals. I want to read
an x86 binary and output code in this other language. I do not have
access to the source. The target language might be negotiable, but
for the moment is fixed at what it is. My big hope is that I can leverage
the semantics that are stored in the X86*.td files, hopefully by implementing
a small number of primitives, rather than by having to analyze every X86
instruction myself (not going to happen).
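One way to read "a small number of primitives": treat each pattern tree as an expression and map every DAG operator it uses onto one primitive of the target language, emitting three-address code with temporaries. The sketch below assumes simplified patterns of the form `("set", dst, expr)`; the primitive names (`ADD`, `XOR`, ...) and the `MOV` used for the final assignment are hypothetical stand-ins for the real target language. Operators without an entry raise an error, which makes it easy to enumerate the primitives that are still missing.

```python
# Sketch: lower a simplified TableGen-style pattern tree into three-address
# "generic assembly" using a small table of primitives. The right-hand-side
# opcode names are hypothetical placeholders for the real target language.

from itertools import count

PRIMITIVES = {
    "add": "ADD",
    "sub": "SUB",
    "and": "AND",
    "or": "OR",
    "xor": "XOR",
    "shl": "SHL",
}


def lower(pattern, operands):
    """Translate ("set", dst, expr) into a list of generic-asm lines.

    `operands` maps pattern variables such as "$src1" to concrete
    register names taken from the decoded instruction.
    """
    op, dst, expr = pattern
    if op != "set":
        raise NotImplementedError(f"top-level operator {op!r}")
    temps = count()
    code = []

    def walk(node):
        if isinstance(node, str):          # leaf: a pattern variable
            return operands[node]
        name, *args = node
        if name not in PRIMITIVES:
            raise NotImplementedError(f"no primitive for {name!r}")
        srcs = [walk(a) for a in args]     # lower operands first
        tmp = f"t{next(temps)}"
        code.append(f"{tmp} = {PRIMITIVES[name]} {', '.join(srcs)}")
        return tmp

    result = walk(expr)
    code.append(f"{operands[dst]} = MOV {result}")
    return code


pattern = ("set", "$dst", ("xor", ("add", "$src1", "$src2"), "$src2"))
for line in lower(pattern, {"$dst": "eax", "$src1": "eax", "$src2": "ebx"}):
    print(line)
# t0 = ADD eax, ebx
# t1 = XOR t0, ebx
# eax = MOV t1
```

Flags, memory operands and target-specific DAG nodes are ignored here; each would need its own primitive or a hand-written rule.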

If there is a convenient way to convert an x86 binary into LLVM-IR, that would
be an interesting avenue to explore. I briefly looked at a post about an
LLVM-based decompiler, and even compiled and ran it, but it did not appear
to be mature enough for our needs yet.

For the record, I am not concerned about parsing ELF, doing relocations,
etc. Right now I am just focused on converting the machine code itself.

Thanks for your response.

Durward McDonell <durward.mcdonell@gmail.com> writes:

Why not use LLVM-IR as your 'generic assembly'?

I should have been more specific about my goals. I want to read
an x86 binary and output code in this other language. I do not have
access to the source. The target language might be negotiable, but
for the moment is fixed at what it is. My big hope is that I can leverage
the semantics that are stored in the X86*.td files, hopefully by implementing
a small number of primitives, rather than by having to analyze every X86
instruction myself (not going to happen).

This is a hard problem. By the time you've hit x86 assembly you've lost
a lot of information. That said, if I were going to attempt this, I
would look into the LLVM Disassembler code and work off of that. Then
all of the section identification, instruction decoding, etc. is already
handled.
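To make the disassembler suggestion concrete: in LLVM the disassembler turns bytes into `MCInst` objects whose opcode is one of the instruction records generated from the .td files, which is what links decoded instructions back to the semantics stored there. The toy decoder below only mimics that first step for a handful of 32-bit x86 opcodes with register-direct ModRM operands. It is an illustration, not a substitute for the real decoder, and the tuple it yields is an assumed format.

```python
# A toy 32-bit x86 decoder for a few one-byte opcodes, standing in for the
# real LLVM disassembler, which covers the full instruction set.
# Only register-direct ModRM forms (mod == 0b11) are handled.

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# opcode -> mnemonic for the "op r/m32, r32" (MR) encodings
MR_OPCODES = {0x01: "add", 0x29: "sub", 0x31: "xor", 0x89: "mov"}


def decode(code, pc=0):
    """Yield (address, length, mnemonic, operands) tuples."""
    while pc < len(code):
        op = code[pc]
        if op == 0xC3:
            yield pc, 1, "ret", ()
            pc += 1
        elif op == 0x90:
            yield pc, 1, "nop", ()
            pc += 1
        elif op in MR_OPCODES and pc + 1 < len(code):
            modrm = code[pc + 1]
            mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if mod != 0b11:
                raise NotImplementedError("memory operands not handled")
            # MR form: ModRM.rm is the destination, ModRM.reg the source
            yield pc, 2, MR_OPCODES[op], (REG32[rm], REG32[reg])
            pc += 2
        else:
            raise NotImplementedError(f"opcode {op:#04x} at {pc:#x}")


for addr, length, mnem, ops in decode(bytes.fromhex("89c801d831c0c3")):
    print(f"{addr:04x}: {mnem} {', '.join(ops)}".rstrip())
# 0000: mov eax, ecx
# 0002: add eax, ebx
# 0004: xor eax, eax
# 0006: ret
```

Each decoded tuple would then be matched to its instruction record, and that record's pattern lowered into the target language.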

The main issue with this kind of tool is data that also act as code.
You would have to identify such data (hard enough itself) and then
probably duplicate the instructions into your target instruction stream.
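A common starting point for separating code from data is recursive-traversal disassembly: start at known entry points, follow every statically known branch target, and treat bytes that are never reached as data. The sketch below does this over a toy subset of x86 (`nop`, `ret`, `mov eax, imm32`, short `jmp` and `je`). The set of offsets read as data is an assumed input from some separate analysis that is not shown; intersecting it with the reached code gives the dual-use bytes that would have to be emitted in both forms. Indirect jumps, overlapping instructions and self-modifying code are not handled.

```python
# Sketch of recursive-traversal disassembly over a toy x86 subset: follow
# control flow from the entry point, mark every byte reached as part of an
# instruction, and treat the remaining bytes as (possible) data.

def insn_length_and_targets(code, pc):
    """Return (length, successor offsets) for a tiny opcode subset."""
    op = code[pc]
    if op == 0x90:                      # nop
        return 1, [pc + 1]
    if op == 0xC3:                      # ret: no static successor
        return 1, []
    if op == 0xB8:                      # mov eax, imm32
        return 5, [pc + 5]
    if op in (0xEB, 0x74):              # jmp rel8 / je rel8
        rel = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
        target = pc + 2 + rel
        return 2, [target] if op == 0xEB else [target, pc + 2]
    raise NotImplementedError(f"opcode {op:#04x} at {pc:#x}")


def find_code(code, entry):
    """Return the set of byte offsets reached as instructions."""
    code_bytes, seen, work = set(), set(), [entry]
    while work:
        pc = work.pop()
        if pc in seen or not 0 <= pc < len(code):
            continue
        seen.add(pc)
        length, succs = insn_length_and_targets(code, pc)
        code_bytes.update(range(pc, pc + length))
        work.extend(succs)
    return code_bytes


# entry: jmp over two bytes of data, then mov eax, 1; ret
blob = bytes.fromhex("eb02" "aabb" "b801000000" "c3")
code_bytes = find_code(blob, entry=0)
data_bytes = set(range(len(blob))) - code_bytes

# Assumed result of a separate analysis: the program also reads offsets
# 5..8 (the mov immediate) as a 32-bit value, so those bytes are dual-use.
data_reads = set(range(5, 9))
dual_use = data_reads & code_bytes
print(sorted(data_bytes), sorted(dual_use))
# [2, 3] [5, 6, 7, 8]
```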

                              -Dave