I’ve looked around in the documentation, and I can’t find any mention of a backend plugin capability for LLVM. I’d like to get the output of the instruction selector alongside the LLVM IR, or perhaps instrument it.
Is there any way to write a backend plugin for LLVM at all?
Perhaps I need to drive the backend manually from a frontend plugin, so that I can turn off the default backend and obtain its results within my plugin. I don’t like this as much, though, because it requires digging into how LLVM stitches its phases together.
It sounds like you want to write a MachineFunctionPass as a plugin, and run it in the middle of the pass pipeline of an existing backend? No, there isn't any support for that; RegisterStandardPasses only works on IR.
Yeah, I just discovered MachineFunctionPass. I don't know that I want to
run it in the middle of the pass pipeline of an existing backend so much as
I want the desired target to run and then have my machine function pass
run. If I can get the address of every machine instruction I will be very
If you're inserting instrumentation, you need to emit the instrumented code somehow, which implies the "middle" of the pass pipeline of an existing backend (although maybe pretty close to the end). And if you're not inserting instrumentation, I'm not sure what you mean by the "address" of an instruction.
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
By address, I mean the selected location in the binary. I need the ground truth for other static analyses.
Kenneth Adam Miller wrote:
By address, I mean the selected location in the binary. I need the
ground truth for other static analyses.
That's not determined until instructions are encoded for the object
file, which is pretty deep down in MC. I can imagine a couple of ways
to get the info you want, but it's not pretty. You could emit a label
for every instruction, and then work out the label offsets later on;
you might also be able to collect section offsets as the encoded
instructions are emitted to the object file.
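The label-per-instruction idea can be illustrated with a toy model. This is not LLVM code: `EncodedInst`, `Function`, `align_to`, and `resolve_labels` are hypothetical names, and the sketch assumes every instruction's encoded size is already known, which in a real build is only true once MC has encoded the instructions. It lays functions out in order, inserts alignment padding before each entry point, and records the section offset at which each per-instruction label lands.

```python
from dataclasses import dataclass


@dataclass
class EncodedInst:
    """Hypothetical stand-in for one encoded machine instruction."""
    label: str  # label emitted immediately before the instruction
    size: int   # encoded size in bytes (known only after encoding)


@dataclass
class Function:
    """Hypothetical stand-in for a function placed in the text section."""
    align: int                # requested entry alignment (power of two)
    insts: list[EncodedInst]


def align_to(value: int, align: int) -> int:
    """Round value up to the next multiple of align (a power of two)."""
    return (value + align - 1) & ~(align - 1)


def resolve_labels(funcs: list[Function]) -> dict[str, int]:
    """Lay functions out back to back and map each label to its section offset."""
    offsets: dict[str, int] = {}
    pos = 0
    for f in funcs:
        pos = align_to(pos, f.align)  # padding bytes get no label
        for inst in f.insts:
            offsets[inst.label] = pos
            pos += inst.size
    return offsets


funcs = [
    Function(16, [EncodedInst("f.0", 1), EncodedInst("f.1", 3), EncodedInst("f.2", 1)]),
    Function(16, [EncodedInst("g.0", 5), EncodedInst("g.1", 1)]),
]
offsets = resolve_labels(funcs)
print({label: hex(off) for label, off in offsets.items()})
```

The offsets are section-relative; the final address of each instruction also depends on where the linker places the section.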
But really, it seems easier to do a disassembly on the object file,
instead of trying to collect the information during compilation.
Disassembly is undecidable; that's why the ground truth from the compiler
is so desirable. Even with debug symbols, there can be code fragments that
lie beyond the visible terminator of a function, or functions that are not
aligned, so linear sweep can have false negatives. It's hard to validate
binary corpora against anything else, because even with symbol
information, all you get is the function entry point. The DWARF debug info
is even harder to turn into a list of addresses, because it is such an
arbitrary and skeletal API, and different languages use it very
differently.
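The linear-sweep failure mode can be shown with a toy model. Everything here is hypothetical: a made-up variable-length encoding in which the low two bits of an instruction's first byte give its length minus one, and a hand-written byte buffer in which two data bytes follow the last instruction of one function. The sweep decodes those data bytes as an instruction, runs past the true start of the next function, and only resynchronizes later; comparing against the compiler-side set of instruction starts is what exposes both the misses and the spurious decodes.

```python
def insn_length(opcode: int) -> int:
    """Toy variable-length ISA: low two bits of the first byte are length - 1."""
    return (opcode & 0x3) + 1


def linear_sweep(code: bytes) -> set[int]:
    """Decode from offset 0, assuming every byte belongs to some instruction."""
    starts, pos = set(), 0
    while pos < len(code):
        starts.add(pos)
        pos += insn_length(code[pos])
    return starts


code = bytes([
    0x01, 0xAA, 0x00, 0x03, 0x10, 0x20, 0x30,  # function f: starts at 0, 2, 3
    0x02, 0x00,                                # data after f's last instruction
    0x01, 0x55, 0x00, 0x02, 0x00, 0x00,        # function g: starts at 9, 11, 12
])
ground_truth = {0, 2, 3, 9, 11, 12}  # what the compiler knows it emitted

swept = linear_sweep(code)
false_negatives = sorted(ground_truth - swept)
false_positives = sorted(swept - ground_truth)
print("missed:", false_negatives, "spurious:", false_positives)
```

Real disassemblers use richer heuristics than this, but the comparison step is the same: without an authoritative list of instruction starts, there is nothing to measure the misses against.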