CodeExtractor status?

I am working on a pass to extract small regions of code to run somewhere else (a different node in a cluster). Basically, what I need is the ability to isolate a region of code, get its inputs and outputs, and create a new function containing the extracted code, plus code that aggregates the input and output parameters into structs that can be passed through a “void*”-based interface.

It looks like the CodeExtractor (include/llvm/Transforms/Utils/CodeExtractor.h) does nearly all of this, with the exceptions that I need to generate a different “call”, and I need to be able to separate the outputs and inputs.

I think I should be able to do what I want by making modifications after calling the CodeExtractor. But since there was a discussion (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-May/049405.html) about some uses that sound very similar to mine, such as offloading a kernel to an accelerator and branching between the original and extracted code, I wanted to see if any of this functionality had made its way into public branches (or other projects) that I can leverage.

Thanks!

Hi Brandon,

That sounds a lot like what I’m doing. I’m not using the code extractor though. Maybe you want to share ideas :)

I have a tool to extract parts of code into new functions based on a given partitioning. The inputs to the tool are:

1. The sequential code in LLVM IR (we get this from clang).
2. A machine file that contains the specification of a physical architecture. For example, you can specify a single node with two quad-cores, or a whole cluster with several nodes, each with two quad-cores and an FPGA accelerator board.
3. A file that maps each basic block to one of the architecture's devices (you can also specify a general mapping for convenience, and only map a few blocks to your accelerators or different CPUs).

Based on the partitioning and the architecture file, we extract BBs into functions and move these to different LLVM modules, one for each device of the architecture. Each module is then compiled with a machine-specific backend and against a device-specific communications library. All the executables can be run in a MIMD fashion in a cluster.

The inputs and outputs are handled in two ways:

a) By means of the virtual registers. When these cross device boundaries, they are turned into function parameters. The compiler inserts marshalling/unmarshalling code as well as “server” and “client” stubs.
b) By means of explicit prefetching (which we plan to have the compiler automate in the future as well). This is used for data structures and dynamic memory. Essentially, things that need a “getelementptr” at some point.

I never made this code available because it’s still a research thing, but your question piqued my interest. Could you elaborate on what you intend to do?

Cheers

Hi Pablo,

Your tool sounds really cool. It goes well beyond what I’m trying to do, which is really just extracting blocks of code, serializing and sending the inputs over to another core, running the code over there, and then sending the outputs back to the original caller (like an automatic RPC). So it sounds like most of the things your tool can do would be overkill for my use case.

What I’ve hacked up right now basically converts the inputs and outputs into their own structs and passes pointers and the sizes of the arguments to my communication library. And then, like the CodeExtractor, I generate the right getelementptr’s and loads.
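A minimal sketch of that struct-based marshalling, written as plain C++ rather than generated IR. All names here (`RegionInputs`, `RegionOutputs`, `extracted_region`, `fake_remote_call`, `call_site`) are hypothetical, and `fake_remote_call` only copies bytes locally to stand in for the communication library, which is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical aggregate of the region's live-in values.
struct RegionInputs {
    int x;
    int y;
};

// Hypothetical aggregate of the region's live-out values.
struct RegionOutputs {
    int sum;
    int product;
};

// The outlined code: receives opaque pointers and casts them back to the
// structs, which corresponds to the getelementptr + load/store sequence.
void extracted_region(void *in, void *out) {
    const RegionInputs *args = static_cast<const RegionInputs *>(in);
    RegionOutputs *results = static_cast<RegionOutputs *>(out);
    results->sum = args->x + args->y;
    results->product = args->x * args->y;
}

using RegionFn = void (*)(void *, void *);

// Stand-in for a communication library (an assumption, not a real API).
// It copies the inputs into a separate buffer, runs the region on that
// buffer, and copies the outputs back, using only pointers and sizes.
void fake_remote_call(RegionFn fn, const void *in, std::size_t in_size,
                      void *out, std::size_t out_size) {
    assert(fn != nullptr);
    std::vector<unsigned char> remote_in(in_size);
    std::vector<unsigned char> remote_out(out_size);
    std::memcpy(remote_in.data(), in, in_size);
    fn(remote_in.data(), remote_out.data());
    std::memcpy(out, remote_out.data(), out_size);
}

// What the rewritten call site does in place of the original region.
RegionOutputs call_site(int x, int y) {
    RegionInputs in{x, y};
    RegionOutputs out{};
    fake_remote_call(extracted_region, &in, sizeof in, &out, sizeof out);
    return out;
}
```

Keeping inputs and outputs in separate structs means only the inputs have to travel to the remote side and only the outputs have to travel back.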

Did you have issues with the Verifier complaining about function-local metadata after moving blocks? Did you find a good solution for this?

I also seem to be having issues with the Verifier’s DominatorTree analysis claiming that some of the instructions in the new function don’t dominate their uses in the new function, though they look like they do to me. This is probably a bug in my implementation, but if you remember specifically having to deal with regenerating Dominance information, that would be good to know.

I’m also interested in playing with branching between the original code or one of a few variations of the extracted RPC based on some runtime info (such as where global addresses resolve to). Just out of curiosity, have you played with dynamically choosing between different HW mappings based on runtime info?

Hi Brandon,

> Did you have issues with the Verifier complaining about function-local metadata after moving blocks? Did you find a good solution for this?

It looks like you are moving blocks or instructions with attached metadata to another function. Some metadata is function-specific, so the verifier complains because the instruction/BB and the metadata don't belong to the same function. Try creating a new metadata node (MDNode) as a copy of the old one (retrieving the old MDNode operands with getOperand()). Then link it to the instruction/BB, and unlink the old MDNode. For this, you have to look up the MDNode kind ID in the instruction's MDNode list, and use setMetadata(kind, newMDNode). I'm going from memory here, so try searching for "replace MDNode" or something like that.

> I also seem to be having issues with the Verifier's DominatorTree analysis claiming that some of the instructions in the new function don’t dominate their uses in the new function, though they look like they do to me. This is probably a bug in my implementation, but if you remember specifically having to deal with regenerating Dominance information, that would be good to know.

I didn't have to regenerate dominance info. As far as I recall, this is a matter of 1) fixing the branch instructions, 2) fixing the Phi nodes, 3) not adding BBs in the wrong place. If you send me an IR file and the dominance-related error message, I might spot the mistake.

If you get that error, it's because there is at least one path through the CFG where a virtual register "use" is not preceded by its "def". Are you sure that you don't see anything suspicious? You can see what your passes are doing to the code at any compilation stage with dump() (all "Value"s can be dumped), or viewCFG() in the case of functions. For me, this is like the printf() of LLVM debugging :)
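To make the "def must precede use on every path" rule concrete, here is a small self-contained sketch of a dominance check on a toy CFG. It does not use LLVM's DominatorTree; `Cfg`, `compute_dominators` and `def_dominates_use` are hypothetical names, and the sketch assumes block 0 is the entry and every block is reachable from it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: preds[b] lists the predecessors of block b; block 0 is the entry.
using Cfg = std::vector<std::vector<int>>;

// Iterative dominator computation: dom[b][d] is true when d dominates b.
std::vector<std::vector<bool>> compute_dominators(const Cfg &preds) {
    const std::size_t n = preds.size();
    assert(n > 0);
    // Start from "every block dominates every block", except at the entry,
    // which is dominated only by itself.
    std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
    dom[0].assign(n, false);
    dom[0][0] = true;

    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t b = 1; b < n; ++b) {
            // Intersect the dominator sets of all predecessors...
            std::vector<bool> next(n, true);
            for (int p : preds[b])
                for (std::size_t d = 0; d < n; ++d)
                    next[d] = next[d] && dom[p][d];
            // ...then add the block itself.
            next[b] = true;
            if (next != dom[b]) {
                dom[b] = next;
                changed = true;
            }
        }
    }
    return dom;
}

// True when a value defined in def_block is available on every path that
// reaches use_block, i.e. def_block dominates use_block.
bool def_dominates_use(const Cfg &preds, int def_block, int use_block) {
    return compute_dominators(preds)[use_block][def_block];
}
```

In a diamond-shaped CFG (0 branches to 1 and 2, both of which branch to 3), a value defined in block 1 does not dominate a use in block 3, because the path 0, 2, 3 never executes the definition. This is the kind of situation that a misplaced block or an unfixed branch can create after extraction.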

> I’m also interested in playing with branching between the original code or one of a few variations of the extracted RPC based on some runtime info (such as where global addresses resolve to). Just out of curiosity, have you played with dynamically choosing between different HW mappings based on runtime info?

Yes, this is possible. Essentially, you replicate the outlined function. You then add a block that evaluates the branching condition, and branches to as many "caller" blocks as outlined replicas. Each caller block contains a call to one of the replicas. Of course, you have to know at compile time the condition upon which you'll base the branching decision.
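A rough sketch of that dispatch structure in plain C++, not in IR. The replicas, the `Target` enum, and the `choose_target` condition (comparing a hypothetical owner node against the current node) are all assumptions made for illustration; the real condition would be whatever runtime information you decide to branch on.

```cpp
#include <cassert>

// Hypothetical placements that the runtime check can choose between.
enum class Target { Local, RemoteNode };

// Records which replica ran, so the choice is observable.
struct Result {
    Target ran_on;
    int value;
};

// Two replicas of the same outlined region. In a real system each one would
// be compiled for, or forwarded to, a different device.
Result run_local(int x) { return {Target::Local, x * 2}; }
Result run_on_remote_node(int x) { return {Target::RemoteNode, x * 2}; }

// The branching condition. Its form is fixed at compile time; only the
// values it inspects are known at run time.
Target choose_target(int owner_node, int this_node) {
    return owner_node == this_node ? Target::Local : Target::RemoteNode;
}

// The dispatch block: evaluates the condition and branches to one "caller"
// per replica, each of which contains a single call.
Result dispatch(int x, int owner_node, int this_node) {
    switch (choose_target(owner_node, this_node)) {
    case Target::Local:
        return run_local(x);
    case Target::RemoteNode:
        return run_on_remote_node(x);
    }
    assert(false && "unhandled target");
    return run_local(x);
}
```

Adding another mapping means adding one replica and one case to the switch; the rest of the original function is untouched.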

Cheers,
Pablo

> Hi Brandon,
>
>> Did you have issues with the Verifier complaining about function-local metadata after moving blocks? Did you find a good solution for this?
>
> It looks like you are moving blocks or instructions with attached metadata to another function. Some metadata is function-specific, so the verifier complains because the instruction/BB and the metadata don't belong to the same function. Try creating a new metadata node (MDNode) as a copy of the old one (retrieving the old MDNode operands with getOperand()). Then link it to the instruction/BB, and unlink the old MDNode. For this, you have to look up the MDNode kind ID in the instruction's MDNode list, and use setMetadata(kind, newMDNode). I'm going from memory here, so try searching for "replace MDNode" or something like that.

Thanks for the tip. Having to look up the kind ID was the headache I was having. I ended up solving it rather nicely by just using CloneBasicBlock rather than letting CodeExtractor move it.

>> I also seem to be having issues with the Verifier's DominatorTree analysis claiming that some of the instructions in the new function don’t dominate their uses in the new function, though they look like they do to me. This is probably a bug in my implementation, but if you remember specifically having to deal with regenerating Dominance information, that would be good to know.
>
> I didn't have to regenerate dominance info. As far as I recall, this is a matter of 1) fixing the branch instructions, 2) fixing the Phi nodes, 3) not adding BBs in the wrong place. If you send me an IR file and the dominance-related error message, I might spot the mistake.
>
> If you get that error, it's because there is at least one path through the CFG where a virtual register "use" is not preceded by its "def". Are you sure that you don't see anything suspicious? You can see what your passes are doing to the code at any compilation stage with dump() (all "Value"s can be dumped), or viewCFG() in the case of functions. For me, this is like the printf() of LLVM debugging :)

It did end up being a problem of inserting instructions in the wrong basic block: I was inserting *before* the beginning of the first block in a function, which was putting the instructions in weird places.

>> I’m also interested in playing with branching between the original code or one of a few variations of the extracted RPC based on some runtime info (such as where global addresses resolve to). Just out of curiosity, have you played with dynamically choosing between different HW mappings based on runtime info?
>
> Yes, this is possible. Essentially, you replicate the outlined function. You then add a block that evaluates the branching condition, and branches to as many "caller" blocks as outlined replicas. Each caller block contains a call to one of the replicas. Of course, you have to know at compile time the condition upon which you'll base the branching decision.

Yeah, which cases to cover is one of the many interesting research questions there.

Thanks for the tips!

I have resorted to mostly making my own version of CodeExtractor that does what I want. If anyone ever sets out to make a generally-useful and parameterizable CodeExtractor module, I would be interested in helping or using it.

-Brandon