I have two very specific questions about LLVM, but first let me give you the general overview of what I am trying to achieve. I have some section of a function that I want to replace with a function call to a brand new function. For example, I want to take the following function:
function foo:
code A
<--- CUT
code B
<--- CUT
code C
and (approximately) convert it into:
function foo:
code A
call label
code C
function label:
code B
I think I have a pretty good idea of how to do this (BasicBlock::split really helps!), but I have a few specific questions:
1. To mark the section of code to be cut out, I'm using "magic" function calls (begin() and end()). In order to locate these calls, I am currently iterating over the basic blocks in a function using the iterator. Is it possible that I could get the blocks "out of order" with respect to the control flow? For example, if I have this code:
blockA
if ( blockB ) {
blockC
} else {
blockD
}
blockE
I expect to get the blocks in alphabetical order. I can see that it is possible that LLVM could give them to me in an arbitrary order. If so, it will complicate how I need to locate the basic blocks that compose "code B" in my example above.
2. I am going to need to pass dependencies for the "code B" block into the new function. I'm hoping that it work if I just create the arguments with the same name and type as the dependencies, then use "setParent" to transfer the basic blocks to the new function. However, after staring at the "instruction" object a little bit, I am starting to think that I may need to actually modify each individual instruction to point to the arguments. Can anyone straighten me out here?
By the way, I have to say that I am *very* impressed with LLVM's documentation and organization. I have been able to get an amazing amount of functionality working *very* quickly. This is a really useful tool for manipulating code.
Evan Jones
I just found the Interval class, and it looks like it could help me out here. I want to locate the Interval that begins with the call to the "magic" begin(), and ends with the call to the "magic" end(), if it exists. If there is no such valid interval, then I want to detect that and return an error. Can I make this happen by using the IntervalPartition pass? It looks like LLVM has some functionality that I could use *some where.* I just can't quite figure out how to make use of it.
Thanks,
Evan Jones
Ah ha! I figured out the succ_begin() and pred_begin() functions. This looks like it *might* be a better way to do things. At the very least, I can verify that I have located the chunk of code between begin() and end() correctly:
1. All predecessors must be in the interval, except for the initial block, which must have a single predecessor: the block that used to contain the begin() call.
2. All successors must be in the interval, except for the last block, which must have a single successor: the block that was created when splitting the block with the end() call.
However, since I am creating the Interval manually, it is possible that one of these errors occurs because the blocks are returned out of order by the Function::begin() iterator. However, since I only need to worry about handling code that is produced by llvm-gcc, I'm not going to worry about it.
On to issue #2.
Evan Jones
Evan,
Have you looked at ExtractCodeRegion() and its siblings in include/llvm/Transform/Utils/FunctionUtils.h? It seems to be what you need.
--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/
Vikram is exactly right. The functions in that header can extract an arbitrary single entry multiple exit set of basic blocks into a new function, and takes care of all of the updating needed for live ins and live outs of the code. If you want to extract something that doesn't start/end on a basic block boundary, use of BB::splitBasicBlock would be appropriate.
I'm glad LLVM is working out for you!! 
-Chris
An Interval (and an Interval partition) is a concept with a lot of compiler theory behind it. If you google for control flow analysis and interval you should find info on it. That said, I don't think it is what you're looking for...
-Chris
I have two very specific questions about LLVM, but first let me give you the general overview of what I am trying to achieve. I have some section of a function that I want to replace with a function call to a brand new function. For example, I want to take the following function:
I think I have a pretty good idea of how to do this (BasicBlock::split really helps!), but I have a few specific questions:
ok
1. To mark the section of code to be cut out, I'm using "magic" function calls (begin() and end()). In order to locate these calls, I am currently iterating over the basic blocks in a function using the iterator. Is it possible that I could get the blocks "out of order" with respect to the control flow? For example, if I have this code:
blockA
if ( blockB ) {
blockC
} else {
blockD
}
blockE
I expect to get the blocks in alphabetical order. I can see that it is possible that LLVM could give them to me in an arbitrary order. If so, it will complicate how I need to locate the basic blocks that compose "code B" in my example above.
The order of BasicBlock's in an LLVM Function match the order that they are printed. If you iterate over them, you are guaranteed to get this order. Note that if you're looking at the output of the C front-end, they could be permuted arbitrarily w.r.t. the source order, due to various optimizations. That said, if you have begin/end markers in the same basic block, they should stay within the same basic block most of the time (enough for it to not matter).
2. I am going to need to pass dependencies for the "code B" block into the new function. I'm hoping that it work if I just create the arguments with the same name and type as the dependencies, then use "setParent" to transfer the
This is trickier. In particular, if you have code like this:
...
a = ...
---cut---
X = add a, 1
---cut---
C = add X, b
Then you need to arrange to pass the value of "a" into the new function and get the value of "x" back out of it. This is exactly the sort of details that ExtractCodeRegion takes care of for you. 
By the way, I have to say that I am *very* impressed with LLVM's documentation and organization. I have been able to get an amazing amount of functionality working *very* quickly. This is a really useful tool for manipulating code.
Great! 
-Chris
Have you looked at ExtractCodeRegion() and its siblings in include/llvm/Transform/Utils/FunctionUtils.h? It seems to be what you need.
This is brilliant. In fact, it is nearly *exactly* what I need.
An Interval (and an Interval partition) is a concept with a lot of compiler theory behind it. If you google for control flow analysis and interval you should find info on it. That said, I don't think it is what you're looking for...
You are right: it's not. As you may be able to tell, compilers are *not* my research area, although I do think they are very interesting.
The order of BasicBlock's in an LLVM Function match the order that they are printed. If you iterate over them, you are guaranteed to get this order. Note that if you're looking at the output of the C front-end, they could be permuted arbitrarily w.r.t. the source order, due to various optimizations. That said, if you have begin/end markers in the same basic block, they should stay within the same basic block most of the time (enough for it to not matter).
Yuck. This is what I was afraid of. I do *not* necessarily have begin/end markers in the same basic block. In fact, it would be quite unusual that they would be in the same basic block. There are likely to be intervening conditionals or loops. From this comment, it seems to me that I need to collect my vector of BasicBlocks in a "smarter" fashion by actually tracing the control flow from the "begin" marker through to the "end" marker. Am I right about this?
This is trickier. In particular, if you have code like this:
[snip]
Then you need to arrange to pass the value of "a" into the new function and get the value of "x" back out of it. This is exactly the sort of details that ExtractCodeRegion takes care of for you. :)``
Right, this is what I was going to hack on myself. I'm glad I don't have to! Like I said, I've been able to implement a surprising amount of functionality very easily using LLVM. It's just an issue of figuring out what APIs to use.
Browsing through the implementation, I see that it will not work for any code that calls alloca or vastart. It seems to me this is not a fundamental limitation, but is a limitation of the current implementation. Am I right about that? I doubt it will be an issue for me anyway.
Evan Jones
The order of BasicBlock's in an LLVM Function match the order that they are printed. If you iterate over them, you are guaranteed to get this order. Note that if you're looking at the output of the C front-end, they could be permuted arbitrarily w.r.t. the source order, due to various optimizations. That said, if you have begin/end markers in the same basic block, they should stay within the same basic block most of the time (enough for it to not matter).
Yuck. This is what I was afraid of. I do *not* necessarily have begin/end markers in the same basic block. In fact, it would be quite unusual that they would be in the same basic block. There are likely to be intervening conditionals or loops. From this comment, it seems to me that I need to collect my vector of BasicBlocks in a "smarter" fashion by actually tracing the control flow from the "begin" marker through to the "end" marker. Am I right about this?
No, I think that should work. As I mentioned, the code extractor only works for single entry regions with (potentially) multiple exits. If you have single entry/single exit, it should work fine.
This is trickier. In particular, if you have code like this:
[snip]
Then you need to arrange to pass the value of "a" into the new function and get the value of "x" back out of it. This is exactly the sort of details that ExtractCodeRegion takes care of for you. :)``
Right, this is what I was going to hack on myself. I'm glad I don't have to! Like I said, I've been able to implement a surprising amount of functionality very easily using LLVM. It's just an issue of figuring out what APIs to use.

Browsing through the implementation, I see that it will not work for any code that calls alloca or vastart. It seems to me this is not a fundamental limitation, but is a limitation of the current implementation. Am I right about that? I doubt it will be an issue for me anyway.
Yup.
-Chris
It sounds that way. Try the df_ext_iterator to trace control flow paths forward from the begin() in linear time. lib/Transforms/Scalar/ADCE.cpp seems to have an example of using it.
--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/