Extracting all BasicBlocks of a Function into new Function

Hi,

I need to find a way to extract all BasicBlocks of a Function (no clones!) into a new Function that has the exact same signature, and adding a call to the new Function in the old one. I tried out lib/Transforms/Utils/CodeExtractor::ExtractCodeRegion(...), but this one unfortunately checks first to see whether there are any allocas and/or va_starts and returns a null pointer in that case. Could this check be omitted? If not, is there another way to do the extraction?

One of the obstacles I face when trying to do the complement (creating new Function and adding call to original in it), is to find out how to pass the varargs argument of the new Function into the call to the old Function. Will passing the sbyte** passed to llvm.va_start do the trick?

Kind regards,

Bram Adams
GH-SEL, INTEC, Ghent University (Belgium)

I think this is the better way to go. If you're concerned about varargs, you need to call va_start in the varargs one, then pass the valist down.

I thin kthe easiest way to do this is to do the 'complement' as you describe, but specially handle the varargs case. Basically you need to call va_start in the original function, and pass the valist pointer down. This shouldn't be too hard.

-Chris

Hi,

I think the easiest way to do this is to do the 'complement' as you
describe, but specially handle the varargs case. Basically you need to
call va_start in the original function, and pass the valist pointer down.
This shouldn't be too hard.

OK.

I've been rethinking my use of lib/Transforms/Utils/CodeExtractor::ExtractCodeRegion(...) to obtain my first goal, but I don't think I need such a complex algorithm after all. Basically, I only need to clone a Function's signature, move the full body to the clone and call the clone from the original Function instead. Determining in- and outgoing variables is unnecessary, as the original Function's signature is the same as the clone's one. Also, both here as in the "complement" case a CallInst needs to be synthesized, but the users of the original Function don't need to change in the former case. That's why I'd like this approach more instead of the latter one (e.g. in the case of function pointers or of external libraries probably not all uses can be found).

It seems that all I'd need would be the following naïve approach:
  1) clone the original Function's signature (clone has empty body)
  2) make mapping of the original Function's Arguments to the clone's ones
  3) iterate over the original Function's BasicBlocks, modifying their parent one by one; the BasicBlocks' mutual connections remain intact
  4) move the symbol table to the clone
  5) replace all uses of the original Arguments by the clone's ones in the clone's body
  6) synthesize a CallInst to the clone

So, I wonder what problems may show up in this algorithm (steps 4, 5 and 6 seem to be crucial). The vararg-case could be a problem in step 5, although the va_start intrinsic isn't tied to the surrounding Function at first sight.

Kind regards,

Bram Adams
GH-SEL, INTEC, Ghent University (Belgium)

Hi,

I've got the algorithm working for non-vararg cases. Tomorrow, I'll have a look at the vararg case (only modify step 6?).

Kind regards,

Bram Adams
GH-SEL, INTEC, Ghent University (Belgium)

Bram Adams wrote:

There is an implicit dependency between vastart and the function it lives in. Specifically, if you have:

void foo(int X, ...) {
   P = va_start();
   use(P);
}

You can't change this to:

void foo(int X, ...) {
   bar(X);
}

void bar(int X, ...) {
   P = va_start();
   use(P);
}

You'd have to change it to something like:

void foo(int X, ...) {
   P = va_start();
   bar(X, P);
}

void bar(int X, valist P) {
   use(P);
}

-Chris

Hi,

You'd have to change it to something like:

void foo(int X, ...) {
   P = va_start();
   bar(X, P);
}

void bar(int X, valist P) {
   use(P);
}

Can the other va_...-intrinsics be used in bar as were the "P = va_start" in bar? The va_start probably is unnecessary now in bar, but where dus the va_end need to be: at the end of foo or of bar or doesn't it matter?

Thanks for the remark,

Bram Adams
GH-SEL, INTEC, Ghent University (Belgium)

All the non-vastart calls can be anywhere. va_end in particular codegens to a noop on all targets llvm currently supports, fwiw.

-Chris

Hi,

Chris Lattner wrote:

All the non-vastart calls can be anywhere. va_end in particular codegens to a noop on all targets llvm currently supports, fwiw.
  

Things go well, except for the following (pathological?) C program:

int va_double_sum(int count,...){
  int i,sum=0;
  va_list ap;

  va_start(ap,count);
  for(i=0;i<count;i++){
    sum+=va_arg(ap,int);
  }
  va_end(ap);

  va_start(ap,count);/* same vararg opened twice!!! */
  for(i=0;i<count;i++){
    sum+=va_arg(ap,int);
  }
  va_end(ap);

  return sum;
}

The problematic issue here is that the va_list is opened and used twice, which should work according to GCC. If I convert the program's LLVM bytecode to the following pseudo-bytecode:

int va_double_sum(int count,...){
  /*declare va_list*/
  /*llvm.va_start va_list*/
  /*call extracted_sum(count,va_list)*/
  /*llvm.va_end va_list*/
}

int extracted_sum(int count,/*vararg-argument*/){
  /*see original va_double_sum WITHOUT the llvm.va_start's and llvm.va_end's and WITH all uses of the original va_list replaced with the extra vararg-argument*/
}

Apparently, the problem is how to "rewind" the vararg-argument back to its first element when the second loop is reached, as it seems that the code just continues at position count and beyond, resulting in overflow.

I'm not sure whether this code snippet is valid ANSI C, and if it is, one probably shouldn't use this approach, but I'm just curious to know how (and if) this problem could be solved.

Kind regards,

Bram Adams
GH-SEL, INTEC, Ghent University (Belgium)

I don't know if that's valid but what about something like this:
int va_double_sum(int count,...){
/*declare va_list for each original va_start*/
/*llvm.va_start for each va_list*/
/*call extracted_sum(count,va_list1, va_list2, ...)*/
}

int extracted_sum(int count,/*va_list1, va_list2, ...*/){
/*see original va_double_sum WITHOUT the llvm.va_start's and WITH all
uses of the original va_list replaced with the correspinding va_list
argument
the extra vararg-argument*/
}

Why not va_copy the input valist?

-Chris

Hi,

I don't know if that's valid but what about something like this:
int va_double_sum(int count,...){
/*declare va_list for each original va_start*/
/*llvm.va_start for each va_list*/
/*call extracted_sum(count,va_list1, va_list2, ...)*/
}

That should work, I think, ...

Why not va_copy the input valist?

... and so would this. The only problem left is that both require an easy way to replace each use of the original va_list situated AFTER a (to be removed) va_end with the second va_list. This seems a task for a control flow graph based function checking the (possibly partial) order between two Instructions, i.e. to check whether a Use is situated after a CallInst to va_end.

Does such a method already exist in one of the transform or analysis passes? Or does a replaceAllUsesAfterThisInstruction() method exist?

A related question: What's the easiest way to replace all uses of a Value, but only within, say, a Function?

Kind regards,

Bram Adams
GH-SEL, INTEC, Ghent University (Belgium)

Why not va_copy the input valist?

... and so would this. The only problem left is that both require an
easy way to replace each use of the original va_list situated AFTER a
(to be removed) va_end with the second va_list. This seems a task for
a control flow graph based function checking the (possibly partial)
order between two Instructions, i.e. to check whether a Use is
situated after a CallInst to va_end.

Does such a method already exist in one of the transform or analysis
passes? Or does a replaceAllUsesAfterThisInstruction() method exist?

nope.

A related question: What's the easiest way to replace all uses of a
Value, but only within, say, a Function?

What sort of value? If the 'old' value is an instruction, argument, or basicblock, you know that the only uses can be within the current function. Thus, replaceAllUsesWith should do the trick.

-Chris