Patch: MSIL backend global pointers initialization

Hi,
Here's a patch that fixes initialization of global pointers (also function pointers).

I'm working on that backend now, so probably I'll send some more patches soon. I'd be grateful if you could give me some suggestions how to add some test for that backend to the test-suite. On Linux the output code could be run on Mono and compared with outputs for other backends but I'm not sure if it's preferable to involve third party apps.

Thanks for help
Artur

globals_init_fix.patch (2.94 KB)

Hi, Artur

I'm working on that backend now, so probably I'll send some more patches
soon. I'd be grateful if you could give me some suggestions how to add
some test for that backend to the test-suite. On Linux the output code
could be run on Mono and compared with outputs for other backends but
I'm not sure if it's preferable to involve third party apps.

Tests usually should be self-contained. So, you can, for example, just
grep the output for "critical" points, which should always be there.

Please refine you patches to be consistent with LLVM's coding style and
I'll apply them.

Thanks!

Anton Korobeynikov wrote:

Hi, Artur

I'm working on that backend now, so probably I'll send some more patches
soon. I'd be grateful if you could give me some suggestions how to add
some test for that backend to the test-suite. On Linux the output code
could be run on Mono and compared with outputs for other backends but
I'm not sure if it's preferable to involve third party apps.
    

Tests usually should be self-contained. So, you can, for example, just
grep the output for "critical" points, which should always be there.
  

Ok, thanks.

Please refine you patches to be consistent with LLVM's coding style and
I'll apply them.
  

I'm sorry, attached again.

So again, one of them makes the global pointers initialization work.
The second one allows executing code with vararg pinvoke functions under Mono. It generates separate function declaration for each call signature.

Artur

varargs.patch (5.18 KB)

globals_init_fix.patch (2.94 KB)

Hi, Artur

So again, one of them makes the global pointers initialization work.
The second one allows executing code with vararg pinvoke functions under
Mono. It generates separate function declaration for each call signature.

Is this mono-only feature? Or pretty "universal" one?

Hi Anton

So again, one of them makes the global pointers initialization work.
The second one allows executing code with vararg pinvoke functions under
Mono. It generates separate function declaration for each call signature.
    

Is this mono-only feature? Or pretty "universal" one?
  

In fact this is needed only for Mono, but with that change the code works with any CLR implementation.

Artur

Hi, Artur

+ if (isa<ConstantExpr>(I->constant)){
+ const ConstantExpr *CE =
dyn_cast<ConstantExpr>(I->constant);
+ printConstantExpr(CE);
+ ty = CE->getType();
+ } else {
+ const Function * F = dyn_cast<Function>(I->constant);
+ printValueLoad(F);
+ ty = F->getType();
+ }

You don't need to check stuff twice here.
1. Use dyn_cast<ConstantExpr> instead of isa + dyn_cast
2. In the "else" path you need to be sure, that I->constant is indeed
Function. Use cast, not dyn_cast.

PS: Also: "if (foo){" => "if (foo) {"

Hi, Artur

Minor comments:

+// Comparision for std::lower_bound used in
MSILWriter::printExternals()
+static bool CompareInstructions(Instruction *A,Instruction *B)
+{

Put brace on the same line as function def.

+ if ( !F->use_empty() ) // Print only if used
+ {

Likewise. Plus use "if (foo)" instead of "if ( foo )". All code around
uses the first variant.

+ if (F->isVarArg()){

Add space before "{"

+ std::vector<Instruction*> ivec;
+ Value::use_const_iterator e = I->use_end();
+ for(Value::use_const_iterator i = I->use_begin(); i!=e;
++i){

Likewise. Also, put space before "("

+ Instruction *instr =
+ const_cast<Instruction*>(dynamic_cast<const
Instruction*>(*i));

Sounds hacky. Why do you need to cast away const? Maybe you just need
use_iterator instead?

+ if(!instr) continue;

See above.

+ //We want each signature just once
+ std::vector<Instruction*>::iterator ins =
+ std::lower_bound(ivec.begin(), ivec.end(), instr,
+ CompareInstructions);
+ if(ins!=ivec.end() && instr->isSameOperationAs(*ins))
continue;

Likewise.

Hi Anton,

Anton Korobeynikov wrote:

Minor comments:
  

Thanks for your comments and your patience, I'll now check the style four times before I send anything :wink:

+ Instruction *instr =
+ const_cast<Instruction*>(dynamic_cast<const
Instruction*>(*i));
    

Sounds hacky. Why do you need to cast away const? Maybe you just need
use_iterator instead?
  

Ok, I've fixed this.

I didn't attached the new patch because I found a bug in my code - in special cases I get doubled declarations.

If I have two instructions like those:

%1 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([16 x i8]* @.str, i32 0, i32 0), i32 %0) nounwind
%10 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([11 x i8]* @.str2, i32 0, i32 0), i32 5) nounwind

Instruction::isSameOperationAs() returns false for those two. Is it a bug or I misunderstood something?
In any case I wrote my custom isSame() operator for this but I'll appreciate any hints.

Thanks!
Artur

Hi Anton,

Anton Korobeynikov wrote:

+ if (isa<ConstantExpr>(I->constant)){
+ const ConstantExpr *CE =
dyn_cast<ConstantExpr>(I->constant);
+ printConstantExpr(CE);
+ ty = CE->getType();
+ } else {
+ const Function * F = dyn_cast<Function>(I->constant);
+ printValueLoad(F);
+ ty = F->getType();
+ }
    

You don't need to check stuff twice here.
1. Use dyn_cast<ConstantExpr> instead of isa + dyn_cast
2. In the "else" path you need to be sure, that I->constant is indeed
Function. Use cast, not dyn_cast.

PS: Also: "if (foo){" => "if (foo) {"
  

OK, I've fixed that.

Thanks,
Artur

globals.patch (2.66 KB)

Hi, Artur

%1 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([16 x
i8]* @.str, i32 0, i32 0), i32 %0) nounwind
%10 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([11 x i8]*
@.str2, i32 0, i32 0), i32 5) nounwind

Instruction::isSameOperationAs() returns false for those two. Is it a
bug or I misunderstood something?

These are two different instructions as you might see, thus - no bug :slight_smile:

In any case I wrote my custom isSame() operator for this but I'll
appreciate any hints.

Why do you need this? Won't be enough just to compare function types?

Hello Anton

%1 = tail call i32 (i8*, …)* @printf(i8* noalias getelementptr ([16 x
i8]* @.str, i32 0, i32 0), i32 %0) nounwind
%10 = call i32 (i8*, …)* @printf(i8* noalias getelementptr ([11 x i8]*
@.str2, i32 0, i32 0), i32 5) nounwind

Instruction::isSameOperationAs() returns false for those two. Is it a
bug or I misunderstood something?
These are two different instructions as you might see, thus - no bug :slight_smile:

OK, I just need the same signature for both of those instructions.

In any case I wrote my custom isSame() operator for this but I’ll
appreciate any hints.
Why do you need this? Won’t be enough just to compare function types?

I do something like this:

static bool areSame(Instruction *A, Instruction *B) {
if (A->getOpcode() != B->getOpcode()) return false;
if (A->getType()->getTypeID() != B->getType()->getTypeID()) return false;
unsigned int a = A->getNumOperands();
if (a != B->getNumOperands()) return false;
for (unsigned int i = 0; i < a; ++i)
if (A->getOperand(i)->getType()->getTypeID()
!= B->getOperand(i)->getType()->getTypeID()) return false;
return true;
}

I’m not sure if it’s too much.
Thanks!

Artur

P.S. Sorry for that [Junk…] in few mails. It’s our “great” antispammer, I will switch to gmail.

Artur,

OK, I just need the same signature for both of those instructions.

Both are callinsts of same function, isn't that enough? Since it's a
variadic function there is also a bitcast to proper type. So, looking
for type of callee (not result, but function type!) you'll obtain the
real "signature" of callee and if you'll strip all pointer cast you'll
obtain the "declaration" (=variadic) type of the callee. This should
be enough for almost all purposes. You should not try to invent some
custom "comparison schemes", almost surely in such situation you're
reinventing the wheel :slight_smile:

  if (A->getType()->getTypeID() != B->getType()->getTypeID()) return false;

TypeID is internal thing for type. All types are unique inside LLVM,
so you should just compare Type*'s.

    if (A->getOperand(i)->getType()->getTypeID()
      != B->getOperand(i)->getType()->getTypeID()) return false;

Same here.

Hi Anton,

OK, I just need the same signature for both of those instructions.

Both are callinsts of same function, isn’t that enough? Since it’s a
variadic function there is also a bitcast to proper type. So, looking
for type of callee (not result, but function type!) you’ll obtain the
real “signature” of callee and if you’ll strip all pointer cast you’ll
obtain the “declaration” (=variadic) type of the callee. This should
be enough for almost all purposes. You should not try to invent some
custom “comparison schemes”, almost surely in such situation you’re
reinventing the wheel :slight_smile:

So I do this:

if (CallInst *cinst = cast(instr)) {
const FunctionType *FTy = cinst->getCalledFunction()->getFunctionType();
}

What I get “i32 (i8*, …)”
How can I get something like “i32 (i8*, i32)”?

Thanks for your help.
Regards,
Artur

Hello,

So, looking for type of callee (not result, but function type!) you’ll obtain the
real “signature” of callee and if you’ll strip all pointer cast you’ll
obtain the “declaration” (=variadic) type of the callee.

Maybe I misunderstood something but I just get the variadic declaration not the real “signature”, like this:

const CallInst *I = dyn_cast(A);

my CallInst looks like:
%1 = tail call i32 (i8*, …)* @printf(i8* noalias %0, i32 123) nounwind

const Value Callee = I->getCalledValue();
my Callee is: declare i32 @printf(i8
nocapture, …) nounwind

const PointerType PTy = cast(Callee->getType());
I get: i32 (i8
, …)*

const FunctionType FTy = cast(PTy->getElementType());
so I get: i32 (i8
, …)

The interesting for me part of the CallInst is printf(i8* noalias %0, i32 123).
I was diging in doxygen documentation but I really can’t see the easy way to compare those instructions and again finish with reinvented (but working) wheel ;).
I would really appreciate help.
Thanks!

Artur

Hi, Artur

The interesting for me part of the CallInst is printf(i8* noalias %0, i32
123).
I was diging in doxygen documentation but I really can't see the easy way to
compare those instructions and again finish with reinvented (but working)
wheel ;).

Ah, sorry. I missed that you're doing variadic calls, not casting
variadic function to definite ones. I think you can construct CallSite
and then iterate over the arguments grabbing their types and construct
function type after that.

Hi Anton

The interesting for me part of the CallInst is printf(i8* noalias %0, i32
123).
I was diging in doxygen documentation but I really can’t see the easy way to
compare those instructions and again finish with reinvented (but working)
wheel ;).

Ah, sorry. I missed that you’re doing variadic calls, not casting
variadic function to definite ones. I think you can construct CallSite
and then iterate over the arguments grabbing their types and construct
function type after that.

So what do you think about that:

// CallSites have equal signatures
bool MSILWriter::cmpCallSite(CallSite A, CallSite B) {
return (getCallSiteFType(A)==getCallSiteFType(B) &&
A.getAttributes()==B.getAttributes());
}

// Comparision for std::lower_bound used in MSILWriter::printExternals()
bool MSILWriter::compareCallSite(CallSite A, CallSite B) {
return getCallSiteFType(A)<getCallSiteFType(B);
}

// Constructs function type from given CallSite
FunctionType* MSILWriter::getCallSiteFType(CallSite CS) {
std::vector<const Type *> params;
CallSite::arg_iterator AI=CS.arg_begin(), AE = CS.arg_end();
for ( ; AI!=AE; ++AI )
params.push_back((*AI)->getType());
return FunctionType::get(CS.getCalledFunction()->getReturnType(),
params,CS.getCalledFunction()->isVarArg());
}

Thanks!
Regards,
Artur

Hi, Artur

// CallSites have equal signatures
bool MSILWriter::cmpCallSite(CallSite A, CallSite B) {
return (getCallSiteFType(A)==getCallSiteFType(B) &&
A.getAttributes()==B.getAttributes());
}

As it is impossible to honour argument attributes in MSIL I don't see
why you should compare attributes. You seems to have the same MSIL
call signature for calls with different param attrs.

// Comparision for std::lower_bound used in MSILWriter::printExternals()
bool MSILWriter::compareCallSite(CallSite A, CallSite B) {
return getCallSiteFType(A)<getCallSiteFType(B);
}

Hrm... You're building type for each comparison, which seems to be
quite inefficient. Why don't iterate over all variadic call sites of
the function and build a single map CS => FunctionType? Also, here it
seems you're trying to rely on (instable) numeric representation of
pointers which can be quite unstable and in general bad for tests,
etc.

// Constructs function type from given CallSite

At least - from arguments of the call :slight_smile:

FunctionType* MSILWriter::getCallSiteFType(CallSite CS) {
std::vector<const Type *> params;
CallSite::arg_iterator AI=CS.arg_begin(), AE = CS.arg_end();

Why don't shorten life of AI, AE and not define them in the for() loop header?

Hello Anton

// CallSites have equal signatures
bool MSILWriter::cmpCallSite(CallSite A, CallSite B) {
return (getCallSiteFType(A)==getCallSiteFType(B) &&
A.getAttributes()==B.getAttributes());
}

As it is impossible to honour argument attributes in MSIL I don’t see
why you should compare attributes. You seems to have the same MSIL
call signature for calls with different param attrs.

I need to check attributes for signed or unsigned ints.
I guess it would be better to check just the sign extension.

// Comparision for std::lower_bound used in MSILWriter::printExternals()
bool MSILWriter::compareCallSite(CallSite A, CallSite B) {
return getCallSiteFType(A)<getCallSiteFType(B);
}

Hrm… You’re building type for each comparison, which seems to be
quite inefficient. Why don’t iterate over all variadic call sites of
the function and build a single map CS => FunctionType?

Good point, thanks.

Also, here it
seems you’re trying to rely on (instable) numeric representation of
pointers which can be quite unstable and in general bad for tests,
etc.

Could you suggest the best way to compare FunctionTypes?

// Constructs function type from given CallSite

At least - from arguments of the call :slight_smile:

FunctionType* MSILWriter::getCallSiteFType(CallSite CS) {
std::vector<const Type *> params;
CallSite::arg_iterator AI=CS.arg_begin(), AE = CS.arg_end();

Why don’t shorten life of AI, AE and not define them in the for() loop header?

for ( ; AI!=AE; ++AI )
params.push_back((*AI)->getType());

Thanks,
Artur