Pointers in Load and Store

When I compile C programs into LLVM IR, the compiler produces load
instructions in two different flavours.

(1) %8 = load i8** %bp, align 8

(2) %1 = load i8* getelementptr inbounds ([4 x i8]* @.str, i64 0,
i64 0), align 1

I know that %bp in the first case and the entire "getelementptr inbounds
([4 x i8]* @.str, i64 0, i64 0)" in the second case can be obtained by
dumping I.getOperand(0).

However, I want to find out which of the two forms of load has been
produced because, in the second case, I want to insert checks for array
bounds.

Given an Instruction object I with I.getOpcode() == 29, how can I tell
whether I am dealing with type (1) or type (2) above?

Thanks.

Surinder Kumar Jain

The second load instruction is not really a "special" load instruction. Rather, its pointer argument is an LLVM constant expression (class llvm::ConstantExpr). The getelementptr (GEP), which would normally be a GEP instruction, is instead a constant expression that will be converted into a constant numeric value at code generation time.

So, what you need to do is examine the LoadInst's pointer operand, see whether it is a ConstantExpr, and then see whether the ConstantExpr's opcode is the GEP opcode.
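
The check described above can be sketched with a toy class hierarchy. This is a simplified stand-in and not the LLVM API: the types `Value`, `Constant`, `ConstantExpr` and `Instruction`, the `Opcode` enum, and the function `classifyLoadPointer` are hypothetical names made up for illustration, and C++ `dynamic_cast` stands in for LLVM's `dyn_cast`. The point it illustrates is that constants and instructions sit on separate branches below `Value`, so a pointer operand that is a constant expression is never also an instruction.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for the LLVM class hierarchy (hypothetical, simplified).
enum class Opcode { Load, GetElementPtr, Add };

struct Value {
  virtual ~Value() = default;
};

// Constants and instructions are separate branches below Value.
struct Constant : Value {};

struct ConstantExpr : Constant {
  explicit ConstantExpr(Opcode op) : opcode(op) {}
  Opcode opcode;
};

struct Instruction : Value {
  explicit Instruction(Opcode op, const Value *op0 = nullptr)
      : opcode(op), operand0(op0) {}
  Opcode opcode;
  const Value *operand0; // pointer operand when this is a load
};

// Classify the pointer operand of a load, mirroring the two forms
// shown in the question.
std::string classifyLoadPointer(const Instruction &load) {
  assert(load.opcode == Opcode::Load && "expected a load");
  const Value *ptr = load.operand0;
  if (const auto *ce = dynamic_cast<const ConstantExpr *>(ptr)) {
    if (ce->opcode == Opcode::GetElementPtr)
      return "constant-expression GEP"; // form (2)
    return "other constant expression";
  }
  if (const auto *inst = dynamic_cast<const Instruction *>(ptr)) {
    if (inst->opcode == Opcode::GetElementPtr)
      return "GEP instruction";
    return "other instruction";
  }
  return "plain value"; // form (1): e.g. an argument or a global
}
```

In this model, casting a `ConstantExpr` to `Instruction` always yields a null pointer, which is why a test that first requires a constant and then requires an instruction can never succeed.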

However, there's an easier way to handle this. SAFECode (http://safecode.cs.illinois.edu) has an LLVM pass which converts constant expression GEPs into GEP instructions. If you run it on your code first, you'll get the following instruction sequence:

%tmp = getelementptr inbounds [4 x i8]* @.str, i64 0, i64 0
%1 = load i8* %tmp, align 1

You then merely search for GEP instructions and put run-time checks on those (which you have to do anyway if you're adding array bounds checking). The only ConstantExpr GEPs that aren't converted, I think, are those in global variable initializers.

Now, regarding the insertion of array bounds checks, SAFECode does that, too (it is a memory safety compiler for C code). It also provides a simple static array bounds checker and some array bounds check optimization passes.

I can direct you to the relevant portions of the source code if you're interested.

-- John T.

John,

I have looked at SAFECode and thought the following should work:

       if (isa<Constant>(I.getOperand(0))) {
         Out << "*** operand 0 is a constant ******";
         if (Instruction *operandI = dyn_cast<Instruction>(I.getOperand(0))) {
           Out << "********operand is an instruction ****";
           if (GetElementPtrInst *gepI = dyn_cast<GetElementPtrInst>(operandI)) {
             Out << "*** operand is a gep instruction ********";
             if (const ArrayType *ar = dyn_cast<ArrayType>(
                     gepI->getPointerOperandType()->getElementType()))
               hi = ar->getNumElements();
           }
         }
       }

But this does not recognize that operand(0) of instruction I is even
an instruction, let alone a get element pointer instruction. I have
taken the code from line 632 and line 757 of
safecode/lib/ArrayBoundsChecks/ArrayBoundCheck.cpp

I must be doing something wrong, what is it?

Surinder Kumar Jain

PS: Yes, I will be using SAFECode, but I still want to know why the
above code does not work. I am posting a separate mail with the title
"OPT optimizations".

The problem is simple: you're looking at the wrong source file. :-)

More specifically, you're looking at the very antiquated static array bounds checking pass (it hasn't compiled in several years now). The file you want to look at is in lib/InsertPoolChecks/insert.cpp. This file contains the InsertPoolChecks pass which, in mainline SAFECode, is responsible for inserting array bounds checks and indirect function call checks. In particular, you want to look at the addGetElementPtrChecks() method.

As for Constant Expression GEPs, you want to look at the BreakConstantGEP pass, located in lib/ArrayBoundsChecks/BreakConstantGEPs.cpp.

The BreakConstantGEP pass is run first. All it does is find instructions that use constant expression GEPs and replace each Constant Expression GEP with a GEP instruction. All of the other SAFECode passes that worry about array bounds checks (i.e., the static array bounds checking passes in lib/ArrayBoundsChecks and the run-time instrumentation pass in lib/InsertPoolChecks/insert.cpp) only look for GEP instructions.
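
The rewrite that such a pass performs can be sketched on a toy instruction list. This is an illustration only and not SAFECode's implementation or the LLVM API: the `Operand` and `Inst` structs, the `breakConstantGEPs` function and the `%tmpN` naming scheme are all hypothetical. It shows the core idea: each constant-expression GEP operand is replaced by a new GEP instruction inserted immediately before the instruction that uses it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy IR (hypothetical, not the LLVM API): an operand is either a named
// value such as "%bp" or a constant-expression GEP over a global.
struct Operand {
  bool isConstGEP;
  std::string name; // register name, or the global the GEP indexes into
};

struct Inst {
  std::string opcode; // "load", "getelementptr", ...
  std::string result; // register defined by this instruction
  Operand pointer;    // pointer operand
};

// Replace every constant-expression GEP operand with a GEP instruction
// inserted immediately before its user. Returns the number of rewrites.
std::size_t breakConstantGEPs(std::vector<Inst> &block) {
  std::vector<Inst> out;
  std::size_t rewrites = 0;
  for (Inst inst : block) {
    if (inst.pointer.isConstGEP) {
      std::string tmp = "%tmp" + std::to_string(rewrites++);
      // The new instruction computes the same address as the expression.
      out.push_back({"getelementptr", tmp, {false, inst.pointer.name}});
      inst.pointer = {false, tmp};
    }
    out.push_back(inst);
  }
  // Exactly one instruction is added per rewritten operand.
  assert(out.size() == block.size() + rewrites);
  block = out;
  return rewrites;
}
```

After this transformation, a later pass only has to look for `getelementptr` instructions, since no operand in the block is a constant-expression GEP any more.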

-- John T.

John,

I have looked at the real code (instead of the obsolete one) and it
appears to be easy to find if an operand is a getelementptr
instruction.

  if (ConstantExpr *CE = dyn_cast<ConstantExpr>(I.getOperand(0))) {
    Out << "*** operand 0 is a constant expression ***";
    if (CE->getOpcode() == Instruction::GetElementPtr) {
      Out << "*** operand 0 is a GEP constant expression ***";
      // Operand 0 of the GEP expression is the base pointer.
      const PointerType *PT = cast<PointerType>(CE->getOperand(0)->getType());
      if (const ArrayType *ar = dyn_cast<ArrayType>(PT->getElementType()))
        hi = ar->getNumElements();
    }
  }

Thank you for that.

I would like to use safecode programs rather than write my own code.
However, the website of safecode says that it works only with version
2.6 or 2.7 of llvm whereas I use version 2.8 of llvm.

To get around the problem, I plan to do as follows:

(1) Do not install safecode with llvm 2.8 (as it may or may not work)

(2) Create a new pass named "unGep", "Breaks Constant GEPs"

(3) The new pass derives from FunctionPass (because safecode does so;
if I had to write it, ModulePass would have been good enough.)

(4) The runOnFunction method of the unGep pass invokes
addPoolChecks(F), passing it the function F. I will modify
addGetElementPtrChecks so that it produces array bounds checks in the
way I need. (I need a check that array bounds are being violated for my
research to detect overflows.)

I will then run opt as

opt -load ../unGep.so

to produce llvm code without geps as operands.

Please advise if this will work or if there is an easier way.

Thanks.

Surinder Kumar Jain

Thank you for that.

You're welcome.

I would like to use safecode programs rather than write my own code.
However, the website of safecode says that it works only with version
2.6 or 2.7 of llvm whereas I use version 2.8 of llvm.

SAFECode already does all the things you mention below. Unless you have a pressing need to use LLVM 2.8, I recommend switching to LLVM 2.7 so that you can re-use the SAFECode passes unmodified.

If you must use LLVM 2.8 or mainline LLVM, then I have the following suggestions:

To get around the problem, I plan to do as follows :

(1) Do not install safecode with llvm 2.8 (as it may or may not work)

(2) Create a new pass named "unGep", "Breaks Constant GEPs"

The BreakConstantGEP pass is self-contained and should be trivial to update to work with LLVM 2.8. There is no need for you to go through the effort to rewrite it.

(3) The new pass derives from FunctionPass (because safecode does so;
if I had to write it, ModulePass would have been good enough.)

(4) The runOnFunction method of the unGep pass invokes
addPoolChecks(F), passing it the function F. I will modify
addGetElementPtrChecks so that it produces array bounds checks in the
way I need. (I need a check that array bounds are being violated for my
research to detect overflows.)

First, passes should do just one thing. The BreakConstantGEP pass should convert ConstantExpr GEPs into GEP instructions and not do anything else. A separate pass should insert bounds checks. Calling addPoolChecks() from the BreakConstantGEP pass is a bad idea; it prevents the BreakConstantGEP pass from being reusable.

I'm currently in the process of making SAFECode follow this philosophy. For example, for LLVM 2.6, the InsertPoolChecks pass added load/store checks, array bounds checks, and indirect function call checks. I've moved the code that inserts load/store checks into a separate pass in mainline SAFECode and intend to do the same for indirect function call checks. We have also moved various array bounds check optimizations into separate passes.

Second, the code in InsertPoolChecks that inserts checks on GEP instructions is pretty straightforward. If you take the mainline version and remove the call to isGEPSafe(), it should not have any other pass dependencies, and you should be able to easily update it to LLVM 2.8.

As for the implementation of the run-time check, the interface is pretty generic: it takes a pool handle, the source of the GEP, and the result of the GEP and does the run-time check. The only extraneous parameter is the pool handle, and your run-time check can just ignore it if it doesn't need it. You only need to specialize the code for your run-time array bounds check implementation if you require parameters other than these.
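
A run-time check with that three-argument shape might look like the sketch below. It is not SAFECode's run-time library: the `Pool` struct, `registerObject` and `boundsCheck` are hypothetical names, and the design (the pool handle holds a map from object start address to size) is an assumption made so that the example is self-contained. A check that does not need a pool could accept the handle and ignore it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical pool: a registry of live objects, start address -> size.
struct Pool {
  std::map<std::uintptr_t, std::size_t> objects;
};

void registerObject(Pool &pool, const void *start, std::size_t size) {
  assert(size > 0 && "objects must be non-empty");
  pool.objects[reinterpret_cast<std::uintptr_t>(start)] = size;
}

// Returns true when `result` (the value computed by a GEP) stays within
// the registered object that contains `source` (the GEP's base pointer).
// A result one past the end is accepted, as C permits forming it.
bool boundsCheck(const Pool *pool, const void *source, const void *result) {
  if (pool == nullptr)
    return false;
  const auto src = reinterpret_cast<std::uintptr_t>(source);
  const auto res = reinterpret_cast<std::uintptr_t>(result);
  auto it = pool->objects.upper_bound(src); // first object starting after src
  if (it == pool->objects.begin())
    return false; // no object starts at or before src
  --it;
  const std::uintptr_t start = it->first;
  const std::uintptr_t end = start + it->second; // one past the last byte
  if (src >= end)
    return false; // source is not inside a known object
  return res >= start && res <= end;
}
```

An instrumentation pass would then insert a call to such a function after each GEP instruction, passing the GEP's pointer operand as `source` and the GEP itself as `result`.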

I will then run opt as

opt -load ../unGep.so

Yes, this is how you could run the passes. We built the SAFECode tool (sc) because SAFECode uses several different libraries; creating a separate tool was easier than trying to load all the libraries into opt.

-- John T.