Need Help With Verifier

While it is great that LLVM has an IR Verifier, its a little troublesome
to use because it separates the point of detection from the source of
the problem. That is, the verifier gets run on a module or function
after its been built. By that point, the compiler's state has moved past
the point at which the error was placed into the module or function.
Trying to track down the source of the problem from the scant
information produced by the verifier is turning out to be tedious for me
(i.e. spent several hours at it, no results).

Now, here's the specific problem. When the verifier runs, I get:

Stored value is not of right type for indices!
void <badref>
type[1024 x int] [1024 x int]

So, I've tracked this down to being one of three StoreInst instructions.
In particular, its one of the two StoreInst instructions I generate that
write into a global array of int (Stacker's global stack).

Here's what I don't understand:

The first operand of the StoreInst is being reported as "void <badref>".
First, I don't really know what this means but I assume its just an
invalid references of some sort. The only thing I store into the array
is an "IntTy". I use a GEP instruction to index into the stack and then
generate the store instruction like this:

ConstantInt* val = ConstantInt::get( Type::IntTy, value );
bb->getInstList().push_back( new StoreInst( val, gep ) );

How could this "ConstantInt" value be reported as "void <badref>" ?

The next thing that troubles me is that the verifier output for the
second operand indicates a two dimensional array. This just isn't right,
its one dimensional. Here's how I create the global array for the stack
(used to form the "gep" in the above code):

ArrayType* stack_type = ArrayType::get( Type::IntTy, stack_size );
                                                                                                                                                             
TheStack = new GlobalVariable(
     /*type=*/ stack_type,
     /*isConstant=*/ false,
     /*Linkage=*/ GlobalValue::AppendingLinkage,
     /*initializer=*/0,
     /*name=*/ "_stack_",
     /*parent=*/ TheModule
);

Shouldn't that create just a single dimensioned array? Note that
"stack_size" is an integer with the value 1024.

Finally, I have some confusion about how SSA works with LLVM. If I
understand the documentation correctly, there is no distinction between
an instruction that creates a value and the value itself. This
understanding comes from the Programmer's Manual:

        "One important aspect of LLVM is that there is no distinction
        between an SSA variable and the operation that produces it.
        Because of this, any reference to the value produced by an
        instruction (or the value available as an incoming argument, for
        example) is represented as a direct pointer to the class that
        represents this value. Although this may take some getting used
        to, it simplifies the representation and makes it easier to
        manipulate."

So, for example, if I want to add two numbers stored in memory, I add
the two load instructions used to fetch the values. Right? Please say
yes or I have to change my whole conception of how this works! :slight_smile:

Okay, so one of the built-in operations in Stacker is to push an integer
onto the stack. This is done by just mentioning the integer value as in
Forth. For example,

: PushOne 1 ;

is Stackerese for defining a function that pushes the value 1 onto the
stack. To implement "PushOne", I have to conceptually do this:

     1. Load the integer stack index (LoadInst)
     2. Increment the index (Add BinaryOperator)
     3. Get the address of the stack element at that index (GEP)
     4. Store the value 1 at that address (StoreInst).

Item 4 represents the StoreInst that the verifier is complaining about.
So, below is how I did the above sequence. I thought it was pretty
straight forward, but I've obviously done something wrong!

In the following, note that "TheIndex" is a GlobalVariable of type
"LongTy", "TheStack" is the [1024xint] array that I mentioned above. The
BasicBlock variable bb is passed in from a higher level in the
compiler's call sequence.

void
StackerCompiler::incr_stack_index( BasicBlock* bb )
{
    ConstantInt* ival = ConstantInt::get(Type::LongTy, 1);
    
    LoadInst* loadop = new LoadInst( TheIndex );
    bb->getInstList().push_back( loadop );

    BinaryOperator* addop = BinaryOperator::create(
        Instruction::Add, loadop, ival);
    bb->getInstList().push_back( addop );

    StoreInst* storeop = new StoreInst( addop,
        TheIndex );
    bb->getInstList().push_back( storeop );
}
                                                                                                                                                             
GetElementPtrInst*
StackerCompiler::get_stack_pointer( BasicBlock* bb )
{
    LoadInst* loadop = new LoadInst( TheIndex );
    bb->getInstList().push_back( loadop );

    std::vector<Value*> indexVec; // Index vector
    indexVec.push_back(loadop); // Insert single index

    GetElementPtrInst* gep = new GetElementPtrInst(
        TheStack, indexVec );
    bb->getInstList().push_back( gep ); // Put GEP in Block

    return gep;
}

void
StackerCompiler::push_integer(BasicBlock* bb, int32_t value )
{
    incr_stack_index(bb);

    GetElementPtrInst* gep = get_stack_pointer( bb );

    ConstantInt* val = ConstantInt::get( Type::IntTy, value );

    bb->getInstList().push_back( new StoreInst( val, gep ) );
}
                                                                                                                                                             Thanks in advance for any light you can shed on this. I'm stumped.

Reid.

While it is great that LLVM has an IR Verifier, its a little troublesome
to use because it separates the point of detection from the source of
the problem. That is, the verifier gets run on a module or function
after its been built. By that point, the compiler's state has moved past

Well, you can run the verifier any time you want: there is a verifier
pass, or you can manually call verifyModule/verifyFunction. Aside from
that, I'm not sure exactly what we could add...

Now, here's the specific problem. When the verifier runs, I get:

Stored value is not of right type for indices!
void <badref>
type[1024 x int] [1024 x int]

WOW. That is an absolutely horrible error message. I've fixed both the
error message itself (it should now read "Stored value type does not
match pointer operand type!"), and the things being printed. The first
one is supposed to print the Store instruction itself (allowing you to
actually find the thing), but didn't due to an unrelated change in the
verifier recently. :frowning:

Try it now.

So, I've tracked this down to being one of three StoreInst instructions.
In particular, its one of the two StoreInst instructions I generate that
write into a global array of int (Stacker's global stack).

Ok.

Here's what I don't understand:

The first operand of the StoreInst is being reported as "void <badref>".
First, I don't really know what this means but I assume its just an
invalid references of some sort.

Actually, that is the store instruction itself. The verifier was
mistakenly passing the store into 'WriteAsOperand', which, since it
returns void, isn't a valid reference. Now it should print the actual
instruction itself.

The only thing I store into the array
is an "IntTy". I use a GEP instruction to index into the stack and then
generate the store instruction like this:

> ConstantInt* val = ConstantInt::get( Type::IntTy, value );
> bb->getInstList().push_back( new StoreInst( val, gep ) );

How could this "ConstantInt" value be reported as "void <badref>" ?

Again, this is should be fixed. Sorry for the confusion. :frowning:

The next thing that troubles me is that the verifier output for the
second operand indicates a two dimensional array.

Actually, another failure, it's printing out the type twice. In LLVM, a
two dimensional array would look like '[ 40 x [10 x int]]' or something.

Shouldn't that create just a single dimensioned array? Note that
"stack_size" is an integer with the value 1024.

Yes, it should.

Finally, I have some confusion about how SSA works with LLVM. If I
understand the documentation correctly, there is no distinction between
an instruction that creates a value and the value itself.

Correct.

So, for example, if I want to add two numbers stored in memory, I add
the two load instructions used to fetch the values. Right? Please say
yes or I have to change my whole conception of how this works! :slight_smile:

yes! :slight_smile:

Okay, so one of the built-in operations in Stacker is to push an integer
onto the stack. This is done by just mentioning the integer value as in
Forth. For example,

: PushOne 1 ;

is Stackerese for defining a function that pushes the value 1 onto the
stack. To implement "PushOne", I have to conceptually do this:

     1. Load the integer stack index (LoadInst)
     2. Increment the index (Add BinaryOperator)
     3. Get the address of the stack element at that index (GEP)
     4. Store the value 1 at that address (StoreInst).

Yup, that sounds good.

    std::vector<Value*> indexVec; // Index vector
    indexVec.push_back(loadop); // Insert single index
    GetElementPtrInst* gep = new GetElementPtrInst(
        TheStack, indexVec );
    bb->getInstList().push_back( gep ); // Put GEP in Block

This is the problem. The deal here is that the 'getelementptr'
instruction actually takes one extra argument than you are giving it (this
is, by far, the most confusing aspect of the getelementptr instruction).
Lets say your stack is of type:

%Stack = ... <[100 x int] *>

If you index into this with a single "long" index, you will get this:

%Ptr = getelementptr [100 x int]* %Stack, long %X ; <[100 x int] *>

Where what you really want is this:

%Ptr = getelementptr [100 x int]* %Stack, long 0, long %X ; <int*>

The deal with the initial zero is that it is specifying how to index
_through the pointer_, which is often forgotten about. This allows LLVM
to translate this C code:

int *P = ...
P = P + 1;

into this LLVM code:

%P.1 = ...
%P.2 = getelementptr int* %P.1, long 1

This probably won't make sense the first time you see it. Take a look at:
http://llvm.cs.uiuc.edu/docs/LangRef.html#i_getelementptr

And remember that we are turning the '->' syntactic sugar into it's
equivalent [0] form.

If you have any other questions about this, please just let me know (and
try out the fixed verifier too :slight_smile:

-Chris

> While it is great that LLVM has an IR Verifier, its a little troublesome
> to use because it separates the point of detection from the source of
> the problem. That is, the verifier gets run on a module or function
> after its been built. By that point, the compiler's state has moved past

Well, you can run the verifier any time you want: there is a verifier
pass, or you can manually call verifyModule/verifyFunction. Aside from
that, I'm not sure exactly what we could add...

I was thinking about a little more immediate verification, like when an
instruction gets added to a basic block or a basic block gets added to a
function. I understand that this could be tricky because of unresolved
references, etc. In any event, moving the detection of the problem
closer to the source of the problem would be a real win. How, is the
question for which I don't have many ideas. *shrug*

> Now, here's the specific problem. When the verifier runs, I get:
>
> Stored value is not of right type for indices!
> void <badref>
> type[1024 x int] [1024 x int]

WOW. That is an absolutely horrible error message. I've fixed both the
error message itself (it should now read "Stored value type does not
match pointer operand type!"), and the things being printed. The first
one is supposed to print the Store instruction itself (allowing you to
actually find the thing), but didn't due to an unrelated change in the
verifier recently. :frowning:

Try it now.

Will do! Thanks!

> So, for example, if I want to add two numbers stored in memory, I add
> the two load instructions used to fetch the values. Right? Please say
> yes or I have to change my whole conception of how this works! :slight_smile:

yes! :slight_smile:

Phew!

> std::vector<Value*> indexVec; // Index vector
> indexVec.push_back(loadop); // Insert single index
> GetElementPtrInst* gep = new GetElementPtrInst(
> TheStack, indexVec );
> bb->getInstList().push_back( gep ); // Put GEP in Block

This is the problem. The deal here is that the 'getelementptr'
instruction actually takes one extra argument than you are giving it (this
is, by far, the most confusing aspect of the getelementptr instruction).
Lets say your stack is of type:

%Stack = ... <[100 x int] *>

If you index into this with a single "long" index, you will get this:

%Ptr = getelementptr [100 x int]* %Stack, long %X ; <[100 x int] *>

Where what you really want is this:

%Ptr = getelementptr [100 x int]* %Stack, long 0, long %X ; <int*>

The deal with the initial zero is that it is specifying how to index
_through the pointer_, which is often forgotten about. This allows LLVM
to translate this C code:

int *P = ...
P = P + 1;

into this LLVM code:

%P.1 = ...
%P.2 = getelementptr int* %P.1, long 1

Aha! *light blub just went on*

This probably won't make sense the first time you see it. Take a look at:
http://llvm.cs.uiuc.edu/docs/LangRef.html#i_getelementptr

Hey, I read that three times! I'm going to update it to more
specifically identify this "you gotta dereference the pointer first"
characteristic.

And remember that we are turning the '->' syntactic sugar into it's
equivalent [0] form.

Right .. good way to think about it.

If you have any other questions about this, please just let me know (and
try out the fixed verifier too :slight_smile:

-Chris

You'll be the first to know :slight_smile: .. and, I'm off to try the new verifier.

Thanks!