Variable declarations vs. definitions

I have yet another question that I believe also stems from deep
ignorance of the linkage types. How do you declare a global variable
without defining it? The IR ref. clearly indicates that you can do
this, but it looks like one of the many "too obvious to mention" things
that I struggle with. It's easy with functions, of course: "declare
@foo" in the header and "define @foo" in the module just like in C. But
it turns out I have avoided having to learn to do it with variables
until now, when I decided to play with invoke/unwind and see if I could
make a primitive exception mechanism to unwind the stack on my recursive
parser when an error is encountered.

In fact I could avoid it now, but the purpose is to learn as much of the
IR as possible, not use the subset of the language I already understand.

To be clear: remember I'm using Stone Knives here. :slight_smile: I have to figure
out how to make a global variable defined in one translation unit
visible in another (with a header, just like in C). I could easily make
an exception module with accessor functions, and that is likely the
better software engineering solution, but again my real goal is to learn
as much as possible (I don't need exceptions in the parser, either, but
I want to understand them).

Dustin

The syntax isn't entirely obvious... usually, when you're wondering
how to write something in IR, the easiest thing to do is write the
equivalent C code, then use http://llvm.org/demo/index.cgi to see what
it looks like in IR. In this case, try plugging the following snippet
in:

extern int x;
int *y = &x;

-Eli

The equivalent of "extern int G;" is:

@G = external global i32

-Chris

Hello, Dustin

To be clear: remember I'm using Stone Knives here. :slight_smile:

In some cases it's better to realize that it's the year 2010 now. :slight_smile:
Just write a small snippet in C and use llvm-gcc -emit-llvm to emit the
LLVM IR corresponding to it. You can use e.g. http://llvm.org/demo/ for
this.

The syntax isn't entirely obvious...

Thanks, if it doesn't seem as stupid as I feel, I'll grovel less. :slight_smile:

...usually, when you're wondering
how to write something in IR, the easiest thing to do is write the
equivalent C code, then use http://llvm.org/demo/index.cgi to see what
it looks like in IR.

So *that's* why there is a web demo. :slight_smile:

I think I've actually blocked out the C rules because I've been
religiously hiding all variables behind C++ accessors for so long now.
I think the last time I made a practice of declaring variables in .h
files ANSI C was too newfangled to depend on compiler support. :slight_smile:

In fact I wouldn't now except my goal is knowledge and I'm willing to
bend design rules a bit to make sure I try new things. That is part of
the motivation behind my plan to use invoke/unwind in the parser, in
fact. It'll make for nicer code, but I'm so used to propagating errors
back up the stack that I'm used to the pain.

If I do that well enough, anytime I want to remember how to do something
I can go back and see how I did it on this project.

Dustin

OK, then I want to whine a little bit about how that is more obscurely
hinted at than discussed. Whine, whine.... :slight_smile:

Even knowing the word to search on, the only explicit application of the
keyword to data is incidental to an example about structures. I think I
feel less bad about having bounced that to the list.

I'm amazed that the list of linkage types doesn't mention it somewhere.
I tried 'common' among other things, but I admit it was just a
desperate shot in the dark before I just gave up and asked.

Dustin

Hello, Dustin

To be clear: remember I'm using Stone Knives here. :slight_smile:

In some cases it's better to realize that it's the year 2010 now. :slight_smile:

What do you have against digital primitive living? :slight_smile:

Actually, there is sort of some truth to that joking phrase. I am sort
of treating this as a conceptual analog of the old days where your first
task with your new 8-bit "personal computer" was to fire up the
assembler and start writing your system. If people could do that I
should be man enough to do something simpler like write a minimal lisp
or forth in the far more congenial LLVM IR, right?

Just write a small snippet in C and use llvm-gcc -emit-llvm to emit the
LLVM IR corresponding to it. You can use e.g. http://llvm.org/demo/ for
this.

I'll make a note to try that before giving up next time. Actually my
self-imposed rules don't restrict me using any tool for understanding.
It's not a cheat if I learn it well enough to apply it by hand.

Kind of like writing a paper--you can (and should) steal ideas and style
from Shakespeare, Isaiah, and Abraham Lincoln, you just can't steal
their words. :slight_smile:

In theory, I'll understand LLVM's machine model pretty well when I'm done.

Dustin

Patches welcome!

-Chris

Well, the time I'm motivated to write about something is when I'm
learning and it's relevant to me. But the IR Reference is a reference,
and should be rigorously correct. I can't be that, most of all on this
linkage stuff. For example, the 'external' keyword almost doesn't
appear in the manual, but how and where precisely it should be described
I probably don't know enough to say. Does it count as a linkage
specification?

The single thing that would make the IR reference more useful would be
more examples, since I often couldn't quite reverse engineer the details
from the description and there was no example, or the given example
didn't cover my use case. If you are willing to devote more space to
expanding the example snippets, perhaps that's doable for an amateur.
There certainly are plenty in my source tree.

If I'm the only guy on the planet writing extensive IR by hand, perhaps
the issues I want to resolve are uncommon ones. Who else worries about
how to declare global variables in preprocessed header files? :slight_smile: If
you're machine-generating source, I imagine you don't need header files.
You just emit the declarations as many times as you need them. As I am
not a machine, I need consistency and "once and only once" definitions.

Dustin

Hmm. Is it really? This

@foo = external global i32

@foo = global i32 5

define i32
@main(i32 %argc, i8 **%argv)
{
    %fooVal = load i32* @foo

    ret i32 %fooVal
}

produces a "redefinition of global '@foo'" error. But this

extern int x;

int x = 5;

int *y = &x;

compiles to

@x = global i32 5 ; <i32*> [#uses=1]
@y = global i32* @x ; <i32**> [#uses=0]

The difference is crucial, because I want to put

"@foo = external global i32"

in a header file that is then #included in every module where it is
used, *including the defining module* (for a consistency check, and
because otherwise I'd have to create extra headers that are only
#included by the outside world but not the defining module). It appears
that the front end is supposed to decide between external and global.
In my case I actually could maintain all declarations by hand for one
global word used to return exception information, but that wouldn't work
for a more involved case.
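
For comparison, here is a minimal C sketch of the header pattern described
above, collapsed into a single file. The names exc_code, read_exc_code,
check_exc_code and the header name "exc.h" are made up for illustration;
nothing here comes from the actual parser. It only shows that C accepts a
declaration followed by a definition in the same translation unit, which
is the behavior the IR snippet above does not have.

```c
#include <assert.h>

/* --- what a shared header (hypothetical "exc.h") would contain --- */
extern int exc_code;              /* declaration only: no storage allocated */

/* --- the defining module, which also includes that header --- */
int exc_code = 5;                 /* definition: type must match the declaration */

/* Any module that includes the header can read the variable. */
int read_exc_code(void)
{
    return exc_code;
}

/* Small self-check a caller could run. */
void check_exc_code(void)
{
    assert(read_exc_code() == 5);
}
```

Because the compiler sees both lines, a mismatch between the header and
the definition (say, int versus long) is reported at compile time. That
is the consistency check the header is meant to provide.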

Is there no way to get the same effect as with define/declare for
functions? There I have no problem #includeing a declaration into the
same file as a definition.

The alternative appears to be asking the linker to do it. The docs for
"linkonce" say "This is typically used to implement inline functions,
templates, or other code which must be generated in each translation
unit that uses it." That sounds like my case, and it compiles, but I
don't know if that's going to get me into trouble.

Dustin

The equivalent of "extern int G;" is:

@G = external global i32

Hmm. Is it really?

Yes.

  But this

extern int x;

int x = 5;

The equivalent of that is:

@x = global i32 5

I made no claim that 'external' in LLVM has the same semantics as 'extern' in C.

The difference is crucial, because I want to put

"@foo = external global i32"

in a header file that is then #included in every module where it is
used, *including the defining module* (for a consistency check, and

LLVM IR is not C, and it is not designed for #includes or other related horrible C constructs.

-Chris

Hmm. Is it really?

Yes.

I guess we have different ideas of what "equivalent" means. The
behavior suggests that "external" is the LLVM construct a front-end
would use to implement C-type "extern," which is precisely the kind of
knowledge I am after but not how I would use "equivalent."

But too many years of thinking about stuff with names like
"diffeomorphism" may have somewhat altered my usage of "equivalent" from
the mainstream and toward the mathematical. :slight_smile:

I made no claim that 'external' in LLVM has the same semantics as
'extern' in C.

And the semantics available in LLVM is the sort of knowledge I seek.
Apparently, the program is working. :slight_smile:

LLVM IR is not C, and it is not designed for #includes or other
related horrible C constructs.

Well, there is no question about the horribleness of having to maintain
parallel declarations in a different file from your definitions, or any
of the other hideous consequences of pure textual manipulation by a tool
without syntactic knowledge. But from the standpoint of the problem I
chose to solve, the trouble is that there is *no* solution within LLVM. As
bad as CPP is, not having it is far worse unless you do the Right Thing
and extend C to make it unnecessary. The analogous situation is that I
work with what LLVM itself provides, and therefore, as bad as #include is,
the alternative would be to have hand-maintained parallel declarations
of a function in every file that calls it.

Note that this is *not* intended as a criticism of LLVM. I am perfectly
aware that it was not designed for hand-coding and expects the front-end
to implement whatever tools are necessary. A front-end could generate
such declarations easily, for example. So I seem to have achieved what
I sought--knowledge. I did not know before where the semantics of C's
"extern" were implemented. Now I do--they are implemented by the
front-end, which is responsible for deciding in which module to emit the
definition instead of the declaration.
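
To make that concrete, here is a rough sketch in C of the decision a
front-end would be making. The function emit_i32_global and its
parameters are hypothetical and not part of any LLVM API; it just formats
the two IR lines quoted earlier in this thread, depending on whether the
current module owns the variable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not an LLVM API: writes the IR line for a 32-bit
 * integer global into buf. The module that owns the variable gets a
 * definition; every other module gets an 'external' declaration.
 * Returns what snprintf returns.
 */
int emit_i32_global(char *buf, size_t size, const char *name,
                    int defined_here, int init)
{
    if (defined_here)
        return snprintf(buf, size, "@%s = global i32 %d", name, init);
    return snprintf(buf, size, "@%s = external global i32", name);
}

/* Self-check covering both branches. */
void check_emit_i32_global(void)
{
    char buf[64];

    emit_i32_global(buf, sizeof buf, "foo", 1, 5);
    assert(strcmp(buf, "@foo = global i32 5") == 0);

    emit_i32_global(buf, sizeof buf, "foo", 0, 0);
    assert(strcmp(buf, "@foo = external global i32") == 0);
}
```

The point is only that exactly one of the two forms is emitted per
module, so the "redefinition" error never comes up in generated code.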

In fact, much of the fun in life comes from abusing tools for uses they
were never intended for. :slight_smile: So far, I still think I am learning more
about LLVM IR by doing this than I would by any other means. I think I
understand by reading, but very often I do *not* until I actually try to
use it. As in this case.

But I did suggest one preprocessor-free alternative for this particular
case: just stick the definition in every source file that needs it and
rely on the linker to merge all references. The docs suggest that is
what "linkonce" is for, but I have already learned I know nothing about
linkage. I can't tell the practical usage difference between linkonce
and any of the other linkages that merge definitions. Which, therefore,
is another opportunity to learn. Which, if any, is most appropriate for
this sort of thing? The goal is learning the idiomatic means, not just
finding a workable kludge.

Dustin