Basic pointer questions

I’m trying to learn how pointers work using the C API but I can’t find any examples. Here’s what I have so far but it doesn’t work as expected (built using the C API builder functions).

The idea is that “g” points to “v” (i.e., takes the address of “v”), and then “g” is dereferenced and its value is passed to the function return. How is this supposed to work?

@g = global ptr null
@v = global i32 4

define i32 @main() {
entry:
  store ptr @v, ptr @g, align 8
  %0 = load i32, ptr @g, align 4
  ret i32 %0
}

The IR you’ve written is equivalent to this C:

void *g;
int v = 4;

int main() {
  g = &v;
  return (int)g;
}

So you need an extra load to get @v back before you try to load 4:

  %tmp = load ptr, ptr @g, align 8
  %0 = load i32, ptr %tmp, align 4
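For comparison, here is a rough C sketch of what the corrected IR does, with each statement annotated with the instruction it corresponds to. The function name load_through_g is made up for illustration (it stands in for @main), and the mapping to IR is approximate rather than what any particular compiler would emit.

```c
void *g;      /* @g = global ptr null */
int v = 4;    /* @v = global i32 4 */

/* Hypothetical name; plays the role of @main in the IR above. */
int load_through_g(void) {
    g = &v;           /* store ptr @v, ptr @g            */
    int *tmp = g;     /* %tmp = load ptr, ptr @g         */
    return *tmp;      /* %0 = load i32, ptr %tmp; ret %0 */
}
```

Calling load_through_g() returns 4, because the first read fetches the pointer stored in g and the second read follows that pointer to v.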

It should ring alarm bells that you’re storing a ptr @v to some location and then trying to load back an i32 from exactly the same place; that’s never going to work.

Excellent, this works now! So the dereference is a two-part process. Loading “g” as a pointer (load ptr, ptr @g, align 8) seems like a needless copy, but I guess this is like copying the value? I thought the “load to i32” was that value copy, but that seems to be a mere pointer cast, which is more confusing because I see lots of BuildCast* functions which I thought did this.

It is needless; if you run an optimizer, it will forward @v directly on to the final load. But you can write IR that contains redundant copies and so on (and front-ends usually do because it’s easier).

load and store always mean a memory access of some kind in LLVM, casts generally don’t. It may help to think about what’s going on in those terms and single-step through the program examining registers & memory (either mentally or in a real debugger).

For example, start with:

@g is at 0x1000: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
@v is at 0x1008: 0x04 0x00 0x00 0x00

Execute store ptr @v, ptr @g, the address of @v is written to @g’s storage:

@g is at 0x1000: 0x08 0x10 0x00 0x00 0x00 0x00 0x00 0x00
@v is at 0x1008: 0x04 0x00 0x00 0x00 (unchanged)

(Wrongly) execute %0 = load i32, ptr @g. 4 bytes are loaded from @g’s storage, 0x08 0x10 0x00 0x00 so %0 is 0x1008 (as an i32). This is reinterpreting what we stored as a pointer and pretending it’s an i32, very rarely what you actually want to do.

Back on track:

Execute %tmp = load ptr, ptr @g. A pointer is read from memory at @g’s storage, %tmp is 0x1008. This (as we planned) is the address of @v’s storage.

Execute %0 = load i32, ptr %tmp. 4 bytes are read from memory at 0x1008 which (as we carefully arranged) is @v’s storage so the result is 4.
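If it helps, the walkthrough above can be replayed in plain C over a small byte array. This is only a sketch under the same assumptions as the example: 8-byte pointers, little-endian byte order, @g at 0x1000 and @v at 0x1008. All names here (memory, store_le, load_le, wrong_load, right_load) are invented for illustration and have nothing to do with the LLVM C API.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed addresses, taken from the walkthrough above. */
enum { BASE = 0x1000, G_ADDR = 0x1000, V_ADDR = 0x1008 };

/* 16 bytes of pretend memory covering 0x1000..0x100f. */
static uint8_t memory[16];

/* Write `size` bytes of `value` at `addr`, least significant byte first. */
static void store_le(uint64_t addr, uint64_t value, size_t size) {
    for (size_t i = 0; i < size; i++)
        memory[addr - BASE + i] = (uint8_t)(value >> (8 * i));
}

/* Read `size` bytes at `addr`, least significant byte first. */
static uint64_t load_le(uint64_t addr, size_t size) {
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++)
        value |= (uint64_t)memory[addr - BASE + i] << (8 * i);
    return value;
}

/* Initial state plus the store that both versions share. */
static void setup(void) {
    store_le(G_ADDR, 0, 8);        /* @g = global ptr null  */
    store_le(V_ADDR, 4, 4);        /* @v = global i32 4     */
    store_le(G_ADDR, V_ADDR, 8);   /* store ptr @v, ptr @g  */
}

/* The original mistake: read 4 bytes straight out of @g's storage. */
uint32_t wrong_load(void) {
    setup();
    return (uint32_t)load_le(G_ADDR, 4);   /* load i32, ptr @g */
}

/* The fix: read the pointer first, then read 4 bytes where it points. */
uint32_t right_load(void) {
    setup();
    uint64_t tmp = load_le(G_ADDR, 8);     /* %tmp = load ptr, ptr @g  */
    return (uint32_t)load_le(tmp, 4);      /* %0 = load i32, ptr %tmp  */
}
```

With these assumptions wrong_load() gives back 0x1008 (the address of @v read as an integer) and right_load() gives back 4.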