non-temporal loads and stores

I am trying to understand how the __builtin_nontemporal_load() and __builtin_nontemporal_store() builtins are supposed to work.

When called with a global pointer, they seem to add an extra level of indirection.

Here is a simple test program:

char foo(char* p) {
     return __builtin_nontemporal_load(p);
}

void bar(char* p, char a) {
     __builtin_nontemporal_store(a, p);
}

char* p;

char baz() {
     return __builtin_nontemporal_load(p);
}

void boo(char a) {
     __builtin_nontemporal_store(a, p);
}

And here is the LLVM IR:

define signext i8 @foo(i8* nocapture readonly %p) #0 {
entry:
   %0 = load i8, i8* %p, align 1, !tbaa !1, !nontemporal !4
   ret i8 %0
}

define void @bar(i8* nocapture %p, i8 signext %a) #1 {
entry:
   store i8 %a, i8* %p, align 1, !tbaa !1, !nontemporal !4
   ret void
}

@p = common global i8* null, align 8

define signext i8 @baz() #0 {
entry:
   %0 = load i8*, i8** @p, align 8, !tbaa !5
   %1 = load i8, i8* %0, align 1, !tbaa !1, !nontemporal !4
   ret i8 %1
}

define void @boo(i8 signext %a) #1 {
entry:
   %0 = load i8*, i8** @p, align 8, !tbaa !5
   store i8 %a, i8* %0, align 1, !tbaa !1, !nontemporal !4
   ret void
}

Why do baz() and boo() load indirectly?

Apologies if I'm missing something blindingly obvious here ;-)

Hi Will,

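Declaring a global "char* p;" allocates memory for a pointer to a char. Using the identifier "p" in an expression loads the pointer from that memory location; this is the first, ordinary load in baz() and boo(). The loaded pointer is then passed to __builtin_nontemporal_load() / __builtin_nontemporal_store(), which produce the second access, the only one tagged !nontemporal. In foo() and bar(), p is a parameter and is already available as a value (%p), so no extra load is needed.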

A simpler example:

char *p;
char *test() {
     return p;
}

is compiled to:

@p = common global i8* null, align 8
define i8* @test() {
   %1 = load i8*, i8** @p, align 8
   ret i8* %1
}
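
To make the two steps visible in the source, here is a minimal sketch that writes the read of the global as its own statement. The names baz_explicit, boo_explicit, g, qux, NT_LOAD and NT_STORE are made up for illustration and do not appear in the original test. The macros fall back to plain accesses on compilers that lack the Clang builtins, so the fallback path shows only the semantics, not the nontemporal hint.

```c
/* Fall back to plain accesses on compilers without the Clang builtins. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_nontemporal_load)
#define NT_LOAD(ptr)       __builtin_nontemporal_load(ptr)
#define NT_STORE(val, ptr) __builtin_nontemporal_store((val), (ptr))
#else
#define NT_LOAD(ptr)       (*(ptr))
#define NT_STORE(val, ptr) (*(ptr) = (val))
#endif

char *p;  /* global pointer, as in the test program */
char g;   /* hypothetical global object (not a pointer), for contrast */

/* Same as baz(), with the read of the global written as its own step. */
char baz_explicit(void) {
    char *q = p;        /* ordinary load of the pointer value stored in p */
    return NT_LOAD(q);  /* nontemporal load of the char that q points to */
}

/* Same as boo(), with the read of the global written as its own step. */
void boo_explicit(char a) {
    char *q = p;        /* ordinary load of the pointer value stored in p */
    NT_STORE(a, q);     /* nontemporal store to the char that q points to */
}

/* &g is an address constant, so no pointer has to be fetched from memory
   before the nontemporal access. */
char qux(void) {
    return NT_LOAD(&g);
}
```

baz_explicit() and boo_explicit() should behave the same as baz() and boo(): the builtin only ever sees the pointer value, so the read of the global happens before it either way. qux() is the case where a single access is enough, because the address of g is known without reading memory.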

-Manuel