How to extend the LLVM IR and the frontend?

Hi all,

Hypothetically, suppose I have a generic system with multiple address spaces such that each address space is accessed using different instructions.

Now suppose I wanted to add new keywords ‘foo’ and ‘bar’ in front of C variables and function return types, such that the following would be valid:

foo void* a;
foo void* somefunc(){…}

bar int b;
int somefunc2(bar int*){…}

Furthermore, suppose I wanted these keywords to be carried into the LLVM IR in some manner, so that during the translation of LLVM IR to target machine assembly they could be used to select which address space is accessed (e.g. if I specify one of the keywords, I get some instruction or group of instructions that accesses a certain address space). What would be the correct/easiest place(s) to add this functionality, and is there existing code that does something similar that could be used as a starting point?

More Info: suppose this uses clang as a frontend.

Thanks,
Aaron Landwehr

I would think clang is the more appropriate place to add a C declaration specifier. However, assuming that no translation down to LLVM exists for it, you would need to carry the information as either metadata (preferred) or as a new type (ugly and time consuming). If you can keep it as metadata, then later passes can gather that metadata and act on it. A new type requires a lot of hacking in other passes to tell them what to do when they see the new type.

- My 2 cents,
Jeff Kunkel

The LLVM address space qualifiers will get you at least part of the way. I'd suggest starting your search there.

More generally speaking, note that in the front end you really want this sort of thing to be (syntactically) a type qualifier so you can support pointers back and forth between the address spaces in a natural way. You'll also want a generic pointer type (can reference any address space) to use for unqualified pointers. Otherwise you can't make a conformant C compiler.

-Jim

> Hypothetically, suppose I have a generic system with multiple address spaces
> such that each address space is accessed using different instructions.
> Now suppose I wanted to add new keywords 'foo' and 'bar' in front of
> C variables and function return types, such that the following would be
> valid:
> foo void* a;
> foo void* somefunc(){...}
> bar int b;
> int somefunc2(bar int*){...}

How about putting

  #define foo __attribute__((address_space(256)))
  #define bar __attribute__((address_space(257)))

in some header? (Or on the command line, or in clang's default #defines, ...)
Though maybe they'd need to be in a slightly different place; the
example in the documentation
(http://clang.llvm.org/docs/LanguageExtensions.html) is

  #define GS_RELATIVE __attribute__((address_space(256)))
  int foo(int GS_RELATIVE *P) {
    return *P;
  }

so maybe it won't work if they're in front of the declaration.

And of course, they're not technically keywords...

> Furthermore, if I wanted this keyword to be added as part of the LLVM IR in
> some manner such that during the translation of LLVM IR to some target
> machine ASM it could be used to switch which address spaces are used (e.g.
> if I specify one of the keywords, I get some instruction or group of
> instructions to access a certain address space), what would be the
> correct/easiest place(s) to add this functionality, and would there be
> existing code that does something similar that can be used as a starting
> point?

The x86 backend does something like this for the fs and gs segment
specifiers, but those just add a prefix to the instruction, so they
might not actually be different instructions (I'm not sure).

Putting the macro in front of the declaration would work fine. The
attribute works like a normal qualifier: just like 'int const *'
and 'const int *' both mean "pointer to const int", 'GS_RELATIVE int *' and
'int GS_RELATIVE *' both mean "pointer to int object in address space 256".
Whereas 'int * GS_RELATIVE *' means "pointer to pointer object in address
space 256 that points to int object in generic address space".

John.
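
To make the qualifier placement discussed above concrete, here is a minimal sketch. It assumes clang and uses the address space numbers 1 and 2 purely as placeholders (they are not from this thread, and what they mean depends on the target). The names a, b, read_foo and read_bar are hypothetical. On compilers other than clang the macros expand to nothing, so the file still builds but no address spaces are involved.

```c
#include <assert.h>  /* static_assert (C11) */

/* Assumption: address spaces 1 and 2 are arbitrary placeholder numbers.
 * A real target would pick numbers that its backend knows how to lower. */
#if defined(__clang__)
#  define foo __attribute__((address_space(1)))
#  define bar __attribute__((address_space(2)))
#else
#  define foo /* no address space support: expands to nothing */
#  define bar
#endif

/* Globals placed in the two address spaces, written with the
 * "keyword" in front of the declaration as in the original question. */
foo int a = 7;
bar int b = 9;

/* The qualifier may go before or after the pointee type, just like
 * 'const int *' and 'int const *': both spellings name the same type. */
static_assert(__builtin_types_compatible_p(foo int *, int foo *),
              "both spellings mean: pointer to int in address space foo");

/* Each function dereferences a pointer into one address space. A backend
 * could select different load instructions based on the address space. */
int read_foo(foo int *p) { return *p; }
int read_bar(int bar *p) { return *p; }
```

Calling read_foo(&a) and read_bar(&b) type-checks because &a and &b already have the matching address-space-qualified pointer types. How the loads are actually emitted for a given address space number is up to the target backend.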