Using "opaque pointers" right now?

I was looking at the talk from 2015 about opaque pointers.

Aside from using the new methods (e.g. LLVMBuildGEP2), is there any other way to prepare for this change?

And also - is it possible to use something like opaque pointers (that is, using a single pointer type) even before the switch has been flipped in LLVM?

Christoffer
AEGIK / www.aegik.se

Hi Christoffer,

> Aside from using the new methods (e.g. LLVMBuildGEP2), is there any other way to prepare for this change?

The main one is to make sure you avoid using getElementType on
pointers. If you're writing a front-end this probably means you need
to keep your AST's representation of element types alongside LLVM
pointer Values in your own data-structures (beware, llvm::Type may not
be enough if it contains nested pointers). That lets you pass them
into the instructions that need them (loads, GEPs, allocas, etc.).

> And also - is it possible to use something like opaque pointers (that is, using a single pointer type) even before the switch has been flipped in LLVM?

Not in current LLVM. I have a branch that has some work on them. I've
been meaning to upload, document and propose a real path forwards on
llvm-dev but other things have gotten in the way up to now. I think
there's quite a bit of infrastructure work just to make the bitcode
and IR layers accept opaque pointers, and after that many passes and
other components assert when they first encounter an opaque pointer.

I've not managed to fix all of those yet, so I'm still a bit unsure how
things will work then, but I'd expect at least some correctness and
performance regressions to begin with. I'm hoping that work can happen
incrementally, and that during this phase people will be able to
experiment and make sure they're ready.

Cheers.

Tim.

Awesome :) Last time I really looked at this, my next hurdle was going to be intrinsics that are name-mangled and overloaded based on their parameter types. But I can see how it makes sense, as part of a migration, to add a new opaque pointer type that exists in parallel and then move things over to it. (Maybe that’s what John McCall suggested after I was a fair way down the original migration path… maybe, I forget.)

> pointers. If you're writing a front-end this probably means you need
> to keep your AST's representation of element types alongside LLVM
> pointer Values in your own data-structures

Yeah, that’s no problem - the type is needed for signed/unsigned integer distinctions anyway. There’s no getting around having one’s own type hierarchy.

>> And also - is it possible to use something like opaque pointers (that is, using a single pointer type) even before the switch has been flipped in LLVM?

> Not in current LLVM. I have a branch that has some work on them.

Anything I could have a look at to get a sneak peek?

> I’ve been meaning to upload, document and propose a real path forwards on
> llvm-dev but other things have gotten in the way up to now. I think
> there's quite a bit of infrastructure work just to make the bitcode
> and IR layers accept opaque pointers, and after that many passes and
> other components assert when they first encounter an opaque pointer.

Interesting. Is there a timeline for this or is it more of a “it’s done when it’s done”?

/Christoffer

>> pointers. If you're writing a front-end this probably means you need
>> to keep your AST's representation of element types alongside LLVM
>> pointer Values in your own data-structures

> Yeah, that’s no problem - the type is needed for signed/unsigned integer distinctions anyway. There’s no getting around having one’s own type hierarchy.

Excellent! Clang actually struggles there despite also having
signed/unsigned types, though it's kind of improving.

>> Not in current LLVM. I have a branch that has some work on them.

> Anything I could have a look at to get a sneak peek?

Sure, I've rebased it and spruced it up very slightly (it's still very
much developer's own tree rather than anything I'd want to be judged
for). There should be two branches at
https://github.com/TNorthover/llvm-project:

  * "opaque-ptr" has an LLVM that's mostly functional; if you want,
you can create opaque pointer types (via hand-written IR or hopefully
a front-end) to see how they fare.
  * "opaque-ptr-always" has a patch that forces every pointer
everywhere to be opaque (useful for flushing out issues). It has
thousands of test failures and hundreds of assertions in just "ninja
check", though with only tens of root causes as far as I can tell.

Not that I think there's any reason to do so, but don't start relying
on hashes or commits. I am extremely likely to force push there.

If you're hand-writing IR I chose "pN" as the representation for the
type. I've attached an example .ll file I was working with to test
instructions one-by-one, but you might see something like:

    define i32 @load_element_42(p0 %in) {
      %addr = getelementptr i32, p0 %in, i32 42
      %val = load i32, p0 %addr
      ret i32 %val
    }

The patches so far seem to have been in a few categories:

  * Adapt the serialization to cope with opaque pointers (textual
IR <-> in-memory <-> bitcode).
  * Relax assertions and other checks so that individual instructions
can exist without a pointer element type.
  * Remove deprecated CreateCall, CreateLoad and CreateGEP functions
(ones that derive type from the Value input).
  * Fix passes & targets etc to (essentially) remove all uses of
getElementType, one by one.

That order is roughly how I'd see them being committed to LLVM (if
everyone agrees with me), not how they are in the branch. It should
allow for incremental testing of each phase, I think.

> Interesting. Is there a timeline for this or is it more of a “it’s done when it’s done”?

I think a timeline is well off; we'll have to wait to see what the
long tail looks like. Also, since we're forcing changes on front-ends,
there has to be at least a release cycle between announcement and code
becoming invalid for any of that to happen. But we ought to be able to
do that in parallel with development.

I'm hoping to get some consensus for a path forwards when I can gather
my thoughts into an llvm-dev message though. Now that it's all pretty
fresh in my memory again, it's probably a good time.

Cheers.

Tim.

opaque.ll (1.5 KB)

> That order is roughly how I'd see them being committed to LLVM (if
> everyone agrees with me), not how they are in the branch. It should
> allow for incremental testing of each phase, I think.

I happened to look at this a while ago, and it seems to me that
removing the deprecated Create* functions and getPointerElementType as
early as possible is a good idea since it cannot be done mechanically,
is often quite painful (introducing architectural concerns even!), and
as long as those functions exist (without explicitly being labelled
with LLVM_ATTRIBUTE_DEPRECATED), there's a risk of new uses of them
being introduced.

Cheers,
Nicolai

I definitely agree. When rebasing, I had to tidy up quite a bit of new
Clang code that had been added since I last worked on it. I think the
earliest we could realistically do it would be after the January
branch, and that's what I proposed in the RFC I posted.

Cheers.

Tim.