Enhancing BasicAliasAnalysis for Rust

Since Rust references usually never aliases it would be nice if we could exploit this for code generation. One idea I had to improve alias analysis is to insert metadata on pointer loads. !inheritalias could be added to load instructions to indicate that if we know all the aliases of the pointer we load from, then we also know all the aliases of the pointer value loaded. Given that pointer loads are common and things are likely marked with noalias in Rust, this seems like useful metadata. It could also for example apply to loading C++'s unique_ptr fields.

I’m wondering what the feasibility of extending BasicAliasAnalysis to utilize the proposed metadata would be.

Did you mean to phrase this as a proposed approach and ask for feasibility?
Or are you more interested in the question “Rust has these language semantics, how can I get LLVM to best exploit this?”? LLVM may have an existing mechanism which would serve your needs.

Do you have some concrete examples of IR produced by the Rust frontend that you are seeing LLVM not optimize as well as it could?

– Sean Silva

Did you mean to phrase this as a proposed approach and ask for feasibility?

Yes.

Or are you more interested in the question "Rust has these language
semantics, how can I get LLVM to best exploit this?"? LLVM may have an
existing mechanism which would serve your needs.

I am interested in that as well. Rust has the very nice property where
references never alias, with the exception of references to UnsafeCell.

Do you have some concrete examples of IR produced by the Rust frontend
that you are seeing LLVM not optimize as well as it could?

An example my proposed approach would solve if the Rust frontend was
modified to output !inheritalias metadata:

!0 = metadata !{}

define void @foo(i8** noalias nocapture readonly %g) {
  %1 = load i8** %g, align 8, !inheritalias !0
  %2 = load i8* %1, align 1
  tail call void @bar(i8 signext %2)
  %3 = load i8** %g, align 8, !inheritalias !0
  %4 = load i8* %3, align 1
  tail call void @bar(i8 signext %4)
  ret void
}

declare void @bar(i8 signext)

With this metadata LLVM could infer that %1 and %3 doesn't alias with
anything since %g doesn't and optimize @foo to:

define void @foo(i8** noalias nocapture readonly %g) {
  %1 = load i8** %g, align 8, !inheritalias !0
  %2 = load i8* %1, align 1
  tail call void @bar(i8 signext %2)
  tail call void @bar(i8 signext %2)
  ret void
}

From: "John Kåre Alsaker" <john.mailinglists@gmail.com>
To: "Sean Silva" <chisophugis@gmail.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Tuesday, July 29, 2014 6:10:23 PM
Subject: Re: [LLVMdev] Enhancing BasicAliasAnalysis for Rust

Did you mean to phrase this as a proposed approach and ask for
feasibility?
Yes.

Or are you more interested in the question "Rust has these language
semantics, how can I get LLVM to best exploit this?"? LLVM may have
an existing mechanism which would serve your needs.
I am interested in that as well. Rust has the very nice property
where references never alias, with the exception of references to
UnsafeCell.

Do you have some concrete examples of IR produced by the Rust
frontend that you are seeing LLVM not optimize as well as it could?
An example my proposed approach would solve if the Rust frontend was
modified to output !inheritalias metadata:

!0 = metadata !{}

define void @foo(i8** noalias nocapture readonly %g) {
%1 = load i8** %g, align 8, !inheritalias !0
%2 = load i8* %1, align 1
tail call void @bar(i8 signext %2)
%3 = load i8** %g, align 8, !inheritalias !0
%4 = load i8* %3, align 1
tail call void @bar(i8 signext %4)
ret void
}

declare void @bar(i8 signext)

With this metadata LLVM could infer that %1 and %3 doesn't alias with
anything since %g doesn't and optimize @foo to:

define void @foo(i8** noalias nocapture readonly %g) {
%1 = load i8** %g, align 8, !inheritalias !0
%2 = load i8* %1, align 1
tail call void @bar(i8 signext %2)
tail call void @bar(i8 signext %2)
ret void
}

Hi John,

To get you started, here's how I imagine such a scheme might be implemented: At the beginning of the BasicAliasAnalysis::aliasCheck function, we call GetUnderlyingObject on both pointers, and if the underlying objects are different, and if they're both 'identified' objects (as determined by the isIdentifiedObject function), then we return NoAlias.

You might create a wrapper around GetUnderlyingObject that will look through loads tagged with your metadata, but increase some kind of depth counter each time this is done. Then, instead of comparing the underlying objects, you instead compare the (underlying object, depth) tuples and make a decision that way (returning NoAlias only if both differ and both underlying objects are identified).

I can imagine a capability like this could be generally useful, but I can also imagine it not being generally useful, so I encourage you to experiment.

You might find, for example, that what you really want is field sensitivity (because you have a pointer to a structure of noalias pointers), and you might find the GetPointerBaseWithConstantOffset function useful in dealing with that.

Please follow-up with us and let us know what you find to be most useful.

-Hal

I’ve thought about this a lot, but haven’t started to implement anything yet.

I assume that you’re referring to the aliasing rules on borrowed pointers:

1) Accesses of paths derived from an &mut borrow have no aliases in the scope of the borrow

2) Writes to direct paths of local variables have no aliases in the most narrow borrow scope containing the write.

3) Paths derived from an & borrow aren’t modified in the scope of the borrow, unless the path involves an UnsafeCell (formerly Unsafe).

I think that it should be possible to model these using the noalias / alias.scope metadata:

http://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata

There are some questions about using this to model C’s exact restrict semantics, but they shouldn’t apply here. You should be able to translate Rust’s borrow scopes into alias scopes. Currently borrow scopes are nested lexical scopes, but soon they will be single-entry / multiple-exit control-flow regions. Either way it shouldn’t be a problem.

There are a few caveats:

1) A lot of Rust’s memory accesses ultimately occur via raw *T pointers, and we still have to treat an *T as being able to alias anything.

2) We use a lot of memcpys in Rust for moves and copies. I’m not entirely sure how this works with memcpy. If you put both the src and dst scope on a memcpy intrinsic, it will look like the memcpy is modifying the src in addition to the dst. That could be a problem down the line.

If you start working on it and want to talk about this in more detail than is suitable for LLVMdev, feel free to email me or find me on #rust as zwarich.

Cameron