Implementing devirtualization

Hello all,

Our compilers class has been using LLVM, and a partner and I decided to implement devirtualization of virtual C++ calls in LLVM as a class project. We quickly realized that existing debug metadata generated by Clang didn’t give us enough info to (precisely) implement this, and as such have already begun modifying Clang to insert such metadata. However, for devirtualization we also need to reconstruct the class hierarchy, which we also seek to do through metadata. There appears to be sufficient metadata to do this, but we can’t seem to figure out how to actually access this metadata and successfully reconstruct the class hierarchy. Can anyone help with this?

We’re also open to alternative approaches, but we’d like to stay in LLVM IR as much as possible.

Thanks.
-Vitor

Some of this is already done by LLVM/Clang - do you have particular
cases that aren't being devirtualized that you want to focus on?

For some brief background reading you might want to take a look at these bugs:
http://llvm.org/bugs/show_bug.cgi?id=3100
http://llvm.org/bugs/show_bug.cgi?id=810

Which are things I (& others more experienced than myself) have posted
some thoughts on. If you're interested in pursuing PR810 then I have
some code that Nick Lewycky passed on to me involving his experiments
in this domain. & I'm also going to CC Eric Christopher because he
mentioned he'd had some thoughts on how to achieve this (the general
problem described in 810 about how to pass assumptions/facts from the
frontend to the backend) & I never got around to asking him about the
details.

This approach should stay even more in LLVM IR than your proposed
solution of metadata or debug info, but it may have
limitations/problems that your proposed approach does not - so I
certainly wouldn't rule anything out just yet.

- David

Vitor,

Just out of curiosity, is there any chance you could provide a link to
your course page or other related information? I've been advocating
that my own CS department start using LLVM as a tool for teaching
compilers, and I'd like an example I could show to the faculty.

Alternately: does anyone else know of a CS department somewhere that's
had success with using LLVM or Clang to teach compilers?

Thanks,

- Benjamin Schulz

We’ve got the following test case:

class A {
public:
    int x;
    A(int x) : x(x) {}
    int hoo() {return 4;}
    virtual int foo() {return x;}
    virtual int goo() {return foo()+10;}
    virtual int operator+(A &a) {
        return x + a.x;
    }
};

class B : public A {
public:
    B(int x) : A(x) {}
    int hoo() {return 2;}
    virtual int foo() {return A::foo()*2;}
};

int main() {
    A* a = new A(1);
    B* b = new B(2);
    int y = a->foo() + b->goo() + a->hoo() + b->hoo() + (*a + *b);
    delete a; delete b;
    return y;
}

Clang with -O4 -S -emit-llvm emits the following for main():

define i32 @main() {
%1 = tail call noalias i8* @_Znwm(i64 16), !dbg !70
%2 = bitcast i8* %1 to %class.A*, !dbg !70
tail call void @llvm.dbg.value(metadata !{%class.A* %2}, i64 0, metadata !68)
tail call void @llvm.dbg.value(metadata !71, i64 0, metadata !69)
tail call void @llvm.dbg.value(metadata !{%class.A* %2}, i64 0, metadata !66)
tail call void @llvm.dbg.value(metadata !71, i64 0, metadata !67)
%3 = bitcast i8* %1 to i32 (...)***
store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]* @_ZTV1A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %3, align 8
%4 = getelementptr inbounds i8* %1, i64 8
%5 = bitcast i8* %4 to i32*
store i32 1, i32* %5, align 4, !tbaa !72
tail call void @llvm.dbg.value(metadata !{%class.A* %2}, i64 0, metadata !49), !dbg !70
%6 = tail call noalias i8* @_Znwm(i64 16), !dbg !75
tail call void @llvm.dbg.value(metadata !{null}, i64 0, metadata !57)
tail call void @llvm.dbg.value(metadata !76, i64 0, metadata !58)
tail call void @llvm.dbg.value(metadata !{null}, i64 0, metadata !59)
tail call void @llvm.dbg.value(metadata !76, i64 0, metadata !60)
tail call void @llvm.dbg.value(metadata !{null}, i64 0, metadata !66)
tail call void @llvm.dbg.value(metadata !76, i64 0, metadata !67)
%7 = getelementptr inbounds i8* %6, i64 8
%8 = bitcast i8* %7 to i32*
store i32 2, i32* %8, align 4, !tbaa !72
%9 = bitcast i8* %6 to i8***
store i8** getelementptr inbounds ([5 x i8*]* @_ZTV1B, i64 0, i64 2), i8*** %9, align 8
tail call void @llvm.dbg.value(metadata !{null}, i64 0, metadata !52), !dbg !75
%10 = bitcast i8* %1 to i32 (%class.A*)***, !dbg !77
%11 = load i32 (%class.A*)*** %10, align 8, !dbg !77
%12 = load i32 (%class.A*)** %11, align 8, !dbg !77
%13 = tail call i32 %12(%class.A* %2), !dbg !77
%14 = bitcast i8* %6 to %class.A*, !dbg !77
%15 = bitcast i8* %6 to i32 (%class.A*)***, !dbg !77
%16 = load i32 (%class.A*)*** %15, align 8, !dbg !77
%17 = getelementptr inbounds i32 (%class.A*)** %16, i64 1, !dbg !77
%18 = load i32 (%class.A*)** %17, align 8, !dbg !77
%19 = tail call i32 %18(%class.A* %14), !dbg !77
%20 = bitcast i8* %1 to i32 (%class.A*, %class.A*)***, !dbg !77
%21 = load i32 (%class.A*, %class.A*)*** %20, align 8, !dbg !77
%22 = getelementptr inbounds i32 (%class.A*, %class.A*)** %21, i64 2, !dbg !77
%23 = load i32 (%class.A*, %class.A*)** %22, align 8, !dbg !77
%24 = tail call i32 %23(%class.A* %2, %class.A* %14), !dbg !77
%25 = add i32 %13, 6, !dbg !77
%26 = add i32 %25, %19, !dbg !77
%27 = add i32 %26, %24, !dbg !77
tail call void @llvm.dbg.value(metadata !{i32 %27}, i64 0, metadata !54), !dbg !77
%28 = icmp eq i8* %1, null, !dbg !78
br i1 %28, label %30, label %29, !dbg !78

; <label>:29 ; preds = %0
tail call void @_ZdlPv(i8* %1) nounwind, !dbg !78
br label %30, !dbg !78

; <label>:30 ; preds = %29, %0
%31 = icmp eq i8* %6, null, !dbg !78
br i1 %31, label %33, label %32, !dbg !78

; <label>:32 ; preds = %30
tail call void @_ZdlPv(i8* %6) nounwind, !dbg !78
br label %33, !dbg !78

; <label>:33 ; preds = %32, %30
ret i32 %27, !dbg !79
}

It’s a bit long-winded, but from looking at the code it’s clear that none of the virtual calls is actually necessary, yet Clang and LLVM emitted all three of them as indirect calls.

In particular, we seek to implement the sort of analysis for devirtualization by Sonajalg et al in http://www.cs.ut.ee/~varmo/seminar/sem09S/final/s6najalg.pdf but in C++, even if all we can get is a more conservative approximation in most cases. It’s basically a lot of type analysis, and involves querying properties about types -lower- in the hierarchy than the declared or instantiated type, which doesn’t seem to be an operation supported by Clang or debug metadata directly, so the only option (as we see it) is to build a custom representation of the class hierarchy from data we -do- have access to which allows it. Well, or generate even more metadata.

It’s unclear to me how much assert/assume features would help. I can see it as useful for simplifying the process of determining how much precision is needed (e.g. in a file-scoped function, we know that its arguments can’t come from somewhere external and hence could actually determine what the “lowest” type arguments can be), but it’s unclear how this per se helps with obtaining type information, since the -g flag seems to generate sufficient data, but with no clear way to access it. It could just be myopia and ignorance on my part, though.

Thanks for your help,
-Vitor

Try with LLVM trunk; with some recent changes, LLVM's GVN is now a bit
more powerful in situations like this.

-Eli

Yep, looks better now (still lots of work done to call new/delete - I
guess I'd have to LTO that with the standard library to get them to go
away), it boils down to "ret i32 24"

Here are the full details:

; ModuleID = 'devirt.cpp'
target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%class.B = type { %class.A }
%class.A = type { i32 (...)**, i32 }

@_ZTV1B = linkonce_odr unnamed_addr constant [5 x i8*] [i8* null, i8*
bitcast ({ i8*, i8*, i8* }* @_ZTI1B to i8*), i8* bitcast (i32
(%class.B*)* @_ZN1B3fooEv to i8*), i8* bitcast (i32 (%class.A*)*
@_ZN1A3gooEv to i8*), i8* bitcast (i32 (%class.A*, %class.A*)*
@_ZN1AplERS_ to i8*)]
@_ZTVN10__cxxabiv120__si_class_type_infoE = external global i8*
@_ZTS1B = linkonce_odr constant [3 x i8] c"1B\00"
@_ZTVN10__cxxabiv117__class_type_infoE = external global i8*
@_ZTS1A = linkonce_odr constant [3 x i8] c"1A\00"
@_ZTI1A = linkonce_odr unnamed_addr constant { i8*, i8* } { i8*
bitcast (i8** getelementptr inbounds (i8**
@_ZTVN10__cxxabiv117__class_type_infoE, i64 2) to i8*), i8*
getelementptr inbounds ([3 x i8]* @_ZTS1A, i32 0, i32 0) }
@_ZTI1B = linkonce_odr unnamed_addr constant { i8*, i8*, i8* } { i8*
bitcast (i8** getelementptr inbounds (i8**
@_ZTVN10__cxxabiv120__si_class_type_infoE, i64 2) to i8*), i8*
getelementptr inbounds ([3 x i8]* @_ZTS1B, i32 0, i32 0), i8* bitcast
({ i8*, i8* }* @_ZTI1A to i8*) }
@_ZTV1A = linkonce_odr unnamed_addr constant [5 x i8*] [i8* null, i8*
bitcast ({ i8*, i8* }* @_ZTI1A to i8*), i8* bitcast (i32 (%class.A*)*
@_ZN1A3fooEv to i8*), i8* bitcast (i32 (%class.A*)* @_ZN1A3gooEv to
i8*), i8* bitcast (i32 (%class.A*, %class.A*)* @_ZN1AplERS_ to i8*)]

define i32 @main() uwtable {
invoke.cont3:
  %call = tail call noalias i8* @_Znwm(i64 16)
  %0 = bitcast i8* %call to i32 (...)***
  store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]*
@_ZTV1A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8
  %x2.i.i = getelementptr inbounds i8* %call, i64 8
  %1 = bitcast i8* %x2.i.i to i32*
  store i32 1, i32* %1, align 4, !tbaa !0
  %call1 = tail call noalias i8* @_Znwm(i64 16)
  %2 = bitcast i8* %call1 to i32 (...)***
  %x2.i.i.i = getelementptr inbounds i8* %call1, i64 8
  %3 = bitcast i8* %x2.i.i.i to i32*
  store i32 2, i32* %3, align 4, !tbaa !0
  store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]*
@_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %2, align 8
  %isnull = icmp eq i8* %call, null
  br i1 %isnull, label %delete.end, label %delete.notnull

delete.notnull: ; preds = %invoke.cont3
  tail call void @_ZdlPv(i8* %call) nounwind
  br label %delete.end

delete.end: ; preds = %delete.notnull, %invoke.cont3
  %isnull16 = icmp eq i8* %call1, null
  br i1 %isnull16, label %delete.end18, label %delete.notnull17

delete.notnull17: ; preds = %delete.end
  tail call void @_ZdlPv(i8* %call1) nounwind
  br label %delete.end18

delete.end18: ; preds = %delete.notnull17, %delete.end
  ret i32 24
}

declare noalias i8* @_Znwm(i64)

declare void @_ZdlPv(i8*) nounwind

define linkonce_odr i32 @_ZN1B3fooEv(%class.B* nocapture %this)
nounwind uwtable readonly align 2 {
entry:
  %x.i = getelementptr inbounds %class.B* %this, i64 0, i32 0, i32 1
  %0 = load i32* %x.i, align 4, !tbaa !0
  %mul = shl nsw i32 %0, 1
  ret i32 %mul
}

define linkonce_odr i32 @_ZN1A3gooEv(%class.A* %this) uwtable align 2 {
entry:
  %0 = bitcast %class.A* %this to i32 (%class.A*)***
  %vtable = load i32 (%class.A*)*** %0, align 8
  %1 = load i32 (%class.A*)** %vtable, align 8
  %call = tail call i32 %1(%class.A* %this)
  %add = add nsw i32 %call, 10
  ret i32 %add
}

define linkonce_odr i32 @_ZN1AplERS_(%class.A* nocapture %this,
%class.A* nocapture %a) nounwind uwtable readonly align 2 {
entry:
  %x = getelementptr inbounds %class.A* %this, i64 0, i32 1
  %0 = load i32* %x, align 4, !tbaa !0
  %x2 = getelementptr inbounds %class.A* %a, i64 0, i32 1
  %1 = load i32* %x2, align 4, !tbaa !0
  %add = add nsw i32 %1, %0
  ret i32 %add
}

define linkonce_odr i32 @_ZN1A3fooEv(%class.A* nocapture %this)
nounwind uwtable readonly align 2 {
entry:
  %x = getelementptr inbounds %class.A* %this, i64 0, i32 1
  %0 = load i32* %x, align 4, !tbaa !0
  ret i32 %0
}

!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}
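
As a cross-check on the folded `ret i32 24`, the five terms of y can be worked out by hand: a->foo() is 1, b->goo() is B::foo() + 10 = 4 + 10 = 14, a->hoo() is 4, b->hoo() is 2, and *a + *b is 1 + 2 = 3, which sums to 24. A minimal sketch that reruns the same classes is below. The helper name compute_y is made up for this note, and the objects live on the stack rather than the heap purely to keep the sketch short.

```cpp
#include <cassert>

class A {
public:
    int x;
    explicit A(int x) : x(x) {}
    int hoo() { return 4; }                    // non-virtual
    virtual int foo() { return x; }
    virtual int goo() { return foo() + 10; }   // calls foo() virtually
    virtual int operator+(A &a) { return x + a.x; }
};

class B : public A {
public:
    explicit B(int x) : A(x) {}
    int hoo() { return 2; }                    // hides A::hoo, still non-virtual
    virtual int foo() { return A::foo() * 2; }
};

// Hypothetical helper: same expression as in main() of the test case,
// but with stack objects instead of new/delete.
int compute_y() {
    A a(1);
    B b(2);
    A *pa = &a;
    B *pb = &b;
    // 1 + 14 + 4 + 2 + 3
    return pa->foo() + pb->goo() + pa->hoo() + pb->hoo() + (*pa + *pb);
}
```

Calling compute_y() should give the same value the optimizer folded main() down to.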

We wouldn't really need LTO, just a bit of knowledge of the builtin
operator new and delete (one easy optimization would be to eliminate
the null checks because _Znwm never returns null.)

-Eli

Vitor Luis Menezes wrote:

> Hello all,
>
> Our compilers class has been using LLVM, and a partner and I decided to
> implement devirtualization of virtual C++ calls in LLVM as a class
> project. We quickly realized that existing debug metadata generated by
> Clang didn't give us enough info to (precisely) implement this, and as
> such have already begun modifying Clang to insert such metadata.
> However, for devirtualization we also need to reconstruct the class
> hierarchy, which we also seek to do through metadata. There appears to
> be sufficient metadata to do this, but we can't seem to figure out how
> to actually access this metadata and successfully reconstruct the class
> hierarchy. Can anyone help with this?
>
> We're also open to alternative approaches, but we'd like to stay in LLVM
> IR as much as possible.

Implement field-sensitive interprocedural sparse conditional constant propagation and it will devirtualize.

The reason I don't like having an explicit devirtualization pass is that it requires a large amount of analysis (interprocedural, no less!) and is strictly single-purpose. The way that LLVM does devirtualization now is through a series of tons of tiny steps:

  * the very last stage in devirt is simply converting the load from a constant to the loaded constant:
    - but sometimes the frontend wouldn't normally emit the vtable at all (the ABI wouldn't require it to), so we wouldn't see the constant to propagate. We added a new linkage type to LLVM, "available_externally" designed to let us capture this extra data that's useful for the optimizer but without generating any additional symbols in the object file.
  * GVN folds loads against stores, using alias analysis.
  * alias analysis is helped by two interprocedural function attributes:
    - noalias return values. This indicates that the pointer returned does not alias any other pointer existing in the program. We will find functions that return these and propagate noalias up the call graph.
    - nocapture arguments. If the pointer doesn't escape the callee, then the function gets a 'nocapture' annotation on it. This is also propagated through the call graph.
    - standard library functions have their noalias and nocapture parts annotated.
    - together, this means that code like:
        FILE *f = fopen(...);
        fwrite(data, size, 1, f);
        fclose(f);
      has a pointer 'f' which can never possibly alias anything else in the program, and that's trivially provable. We also assume that "char *p = (char*)rand();" can never guess the address of 'f'.
  * the pass manager runs bottom-up over the callgraph, running the CallGraphSCCPasses over each strongly connected component, and all the function passes over each function. If we inline, we expose more code to GVN which may make it possible to fold a load against its store.
    - once we do so, we've changed the apparent call graph, while iterating over the call graph. We use this to refine the SCCs in our call graph and re-run the optimization passes, causing more refinement iteratively and allowing us to devirtualize more.
    - we can only inline functions defined inside a class body because of C++'s ODR rule. The actual linkage type is "linkonce" which would indicate that they are weak and can be discarded if never called, so we added "linkonce_odr" which indicates that the symbol is weak but that any replacement must be equivalent.

What's great about using small steps like this is that each of these pieces is a useful optimization in its own right, even if we don't end up devirtualizing. A devirtualization pass doesn't "smell right" to me because it will have to do lots of analysis regardless of whether it uses it to optimize or throws it away at the end. In LLVM, the optimization passes try hard to be inexpensive when they don't transform the code at all. It's also language specific, while LLVM's existing approach works even with other languages or hand-rolled type hierarchies written with casts and no language feature support.
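
To make the "hand-rolled type hierarchies" point concrete, here is a minimal sketch of a vtable built by hand. All names in it (Shape, ShapeVTable, and the functions) are hypothetical. In the first function the store to the vptr field and the load from it sit side by side, which is the pattern that store-to-load forwarding plus constant folding can in principle turn into a direct call. In the second the store is out of sight, so the call stays indirect. The sketch only shows the source pattern and does not claim what any particular LLVM version does with it.

```cpp
#include <cassert>

struct Shape;

// Hand-rolled vtable: one slot holding a function pointer.
struct ShapeVTable {
    int (*sides)(const Shape *);
};

// Hand-rolled "object": the first field plays the role of the C++ vptr.
struct Shape {
    const ShapeVTable *vptr;
};

int triangle_sides(const Shape *) { return 3; }
int square_sides(const Shape *) { return 4; }

const ShapeVTable kTriangleVTable = { triangle_sides };
const ShapeVTable kSquareVTable = { square_sides };

// Store and load are both visible here: forwarding the stored
// &kTriangleVTable to the load, then folding the load of the constant
// table's slot, would leave a direct call to triangle_sides.
int sides_of_fresh_triangle() {
    Shape s;
    s.vptr = &kTriangleVTable;   // the store
    return s.vptr->sides(&s);    // load vptr, load slot, indirect call
}

// No store is visible here, so nothing can be forwarded.
int sides_of_unknown(const Shape *s) {
    return s->vptr->sides(s);
}

// Helper so callers can build a square without touching the vptr directly.
Shape make_square() {
    Shape s;
    s.vptr = &kSquareVTable;
    return s;
}
```

If sides_of_unknown were inlined into a caller that had just called make_square, the store would become visible again, which is the inlining-exposes-more-code effect described in the list above.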

The piece that's missing is devirtualization when we don't end up inlining. I don't know of a way to break that down into smaller pieces; llvm already has SCCP.cpp which does sparse conditional constant propagation and that includes an interprocedural variant, IPSCCP, but it's not field-sensitive. Propagating the stores from the vptr field to the relevant loads in the functions that aren't inlined would give us the last major missing piece.
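
As a rough illustration of what "field-sensitive" would add, the sketch below keeps one SCCP-style lattice value per (object, field) pair instead of one per value. It is a toy model, not LLVM's SCCP.cpp: the type and function names are invented here, abstract objects and constants are plain strings, and a real analysis would also have to drive any field to "overdefined" whenever an unknown store or opaque call might touch it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Three-level lattice: Unknown (no store seen yet), Constant, Overdefined.
struct LatticeValue {
    enum Kind { Unknown, Constant, Overdefined };
    Kind kind = Unknown;
    std::string constant;  // only meaningful when kind == Constant
};

LatticeValue make_constant(const std::string &c) {
    LatticeValue v;
    v.kind = LatticeValue::Constant;
    v.constant = c;
    return v;
}

// Merge two facts about the same field.
LatticeValue meet(const LatticeValue &a, const LatticeValue &b) {
    if (a.kind == LatticeValue::Unknown) return b;
    if (b.kind == LatticeValue::Unknown) return a;
    if (a.kind == LatticeValue::Constant && b.kind == LatticeValue::Constant &&
        a.constant == b.constant)
        return a;
    LatticeValue over;
    over.kind = LatticeValue::Overdefined;
    return over;
}

// Field-sensitive state: one lattice value per (abstract object, field index).
typedef std::pair<std::string, unsigned> FieldKey;

class FieldState {
public:
    void recordStore(const std::string &object, unsigned field,
                     const LatticeValue &v) {
        LatticeValue &slot = state_[FieldKey(object, field)];
        slot = meet(slot, v);
    }

    LatticeValue lookup(const std::string &object, unsigned field) const {
        std::map<FieldKey, LatticeValue>::const_iterator it =
            state_.find(FieldKey(object, field));
        return it == state_.end() ? LatticeValue() : it->second;
    }

private:
    std::map<FieldKey, LatticeValue> state_;
};
```

With this model, a vptr field that only ever sees a store of one vtable stays Constant, so a load of that field in a non-inlined callee could be replaced by the vtable address. A second store of a different vtable makes the field Overdefined and the load has to stay.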

Noalias returns, nocapture, SCC refinement, linkonce_odr and available_externally were added with the goal of making devirtualization in LLVM happen, but as orthogonal independent optimizations. I think LLVM should continue with this design. If you want to implement a single substantial optimization pass, I would suggest FSIPSCCP as the largest thing you should write.

Nick

This is a lot of work that is going to be completely foiled by the presence
of almost any opaque call at all.

What's needed here is a language-independent way to exploit
language-specific guarantees like C++ [basic.life]p7, i.e. that certain
provenances of pointer guarantee that certain members have known,
immutable values. There are analogous guarantees basically everywhere;
for example, Java arrays have a length, ADTs in System F have a
discriminator, etc.

I would suggest an intrinsic like
  declare i8* @llvm.bless.known.memory(i8*, ...) nounwind readnone
where the ... is a sequence of offset/value pairs. The load peephole is
then quite obvious.

An interesting extra case from [basic.life]p7 is that we can also state
that 'const' fields are invariant after the "blessing point", although
(unlike ivars) we can't necessarily assign a fixed value to them right
then.
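
A toy model of the "load peephole" may help show why it is simple. The intrinsic above is only a proposal, and everything in the sketch (BlessedFacts, fold_load, the use of byte offsets and 64-bit values) is an assumption made for illustration. The idea it tries to capture is that a load through a blessed pointer at a known offset folds to the recorded value without any look at the stores or calls in between, because the language guarantee rules out a change.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical record of the offset/value pairs attached to one
// blessed pointer.
struct BlessedFacts {
    std::map<std::uint64_t, std::uint64_t> known;  // byte offset -> value
};

// Peephole: if the load's offset has a recorded value, write it to `out`
// and report success; otherwise leave the load alone.
bool fold_load(const BlessedFacts &facts, std::uint64_t offset,
               std::uint64_t &out) {
    std::map<std::uint64_t, std::uint64_t>::const_iterator it =
        facts.known.find(offset);
    if (it == facts.known.end())
        return false;
    out = it->second;
    return true;
}
```

For a class like A in the test case, the vptr would be recorded at offset 0, while the int field at offset 8 would have no entry and would not fold.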

John.

John McCall wrote:

>> Noalias returns, nocapture, SCC refinement, linkonce_odr and
>> available_externally were added with the goal of making devirtualization
>> in LLVM happen, but as orthogonal independent optimizations. I think
>> LLVM should continue with this design. If you want to implement a single
>> substantial optimization pass, I would suggest FSIPSCCP as the largest
>> thing you should write.
>
> This is a lot of work that is going to be completely foiled by the presence
> of almost any opaque call at all.

Yes, but it's still useful.

Also, anything based on knowing the type hierarchy could be foiled by new derivations in other translation units, or that show up with dlopen.

> What's needed here is a language-independent way to exploit
> language-specific guarantees like C++ [basic.life]p7, i.e. that certain
> provenances of pointer guarantee that certain members have known,
> immutable values. There are analogous guarantees basically everywhere;
> for example, Java arrays have a length, ADTs in System F have a
> discriminator, etc.
>
> I would suggest an intrinsic like
>    declare i8* @llvm.bless.known.memory(i8*, ...) nounwind readnone
> where the ... is a sequence of offset/value pairs. The load peephole is
> then quite obvious.
>
> An interesting extra case from [basic.life]p7 is that we can also state
> that 'const' fields are invariant after the "blessing point", although
> (unlike ivars) we can't necessarily assign a fixed value to them right
> then.

There are two things going on in your proposal: one is taking advantage of the C++ guarantee that the vptr (or const fields) can't change after construction is complete, and the other is noting down what we know the vptr is equal to. I'm separating these because we can more often prove that the vptr didn't change within a function, regardless of what its callees do, than we can know what the vptr is on entry to the function.

You don't know what the vptr really is unless you can track from the point where the object was constructed. Without that, the program could add a new derived type in a plugin, then pass it to your function. That's why I'm advocating constant propagation as the fix here, just start at the point of allocation and propagate outwards.
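
The difference between the two situations can be sketched in a few lines. The names are hypothetical, and PluginDerived merely stands in for a class that would really arrive from another translation unit or a dlopen'ed plugin. Where the construction is visible the dynamic type is known; where only a reference comes in, any derived type may be behind it.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}
    virtual int id() const { return 1; }
};

// Construction is visible: the object is exactly a Base, so the call
// target is known to be Base::id.
int call_on_fresh_base() {
    Base b;
    const Base &ref = b;
    return ref.id();
}

// Only a reference is visible: the dynamic type may be any class derived
// from Base, including one this translation unit has never seen.
int call_on_unknown(const Base &b) {
    return b.id();
}

// Stand-in for a type defined elsewhere (another TU or a plugin).
struct PluginDerived : Base {
    int id() const override { return 42; }
};
```

Rewriting the call inside call_on_unknown to Base::id would be wrong as soon as a PluginDerived is passed in, which is why the analysis has to start from the allocation and work outwards.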

As for immutability of the vptr and const fields, that's what I was aiming for with the invariant intrinsics:
   http://llvm.org/docs/LangRef.html#int_invariant_start
but it's generally considered that the design of these intrinsics isn't quite right. (A secondary goal of those intrinsics is to allow LICM to hoist things out, then in the event of register pressure have the backend reload by loading through the invariant pointer, saving us a spill onto the stack.)

Nick

> John McCall wrote:
>
>>> Noalias returns, nocapture, SCC refinement, linkonce_odr and
>>> available_externally were added with the goal of making devirtualization
>>> in LLVM happen, but as orthogonal independent optimizations. I think
>>> LLVM should continue with this design. If you want to implement a single
>>> substantial optimization pass, I would suggest FSIPSCCP as the largest
>>> thing you should write.
>>
>> This is a lot of work that is going to be completely foiled by the presence
>> of almost any opaque call at all.
>
> Yes, but it's still useful.

Sure, there are generally applications for any general optimization you can suggest. I'm just saying that FSIPSCCP is not really a very compelling way to do devirtualization.

> Also, anything based on knowing the type hierarchy could be foiled by new derivations in other translation units, or that show up with dlopen.

I am not proposing anything that requires full-program knowledge of class hierarchies. If that's the idea, we are going to actually have to have full-program knowledge somehow, which I don't remember being one of the many, many appositions in FSIPSCCP, either. :)

Both of our proposals obviously only work when we can statically see the construction point of an object in some way. However, using a generic memory optimization would require us to be able to see both the actual store to the vtable field and the entire intervening history of that memory to verify that there are no subsequent stores. That analysis is likely prohibitively expensive even where possible, and it will frequently *not* be possible:
  Example #1: I have a constructor which is not defined in this translation unit. You are doomed.
  Example #2: I pass the address of a mutable global variable to a function which performs a virtual call on it. You must prove that literally no code (except possibly a global constructor) can ever store to that vtable.
  Example #3: I construct an object, call a global function foo(), and then do a virtual call on my object. You must either prove that foo() cannot possibly have a handle to the object or hope it's defined in this translation unit.

Language guarantees are *really, really useful*. I understand the desire to improve optimizations that don't require language-specific annotations, but I am not sure it is very practical.

John.

You didn't elaborate on how the llvm.bless markers get there. Is your
idea to put them at the point of allocation and then let llvm
propagate them around to functions that get the pointer? If so, then
it sounds very similar to fsipsccp plus immutable markers.

> Language guarantees are *really, really useful*. I understand the desire to improve optimizations that don't require language-specific annotations, but I am not sure it is very practical.

I don't disagree at all. I really do want to optimize based on the
program's behaviour as much as possible, falling back on additional
language-provided guarantees only when necessary. Taking advantage of
the immutable nature of certain fields is something that I agree we
probably want to do.

Nick

I think that this is a great general approach, particularly given that devirt is just a special case of other very common patterns (e.g. default constructors usually initialize all fields). This could be implemented with relatively straightforward improvements to our AA/memdep infrastructure to allow it to forward values from calls to loads.

The only thing I'm worried about is how expensive the analysis for this would be. This could be handled with some simple per-function summaries along the lines of global mod/ref (with a threshold to bail out in insane cases).
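
One way to picture a summary with a bail-out threshold is sketched below. This is only a guess at the shape of such a thing, not a description of LLVM's global mod/ref code: FunctionSummary, summarize, and callMayClobber are invented names, and the input is simply assumed to be the list of field indices a function body stores to.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical per-function summary: either a small set of fields the
// function may write, or the conservative "may write anything".
struct FunctionSummary {
    bool mayModifyAnything = false;
    std::set<unsigned> modifiedFields;
};

// Build a summary from the stores seen in a function body. Past the
// threshold the analysis gives up and answers conservatively, so its cost
// stays bounded on very large functions.
FunctionSummary summarize(const std::vector<unsigned> &storedFields,
                          std::size_t threshold) {
    FunctionSummary s;
    if (storedFields.size() > threshold) {
        s.mayModifyAnything = true;
        return s;
    }
    s.modifiedFields.insert(storedFields.begin(), storedFields.end());
    return s;
}

// Query used when deciding whether a value can be forwarded across a call.
bool callMayClobber(const FunctionSummary &s, unsigned field) {
    return s.mayModifyAnything || s.modifiedFields.count(field) != 0;
}
```

A call whose summary says it never writes field 0 would not block forwarding a known vptr across it, while a call that hit the threshold blocks everything.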

Once the general infrastructure exists, we could add intrinsics or something else for languages that know stuff to communicate it to the optimizer.

-Chris