Really nasty use-after-free problems surrounding StoredDeclsList

Greetings folks,

Manuel pointed me at a use-after-free issue that is really nasty. We’ve got a lot of these while using preambles in libclang for code completion. The crux of the problem seems to stem from the following:

  • DeclContext::lookup(…) returns a MutableArrayRef<NamedDecl*> which points into a particular StoredDeclsList looked up in a StoredDeclsMap.

  • In some cases, the StoredDeclsList only has one element, and it stores it internally.

  • In other cases, the StoredDeclsList stores the elements in a small vector which can grow and be re-allocated…

  • Thus, things which add decls to an AST have a very real chance of invalidating the pointer stored in the MutableArrayRef returned by DeclContext::lookup.

=/

My question is how best to fix this long term? I can go in and add targeted copying of the NamedDecl*s in the lookup result when there are clearly operations that might shift the AST… I’ve started looking for these and may commit a few strategic fixes that are causing crashes for us, but it seems like a losing proposition because the results of a lookup are sometimes rather long lived – we store them in CXXBasePaths for example.

Is there a better long-term solution? Other ideas or suggestions? Have I messed up my analysis somewhere?

Greetings folks,

Manuel pointed me at a use-after-free issue that is really nasty. We've
got a lot of these while using preambles in libclang for code completion.
The crux of the problem seems to stem from the following:

- DeclContext::lookup(...) returns a MutableArrayRef<NamedDecl*> which
points into a particular StoredDeclsList looked up in a StoredDeclsMap.

- In some cases, the StoredDeclsList only has one element, and it stores
it internally.

- In other cases, the StoredDeclsList stores the elements in a small
vector which can grow and be re-allocated...

- Thus, things which add decls to an AST have a very real chance of
invalidating the pointer stored in the MutableArrayRef returned by
DeclContext::lookup.

=/

My question is how best to fix this long term? I can go in and add
targeted copying of the NamedDecl*s in the lookup result when there are
clearly operations that might shift the AST... I've started looking for
these and may commit a few strategic fixes that are causing crashes for us,

As with any iterator invalidation problem, the burden is on the caller to
not hold an iterator across an operation that might mutate the container,
so that doesn't sound completely terrible. Perhaps we should add a debug
mode feature to force the StoredDeclsList to re(heap-)allocate any time
it's touched, so that ASan can reliably detect these?

It seems to me that the risk is with cases where Sema implicitly adds
declarations (special members or maybe builtins) to a context as part of
some innocent-seeming operation -- otherwise, it should be pretty obvious
that you're creating a declaration while in the middle of iterating over a
lookup result.

I've gone through the callers of DeclContext::lookup, and most of them look
fine. The exceptions are:

  * SemaLookup's LookupSpecialMembers is incorrect, because
AddTemplateOverloadCandidate performs template argument deduction and thus
can trigger the declaration of special members.
  * SemaDecl's FindOverriddenMethod looks suspicious, because it's not
obvious that IsOverload can't trigger an implicit declaration of a special
member
  * SemaDeclCXX's FindHiddenVirtualMethod looks suspicious for the same
reason
  * The CXXBasePath issue you note below

but it seems like a losing proposition because the results of a lookup are

sometimes rather long lived -- we store them in CXXBasePaths for example.

Yeah, that looks bad... In mitigation, we don't store CXXBasePaths
long-term, and the lookup result stored there genuinely shouldn't change
once we've built the object, so this is probably not a problem today. But I
don't see any issue with making a copy there.

Is there a better long-term solution? Other ideas or suggestions? Have I

messed up my analysis somewhere?

I think this is just a few point bugs rather than a widespread systemic
failure, but it's certainly suboptimal. For the most part, we're immune to
problems here because most uses of lookup build a LookupResult rather than
navigating the DeclContext's declarations directly.