r137887: [libclang] Workaround potential race condition with code completion AllocatedResults being freed after a CXTranslationUnit.

Hello,

While working towards fixing the bug in CXString when it reads a one
past end character, I stumbled across r137887.

Specifically, I am worried about this workaround:

137887 kremenek // Normally, clients of CXString shouldn't
care whether or not
137887 kremenek // a CXString is managed by a pool or by
explicitly malloc'ed memory.
137887 kremenek // However, there are cases when
AllocatedResults outlives the
137887 kremenek // CXTranslationUnit. This is a workaround
that failure mode.
137887 kremenek if (cxstring::isManagedByPool(cursorUSR)) {
137887 kremenek CXString heapStr =
137887 kremenek
cxstring::createCXString(clang_getCString(cursorUSR), true);

I want to make two points here:

(1) There are a lot of other strings accessible from
CXCodeCompleteResults, which are allocated on the ASTContext's
allocator of the translation unit. So this workaround is only for
some specific case, it is not general.

(1.5) The issue itself looks not like a race condition (none of the
libclang objects are thread-safe), but a bug in the client code.

(2) Fixing the reading one past end character bug will move more
CXStrings to the CXStringPool -- StringRefs that need to be copied
will get allocated there (to minimize malloc traffic). Of course, it
would be invalid to free them after the corresponding
CXTranslationUnit.

Can we remove this workaround now? Will (2) have a lot of bad effects
because of assumptions in clients?

Dmitri

Hello,

While working towards fixing the bug in CXString when it reads a one
past end character, I stumbled across r137887.

Specifically, I am worried about this workaround:

137887 kremenek // Normally, clients of CXString shouldn't
care whether or not
137887 kremenek // a CXString is managed by a pool or by
explicitly malloc'ed memory.
137887 kremenek // However, there are cases when
AllocatedResults outlives the
137887 kremenek // CXTranslationUnit. This is a workaround
that failure mode.
137887 kremenek if (cxstring::isManagedByPool(cursorUSR)) {
137887 kremenek CXString heapStr =
137887 kremenek
cxstring::createCXString(clang_getCString(cursorUSR), true);

I want to make two points here:

(1) There are a lot of other strings accessible from
CXCodeCompleteResults, which are allocated on the ASTContext's
allocator of the translation unit. So this workaround is only for
some specific case, it is not general.

That's… terrifying. We really want to allocate these in the CXCodeCompleteResults bump pointer allocator.

(1.5) The issue itself looks not like a race condition (none of the
libclang objects are thread-safe), but a bug in the client code.

No, it's actually intentional behavior. We want code completion results to be able to hang around and be valid even after the translation unit has been re-parsed, so that clients can lazily populate their UIs from the code completion results without having to worry about other reparses causing trouble.

(2) Fixing the reading one past end character bug will move more
CXStrings to the CXStringPool -- StringRefs that need to be copied
will get allocated there (to minimize malloc traffic). Of course, it
would be invalid to free them after the corresponding
CXTranslationUnit.

I think CXCodeCompleteResults is going to need a CXStringPool itself for these strings.

  - Doug

Hello,

While working towards fixing the bug in CXString when it reads a one
past end character, I stumbled across r137887.

Specifically, I am worried about this workaround:

137887 kremenek // Normally, clients of CXString shouldn't
care whether or not
137887 kremenek // a CXString is managed by a pool or by
explicitly malloc'ed memory.
137887 kremenek // However, there are cases when
AllocatedResults outlives the
137887 kremenek // CXTranslationUnit. This is a workaround
that failure mode.
137887 kremenek if (cxstring::isManagedByPool(cursorUSR)) {
137887 kremenek CXString heapStr =
137887 kremenek
cxstring::createCXString(clang_getCString(cursorUSR), true);

I want to make two points here:

(1) There are a lot of other strings accessible from
CXCodeCompleteResults, which are allocated on the ASTContext's
allocator of the translation unit. So this workaround is only for
some specific case, it is not general.

That's… terrifying. We really want to allocate these in the CXCodeCompleteResults bump pointer allocator.

Right, we have an allocator there and it is indeed used in
SemaCodeComplete.cpp. Sorry for the false alarm.

(1.5) The issue itself looks not like a race condition (none of the
libclang objects are thread-safe), but a bug in the client code.

No, it's actually intentional behavior. We want code completion results to be able to hang around and be valid even after the translation unit has been re-parsed, so that clients can lazily populate their UIs from the code completion results without having to worry about other reparses causing trouble.

OK, so this is a feature.

(2) Fixing the reading one past end character bug will move more
CXStrings to the CXStringPool -- StringRefs that need to be copied
will get allocated there (to minimize malloc traffic). Of course, it
would be invalid to free them after the corresponding
CXTranslationUnit.

I think CXCodeCompleteResults is going to need a CXStringPool itself for these strings.

This is going to be a quite transient string pool, which will not save
a lot of memory allocations. I will look into possible designs

Dmitri