[PATCH] libclang: report error code for bad PCH files

Hello,

The attached patch improves libclang to report the error condition
when CXTranslationUnit can not be created because of a stale PCH file.
This allows the caller, for example, to rebuild the PCH file and
retry the request.

There are APIs like clang_parseTranslationUnit(), which don't support
reporting detailed errors (the only error condition is a NULL result).
Because of this, in order to add this feature without breaking binary
compatibility, I propose to add a new API,
clang_TranslationUnit_hasFailure(), which checks if the TU is usable.
Then other APIs will return a non-NULL TU that is marked as erroneous,
and passing the erroneous TU to any API is a no-op.

My only concern is about clang_indexSourceFile function. On error, it
used to return NULL in out_TU. We might have clients that assume this
and don't dispose the returned TU if error is signaled. Now the
function can return an erroneous TU that should be disposed, otherwise
it is leaked.

As an alternative, in order not to introduce a possible memory leak in
clang_indexSourceFile, we could do the following. Instead of
allocating a block of memory to return as an erroneous TU, we could
use a magic pointer value, or an address of a static constant
"TheErroneousTU".

Argyrios, which way do you prefer?

Dmitri

libclang-report-error-code-for-bad-pch-v1.patch (16.3 KB)

Hello,

The attached patch improves libclang to report the error condition
when CXTranslationUnit can not be created because of a stale PCH file.
This allows the caller, for example, to rebuild the PCH file and
retry the request.

There are APIs like clang_parseTranslationUnit(), which don't support
reporting detailed errors (the only error condition is a NULL result).
Because of this, in order to add this feature without breaking binary
compatibility, I propose to add a new API,
clang_TranslationUnit_hasFailure(), which checks if the TU is usable.
Then other APIs will return a non-NULL TU that is marked as erroneous,
and passing the erroneous TU to any API is a no-op.

My only concern is about clang_indexSourceFile function. On error, it
used to return NULL in out_TU. We might have clients that assume this
and don't dispose the returned TU if error is signaled. Now the
function can return an erroneous TU that should be disposed, otherwise
it is leaked.

As an alternative, in order not to introduce a possible memory leak in
clang_indexSourceFile, we could do the following. Instead of
allocating a block of memory to return as an erroneous TU, we could
use a magic pointer value, or an address of a static constant
"TheErroneousTU".

Argyrios, which way do you prefer?

This seems more complicated than it’s worth, particularly since we do add a new API that the caller must be aware of, anyway.

How about for the functions that don’t support detailed error codes we just introduce a new variant that does support error codes ?
Clients interested in the error codes can move to calling the new function (whether we should deprecate the existing one or not is a different question).

This does seem cleaner. With this approach, we will have to introduce
three new APIs, mirroring the following:

clang_createTranslationUnitFromSourceFile
clang_createTranslationUnit
clang_parseTranslationUnit

While these APIs already return error codes:

clang_reparseTranslationUnit
clang_indexSourceFile

Compared to this, the approach that I proposed has at least two advantages:
- only a single new API;
- allows us to report more detailed errors -- for example, which PCH
file specifically was bad.

OTOH, there is a huge disadvantage of my approach: the erroneous TU is
ignored by all APIs, and the client that does not check for the error
with the new API will just see that TU as completely empty. I think
this is bad enough to justify addition of three new APIs. I will
update the patch.

Dmitri

I have implemented this approach in the attached patch. The new APIs
are just suffixed with "2", for example,
clang_createTranslationUnit2(), because of a lack of imagination and
to clearly show that the two parallel APIs are essentially the same.

I did not deprecate the old APIs, we can just treat them as
"convenience" wrappers.

I was thinking about changing the return type of
clang_reparseTranslationUnit and clang_indexSourceFile to 'enum
CXErrorCode', but I am not 100% that it is an ABI-compatible change on
all platforms (it is compatible on x86-64, I think), so I did not go
this way.

Please review.

Dmitri

This seems more complicated than it’s worth, particularly since we do add a new API that the caller must be aware of, anyway.

How about for the functions that don’t support detailed error codes we just introduce a new variant that does support error codes ?
Clients interested in the error codes can move to calling the new function (whether we should deprecate the existing one or not is a different question).

This does seem cleaner. With this approach, we will have to introduce
three new APIs, mirroring the following:

clang_createTranslationUnitFromSourceFile
clang_createTranslationUnit
clang_parseTranslationUnit

clang_createTranslationUnitFromSourceFile is just “clang_parseTranslationUnit minus options”, we don’t need a variant for that.

While these APIs already return error codes:

clang_reparseTranslationUnit
clang_indexSourceFile

Are you considering clang_indexTranslationUnit ?

I have implemented this approach in the attached patch.

Did you forget to attach it ?

The new APIs
are just suffixed with “2”, for example,
clang_createTranslationUnit2(), because of a lack of imagination and
to clearly show that the two parallel APIs are essentially the same.

Um, “clang_createTranslationUnit_Ext” ? I don’t have a strong preference, it’s your call.

I did not deprecate the old APIs, we can just treat them as
“convenience” wrappers.

I was thinking about changing the return type of
clang_reparseTranslationUnit and clang_indexSourceFile to ‘enum
CXErrorCode’, but I am not 100% that it is an ABI-compatible change on
all platforms (it is compatible on x86-64, I think), so I did not go
this way.

That’s fine.

Please review.

I need a patch! :slight_smile:

This seems more complicated than it’s worth, particularly since we do add a
new API that the caller must be aware of, anyway.

How about for the functions that don’t support detailed error codes we just
introduce a new variant that does support error codes ?
Clients interested in the error codes can move to calling the new function
(whether we should deprecate the existing one or not is a different
question).

This does seem cleaner. With this approach, we will have to introduce
three new APIs, mirroring the following:

clang_createTranslationUnitFromSourceFile
clang_createTranslationUnit
clang_parseTranslationUnit

clang_createTranslationUnitFromSourceFile is just
"clang_parseTranslationUnit minus options", we don't need a variant for
that.

OK, removing it.

While these APIs already return error codes:

clang_reparseTranslationUnit
clang_indexSourceFile

Are you considering clang_indexTranslationUnit ?

I think it can not fail because of AST deserialization. The TU should
already exist before calling this function.

I have implemented this approach in the attached patch.

Did you forget to attach it ?

The new APIs
are just suffixed with "2", for example,
clang_createTranslationUnit2(), because of a lack of imagination and
to clearly show that the two parallel APIs are essentially the same.

Um, "clang_createTranslationUnit_Ext" ? I don't have a strong preference,
it's your call.

I did not deprecate the old APIs, we can just treat them as
"convenience" wrappers.

I was thinking about changing the return type of
clang_reparseTranslationUnit and clang_indexSourceFile to 'enum
CXErrorCode', but I am not 100% that it is an ABI-compatible change on
all platforms (it is compatible on x86-64, I think), so I did not go
this way.

That's fine.

Please review.

I need a patch! :slight_smile:

Sorry, attaching an updated patch now.

Dmitri

libclang-report-error-code-for-bad-pch-v2.patch (40 KB)

This seems more complicated than it’s worth, particularly since we do add a
new API that the caller must be aware of, anyway.

How about for the functions that don’t support detailed error codes we just
introduce a new variant that does support error codes ?
Clients interested in the error codes can move to calling the new function
(whether we should deprecate the existing one or not is a different
question).

This does seem cleaner. With this approach, we will have to introduce
three new APIs, mirroring the following:

clang_createTranslationUnitFromSourceFile
clang_createTranslationUnit
clang_parseTranslationUnit

clang_createTranslationUnitFromSourceFile is just
“clang_parseTranslationUnit minus options”, we don’t need a variant for
that.

OK, removing it.

While these APIs already return error codes:

clang_reparseTranslationUnit
clang_indexSourceFile

Are you considering clang_indexTranslationUnit ?

I think it can not fail because of AST deserialization. The TU should
already exist before calling this function.

Ah, right.

I have implemented this approach in the attached patch.

Did you forget to attach it ?

The new APIs
are just suffixed with “2”, for example,
clang_createTranslationUnit2(), because of a lack of imagination and
to clearly show that the two parallel APIs are essentially the same.

Um, “clang_createTranslationUnit_Ext” ? I don’t have a strong preference,
it’s your call.

I did not deprecate the old APIs, we can just treat them as
“convenience” wrappers.

I was thinking about changing the return type of
clang_reparseTranslationUnit and clang_indexSourceFile to ‘enum
CXErrorCode’, but I am not 100% that it is an ABI-compatible change on
all platforms (it is compatible on x86-64, I think), so I did not go
this way.

That’s fine.

Please review.

I need a patch! :slight_smile:

Sorry, attaching an updated patch now.

LGTM!

Thank you for the review, committed r201249.

Dmitri