[RFC] Improving import failure checking strategy inside ASTImporter

Hello,

I'd like to discuss the idea of changing the way how we check for import issues in ASTImporter and ASTNodeImporter.

1. Introduction

ASTImporter is now extensively used in Cross-TU analysis and is in active development. However, we find (and introduce) many errors related to how import failures are handled now.

Currently, ASTImporter::Import() and ASTNodeImporter::Visit*() methods return nullptr for pointers and default value in case of conflict. And this approach brings many issues due to the following reasons.

1. We have to check the return result and take actions explicitly.

The common strategy of failure handling is very simple - just return. While it is very simple, almost all the code of ASTImporter consists of such explicit checks making it much harder to read.

2. Both nullptr and default value are valid by itself so we have to place explicit checks to check if a failure happened. Example:

Decl *To = Importer.Import(From);
if (From && !To)
return nullptr;

This doesn't look well and it's easy to forget where From can be nullptr and where it is impossible.

3. Error checking is not enforced.

We have to check for import errors every time Import() is called. And it is extremely easy to forget placing a check after a call. This error pattern is so common in ASTImporter that I was going to write a CSA check for this.

2. Possible solutions.

The problem with enforcing checks is the most annoying one but also looks to be the simplest - we can just replace the return value with Optional or ErrorOr so clients will be enforced to perform a failure checking. This replacement can include two changes.

1. A number of large commits that change the return type of ASTNodeImporter::Visit*() methods. And I'd like to hear the comments from existing users that have their own improvements of ASTImporter because they have to change their code not touched by the commit as well - there be some pain.

2. Introducing an alternative method in ASTImporter that returns ErrorOr. So, we can migrate to the new, say, ImportOrError(), by small patches.

This replacement also solves the problem (2) because value and error are now separated.

The problem of improving readability of these checks is a bit harder. The most straightforward approach is to create a macro like:

#define IMPORT_PTR_INTO_VAR(VarName, Type, From) \
Type *VarName = nullptr; \
{ \
auto MacroImportRes = Importer.Import(From); \
if (!MacroImportRes) \
return MacroImportRes.getError(); \
VarName = *MacroImportRes; \
}

Same for non-pointer type. So, the final code will look like this:

ErrorOr<Expr *> ASTNodeImporter::VisitCXXMemberCallExpr(CXXMemberCallExpr *E) {
IMPORT_QUAL_TYPE_INTO_VAR(T, E->getType())
IMPORT_PTR_INTO_VAR(ToFn, Expr, E->getCallee())

SmallVector<Expr *, 4> ToArgs(E->getNumArgs());
IMPORT_CONTAINER(E->arguments(), ToArgs);

return new (Importer.getToContext()) CXXMemberCallExpr(
Importer.getToContext(), ToFn, ToArgs, T, E->getValueKind(),
Importer.Import(E->getRParenLoc()));
}

But macros don't look clean. Unfortunately, I still haven't found a better option so any suggestions are very welcome.

PS. The error handling pattern we wish to get is very close to exception handling in C++ because usually we don't want to perform local error handling but centralized instead. With a possibility to just throw an exception and catch it in the ASTImporter entry point (returning an ErrorOr), error handling in ASTImporter becomes almost trivial and pretty clean. But we don't use exceptions in LLVM and clang and I don't know a similar functionality in LLVM utilities.

Hi Aleksei,

It is a great and welcoming initiative to identify the weaknesses in
the ASTImporter.

About, handling the error case after a call:
I couldn't agree more that this is a burden to always have a check at
the call site, but this is the price we pay for not having exceptions.
By using macros I have a feeling that the code would be even less
readable because we would hide control flow modifications inside the
macro, also because finally we'd need to create so many different
kinds of macros.
As far as I know, there are cases where an `Import` can return a
`nullptr` or a default even in case of non-error situations, please
correct me if I am wrong. Thus, introducing an optional like return
type would require to discover all these cases. Still, I think this
would worth it, because the enforcement of error checking (as you
mentioned), plus it would make it more explicit which situation is
really an error.
Actually, I think this should be done with a few large commits, e.g.
once we change the return type of `ASTImporter::Import(Decl*)` to
`Optional<Decl*>` then we have to change all call sites.
By introducing an `ImportOrError` and then doing the changes gradually
would result the same final state as if we had done it in one larger
step where we changed all call sites, but until all call sites are
changed we have to maintain both `Import` and `ImportOrError`.

Beside the problem of handling the error case after an `Import` call
we have observed several other weaknesses, and would like to hear your
opinion about them as well.
So, the list of problems so far:

- Do the same things after and before a new AST node is created (::Create)
The original import logic should be really simple, we create the node,
then we mark that as imported, then we recursively import the parts
for that node and then set them on that node.
However, the AST is actually a graph, so we have to handle circles.
If we mark something as imported (Imported()) then we return with the
corresponding To whenever we want to import that node again, this way
circles are handled.
In order to make this algorithm work we must ensure things, which are
not enforced in the current ASTImporter:
* There are no Import() calls in between any node creation (::Create)
and the Imported() call.
* There are no VisitXXX() calls in any other VisitYYY() function, only
call of Import() allowed.
* Before actually creating an AST node (::Create), we must check if
the Node had been imported already, if yes then return with that one.
One very important case for this is connected to templates: we may
start an import both from the templated decl of a template and from
the template itself.
There are good news, the first and third points can be enforced by
providing a variadic forwarding template for the creation, we are
currently preparing to upstream this change from our fork
(https://github.com/Ericsson/clang/pull/354/files) but this is going
to be a huge change.
Still, I have no idea how to enforce the second point. (Maybe a checker?)

- Decl chains are not imported
Currently we import declarations only if there is no available definition.
If in the "From" TU there are both a declaration and a definition we
import only the definition.
Thus we do not import all information from the original AST.
One example:
    struct B { virtual void f(); };
    void B::f() {}
Here, when we import the definition, then the virtual flag is not
getting imported, so finally we import a function as a non-virtual
function.
An other problem is with recursive functions:
    void f(); void f() { f(); }
When we import the prototype then we end up having two identical
definitions for f(), and no prototype.
We again preparing a patch for this, but this solves the problem only
in case of functions.
By not importing the whole decl chain we loose all attributes which
may be present only in declarations.

- Structural Equivalency
There are several holes here. E.g. we do not check the member
functions in a class for equivalency, only members.
Thus, we may end up stating that a forward decl is equivalent with a
class definition if the class does not have data members.
There are no tests at all for structural equivalency.
In our fork we have some tests, and planning to upstream them too.

- Minimal import
This feature is used by LLDB, but not at all in CTU. This feature
makes the code bigger and more complex.
There are 0 unit tests for minimal import, whatever information we
have about this we can get only from the lldb repo.

- Lookup
The way we lookup names during an import of a decl is getting really
complex, especially with respect to friends.
We find an already imported function from an unnamed namespace, thus
we simply skip the import of a function in an other TU if that has the
same name and that is also in an unnamed namespace
(https://github.com/Ericsson/clang/issues/335).
We still use `localUncachedLookup` which can be very inefficient, we
should be using `noload_lookup`.

Thanks,
Gabor

Yes, this is what I described as problem (2). That sounds pretty good, and this is what I was thinking about some time ago. You mean something like CreateDeserialized? The problem here are Types: their creation requires all dependencies to be imported before the Type is created. I don’t know how to deal with it. If I remember correctly, this pattern is not widely used and such code can be refactored easily. Yes, I have met this problem before. I tried to import all redeclaration chain but haven’t completed the work. One of problems here is how to stick the redeclaration chain to an existing declaration. I think we need to discuss this with LLDB folks (lldb-dev?). Unfortunately, Sean Callanan (who was an active reviewer and committer) left the project and Jim Ingham never participated in ASTImporter reviews. What is even worse is that the lookup (which is pretty harmless itself) requires import of DeclContexts - possibly, including the declaration we are going to search for (and that makes its logic much more complicated). Never noticed this problem before, thank you. I’m also not sure if import of TU-local data structures with same names is performed correctly now. This can be true. Do you have any numbers? For me, import correctness has greater priority now than speed optimizations, but this can be useful for future improvements.

Adding lldb-dev to the thread, since LLDB is a primary user of ASTImporter.

Hi Aleksei,

The problem of improving readability of these checks is a bit harder. The
most straightforward approach is to create a macro like:

#define IMPORT_PTR_INTO_VAR(VarName, Type, From) \
  Type *VarName = nullptr;                                 \
  {                                                        \
    auto MacroImportRes = Importer.Import(From);           \
    if (!MacroImportRes)                                   \
      return MacroImportRes.getError();                    \
    VarName = *MacroImportRes;                             \
  }

Same for non-pointer type. So, the final code will look like this:

ErrorOr<Expr *> ASTNodeImporter::VisitCXXMemberCallExpr(CXXMemberCallExpr
*E) {
  IMPORT_QUAL_TYPE_INTO_VAR(T, E->getType())
  IMPORT_PTR_INTO_VAR(ToFn, Expr, E->getCallee())

  SmallVector<Expr *, 4> ToArgs(E->getNumArgs());
  IMPORT_CONTAINER(E->arguments(), ToArgs);

  return new (Importer.getToContext()) CXXMemberCallExpr(
        Importer.getToContext(), ToFn, ToArgs, T, E->getValueKind(),
        Importer.Import(E->getRParenLoc()));
}

But macros don't look clean. Unfortunately, I still haven't found a better
option so any suggestions are very welcome.

Macros are indeed ugly, and exceptions are even worse.
I think a better solution would be to return an Optional, and make Importer.Import accept an out-parameter Error, passed by reference.
Then you could write:

Optional<Expr*> VisitCXXMemberCallExpr(CXXMemberCallExpr *E) {
Error e;
if (auto T = Importer.import(E->getType(), e))
if (auto ToFn = Importer.import(E->getCallee(), e))
if (auto ToArgs = Importer.import(E->arguments(), e)
return new (Importer.getToContext()) CXXMemberCallExpr(…)
outE = e;
return none;
}

That’s even more concise than your example, explicit and does not need macros.

This would be less verbose in C++17 where you would not need nested if’s,
but sadly we don’t have that in LLVM yet.

George

Hi George,

You have wrote exactly the first idea I have described in the initial proposal:
> The problem with enforcing checks is the most annoying one but also looks to be the simplest - we can just replace the return value with Optional or ErrorOr so clients will be enforced to perform a failure checking.

My question was about avoiding related boilerplate with nested if's (or lots of early returns). Optionals are excellent if one needs to do in-place checks. But if we have >50 nested Import() calls (which is a pretty common case for recursive ASTImporter), and we want to check only the top-level result, their usage becomes questionable. However, it still looks to be the best option we have. Adding an argument passed by reference in addition to Optional doesn't look good to me as well; moreover, we have Expected/ErrorOr for these usage scenarios.

Anyway, maybe even my question itself is not correct. The question of error handling doesn't exist on its own, it is the result of ASTImporter design. Maybe a potential solution is to split the lookup and structural equivalence checking (to a kind of dry run) from node import: in this case, Import() is called only after all checks are done so it cannot fail. But this will need ASTImporter redesign, and it is definitely not a thing I'm going to work on soon.

01.05.2018 23:43, George Karpenkov via cfe-dev пишет:

Right, I see, sorry for misunderstanding.
The advantage of using “error” passed by reference is that it could be reused across multiple .import calls;
having 50+ nested import calls could be mitigated by wrapping related groups in functions,
that still seems to me a lesser evil than macros.

Some import helpers can make our life a bit easier. Some examples below:

 template &lt;typename DeclTy&gt;
 LLVM\_NODISCARD bool importInto\(DeclTy \*&amp;ToD, DeclTy \*FromD\) \{
   if \(auto Res = Importer\.Import\(FromD\)\) \{
     ToD = cast\_or\_null&lt;DeclTy&gt;\(\*Res\);
     return false;
   \}
   return true;
 \}

 template &lt;typename DeclTy&gt;
 LLVM\_NODISCARD bool importIntoNonNull\(DeclTy \*&amp;ToD, DeclTy \*FromD\) \{
   if \(auto Res = Importer\.Import\(FromD\)\) \{
     ToD = cast&lt;DeclTy&gt;\(\*Res\);
     return false;
   \}
   return true;
 \}

 template &lt;typename DeclTy&gt; Optional&lt;DeclTy \*&gt; importCasted\(DeclTy \*FromD\) \{
   if \(auto Res = Importer\.Import\(FromD\)\)
     return cast\_or\_null&lt;DeclTy&gt;\(\*Res\);
   return None;
 \}

 template &lt;typename DeclTy&gt;
 Optional&lt;DeclTy \*&gt; importCastedNonNull\(DeclTy \*FromD\) \{
   if \(auto Res = Importer\.Import\(FromD\)\)
     return cast&lt;DeclTy&gt;\(\*Res\);
   return None;
 \}

They can be used like this:

QualType ASTNodeImporter::VisitTypedefType(const TypedefType *T) {
if (auto ToDecl = importCastedNonNull(T->getDecl()))
return Importer.getToContext().getTypeDeclType(*ToDecl);
return {};
}

...

if (importInto(ToEPI.ExceptionSpec.SourceDecl,
FromEPI.ExceptionSpec.SourceDecl))
return {};

if (importInto(ToEPI.ExceptionSpec.SourceTemplate,
FromEPI.ExceptionSpec.SourceTemplate))
return {};

...

QualType ASTNodeImporter::VisitUnresolvedUsingType(
const UnresolvedUsingType *T) {
UnresolvedUsingTypenameDecl *ToD, *ToPrevD, *FromD = T->getDecl();
if (importInto(ToD, FromD) || importInto(ToPrevD, FromD->getPreviousDecl()))
return {};

return Importer.getToContext().getTypeDeclType(ToD, ToPrevD);
}

Unfortunately, LLVM_NODISCARD is only enabled for C++14 which is not our case.

02.05.2018 01:28, George Karpenkov пишет:

Sorry, LLVM_DISCARD is fine, I was just mislead.

Hi Aleksei,

I think it is a very good idea to exploit [[nodiscard]] .
On the other hand I find a bit misleading some of the names of these
functions. `importCastedNonNull` does a simple `cast<>` inside, thus
perhaps a name `importAndCast` would better explain what happens.
Similarly, instead of `importCasted`, `importAndCastOrNull`
(import_and_cast_or_null ?) sounds better for me, what do you think?

Gabor

An other idea, couldn't we just pass the cast operation as a template
template parameter?

I don't have any naming preferences here (it was just to show some examples), but your options are indeed more clear. Or, maybe, we should just say instead that our Import should perform this cast by default and name these functions `import()` and `importOrNull()`?

18.05.2018 13:08, Gábor Márton via cfe-dev пишет:

Sorry, I didn't got the idea. Could please you explain a bit more?

18.05.2018 13:09, Gábor Márton via cfe-dev пишет:

Of course, I think there is no need to have different functions which differ only in the cast operation. We can use a template template parameter for the cast operation:

template <template class CastOp, typename DeclTy>
LLVM_NODISCARD bool import(DeclTy *&ToD, DeclTy *FromD) {
if (auto Res = Importer.Import(FromD)) {
ToD = CastOp(*Res);
return false;
}
return true;
}

And then the usage:
if (import<cast_or_null>(ToEPI.ExceptionSpec.SourceDecl,
FromEPI.ExceptionSpec.SourceDecl))
return {};

Gabor

Hi Gabor,

Passing cast operation as a template argument looks overcomplicated to me. However, it can be used as a base for other operations, so I think it is worth it to include this as an overload. It can also be useful if an operation other than cast is required, but, unfortunately, I cannot remember any case where we need such behaviour.
The problem is that I cannot compile your sample. I tried to write this:

 /// Helper for importing Decls as Optionals\.
 template &lt;template &lt;typename&gt; class CastOp, typename DeclTy&gt;
 Optional&lt;DeclTy \*&gt; import\(DeclTy \*FromD\) \{
   if \(auto Res = Importer\.Import\(FromD\)\)
     return CastOp&lt;DeclTy&gt;\(\*Res\);
   return None;
 \}

 template &lt;typename DeclTy&gt;
 Optional&lt;DeclTy \*&gt; importNullable\(DeclTy \*FromD\) \{
   return import&lt;cast\_or\_null&gt;\(FromD\);
 \}

 template &lt;typename DeclTy&gt; Optional&lt;DeclTy \*&gt; importNonNull\(DeclTy \*FromD\) \{
   return import&lt;cast&gt;\(FromD\);
 \}

But there are two issues. First, cast<> has two template arguments, not one. Second, template instantiation cannot guess what overload of cast should be used in each case. I'm sorry, but after a few attempts of making this code to compile, I have to ask you to elaborate a bit more once again.

Regarding naming: what do you think about importNonNull[Into]() and importNullable[Into]()? So, we assume that import should return the same type as the argument (which, I think, is somewhat expected). Or should we reflect the cast it in the function name somehow?

Currently, the list of helpers look like this:

 template &lt;typename DeclTy&gt;
 LLVM\_NODISCARD bool importNullableInto\(DeclTy \*&amp;ToD, Decl \*FromD\);

 template &lt;typename DeclTy&gt;
 LLVM\_NODISCARD bool importNonNullInto\(DeclTy \*&amp;ToD, Decl \*FromD\);

 template &lt;typename DeclTy&gt;
 Optional&lt;DeclTy \*&gt; importNullable\(DeclTy \*FromD\);

 template &lt;typename DeclTy&gt; Optional&lt;DeclTy \*&gt; importNonNull\(DeclTy \*FromD\);

There is also another thing I am curious about: do we need to use dyn_casts on the return type? I'm not sure we can return a node with type so different from the imported node's type. Currently, there are some dyn_cast[_or_null]s in our code, but, possibly, they can be replaced with just casts, because we usually return nullptr or a pointer of same type as the imported one.

18.05.2018 22:47, Gábor Márton пишет:

But there are two issues. First, cast<> has two template arguments, not one. Second, template instantiation cannot guess what overload of cast should be used in each case. I’m sorry, but after a few attempts of making this code to compile, I have to ask you to elaborate a bit more once again.

You are right, we can’t just pass a template function as a template parameter. However, we could use a lambda:
template <typename CastOpTy, typename DeclTy>
Optional<DeclTy *> import(CastOpTy CastOp, DeclTy *&ToD, DeclTy *FromD) {
if (auto Res = Importer.Import(FromD)) {
return CastOp(Res);
}
return None;
}
template
Optional<DeclTy *> importNullable(DeclTy *&ToD, DeclTy FromD) {
return import([](Decl
R){ return cast_or_null(R); }, ToD, FromD);
}

This approach still has disadvantages: we have to use a concrete type in the lambda parameter (Decl) because polymorphic lambdas are not supported in C++11.
So I am not sure if this is really worth it to avoid just some very few lines of repetition.

Regarding naming: what do you think about importNonNullInto and importNullableInto?
Sounds good to me.

There is also another thing I am curious about: do we need to use dyn_casts on the return type? I’m not sure we can return a node with type so different from the imported node’s type. Currently, there are some dyn_cast[_or_null]s in our code, but, possibly, they can be replaced with just casts, because we usually return nullptr or a pointer of same type as the imported one.
I agree. Immediately after an import I think it is perfectly legitimate to assume that the imported node has the very same type.

Gabor

Hi Aleksei,

I’ve created the patches for the two major issues mentioned in the previous mail. They are not small, but I think they are within a reasonable size (less than 1000 lines of change).
Sorry about the storm of patches lately, but only now we had the time window to opensource the changes from our fork.
We still have a few changes, but they are rather smaller ones, Balazs will probably create a few more patches next week (you can follow our plan for the upstreaming here: https://github.com/Ericsson/clang/projects/2 )
I won’t be available for the next two weeks but I’ll try to check emails occasionally and will try to answer your questions and concerns.

Cheers,
Gabor