Further AST spelunking

Hi again,

I'm still plodding away at my SWIG alternative. As I said in my previous email, typedefs that are declared within templates but which do not depend on any of the template arguments are held in the AST as children of a CXXRecordDecl ignorant of the fact that it should be a template and what its arguments should be. This is a problem for me because it means that I can't print a properly scoped name to identify the type for C++ code. I have managed to work around this by adding a check for whether a typedef is a child of a CXXRecordDecl but should really be a child of a template. The code I'm using appears to work, but I'd like to double-check that it is appropriate:

bool shouldBeTemplate(clang::DeclContext const* context) {
   if(context->getDeclKind() == clang::Decl::CXXRecord) {
     auto record = llvm::cast<clang::CXXRecordDecl>(context);
     auto dname = record->getDeclName();
     // there will always be a parent
     auto cursor = context->getParent()->lookup(dname);
     for(auto named : cursor) {
       if(llvm::isa<clang::ClassTemplateDecl>(named)) {
         return true;
       }
     }
   }
   return false;
}

Another issue is that, while it is safe to handle values that are pointers to incomplete structure types, it is illegal to call a function that takes an argument value or returns a result value that is an incomplete type. My first attempt was to check for RecordTypes and call CXXRecordDecl::getDefinition(), but for some (but confusingly to me, not all) template types getDefinition() returns NULL even though the type isn't incomplete. Is this expected? The code that I'm using is below; I also wonder whether getCanonicalDecl() is overkill here:

bool isIncompleteType(clang::QualType inType) {
   if(inType->isRecordType()) {
     auto realType = inType->getAs<clang::RecordType>();
     auto decl = realType->getDecl()->getCanonicalDecl();
     if(decl->getDefinition() ||
        llvm::isa<clang::ClassTemplateSpecializationDecl>(decl)) {
       return false;
     }
     else {
       return true;
     }
   }
   return false;
}

I've also discovered that parsing code that calls a builtin function causes a no-argument, returns-int declaration to be inserted. It's been a while, but as I remember, in C this kind of declaration actually means that the function takes an unspecified number of arguments, but each one passed should be promoted to the size of an int (did they update this since pointers became much larger than ints?). In C++ it means something rather different. It seems a bit odd to me that builtin functions don't have the correct declaration inserted, since the compiler must have them on hand somewhere.

Finally, I need to generate the list of base classes that a derived class can be implicitly converted to. I have an intuitive understanding of when it should work, but when I try to think of an algorithm to sort out arbitrarily-evil trees combining virtual and normal inheritance I go a bit cross-eyed. Can anyone point me at something that will help me figure this out?

Hi again,

I'm still plodding away at my SWIG alternative. As I said in my previous
email, typedefs that are declared within templates but which do not depend
on any of the template arguments are held in the AST as children of a
CXXRecordDecl ignorant of the fact that it should be a template and what
its arguments should be. This is a problem for me because it means that I
can't print a properly scoped name to identify the type for C++ code. I
have managed to work around this by adding a check for whether a typedef is
a child of a CXXRecordDecl but should really be a child of a template. The
code I'm using appears to work, but I'd like to double-check that it is
appropriate:

bool shouldBeTemplate(clang::DeclContext const* context) {
  if(context->getDeclKind() == clang::Decl::CXXRecord) {
    auto record = llvm::cast<clang::CXXRecordDecl>(context);
    auto dname = record->getDeclName();
    // there will always be a parent
    auto cursor = context->getParent()->lookup(dname);
    for(auto named : cursor) {
      if(llvm::isa<clang::ClassTemplateDecl>(named)) {
        return true;
      }
    }
  }
  return false;
}

I think what you are trying to compute is equivalent to
context->isDependentContext(), which returns true if this context is nested
inside any template, class or function.

Another issue is that, while it is safe to handle values that are pointers
to incomplete structure types, it is illegal to call a function that takes
an argument value or returns a result value that is an incomplete type. My
first attempt was to check for RecordTypes and call
CXXRecordDecl::getDefinition(), but for some (but confusingly to me, not
all) template types getDefinition() returns NULL even though the type isn't
incomplete. Is this expected? The code that I'm using is below; I also
wonder whether getCanonicalDecl() is overkill here:

bool isIncompleteType(clang::QualType inType) {
  if(inType->isRecordType()) {
    auto realType = inType->getAs<clang::RecordType>();
    auto decl = realType->getDecl()->getCanonicalDecl();
    if(decl->getDefinition() ||
       llvm::isa<clang::ClassTemplateSpecializationDecl>(decl)) {
      return false;
    }
    else {
      return true;
    }
  }
  return false;
}

I think you can compute this more directly with inType->isIncompleteType().

I've also discovered that parsing code that calls a builtin function causes
a no-argument, returns-int declaration to be inserted. It's been a while,
but as I remember, in C this kind of declaration actually means that the
function takes an unspecified number of arguments, but each one passed
should be promoted to the size of an int (did they update this since
pointers became much larger than ints?). In C++ it means something rather
different. It seems a bit odd to me that builtin functions don't have the
correct declaration inserted, since the compiler must have them on hand
somewhere.

Yes, in C, this is a no-prototype, implicit int return function. I'm not
sure what kind of builtin function you're referring to. If the name starts
with __builtin_, then the compiler knows the prototype. If it's a libc
function like "fprintf()", then you will probably get a warning and the
implicit declaration you describe. In C++, you shouldn't get these implicit
declarations, it's just an error.

Finally, I need to generate the list of base classes that a derived class
can be implicitly converted to. I have an intuitive understanding of when
it should work, but when I try to think of an algorithm to sort out
arbitrarily-evil trees combining virtual and normal inheritance I go a bit
cross-eyed. Can anyone point me at something that will help me figure this
out?

The inheritance hierarchy is always a directed acyclic graph, so you can
walk it recursively as long as you remember what you've already seen. If
you have a class with multiple base subobjects of the same type and you
don't care about which one an implicit conversion would pick, then you can
do depth-first search and throw everything into a set:
  // Result is the set BasesSeen.
  void doit(BasesSeen, RD)
    if (RD in BasesSeen)
      return
    BasesSeen.insert(RD)
    for (Base in RD.bases())
      doit(BasesSeen, Base)
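
The pseudocode above can be sketched as compilable C++. This is only an illustration under stated assumptions: the Hierarchy map, collectBases, allBases and diamond are hypothetical names standing in for the real AST walk, and class names are plain strings rather than CXXRecordDecl pointers. It collects every direct and indirect base exactly once and does not try to detect ambiguous conversions caused by repeated non-virtual bases.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the AST: each class name maps to the names of
// its direct base classes.
using Hierarchy = std::map<std::string, std::vector<std::string>>;

// Depth-first walk. After the call, `seen` holds `cls` and every class
// reachable from it through base-class edges.
void collectBases(const Hierarchy &h, const std::string &cls,
                  std::set<std::string> &seen) {
  if (!seen.insert(cls).second)
    return;  // already visited, e.g. a shared base in a diamond
  auto it = h.find(cls);
  if (it == h.end())
    return;  // class not described in the map: treat as having no bases
  for (const std::string &base : it->second)
    collectBases(h, base, seen);
}

// Convenience wrapper: all direct and indirect bases of `cls`,
// excluding `cls` itself.
std::set<std::string> allBases(const Hierarchy &h, const std::string &cls) {
  std::set<std::string> seen;
  collectBases(h, cls, seen);
  seen.erase(cls);
  return seen;
}

// A small diamond-shaped example hierarchy: Bottom derives from Left and
// Right, which both derive from Top.
Hierarchy diamond() {
  Hierarchy h;
  h["Top"] = std::vector<std::string>();
  h["Left"] = std::vector<std::string>{"Top"};
  h["Right"] = std::vector<std::string>{"Top"};
  h["Bottom"] = std::vector<std::string>{"Left", "Right"};
  return h;
}
```

Calling allBases(diamond(), "Bottom") visits Top only once even though it is reachable along two paths, which is the point of remembering what has already been seen.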

Hope that helps!

Yes, this works, thanks.

Thanks for the suggestion, I hadn't seen that either. Unfortunately it
doesn't work, for the same cases as getDefinition(), these are (from my
test data):

  std::fpos<__mbstate_t >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<char const * , std::basic_string<char , std::char_traits , std::allocator > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<char* , std::basic_string<char , std::char_traits , std::allocator > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<wchar_t const * , std::basic_string<wchar_t , std::char_traits<wchar_t > , std::allocator<wchar_t > > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<wchar_t* , std::basic_string<wchar_t , std::char_traits<wchar_t > , std::allocator<wchar_t > > > >
  __gnu_cxx::__normal_iterator<char16_t* , std::basic_string<char16_t , std::char_traits<char16_t > , std::allocator<char16_t > > >
  __gnu_cxx::__normal_iterator<char16_t const * , std::basic_string<char16_t , std::char_traits<char16_t > , std::allocator<char16_t > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<char16_t const * , std::basic_string<char16_t , std::char_traits<char16_t > , std::allocator<char16_t > > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<char16_t* , std::basic_string<char16_t , std::char_traits<char16_t > , std::allocator<char16_t > > > >
  std::initializer_list<char16_t >
  __gnu_cxx::__normal_iterator<char32_t* , std::basic_string<char32_t , std::char_traits<char32_t > , std::allocator<char32_t > > >
  __gnu_cxx::__normal_iterator<char32_t const * , std::basic_string<char32_t , std::char_traits<char32_t > , std::allocator<char32_t > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<char32_t const * , std::basic_string<char32_t , std::char_traits<char32_t > , std::allocator<char32_t > > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<char32_t* , std::basic_string<char32_t , std::char_traits<char32_t > , std::allocator<char32_t > > > >
  std::initializer_list<char32_t >
  std::istreambuf_iterator<char , std::char_traits >
  std::ostreambuf_iterator<char , std::char_traits >
  std::istreambuf_iterator<wchar_t , std::char_traits<wchar_t > >
  std::ostreambuf_iterator<wchar_t , std::char_traits<wchar_t > >
  __gnu_cxx::__normal_iterator<char* , std::vector<char , std::allocator > >
  __gnu_cxx::__normal_iterator<char const * , std::vector<char , std::allocator > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<char const * , std::vector<char , std::allocator > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<char* , std::vector<char , std::allocator > > >
  __gnu_cxx::__normal_iterator<double const * , std::vector<double , std::allocator > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<double const * , std::vector<double , std::allocator > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<double* , std::vector<double , std::allocator > > >
  __gnu_cxx::__normal_iterator<unsigned int* , std::vector<unsigned int , std::allocator > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<unsigned int const * , std::vector<unsigned int , std::allocator > > >
  std::reverse_iterator<__gnu_cxx::__normal_iterator<unsigned int* , std::vector<unsigned int , std::allocator > > >
  std::initializer_list
  std::_List_iterator<TagLib::ByteVector >
  std::_List_const_iterator<TagLib::ByteVector >
  std::reverse_iterator<std::_List_const_iterator<TagLib::ByteVector > >
  std::reverse_iterator<std::_List_iterator<TagLib::ByteVector > >
  std::initializer_list<TagLib::ByteVector >
  std::_List_iterator<TagLib::String >
  std::_List_const_iterator<TagLib::String >
  std::reverse_iterator<std::_List_const_iterator<TagLib::String > >
  std::reverse_iterator<std::_List_iterator<TagLib::String > >
  std::initializer_list<TagLib::String >

The functions that I've hit in my test data are:
  __atomic_fetch_add();
  __builtin_isfinite();
  __builtin_isinf();
  __builtin_isnan();
  __builtin_isnormal();
  __builtin_isgreater();
  __builtin_isgreaterequal();
  __builtin_isless();
  __builtin_islessequal();
  __builtin_islessgreater();

For all of the above FunctionDecl::getBuiltinID() returns non-zero.

I'll give this a bash, though I'll need to keep track of virtual and normal
parents separately. I was trying to imagine a bottom-up recursive
algorithm, which wasn't working. Thanks for the suggestion!

I think you can compute this more directly with inType->isIncompleteType().

Thanks for the suggestion, I hadn't seen that either. Unfortunately it
doesn't work, for the same cases as getDefinition(), these are (from my
test data):

  std::fpos<__mbstate_t >

... snip ...

  std::initializer_list<TagLib::String >

I think these are just uninstantiated templates. We don't instantiate
templates when you declare a function that takes a template specialization
by value, for example, this code compiles:
template <typename T> struct MyVec;
void f(MyVec<int> v);

... but if you add a call to f, it will fail because it cannot complete
MyVec<int> by instantiation:
void g(MyVec<int> &x) { f(x); }

For your use case, you probably need to call RequireCompleteType at the
appropriate point. You may need to wait until the end of the TU if there
are some circular dependencies.
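
The behaviour described above (a template specialization stays incomplete until something requires it to be complete) can be observed in plain C++ without touching clang's API. The following is a rough sketch, not how clang implements it: Holder, is_complete, myVecIsComplete and holderIsComplete are hypothetical names introduced for illustration, and sizeof is used only because it is one of the operations that demands a complete type and therefore triggers instantiation.

```cpp
#include <type_traits>

template <typename T> struct MyVec;                // declared, never defined
template <typename T> struct Holder { T value; };  // has a definition

// Declaring functions that take these by value is fine in both cases:
// a declaration alone does not need the parameter types to be complete.
void f(MyVec<int> v);
void h(Holder<int> v);

// Detection idiom: sizeof(T) is only valid for a complete type, so the
// partial specialization is chosen only when T can be completed.
template <typename T, typename = void>
struct is_complete : std::false_type {};

template <typename T>
struct is_complete<T, decltype(void(sizeof(T)))> : std::true_type {};

// MyVec<int> has no definition to instantiate, so it stays incomplete.
constexpr bool myVecIsComplete() { return is_complete<MyVec<int>>::value; }

// Holder<int> is instantiated on demand by the sizeof in the trait.
constexpr bool holderIsComplete() { return is_complete<Holder<int>>::value; }

static_assert(!myVecIsComplete(), "MyVec<int> cannot be completed");
static_assert(holderIsComplete(), "Holder<int> is completed on demand");
```

The analogy to the AST question is that asking whether a type is currently complete and asking the compiler to try to complete it are different operations; the second is what a call to f or h would need.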

I've also discovered that parsing code that calls a builtin function
causes a no-argument, returns-int declaration to be inserted. It's been a
while, but as I remember, in C this kind of declaration actually means that
the function takes an unspecified number of arguments, but each one passed
should be promoted to the size of an int (did they update this since
pointers became much larger than ints?). In C++ it means something rather
different. It seems a bit odd to me that builtin functions don't have the
correct declaration inserted, since the compiler must have them on hand
somewhere.

Yes, in C, this is a no-prototype, implicit int return function. I'm not
sure what kind of builtin function you're referring to. If the name starts
with __builtin_, then the compiler knows the prototype. If it's a libc
function like "fprintf()", then you will probably get a warning and the
implicit declaration you describe. In C++, you shouldn't get these implicit
declarations, it's just an error.

The functions that I've hit in my test data are:
  __atomic_fetch_add();
  __builtin_isfinite();
  __builtin_isinf();
  __builtin_isnan();
  __builtin_isnormal();
  __builtin_isgreater();
  __builtin_isgreaterequal();
  __builtin_isless();
  __builtin_islessequal();
  __builtin_islessgreater();

For all of the above FunctionDecl::getBuiltinID() returns non-zero.

These are actually supposed to be variadic, according to the builtins table:
BUILTIN(__builtin_isunordered , "i.", "nc")

They have custom type checking:
/// SemaBuiltinUnorderedCompare - Handle functions like __builtin_isgreater and
/// friends. This is declared to take (...), so we have to check everything.
bool Sema::SemaBuiltinUnorderedCompare(CallExpr *TheCall) {

How do I get access to Sema, from a FrontendAction/ASTConsumer?

Never mind, I found it: CompilerInstance::getSema()

Hi,

Still working on my SWIG replacement (slowly).

I re-wrote my walker this weekend to improve debugging and I've got a weirdness that I'd like explained:

The [first child of a [CXXRecordDecl that is not a forward declaration]] is (apparently always) a second (empty) CXXRecordDecl that uses the same identifier and source location.

For a file containing the following declarations:

class A {};
class B;

My trace output is:

walk CXXRecord A at /home/peter/Programming/llvm/ninja/../extra-dir/quaff/test.cc:23:1
   walk CXXRecord A::A at /home/peter/Programming/llvm/ninja/../extra-dir/quaff/test.cc:23:1
walk CXXRecord B at /home/peter/Programming/llvm/ninja/../extra-dir/quaff/test.cc:24:1

So, as a forward declaration, B has no children (as expected), but A has this strange child.

What purpose does this serve? Can I rely on it being that way into the future, and insert code that can automatically skip the first child (if there are any children)?

Hi Peter,

This is the injected class name. I had the same question a few years back:
http://clang-developers.42468.n3.nabble.com/Nested-forward-declarations-of-self-td4028967.html

HTH,
- Kim
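
For readers unfamiliar with the term, the injected class name is the language rule that makes a class's own name visible as a member of that class. A minimal sketch of what that rule allows is below; A, D, Box, Self and injectedNamesBehaveAsExpected are hypothetical names used only for this illustration.

```cpp
#include <type_traits>

struct A {
  // Inside A, the name A is found as a member of the class itself:
  // this is the injected class name.
  using Self = A;
};

// The injected class name is inherited like any other member name,
// so D::A names the base class A.
struct D : A {};

template <typename T>
struct Box {
  // Inside a class template, the bare template name refers to the
  // current specialization, here Box<T>.
  using Self = Box;
};

constexpr bool injectedNamesBehaveAsExpected() {
  return std::is_same<A::Self, A>::value &&
         std::is_same<D::A, A>::value &&
         std::is_same<Box<int>::Self, Box<int>>::value;
}

static_assert(injectedNamesBehaveAsExpected(),
              "injected class names resolve to the enclosing class");
```

In the AST this shows up as the extra nested CXXRecordDecl seen in the trace. Rather than assuming it is always the first child, it is probably more robust to test each child with CXXRecordDecl::isInjectedClassName() and skip the ones for which it returns true.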