r72619

Hi Duncan,

There's a problem with your check-in for r72619 is causing "weak external" symbols to appear in C++ code when it shouldn't. Take this code for example,

#include <stdexcept>

void dummysymbol() {
   throw(std::runtime_error("string"));
}

The c'tor for std::string is emitted as code from llvm-gcc. It is then inlined. And a weak external for the `_S_construct' variable is created. However, C++ rules stipulate that it shouldn't be, because of explicit & implicit template instantiation rules that Doug knows about but which makes my head spin.

I'm going to need to revert at least some of these changes. Could you take a look at this and let me know your thoughts on it?

-bw

Hi Bill,

There's a problem with your check-in for r72619 is causing "weak external" symbols to appear in C++ code when it shouldn't. Take this code for example,

#include <stdexcept>

void dummysymbol() {
  throw(std::runtime_error("string"));
}

The c'tor for std::string is emitted as code from llvm-gcc. It is then inlined. And a weak external for the `_S_construct' variable is created. However, C++ rules stipulate that it shouldn't be, because of explicit & implicit template instantiation rules that Doug knows about but which makes my head spin.

I can't reproduce this on x86-64 linux (but maybe I misunderstood what the
problem is?). I've attached the bitcode I get at -O4.

I'm going to need to revert at least some of these changes. Could you take a look at this and let me know your thoughts on it?

Can you please send me the bitcode you get on darwin.

Ciao,

Duncan.

bill.ll (17.4 KB)

Here's what I get with TOT compiling with -Os. The orig.ll is what I get before r72619. Notice that orig.ll has only one function in it. Both the one you sent and duncan.ll have more than one function. It's not the fact that more than one function is showing up, but these functions in particular shouldn't be there because of the implicit/explicit template instantiation stuff.

-bw

duncan.ll (14.4 KB)

orig.ll (4.66 KB)

Hi Bill,

Here's what I get with TOT compiling with -Os. The orig.ll is what I get before r72619. Notice that orig.ll has only one function in it. Both the one you sent and duncan.ll have more than one function. It's not the fact that more than one function is showing up, but these functions in particular shouldn't be there because of the implicit/explicit template instantiation stuff.

there are several functions in the example you sent, some linkonce and some
available_externally. Which ones shouldn't be there and why? Can you please
give more details about why it is a problem - it was kind of hand-wavy so
far :slight_smile:

Thanks,

Duncan.

Only "_Z11dummysymbolv" should be there. Here's Doug's explanation of why this should be so:

Here's what it *looks* like is happening, and where the FE is probably getting it wrong. First of all, the constructor in question is defined outside of the basic_string class template as a non-inline definition:

   template<typename _CharT, typename _Traits, typename _Alloc>
     basic_string<_CharT, _Traits, _Alloc>::
     basic_string(const _CharT* __s, const _Alloc& __a)

Second, there is an explicit template instantiation declaration:

     extern template class basic_string<char>;

That extern template instantiation declaration is supposed to suppress the implicit instantiation of non-inline member functions like that basic_string constructor. I had tripped over something similar to this previously, where the SL llvm-gcc was suppressing instantiation of all member functions of basic_string<char> (including inline ones, which would be a performance problem). So, there was clearly a change in this area.

Here's my guess: We're not properly suppressing the implicit instantiation of non-inline member functions defined out-of-line. Thus, we're instantiating that basic_string constructor when we shouldn't be. That instantiation then forces the implicit instantiation of _S_construct<const char*>. Since _S_construct is a member template, it's instantiation is *not* suppressed (despite being in basic_string<char>), so we emit it as a weak definition.

I don't have a debug llvm-gcc available to see why this might be happening. The logic to suppress instantiation based on an extern template is in instantiate_decl (gcc/cp/pt.c):

   /* Check to see whether we know that this template will be
      instantiated in some other file, as with "extern template"
      extension. */
   external_p = (DECL_INTERFACE_KNOWN (d) && DECL_REALLY_EXTERN (d));
   /* In general, we do not instantiate such templates... */
   if (external_p
       /* ... but we instantiate inline functions so that we can inline
          them and ... */
       && ! (TREE_CODE (d) == FUNCTION_DECL && DECL_INLINE (d))
       /* ... we instantiate static data members whose values are
          needed in integral constant expressions. */
       && ! (TREE_CODE (d) == VAR_DECL
             && DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (d)))
     goto out;

For the basic_string constructor, if we don't take that "goto", something is wrong. If we do take that "goto", my guess is wrong. I don't have a recent debug llvm-gcc to validate my guess.

Addendum: We don't take the "goto" in the "original" case and in ToT because of r51655. However, r72619 is what's triggering ToT to generate the bad instantiations that Doug talks about.

-bw

Only "_Z11dummysymbolv" should be there. Here's Doug's explanation of
why this should be so:

Here's what it *looks* like is happening, and where the FE is probably
getting it wrong. First of all, the constructor in question is defined
outside of the basic_string class template as a non-inline definition:

  template<typename _CharT, typename _Traits, typename _Alloc>
    basic_string<_CharT, _Traits, _Alloc>::
    basic_string(const _CharT* __s, const _Alloc& __a)

Second, there is an explicit template instantiation declaration:

    extern template class basic_string<char>;

That extern template instantiation declaration is supposed to suppress
the implicit instantiation of non-inline member functions like that
basic_string constructor. I had tripped over something similar to this
previously, where the SL llvm-gcc was suppressing instantiation of all
member functions of basic_string<char> (including inline ones, which
would be a performance problem). So, there was clearly a change in
this area.

Here's my guess: We're not properly suppressing the implicit
instantiation of non-inline member functions defined out-of-line.
Thus, we're instantiating that basic_string constructor when we
shouldn't be. That instantiation then forces the implicit
instantiation of _S_construct<const char*>. Since _S_construct is a
member template, it's instantiation is *not* suppressed (despite being
in basic_string<char>), so we emit it as a weak definition.

I don't have a debug llvm-gcc available to see why this might be
happening. The logic to suppress instantiation based on an extern
template is in instantiate_decl (gcc/cp/pt.c):

  /* Check to see whether we know that this template will be
     instantiated in some other file, as with "extern template"
     extension. */
  external_p = (DECL_INTERFACE_KNOWN (d) && DECL_REALLY_EXTERN (d));
  /* In general, we do not instantiate such templates... */
  if (external_p
      /* ... but we instantiate inline functions so that we can inline
         them and ... */
      && ! (TREE_CODE (d) == FUNCTION_DECL && DECL_INLINE (d))
      /* ... we instantiate static data members whose values are
         needed in integral constant expressions. */
      && ! (TREE_CODE (d) == VAR_DECL
            && DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (d)))
    goto out;

For the basic_string constructor, if we don't take that "goto",
something is wrong. If we do take that "goto", my guess is wrong. I
don't have a recent debug llvm-gcc to validate my guess.

So, on top of this it seems like a lot of the semantics have changed after your patch. I'm certain the existing patch is wrong and that we'll want a computation somewhat similar to the clang one that I think Doug is going to post.

I think the safe thing is to revert for now and we can discuss all of the semantics and what's going on in a more leisurely fashion and let poor Bill get his project building :slight_smile:

-eric

Imagine me standing in the snow, wearing a threadbare coat, selling matchsticks in front of city hall, my face dirty, my shoes have holes, a taxi cab drives by and splashes me, a solitary tear leaves a trail down my face.

Cue violins.

-bw

:slight_smile:

Clang has a specific function to determine whether an inline function should be externally visible (so that it will have strong linkage) or not (meaning that it should have available_externally linkage). It's FunctionDecl::isInlineDefinitionExternallyVisible() in tools/clang/lib/AST/Decl.cpp. That covers the C case and GNU inline semantics in C++.

For C++, it's easier to distinguish: inline functions have external definitions unless they have an explicit template instantiation declaration (C++0x parlance for "extern template"); see GetLinkageForFunction in tools/clang/lib/CodeGen/CodeGenModule.cpp.

llvm-gcc will have to duplicate that logic to use available_externally properly.

  - Doug

Cue violins.

Feeling angelic, I decided to try tweaking the order in which the
different linkage types are considered in StartFunctionBody, so
that DECL_EXTERNAL (->available_externally) is considered before
DECL_COMDAT (->linkonce). But it didn't help :frowning:

Ciao,

Duncan.

In case anyone cares, dragonegg gets this right. This shows that
(as suspected) the problem is in the gcc parts of llvm-gcc rather
than in the gimple -> IR conversion itself, since dragonegg has
the same conversion logic as llvm-gcc.

Ciao,

Duncan.

That's not entirely surprising... :slight_smile:

-bw

Looking for a raise, eh? There's certainly fictional precedent, and perhaps real as well: