C++ Language Support Library

Hi,

Regarding the C++ language support library, there are a few issues that I'd like to raise. Please note that this is only about the language support part of the C++ standard library.

1) Which support library do we use?
There are basically two options here. We can write our own, or we can use libsupc++ from GCC.

Using our own:
+ No licensing issues.
+ Allows us to do what we need to do (within the limits of the common C++ ABI)
- We have to write it.
- We have to maintain it.
- Interoperability problems with GCC and LLVM-GCC.

Using libsupc++:
+ Perfect interoperability with GCC and LLVM-GCC.
+ No need to write our own.
- May not fit our needs.
- Is GPL-licensed.

This decision isn't very pressing, since we only need to make it when we seriously start implementing C++ CodeGen, and we're far from that still.

2) How do we access its types?
The types from the support library are accessed within Sema - play a crucial role in some situations, even. There needs to be a nice interface to declare and access these types. The C support types - ptrdiff_t, size_t, etc. - are simply present in the ASTContext, always. With the C++ types, this may be considered a waste.

Here's a summary of which parts of the language use which types (may be incomplete):

a) typeid: Uses std::type_info as the return value, may throw std::bad_typeid. The exception type isn't necessarily accessed within the Sema; type_info most definitely is.
b) dynamic_cast: May throw bad_cast. Not necessarily accessed within the Sema.
c) new: Uses operator new and operator delete. May throw std::bad_alloc. None of these is necessarily used in Sema, but they will be needed during CodeGen.
d) Exceptions: exceptions use std::bad_exception, std::terminate, std::unexpected and more. Accessed by CodeGen, if not Sema.
e) Globals: Destructors for globals and statics are registered with atexit() or __cxa_atexit(). CodeGen would generate these calls.
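
Items (a) and (b) can be observed from ordinary user code. The following is only a minimal sketch: the class names and helper functions are made up for illustration, and it shows nothing more than which support-library types the language constructs pull in.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical polymorphic hierarchy, used only for illustration.
struct Base { virtual ~Base() {} };
struct Derived : Base {};
struct Other : Base {};

// typeid yields an lvalue of type const std::type_info, so the compiler
// must know that type whenever it sees a typeid expression.
inline const std::type_info& dynamicTypeOf(const Base& b) { return typeid(b); }

// A failed dynamic_cast to a reference type throws std::bad_cast.
inline bool castToDerivedFails(Base& b) {
    try {
        (void)dynamic_cast<Derived&>(b);
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}

// typeid applied to a dereferenced null pointer of polymorphic type
// throws std::bad_typeid.
inline bool typeidThrowsFor(const Base* p) {
    try {
        const std::type_info& ti = typeid(*p);
        (void)ti;
        return false;
    } catch (const std::bad_typeid&) {
        return true;
    }
}
```

Note that only std::type_info appears in the type of an expression here; the exception types show up only in the generated code, which matches the Sema/CodeGen split described in the list.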

C++0x adds some more:
f) Initializer lists: use std::initializer_list, which uses the ObjectType concept. Required in Sema for overload resolution.
g) Null pointers: use std::nullptr_t. This special type is considered fundamental and even participates in implicit conversions.
h) Range loop: uses the std::Range concept.

Concepts in general are very pervasive throughout C++0x.

The question of access is pressing. My typeid implementation doesn't work correctly without it.

Sebastian

2) How do we access its types?
The types from the support library are accessed within Sema - play a
crucial role in some situations, even. There needs to be a nice
interface to declare and access these types. The C support types -
ptrdiff_t, size_t, etc. - are simply present in the ASTContext, always.

Not true; we actually compute them as needed. Computing those types
isn't very expensive, though.

With the C++ types, this may be considered a waste.

A waste in what sense? We need to define them somewhere, and the
ASTContext seems as good a place as any. It's not as if putting the
methods on the ASTContext restricts the implementation in any
significant way. And if computing the types has a significant cost,
we can cache them.
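
To make the "compute as needed, cache if expensive" idea concrete, here is a rough sketch. The Type and Context classes below are hypothetical stand-ins, not Clang's real ASTContext or its API; the point is only that an accessor can build the type on first use and hand back the cached object afterwards.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a type node; not a real Clang class.
struct Type {
    std::string name;
};

// Hypothetical context object that owns lazily created support types.
class Context {
    Type* typeInfoType_;   // null until the type is first requested
    int computations_;     // how many times the type was actually built

    Context(const Context&);             // non-copyable (declared, not defined)
    Context& operator=(const Context&);

public:
    Context() : typeInfoType_(0), computations_(0) {}
    ~Context() { delete typeInfoType_; }

    // Builds the type on first use and returns the cached object afterwards.
    const Type& getTypeInfoType() {
        if (!typeInfoType_) {
            typeInfoType_ = new Type();
            typeInfoType_->name = "std::type_info";
            ++computations_;
        }
        return *typeInfoType_;
    }

    int computations() const { return computations_; }
};
```

With this shape nothing is computed for translation units that never use typeid, which addresses the "waste" concern without changing where the accessor lives.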

Here's a summary of which parts of the language use which types (may be
incomplete):

a) typeid: Uses std::type_info as the return value, may throw
std::bad_typeid. The exception type isn't necessarily accessed within
the Sema; type_info most definitely is.
b) dynamic_cast: May throw bad_cast. Not necessarily accessed within the
Sema.
c) new: Uses operator new and operator delete. May throw std::bad_alloc.
None of these is necessarily used in Sema, but they will be needed
during CodeGen.
d) Exceptions: exceptions use std::bad_exception, std::terminate,
std::unexpected and more. Accessed by CodeGen, if not Sema.
e) Globals: Destructors for globals and statics are registered with
atexit() or __cxa_atexit(). CodeGen would generate these calls.

C++0x adds some more:
f) Initializer lists: use std::initializer_list, which uses the
ObjectType concept. Required in Sema for overload resolution.
g) Null pointers: use std::nullptr_t. This special type is considered
fundamental and even participates in implicit conversions.
h) Range loop: uses the std::Range concept.

All right, so ignoring the concept stuff, all we need is type_info,
nullptr_t, and maybe a few of the exception types depending on the
implementation. I'd say just add them to the ASTContext.

-Eli

Eli Friedman wrote:

  

2) How do we access its types?
The types from the support library are accessed within Sema - play a
crucial role in some situations, even. There needs to be a nice
interface to declare and access these types. The C support types -
ptrdiff_t, size_t, etc. - are simply present in the ASTContext, always.
    
Not true; we actually compute them as needed. Computing those types
isn't very expensive, though.
  

Ah, OK.

  

With the C++ types, this may be considered a waste.
    
A waste in what sense? We need to define them somewhere, and the
ASTContext seems as good a place as any. It's not as if putting the
methods on the ASTContext restricts the implementation in any
significant way. And if computing the types has a significant cost,
we can cache them.
  

I meant pre-computing them, before they are first used, would be a waste.

All right, so ignoring the concept stuff, all we need is type_info,
nullptr_t, and maybe a few of the exception types depending on the
implementation. I'd say just add them to the ASTContext.
  

I need to do that, and I also need to add them to the lookup table of Sema.

Sebastian

Eli Friedman wrote:

All right, so ignoring the concept stuff, all we need is type_info,
nullptr_t, and maybe a few of the exception types depending on the
implementation. I'd say just add them to the ASTContext.
  
Sema also needs to track whether std::type_info has been defined or not.
We could always add a 'std' namespace and a 'class type_info' forward declaration and check whether the 'type_info' RecordDecl is defined, but it seems a bit wasteful to always declare them.

-Argiris

How do the headers define type_info? If it isn't something like
"typedef __builtin_typeinfo type_info;", it's going to be a pain to
deal with...

-Eli

Eli Friedman wrote:

Some implementations also require that it be marked with a special
#pragma, then they check that the contents of the class are what is
expected based on the standard. Frankly, I think this is the easiest
way to go, because it's going to be a pain to put all of the
class-defining logic into Sema by hand.

  - Doug

Oh, I didn't realize that per [expr.typeid], "If the header <typeinfo>
(18.6.1) is not included prior to a use of typeid, the program is
ill-formed."

Is there something wrong with having Sema just look up the identifier
"std::type_info" when it's processing a typeid?

-Eli

Eli Friedman wrote:

Is there something wrong with having Sema just look up the identifier
"std::type_info" when it's processing a typeid?
  
Sounds good! We can also check whether the SourceLocation of the declaration we find is in a system header, and emit an error if it is not, for something like this (which gcc happily accepts):

namespace std { class type_info {}; }
void f() { typeid(f); }
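
The checks discussed so far (is std::type_info declared, is it defined, does it come from a system header) could be sketched roughly as follows. Everything here is a simplified assumption for illustration: ClassDecl, Scope and checkTypeid are hypothetical and do not correspond to Clang's actual Sema or lookup interfaces.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, greatly simplified declaration record; not Clang's RecordDecl.
struct ClassDecl {
    bool isDefined;        // false for a mere forward declaration
    bool inSystemHeader;   // whether the declaration was seen in a system header
};

// Possible outcomes of checking a typeid expression.
enum TypeidCheck { Ok, NotDeclared, NotDefined, NotFromSystemHeader };

// Hypothetical lookup table: qualified name -> declaration.
typedef std::map<std::string, ClassDecl> Scope;

// Looks up std::type_info at the point where a typeid expression is processed.
inline TypeidCheck checkTypeid(const Scope& scope) {
    Scope::const_iterator it = scope.find("std::type_info");
    if (it == scope.end()) return NotDeclared;              // <typeinfo> not included
    if (!it->second.isDefined) return NotDefined;           // only forward-declared
    if (!it->second.inSystemHeader) return NotFromSystemHeader;  // user-written class
    return Ok;
}
```

Under these assumptions, the two-line example above would be rejected with NotFromSystemHeader.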

-Argiris

Hi Sebastian,

Hi,

Regarding the C++ language support library, there are a few issues that
I'd like to raise. Please note that this is only about the language
support part of the C++ standard library.

1) Which support library do we use?
There are basically two options here. We can write our own, or we can
use libsupc++ from GCC.

The licensing issues dominate, so we'll have to write our own. The
good news is that the primary entry points to libsupc++ are mainly
those functions required by the Itanium C++ ABI that GCC implements,
which is specified here:

  http://www.codesourcery.com/public/cxx-abi/abi.html

If we follow that, and allow for some tweaking, we should be able to
maintain compatibility with GCC.

  - Doug

Are:

type_info::__is_pointer_p
type_info::__is_function_p
type_info::__do_catch
type_info::__do_upcast

exposed entry points? These are public virtual functions of the gcc type_info, and are not part of the Itanium C++ ABI.

-Howard

None of these functions is actually used by the GNU front end, and the
ChangeLogs seem to indicate that the compiler stopped using these
names in 2000. They are all declared protected in libsupc++ and are
used internally.

That said, I don't have enough of a grasp of the C++ ABI to say
whether or not any of that matters :)

  - Doug

If these functions truly are called only from within libsupc++ itself, then we only need to replicate the data members of the GCC RTTI structures, not its virtual table layout. This is because the virtual tables of type_info and its derived types are part of the support library.

The most interesting part of the interoperability is exceptions. We can claim to throw "GNUCC++\0" exceptions, and GCC code will be able to catch them. But then we really have to match GCC exactly in all relevant aspects (object layout, mostly).
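
For reference, the exception class is a 64-bit value stored in the unwind header. The sketch below shows how the eight characters "GNUCC++\0" map onto that value, assuming the convention of packing the first character into the most significant byte; the function name makeExceptionClass is made up for this example.

```cpp
#include <cassert>
#include <stdint.h>

// Packs eight characters into a 64-bit value, with the first character
// ending up in the most significant byte. The array has nine elements
// because a string literal carries an implicit terminating '\0'.
inline uint64_t makeExceptionClass(const char (&tag)[9]) {
    uint64_t value = 0;
    for (int i = 0; i < 8; ++i)
        value = (value << 8) | static_cast<unsigned char>(tag[i]);
    return value;
}
```

A personality routine compares this value to decide whether an in-flight exception is one of its own, which is why claiming GCC's class commits us to GCC's exception object layout.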

If we want to be compatible on the object file level (i.e. we want to link GCC and Clang object files together), we need to exactly duplicate GCC's exception tables. Then, even our personality function will have to claim that it is GCC's (i.e. be called __gxx_personality_v0).

Sebastian

I agree that this is a needed feature.

I have a <stdexcept> lying around that will do this. It does not use std::string internally, so we are free to break ABI with gcc's std::string (e.g. do a short string optimization). However, it does match the layout of gcc's exceptions, which do use std::string internally. To do so, it recreates a very limited ref-counted const std::string: layout compatible with gcc's std::string, non-templated (char-only), with only a c_str() member (plus the special members), not publicly exposed, and 55 lines of code in all. All each exception class holds is a single void*. E.g.:

class logic_error
     : public exception
{
private:
     void* __imp_;
public:
     explicit logic_error(const string&);
     explicit logic_error(const char*);

     logic_error(const logic_error&) throw();
     logic_error& operator=(const logic_error&) throw();

     virtual ~logic_error() throw();

     virtual const char* what() const throw();
};

Fortunately sizeof(gcc std::string) == sizeof(void*).
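
To illustrate the general idea of a pointer-sized, reference-counted, immutable string, here is a rough sketch. It is not the 55-line implementation mentioned above and makes no claim of layout compatibility with gcc's std::string; the RefString name and its header layout are assumptions for the example, and the reference count is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical pointer-sized, reference-counted, immutable C string.
class RefString {
    struct Header { std::size_t refs; };  // stored immediately before the characters
    char* data_;                          // points at the characters, not at the header

    Header* header() const { return reinterpret_cast<Header*>(data_) - 1; }

    // Drops one reference and frees the block when the last owner goes away.
    void release() {
        if (--header()->refs == 0) ::operator delete(header());
    }

public:
    explicit RefString(const char* s) {
        std::size_t len = std::strlen(s);
        void* raw = ::operator new(sizeof(Header) + len + 1);
        Header* h = new (raw) Header;
        h->refs = 1;
        data_ = reinterpret_cast<char*>(h + 1);
        std::memcpy(data_, s, len + 1);   // copies the terminating '\0' as well
    }

    // Copies share the buffer and bump the count.
    RefString(const RefString& other) : data_(other.data_) { ++header()->refs; }

    RefString& operator=(const RefString& other) {
        ++other.header()->refs;           // increment first so self-assignment is safe
        release();
        data_ = other.data_;
        return *this;
    }

    ~RefString() { release(); }

    const char* c_str() const { return data_; }
    std::size_t use_count() const { return header()->refs; }
};
```

Because the object is just one pointer, an exception class can store it behind a void* member without exposing the string type in its interface.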

-Howard

Please no, I like to produce -E files, then strip them down by removing all # lines...

What are the potential license issues of using libstdc++? While we could require a different implementation, it would really be nice for clang to also/optionally support libstdc++. Is there some strange wording in the library license?

-Chris

No, libsupc++ is under the same license as libstdc++, which is GPL with an
exception for software compiled with the library:

// As a special exception, you may use this file as part of a free software
// library without restriction. Specifically, if other files instantiate
// templates or use macros or inline functions from this file, or you compile
// this file and link it with other files to produce an executable, this
// file does not by itself cause the resulting executable to be covered by
// the GNU General Public License. This exception does not however
// invalidate any other reasons why the executable file might be covered by
// the GNU General Public License.

I don't think anything prevents us from supporting libstdc++, but I'm
guessing we don't want to use it exclusively.

  - Doug

Ok. libgcc (as of GPL2 days, the GPL3 wording *still* isn't finished yet) has strange wording that basically says "the code is GPL unless compiled by GCC". For a random linux distro, for example, it would be fine to compile libstdc++ with GCC, and clang should be able to work with it.

I think that retaining compatibility with libstdc++ would be a very worthwhile goal (as is supporting the apache library, stlport, and/or whatever other ones exist, where reasonable). What is the cost of doing this? Does it use crazy GCC extensions that we don't want to implement?

-Chris

Chris Lattner wrote:

I think that retaining compatibility with libstdc++ would be a very worthwhile goal (as is supporting the apache library, stlport, and/or whatever other ones exist, where reasonable). What is the cost of doing this? Does it use crazy GCC extensions that we don't want to implement?

I think it's not only worthwhile, it's absolutely vital. If we don't support compiling with libstdc++ (and compile it exactly the way GCC does), we don't have binary compatibility for shared objects, where standard library types appear in interfaces. This would be bad for us, and bad for C++.

libstdc++ doesn't use any special GCC extensions that we don't already have, except in the TR1 and C++0x area. But those intrinsics we have to implement anyway.
The main problem is that we're playing catch-up, and libstdc++ is a moving target. As GCC's support for C++0x grows, so does libstdc++'s usage of those features. We'll probably have to be on par with GCC in features to compile libstdc++.

Sebastian

What are the potential license issues of using libstdc++? While we could
require a different implementation, it would really be nice for clang to
also/optionally support libstdc++. Is there some strange wording in the
library license?

No, libsupc++ is the same license as libstdc++, which is GPL with an
exception for software compiled with the library:

// As a special exception, you may use this file as part of a free software
// library without restriction. Specifically, if other files instantiate
// templates or use macros or inline functions from this file, or you compile
// this file and link it with other files to produce an executable, this
// file does not by itself cause the resulting executable to be covered by
// the GNU General Public License. This exception does not however
// invalidate any other reasons why the executable file might be covered by
// the GNU General Public License.

I don't think anything prevents us from supporting libstdc++, but I'm
guessing we don't want to use it exclusively.

Ok. libgcc (as of GPL2 days, the GPL3 wording *still* isn't finished yet)
has strange wording that basically says "the code is GPL unless compiled by
GCC".

libstdc++ doesn't have this kind of wording, thankfully. The Intel
compiler on Linux compiles with libstdc++, and of course it's
proprietary.

For a random linux distro, for example, it would be fine to compile
libstdc++ with GCC, and clang should be able to work with it.

I think that retaining compatibility with libstdc++ would be a very
worthwhile goal (as is supporting the apache library, stlport, and/or
whatever other ones exist, where reasonable). What is the cost of doing
this? Does it use crazy GCC extensions that we don't want to implement?

libstdc++ aims to be pretty standard-conforming, and they avoid most
crazy GCC extensions. As Sebastian noted, GCC 4.3 and newer use some
C++0x features in their TR1 implementation. The biggest feature there
is variadic templates, which---while not trivial---isn't terribly hard
to implement if the template system itself is designed well.
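
For readers who have not seen the feature, this is roughly what variadic templates look like in user code. It is a small illustrative sketch with made-up names (TypeCount, sum), not code taken from libstdc++.

```cpp
#include <cassert>
#include <cstddef>

// Counts the number of types in a template parameter pack.
template <typename... Ts>
struct TypeCount {
    static const std::size_t value = sizeof...(Ts);
};

// Base case: the sum of no arguments is zero.
inline long sum() { return 0; }

// Recursive case: peel off the first argument and expand the rest of the pack.
template <typename T, typename... Rest>
long sum(T first, Rest... rest) {
    return static_cast<long>(first) + sum(rest...);
}
```

The compiler has to support parameter packs and pack expansion in both type and expression contexts, which is where most of the implementation work lies.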

  - Doug