On the preferred use of C++ in the clang source code

I've heard that clang is meant to be implemented in a "subset of C++" (which I guess means that some core language features are barred from use).

Is there a document anywhere that describes and motivates that subset?

James Widman

It is pretty subjective. We do use almost all C++ features somewhere in the (greater llvm) code base. It's really more about making clear and simple code than it is about banning specific language features. Some coding guidelines are available here:
http://llvm.org/docs/CodingStandards.html

That said, there are two features we don't like: RTTI and EH. This is because they violate the "don't pay for it if you don't use it" principle. If building with GCC, clang disables both RTTI and EH support (-fno-rtti and -fno-exceptions). The main llvm repository has a couple of places that still use RTTI, but we'd like to fix that.
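With RTTI disabled, `dynamic_cast` and `typeid` are unavailable. LLVM code does its type queries through its own `isa<>`, `cast<>` and `dyn_cast<>` templates instead, driven by a static `classof()` hook in each class. The following is a simplified sketch of that pattern, not LLVM's actual implementation. The `Shape`/`Circle`/`Square` classes and the `isa_`/`dyn_cast_` helpers are hypothetical illustrations. The code compiles with `-fno-rtti`.

```cpp
#include <cassert>

// Hypothetical hierarchy using a hand-rolled kind tag instead of C++ RTTI.
class Shape {
public:
  enum ShapeKind { SK_Circle, SK_Square };
  ShapeKind getKind() const { return Kind; }
  virtual ~Shape() = default;

protected:
  explicit Shape(ShapeKind K) : Kind(K) {}

private:
  const ShapeKind Kind;  // set once by the most-derived constructor
};

class Circle : public Shape {
public:
  explicit Circle(double R) : Shape(SK_Circle), Radius(R) {}
  double getRadius() const { return Radius; }
  // "Is this Shape actually a Circle?" answered from the tag alone.
  static bool classof(const Shape *S) { return S->getKind() == SK_Circle; }

private:
  double Radius;
};

class Square : public Shape {
public:
  explicit Square(double S) : Shape(SK_Square), Side(S) {}
  double getSide() const { return Side; }
  static bool classof(const Shape *S) { return S->getKind() == SK_Square; }

private:
  double Side;
};

// Simplified stand-ins for llvm::isa<> and llvm::dyn_cast<>: no typeid and
// no dynamic_cast, just a classof() check followed by a static_cast.
template <typename To, typename From>
bool isa_(const From *Val) {
  return To::classof(Val);
}

template <typename To, typename From>
To *dyn_cast_(From *Val) {
  return isa_<To>(Val) ? static_cast<To *>(Val) : nullptr;
}
```

For example, given `Shape *S = &SomeCircle;`, `dyn_cast_<Circle>(S)` yields a usable `Circle*`, and `dyn_cast_<Square>(S)` yields `nullptr`. The cost is one enum field per object, paid only by the hierarchies that opt in.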

The main problem with RTTI and EH is the impact on binary size. Many clients of LLVM and at least one client of clang use them in a JIT context. Having the binaries be as small as possible makes it easier for people to distribute them with their apps.

-Chris

I've heard that clang is meant to be implemented in a "subset of C++" (which I guess means that some core language features are barred from use).

Is there a document anywhere that describes and motivates that subset?

It is pretty subjective. We do use almost all C++ features somewhere in the (greater llvm) code base. It's really more about making clear and simple code than it is about banning specific language features. Some coding guidelines are available here:
http://llvm.org/docs/CodingStandards.html

Bookmarked; thanks!

Note, the reference to "Effective C++" at the bottom is a little out of date; the third edition was published two years ago and contains some significant changes:

http://tinyurl.com/26gch4

Also the other two links (for the strings "available as well" and "Large-Scale C++ Software Design") appear to be broken.

That said, there are two features we don't like: RTTI and EH. This is because they violate the "don't pay for it if you don't use it" principle. If building with GCC, clang disables both RTTI and EH support (-fno-rtti and -fno-exceptions). The main llvm repository has a couple of places that still use RTTI, but we'd like to fix that.

The main problem with RTTI and EH is the impact on binary size. Many clients of LLVM and at least one client of clang use them in a JIT context. Having the binaries be as small as possible makes it easier for people to distribute them with their apps.

I think that may be the least-flaky objection to EH that I've heard so far. (:

I'll have to meditate on it.

Thanks again!

James Widman

It is pretty subjective. We do use almost all C++ features somewhere in the (greater llvm) code base. It's really more about making clear and simple code than it is about banning specific language features. Some coding guidelines are available here:
http://llvm.org/docs/CodingStandards.html

Bookmarked; thanks!

Note, the reference to "Effective C++" at the bottom is a little out of date; the third edition was published two years ago and contains some significant changes:

http://tinyurl.com/26gch4

Also the other two links (for the strings "available as well" and "Large-Scale C++ Software Design") appear to be broken.

Thanks, fixed!

That said, there are two features we don't like: RTTI and EH. This is because they violate the "don't pay for it if you don't use it" principle. If building with GCC, clang disables both RTTI and EH support (-fno-rtti and -fno-exceptions). The main llvm repository has a couple of places that still use RTTI, but we'd like to fix that.

The main problem with RTTI and EH is the impact on binary size. Many clients of LLVM and at least one client of clang use them in a JIT context. Having the binaries be as small as possible makes it easier for people to distribute them with their apps.

I think that may be the least-flaky objection to EH that I've heard so far. (:

:)

It is actually really frustrating, because certain pieces of the compiler (e.g. error recovery in the parser) could be slightly cleaner with exceptions.
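Without exceptions, error recovery in a recursive-descent parser looks roughly like the sketch below. Every parse routine returns an error indicator, and each caller must check it and forward it by hand. The top-level loop then plays the role of the `catch`, skipping to a synchronization token. The toy grammar (`stmt := NUMBER ('+' NUMBER)* ';'`) and the `ToyParser` class are hypothetical and not taken from clang's parser. The sketch also uses C++17 `std::optional` as the error channel, which postdates this thread.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy parser: stmt := NUMBER ('+' NUMBER)* ';'
class ToyParser {
public:
  explicit ToyParser(std::string Text) : Src(std::move(Text)) {}

  // Returns the values of statements that parsed cleanly; malformed
  // statements are counted in NumErrors and skipped.
  std::vector<long> parseProgram() {
    std::vector<long> Results;
    skipSpace();
    while (Pos < Src.size()) {
      if (std::optional<long> V = parseStatement())
        Results.push_back(*V);
      else
        recoverToSemicolon();  // the hand-written "catch" site
      skipSpace();
    }
    return Results;
  }

  unsigned NumErrors = 0;

private:
  std::optional<long> parseStatement() {
    std::optional<long> Sum = parseNumber();
    if (!Sum)
      return std::nullopt;  // manual propagation...
    skipSpace();
    while (peek() == '+') {
      ++Pos;
      std::optional<long> Rhs = parseNumber();
      if (!Rhs)
        return std::nullopt;  // ...at every call site
      *Sum += *Rhs;
      skipSpace();
    }
    if (peek() != ';')
      return error();
    ++Pos;  // consume ';'
    return Sum;
  }

  std::optional<long> parseNumber() {
    skipSpace();
    if (!std::isdigit(static_cast<unsigned char>(peek())))
      return error();
    long V = 0;
    while (std::isdigit(static_cast<unsigned char>(peek())))
      V = V * 10 + (Src[Pos++] - '0');
    return V;
  }

  // Records a diagnostic and signals failure to the caller.
  std::optional<long> error() {
    ++NumErrors;
    return std::nullopt;
  }

  // Panic-mode recovery: skip up to and including the next ';'.
  void recoverToSemicolon() {
    while (Pos < Src.size() && Src[Pos] != ';')
      ++Pos;
    if (Pos < Src.size())
      ++Pos;
  }

  char peek() const { return Pos < Src.size() ? Src[Pos] : '\0'; }

  void skipSpace() {
    while (Pos < Src.size() &&
           std::isspace(static_cast<unsigned char>(Src[Pos])))
      ++Pos;
  }

  std::string Src;
  std::size_t Pos = 0;
};
```

With exceptions, the `if (!X) return std::nullopt;` lines would disappear. `error()` would `throw`, and `parseProgram` would wrap `parseStatement()` in a `try`/`catch` that calls `recoverToSemicolon()`.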

-Chris

<nod>

I know the feeling; I've wanted to use exceptions for exactly that purpose in a certain other C/C++ parser.

James Widman

I’ve been playing around with clang/LLVM looking at adding partial support for the draft technical report for embedded C extensions (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf), in particular: memory spaces. It’s been fairly simple to thread memory space IDs through LLVM so far, but I’m new to FEs and the language from the TR has left me wondering about the best way to implement this in clang. From TR18037:

Clause 6.2.5 - Types, replace the second sentence of paragraph 25 with:

Each unqualified type has several qualified versions of its type,38) corresponding to the combinations
of one, two, or all three of the const, volatile, and restrict qualifiers, and all combinations
of any subset of these three qualifiers with one address space qualifier. (Syntactically, an address
space qualifier is an address space name, so there is an address space qualifier for each visible
address space name.)

The question I have is, how to track this info without adding memory space IDs to QualType, which seems
(1) infeasible given the implementation of QualType as a smart pointer with only a few bits for additional data, and
(2) would lose the performance benefit of the current QualType implementation (and thus the whole purpose of QualType’s existence, it seems) if QualType were made extensible.
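The "smart pointer with only a few bits for additional data" in point (1) can be illustrated with a stripped-down model. `Type` objects are allocated with enough alignment that the low bits of a `Type*` are always zero, so those bits can carry the const/volatile/restrict flags. A qualified type is then still a single pointer-sized value. This is a hypothetical sketch of the idea, not clang's actual `QualType` class, whose names and bit layout may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type node; alignas(8) guarantees three free low bits.
struct alignas(8) Type {
  const char *Name;
};

class QualType {
public:
  enum : unsigned { Const = 1, Volatile = 2, Restrict = 4, CVRMask = 7 };

  QualType(const Type *T, unsigned Quals)
      : Value(reinterpret_cast<std::uintptr_t>(T) | (Quals & CVRMask)) {
    // The trick only works if the pointer's low bits are really free.
    assert((reinterpret_cast<std::uintptr_t>(T) & CVRMask) == 0);
  }

  // Mask off the qualifier bits to recover the Type*.
  const Type *getTypePtr() const {
    return reinterpret_cast<const Type *>(Value & ~std::uintptr_t(CVRMask));
  }
  unsigned getCVRQualifiers() const { return unsigned(Value & CVRMask); }
  bool isConstQualified() const { return (Value & Const) != 0; }
  QualType getUnqualifiedType() const { return QualType(getTypePtr(), 0); }

  // If Type nodes are uniqued (one node per distinct type), exact type
  // equality is a single integer comparison.
  friend bool operator==(QualType A, QualType B) { return A.Value == B.Value; }
  friend bool operator!=(QualType A, QualType B) { return !(A == B); }

private:
  std::uintptr_t Value;  // Type* with CVR bits OR'ed into the low 3 bits
};
```

Only three bits are free at 8-byte alignment, which is why an open-ended set of address space IDs cannot simply be packed in alongside the CVR bits.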

My first thought was to create a new Type subclass called MemSpacedType that would essentially just be used to store the memory space ID in addition to the QualType of the underlying type. Is this the way to go? I’m deep in new territory and need some seasoned advice.

Thanks

Christopher Lamb wrote:-

I've been playing around with clang/LLVM looking at adding partial
support for the draft technical report for embedded C extensions
(TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),
in particular: memory spaces. It's been fairly simple to thread
memory space IDs through LLVM so far, but I'm new to FEs and the
language from the TR has left me wondering about the best way to
implement this in clang. From TR18037:

Clause 6.2.5 - Types, replace the second sentence of paragraph 25 with:

Each unqualified type has several qualified versions of its type,38)
corresponding to the combinations of one, two, or all three of the
const, volatile, and restrict qualifiers, and all combinations of any
subset of these three qualifiers with one address space qualifier.
(Syntactically, an address space qualifier is an address space name,
so there is an address space qualifier for each visible address space
name.)

The question I have is, how to track this info without adding memory
space IDs to QualType, which seems
(1) infeasible given the implementation of QualType as a smart
pointer with only a few bits for additional data, and
(2) would lose the performance benefit of the current QualType
implementation (and thus the whole purpose of QualType's existence, it
seems) if QualType were made extensible.

My first thought was to create a new Type subclass called
MemSpacedType that would essentially just be used to store the memory
space ID in addition to the QualType of the underlying type. Is this
the way to go? I'm deep in new territory and need some seasoned advice.

My opinion is that the only reasonable solution is to abandon the
QualType optimization and have QualTypes be "just another Type". I
noted this issue a while ago. Chris may have other ideas.

Neil.

I’ve been playing around with clang/LLVM looking at adding partial support for the draft technical report for embedded C extensions (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf), in particular: memory spaces.

Nice!

It’s been fairly simple to thread memory space IDs through LLVM so far, but I’m new to FEs and the language from the TR has left me wondering about the best way to implement this in clang. From TR18037:

Clause 6.2.5 - Types, replace the second sentence of paragraph 25 with:

Each unqualified type has several qualified versions of its type,38) corresponding to the combinations
of one, two, or all three of the const, volatile, and restrict qualifiers, and all combinations
of any subset of these three qualifiers with one address space qualifier. (Syntactically, an address
space qualifier is an address space name, so there is an address space qualifier for each visible
address space name.)

The question I have is, how to track this info without adding memory space IDs to QualType, which seems
(1) infeasible given the implementation of QualType as a smart pointer with only a few bits for additional data, and
(2) would lose the performance benefit of the current QualType implementation (and thus the whole purpose of QualType’s existence, it seems) if QualType were made extensible.

My first thought was to create a new Type subclass called MemSpacedType that would essentially just be used to store the memory space ID in addition to the QualType of the underlying type. Is this the way to go? I’m deep in new territory and need some seasoned advice.

Yep, I think this is a very reasonable way to go. QualType itself is just an optimization for representing types. Instead of having Type*'s everywhere, and having a “ConstType” type and “RestrictType” type (that wrapped some other type), the information is encoded into QualType.

However, this optimization for CVR qualifiers doesn’t impact other “qualifiers”. It would be very reasonable to have an AddressSpaceQualifiedType class, which takes an address space ID and a QualType. This combines the space/time efficiency niceties of QualType with the generality of having explicit classes for all of these.

-Chris

Chris Lattner wrote:-

Yep, I think this is a very reasonable way to go. QualType itself is
just an optimization for representing types. Instead of having
Type*'s everywhere, and having a "ConstType" type and "RestrictType"
type (that wrapped some other type), the information is encoded into
QualType.

However, this optimization for CVR qualifiers doesn't impact other
"qualifiers". It would be very reasonable to have an
AddressSpaceQualifiedType class, which takes an address space ID and a
QualType. This combines the space/time efficiency niceties of
QualType with the generality of having explicit classes for all of
these.

Having qualifiers in multiple places will make "qualifier calculus" like

o is A unqualified?
o does A have all the qualifiers of B?
o what is the unqualified form of A?

etc., awkward, no?

Neil.

Chris Lattner wrote:-

Yep, I think this is a very reasonable way to go. QualType itself is
just an optimization for representing types. Instead of having
Type*'s everywhere, and having a "ConstType" type and "RestrictType"
type (that wrapped some other type), the information is encoded into
QualType.

However, this optimization for CVR qualifiers doesn't impact other
"qualifiers". It would be very reasonable to have an
AddressSpaceQualifiedType class, which takes an address space ID and a
QualType. This combines the space/time efficiency niceties of
QualType with the generality of having explicit classes for all of
these.

Having qualifiers in multiple places will make "qualifier calculus" like

o is A unqualified?
o does A have all the qualifiers of B?

etc., awkward, no?

Nope, it is very simple. CVR qualifiers *only* exist on QualType. This means that for something like:

"const volatile randomqual int"

The node for "randomqual" would contain *just* a "Type*" (not a QualType) pointing to int.

The reference to "const volatile randomqual int" would be a QualType whose Type* points to the above node, with the "CV" bits set. This means exact type-equality checks can still be done in constant time by comparing QualTypes.

o what is the unqualified form of A?

This does require an extra check, but that should be hidden in QualType itself.
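Chris describes a layout in which CVR bits live only on `QualType` and other qualifiers are uniqued nodes holding a bare `Type*`. That layout can be sketched as follows, including the "extra check" for the unqualified form and Neil's "does A have all the qualifiers of B?" query. All names here (`AddressSpaceQualifiedType`, `TypeContext`, `getAddrSpaceQualType`, `hasAllQualifiersOf`) are hypothetical. The sketch also assumes that address space 0 means the default space and that at most one address space qualifier applies, as TR18037 specifies. This is not clang's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical sketch: CVR qualifiers live *only* in the low bits of
// QualType; an address space is a separate, uniqued Type node that points
// at a plain (unqualified) Type*. Address space 0 means "default".
struct alignas(8) Type {
  enum Kind { Builtin, AddrSpaceQual };
  explicit Type(Kind K) : TypeKind(K) {}
  Kind TypeKind;
};

struct BuiltinType : Type {
  BuiltinType() : Type(Builtin) {}
};

struct AddressSpaceQualifiedType : Type {
  AddressSpaceQualifiedType(const Type *Base, unsigned AS)
      : Type(AddrSpaceQual), BaseType(Base), AddressSpace(AS) {}
  const Type *BaseType;  // a bare Type*, never a QualType
  unsigned AddressSpace;
};

class QualType {
public:
  enum : unsigned { Const = 1, Volatile = 2, Restrict = 4, CVRMask = 7 };

  QualType(const Type *T, unsigned CVR)
      : Value(reinterpret_cast<std::uintptr_t>(T) | (CVR & CVRMask)) {}

  const Type *getTypePtr() const {
    return reinterpret_cast<const Type *>(Value & ~std::uintptr_t(CVRMask));
  }
  unsigned getCVR() const { return unsigned(Value & CVRMask); }

  unsigned getAddressSpace() const {
    const Type *T = getTypePtr();
    if (T->TypeKind != Type::AddrSpaceQual)
      return 0;
    return static_cast<const AddressSpaceQualifiedType *>(T)->AddressSpace;
  }

  // The "extra check": clear the CVR bits and also look through an
  // address-space node, if there is one.
  QualType getUnqualifiedType() const {
    const Type *T = getTypePtr();
    if (T->TypeKind == Type::AddrSpaceQual)
      T = static_cast<const AddressSpaceQualifiedType *>(T)->BaseType;
    return QualType(T, 0);
  }

  bool isUnqualified() const { return getCVR() == 0 && getAddressSpace() == 0; }

  // Does this type carry every qualifier that B carries?
  bool hasAllQualifiersOf(QualType B) const {
    return (getCVR() & B.getCVR()) == B.getCVR() &&
           (B.getAddressSpace() == 0 || B.getAddressSpace() == getAddressSpace());
  }

  // Nodes are uniqued, so exact equality is one integer comparison.
  friend bool operator==(QualType A, QualType B) { return A.Value == B.Value; }

private:
  std::uintptr_t Value;
};

// Owns the address-space nodes and hands out exactly one node per
// (underlying type, address space) pair.
class TypeContext {
public:
  QualType getAddrSpaceQualType(QualType T, unsigned AS) {
    const Type *Base = T.getTypePtr();
    assert(Base->TypeKind != Type::AddrSpaceQual &&
           "at most one address space qualifier");
    if (AS == 0)
      return T;
    auto Key = std::make_pair(reinterpret_cast<std::uintptr_t>(Base), AS);
    std::unique_ptr<AddressSpaceQualifiedType> &Slot = Nodes[Key];
    if (!Slot)
      Slot = std::make_unique<AddressSpaceQualifiedType>(Base, AS);
    return QualType(Slot.get(), T.getCVR());  // CVR bits stay outside
  }

private:
  std::map<std::pair<std::uintptr_t, unsigned>,
           std::unique_ptr<AddressSpaceQualifiedType>> Nodes;
};
```

Because the address-space node is uniqued and the CVR bits stay on the outer `QualType`, two spellings of "const int in address space 1" compare equal with a single integer comparison. Only `getUnqualifiedType()` and the address-space queries need the one extra look through the node.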

-Chris

Chris Lattner wrote:

I've heard that clang is meant to be implemented in a "subset of C++" (which I guess means that some core language features are barred from use).

Is there a document anywhere that describes and motivates that subset?

It is pretty subjective. We do use almost all C++ features somewhere in the (greater llvm) code base. It's really more about making clear and simple code than it is about banning specific language features. Some coding guidelines are available here:
http://llvm.org/docs/CodingStandards.html

That said, there are two features we don't like: RTTI and EH. This is because they violate the "don't pay for it if you don't use it" principle.

Is this violation intrinsic to the concepts and requirements of RTTI and/or EH or is this an implementation issue?

I have no personal attachment to RTTI but EH and its associated safety guarantees allow me to read, write and reason about code much better.

If these /are/ implementation issues, does Clang plan to do better in its C++ implementation?

Thanks,

Michael Marcin

P.S. Sorry for dredging up such an old post but I'm really behind reading this list :)

It is pretty subjective. We do use almost all C++ features somewhere
in the (greater llvm) code base. It's really more about making clear
and simple code than it is about banning specific language features.
Some coding guidelines are available here:
http://llvm.org/docs/CodingStandards.html

That said, there are two features we don't like: RTTI and EH. This
is because they violate the "don't pay for it if you don't use it"
principle.

Is this violation intrinsic to the concepts and requirements of RTTI
and/or EH or is this an implementation issue?

I have no personal attachment to RTTI but EH and its associated safety
guarantees allow me to read, write and reason about code much better.

If these /are/ implementation issues, does Clang plan to do better in its
C++ implementation?

I don't see an obvious way an implementation could fix this. The issue with vtable-based implementations is that you need to know, at the point where the vtable is emitted (often a single .cpp file), whether you will be throwing the class or calling typeid on it. If either is possible, you have to emit a bunch of metadata.

In implementations that don't hang the RTTI info off the vtable, you could certainly do it, but I'm not aware of any that do this, and we have to be compatible with platform ABIs.
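The vtable point can be made concrete. For a polymorphic class, `typeid` must recover the dynamic type from the object alone, and in vtable-based ABIs the object holds only a vtable pointer. In the Itanium C++ ABI (used by GCC on most platforms), each vtable therefore contains a pointer to the class's `std::type_info` object. With RTTI enabled, a compiler emitting the vtable has to emit that `type_info` too, since it cannot know whether some other translation unit will apply `typeid` to the class. The example below only demonstrates the observable behavior. It must be built *with* RTTI (no `-fno-rtti`), and the class names are illustrative.

```cpp
#include <cassert>
#include <typeinfo>

// Requires RTTI. Base is polymorphic, so typeid on a Base& inspects the
// object's vtable to find the dynamic type's std::type_info.
struct Base {
  virtual ~Base() {}
};

struct Derived : Base {};

// Returns the type_info of the most-derived object that B refers to.
const std::type_info &dynamicTypeOf(const Base &B) { return typeid(B); }
```

Calling `dynamicTypeOf` on a `Derived` object accessed through a `Base&` reports `Derived`, even though the function itself only ever sees a `Base`.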

-Chris