C vs. C++: Inconsistent type classifications

There are several places where C and C++ define the same type
classification terms differently. For example, "integer types" in C
include enumeration types, but "integer types" in C++ don't. On the
other hand, "object types" in C don't include incomplete types, while
"object types" in C++ do.
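
To make the C++ side of this concrete, here is a minimal sketch using the standard type traits. It assumes a C++11 compiler and <type_traits>, neither of which this thread relies on; the names Color and Incomplete are made up for illustration.

```cpp
#include <type_traits>

enum Color { Red, Green };  // an enumeration type (hypothetical example)
struct Incomplete;          // declared but never defined: an incomplete type

// C++: an enumeration type is not an integral type.
static_assert(std::is_enum<Color>::value, "Color is an enumeration");
static_assert(!std::is_integral<Color>::value,
              "enumerations are not integral types in C++");

// C++: an incomplete class type is still an object type.
static_assert(std::is_object<Incomplete>::value,
              "incomplete class types are object types in C++");

// The same facts as named constants, for use in other checks.
constexpr bool enumIsIntegralInCXX = std::is_integral<Color>::value;
constexpr bool incompleteIsObjectInCXX = std::is_object<Incomplete>::value;
```

The C side cannot be shown with a trait, since C classifies the types in prose only: there, the enumeration counts as an integer type and the incomplete struct does not count as an object type.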

This presents a dilemma for the Type class, which has methods like
isIntegerType and isObjectType. It isn't clear whether these routines
should follow the C semantics, the C++ semantics, the current language
semantics, or whether we should decide on a case-by-case basis.

The third option turns out to be a bad idea: I prototyped it with
isIntegerType, making it follow the C semantics when we're in C and
the C++ semantics when we're in C++. This is fine for the C-specific
and C++-specific parts of the compiler, but it becomes very messy in
the common subset of C and C++. Let's not go there.
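
A rough sketch of why the language-dependent variant gets messy, using a toy model rather than the real Type class. Lang, Kind, and isValidSubscriptType are hypothetical names, and the subscript check is only an assumed example of code shared between the two languages.

```cpp
enum class Lang { C, CXX };
enum class Kind { Int, Enum, Pointer };

// The "third option": the answer depends on the language being compiled.
constexpr bool isIntegerType(Kind K, Lang L) {
  return K == Kind::Int || (K == Kind::Enum && L == Lang::C);
}

// Shared code that wants "integer or enumeration" in both languages can
// no longer trust the predicate alone and has to re-add the enum case.
constexpr bool isValidSubscriptType(Kind K, Lang L) {
  return isIntegerType(K, L) || K == Kind::Enum;
}
```

Every shared call site has to reason about both possible answers, which is the mess described above.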

Personally, I favor following the C++ semantics, because I find the
C++ classifications more natural ("integer" type doesn't scream
"enumeration" for me, and whether or not we've seen a definition for a
type shouldn't affect what kind of type it is), and of course, in the
long run, the majority of the code in Clang is going to go toward
supporting C++.

If we go the C++-semantics route, I'll prepare a patch with at least
these changes:

  - isIntegerType will follow the C++ semantics ("false" for enums)
    and be renamed to isIntegralType, the common term used in C++.
  - isSignedIntegerType and isUnsignedIntegerType will change in the
    same way.
  - isArithmeticType will follow the C++ semantics.
  - Add isIntegerOrEnumeralType (if needed, possibly with
    signed/unsigned variants) and isArithmeticOrEnumeralType (if
    needed); this will take some thought to deal with GNU's and
    C++0x's forward declaration of enums.
  - isObjectType will follow the C++ semantics ("true" for incomplete
    object types).
  - Add isCompleteObjectType to handle the C semantics where we need
    them ("complete object type" is also used in C++).
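
The proposed predicates could look roughly like this on a toy type node. This is only a sketch under simplifying assumptions: ToyType is not Clang's Type class, it models just four kinds of type, and isEnumeralType is an assumed helper.

```cpp
// Toy stand-in for a type node; not Clang's real Type class.
struct ToyType {
  enum Kind { Int, Enum, Record, Function } K;
  bool Complete;  // true once a definition has been seen

  // C++ "integral type": false for enumerations.
  constexpr bool isIntegralType() const { return K == Int; }

  // Assumed helper for the enumeration case.
  constexpr bool isEnumeralType() const { return K == Enum; }

  // The C notion of "integer type", spelled out explicitly.
  constexpr bool isIntegerOrEnumeralType() const {
    return isIntegralType() || isEnumeralType();
  }

  // C++ "object type": true even when the type is incomplete.
  // (The toy model has no reference or void types to exclude.)
  constexpr bool isObjectType() const { return K != Function; }

  // The C notion of "object type": complete object types only.
  constexpr bool isCompleteObjectType() const {
    return isObjectType() && Complete;
  }
};
```

For a forward-declared struct, modeled as ToyType{ToyType::Record, false}, isObjectType() holds while isCompleteObjectType() does not.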

Any comments? Screams?

  - Doug

> There are several places where C and C++ define the same type
> classification terms differently. For example, "integer types" in C
> include enumeration types, but "integer types" in C++ don't. On the
> other hand, "object types" in C don't include incomplete types, while
> "object types" in C++ do.

Ok

> Personally, I favor following the C++ semantics,

Hehe, big surprise ;)

> because I find the
> C++ classifications more natural ("integer" type doesn't scream
> "enumeration" for me, and whether or not we've seen a definition for a
> type shouldn't affect what kind of type it is), and of course, in the
> long run, the majority of the code in Clang is going to go toward
> supporting C++.

Is there any case where the C++ definition is less restricted than the C version? If the code ends up being "if (isinteger || isenum)", then it is clear what it is testing. In common C/C++ code, there should be some comments that explain what is going on.

A fourth (bad) option is to come up with new nomenclature for these terms that is independent of C and C++. This punishes everyone equally :)

In the end, I think this change is fine. Please be very careful to preserve the semantics of C and add comments where appropriate.

-Chris

>> Personally, I favor following the C++ semantics,
>
> Hehe, big surprise ;)

:)

>> because I find the
>> C++ classifications more natural ("integer" type doesn't scream
>> "enumeration" for me, and whether or not we've seen a definition for a
>> type shouldn't affect what kind of type it is), and of course, in the
>> long run, the majority of the code in Clang is going to go toward
>> supporting C++.
>
> Is there any case where the C++ definition is less restricted than the C
> version?

Yeah, the definition of an object type in C++ includes incomplete
types, but the C definition does not include incomplete types, so C is
more restrictive in this case.

I may end up having to restructure some of the checks for incomplete
types (since they are implicit in isObjectType now) as part of making
this change. As you noted, it'll take a bit of care to get the
semantics right. However, I'll use this as an excuse to audit those
parts of the code to make sure they're doing the right thing for both
C and C++ (since they're bound to be C-centric now).

  - Doug

Doug Gregor wrote:

> Personally, I favor following the C++ semantics, because I find the
> C++ classifications more natural ("integer" type doesn't scream
> "enumeration" for me, and whether or not we've seen a definition for a
> type shouldn't affect what kind of type it is), and of course, in the
> long run, the majority of the code in Clang is going to go toward
> supporting C++.

How about having two methods when there are differences, e.g. isCIntegerType and isCXXIntegerType?
Having checks with C++ semantics called by C-parts will probably lead to confusion and subtle bugs.

-Argiris

That seems a bit like the third option, and I think it has the same
drawbacks: what do you use in the common subset of C and C++, the "C"
version or the "CXX" version? Probably whatever is appropriate, but
that means going back to the documentation each time to see "C" vs.
"CXX" in the name to determine the various subtleties. I think it's
better to have one set of semantics, and where we need to deviate from
those we'll have more explicit code.
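
A small sketch of the two styles side by side, on a toy Kind enum rather than the real Type class. isCIntegerType and isCXXIntegerType are the names suggested above; isEnumeralType and acceptsAsCInteger are hypothetical.

```cpp
enum class Kind { Int, Enum, Pointer };

// Paired predicates: the caller must pick the right one by name.
constexpr bool isCIntegerType(Kind K) {
  return K == Kind::Int || K == Kind::Enum;
}
constexpr bool isCXXIntegerType(Kind K) { return K == Kind::Int; }

// One set of semantics (C++), with helpers for explicit deviations.
constexpr bool isIntegralType(Kind K) { return K == Kind::Int; }
constexpr bool isEnumeralType(Kind K) { return K == Kind::Enum; }

// A caller that needs the C meaning spells out the extra case itself.
constexpr bool acceptsAsCInteger(Kind K) {
  return isIntegralType(K) || isEnumeralType(K);
}
```

Both styles accept the same types here; the difference is whether the enum case is hidden behind a name or visible at the call site.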

  - Doug