[RFC] [Clang] adding strong typedefs

I’m proposing implementing strong typedefs in Clang, a compile-time type
safety feature that creates distinct types to prevent implicit conversions
between types with the same underlying representation, catching semantic errors
at compile time with zero runtime overhead.

My design goals are as follows:

  • Prevents implicit conversions between different strong types
  • Prevents implicit conversions from strong types to underlying types
  • Allows implicit conversions from underlying types to strong types (initialization)
  • Zero runtime cost - desugars to underlying type in LLVM IR
  • Works with arithmetic, pointers, arrays, and all standard C operations
  • C and C++ support
  • This is a language extension, not trying to go directly to the standard

Motivations

Type confusion bugs are a common source of errors in C code. When different
semantic concepts share the same underlying type, the compiler cannot
distinguish between them.

Here are some examples of type classes that could be better discriminated with strong typedefs:

  • Unit confusion: meters vs yards, seconds vs milliseconds
  • ID confusion: mixing user IDs, product IDs, session IDs
  • Currency handling: USD vs EUR amounts
  • Security boundaries: trusted vs untrusted data, sanitized vs raw strings
  • Domain-specific types: file descriptors vs PIDs, different kinds of handles

Proposed Implementation

Syntax

Strong typedefs use the strong attribute on typedef declarations:

__attribute__((strong)) typedef double Meters;
__attribute__((strong)) typedef double Yards;
__attribute__((strong)) typedef int UserId;

The attribute can be placed before or after the typedef name:

__attribute__((strong)) typedef int T1;
typedef int T2 __attribute__((strong));

The [[strong]] attribute syntax is also supported:

[[strong]] typedef double Meters;
[[strong]] typedef double Yards;
[[strong]] typedef int UserId;

We may also (or exclusively) want a non-attribute spelling (i.e., a keyword).

Type Compatibility Rules

The following rules define strong typedef compatibility:

  1. Strong → Strong (same): Compatible

    UserId uid1 = 42;
    UserId uid2 = uid1;  // OK
    
  2. Strong → Strong (different): Incompatible

    __attribute__((strong)) typedef int UserId;
    __attribute__((strong)) typedef int ProductId;
    UserId uid = 42;
    ProductId pid = uid;  // ERROR
    
  3. Non-strong → Strong: Compatible (initialization)

    UserId uid = 42;  // OK: implicit conversion from int
    
  4. Strong → Non-strong: Incompatible

    UserId uid = 42;
    int i = uid;  // ERROR
    
  5. Explicit casts: Always allowed

    UserId uid = 42;
    ProductId pid = (ProductId)uid;  // OK
    int i = (int)uid;                // OK
    

Questions

  • How should we handle ABI differences? Should there be any differences between a strong typedef parameter and its underlying type for the purposes of mangling and other ABI considerations?
  • How should strong typedefs interact with _Generic?
  • Any C++ concerns? templates, overloads, SFINAE

CCs

@kees @rjmccall @efriedma-quic @mizvekov

My first impression is, this is very similar to transparent_union. The primary difference is that you want to add a primitive form of operator overloading. Which… sure, operator overloading is nice, but good luck convincing the C committee.

Linux would like this to construct new types (e.g. pid_t, gfp_t, etc) that can’t be mixed up with their base type. i.e. typedef int pid_t; only creates an alias right now. We can do this with anonymous structs (typedef struct { int count; } refcount_t) but that requires explicit accessors, which makes them ungainly.
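
For reference, the anonymous-struct workaround can be sketched in standard C roughly as follows. This is only an illustration: the names UserId, ProductId, user_id_make, user_id_raw, and user_id_is_root are hypothetical and do not come from the kernel.

```c
#include <assert.h>

/* Hypothetical wrapper types: each untagged struct is its own distinct type. */
typedef struct { int value; } UserId;
typedef struct { int value; } ProductId;

/* Explicit accessors are the only way in and out of the wrapper. */
static UserId user_id_make(int raw) { return (UserId){ .value = raw }; }
static int user_id_raw(UserId id) { return id.value; }
static ProductId product_id_make(int raw) { return (ProductId){ .value = raw }; }

/* Accepts only UserId; a ProductId or a bare int argument is rejected. */
static int user_id_is_root(UserId id) { return user_id_raw(id) == 0; }

static void wrapper_demo(void) {
    UserId uid = user_id_make(0);
    ProductId pid = product_id_make(0);
    (void)pid;
    assert(user_id_is_root(uid));
    /* user_id_is_root(pid);  rejected: incompatible argument type   */
    /* user_id_is_root(0);    rejected: incompatible argument type   */
    /* uid + 1;               rejected: must unwrap before arithmetic */
}
```

The commented-out lines show the trade-off: mix-ups are diagnosed, but every arithmetic use has to go through an accessor.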

With the new types, we gain new mangling for function prototypes (i.e. KCFI sees pid_t thing(int) as separate from int thing(int), which we want). The big deal is that we can create types that do not undergo implicit truncation, promotion, etc. Any promotion or truncation must then be explicit, so we can get away from ambiguous usage.

My first thought for integers and C++ is to just use enum class, which does not allow implicit casting (normally). It also does not provide <=>, but that can be added easily in the code.

__attribute__((strong)) is just asking to be confused with __attribute__((weak)) etc in having something to do with symbol binding, so if going down this route I’d recommend a different name.

I have definite concerns regarding the syntax (IMO we should find a keyword that MEANS this, not try to slapdash attributes into this meaningfully changing the type system), but I’m generally in favor of strong typedefs if we can find semantics that are useful.

I’m unfortunately not at a point where I have sufficient time to think this through further (this has horrible timing for me thanks to just returning from LLVM-Dev/WG21), but I’d like to get a chance to spend some cycles on it.

This proposal mentions use of some _NewType keyword as well. Maybe we could look into that as a viable alternative. I’d think the attribute syntax would be useful too, though. [[strong]] doesn’t look half bad to me but I suppose it’s more about what an attribute should be allowed to describe and perhaps this is a step too far.

Thanks for taking a look :slight_smile:

I am interested in C code as well.

Thank you for the RFC! This is a problem both committees have looked at solving on several different occasions, so I think there’s interest in the feature. However, neither committee has ever come up with a solution they’re happy with and we should keep those design concerns in mind when exploring this space.

I don’t believe either committee will accept an attribute. However, there is precedent for us using an attribute in a position where a keyword could be standardized later (think: [[clang::require_constant_initialization]] which was standardized as constinit in C++). But the attribute would be [[clang::strong]] rather than [[strong]] (vendor attributes always have a vendor prefix).

I think there has to be, otherwise you couldn’t do:

void func(strong_type_to_int i);
void func(int i);

I think parameter passing ABI should be the same, but mangling seems like it has to be different for the feature to work. (Or am I wrong about that?)

C is based entirely around the notion of “compatible types”, so we’d need some kind of specification that explains how a strong typedef fits into type compatibility in C to be able to answer that question. My gut reaction is that a strong typedef is an entirely unique type, and thus not compatible. If they’re not compatible types, then you can have them in the same generic selection expression without violating a constraint. But you’d have to look through the C standard to see where compatible types are important and decide what, if anything, needs to be done for strong typedefs. For example, should you be able to do something like:

typedef int mighty_int [[clang::strong]];

void func(...) {
  va_list list;
  va_start(list);
  int i = va_arg(list, int); // Is this UB?
  va_end(list);
}

int main() {
  mighty_int i = 12;
  func(i);
}

because this will come up for things like printf("%d", some_mighty_int);.
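
To make the compatible-types point concrete, here is a small sketch of how a plain (non-strong) typedef behaves in a generic selection today. UserId, TYPE_TAG, and the helper functions are illustrative names, not part of the proposal.

```c
#include <assert.h>

typedef int UserId;   /* plain typedef: an alias for int, not a new type */

/* Maps an expression to a small tag based on its type. Listing `UserId: 3`
   next to `int: 1` would be a constraint violation today, because the two
   associations would name compatible types. */
#define TYPE_TAG(x) _Generic((x), int: 1, long: 2, default: 0)

static int tag_of_user_id(void) {
    UserId uid = 42;
    return TYPE_TAG(uid);   /* selects the `int` association */
}

static void generic_demo(void) {
    assert(tag_of_user_id() == 1);   /* indistinguishable from int */
    assert(TYPE_TAG(42L) == 2);
    assert(TYPE_TAG(4.2) == 0);
}
```

If a strong typedef were a distinct, incompatible type, a `UserId:` association could presumably sit alongside `int:` in the same selection. That is an assumption about the eventual design, not something the RFC specifies.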

You’ll need to do a similar exercise for C++, which uses different mechanisms than type compatibility for this sort of stuff.

Some previous WG21 efforts you should look at:

tl;dr: I think this is a useful area to explore, but there’s a ton of history in the area too. Whatever we come up with, it needs to keep an eye towards standardization and so we should pay very close attention to what the concerns were in the committees with past attempts.

Reading through Toward Opaque Typedefs for C++1Y, v2, it is clear to me that the C++ standards body will have a hard time standardizing strong typedefs for the language. The issue presented on page 6 is a good example of the types of pain to expect from trying to support C++ in predictable ways.

I do have a question, though.

Do we need to have a fully complete design for both languages before an extension can be added to one? C++ is a much more challenging design space and that language even has built-in ways to manufacture opaque types from scratch. C, on the other hand, requires its “opaque types” to take the form of a structure definition with a variety of accessors and tucked-away data members.

Is it possible to start with a language extension in C (with minimal C++ support) and expand to better support C++ later on? Some of the prior art you’ve linked is a couple of decades old. I think it is very possible this highly desirable feature is left to rot on white papers for the next couple of decades too. Perhaps we can push things forward with a best-effort language extension – an instance where the implementation drives the standardization (or whatever the saying is :smiley:).

edit: I should mention that I am willing to drive the implementation and that I have a proof-of-concept tree.

It’s Complicated™ :smiley: More below.

It’s reasonable to start with a language extension in C and no initial support in C++. However, there’s pressure for it to also be supported in C++ eventually (shared header files, but also, if we go through the effort to make the extension and review the extension, we want it to be widely applicable so we get good return on our investment). So whatever design we come up with, we need to make sure it doesn’t do something which makes it impossible to support C++ in the future, even if we don’t support it initially. So extensions often do require some design consideration for both languages.

Yeah, my take matches Aaron’s, except that I would replace “impossible” with something like “impractical”. We don’t have to have it IMPLEMENTED in C++, but a really good idea of how we’d solve all the problems that come with it would be necessary IMO.

I find this RFC severely underdeveloped. For instance, it’s not clear whether you intend to allow defining strong typedefs of class types. I have thoughts on this (potential) aspect of the feature, but I’m not sure if it’s relevant to share them.

One of the benefits of following the documented process for adding extensions (i.e. through the committee(s)) is that you need to write a paper, receive feedback, and if it’s positive, at least try to write an actual specification. This process would make you think about a lot of things you’d initially miss, saving us the trouble of shipping a potentially half-baked feature, then trying to fix it without disrupting existing users too much.

Worth noting that someone on WG14 is trying to bring scoped enums to C. It is certainly early, but N3568 exists and was discussed during the last WG14 meeting, where the committee (thankfully) preferred doing it along the lines of what C++ did.

So it still might be a fruitful direction to piggyback on scoped enums, defining your extension as e.g. adding implicit comparison operators to scoped enums, rather than inventing an entirely new thing and defining how it interacts with both languages.

I think this is an interesting proposal, but I am wary of allowing it in C without considering all the issues raised in Walter Brown’s paper N3741, because eventually we want to have this in C++, and if we don’t consider the design space we could really limit ourselves.

I think section 3 probably covers the easier considerations. Sections 4, 7, and 8 bring up much trickier questions. Perhaps the approach should be to think of a way of limiting the design to avoid some of the issues raised there, but I have not given it deep thought.

We use wrappers in the SPTM to distinguish between values that have been validated and values that haven’t, and there’s been a request on the back burner to create “strong typedefs” instead. The main issue with the feature in general is that everybody agrees this should be possible, but once you get into the details, everybody wants something a little different.

When it comes to integer types in particular, one big issue in C is integer promotion. For instance:

int add(char a, char b) { return a + b; }

Before a and b are added, they are promoted to int. What is the interaction between strong typedefs and integer promotion? Do we say that they are not promoted but they can still participate in arithmetic operations if both operands are the same strong type? Do we say that they cannot participate in arithmetic operations? What about passing them to variadic functions like printf, which requires promotion?
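
The promotion behavior in question can be observed directly in standard C11. The following sketch uses a hypothetical IS_INT macro built on _Generic to report the type of an expression.

```c
#include <assert.h>

/* 1 if the expression has type int, 0 otherwise. */
#define IS_INT(x) _Generic((x), int: 1, default: 0)

static void promotion_demo(void) {
    char a = 100, b = 100;
    short s = 1;

    assert(!IS_INT(a));                     /* the operand itself is a char */
    assert(IS_INT(a + b));                  /* both operands promoted to int */
    assert(a + b == 200);                   /* computed in int, no wrap at char range */
    assert(sizeof(a + b) == sizeof(int));
    assert(IS_INT(s + s));                  /* short behaves the same way */
}
```

A strong typedef of char or short would have to either keep this rule (so the result type is int and the "strength" is lost) or define a new one, which is exactly the open question.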

Actually wrapping the type in a struct at least obviates all of these issues–you can’t do anything with your objects, aside from passing them around, unless you unwrap them. There are also no issues with overloading rules, SFINAE, etc. The dearth of operations is understandably not ideal, but it’s the common floor that everyone seems to be OK with.

My intent with this post is three-fold:

  1. bring a bit more attention to strong typedefs

  2. learn about the earthly demons associated with full C++ support (thanks @AaronBallman for the links, I hadn’t found those with my naive web search).

  3. propose some syntax and basic rules around implicit conversions

I am sorry; I should have made it clear that my ~300 word write-up was not a full specification. Perhaps I should have used a tag other than ‘RFC’? I’ll try to make my RFCs more in-depth in the future. Again, sorry about that.

Also thanks for bringing up the scoped enums proposal, I am researching this now :slight_smile:

What is SPTM?

Right, this is the tricky part.

The idea is that these strong types cannot be implicitly converted at all. In your add example we would fail the build with something along the lines of:

$ cat test.c
[[strong]] typedef int MyInt;

int add(MyInt a, MyInt b) {
  return a + b;
}

$ clang test.c -fsyntax-only
test.c:4:10: error: returning 'MyInt' (aka 'int') from a function with incompatible result type 'int'
    4 |   return a + b;
      |          ^~~~~

… variadic functions would perhaps be an exception where we allow implicit conversions to underlying types. I’m sure that’s a whole can of worms too. I’ll mess around with this in my proof-of-concept tree to see what I can learn here.

SPTM stands for Secure Page Table Monitor. In one sentence, it’s a semi-hypervisor that manages page tables on behalf of xnu.

The case of [[strong]] typedef int MyInt; is the “easy” one. C does not support arithmetic on types smaller than int: in integral arithmetic, anything smaller than int is automatically promoted to int (and the result value is also an int).

[[strong]] typedef short MyShort;
MyShort x = (MyShort)123;
MyShort y = x;
x + y;

Is this legal? If it’s legal, is the result type int or something else?

Regarding strong typedefs of class types (which I believe was not covered in Walter Brown’s paper):

struct A { int i; };
struct B { int j; }; // same member-specification as in A
[[clang::strong]] typedef A C;
union U { A a; B b; C c; };

void f() {
  U u;
  u.a = A{42};
  assert(u.b.j == 42);
  // well-formed (and passes) in both C and C++, 
  // because 'i' and 'j' are in the common initial sequence of A and B

  assert(u.c.i == 42); // ???
  // if this is not well-formed, then using attributes for strong typedefs is a bad idea,
  // because implementations are allowed to ignore attributes,
  // which would change the semantics of the program
}

Units (like meters) are a bad motivation for this feature. Adding meters to meters and getting meters seems cool, but you have to handle multiplication and division correctly, and this feature has no means that I can see of expressing “meters/sec” or “square meters”.
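
As a rough illustration of the point, here is what dimension-changing operations look like when written out by hand with struct wrappers in C. All type and function names below are hypothetical. The sketch only shows that multiplication and division need result types different from their operands, which a typedef of a single underlying type cannot express by itself.

```c
#include <assert.h>

/* Hypothetical unit wrappers; each untagged struct is a distinct type. */
typedef struct { double v; } Meters;
typedef struct { double v; } Seconds;
typedef struct { double v; } SquareMeters;
typedef struct { double v; } MetersPerSecond;

/* Addition keeps the unit. */
static Meters meters_add(Meters a, Meters b) {
    return (Meters){ a.v + b.v };
}

/* Multiplication changes the unit: m * m = m^2. */
static SquareMeters meters_mul(Meters a, Meters b) {
    return (SquareMeters){ a.v * b.v };
}

/* Division by time changes the unit: m / s. */
static MetersPerSecond meters_per_second(Meters d, Seconds t) {
    return (MetersPerSecond){ d.v / t.v };
}

static void units_demo(void) {
    Meters a = { 2.0 }, b = { 3.0 };
    Seconds t = { 2.0 };
    assert(meters_add(a, b).v == 5.0);
    assert(meters_mul(a, b).v == 6.0);
    assert(meters_per_second(meters_add(a, b), t).v == 2.5);
}
```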
