Is there any status regarding 128-bit integer support in clang?

I noticed that gcc has incorporated support for int128_t, but there is no printf conversion specifier to output it.
Also there are no definitions for MIN_INT128, MAX_INT128, or U_INT128.
gcc did add z as a printf specifier for size_t, though.
A mul or div using 64-bits produces a 128-bit result. How is that handled?
An Internet search of “intel processors supporting 128-bit integers” produced some interesting things to read.

As a side note, long long is the same size as a long int. IMHO, long long should be 128 bit.

Welcome to one of the darkest corners (INTMAX) of an already-dark corner (ABI=Application Binary Interface).

An ancient decision to expose a maximal integer type in a way that affects the interoperability of different binaries effectively blocks any progress on this. People are trying to fix it, but it’s an Odyssey, because it involves convincing compiler vendors to do something their customers do not want them to do: break the ABI**.

This article explains it in much more detail (and colour).

** there might be an ABI-compatible solution even further out, but it’s almost certainly going to miss C23, the next C standard, and then it’s an open question when the next non-bugfix C standard will arrive (C11 → C23 will have been 12 years; C17 had no new features). Given the lag from standardization to implementation, to widespread use, it’s unfortunately possible that there’ll be no first-class 128-bit support for 1-2 decades still (in addition to the one already wasted on this).

Thanks for the link to the article. Very enlightening. Is this prison built by shared libraries? In hindsight, were shared libraries the right approach?

It plays a large part yes, because those get distributed through extremely diverse channels and are then hard to change. Shared libraries do have some undeniable advantages (which arguably led to the adoption we’re seeing), but it wouldn’t be such a big issue if things could be recompiled & redistributed more easily.

At one end of the spectrum are those who control their whole stack and don’t mind recompiling everything. But there’s a very vocal constituency in the C/C++ committees which stubbornly insists that new code must be able to link to old DLLs/SOs (e.g. there are environments where people only get binaries from a contractor and don’t have the sources to recompile) and basically kills any proposal that touches the ABI.

Arguably this is also due to the nature of shared libraries, so it’s a kind of a chicken-and-egg problem, but certainly, more proactive recompilation would solve the issue.

Finally, as the article lays out, it’s also that the standard grandfathered in so many bad existing practices at the time that are now rearing their ugly heads - among them allowing programs to #undef standard facilities, “occupying” not-yet-standardized syntax, and filling currently-undefined behaviour with behaviour that a given implementation comes to depend upon (preventing standardisation of anything else).

It’s all kinds of upside down, but the standardisation committee is kind of hostage to the implementers, and the committee asserting itself risks one of those vendors walking away and causing even worse balkanisation (e.g. Microsoft’s MSVC not implementing C99 fully until today; and C11 only very recently, after C11 made certain parts of C99 optional).

Sidenote: There’s also a more recent (and very comprehensive) article about the ABI itself (not just INTMAX) by the same author.

In hindsight, were shared libraries the right approach?

I think many people have strong opinions on this, but IMO it’s impossible to say how things would have turned out differently. I think that with very good packaging & distribution infrastructure, dynamic linking is an unequivocal win, but such infrastructure takes a lot of effort and maintenance (for context: I spend a lot of my FLOSS time contributing to conda-forge, a package & environment manager that started out in the python ecosystem, but is essentially its own OS-light distribution, and handles non-python dependencies in a way that it becomes clear which packages need to be recompiled if some dependency’s ABI changes; while also avoiding that incompatible versions get installed).

Again, thanks for the comprehensive answer. I began my career as a Navy Tactical Data Systems technician, learning the inside of a 30-bit computer, gate by gate. I am surprised by the current state of the C language. I loved C when K&R came out, because I knew assembly language but didn’t want to write large programs at such a low level.

ABI is an important concept even in the absence of shared libraries. That is particularly true in C, where language features are often used to describe interfaces outside of the language’s abstract machine, and where, even within that machine, the weakness of the model often means that ABI coincidence is the only thing keeping code working at all.

I don’t think concerns about intmax_t are the primary blocker for a portable int128_t; it’s mostly a question of implementor priorities.

Did you read the article? The standard effectively forbids having a first-class** integer type larger than 64 bits, and that cannot be changed due to ABI. It seems Steve Canon is not on Discourse, but he said:

I am unreasonably angry about this, because the intmax_t situation has kept me from enabling completely-working first-class int128 support out of clang for a decade

Here’s another article by @Quuxplusone on the topic, maybe he can shed some more light?

** this qualifier is important. Of course it’s possible to have a type for int128, but then it has to pretend not to be an “official” integer, which creates a bunch of other sharp edges.
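One illustration of the kind of sharp edge meant here (a minimal sketch, not taken from the article): intmax_t is supposed to be able to hold any signed integer value, but on typical 64-bit ABIs an __int128 value doesn’t survive a round trip through it.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    unsigned __int128 big = (unsigned __int128)1 << 100;
    uintmax_t j = (uintmax_t)big;   /* silently keeps only the low 64 bits */
    printf("%ju\n", j);             /* prints 0, not 2^100 */
    return 0;
}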

Yeah, I guess I’m assuming it would have to be “extended” in some way, at least until the committee blesses it. In practice, I don’t think that’s a problem as long as it’s the largest type; IIRC the restrictions are around things like how promotions involving it work.

Since I was summoned…

I noticed that gcc has incorporated support for int128_t, but there is no printf conversion specifier to output it. Also there are no definitions for MIN_INT128, MAX_INT128, or U_INT128.

Correct. The C standard doesn’t define a printf conversion specifier for __int128, nor any macros by those names, so naturally Clang/libc/libc++ don’t provide them. I don’t see any problem with this at all. After all, there’s no printf conversion specifier for std::vector<int> either, but we get along somehow without it.

gcc did add z as a printf specifier for size_t, though.

Yes, "%z" became standard in C99, IIRC. size_t is a standard type; __int128 is not.

A mul or div using 64-bits produces a 128-bit result. How is that handled?

This question has nothing to do with the __int128 type per se. If signed overflow occurs, it’s UB. (Including INT_MIN/-1.) Unsigned division can never “overflow.” Unsigned multiplication “overflows” by wrapping around in the usual way: If the unsigned multiplication is 32-bit, it wraps around at the 32nd bit. If the multiplication is 64-bit, it wraps around at the 64th bit. If the multiplication is 128-bit, it wraps around at the 128th bit.
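To make the last point concrete, here is a minimal sketch using the GCC/Clang __int128 extension: if you want the full 128-bit product of two 64-bit operands, widen one operand before multiplying; a plain 64-bit multiplication wraps.

#include <stdint.h>

/* 64-bit multiplication: the result wraps around at the 64th bit. */
uint64_t mul64(uint64_t a, uint64_t b) {
    return a * b;
}

/* Widen one operand first to get the full 64x64 -> 128-bit product
   (GCC/Clang extension; no wrap-around). */
unsigned __int128 mul128(uint64_t a, uint64_t b) {
    return (unsigned __int128)a * b;
}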

IMHO, long long should be 128 bit.

No, it shouldn’t.

h-vetinari wrote:

The standard effectively forbids having a first-class** integer type larger than 64bits

Says Jean-Heyd, sure. In practice, Clang supports __int128 just fine. intmax_t is pegged to 64 bits because ABI, but that doesn’t actually stop any vendor from providing larger integer-like types. Heck, Clang provides _ExtInt(1024) these days!
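For illustration (a minimal sketch; _ExtInt is Clang’s spelling at the time of writing, since renamed _BitInt for C23):

/* Clang extension: arbitrary fixed-width integers. */
typedef unsigned _ExtInt(1024) u1024;

u1024 add_u1024(u1024 a, u1024 b) {
    return a + b;   /* arithmetic wraps at the 1024th bit */
}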

AFAIK, intmax_t is completely irrelevant in practice, except for some APIs that assume “every possible integer will fit in an intmax_t.” Although I believe such APIs exist, I admit I can’t name any off the top of my head. It’s vastly more common to see APIs that assume “every possible positive integer will fit in a size_t,” and the C preprocessor famously assumes that “every possible integer will fit in a long long,” but those latter two API decisions are going to have trouble with __int128 no matter how your vendor defines intmax_t. (Again, nobody’s ever going to change the definition of intmax_t. It’s 64 bits, just like size_t. Deal with it.)

I don’t entirely understand @scanon’s tweet. Perhaps by “completely-working first-class int128 support” he means nothing more than “128-bit literals”, i.e.

__int128 f() {
    // Today: error. Tomorrow: just work?
    return 0x12345678123456781234567812345678;
}

This would probably need a suffix like (hypothetically) i128 or i128u, both to keep it as a conforming extension instead of a violation of the Standard, and because the existence of _ExtInt makes the unadorned literal somewhat ambiguous: is 0x12345678123456781 supposed to be an __int128 literal or an _ExtInt(68) literal? (Or an unsigned _ExtInt(65) literal, I suppose!?) Anyway, as rjmccall said, it’s not so much that we’re prevented from doing any of this, as that the set intersection of “people capable of adding new features to Clang” and “people who need _ExtInt literals to exist” is currently empty.
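Until then, the usual workaround is to assemble a large constant from two 64-bit halves; UINT128_C2 below is just an illustrative name, not an existing macro.

#include <stdint.h>

/* Hypothetical helper: build an unsigned __int128 from two 64-bit halves,
   since there is no 128-bit literal syntax today. */
#define UINT128_C2(hi, lo) \
    (((unsigned __int128)(uint64_t)(hi) << 64) | (uint64_t)(lo))

static const unsigned __int128 big =
    UINT128_C2(0x1234567812345678ULL, 0x1234567812345678ULL);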

Anyway, see these two StackOverflow posts for the state of the art re: extended integer literals and printf-format-specifiers.

The state of the art is “doesn’t exist; roll it yourself if you need it.” Which, just like the situation for std::vector<int>, seems fine to me.
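As a sketch of what “roll it yourself” can look like for printing (print_u128 is an illustrative name, not a library function):

#include <stdio.h>

/* Print an unsigned __int128 in decimal by peeling off digits,
   since printf has no conversion specifier for it. */
static void print_u128(unsigned __int128 v) {
    char buf[40];                     /* 2^128 - 1 has 39 decimal digits */
    char *p = buf + sizeof buf;
    *--p = '\0';
    do {
        *--p = (char)('0' + (int)(v % 10));
        v /= 10;
    } while (v != 0);
    fputs(p, stdout);
}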

(And since I keep mentioning std::vector<int>, notice that fmt::format("{}", v) actually can format vector<int> just fine; and I believe that functionality is already on the docket for C++23. It can also format __int128 out of the box (Compiler Explorer).)


Thanks a lot for your perspective on this!

What I meant is that I implemented rough literal support a decade ago (piggybacking on ms-extensions, where someone had defined the suffixes without an implementation), which led to a bunch of discussion and ultimately this post by Richard Smith concerning interaction with the C and C++ standards of the time: [cfe-dev] MS 128-bit literals don't always have the correct type

I didn’t have bandwidth to do any further work on it then (I had only implemented it because the suffixes were already wired up and it was useful for a couple hobby projects I had), and eventually David Majnemer removed support because “most code in clang expects that evaluation of an integer constant expression won’t give them something that ‘long long’ can’t represent” (r243243 - [MS Extensions] Remove support for the i128 integer literal suffix). I don’t know if that’s still the case or not.