Problems with objects larger than PTRDIFF_MAX

It could be that 32-bit systems are disappearing so rapidly that nobody cares too much about this issue, but this blog post is still worth reading:

   "A non-exhaustive list of ways C compilers break for objects larger than PTRDIFF_MAX bytes" (TrustInSoft blog)
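The gist, for those who don't follow the link: pointer subtraction yields a ptrdiff_t, so once an object exceeds PTRDIFF_MAX bytes the difference of two pointers into it can overflow, which is undefined behavior (C11 6.5.6p9). A minimal sketch of the failure mode, assuming a 32-bit target where a 3 GiB allocation actually succeeds:

```cpp
#include <cstddef>
#include <cstdlib>

int main() {
  // 3 GiB: larger than PTRDIFF_MAX (2 GiB - 1) on a 32-bit target.
  std::size_t n = 3u * 1024u * 1024u * 1024u;
  char *p = static_cast<char *>(std::malloc(n));
  if (!p)
    return 0; // allocation refused; nothing to demonstrate
  // The mathematical difference is 3 GiB, but the result has type
  // ptrdiff_t, which cannot represent it: undefined behavior, and in
  // practice often a negative value.
  std::ptrdiff_t d = (p + n) - p;
  std::free(p);
  return d < 0; // "true" in practice, i.e. the length came out negative
}
```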

John

I’ve come across this issue before and came to the following conclusion:

  • We are not obligated to support objects that large; C11 5.2.4.1/1 only requires that we support objects of 65535 bytes! Their guidance for maximum object size is stated to be half of SIZE_MAX in C11 K.3.4/4, which is typically equivalent to PTRDIFF_MAX.
  • The expectation that PTRDIFF_MAX is more or less a proxy for the largest object size is not uncommon. For example, C++'s std::count doesn't return a size_t but an iterator_traits<>::difference_type, which is going to be ptrdiff_t for things like std::vector (a sketch follows below).
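To illustrate that last point with the types spelled out: the result of std::count over a std::vector is a signed quantity, so a correct count over an object larger than PTRDIFF_MAX bytes could not even be represented. A minimal example:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

int main() {
  std::vector<char> v(100, 'x');
  // difference_type for vector iterators is ptrdiff_t: signed.
  std::iterator_traits<std::vector<char>::iterator>::difference_type n =
      std::count(v.begin(), v.end(), 'x');
  return n == 100 ? 0 : 1;
}
```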

I agree, LLVM is not obligated to support objects larger than PTRDIFF_MAX bytes.

A reasonable and friendly goal would be to document this constraint and perhaps also arrange for some compile-time warnings and/or runtime errors to be emitted when it is violated.
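For the runtime side, a sketch of what such a check could look like (checked_malloc is a hypothetical wrapper, not an existing LLVM or libc facility):

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical wrapper: refuse any allocation larger than PTRDIFF_MAX,
// since the compiler does not support pointer arithmetic across such
// an object.
void *checked_malloc(std::size_t n) {
  if (n > static_cast<std::size_t>(PTRDIFF_MAX)) {
    errno = ENOMEM;
    return nullptr;
  }
  return std::malloc(n);
}
```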

John

I've come across this issue before and came to the following conclusion:
- We are not obligated to support objects that large; C11 5.2.4.1/1 only requires that we support objects of 65535 bytes!

Right, the standard doesn't require it. But I guess you don't imply that it's fine for clang to silently miscompile any program that works with objects larger than 65535 bytes.

Their guidance for maximum object size is stated to be half of SIZE_MAX in C11 K.3.4/4, which is typically equivalent to PTRDIFF_MAX.

Whose guidance? Annex K is kinda alien to the rest of the standard and its future is not clear. See, e.g., N1969, "Updated Field Experience With Annex K — Bounds Checking Interfaces".

- The expectation that PTRDIFF_MAX is more or less a proxy for the largest object size is not uncommon. For example, C++'s std::count doesn't return a size_t but an iterator_traits<>::difference_type, which is going to be ptrdiff_t for things like std::vector.

Bug in C++? Because max_size() of std::vector<char> has the type `size_type` and returns SIZE_MAX in practice (even on x86-64).
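The observation is easy to reproduce (what gets printed depends on the standard library; SIZE_MAX is what the poster reports seeing):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  std::vector<char> v;
  // max_size() has the unsigned type size_type; the complaint above is
  // that common implementations return SIZE_MAX here, i.e. they
  // advertise objects larger than PTRDIFF_MAX bytes.
  std::cout << "max_size: " << v.max_size() << "\n"
            << "SIZE_MAX: " << SIZE_MAX << "\n";
}
```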

Let's see other examples (on x86-32):

- glibc's malloc is happy to allocate more than PTRDIFF_MAX bytes (and this matters in practice, so at least some glibc devs are reluctant to "fix" it);

- clang has a compile-time limit on the sizes of objects and types, and that limit is SIZE_MAX rather than PTRDIFF_MAX. For example, it compiles `char a[-1ul]; int main() {}` but complains about `char a[-1ul + 1ull]; int main() {}`;

- `new` for more than PTRDIFF_MAX bytes works fine when compiled by clang++ (but throws std::bad_array_new_length when compiled by g++); a minimal demonstration follows.
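A small program exhibiting that divergence, assuming a 32-bit target (so 2 GiB + 1 exceeds PTRDIFF_MAX) and that the behaviors are as described above:

```cpp
#include <cstdio>
#include <new>

int main() {
  // 2 GiB + 1 byte: more than PTRDIFF_MAX on a 32-bit target. Kept in
  // a variable so any size check happens at run time.
  unsigned long long n = 2147483649ull;
  try {
    char *p = new char[n];
    std::puts("allocated"); // behavior reported for clang++
    delete[] p;
  } catch (const std::bad_array_new_length &) {
    std::puts("bad_array_new_length"); // behavior reported for g++
  } catch (const std::bad_alloc &) {
    std::puts("bad_alloc"); // the allocation simply failed
  }
}
```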

It could be that 32-bit systems are disappearing so rapidly that nobody cares too much about this issue,

Please note that there is a whole new 32-bit architecture -- x32 (the x86-64 ILP32 ABI).

I've come across this issue before and came to the following conclusion:
- We are not obligated to support objects that large; C11 5.2.4.1/1 only requires that we support objects of 65535 bytes!

Right, the standard doesn't require it. But I guess you don't imply that it's fine for clang to silently miscompile any program that works with objects larger than 65535 bytes.

I think that we should error when we can statically determine that an object is larger than PTRDIFF_MAX.

Their guidance for maximum object size is stated to be half of SIZE_MAX in C11 K.3.4/4, which is typically equivalent to PTRDIFF_MAX.

Whose guidance? Annex K is kinda alien to the rest of the standard and its future is not clear. See, e.g., N1969, "Updated Field Experience With Annex K — Bounds Checking Interfaces".

That document objects to much of Annex K but doesn't say anything about K.3.4/4.
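For reference, K.3.4/4 recommends that RSIZE_MAX be the smaller of the largest supported object size and (SIZE_MAX >> 1). On common flat-address-space targets that bound coincides with PTRDIFF_MAX, which is where the "typically equivalent" claim comes from. A compile-time check of that typical (but not guaranteed) coincidence:

```cpp
#include <cstddef>
#include <cstdint>

// Holds on common 32- and 64-bit flat-memory targets; the standard
// does not guarantee it.
static_assert(static_cast<std::size_t>(PTRDIFF_MAX) == (SIZE_MAX >> 1),
              "PTRDIFF_MAX is half of SIZE_MAX on this target");
```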

- The expectation that PTRDIFF_MAX is more or less a proxy for the largest object size is not uncommon. For example, C++'s std::count doesn't return a size_t but an iterator_traits<>::difference_type, which is going to be ptrdiff_t for things like std::vector.

Bug in C++? Because max_size() of std::vector<char> has the type `size_type` and returns SIZE_MAX in practice (even on x86-64).

Some folks in the C++ community believe that using unsigned types in those places was the wrong decision.

Let's see other examples (on x86-32):

- glibc's malloc is happy to allocate more than PTRDIFF_MAX bytes (and this matters in practice, so at least some glibc devs are reluctant to "fix" it);

glibc is not the only game in town.
Android prevents it: https://android-review.googlesource.com/#/c/170800/
jemalloc prevents it: commit jemalloc/jemalloc@0c516a0, "Make *allocx() size class overflow behavior defined."

There are other implementations of malloc which also act similarly.

I've come across this issue before and came to the following conclusion:
- We are not obligated to support objects that large; C11 5.2.4.1/1 only requires that we support objects of 65535 bytes!

Right, the standard doesn't require it. But I guess you don't imply that it's fine for clang to silently miscompile any program that works with objects larger than 65535 bytes.

I think that we should error when we can statically determine that an object is larger than PTRDIFF_MAX.

It seems we agree that the current situation is not good and something has to be fixed in clang/llvm. Ok.

Why don't you think that clang/llvm should just be fixed to support objects larger than PTRDIFF_MAX bytes?

- The expectation that PTRDIFF_MAX is more or less a proxy for the largest object size is not uncommon. For example, C++'s std::count doesn't return a size_t but an iterator_traits<>::difference_type, which is going to be ptrdiff_t for things like std::vector.

Bug in C++? Because max_size() of std::vector<char> has the type `size_type` and returns SIZE_MAX in practice (even on x86-64).

Some folks in the C++ community believe that using unsigned types in those places was the wrong decision.

But couldn't the result of max_size() be changed (capped at PTRDIFF_MAX, say) without changing its type?

Let's see other examples (on x86-32):

- glibc's malloc is happy to allocate more than PTRDIFF_MAX bytes (and this matters in practice, so at least some glibc devs are reluctant to "fix" it);

glibc is not the only game in town.

Right; for example, musl doesn't create objects larger than PTRDIFF_MAX bytes. I think it's good that different libc implementations are free to choose their own policies.

The problem is that this is only possible when compilers support large objects. When compilers don't support large objects (like now), support in libc is not enough -- you get your large objects, but your accesses to them will sometimes be miscompiled.
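To make "miscompiled, sometimes" concrete: a sketch of an ordinary loop whose bound computation relies on exactly the pointer arithmetic that breaks for objects larger than PTRDIFF_MAX bytes (the function name here is illustrative):

```cpp
#include <cstddef>

// Counts zero bytes in a buffer. For n > PTRDIFF_MAX this is the kind
// of code the TrustInSoft post reports being miscompiled: the compiler
// may reason about p + n and the loop's progress using signed
// ptrdiff_t-width arithmetic, which cannot represent the offset.
std::size_t count_zero_bytes(const char *p, std::size_t n) {
  std::size_t k = 0;
  for (const char *q = p; q != p + n; ++q)
    if (*q == '\0')
      ++k;
  return k;
}
```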

Android prevents it: https://android-review.googlesource.com/#/c/170800/
jemalloc prevents it: commit jemalloc/jemalloc@0c516a0, "Make *allocx() size class overflow behavior defined."

A good share of the reason for those decisions is exactly the fact that clang/llvm don't support large objects, so this reasoning is circular :-)