C as used/implemented in practice: analysis of responses

Your option 3 is the preferred way to handle your example, because it allows one to reason about the behaviour of the program. As a bonus, it does so without interfering in any way with optimisation.

What most users really want - and what many of them /expect/ even though it’s currently not actually the case - is for C to behave like high-level assembly language, for code to translate straightforwardly into machine operations. Yes, it would be nice to get consistent behaviour between compilers. But that’s not going to happen. Failing that, given that there is no way to improve all compilers, improving one compiler would be much better than nothing.

1. The performance gain from this on real programs is small. I will suggest
that the total performance gain from optimisations that rely on exploiting
undefined behaviour - let's call them monkey's paw optimisations for short -
is practically never more than a few percent, and often less than one
percent.

2. For most programs, having the program work is worth far more than having
it run a few percent faster.

Which may or may not be fine until you decide to switch
compilers/platforms. Encouraging programmers to use Clang-specific
interpretations of these constructs would promote vendor lock-in and
be a blow for portability, which I think is worse than UB. At least
now we can tell people "you're doing it wrong".

Cheers.

Tim.

Not all code needs to be moved between compilers.

That having been said, if my proposal were implemented, you would still be perfectly free to tell people ‘you’re doing it wrong, you shouldn’t actually depend on things the standard doesn’t cover’ (even though in practice, people use compiler extensions, system specific code and suchlike all the time, and nobody seems to think this is a reason such things shouldn’t be provided).

Even if you want to have the compiler warn at compile time about code that depends on undefined behaviour (in cases where it can deduce that such is occurring), that’s okay.

But having programs miscompiled so that they silently fail, in many cases starting only years after the code in question was written, is very much not okay. That’s far worse than documented portability problems.

But having programs miscompiled so that they silently fail, in many cases
starting only years after the code in question was written, is very much not
okay. That's far worse than documented portability problems.

When given a certain spin...

Tim.

Why do you say spin? I’m not making any of this up; there have been published cases of bugs creeping into code that had worked correctly for years, without any change to the code itself, because a new version of GCC started applying a monkey’s paw optimisation. That’s the sort of thing that prompted the survey that started this thread.

You're dismissing all use-cases other than this very narrow one I'd
(with my own spin) characterise as "Do What I Mean, I Can't Be
Bothered To Get My Code Right". Fair enough, you're arguing in favour
of a point; but it's not one I agree with.

Tim.

> I’m not making any of this up; there have been published cases of bugs creeping into code that had worked correctly for years

The generated machine code may have implemented what the programmer intended. However, the bugs were there all along, in that the program relied on undefined behavior. The specified semantics of the programming language, or of LLVM IR (since that is itself a programming language), are what the user (or the front end that uses LLVM IR as its output) can rely on, nothing else.

Kevin B. Smith

I am arguing in favor of a point, and I understand you disagree with it, but I don’t think I’m dismissing any use cases except a very small performance increment. Furthermore, the compiler would still be free to perform such optimisations where it can prove they won’t break the program. That’s not all cases, to be sure, but at least we would then be back to the normal scenario where over the years as the compiler gets smarter, things get better, as opposed to monkey’s paw optimisations which cause a smarter compiler to make things worse.

And what I advocate is that the specified semantics of LLVM IR be changed
to improve the reliability of typical code (which, as the survey shows, in
practice already relies on things most programmers intuitively expect).

And yet, this thread died a long time ago in the GCC list, and it
seems you're not having luck here either.

You could say that it's confirmation bias, since we're all compiler
guys anyway, but go ask on the LKML and you'll find that they pretty
much ask for it.

Just maybe, you thought that this was a much bigger issue than it
really is, and most developers do like it when they're told they messed
up, or when things explode in their faces.

If you like the safety of one-vendor, buy-in, implementation-defined
behaviour, maybe you ought to look at Java instead?

cheers,
--renato

And yet, this thread died a long time ago in the GCC list, and it
seems you're not having luck here either.

Yep. At least this way I know I tried.

If you like the safety of one-vendor, buy-in, implementation-defined
behaviour, maybe you ought to look at Java instead?

For myself, I don't care so much either way; I'm more familiar with C++
than most people who use it, and I'm not trying to manage a large project.

Your option 3 is the preferred way to handle your example, because it
allows one to reason about the behaviour of the program. As a bonus, it
does so without interfering in any way with optimisation.

The problem is that setting it bit-wise is platform-dependent. For example,
that concrete example (with 95 replaced by 94) would have padding after the
char array depending on sizeof(long), so some of the stores might not even
affect any of the fields declared in the struct. So you're asking the
compiler to do something pretty arbitrary and platform-dependent, which is
basically what it is already doing (minus a lot of the platform dependence).
This is the sort of thing that I was talking about upthread.
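
For concreteness, here is a minimal sketch of the padding issue; the struct
and field names below are invented stand-ins, not the survey's actual code:

#include <stdio.h>
#include <stddef.h>

/* Invented stand-in for the struct under discussion. */
struct example {
    char tag[94];   /* the "95 replaced by 94" variant */
    long value;     /* its alignment decides how much padding follows 'tag' */
};

int main(void) {
    /* How many bytes of the struct are padding, rather than declared
       fields, depends on the target's alignment rules for 'long'. A loop
       that stores to all sizeof(struct example) bytes therefore writes a
       platform-dependent number of bytes belonging to no declared field. */
    size_t pad = offsetof(struct example, value)
                 - sizeof(((struct example *)0)->tag);
    printf("sizeof=%zu offsetof(value)=%zu padding after tag=%zu\n",
           sizeof(struct example), offsetof(struct example, value), pad);
    return 0;
}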

It could be argued that deleting the loop (which would lead to an obvious
error immediately, with any basic testing) is better than a "works on my
machine" situation where the latent bug is one due to bitwise layout
differences -- I wouldn't want to debug that. Obviously an intelligible
static diagnosis of the situation is the ideal case, but doing so is a hard
problem (and may be too noisy anyway).

-- Sean Silva

From: "Russell Wallace" <russell.wallace@gmail.com>
To: "Tim Northover" <t.p.northover@gmail.com>
Cc: "Hal Finkel" <hfinkel@anl.gov>, llvmdev@cs.uiuc.edu
Sent: Wednesday, July 1, 2015 2:22:24 PM
Subject: Re: [LLVMdev] C as used/implemented in practice: analysis of responses

I am arguing in favor of a point, and I understand you disagree with
it, but I don't think I'm dismissing any use cases except a very
small performance increment.

You seem to be implying that these optimizations only yield small performance improvements. To be sure, most optimizations yield only small (or no) improvement to most programs. But this is simply because there are a large number of optimizations (LLVM has millions of lines of code, and a significant number of them are dedicated to code optimization). Furthermore, most user code is not in hot regions, and thus does not matter for performance. Thus, the number of optimizations that matter to hot code regions in any particular program is often (although not always) limited. However, there is wide variety in hot code, and essentially all of these optimizations were implemented because they yielded a significant improvement to someone's hot code region (or were a closely-related case).

-Hal

I am arguing in favor of a point, and I understand you disagree with it,
but I don't think I'm dismissing any use cases except a very small
performance increment.

I'm sure Google has numbers about how much electricity/server cost they
save for X% performance improvement.
I'm sure Apple has numbers about how much money they make with X% improved
battery life.
I'm not convinced that the cost of some of these bugs is actually larger
than the benefit of faster programs. Nor am I convinced about the inverse.
I'm just pointing out that pointing to a "bad bug" caused by a certain
optimization without comparing the cost of the bug to the benefit of the
optimization is basically meaningless. You'll need to quantify "very small
performance improvement" and put it in context of the bugs you're talking
about.

Furthermore, the compiler would still be free to perform such
optimisations where it can prove they won't break the program.

"won't break the program" is very hard to know...

-- Sean Silva

We already perform optimizations only when the compiler can prove they won’t break the program.

The only difference between that and what you suggest is in the definition of “won’t break the program”. We define it as “won’t break the program with respect to the semantics implied by the C/C++ spec”.

You want to redefine it, by specifying a new abstract machine, which is more conservative than standard C/C++. The proper way to do that would, I believe, be to work towards setting up a working group within the relevant committees, and come up with a uniformly accepted definition for this abstract machine, which could then be implemented (assuming there is, indeed, wide enough agreement in the implementer community – something that does not look at all likely) by next-generation compilers.

Point is – I think you’re barking up the wrong tree.

This isn’t an llvm-dev issue, it’s a standards committee issue.

Michael

The problem is that setting it bit-wise is platform dependent.

To expand on this slightly: this platform dependence includes config registers on the same processor. MIPS implementations often support both big- and little-endian operation (selectable at boot time), which will write the halves of the doubles in opposite orders in the given example.
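
As a quick illustration (not tied to the survey example itself), the byte
layout of a double is simply reversed between the two configurations, so any
code that fills in a double through raw byte or word stores bakes that
choice in:

#include <stdio.h>
#include <string.h>

int main(void) {
    double d = 1.0;
    unsigned char bytes[sizeof d];

    /* Copy out the object representation; the order of these bytes is an
       endianness choice, e.g. selected at boot time on bi-endian MIPS
       parts. */
    memcpy(bytes, &d, sizeof d);
    for (size_t i = 0; i < sizeof d; ++i)
        printf("%02x ", bytes[i]);
    printf("\n");
    /* Typically "00 00 00 00 00 00 f0 3f" on a little-endian configuration
       and "3f f0 00 00 00 00 00 00" on a big-endian one. */
    return 0;
}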

We already perform optimizations only when the compiler can prove they won’t
break the program.

The only difference between that and what you suggest is in the definition
of “won’t break the program”. We define it as “won’t break the program with
respect to the semantics implied by the C/C++ spec”.

You want to redefine it, by specifying a new abstract machine, which is more
conservative than standard C/C++. The proper way to do that would, I
believe, be to work towards setting up a working group within the relevant
committees, and come up with a uniformly accepted definition for this
abstract machine, which could then be implemented (assuming there is,
indeed, wide enough agreement in the implementer community – something that
does not look at all likely) by next-generation compilers.

Point is – I think you’re barking up the wrong tree.

This isn’t an llvm-dev issue, it’s a standards committee issue.

This thread has diverged a bit since I was last here, and I'm not the
one you were responding to above. But as far as I'm concerned, the
question for the LLVM community (and similarly the GCC community and
other compiler developers) is, for each of the ways in which systems
code and OS developers clearly are relying on behaviour that you don't
think you currently support, what is the least-runtime-cost and
least-effort way of supporting it (and what is that cost and effort)?
If people could comment on that concretely, e.g. for each of the
questions of the survey, that would be really helpful.

A tighter semantics does not necessarily mean a global change to C -
it might just need a particular choice of existing or new options that
makes sense for OS developers. And there are plenty of precedents
here that have not involved the standards committee, e.g.
-fno-strict-aliasing. In general, the standards committee prefers to
codify existing practice, so establishing something workable in
practice is the best first step in any case.

Peter

Bingo!

--renato

This one is interesting, because the biggest problem with strict
aliasing is that there is no standard-compliant way to override it.
The most basic issue is how the allocator is supposed to work
internally. If you can fully inline malloc/free pairs, it is practically
impossible to avoid aliasing conflicts.

Other important use cases are things like vectorizing access, which
often means checking for the alignment of the data and casting to a more
appropriate type. Not everyone wants to implement strlen in assembler,
but writing a standard-compliant and still fast implementation in C
seems impossible.
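
To make the strlen point concrete, here is a rough sketch (not a drop-in
implementation) of the usual word-at-a-time approach. It needs exactly the
things the standard gives no blessed way to do: an alignment check followed
by a cast to a wider type, and loads that read past the terminator within
an aligned word.

#include <stddef.h>
#include <stdint.h>

static size_t fast_strlen(const char *s)
{
    const char *p = s;

    /* Walk byte by byte until p is aligned for unsigned long. */
    while ((uintptr_t)p % sizeof(unsigned long) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Scan one word at a time. The cast below is the aliasing problem,
       and each load may read bytes beyond the NUL, though never beyond
       the aligned word that contains it. */
    const unsigned long *w = (const unsigned long *)p;
    const unsigned long ones = (unsigned long)-1 / 0xFF;  /* 0x0101...01 */
    const unsigned long high = ones * 0x80;               /* 0x8080...80 */

    /* (x - ones) & ~x & high is nonzero iff some byte of x is zero. */
    while (((*w - ones) & ~*w & high) == 0)
        w++;

    /* Locate the exact NUL within that word. */
    p = (const char *)w;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}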

Joerg

This is blatantly wrong. There are quite a few things where hot code
would get seriously penalized if you want to "fix" UB. A very basic
example is over-large shifts. If the compiler can't prove that the shift
amount is within the width of the type, it would have to add additional
branches on at least one major CPU architecture, since x86 and ARM
produce different results in that case. There is code where such
additional branches impose a significant penalty. And that is just one
example.
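
A sketch of that cost, assuming purely for the sake of the example that the
"fixed" semantics required over-large shifts to produce some defined value
(zero is just one arbitrary choice):

#include <stdint.h>

/* Today's rules: shifting a 32-bit value by 32 or more is undefined, so
   the compiler can emit a single shift instruction and accept whatever the
   hardware does (x86 masks the count to 5 bits, ARM uses the low byte of
   the count register, so they disagree for counts >= 32). */
uint32_t shl_current(uint32_t x, unsigned n)
{
    return x << n;
}

/* If the result had to be defined for every n, the compiler would need to
   emit something like this whenever it cannot prove n < 32: an extra
   compare plus branch or select on the hot path. */
uint32_t shl_defined(uint32_t x, unsigned n)
{
    return n < 32 ? x << n : 0;
}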

If you say "performance gains from artifical UB", you might get more
agreement. The aggressive nonnull optimisations are likely a good
candidate here. I do hope that the definition of what is UB and what not
can be changed in the C standard, but there are still limits. What
happens on access to a NULL pointer should remain UB as the behavior
does differ between embedded systems and a desktop/server environment
using virtual memory with page protection.

Joerg