Optimizations based on calling main() being UB

Hey,
I have a question. Does clang take advantage of the fact that calling main, or taking the address of main, is undefined behavior in C++?
I’ve heard that some compilers do so, and AFAIR it was important for some benchmarks.
I guess the compiler can assume the values of globals on entry to main, because it knows that main is the first function called after they are initialized.
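
To make that concrete, here is a minimal sketch of the kind of reasoning I mean. It is not taken from clang. The names program_main, mode, and demonstrate are hypothetical, and program_main stands in for main only because a conforming C++ program cannot call main a second time to make the difference observable.

```cpp
#include <cassert>

namespace {
int mode = 3;  // hypothetical global, constant-initialized before the entry point runs
}

// Hypothetical stand-in for main().
int program_main() {
    int seen = mode;  // on the first entry this load always yields 3
    mode = 7;         // later state change that a second entry would observe
    return seen;
}

// Calls the stand-in twice to show what an "entered only once" assumption buys.
void demonstrate() {
    assert(program_main() == 3);  // first entry sees the initializer
    assert(program_main() == 7);  // a re-entry sees the modified value
}
```

If the function is known to be entered exactly once, a compiler could fold the first load to the constant 3. The second call shows why that folding is not valid for an ordinary function.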

Is there any optimization in clang that does this?

Piotr

main is marked as “norecurse” in C++ and not in C:

$ echo "int main() {}" | clang -x c - -emit-llvm -o - -S | grep recurse
$ echo "int main() {}" | clang -x c++ - -emit-llvm -o - -S | grep recurse
; Function Attrs: norecurse nounwind ssp uwtable
attributes #0 = { norecurse nounwind ssp uwtable ...

I would be interested in seeing the standardese associated with each
behavior.

/Eric

Indeed. If I implement main in C++ and call it from C, or implement main in C and call it from C++, is it undefined behaviour in both cases? This seems like a very odd inconsistency if I have to know the source language for both the caller and callee to determine whether something is UB.

David

The relevant piece of standardese seems to be 3.6.1 paragraph 3 ([basic.start.main]) in the C++17 working draft (N4606), which reads in part:

“The function main shall not be used within a program. The linkage (3.5) of main is implementation-defined. A program that defines main as deleted or that declares main to be inline, static, or constexpr is ill-formed. The main function shall not be declared with a linkage-specification (7.5). A program that declares a variable main at global scope or that declares the name main with C language linkage (in any namespace) is ill-formed.”

So the cases you suggest, by my reading, should not be UB but rather a compile error.

–Ben

It sounds as if calling main from C++ should be a compile error. It’s not clear what’s expected to happen if a C compilation unit calls a C++ main. The C compiler doesn’t know to warn about this, because it doesn’t know that main is implemented in C++. The C++ compiler can’t emit main with a linkage type that prevents the call, because otherwise _start (or equivalent) couldn’t call main, yet it is apparently free to optimise on the assumption that no such call will ever happen. It sounds like either WG14, WG21, or both need a defect report about interoperability here.

If we’re doing LTO, then we could warn on attempting to merge a non-norecurse declaration of main with a norecurse definition. If we’re not, then I don’t know if we have enough information by the link stage to know whether a relocation against main is safe or not (particularly as the C++ standard doesn’t appear to prevent taking the address of main or comparing against the result).

David

I suspect that this restriction in C++ came about in the first place
because some compilers were putting the code for dynamic initialization of
static variables at the beginning of the function called "main", and thus
recursively calling it would rerun the static initialization.

Isn’t the very beginning "The function main shall not be used within a program” enough?

It should also cover the “taking the address” part as well.

No, because this rule does not exist in the C standard and, from the perspective of a C compilation unit the main function is just another C function. The fact that a function in another compilation unit is implemented in a different language makes calling it UB. Consider two compilation units written in the common subset of C and C++:

Compilation unit A contains main(), which calls foo().

Compilation unit B defines foo(), which calls main() the first time that it’s called, and returns otherwise.

If these are both compiled as C, this is valid code.
If these are both compiled as C++, then compilation unit B should be a compile-time error.
If A is compiled as C and B is compiled as C++, then B should be a compile-time error.
If A is compiled as C++ and B is compiled as C then we have a problem: it is well-defined behaviour in C for B to contain a call to main(), but it is UB in C++ for the main defined in A to be called.

This means that we don’t know if B is relying on UB unless we know whether A is compiled as C or C++ code. This is a horrible situation: flipping a compiler switch in A changes whether B is UB or not.
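
For reference, the two units can be sketched as below. This is only an illustration under stated assumptions. The listing is C++, so program_main is a hypothetical stand-in for main (a real call to main would itself trip the C++ rule). Both units are shown in one listing, and the depth counters are bookkeeping added for the sketch.

```cpp
#include <cassert>

// --- compilation unit A (shown inline; hypothetical stand-in names) ---
void foo();            // defined in unit B

int entry_depth = 0;   // bookkeeping for this sketch only
int max_depth = 0;

int program_main() {   // stand-in for main()
    ++entry_depth;
    assert(entry_depth <= 2);  // the sketch re-enters at most once
    if (entry_depth > max_depth) max_depth = entry_depth;
    foo();
    --entry_depth;
    return 0;
}

// --- compilation unit B ---
int program_main();    // in the real scenario this would be a declaration of main

void foo() {
    static int calls = 0;
    if (calls++ == 0) {
        program_main();  // re-entrant call, made only on the first invocation
    }
}
```

After one call, max_depth is 2: the entry function was active twice at once, which is exactly the re-entry that a norecurse marking on main assumes away.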

David

Isn’t the very beginning "The function main shall not be used within a program” enough?

No, because this rule does not exist in the C standard and, from the perspective of a C compilation unit the main function is just another C function.

Right, but when you form a program by mixing C and C++, I’m not sure how you can escape rules from both standards that apply to “the program” as a whole.

This means that we don’t know if B is relying on UB unless we know whether A is compiled as C or C++ code. This is a horrible situation: flipping a compiler switch in A changes whether B is UB or not.

Yes, that’s not great, but you can run into this issue with any “program-wide” rule in the standards (luckily there aren’t that many).