[RFC] Changing/clarifying clang's handling of -fno-builtin and -ffreestanding

Hi,

I’m trying to “fix” a longstanding issue about TLI options and LTO: we’re not preserving the -ffreestanding or -fveclib options from the compile phase to the link phase.

As a starting point, I need to clarify our handling of -ffreestanding and -fno-builtin. Right now clang does not differentiate between the two (except that -ffreestanding disables special handling of the function “main”).

Here is the proposed behavior:

1) -ffreestanding carries the fact that the compiler can’t assume anything about the environment: i.e. the libc is not present and the compiler should not create calls to libc functions. -ffreestanding implies -fno-builtin.
2) -fno-builtin describes how the compiler handles the *source*. It tells the compiler not to assume anything about a call to, let's say, malloc() in the source. We already implement this using the LLVM attribute “nobuiltin” on such calls. But I’d like to stop removing these builtins from the TLI and allow LLVM to create a call to (for example) memset() even with -fno-builtin.
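
To make the distinction between the two options concrete, here is a minimal C sketch (the names zero_buffer and make_zeroed are made up for illustration). Under the proposed behavior, -fno-builtin would only affect how the explicit malloc() call is treated, while an optimizer could still turn the zeroing loop into a call to memset(); only -ffreestanding would rule that out. Whether a given compiler actually performs the loop-to-memset transformation depends on its version, the optimization level and the target.

```c
#include <stddef.h>
#include <stdlib.h>

/* A plain zeroing loop. The source never mentions memset(), but an
 * optimizer that assumes a libc is available may replace the loop with
 * a memset() call. Under the proposal, -ffreestanding forbids that. */
void zero_buffer(unsigned char *buf, size_t n) {
    for (size_t i = 0; i < n; ++i)
        buf[i] = 0;
}

/* An explicit library call in the source. With -fno-builtin, the
 * compiler is asked to treat this malloc() call as a call to an
 * ordinary external function with no special, known semantics. */
unsigned char *make_zeroed(size_t n) {
    unsigned char *p = malloc(n);
    if (p != NULL)
        zero_buffer(p, n);
    return p;
}
```

The point of the sketch is that the two functions exercise different halves of the proposal: make_zeroed() is about what the compiler may assume about calls written in the source, zero_buffer() is about which calls the compiler may introduce on its own.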

LTO will happily implement freestanding and veclib through module flags, without having to care about the list of “no-builtin” functions, which will be carried by the function declarations and the call-site attributes (we can discuss the merging strategy for inconsistencies between modules further, but I’d like us to agree on the high-level bits first).
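
Purely as an illustration of what a merging strategy could look like, here is a hedged C sketch of one conservative option. Everything in it is hypothetical: struct tli_flags, enum veclib_kind and merge_tli_flags() are invented for this example and are not LLVM data structures or APIs, and the policy shown (any freestanding module makes the merged result freestanding; disagreeing vector libraries fall back to none) is only one of several possible choices.

```c
#include <stdbool.h>

/* Hypothetical per-module summary of the two flags discussed above. */
enum veclib_kind { VECLIB_NONE, VECLIB_ACCELERATE, VECLIB_SVML };

struct tli_flags {
    bool freestanding;       /* module was compiled with -ffreestanding */
    enum veclib_kind veclib; /* vector library requested for the module */
};

/* Conservative merge: keep only the assumptions both modules allow. */
struct tli_flags merge_tli_flags(struct tli_flags a, struct tli_flags b) {
    struct tli_flags out;
    /* If either module cannot rely on a libc, neither can the result. */
    out.freestanding = a.freestanding || b.freestanding;
    /* Use a vector library only when both modules agree on it. */
    out.veclib = (a.veclib == b.veclib) ? a.veclib : VECLIB_NONE;
    return out;
}
```

A less conservative alternative would be to reject the link outright when modules disagree; which of the two is preferable is exactly the kind of detail left open above.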

I’d be interested to know if this is in line with GCC's handling of such options (I can’t be 100% sure just by reading the docs).

Note also that this is going quite against (some of) the previous views developed in this thread: http://lists.llvm.org/pipermail/llvm-dev/2013-February/059562.html

Best,

1) -ffreestanding carries the fact that the compiler can’t assume anything about the environment: i.e. the libc is not present and the compiler should not create calls to libc functions. -ffreestanding implies -fno-builtin.

Even freestanding implementations are required to support certain functions (e.g. memcpy). -ffreestanding should only disable those not required.

Right, there are a few exceptions.

  2) -fno-builtin describes how the compiler handles the *source*. It tells the compiler not to assume anything about a call to, let's say, malloc() in the source. We already implement this using the LLVM attribute “nobuiltin” on such calls. But I’d like to stop removing these builtins from the TLI and allow LLVM to create a call to (for example) memset() even with -fno-builtin.

This makes sense to me.

Note also that this is going quite against (some of) the previous views developed in this thread: http://lists.llvm.org/pipermail/llvm-dev/2013-February/059562.html

I skimmed the thread, but can you be more specific about to which views you're referring?

The thread had multiple views about what to support and how, but on re-reading it, I may have misunderstood (part of) it last night; it is not that different.

The thread is also somewhat out of date because we do now implement -fno-builtin-FOO.

Well, we don’t serialize -fno-builtin-FOO to the IR, do we?

AFAIK, no.

For instance the “devirtualization” case mentioned by Chris here: http://lists.llvm.org/pipermail/llvm-dev/2013-February/059621.html is not handled with LTO and -fno-builtin-printf today, and it wouldn’t be handled by a non-LTO compilation with my plan.

This seems somewhat orthogonal, except for the fact that the changes you're proposing will regress anyone depending on this in non-LTO compilation, but we should fix this also.
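
For readers who have not followed the older thread, the “devirtualization” case can be pictured roughly as follows. This is a sketch with made-up names (print_fn, emit, greet), not code from that thread, and whether a particular compiler performs the transformation depends on its version and optimization level.

```c
#include <stdio.h>

typedef int (*print_fn)(const char *fmt, ...);

/* Hypothetical helper: the callee is only known through a pointer,
 * so there is no direct printf() call site here to annotate. */
static void emit(print_fn fn) {
    fn("hello\n");
}

int greet(void) {
    /* After inlining, the indirect call can become a direct call to
     * printf(). At that point an optimizer that recognizes printf()
     * may rewrite it (for example into puts()), although the source
     * never contained a direct printf() call for -fno-builtin-printf
     * to mark. */
    emit(printf);
    return 0;
}
```

The direct call only comes into existence during optimization, which is why information attached to source-level call sites alone does not cover it.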

  -Hal

I think that whether the implementation is freestanding versus hosted is independent of whether or not the libraries are available and hence, whether the builtins can be used or not.

I don't have a copy of C11, but C90 and C99 both permit a conforming freestanding implementation to provide the ISO C libraries; they do not mandate that the libraries must NOT be present. These Standards also state that a conforming freestanding application cannot assume that the Standard libraries are present.

This means that the statement:

   1) -ffreestanding carries the fact that the compiler can’t assume anything about the environment

is not really true, since the compiler plus its libraries "is" the implementation and can assume whatever it has implemented. However, a program written by the user and built with this implementation may not assume the existence of the libraries if it intends to claim to be a conforming freestanding application.

This means that a given implementation should be perfectly free to provide or not provide the libraries (or subsets of the libraries), and that the compiler should be free to use them even if the conforming freestanding application does not refer to them.

Would it not be better to allow the target to decide whether the builtins should be used or not? Simply implying no builtins loses a lot of significant optimisation opportunities that are very valuable to embedded freestanding applications.

As it happens, our implementation is freestanding because we have no operating system, but we still provide almost the entire ISO C libraries, although some routines will return an error (e.g. 'fopen' will return a valid ISO C failure [in 'errno'] saying that it was unable to open the file - we have no file-system).

We do suppress some builtins that we have decided not to support by automatically adding '-fno-builtin-XXX' in our implementation of 'void Clang::AddShaveTargetArgs(const ArgList &args, ArgStringList &cmdArgs) const'.

As an aside, we also support initialisation, finalisation and normal 'main' even though the platform has no OS.

So I don't think that '-ffreestanding' should imply '-fno-builtin' automatically.

  MartinO

So you believe that the only thing that -ffreestanding should do is disable special handling of main? I'm unclear on what you think that -ffreestanding should do.

  -Hal

It’s a very good question. Why should it even change 'main'? The Standards permit a freestanding implementation to use 'main' as the entry-point too. Indeed, I wonder whether it is a useful option at all.

In a nutshell, ISO C (90 and 99 at least) allows a conforming program to claim freestanding conformance so long as it does not rely on 'main' having special meaning (and being the natural entry-point to the program) and so long as it restricts itself to using only the ISO C library features described in a restricted set of headers. From C99 (C90 is very similar):

  Section 4 "Conformance":

  A conforming freestanding implementation shall accept
  any strictly conforming program that does not use complex
  types and in which the use of the features specified in
  the library clause (clause 7) is confined to the contents
  of the standard headers <float.h>, <iso646.h>, <limits.h>,
  <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>. A
  conforming implementation may have extensions (including
  additional library functions), provided they do not alter
  the behavior of any strictly conforming program.

and:

  Section 5.1.2.1 "Freestanding environment":

  In a freestanding environment (in which C program execution
  may take place without any benefit of an operating system),
  the name and type of the function called at program startup
  are implementation-defined. Any library facilities available
  to a freestanding program, other than the minimal set
  required by clause 4, are implementation-defined. The effect
  of program termination in a freestanding environment is
  implementation-defined.

and finally in Section J.3 "Implementation-defined behaviour":

  Section J.3.12 "Library functions":

  Any library facilities available to a freestanding program,
  other than the minimal set required by clause 4 (5.1.2.1).

C90 does not provide a pre-defined macro or other mechanism by which a program can conditionally determine whether it is being compiled for a hosted or freestanding environment (C99 added '__STDC_HOSTED__' for this purpose), so it is largely up to the programmer to ensure that they do not use features of a hosted environment not explicitly supported by the freestanding environment if they want to claim that their program is a conforming freestanding program. Such a program is a strict subset of a conforming program that does not claim to be freestanding, so it should be possible to compile it with a freestanding or hosted implementation.
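
As a concrete illustration of compile-time detection in C99 and later: the predefined macro '__STDC_HOSTED__' expands to 1 on a hosted implementation and to 0 otherwise, and as far as I know both GCC and clang set it to 0 under '-ffreestanding'. The sketch below uses it to avoid depending on <stdio.h> when it may not exist; HAVE_HOSTED_LIBC and report_status() are names invented for this example.

```c
/* __STDC_HOSTED__ is predefined in C99 and later:
 * 1 for a hosted implementation, 0 for a freestanding one. */
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__
#include <stdio.h>
#define HAVE_HOSTED_LIBC 1
#else
#define HAVE_HOSTED_LIBC 0
#endif

/* Hypothetical helper: prints only when a hosted libc is known to be
 * available, and otherwise just passes the code through. */
int report_status(int code) {
#if HAVE_HOSTED_LIBC
    printf("status: %d\n", code);
#endif
    return code;
}
```

A C90 compiler does not define the macro, so the sketch falls back to the freestanding branch there, which is the safe assumption.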

It's hen's teeth, really.

C++14 further adds this wrinkle:

  Section 3.6.1 "Main function":

  It is implementation-defined whether a program in a
  freestanding environment is required to define a main
  function. [ Note: In a freestanding environment, start-up
  and termination is implementation-defined; startup contains
  the execution of constructors for objects of namespace
  scope with static storage duration; termination contains
  the execution of destructors for objects with static
  storage duration. —end note ]

I think at best, the compiler or perhaps the static analysers could diagnose when a user's program uses features not explicitly supported in a freestanding environment (when '-ffreestanding' is selected). Perhaps this might be a valid use of '-ffreestanding' and '-fhosted'.

My own preference would be to simply remove the option, but provide the target with a suitable API or framework to allow it to choose and direct how builtins are used. After all, is an implied use of 'memcpy' by the compiler - even when the program makes no reference to it - any different to calling builtin helper functions from 'compiler-rt' such as '__divdi3'?
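
To show the two kinds of implied calls side by side, here is a small C sketch (struct big, copy_big and div64 are made-up names). Neither function mentions a library routine, yet depending on the target and optimization level a compiler may emit a call to 'memcpy' for the first and to a runtime helper such as '__divdi3' for the second; what actually gets emitted is an assumption about typical behavior, not a guarantee.

```c
#include <stdint.h>

struct big {
    unsigned char bytes[256];
};

/* Struct assignment: no memcpy() appears in the source, but compilers
 * commonly lower a copy of this size to a memcpy() call. */
void copy_big(struct big *dst, const struct big *src) {
    *dst = *src;
}

/* 64-bit signed division: on many 32-bit targets without a suitable
 * divide instruction this becomes a call to a helper such as
 * __divdi3 from the compiler's runtime library. */
int64_t div64(int64_t a, int64_t b) {
    return a / b;
}
```

In both cases the dependency comes from the implementation rather than from anything the program wrote.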

  MartinO

It’s a very good question. Why should it even change 'main'?

As I recall, we do this only in C++ (where it is illegal for a program to refer to main, so we know it is norecurse).

My own preference would be to simply remove the option, but provide the target with a suitable API or framework to allow it to choose and direct how builtins are used. After all, is an implied use of 'memcpy' by the compiler - even when the program makes no reference to it - any different to calling builtin helper functions from 'compiler-rt' such as '__divdi3'?

Maybe not, but even if the standard does not require __divdi3, our implementation might. This is not a statement about the program itself, from the standard's perspective, but about the implementation. We could decide that our implementation always requires a __memcpy, but that's just unnecessary and inconvenient because the standard says a memcpy needs to be provided regardless.

  -Hal