[RFC] add Function Attribute to disable optimization

Hi,

I previously made a proposal for adding a pragma for per-function
optimization level control due to a number of requests from our customers
(see http://comments.gmane.org/gmane.comp.compilers.clang.devel/28958 for
the previous discussion); however, the discussion was inconclusive. Some
of my colleagues recently had the opportunity to discuss the proposal with
a number of people at and before the recent Bay Area social, where it was
generally agreed that we should resubmit a new scaled-down proposal that
still addresses our users' primary use-case without introducing the
complexity of full per-function optimization level control at this time.

This proposal is to create a new function-level attribute which would tell
the compiler not to perform any optimizing transformations on the
specified function.

The use-case is to be able to selectively disable optimizations when
debugging a small number of functions in a compilation unit to provide an
-O0-like quality of debugging in cases where compiling the whole unit at
anything less than full optimization would make the program run too
slowly. A useful secondary-effect of this feature would be to allow users
to temporarily work-around optimization bugs in LLVM without having to
reduce the optimization level for the whole compilation unit, however we
do not consider this the most important use-case.

Our suggestion for the name for this attribute is "optnone" which seems to
be in keeping with the existing "optsize" attribute, although it could
equally be called "noopt" or something else entirely. It would be exposed
to Clang users through __attribute__((optnone)) or [[optnone]].

I would like to discuss this proposal with the rest of the community to
share opinions and gather feedback.

Hi Andrea,

This would be very useful.

I previously made a proposal for adding a pragma for per-function
optimization level control due to a number of requests from our customers
(See http://comments.gmane.org/gmane.comp.compilers.clang.devel/28958 for
the previous discussion), however the discussion was inconclusive. Some
of my colleagues recently had the opportunity to discuss the proposal with
a number of people at and before the recent Bay Area social where it was
generally agreed that we should resubmit a new scaled-down proposal that
still addresses our users' primary use-case without introducing the
complexity of full per-function optimization level control at this time.

Unlike the cited use case, I encounter two others in the field during
code reviews.

First is to ensure dead writes are not removed. For example, a
function that zeroizes or wipes memory is subject to removal during
optimization. I often have to look at the program's disassembly to ensure
the memset is not removed by the optimizer.

Second is to ensure that questionable code (undefined behavior) is not
removed. I see this most often in overflow checks when the programmer
uses an `int` rather than an `unsigned int`. This would also likely
relieve the need for -fwrapv, since we would not have to worry about the
compiler or optimizer dropping code.

There are a few ways to get around these issues, but the easiest would
be to apply per-function optimization control through an annotation (such
as a pragma). Microsoft compilers already allow us to do the same.

Jeff

Wouldn’t implementing this proposal be a red herring? By this I mean, it is possible that
throughout the optimization phases, there is an implied assumption that all functions
are similarly optimized. An example would be that under certain optimization flags, the
compiler changes the calling convention of static functions.

  • Fariborz

Wouldn’t implementing this proposal be a red herring? By this I mean, it is
possible that throughout the optimization phases, there is an implied
assumption that all functions are similarly optimized. An example would be
that under certain optimization flags, the compiler changes the calling
convention of static functions.

Forgive my ignorance, but aren't all functions in the compilation unit
optimized at the same level? It does not matter whether it's a user-supplied
command line or a makefile recipe.

Jeff

Wouldn’t implementing this proposal be a red herring? By this I mean, it is
possible that throughout the optimization phases, there is an implied
assumption that all functions are similarly optimized. An example would be
that under certain optimization flags, the compiler changes the calling
convention of static functions.

Forgive my ignorance, but aren’t all functions in the compilation unit
optimized at the same level? It does not matter whether it’s a user-supplied
command line or a makefile recipe.

Yes. This proposal wants to change this.

  • Fariborz

Wouldn’t implementing this proposal be a red herring? By this I mean, it
is possible that throughout the optimization phases, there is an implied
assumption that all functions are similarly optimized.

There is no such intrinsic assumption.

The optimizer is already perfectly capable of handling these scenarios, and
they already come up. Imagine LTO of two translation units compiled at
different optimization levels. Also see the optsize attribute.

An example would be that under certain optimization flags, the compiler
changes the calling convention of static functions.

This is always (and must always be) done by a pass that looks at all of the
callers together. Thus, it will be able to (and must be taught to) respect
any attribute on a function which is designed to turn off optimizations.

The only real challenge to this proposal is going through and teaching as
much of the optimizer as possible to respect this function attribute. I
don't see anything fundamentally problematic about it.

Dropping the opt level should not lead to ABI changes. Otherwise you won't
be able to mix and match O2 and O0 objects either.

David

Dropping the opt level should not lead to ABI changes. Otherwise you won’t
be able to mix and match O2 and O0 objects either.

I was referring to “static functions”. Not that it happens, but something to consider.

  • Fariborz

Dropping the opt level should not lead to ABI changes. Otherwise you won't
be able to mix and match O2 and O0 objects either.

I was referring to “static functions”. Not that it happens, but something to
consider.

Not address-exposed static functions.

As Chandler mentions, this would be done by an IPA pass that looks at
all the callsites -- the enclosing function's opt level should not
matter here.

David

Andrea_DiBiagio@sn.scee.net wrote:

Hi,

I previously made a proposal for adding a pragma for per-function
optimization level control due to a number of requests from our customers
(See http://comments.gmane.org/gmane.comp.compilers.clang.devel/28958 for
the previous discussion), however the discussion was inconclusive. Some
of my colleagues recently had the opportunity to discuss the proposal with
a number of people at and before the recent Bay Area social where it was
generally agreed that we should resubmit a new scaled-down proposal that
still addresses our users' primary use-case without introducing the
complexity of full per-function optimization level control at this time.

This proposal is to create a new function-level attribute which would tell
the compiler not to perform any optimizing transformations on the
specified function.

What about module passes? Do you want to disable all module passes in a TU which contains a single one of these? I'll be unhappy if we need to litter checks throughout the module passes that determine whether a given instruction is inside an unoptimizable function or not. Saying that module passes are exempt from checking the 'noopt' attribute is fine by me, but then somebody needs to decide how module passes should handle it (and users may be surprised to discover that adding such an annotation to one function will cause seemingly unrelated functions to become less optimized).

Nick

Appropriate use of `volatile` is probably sufficient for this use case.

-- Sean Silva

That brings up a good point. As I understand it, volatile is
essentially implementation defined. What is Clang/LLVM's
interpretation?

Here's what I know. Microsoft's interpretation allows me to use
volatile for the situation under MSVC++ [1]. GCC's interpretation of
volatile is for memory mapped hardware, so it does not allow me to use
the qualifier to tame the optimizer [2].

Jeff

[1] http://msdn.microsoft.com/en-us/library/vstudio/12a04hfd(v=vs.90).aspx
[2] http://gcc.gnu.org/ml/gcc-help/2012-03/msg00242.html

clang doesn't treat volatile loads/stores as acquire/release barriers, if
that's what you're asking.

Actually, if you look at the 2012 version of the docs "
http://msdn.microsoft.com/en-us/library/vstudio/12a04hfd(v=vs.110).aspx",
you can see Microsoft was forced to change its own rules so as to not
completely screw over performance on non-X86 platforms.

-Eli

First is to ensure dead-writes are not removed. For example, a
function that zeroizes or wipes memory is subject to removal during
optimization. I often have to look at program's disassembly to ensure
the memset is not removed by the optimizer.

Appropriate use of `volatile` is probably sufficient for this use case.

That brings up a good point. As I understand it, volatile is
essentially implementation defined. What is Clang/LLVM's
interpretation?

Volatile has an explicit definition in C99/C11/C++03/C++11, and it's roughly the same in all of them. Volatile objects "may be modified in ways unknown to the
implementation or have other unknown side effects" (to quote C99), and the implementation must therefore preserve the accesses and their order even when optimizing.

Here's what I know. Microsoft's interpretation allows me to use
volatile for the situation under MSVC++ [1]. GCC's interpretation of
volatile is for memory mapped hardware, so it does not allow me to use
the qualifier to tame the optimizer [2].

Microsoft decided that they wanted to add additional semantics to volatile to make it enforce load/store barriers for multithreaded code. Since C11 and C++11 added explicit support for multithreaded atomic operations and memory models, those extra semantics are unnecessary; with the introduction of a processor with a relaxed memory model, it's also undesirable.

As vague as your description is, volatile appears to be sufficient for your use case (it's incorrect with respect to multithreaded memory-visibility issues, but it's no less correct than not optimizing the code).

`volatile` roughly means "this memory access may have side effects beyond
those of regular memory". Since the compiler can't reason about these
memory accesses, it is forced to carry them down to the assembly "as
written" without tampering with them. AFAIK `volatile` is sufficient for
making loads and stores unelidable (which was your original use case).

First is to ensure dead-writes are not removed. For example, a
function that zeroizes or wipes memory is subject to removal during
optimization. I often have to look at program's disassembly to ensure
the memset is not removed by the optimizer.

Appropriate use of `volatile` is probably sufficient for this use case.

That brings up a good point. As I understand it, volatile is
essentially implementation defined. What is Clang/LLVM's
interpretation?

Volatile has an explicit definition in C99/C11/C++03/C++11, and it's roughly
the same in all of them. Volatile objects "may be modified in ways unknown to
the implementation or have other unknown side effects" (to quote C99), and the
implementation must therefore preserve the accesses and their order even when
optimizing.

OK, thanks. I must have been quoted something else on another list.
I'll try and locate the email.

Here's what I know. Microsoft's interpretation allows me to use
volatile for the situation under MSVC++ [1]. GCC's interpretation of
volatile is for memory mapped hardware, so it does not allow me to use
the qualifier to tame the optimizer [2].

Microsoft decided that they wanted to add additional semantics to volatile
to make it enforce load/store barriers for multithreaded code. Since C11 and
C++11 added explicit support for multithreaded atomic operations and memory
models, those extra semantics are unnecessary; with the introduction of a
processor with a relaxed memory model, it's also undesirable.

Microsoft was not the problem - it was GCC, which holds that the only
valid use of volatile is memory-mapped hardware.

As vague as your description is, volatile appears to be sufficient for
your use case (it's incorrect with respect to multithreaded memory-visibility
issues, but it's no less correct than not optimizing the code).

Here was the sample I asked about some time ago. It was from a slide
deck at [1].

volatile void clean_memory(volatile void* dest, size_t len)
{
    volatile unsigned char* p;
    for(p = (volatile unsigned char*)dest; len; p[--len] = 0)
      ;
}

Because the pointers above (`dest` and `p`) were not memory-mapped
addresses, the GCC folks consider it an abuse. The same is true for
this too:

static volatile void* g_dummy;

static void clean_memory(volatile void* dest, size_t len) {
    memset((void*)dest, 0x00, len);  /* cast: memset takes a plain void* */
    g_dummy = dest;
}

Again, the GCC folks consider it an abuse since the memory is not
mapped from hardware.

Jeff

[1] http://www.slideshare.net/guanzhi/crypto-with-openssl

Here's what I found. It was from Andrew Haley, who I believe is one of
the GCC devs.

GCC List >>> A good discussion on the subject can be found at
GCC List >>> http://gcc.gnu.org/ml/gcc-help/2012-03/msg00239.html.
GCC List >>>
GCC List >> The thread includes a discussion of Microsoft's and GCC's
GCC List >> interpretation of the keyword. The interpretations were so
GCC List >> different I wondered if it was 'implementation defined' in
GCC List >> the standard.
GCC List >
GCC List > Yes: 6.7.3 Type qualifiers, "What constitutes an access to an object
GCC List > that has volatile-qualified type is implementation-defined."

I'm pretty sure that's where I latched onto 'volatile' being
implementation defined. I tend to trust the opinions of Ian Lance
Taylor, Jonathan Wakely, and Andrew Haley when it comes to GCC. Andrew
cited the standard above.

Jeff

For GCC, I usually just look at what Linux does to coerce the compiler into
submission, since if it breaks, the GCC devs will catch a lot of heat from
the Linux developers. For example, one macro they have is the ACCESS_ONCE
macro <https://lwn.net/Articles/508991/> which is literally just a cast to
a volatile-qualified type.

-- Sean Silva

Microsoft was not the problem - it was GCC, which holds that the only valid use of volatile is memory-mapped hardware.

That is a gross misrepresentation of volatile, I think. The intent of volatile is that it represents memory-mapped I/O registers. The practical effect of volatile is that it prohibits optimization of loads and stores--it makes sure that when you access that pointer, the appropriate LD or ST instruction is actually inserted in the output assembly code. I think you'd be hard-pressed to find a compiler writer who would disagree with that latter statement.

The real issue comes with multiple threads. Since C99 pretends that multiple threads don't exist, POSIX ducks the issue, and x86 has a strong memory model, it is very easy to fall into the trap that a volatile memory access is sufficient to guarantee cross-thread visibility. Since most discussion about volatile comes up with respect to this issue, the discussion tends to be summarized as "don't use volatile for that because it's not intended for that; it's designed for memory-mapped I/O."

Another reason why it's untrue is found in §7.14.1: volatile sig_atomic_t variables are the only variables whose values are valid during execution of a signal handler. If you view volatile as merely programmer intent of "make sure this store happens when I say it does," this is a very natural thing to expect from C.

volatile void clean_memory(volatile void* dest, size_t len)
{
     volatile unsigned char* p;
     for(p = (volatile unsigned char*)dest; len; p[--len] = 0)
       ;
}

Because the pointers above (`dest` and `p`) were not memory-mapped
addresses, the GCC folks consider it an abuse.

Did you actively ask them this question, or are you surmising from what you've read on lists? For your use case--zeroing out memory so people can't sneak read your memory map [1]--I would consider it a valid use myself; I would be surprised if a compiler developer disagreed with this solely on the basis that it's not clearing out a memory mapped address space.

[1] The link has crypto in it, so I'm guessing as to its use from that context.

Microsoft was not the problem - it was GCC, which holds that the only
valid use of volatile is memory-mapped hardware.

That is a gross misrepresentation of volatile, I think.

Well... "The volatile qualifier is designed for working with memory
mapped hardware,"
http://gcc.gnu.org/ml/gcc-help/2012-03/msg00242.html.

(For what it's worth, I don't disagree with you. It's a PITA to work
around GCC at times when the code is otherwise portable.)

...

volatile void clean_memory(volatile void* dest, size_t len)
{
     volatile unsigned char* p;
     for(p = (volatile unsigned char*)dest; len; p[--len] = 0)
       ;
}

Because the pointers above (`dest` and `p`) were not memory-mapped
addresses, the GCC folks consider it an abuse.

Did you actively ask them this question, or are you surmising from what
you've read on lists?

Yes, it was actively asked. I lifted it a bit because I knew that
volatile was reserved for memory mapped addresses in GCC (from
previous discussions). This question asked how a function could be
volatile if the qualifier was reserved for memory mapped addresses.
http://gcc.gnu.org/ml/gcc-help/2013-03/msg00024.html.

For your use case--zeroing out memory so people can't
sneak read your memory map [1]--I would consider it a valid use myself; I
would be surprised if a compiler developer disagreed with this solely on the
basis that it's not clearing out a memory mapped address space.

Yes, it's crypto. You should see the way OpenSSL handles it (see
OpenSSL_cleanse).

Jeff