RFC: New function attribute HasInaccessibleState

Hi,

This email continues the thread at http://lists.llvm.org/pipermail/llvm-dev/2015-December/092996.html and proposes a new function attribute conveying that a function maintains state, but that this state is inaccessible to the rest of the program under compilation.

Such a flag could be added to most libc functions and system calls, such as printf/malloc/free. (libc and system calls do access/modify internal variables such as errno.)

Example attributes (in addition to what are already set):

malloc/free: HasInaccessibleState, ReadNone

printf: HasInaccessibleState, ArgMemOnly

realloc: HasInaccessibleState, ReadOnly (not sure).

The intention behind introducing this attribute is to relax the conditions in GlobalsAA as below:

(this code is in GlobalsAAResult::AnalyzeCallGraph)

       if (F->isDeclaration()) {
         // Try to get mod/ref behaviour from function attributes.
-        if (F->doesNotAccessMemory()) {
+        if (F->doesNotAccessMemory() || F->onlyAccessesArgMemory()) {
           // Can't do better than that!
         } else if (F->onlyReadsMemory()) {
           FunctionEffect |= Ref;
           if (!F->isIntrinsic())
             // This function might call back into the module and read a global -
             // consider every global as possibly being read by this function.
             FR.MayReadAnyGlobal = true;
         } else {
           FunctionEffect |= ModRef;
           // Can't say anything useful unless it's an intrinsic - they don't
           // read or write global variables of the kind considered here.
           KnowNothing = !F->isIntrinsic();
         }
         continue;
       }

This relaxation allows functions that (transitively) call library functions (such as printf/malloc) to still maintain and propagate GlobalsAA info. In general, this adds more precision to the description of these functions.
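To make the effect of the patched branch concrete, here is a minimal standalone sketch of the same decision structure. It does not use LLVM; `DeclAttrs`, `GlobalsSummary` and `summarizeDeclaration` are hypothetical stand-ins that only mirror the shape of the snippet, and treating an ArgMemOnly declaration as harmless to globals is exactly the assumption this RFC proposes, not established behaviour.

```cpp
#include <cassert>

// Hypothetical stand-in for the attributes on an external declaration.
struct DeclAttrs {
  bool ReadNone = false;   // does not access program-visible memory
  bool ReadOnly = false;   // may read program-visible memory, never writes
  bool ArgMemOnly = false; // only touches memory reachable from its arguments
  bool Intrinsic = false;
};

enum Effect : unsigned { NoEffect = 0, Ref = 1, Mod = 2, ModRef = 3 };

// Hypothetical stand-in for what the analysis records about a function.
struct GlobalsSummary {
  unsigned FunctionEffect = NoEffect;
  bool MayReadAnyGlobal = false;
  bool KnowNothing = false;
};

// Same branch structure as the patched code: ArgMemOnly declarations are
// treated like ReadNone ones as far as globals are concerned.
GlobalsSummary summarizeDeclaration(const DeclAttrs &A) {
  GlobalsSummary S;
  if (A.ReadNone || A.ArgMemOnly) {
    // Nothing to record: no global is read or written.
  } else if (A.ReadOnly) {
    S.FunctionEffect |= Ref;
    if (!A.Intrinsic)
      S.MayReadAnyGlobal = true;
  } else {
    S.FunctionEffect |= ModRef;
    S.KnowNothing = !A.Intrinsic;
  }
  // Intrinsics never force the analysis to give up.
  assert(!(A.Intrinsic && S.KnowNothing));
  return S;
}
```

With this model, a printf-like declaration carrying only ArgMemOnly no longer sets KnowNothing, so callers of printf can keep their mod/ref summaries.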

Concerns regarding impact on other optimizations (I’m repeating a few examples that Hal mentioned earlier).

A readnone function is one whose output is a function only of its inputs, and if you have this:

    int *x = malloc(4);
    *x = 2;
    int *y = malloc(4);
    *y = 4;

you certainly don’t want EarlyCSE to replace the second call to malloc with the result of the first (which it will happily do if you mark malloc as readnone).

For malloc, even though ReadNone is set now (as proposed above), EarlyCSE should be taught to respect HasInaccessibleState and not combine calls to functions that maintain internal state. Similarly, other optimizations (such as DCE) must be taught to respect the flag.
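As a rough illustration of what "respecting the flag" could mean for a CSE-style pass, the sketch below works on a straight-line list of calls (so every earlier call dominates every later one). The `Call` record and `findReusableCalls` are hypothetical names for this example only and have nothing to do with the real EarlyCSE implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one call in a straight-line block.
struct Call {
  std::string Callee;
  int Arg;
  bool ReadNone;
  bool HasInaccessibleState; // the attribute proposed in this RFC
};

// For each call, returns the index of the earlier identical call whose result
// it could reuse, or its own index if it must stay.
std::vector<std::size_t> findReusableCalls(const std::vector<Call> &Calls) {
  std::map<std::pair<std::string, int>, std::size_t> Seen;
  std::vector<std::size_t> Leader(Calls.size());
  for (std::size_t I = 0; I < Calls.size(); ++I) {
    const Call &C = Calls[I];
    Leader[I] = I;
    // Only readnone calls are candidates, and only if they do not also
    // update state hidden from the program.
    if (!C.ReadNone || C.HasInaccessibleState)
      continue;
    std::pair<std::string, int> Key(C.Callee, C.Arg);
    std::map<std::pair<std::string, int>, std::size_t>::iterator It =
        Seen.find(Key);
    if (It != Seen.end())
      Leader[I] = It->second;
    else
      Seen.emplace(Key, I);
  }
  for (std::size_t I = 0; I < Leader.size(); ++I)
    assert(Leader[I] <= I); // a call only ever reuses an earlier one
  return Leader;
}
```

Two `malloc(4)` calls marked ReadNone plus HasInaccessibleState each stay their own leader, while the same pair without the second flag would be merged.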

    void foo(char * restrict s1, char * restrict s2) {
      printf(s1);
      printf(s2);
    }

If printf is argmemonly, then we could interchange the two printf calls.

In this example too, printf maintains an internal state, preventing the calls from being interchanged. Also, it is now correct to add ArgMemOnly to printf, as it does not access any other program memory.

For malloc this is still a problem, in the following sense. If we have:

    p1 = malloc(really_big);
    free(p1);
    p2 = malloc(really_big);
    free(p2);

allowing a transformation into:

    p1 = malloc(really_big);
    p2 = malloc(really_big);
    free(p1); free(p2);

could be problematic.

Both free and malloc would be marked as having internal state. This should prevent this kind of optimization. Note that having the ReadNone attribute should not cause problems anymore.
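One way to see why the reordered sequence could be problematic is to replay both orders against a toy allocator with a fixed capacity. This is only a sketch: `Op`, `replay`, the block size and the capacity are all made up for the illustration, and a real allocator is far more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Malloc, Free };

// Replays equal-sized malloc/free operations against a toy heap that can hold
// at most Capacity bytes. Returns false as soon as a malloc would not fit.
bool replay(const std::vector<Op> &Ops, std::size_t BlockSize,
            std::size_t Capacity) {
  std::size_t InUse = 0; // the allocator's hidden ("inaccessible") state
  for (Op O : Ops) {
    if (O == Op::Malloc) {
      if (InUse + BlockSize > Capacity)
        return false;
      InUse += BlockSize;
    } else {
      assert(InUse >= BlockSize && "free without a matching malloc");
      InUse -= BlockSize;
    }
  }
  return true;
}
```

With a capacity of 1000 and blocks of 600, the original malloc/free/malloc/free order succeeds, while malloc/malloc/free/free fails on the second allocation. The difference comes entirely from state that the calling program never sees.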

Hi,

Is this “internal state” supposed to be private to the function?

It could be private or not. Hence the name “inaccessible”, to mean that the program under compilation has no access to the state. So while printf and malloc (for example) could share state in libc, the program under compilation cannot access this state.

how this flag would prevent the last “optimization” you’re illustrating

Assuming you are referring to the quoted examples, currently these optimizations are not happening anyway (from what I understand). The issue is that, after malloc/free are tagged with “ReadNone”, such transforms may happen. Hence the additional flag, which denotes that these functions maintain an internal state and so prevents such transforms.

Hi,

I don’t think the attribute as is is strong enough to do what you wish. “HasInaccessibleState” is in fact a no-op because it implies nothing about the accessible state. OK, there’s inaccessible state but is there or is there not accessible, visible state, is the question that optimizers need to ask.

So I’d rephrase it to something like “HasNoAccessibleState” ?

James

but is there or is there not accessible, visible state,

Wouldn’t ReadNone and/or ReadOnly cover that? If ReadNone is set, it means it doesn’t access any of the visible (accessible) states.

No, that’d be redefining the semantics of ReadNone. ReadNone allows elision of a call if its result is unused, which would break some “hasinaccessiblestate” functions (although not malloc).

From: "James Molloy via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Vaivaswatha Nagaraj" <vn@compilertree.com>
Cc: "LLVM Dev" <llvm-dev@lists.llvm.org>
Sent: Friday, December 4, 2015 4:00:24 AM
Subject: Re: [llvm-dev] RFC: New function attribute HasInaccessibleState

No, that'd be redefining the semantics of ReadNone. ReadNone allows
elision of a call if its result is unused, which would break some
"hasinaccessiblestate" functions (although not malloc).

To make a (perhaps incorrect) general statement: We currently only have 'additive' attributes, but no 'subtractive' ones (builtin/nobuiltin aside, as those are exactly paired). Having an attribute that subtracts from the strength of ReadNone would be a new concept in the design of our IR, and a change that I'd be hesitant to make. At the cost of some redundancy, I think a new attribute is needed.

-Hal

Is this “internal state” supposed to be private to the function?

It could be private or not. Hence the name “inaccessible”, to mean that the program under compilation has no access to the state. So while printf and malloc (for example) could share state in libc, the program under compilation cannot access this state.

This is still not clear to me. You’re saying “it could be private or not”: what is a non-public state that no one but you can access? (I’d call that private).

Now, from the point of view of the compiler, malloc and free are two separate functions. If your attribute is saying they have some internal state, then malloc() cannot access the state of free() and vice versa.

how this flag would prevent the last “optimization” you’re illustrating

Assuming you are referring to the quoted examples, currently these optimizations are not happening anyway (from what I understand). The issue is that, after malloc/free are tagged with “ReadNone”, such transforms may happen. Hence the additional flag, which denotes that these functions maintain an internal state and so prevents such transforms.

I’m questioning why this flag would solve that; it does not seem to me that it does. It would prevent swapping two mallocs, but not freely moving a malloc with respect to a free.

what is a non-public state that no-one but you can access? (I’d call that private).

malloc and free could both use global variables that are defined in libc, but are inaccessible to the program under compilation.

if your attribute is saying they have some internal state, then malloc() cannot access the state of free() and vice versa.

Which is why it would be preferable to call it “inaccessible” state rather than “internal”.

It would prevent swapping two mallocs, but not freely moving a malloc with respect to a free.

No, it would also prevent interchanging the order of malloc and free, since they both maintain state (which can be shared, but is not accessible to the program under compilation) and swapping the order could result in a different final state.
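The rule being argued for here can be written down as a small predicate. The sketch below is one possible reading, not an agreed definition: `CallSite` and `mayInterchange` are hypothetical, the argument memory of the two calls is assumed to be disjoint (as with the restrict-qualified printf example), and termination and unwinding are ignored.

```cpp
#include <cassert>

// Hypothetical summary of a call site for this illustration.
struct CallSite {
  bool ReadNone;
  bool ArgMemOnly;
  bool HasInaccessibleState; // the attribute proposed in this RFC
};

// A call with no visible and no hidden effect can move past anything.
static bool hasNoEffectAtAll(const CallSite &C) {
  return C.ReadNone && !C.HasInaccessibleState;
}

// A call whose program-visible effects are confined to its own arguments.
static bool confinedToArgs(const CallSite &C) {
  return C.ReadNone || C.ArgMemOnly;
}

// May two adjacent calls be swapped? Assumes their argument memory is disjoint.
bool mayInterchange(const CallSite &A, const CallSite &B) {
  assert(&A != &B && "a call cannot be interchanged with itself");
  if (hasNoEffectAtAll(A) || hasNoEffectAtAll(B))
    return true;
  if (!confinedToArgs(A) || !confinedToArgs(B))
    return false; // unknown effects on visible memory: stay conservative
  // Both may update the same hidden state, so their order is observable.
  return !(A.HasInaccessibleState && B.HasInaccessibleState);
}
```

Under this reading, malloc and free (both ReadNone plus HasInaccessibleState) cannot be swapped, and neither can two printf calls, because the compiler must assume all inaccessible state is shared.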

At the cost of some redundancy, I think a new attribute is needed.

@Hal. I’m not sure what this implies. Do the semantics of the attribute in the first mail sound right to you?

that’d be redefining the semantics of ReadNone. ReadNone allows elision of a call if its result is unused,

@James. That’s right. Optimizations should hereafter (if the proposed attribute is accepted) be more careful in interpreting ReadNone. If the call also has HasInaccessibleState, it shouldn’t be removed, even if the call takes no arguments or its return value isn’t used, because it could be modifying some internal state.
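A dead-call elimination check following this stricter interpretation might look like the sketch below. `CallInfo`, `isRemovable` and `removeDeadCalls` are hypothetical names; a real pass would also have to consider things such as unwinding and non-termination, which are left out here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of a call for this illustration.
struct CallInfo {
  bool ReadNone;
  bool HasInaccessibleState; // the attribute proposed in this RFC
  bool ResultUsed;
};

// A call may be dropped only if nothing uses its result, it touches no
// program-visible memory, and it does not update hidden state either.
bool isRemovable(const CallInfo &C) {
  return C.ReadNone && !C.HasInaccessibleState && !C.ResultUsed;
}

// Returns the calls that survive elimination, in their original order.
std::vector<CallInfo> removeDeadCalls(const std::vector<CallInfo> &Calls) {
  std::vector<CallInfo> Kept;
  for (const CallInfo &C : Calls)
    if (!isRemovable(C))
      Kept.push_back(C);
  assert(Kept.size() <= Calls.size());
  return Kept;
}
```

An unused ReadNone call is removed as before, but the same call with HasInaccessibleState set is kept.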

If malloc and free can both use global variables (there is no notion of library in the compiler), then from what I understand you are actually creating another global state, i.e. there would be two disjoint global states: the usual default one and another one that is only accessed by functions having this attribute.

there would be two disjoint global states

In some sense yes, but technically not disjoint. A function marked with this attribute should still be able to access the globals within the program under compilation if it’s not marked with ReadNone.

If malloc and free can both use global variables (there is no notion of library in the compiler)

Inaccessible state here refers to any global that is not visible to the program under compilation. The key idea (behind the new attribute) is to convey that these external functions do things inside that the compiler cannot know about, and hence deal with them conservatively.

Hi,

I’m still a bit dubious about this, I don’t think it’s bombproof. How does this fit with LTO? What if you had bitcode versions of your C library (not entirely crazy - it might allow a lot of LTO) - you’d collapse those two domains into one in a rather messy way.

This also seems a bit tailored to malloc/free, and can’t work for user-defined allocation functions. Our current attributes mechanism has the ability to infer noalias on such functions, so here you’ll be making malloc more powerful than user-defined functions.

All in all it just smells a bit specialist. I’d welcome it if we could bombproof the semantics and extend the scope somewhat somehow.

Cheers,

James

What if you had bitcode versions of your C library

Then the definitions for these functions would be available. Would we still set function attributes to these functions in FunctionAttrs.cpp if their definitions were available?

This also seems a bit tailored to malloc/free, and can’t work for user-defined allocation functions

I don’t think so. For example, printf would have the flag set, preventing two calls to printf (without the return value being used) from being interchanged. In the case of user-defined allocation functions, the definitions for those functions are available, and what state they modify is directly visible. I’m not sure I understand how malloc alone would be more powerful. This point, however, reminds me to add that functions that transitively call functions with HasInaccessibleState must also have the flag set.

Then the definitions for these functions would be available. Would we still set function attributes to these functions in FunctionAttrs.cpp if their definitions were available?

Yes. Definitions being available should only increase the set of attributes that can be added to them, never decrease.

For example, printf would have the flag set, preventing two calls to printf (without the return value being used) from being interchanged.

printf() is a very nasty one because it can actually affect a lot of state. The %n modifier can cause an argument to be written to.
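For readers who have not met %n: it stores the number of characters produced so far through a pointer argument, so a printf-family call can write to caller memory. A small sketch using snprintf follows; the function name `charsBeforeMarker` is made up for the example, and note that some C libraries restrict or disable %n for security reasons, so this assumes a conforming implementation and a literal format string.

```cpp
#include <cassert>
#include <cstdio>

// Formats a short string and returns the value that %n stored through the
// pointer argument, i.e. the number of characters produced before the marker.
int charsBeforeMarker() {
  char Buf[32];
  int Written = -1;
  int Len = std::snprintf(Buf, sizeof Buf, "hello%n world", &Written);
  assert(Len == 11); // "hello world" has 11 characters
  return Written;
}
```

Here the call writes to `Written`, which is ordinary program memory reached only through an argument. That matches the ArgMemOnly description but rules out ReadOnly.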

In the case of user-defined allocation functions, the definitions for those functions are available

Are they? Probably not unless you’re in an LTO build.

James

That's in practice impossible to guarantee, both by the compiler and by the programmer.

-Krzysztof

I’d like to suggest a different direction which should accomplish similar ends.

I think it would make sense to introduce an attribute: csecandidate.

If we see a call-site “X” marked as csecandidate, it would imply that it can be replaced with any other call-site “Y” with the same arguments which dominates the call-site at “X”.

However, there are some cases that this would not be able to optimize, for example when we have two csecandidates (“X” and “Y”) which do not dominate one another. This is solvable using another attribute: speculatable. This attribute would tell us that we can hoist the call out of arbitrary conditions. This would allow us to move our csecandidate call-site to the common dominator of “X” and “Y” and RAUW.
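The two dominance queries this suggestion relies on can be sketched on a toy dominator tree. Everything here is hypothetical scaffolding: blocks are numbered, `IDom[B]` holds the immediate dominator of block B, the entry block is its own immediate dominator, and each call site is identified with the block that contains it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Does block A dominate block B? Walks B's chain of immediate dominators.
bool dominates(const std::vector<std::size_t> &IDom, std::size_t A,
               std::size_t B) {
  assert(A < IDom.size() && B < IDom.size());
  while (B != A) {
    if (IDom[B] == B)
      return false; // reached the entry block without meeting A
    B = IDom[B];
  }
  return true;
}

// Nearest block that dominates both A and B: where a speculatable
// csecandidate call could be hoisted to. Assumes a well-formed tree in which
// every chain of immediate dominators ends at the entry block.
std::size_t nearestCommonDominator(const std::vector<std::size_t> &IDom,
                                   std::size_t A, std::size_t B) {
  while (!dominates(IDom, A, B))
    A = IDom[A];
  return A;
}
```

In a diamond-shaped CFG (entry 0, branches 1 and 2, merge 3), calls in blocks 1 and 2 do not dominate each other, so plain csecandidate cannot merge them. Their nearest common dominator is block 0, which is where a speculatable call would be hoisted.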

In the case of user-defined allocation functions, the definitions for those functions are available

Are they? Probably not unless you’re in an LTO build.

Yes, I’m assuming an LTO build.

printf() is a very nasty one because it can actually affect a lot of state. The %n modifier can cause an argument to be written to.

Hence it would have HasInaccessibleState and ArgMemOnly set, but not ReadNone or ReadOnly.

Yes. Definitions being available should only increase the set of attributes that can be added to them, never decrease.

I agree with that. But what I meant was that, during a compilation invocation, we either have the definitions available or not. That means the attributes we set once (or do not set, if the definition isn’t available) do not change.

That’s in practice impossible to guarantee, both by the compiler and by the programmer.

I’m not sure I understand this. Why would it be impossible for the compiler to propagate this flag along the call graph upwards? As an example, malloc has the flag set, and this is propagated to whoever calls malloc, and then to whoever calls that function and so on.
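The upward propagation described here amounts to a fixed-point computation over the call graph. The sketch below shows the idea for a call graph that is fully visible; `propagateToCallers` and the adjacency-list representation are assumptions made for this example, not existing LLVM code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Callees[F] lists the functions that F calls directly. Flag[F] is initially
// true for functions already known to have inaccessible state (e.g. malloc).
// Returns the flags after propagating them to all transitive callers.
std::vector<bool>
propagateToCallers(const std::vector<std::vector<std::size_t>> &Callees,
                   std::vector<bool> Flag) {
  assert(Callees.size() == Flag.size());
  bool Changed = true;
  while (Changed) { // iterate until no new function gets the flag
    Changed = false;
    for (std::size_t F = 0; F < Callees.size(); ++F) {
      if (Flag[F])
        continue;
      for (std::size_t Callee : Callees[F]) {
        if (Flag[Callee]) {
          Flag[F] = true;
          Changed = true;
          break;
        }
      }
    }
  }
  return Flag;
}
```

If function 0 is malloc, function 1 wraps it and function 2 calls the wrapper, all three end up flagged, while a function that calls none of them stays unflagged. The sketch only covers functions whose call edges are visible in the current module.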

Most of the time you don't have the entire call graph information. Imagine that you are developing a module that is a part of a larger project. Functions in the other modules would need to have their flags updated based on what your functions do.

If you want to have an attribute stating that a function behaves better than what the compiler would normally assume, then the callers may but don't have to have that attribute set. If you require that attribute for correctness, you will run into problems. The sentence "For example, printf would have the flag set, preventing two calls to printf (without the return value being used) from being interchanged." and the rest of that paragraph suggests the latter.

-Krzysztof

From: "Vaivaswatha Nagaraj via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Krzysztof Parzyszek" <kparzysz@codeaurora.org>
Cc: "LLVM Dev" <llvm-dev@lists.llvm.org>
Sent: Friday, December 4, 2015 11:21:03 AM
Subject: Re: [llvm-dev] RFC: New function attribute HasInaccessibleState

>> In the case of user-defined allocation functions, the definitions
>> for those functions are available

>Are they? probably not unless you're in an LTO build.

Yes, I'm assuming an LTO build.

The concerns around LTO here, while legitimate, apply only to a very specific kind of LTO: an LTO which includes the definitions of libc. This is actually quite tricky to support, semantically, and already breaks our malloc aliasing assumptions. There are many legitimate uses of LLVM, both for statically-compiled code and for JIT'd code, that depend on a visibility boundary between certain core runtime services and the user code being compiled to provide for effective optimization.

So, yes, this will break LTO when you include libc itself in the optimization process. We already don't support this (we'd need, at least, to adjust our malloc noalias assumptions, if not many other things). I don't think this is a major concern.

I think we need to go back and look at the underlying use case (as I understand it): GlobalsAA should be able to figure out that calls to malloc/free don't touch global variables visible to the optimizer. How do we address this problem?

Thanks again,
Hal

...

Most of the time you don’t have the entire call graph information. Imagine that you are developing a module that is a part of a larger project.

I now understand the concern. It looks to me like we will need to set the flag by default on all functions whose definitions aren’t available (external), and then propagate from there. I don’t see any optimizations being inhibited by such a setting, so it should be okay.

I think we need to go back and look at the underlying use case (as I understand it): GlobalsAA should be able to figure out that calls to malloc/free don’t touch global variables visible to the optimizer. How do we address this problem?

Yes, this is the primary concern. Most libc functions (including printf, malloc, free) fall into the same category.