[RFC] "Properly" Derive Function/Argument/Parameter Attributes

After I spend some time working with the function attribute* deduction
pass** [1,3], I would like to propose a "proper" organization***.

Why?

  Because we do not derive nearly as many attributes as we could****,
  while we do maintain various (separate and diffently organized)
  "data-flow-like analyses" to do so.

What else?

  I propose a single optimistic data-flow (fixpoint) loop to perform the
  deduction (similar to [2,3] but for all propagation forms and not only
  call sites -> callee). Given that a pessimistic initialization allows
  to terminate early, we can then also vary the compile time investment.

Where is the catch?

  It will require a substantial rewrite with (some) parts that cannot be
  split in small independent patches. While I believe these to be easy
  to review*****, I want to know if
    (1) there is interest in having better attribute deduction, and
    (2) there are volunteers to review such patches.

I do appreciate any input on this proposal.

Cheers,
  Johannes

scc.ll (6.6 KB)

nonnull.ll (1.33 KB)

Hi, Johannes,

I’m certainly in favor of rewriting the attribute deduction so that it’s more consistent and more powerful. There’s a lot more that we could do. I’ll help to review the patches.

Right now, as I recall, we apply an optimistic fixed-point optimization to each SCC, moving up the call graph from leaves to root. When you say that you want a more-general inference, besides just call sites → callees (and the existing callees → callers), can you please elaborate? Doing both of these at the same time as part of a optimistic fixed-point optimization seems to imply that the iteration will involve the entire call graph at once (or, at least, each connected component at once). Is that what you have in mind? This is certainly a true statement, but I don’t understand what you’re proposing. Are you saying that we should have both an optimistic and pessimistic mode? Thanks again, Hal

Hi Hal,

thanks for your interest and questions.

Hi, Johannes,

I'm certainly in favor of rewriting the attribute deduction so that it's
more consistent and more powerful. There's a lot more that we could do.
I'll help to review the patches.

Even if we only do it to clean up the code, it is probably worth it.
Deriving new/different kinds of attributes, e.g., "dereferencable" and
"align", is another good reason. Finally, I detailed how we currently
derive attributes (below) to highlight the traversal duplications and
pessimistic choices we currently make.

> After I spend some time working with the function attribute* deduction
> pass** [1,3], I would like to propose a "proper" organization***.
>
>
>
> Why?
>
> Because we do not derive nearly as many attributes as we could****,
> while we do maintain various (separate and diffently organized)
> "data-flow-like analyses" to do so.
>
>
>
> What else?
>
> I propose a single optimistic data-flow (fixpoint) loop to perform the
> deduction (similar to [2,3] but for all propagation forms and not only
> call sites -> callee).

Right now, as I recall, we apply an optimistic fixed-point optimization
to each SCC, moving up the call graph from leaves to root.

Before I describe what I plan to below, I want to elaborate on the
current situation. Let me try to summarize it in a bit more detail:

We have two inference passes post order (po) and reverse post order
(rpo) of the call graph (CG). The second is not very interesting as it
is neither actual rpo nor do we use it for anything but to detect
non-recursive functions based on non-recursive callers (which seems to
be a special situation but I was unable to find any details on that).
Now the po traversal does many things, separately, in the following order:

  - addArgumentReturnedAttrs() -- "returned" parameter attribute.
    Traversal of all blocks to find an argument that (1) has the same
    type as the return type, and (2) is returned by all "return"
    instructions. This is not optimistic and return values of other
    functions in the SCC will prevent the deduction.

  - addReadAttrs() -- "readonly"/"readnone" function attributes.
    Traverses all instructions in the SCC and optimistically determines
    "readonly"/"readnone" function attributes.

  - addArgumentAttrs() -- "nonnull"/"nocapture"/"readonly"/"readnone"
                          "writeonly" parameter attributes
    Traverse the uses of arguments through the SCC to determine
    "nocapture"/"readonly"/"readnone"/"writeonly" attributes.
    Additionally perform the callee -> call site deduction for "nonnull"
    attributes if the call site is for sure executed. Note that
    "returned" pointers (in the SCC) are assumed captured and therefore
    neither "readonly", "readnone" nor "writeonly" even though they do
    not outlive the function/SCC.

  - addNoAliasAttrs() -- "noalias" function return value
    Check for "malloc-like" SCCs by verifying that if all return values
    are "NULL", "undef" or "noalias" return values of calls.

  - addNonNullAttrs() -- "nonnull" function return value
    Check for "nonnull" return values by verifying all function return
    values are non-null. However, this is not optimistic. Returning
    the return value of another function in the SCC is considered
    potentially null.

  - removeConvergentAttrs() -- remove "convergent" function attribute
    Traverse all instructions in the SCC and remove the "convergent"
    attribute if all contained call sites are "non-convergent".

  - addNoRecurseAttrs() -- "norecurse" function attribute
    Traverse all instructions in a singleton SCC and mark it as "norecurse"
    if there is no self or potentially recursive call.

When you say that you want a more-general inference, besides just call
sites -> callees (and the existing callees -> callers), can you please
elaborate? Doing both of these at the same time as part of a
optimistic fixed-point optimization seems to imply that the iteration
will involve the entire call graph at once (or, at least, each
connected component at once). Is that what you have in mind?

There is certainly a large design space that allows variations. If we
properly traverse the SCCs in post order and optimistically derive all
attributes, it should already improve on the current state. However, the
callees -> call sites deduction requires to look into the callers of the
SCC as well, assuming all are known.

For the best result we probably need to combine three deduction methods:
  - If something is known for all call sites it is known for the callee.
  - If something is know for the callee it is known for all call sites.
  - If something can be derived from the code, e.g., guaranteed executed
    calls, it is known for the function.

One optimistic fixpoint iteration for the whole call graph is beneficial
if we can use information derived by one method for another one. This is
the case if we have domain knowledge, e.g., "read-only" annotations from
the user or the front-end on an interior node in the call graph. For
descriptive properties, like "non-null", this can also work very well.
If we expect additional attributes of the latter kind, or more
annotations through the front-ends/users, we should consider this
solution.

> Given that a pessimistic initialization allows to terminate early,
> we can then also vary the compile time investment.

This is certainly a true statement, but I don't understand what you're
proposing. Are you saying that we should have both an optimistic and
pessimistic mode?

I was just saying that we can easily use the same framework for both,
optimistic and pessimistic deduction:
  - If we want a fast result, or be able to stop the fixpoint iteration
    early, we start with a pessimistic state,
  - If we want the best possible result, we start with an optimistic
    state.

Does this make sense?

Cheers,
  Johannes