thanks for your interest and questions.
I'm certainly in favor of rewriting the attribute deduction so that it's
more consistent and more powerful. There's a lot more that we could do.
I'll help to review the patches.
Even if we only do it to clean up the code, it is probably worth it.
Deriving new/different kinds of attributes, e.g., "dereferencable" and
"align", is another good reason. Finally, I detailed how we currently
derive attributes (below) to highlight the traversal duplications and
pessimistic choices we currently make.
> After I spend some time working with the function attribute* deduction
> pass** [1,3], I would like to propose a "proper" organization***.
> Because we do not derive nearly as many attributes as we could****,
> while we do maintain various (separate and diffently organized)
> "data-flow-like analyses" to do so.
> What else?
> I propose a single optimistic data-flow (fixpoint) loop to perform the
> deduction (similar to [2,3] but for all propagation forms and not only
> call sites -> callee).
Right now, as I recall, we apply an optimistic fixed-point optimization
to each SCC, moving up the call graph from leaves to root.
Before I describe what I plan to below, I want to elaborate on the
current situation. Let me try to summarize it in a bit more detail:
We have two inference passes post order (po) and reverse post order
(rpo) of the call graph (CG). The second is not very interesting as it
is neither actual rpo nor do we use it for anything but to detect
non-recursive functions based on non-recursive callers (which seems to
be a special situation but I was unable to find any details on that).
Now the po traversal does many things, separately, in the following order:
- addArgumentReturnedAttrs() -- "returned" parameter attribute.
Traversal of all blocks to find an argument that (1) has the same
type as the return type, and (2) is returned by all "return"
instructions. This is not optimistic and return values of other
functions in the SCC will prevent the deduction.
- addReadAttrs() -- "readonly"/"readnone" function attributes.
Traverses all instructions in the SCC and optimistically determines
"readonly"/"readnone" function attributes.
- addArgumentAttrs() -- "nonnull"/"nocapture"/"readonly"/"readnone"
"writeonly" parameter attributes
Traverse the uses of arguments through the SCC to determine
Additionally perform the callee -> call site deduction for "nonnull"
attributes if the call site is for sure executed. Note that
"returned" pointers (in the SCC) are assumed captured and therefore
neither "readonly", "readnone" nor "writeonly" even though they do
not outlive the function/SCC.
- addNoAliasAttrs() -- "noalias" function return value
Check for "malloc-like" SCCs by verifying that if all return values
are "NULL", "undef" or "noalias" return values of calls.
- addNonNullAttrs() -- "nonnull" function return value
Check for "nonnull" return values by verifying all function return
values are non-null. However, this is not optimistic. Returning
the return value of another function in the SCC is considered
- removeConvergentAttrs() -- remove "convergent" function attribute
Traverse all instructions in the SCC and remove the "convergent"
attribute if all contained call sites are "non-convergent".
- addNoRecurseAttrs() -- "norecurse" function attribute
Traverse all instructions in a singleton SCC and mark it as "norecurse"
if there is no self or potentially recursive call.
When you say that you want a more-general inference, besides just call
sites -> callees (and the existing callees -> callers), can you please
elaborate? Doing both of these at the same time as part of a
optimistic fixed-point optimization seems to imply that the iteration
will involve the entire call graph at once (or, at least, each
connected component at once). Is that what you have in mind?
There is certainly a large design space that allows variations. If we
properly traverse the SCCs in post order and optimistically derive all
attributes, it should already improve on the current state. However, the
callees -> call sites deduction requires to look into the callers of the
SCC as well, assuming all are known.
For the best result we probably need to combine three deduction methods:
- If something is known for all call sites it is known for the callee.
- If something is know for the callee it is known for all call sites.
- If something can be derived from the code, e.g., guaranteed executed
calls, it is known for the function.
One optimistic fixpoint iteration for the whole call graph is beneficial
if we can use information derived by one method for another one. This is
the case if we have domain knowledge, e.g., "read-only" annotations from
the user or the front-end on an interior node in the call graph. For
descriptive properties, like "non-null", this can also work very well.
If we expect additional attributes of the latter kind, or more
annotations through the front-ends/users, we should consider this
> Given that a pessimistic initialization allows to terminate early,
> we can then also vary the compile time investment.
This is certainly a true statement, but I don't understand what you're
proposing. Are you saying that we should have both an optimistic and
I was just saying that we can easily use the same framework for both,
optimistic and pessimistic deduction:
- If we want a fast result, or be able to stop the fixpoint iteration
early, we start with a pessimistic state,
- If we want the best possible result, we start with an optimistic
Does this make sense?