Some df_iterator and po_iterator issues

Hi @llvm,

below is a copy of a message I sent to clang a hour before. I guess it's more appropriate here.

--snip--

While trying to eleminate as much std::tr1::function as possible I stumbled over a design flaw in llvm::df_iterator.
Consider the following code:

void for_all_stmts(Stmt* S, const std::tr1::function<void(Stmt*)>& fn)
{
  if (S)
  {
    fn(S);
    for (Stmt::child_iterator i = S->child_begin(), e = S->child_end(); i != e; ++i)
    {
      for_all_stmts(*i, fn);
    }
  }
}

In a first step I want to replace this with:

void for_all_stmts(Stmt* S, const std::tr1::function<void(Stmt*)>& fn)
{
  for (llvm::df_iterator<Stmt*> i = llvm::df_begin(S), e = llvm::df_end(S); i != e; ++i)
  {
    if (*i) fn(*i);
  }
}

However if fn replaces childrens of a just processed statement (which happens a lot), the iteration may crash. Looking at df_iterator reveals the reason: the first child of a particular statement is stored too early for this kind of usage. IMHO this could be fixed by delaying the computation of the first child until it's needed (that is in the preincrement operator). The only open question would be: how do we mark the children iterator invalid before its first use.

--snip--

I added two patches to this mail. The first one (dfi) is a proposal fixing the problem described in the cut-out above.
The second one (poi) have to be applied anyway, as it fixes a compile bug rendering po_ext_iterator currently unusable.
However this second issued occured as part of another problem: a child Stmt of a Stmt in a clang AST can be 0 (e.g. in for(;;)).
I could work around it by providing po_ext_iterator an external set of visited nodes and just inserted 0 beforehand. However I'm quite sure it would be a better idea to catch the 0 case in po_iterator and df_iterator at all. Otherwise using these classes become tedious very quickly.

Best
Olaf Krzikalla

dfi.patch (2.6 KB)

poi.patch (791 Bytes)

While trying to eleminate as much std::tr1::function as possible I stumbled over a design flaw in llvm::df_iterator.

Ok.

However if fn replaces childrens of a just processed statement (which happens a lot), the iteration may crash. Looking at df_iterator reveals the reason: the first child of a particular statement is stored too early for this kind of usage. IMHO this could be fixed by delaying the computation of the first child until it's needed (that is in the preincrement operator). The only open question would be: how do we mark the children iterator invalid before its first use.

This does sound like a useful use-case, but I am uncomfortable with making DepthFirstIterator any heavier (by adding "CurrentTopNode" to it), is there another way to solve this problem?

The second one (poi) have to be applied anyway, as it fixes a compile bug rendering po_ext_iterator currently unusable.

Thanks, applied!
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090706/080251.html

-Chris

Chris Lattner schrieb:

However if fn replaces childrens of a just processed statement (which happens a lot), the iteration may crash. Looking at df_iterator reveals the reason: the first child of a particular statement is stored too early for this kind of usage. IMHO this could be fixed by delaying the computation of the first child until it's needed (that is in the preincrement operator). The only open question would be: how do we mark the children iterator invalid before its first use.
    
This does sound like a useful use-case, but I am uncomfortable with making DepthFirstIterator any heavier (by adding "CurrentTopNode" to it), is there another way to solve this problem?
  

I've just brooded over the issue. I think the only way is to move CurrentTopNode back to VisitStack.
Indeed a very first version (which I have never published here) did exactly that and used GT::child_end
as a marker, that we are actually the "CurrentTopNode". Of course this never made it into production as even
GT::child_end might change when the children structure changes. That is we have no singular value for ChildItTy that could be used as a marker.
The only other place to store this information would be the pointer itself. And this is the only solution I can think of: we use a
llvm::PointerIntPair<Node*, 1> here instead of Node*. That is, line 74 would read:

std::vector<std::pair<llvm::PointerIntPair<Node*, 1> , ChildItTy> > VisitStack;

CurrentTopNode would disappear and I initialze ChildItTy with GT::child_end (or GT::child_begin, whatever), which however afterwards will never be read.
Accessing the pointer would become a (very) little bit slower due to the masking in PointerIntPair.
What do you think about it?

Best
Olaf Krzikalla

Chris,

attached you will find a patch implementing the approach I've posted yesterday. I don't want the topic fall by the wayside*.
*Note that this patch here contains the df_iterator changes only, some (IMHO non-critical but needed) po_iterator changes are still in the pipeline and posted by me in another thread. I hope, the current approach fits all needs.

Chris Lattner schrieb:

This does sound like a useful use-case, but I am uncomfortable with making DepthFirstIterator any heavier
  

BTW: in order to make DepthFirstIterator more lightweight wouldn't it be better to make the visit tracking an opt-in. Thus don't provide llvm::SmallPtrSet as a default for SetType but a simple stub class implementing count and insert only. I've never used the visit tracking option so far and it costs a lot in terms of speed and memory.

Best
Olaf Krzikalla

df_iterator_v2.patch (4.11 KB)

Hi @llvm,

I'm just curious about the state of my patches. Have they any chance to get applied?

Best
Olaf Krzikalla

This patch looks great to me, I applied it here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090720/082329.html

Thank you for working on this, I'm sorry for the delay reviewing the patch!

-Chris