AST processing toolbox

Hi @clang,

I'm now more or less finished with the very first part of my AST processing - function inlining at the AST level.
One result of the development is a set of AST processing tools that may come in handy for others too - cloning, garbage collecting and so on. Of course I'm eager to share this stuff. However, there are some issues to solve first:
1. I use boost, in particular boost::bind and boost::function (that is, header-only for the moment). Unless there is something similar in llvm (which I haven't found yet), I would rather not change this. However, this would add a dependency on boost, and I don't know the opinions of the main developers regarding this issue.
2. Sometimes it seemed to me that having a parent member in a statement would be better than the ParentMap. Nevertheless I worked with ParentMap but needed to add ParentMap::addStmt, which adds and/or updates the parent/child relations of the given Stmt tree. However, I'm not really satisfied with the function name yet.
3. I'd like to extend the PrintingPolicy so we can rewrite in other styles too, e.g.

if (x)
{
  y;
}
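
As a rough illustration of what such a style option might look like, here is a toy printer. ToyPolicy, BraceOnOwnLine, Indent and printIf are hypothetical names made up for this sketch; they are not existing members of clang's PrintingPolicy.

```cpp
#include <cassert>
#include <string>

// Hypothetical style options; these are NOT fields of clang's PrintingPolicy.
struct ToyPolicy {
  bool BraceOnOwnLine;  // true: brace on its own line, false: same line as 'if'
  unsigned Indent;      // number of spaces used to indent the body
};

// Prints "if (Cond) { Body; }" in the style selected by P.
std::string printIf(const std::string &Cond, const std::string &Body,
                    const ToyPolicy &P) {
  assert(!Cond.empty() && "an if statement needs a condition");
  std::string Out = "if (" + Cond + ")";
  Out += P.BraceOnOwnLine ? "\n{\n" : " {\n";
  Out += std::string(P.Indent, ' ') + Body + ";\n}\n";
  return Out;
}
```

With BraceOnOwnLine set, printIf("x", "y", P) yields the layout shown above; with it cleared, the brace stays on the line of the if.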

Also there are some fixes needed: I already mentioned the DoStmt bug. A second thing that strikes me regularly is the wrong indentation:
void foo()
{
  //...
  { // here reprinting starts;
}
  //... here it ends
}

Maybe this issue goes away if I stop using Rewriter::ReplaceStmt, which I need to do anyway. I have to test this.

For all the new stuff: where should it go? Or is it better to create a separate ASTProcessing library?

Best,
Olaf Krzikalla

I'm too new around here to know Clang policy wrt the C++ standard library,
but I notice that both the libraries you mention are part of TR1, so it
might be better to use the TR1 names for those components instead?

Note that Boost has a pretty good TR1 configuration for those that would
be using Boost rather than their compiler's TR1. Both GCC and MSVC ship
with TR1 now, although I'm not sure what the state of TR1 was in GCC 4.2.
Other vendors are doing the same, although I presume GCC/MSVC cover most
of our builds at the moment.

There is a bonus here for those building with GCC 4.3 or later - the
GCC TR1 implementation is based on variadic templates which should show
as a minor compile-time optimization.

Also, if we ever transition to the C++0x standard library (and I would
love to do this for unique_ptr alone!) then all we need to do is remove
the tr1 qualification.

Actually, if we set a precedent for TR1 we get shared_ptr, array and
tuple along for the ride, which would be a nice bonus. Not sure the
other TR1 components buy us much in this project though (regex, random
numbers, hash containers [vs. llvm], special math, type traits and C99
standard library - which I think we assume is already in std?)

AlisdairM

In general, we would prefer to not suck in pieces of external libraries unless they significantly add value to the development of LLVM/Clang. This is for both portability and maintainability. FWIW, adding pieces of Boost to LLVM/Clang has been discussed before, and unless there is a compelling need to pull in a specific piece of Boost it is something we won't do for the reasons I've given. This has nothing to do with Boost per se; it's really about keeping the codebase simple and free of unneeded, extraneous dependencies on other libraries. Please keep in mind that we've managed to build an entire working compiler without these libraries.

My thought here was not so much to adopt (elements of) Boost, but of
std::tr1. Most modern C++ compilers ship with TR1, and Boost provides
a freely available 3rd party solution for older compilers the vendor
no longer supports. In principle though, this is merely another part
of your standard library now, lurking in namespace tr1 ahead of
formal standardisation in C++0x.

Now that might still be more than we want to depend on - as you say
we have come this far without it. I believe it offers significant
value though, if it is something we can assume from a 'standard'
C++ installation.

AlisdairM

I think even depending on TR1 might be too high (at least at this point), as we want Clang to compile on as many platforms as possible that have a "reasonable" C++ compiler. I could be wrong, but I don't think we want people to have to install Boost in order to compile Clang if they don't have an available TR1 solution.

Another possibility, when Clang eventually has the capability to bootstrap itself, libraries not part of the core compiler could be compiled using a bootstrapped Clang and use whatever C++ features Clang supports.

That said, there is an ongoing discussion on which C++ features we should actually use in LLVM/Clang. At the end of the day, not everyone who contributes to LLVM/Clang is a C++ guru, and using esoteric C++ features may actually be more of a detriment than a blessing as it may scare off potential contributors. So when we look at whether or not to pull in library X or language feature Y into the code base, we need to consider the tradeoffs both in terms of technical benefits (which may be marginal) versus (a) the approachability and readability of the code base and (b) the portability of the code base. We aren't luddites, however, as we do indeed use some of the more "specialized" features of C++ in LLVM/Clang, but they are buried deep in the code and help provide fundamental infrastructure instead of being used as part of the basic APIs.

Right. Another option is to pull in and "llvm-ize" libraries that are really useful. For example, llvm/ADT/OwningPtr.h is a simple smart pointer that is very similar to boost and other standard libraries.

There are several goals here, but the main one is to keep the code simple, self-contained, and portable across a wide variety of compilers. TR1 may be getting widespread enough that it might be reasonable to start depending on it.

-Chris
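
To give an idea of how small an "llvm-ized" smart pointer can be, here is a minimal sketch of a scoped owning pointer. It is not the actual code in llvm/ADT/OwningPtr.h; the class name ScopedOwner, the helper Counted and the function liveAfterScope are assumptions made for this example, written so that it also compiles without C++0x features.

```cpp
#include <cassert>

// Hypothetical, simplified scoped owning pointer.
// This is NOT the real llvm::OwningPtr; it only illustrates the idea.
template <typename T>
class ScopedOwner {
  T *Ptr;
  // Copying is disabled: declared private and never defined.
  ScopedOwner(const ScopedOwner &);
  ScopedOwner &operator=(const ScopedOwner &);

public:
  explicit ScopedOwner(T *P = 0) : Ptr(P) {}
  ~ScopedOwner() { delete Ptr; }

  T &operator*() const {
    assert(Ptr && "dereferencing a null ScopedOwner");
    return *Ptr;
  }
  T *operator->() const { return Ptr; }
  T *get() const { return Ptr; }

  // Give up ownership without deleting the object.
  T *take() {
    T *Tmp = Ptr;
    Ptr = 0;
    return Tmp;
  }

  // Delete the currently owned object (if any) and own P instead.
  void reset(T *P = 0) {
    if (P == Ptr)
      return;
    T *Old = Ptr;
    Ptr = P;
    delete Old;
  }
};

// Small helper that counts live instances so destruction can be observed.
struct Counted {
  int *Live;
  explicit Counted(int *L) : Live(L) { ++*Live; }
  ~Counted() { --*Live; }
};

// Returns the number of live Counted objects after the owner left scope.
int liveAfterScope() {
  int Live = 0;
  {
    ScopedOwner<Counted> P(new Counted(&Live));
    assert(Live == 1);
  } // P's destructor deletes the Counted object here.
  return Live;
}
```

The whole thing is a few dozen lines with no external dependency, which is the trade-off being described: a narrow, readable utility instead of a general library.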

I'm now more or less finished with the very first part of my AST processing -
function inlining at the AST level.
One result of the development is a set of AST processing tools that may
come in handy for others too - cloning, garbage collecting and so on. Of
course I'm eager to share this stuff. However, there are some issues to
solve first:

Cool.

1. I use boost, in particular boost::bind and boost::function (that is,

Discussed elsewhere :). We do have a simple llvm::tie function in llvm/ADT/STLExtras.h if you want; it allows you to write code like:

int x, y;
tie(x, y) = my_fn_returning_pair_of_ints();
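
For anyone who wants to try the idiom without the LLVM header, the same pattern can be sketched with std::tie from <tuple>, assuming a compiler with C++0x library support. The function divmod is a hypothetical stand-in for my_fn_returning_pair_of_ints, and quotientTimesTenPlusRemainder exists only to make the result observable.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Hypothetical stand-in for my_fn_returning_pair_of_ints():
// returns (quotient, remainder) of an integer division.
std::pair<int, int> divmod(int Num, int Den) {
  assert(Den != 0 && "division by zero");
  return std::make_pair(Num / Den, Num % Den);
}

int quotientTimesTenPlusRemainder() {
  int Q, R;
  // std::tie builds a tuple of references to Q and R; assigning the
  // returned pair writes its two members through those references.
  std::tie(Q, R) = divmod(17, 5);
  return Q * 10 + R;
}
```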

However, there is a bigger issue here: in the absence of C++ lambdas, C++ code that uses a highly functional style is often very difficult to read. Are you sure that using these actually helps improve the readability of your code?

2. Sometimes it seemed to me that having a parent member in a statement
would be better than the ParentMap. Nevertheless I worked with ParentMap
but needed to add ParentMap::addStmt, which adds and/or updates the
parent/child relations of the given Stmt tree. However I'm not really
satisfied with the function name yet.

Improving ParentMap sounds great! Feel free to propose this as a patch independent of your other changes.
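
To make the "adds and/or updates the parent/child relations" part concrete, here is a toy sketch of a child-to-parent map with an update operation. Node, ToyParentMap, addSubtree and reparentDemo are hypothetical names for this example only; this is not clang's ParentMap and not the proposed patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy statement node; a stand-in for clang::Stmt, not the real class.
struct Node {
  std::vector<Node *> Children;
};

// Toy child-to-parent map, loosely modelled on the idea behind ParentMap.
class ToyParentMap {
  std::map<const Node *, Node *> Parents;

public:
  // Records (or overwrites) the parent of every node below Root.
  // The entry for Root itself is left untouched.
  void addSubtree(Node *Root) {
    assert(Root && "cannot add a null subtree");
    for (std::size_t I = 0; I < Root->Children.size(); ++I) {
      Node *Child = Root->Children[I];
      if (!Child)
        continue; // tolerate missing children
      Parents[Child] = Root;
      addSubtree(Child);
    }
  }

  // Returns the recorded parent, or null if N has none.
  Node *getParent(const Node *N) const {
    std::map<const Node *, Node *>::const_iterator It = Parents.find(N);
    return It == Parents.end() ? nullptr : It->second;
  }
};

// Moves Leaf from A to B, then refreshes only the affected subtree.
bool reparentDemo() {
  Node Root, A, B, Leaf;
  Root.Children.push_back(&A);
  Root.Children.push_back(&B);
  A.Children.push_back(&Leaf);

  ToyParentMap PM;
  PM.addSubtree(&Root);
  bool Before = PM.getParent(&Leaf) == &A && PM.getParent(&Root) == nullptr;

  A.Children.clear();
  B.Children.push_back(&Leaf);
  PM.addSubtree(&B); // update: Leaf's stale entry is overwritten
  return Before && PM.getParent(&Leaf) == &B;
}
```

Because the same call both inserts new entries and overwrites stale ones, a transformation only has to re-add the subtree it touched instead of rebuilding the whole map.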

3. I'd like to extend the PrintingPolicy so we can rewrite in other
styles too, e.g.

This seems reasonable.

For all the new stuff: where should it go? Or is it better to create a
separate ASTProcessing library?

It is best to split up your changes into logically independent ones: different pieces probably go in different places. Thanks for working on this!

-Chris

From: Ted Kremenek [mailto:kremenek@apple.com]
Sent: 12 June 2009 21:20
To: AlisdairM(public)
Cc: cfe-dev@cs.uiuc.edu
Subject: Re: [cfe-dev] AST processing toolbox

I think even depending on TR1 might be too high (at least at this
point), as we want Clang to compile on as many platforms as possible
that have a "reasonable" C++ compiler. I could be wrong, but I don't
think we want people to have to install Boost in order to compile
Clang if they don't have an available TR1 solution.

OK, this is a good starting point?
Do we have a reasonable list of targets we would expect to build
Clang on? This would give us a chance to survey and see if TR1
support really is an issue.

Likewise, Boost can be a big and bulky dependency to install, so
agree requiring it where not absolutely necessary would be a big
barrier to the more casual user. What if we had a cut-down
Boost::TR1 distribution? If we have a known set of target platforms
without native TR1 support we know which platforms to focus on
for specific 'one-click' installers.

Not that I want to force TR1 on the community! But it would be
good to know if making it available is a realistic possibility,
and that starts with knowing our target environment.

Another possibility, when Clang eventually has the capability to
bootstrap itself, libraries not part of the core compiler could be
compiled using a bootstrapped Clang and use whatever C++ features
Clang supports.

I hope this means what I think it means for future C++0x support ;¬)

That said, there is an ongoing discussion on which C++ features we
should actually use in LLVM/Clang. At the end of the day, not
everyone who contributes to LLVM/Clang is a C++ guru, and using
esoteric C++ features may actually be more of a detriment than a
blessing as it may scare off potential contributors. So when we look
at whether or not to pull in library X or language feature Y into the
code base, we need to consider the tradeoffs both in terms of
technical benefits (which may be marginal) versus (a) the
approachability and readability of the code base and (b) the
portability of the code base. We aren't luddites, however, as we do
indeed use some of the more "specialized" features of C++ in LLVM/
Clang, but they are buried deep in the code and help provide
fundamental infrastructure instead of being used as part of the basic
APIs.

Well, some of the trickier parts of TR1 are exploiting guru-level
implementations to deliver simpler end-user interfaces. tr1::function
is probably the best example of this. Likewise, shared_ptr has a very
simple interface but is an extremely powerful component. Conversely,
while I find tr1::bind invaluable myself, reaction from colleagues in
the past suggests it is a guru-level API. The functional idiom is
simply not well enough understood by the 'average' C++ developer.
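
A small sketch of that contrast, written against std::function and std::bind from <functional> (the names these components later received in the C++0x standard library; the tr1 versions are used the same way). IntPredicate, countIf, inRange, countWithBind and countWithLambda are hypothetical names for this example, and the lambda variant assumes a compiler that supports lambdas.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A callback type: anything callable as bool(int) fits.
typedef std::function<bool(int)> IntPredicate;

// The user-facing side of std::function is simple: store it, call it.
int countIf(const std::vector<int> &Values, const IntPredicate &Pred) {
  assert(Pred && "empty predicate");
  int N = 0;
  for (int V : Values)
    if (Pred(V))
      ++N;
  return N;
}

bool inRange(int Lo, int Hi, int V) { return Lo <= V && V < Hi; }

// bind version: Lo and Hi are fixed to 2 and 5, and the placeholder _1
// marks where the argument supplied at call time goes.
int countWithBind(const std::vector<int> &Values) {
  using namespace std::placeholders;
  return countIf(Values, std::bind(inRange, 2, 5, _1));
}

// The same predicate as a lambda, for comparison.
int countWithLambda(const std::vector<int> &Values) {
  return countIf(Values, [](int V) { return inRange(2, 5, V); });
}
```

Both variants pass the same predicate to countIf; the difference is only in how readable the call site is, which is the point under discussion.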

Now if LLVM is already providing equivalents for some of these features
it would be helpful to have documentation summarising and pointing us
in the right direction. So far I have only stumbled over OwningPtr
which seems to be an attempt to fix auto_ptr, more like C++0x
unique_ptr than shared_ptr. It is quite likely I have missed more
though! (I believe we use LLVM supplied hashing containers?)

AlisdairM

See my earlier reply <g>
What would be most useful here is some small developer document pointing
out which facilities are available through LLVM that we might find familiar.
Smart pointers, containers, other basic utilities, etc.

I have no problems with this approach, TR1 merely gives us some valuable
and reasonably well understood APIs. Nothing against providing self
hosted implementations if that is what is needed, at least for the
simpler components. I don't particularly fancy writing a pre-variadic
template version of tr1::function myself ;¬)

AlisdairM

Pulling back to cfe-dev.

Hi AlisdairM,

For the moment, I’d rather the discussion center on what technical problem you are trying to solve rather than on whether or not we can depend on TR1. I’m not yet convinced we have a technical need. And, as Chris said in his other email, sometimes having an LLVM-ized library that just cherry-picks the feature you want from TR1/Boost may be sufficient. Without a technical need we’re just adding dependencies for no reason.

From: Ted Kremenek [mailto:kremenek@apple.com]
Sent: 12 June 2009 21:20
To: AlisdairM(public)
Cc: cfe-dev@cs.uiuc.edu
Subject: Re: [cfe-dev] AST processing toolbox

I think even depending on TR1 might be too high (at least at this
point), as we want Clang to compile on as many platforms as possible
that have a “reasonable” C++ compiler. I could be wrong, but I don’t
think we want people to have to install Boost in order to compile
Clang if they don’t have an available TR1 solution.

OK, this is a good starting point?
Do we have a reasonable list of targets we would expect to build
Clang on? This would give us a chance to survey and see if TR1
support really is an issue.

Not really. We want to encourage as broad as possible adoption at this point. We certainly want Clang to be easily available (among others) on the major Linux and BSD varieties, but there is also Windows support to consider. Having support on Solaris is also something we don’t want to flatly rule out.

Likewise, Boost can be a big and bulky dependency to install, so
agree requiring it where not absolutely necessary would be a big
barrier to the more casual user. What if we had a cut-down
Boost::TR1 distribution? If we have a known set of target platforms
without native TR1 support we know which platforms to focus on
for specific ‘one-click’ installers.

I think the consensus is that we would rather cherry-pick features as we need them and pull them into LLVM-ized libraries. That makes us really decide whether or not we need some special feature in the first place, rather than bloating the codebase.

Not that I want to force TR1 on the community! But it would be
good to know if making it available is a realistic possibility,
and that starts with knowing our target environment.

Understood.

Another possibility, when Clang eventually has the capability to
bootstrap itself, libraries not part of the core compiler could be
compiled using a bootstrapped Clang and use whatever C++ features
Clang supports.

I hope this means what I think it means for future C++0x support ;¬)

That is certainly a long term goal.

I realized that a problem with this suggestion is that it only allows development of Clang on machines where LLVM supports codegen.

That said, there is an ongoing discussion on which C++ features we
should actually use in LLVM/Clang. At the end of the day, not
everyone who contributes to LLVM/Clang is a C++ guru, and using
esoteric C++ features may actually be more of a detriment than a
blessing as it may scare off potential contributors. So when we look
at whether or not to pull in library X or language feature Y into the
code base, we need to consider the tradeoffs both in terms of
technical benefits (which may be marginal) versus (a) the
approachability and readability of the code base and (b) the
portability of the code base. We aren’t luddites, however, as we do
indeed use some of the more “specialized” features of C++ in LLVM/
Clang, but they are buried deep in the code and help provide
fundamental infrastructure instead of being used as part of the basic
APIs.

Well, some of the trickier parts of TR1 are exploiting guru-level
implementations to deliver simpler end-user interfaces.

Absolutely.

tr1::function
is probably the best example of this. Likewise, shared_ptr has a very
simple interface but is an extremely powerful component. Conversely,
while I find tr1::bind invaluable myself, reaction from colleagues in
the past suggests it is a guru-level API.

That is my feeling as well.

The functional idiom is
simply not well enough understood by the ‘average’ C++ developer.

I’m not certain if the functional idiom is the problem, but I’m not going to digress on that point. We’re very much open to different “algorithmic” approaches, including a functional perspective, as certain approaches lend themselves elegantly to solving specific problems.

Now if LLVM is already providing equivalents for some of these features
it would be helpful to have documentation summarising and pointing us
in the right direction. So far I have only stumbled over OwningPtr
which seems to be an attempt to fix auto_ptr, more like C++0x
unique_ptr than shared_ptr. It is quite likely I have missed more
though! (I believe we use LLVM supplied hashing containers?)

Yes, most such support libraries are in llvm/include/llvm/ADT, but there is also llvm/include/llvm/Support and llvm/include/llvm/System.

From: Chris Lattner [mailto:clattner@apple.com]
Sent: 12 June 2009 21:41
To: Ted Kremenek
Cc: AlisdairM (public); cfe-dev@cs.uiuc.edu
Subject: Re: [cfe-dev] AST processing toolbox

Right. Another option is to pull in and “llvm-ize” libraries that are
really useful. For example, llvm/ADT/OwningPtr.h is a simple smart
pointer that is very similar to boost and other standard libraries.

See my earlier reply
What would be most useful here is some small developer document pointing
out which facilities are available through LLVM that we might find familiar.
Smart pointers, containers, other basic utilities, etc.

Agreed, throughout LLVM/Clang the documentation can be greatly improved. We encourage everyone to submit patches for documentation where they feel it is lacking.

I have no problems with this approach, TR1 merely gives us some valuable
and reasonably well understood APIs. Nothing against providing self
hosted implementations if that is what is needed, at least for the
simpler components. I don’t particularly fancy writing a pre-variadic
template version of tr1::function myself ;¬)

Sounds good.

More specifically, LLVM has DenseMap/DenseSet, FoldingSet, and StringMap, all of which are hashtable implementations with specific uses and performance characteristics. DenseMap is used widely, and performs far better than std::hashmap.

Some more explanation here:
http://llvm.org/docs/ProgrammersManual.html#datastructure

-Chris

So then, clang isn’t going to be a drop-in replacement for g++ until
there are “enough” other compilers that implement the more
esoteric parts of C++?

What do the requirements on the Clang codebase have to do with its
suitability as a GCC drop-in replacement? I think you're misunderstanding
something here.

Sebastian

Hi Brian,

I’m not certain how that statement follows from anything that I said. Could you please clarify? Our intent is to develop a C++ compiler that could be widely used.

Right, we want Clang to be able to build all the insane C++ constructs in the world, but we don't want it to be *written* using them. :) The goal is to keep the LLVM/Clang code base simple, readable, and portable.

-Chris

So much for my speed reading being a good way to parse the mail list.

Thanks for having the patience to explain my mistake, Chris. :)

Hi @all,

Ted Kremenek wrote:

In general, we would prefer to not suck in pieces of external libraries unless they significantly add value to the development of LLVM/Clang. This is for both portability and maintainability. FWIW, adding pieces of Boost to LLVM/Clang has been discussed before, and unless there is a compelling need to pull in a specific piece of Boost it is something we won't do for the reasons I've given. This has nothing to do with Boost per se; it's really about keeping the codebase simple and free of unneeded, extraneous dependencies on other libraries. Please keep in mind that we've managed to build an entire working compiler without these libraries.

After being back from the weekend, it looks like I have stirred up a hornet's nest. As Alisdair already pointed out, I don't depend on boost anymore. Instead I now depend on std::tr1 and the fancy features of std::tr1::function. And after a quick look through my code I came to the conclusion that I don't want to miss these features. However, you should note that only my AST processing depends on tr1. No tr1 dependency is injected into existing modules. That said, I don't like the argument about the usage of 'insane' C++ constructs only. Whether std::tr1::function still counts as insane I don't want to argue. But IMHO everyone working on AST processing should have a basic understanding of lambda calculus and all that stuff around it.

Best
Olaf Krzikalla