RFC: Adding a rename refactoring tool to clang

Hi,

This is where clang-based refactoring tools are really going to shine.
Leveraging the *production quality* parser and resulting lossless AST
just puts you miles ahead in the game.

Have you evaluated MSVC + the Visual Assist plugin? While it's not IntelliJ, I
have heard good things about it. I think that's a professional-level
baseline which other tools could work towards matching/beating.

This is OT for the list, but you and others are welcome to follow up with
me off-list about the refactoring work we are doing where I work.

The refactoring work I have experience with ends up relying on things
which go sorta outside the typical libclang tooling realm. You really have
a lot to track beyond just a single source file and dealing with that can
get tricky.

I have a lot of experience refactoring large C++ codebases, all depending
on WinAPI. My conclusion is: Visual Assist X currently seems to be the best
tool on the market. However, it is quite useless, since most - if not all -
refactoring attempts on an over 20-year-old codebase with roughly 2 million
lines of Microsoft-infested C++ code (DWORD, BOOL, etc.) failed. Even
small, local refactorings a) take very long and b) mostly yield incorrect
results.
For example, there was a paradigm to have a nested "data"-class inside of
classes that manage the ui. They all share the same name, albeit being
nested inside of totally different classes. VA X failed miserably analysing
("Go to definition") and refactoring this.

BTW, this is similar to Microsoft's effort regarding Roslyn: They already
understood that there can be no sufficiently functioning refactoring tool
that does not actually use the compiler's parser itself.

If clang-rename will be able to handle the codebase I am working with, I
will be in sheer joy.

Greetings,
Daniel Albuschat

Hi Daniel, it would be interesting to hear if clang is able to parse/compile your codebase and where it fails if not.

In article <CA+Yaqnhj7Vu-NnMtt4JD96gpePawideni7GsOcWxwT9pgCaG4w@mail.gmail.com>,
    Daniel Albuschat <d.albuschat@gmail.com> writes:

For example, there was a paradigm to have a nested "data"-class inside of
classes that manage the ui. They all share the same name, albeit being
nested inside of totally different classes. VA X failed miserably analysing
("Go to definition") and refactoring this.

Yeah, this is the downside of an ad-hoc/heuristic-based refactoring
tool. It doesn't know enough about the syntax and semantics of the
language to identify that all these classes with the same name in
their local context have a different global context and are therefore
all different.
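
To make the nested-class situation concrete, here is a minimal sketch. The class names (LoginDialog, SettingsDialog) and members are invented for illustration and are not taken from Daniel's codebase. It shows that two nested classes with the same local name are unrelated types, so a rename of one must not touch the other.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical UI classes; each one nests its own class called "Data".
class LoginDialog {
public:
    struct Data { std::string user; };
    Data data;
};

class SettingsDialog {
public:
    struct Data { int fontSize = 12; };
    Data data;
};

// Same local name, different enclosing scope: two distinct types.
static_assert(!std::is_same<LoginDialog::Data, SettingsDialog::Data>::value,
              "the nested Data classes are unrelated types");

// A rename of LoginDialog::Data has to leave this function untouched,
// because it only refers to SettingsDialog::Data.
int defaultFontSize() {
    SettingsDialog::Data d;
    return d.fontSize;
}

// A rename of LoginDialog::Data does affect code that names that type.
std::string loginUser(const LoginDialog::Data& d) {
    return d.user;
}
```

A tool that resolves names the way the compiler does can tell that the `Data` in `defaultFontSize` and the `Data` in `loginUser` refer to different declarations; a tool that only matches the spelling cannot.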

If clang-rename will be able to handle the codebase I am working with, I
will be in sheer joy.

Yes, this is why I too am excited about clang-based refactoring tools
because all this ad-hoc/heuristic approach is just a dead-end for
real world code bases. Those huge legacy code bases are also the ones
most desperately in need of refactoring!

In article <CAMfx8+eLTwdAbHw64L-O71hChrsKJKBOsgB-cuYaXN4xU7Qp5Q@mail.gmail.com>,
    Matthew Plant <mplant@google.com> writes:

> The code does not need to be compile-able, but it does need to be at least
> some-what parseable.

In this regard, clang's rename won't be any worse than any other
language's rename.

For instance, refactoring tools are pretty mature in .NET/Java, but
none of them successfully rename an identifier whose type can't be parsed.
You're fundamentally missing the semantic information needed to decide
where else this identifier needs to be changed.

As far as C++ refactoring tools go, *anything* based on clang's
parser is going to be light years ahead of other tools that are based
on ad-hoc homegrown parsers.

not to be argumentative, but in what ways is Clang's parser not ad-hoc and
homegrown? How many C++ parsers aren't in that category?

I've been kicking the tires of C++
refactoring tools since roughly 2007 when I started refactoring a crusty
old code base. Obviously the older tools use their own parser. Just
having a parser that gets the job done correctly is difficult, never
mind keeping that parser up to date with the C++11 and C++14
standards.

This is where clang-based refactoring tools are really going to shine.
Leveraging the *production quality* parser and resulting lossless AST
just puts you miles ahead in the game.

JetBrains has been promising:
http://www.jetbrains.com/resharper/features/cpp.html
It would be interesting to see the two of you sort it out on that kind of
field :-)

-- Gaby

not to be argumentative, but in what ways is Clang’s parser not ad-hoc and homegrown? How many C++ parsers aren’t in that category?

I think the intended distinction is that ad-hoc parsers try to discard as much information that is unrelated to the task at hand as possible.
For example, one could imagine an ad-hoc parser intended for indenting that simply determined how many unmatched curly braces there are.
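
As a rough sketch of such an indenting "parser" (the function name `naiveBraceDepth` is made up for this illustration), the following counts braces and discards everything else. It is good enough for plain code and wrong as soon as a brace appears inside a string literal or a comment.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the nesting depth at the end of `source`
// by counting braces and ignoring everything else about the language.
int naiveBraceDepth(const std::string& source) {
    int depth = 0;
    for (char c : source) {
        if (c == '{') {
            ++depth;
        } else if (c == '}') {
            --depth;
        }
    }
    return depth;
}
```

For `"void f() {\n  if (x) {"` this returns 2, which is what an indenter wants. For `"const char* s = \"{\";"` it returns 1 even though no block is open, because the counter has no notion of string literals.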

As far as I can tell, ad-hoc, when applied to parsers, is synonymous with “bad”.

The distinction is primarily between tools that can actually parse C++
properly (such as Clang's parser, and a small number of others -- I'd
prefer not to argue about exactly which qualify as doing the job
"properly", given that none is bug-free) and the many, many homegrown,
ad-hoc tools that (by design) fall far short of that. As I'm sure
you're aware, parsing C++ robustly means handling overload resolution,
template instantiation, constexpr evaluation, and a pile of other
things that are too much work for most tool creators to do from
scratch. Typically they do something "good enough" for their
purposes, and where necessary sometimes even re-write their C++ source
code to avoid needing a more fully-featured parser.
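
A small illustration of why this is hard, offered as a sketch rather than anything from the Clang sources: the token sequence `f<a>(b)` is a function call if `f` names a template, and a pair of comparisons if `f` names a variable. The namespace and function names below are invented for the example. Compilers may warn about the chained comparison in the second case; that is expected.

```cpp
#include <cassert>

namespace with_template {
    template <int N>
    int f(int x) { return N + x; }

    constexpr int a = 2;

    // Here `f<a>(b)` is a template-id followed by a call: f<2>(b).
    int call(int b) { return f<a>(b); }
}

namespace with_variables {
    // Here f, a and b are plain ints, so the same tokens mean (f < a) > b.
    bool call(int f, int a, int b) { return f<a>(b); }
}
```

`with_template::call(3)` yields 5, while `with_variables::call(1, 2, 0)` yields true. A tool cannot choose between the two readings without doing name lookup first, which is exactly the work a pattern-based recognizer skips.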

To give an example: Doxygen has what might be termed an "ad-hoc,
homegrown parser" for C++, which for the most part worked well enough
to extract documentation and associated structure, but had
limitations. clang-format uses an ad-hoc parser (sometimes called a
"recognizer" to distinguish it from a tool that aims for formally
correct parsing). I've a feeling SWIG tries to parse C++ (not always
successfully).

-- James

In article <CAAiZkiANTEcWTBY2LkRPxyB7=ajjGy81x-zJToUC=sxtOX5cmQ@mail.gmail.com>,
    Gabriel Dos Reis <gdr@integrable-solutions.net> writes:

> As far as C++ refactoring tools go, *anything* based on clang's
> parser is going to be light years ahead of other tools that are based
> on ad-hoc homegrown parsers.

not to be argumentative, but in what ways is Clang's parser not ad-hoc and
homegrown? How many C++ parsers aren't in that category?

Allow me to clarify my terminology:

"ad hoc" and heuristic based parsers are things that are based on
pattern matching, searching for matching names, etc., without actually
building any representation of the language. Clang's parser is (as I
understand it) a hand-written recursive descent parser that builds a
model of the source text that represents the syntactical constructs
present in the language. It may be "ad hoc" in the sense that it is
a hand-written parser and not generated using a compiler generator
like YACC, but it isn't "ad hoc" in the sense I was using above.

In the sense I'm using "ad hoc" parser, such parsers fail to properly
parse valid source text. Unless clang's C++11 support is vastly
over-advertised, I don't think you can say it falls into that
category.
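
For contrast, here is a sketch of what a purely name-matching "rename" looks like. The function `textualRename` is hypothetical and only stands in for the pattern-matching approach described above; it assumes `from` and `to` are plain identifiers with no regex metacharacters.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical text-based rename: replaces every whole-word occurrence of
// `from` with `to`, with no knowledge of scopes or declarations.
// Assumption: `from` and `to` are plain identifiers.
std::string textualRename(const std::string& source,
                          const std::string& from,
                          const std::string& to) {
    const std::regex wholeWord("\\b" + from + "\\b");
    return std::regex_replace(source, wholeWord, to);
}
```

Applied to `"struct A { struct Data {}; }; struct B { struct Data {}; };"` with `from = "Data"` and `to = "Payload"`, it renames both nested classes, although only one of them was meant. Word boundaries keep it from touching `Database`, but nothing keeps it from touching `B::Data`.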

JetBrains has been promising:
http://www.jetbrains.com/resharper/features/cpp.html

Ah, that's the link I've been looking for... I didn't know they were
going to bundle that with ReSharper; I thought it was going to be part
of their AppCode product.

I've been wanting to try this and I've been hearing good things about
it. But, you know what? It wouldn't surprise me at all if it was
built on top of clang :-).

I know if *I* were going to try and make a commercial refactoring tool
for C++ as a VS plug-in, that's where I'd start.

I'm going to email JetBrains and ask for access to this. If anyone
can make a kick-ass add-on for VS that "just works" it will be them.

In article <CAMfx8+eN_p7uyvPHKLOFUXcifnouB-jTm9ZM+XDbBm06qdjmrg@mail.gmail.com>,
    Matthew Plant <mplant@google.com> writes:

As far as I can tell, ad-hoc, when applied to parsers, is synonymous with
"bad"

Yes, because the implication is that they mis-parse invalid constructs
as valid and mis-parse valid constructs as invalid or as other valid
constructs that have different meanings.

To use an archaic colloquialism adapted to the current situation:
"That parser don't match."
<http://english.stackexchange.com/questions/52755/meaning-and-origin-of-that-dog-dont-hunt>