Proposal: A "Const tool" for clang

This is the first cut of a proposal for a refactoring tool built on clang.
I have tried to break out what I think are the important issues - but this is only a first cut.

The entire proposal is at <http://marshall.calepin.co/a-const-tool-for-llvm.html>, only because I think it's a lot easier to read with formatting.

Here's the first section:

== Motivation ==
Adapting an existing code base to use `const` can be a daunting, frustrating task. Simply adding "const" to a `Foo &` parameter can have repercussions throughout the code base, and may involve adding const to an arbitrary number of other parameters (of other functions), or, after making changes in several places, finding that you have to back all (or most) of your changes out.

Consider the code snippet:

    void ncfunc1 ( std::string &str ) { str.append ( ' ' ); }
    int cfunc1 ( std::string &str ) { return str.length(); }
    
    void function1 ( std::string &one, std::string& two ) {
      if ( cfunc1 ( one ) > 10 )
        ncfunc ( two );
      }

  * Changing the first parameter to function1 to be `const std::string &` also requires changing `cfunc1` to take a `const std::string &`

  * You cannot change the second parameter because it gets passed to `ncfunc1`, which cannot be modified to take a `const std::string &`, because it calls the non-const member function `append` using that object. (and `std::string::append(char)` cannot be made const, because it is part of the standard library)

Broadly speaking, there are three places where const can be added to an existing code base:
  * Parameters that are passed by reference or pointer can be passed by const pointer or const reference.
  * object methods can be marked as const
  * Return values that are returned by pointer or reference can be returned by const reference or const pointer. This may involve adding an additional function alongside the original, differing only in the return value.

Read the whole thing at: <http://marshall.calepin.co/a-const-tool-for-llvm.html>, and send me comments, please - on list or off.
If you know of tools that already do this, I'd love to know about them, too.

-- Marshall

Marshall Clow Idio Software <mailto:mclow.lists@gmail.com>

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
        -- Yu Suzuki

Local variables might require adding 'const', too. A dumb example:

int f(Z *z) {
  Z *a = z->getA();
  return a->getInt();
}

If there's a const and non-const overload of getA() and getInt(), then
one can actually add const everywhere in this example.

Dmitri

Batch
[...]
Repeat this process as long as progress ensues.

Have you done a ballpark estimate of how much resident memory this
would need for, say LLVM's codebase? Are you planning to do this
entirely in-core? If not, then how are you planning to make this
scale?

Interactive

Pick as specific function parameter or method and make it const.
This would involve checking all of the routines that it calls with that parameter, and making changes to fix their interfaces (if necessary).

This seems pretty ambitious. AFAIK nobody has even written a robust
and useful tool based on Clang that coherently lets you pick a
function and see all the call sites in a codebase (otherwise I would
be using it!). I would recommend starting with just that aspect. My
impression is that the new Modules functionality that dgregor is
working on will go a long way to making this easier, so pragmatically
your time might be best spent helping to move that feature along.

-- Sean Silva

> Batch
> [...]
> Repeat this process as long as progress ensues.

Have you done a ballpark estimate of how much resident memory this
would need for, say LLVM's codebase? Are you planning to do this
entirely in-core? If not, then how are you planning to make this
scale?

> Interactive
>
> Pick as specific function parameter or method and make it const.
> This would involve checking all of the routines that it calls with that
parameter, and making changes to fix their interfaces (if necessary).

This seems pretty ambitious. AFAIK nobody has even written a robust
and useful tool based on Clang that coherently lets you pick a
function and see all the call sites in a codebase (otherwise I would
be using it!). I would recommend starting with just that aspect. My

That is not entirely true. There might not be an open sourced tool, but
there are publicly available ones.
This is an example from Chromium's code search:
https://code.google.com/p/chromium/codesearch#chrome/src/base/memory/ref_counted.h&q=class:scoped_refptr&sq=package:chrome&type=cs&l=254
You can click for example on the operator T*() const and get a list where
this is called throughout the Chromium codebase.

Generally, we've also internally done transformations that are very similar
in scale and complexity on our internal ~100MLOC codebase. You can make
such things scale by doing MapReduce like algorithms, which first map all
possible function calls / parameters as keys with locations, and then
basically reduce the graph of const-dependencies. Since I've implemented
MapReduce style algorithms with standard unix tools (sort, cat) this also
works for smaller problems if you don't have large infrastructure at hand.

Cheers,
/Manuel

impression is that the new Modules functionality that dgregor is

Hi Marshall,

as you noticed I had an attempt to write such a tool. It is published, contributors are welcome. <https://github.com/rizsotto/Constantine> My goals were very similar what you are proposing here. I see these points are the major differences:

  • do topological sort on the call graph, and fix function signature in that order is a good idea, (got smaller problems to fix first.)
  • do rewrite the code instead of just print warning would be a great improvement,
  • write this tool separately from LLVM/Clang source tree.

And there is another interesting approach to solve the constness problem. <https://bitbucket.org/grrussel/constcpp/wiki/Home>

Regards,
Laszlo

DXR can, although it has a problem trying to output the results of clang due to the lexer tests breaking the internal tokenizer. And resolving virtual edges along the callgraph appears to have broken again (sigh). See <https://github.com/mozilla/dxr> for source.

As a side note, I've also been working on integrating documentation output (akin to Doxygen) with this tool.

I made http://code.woboq.org that lets you see all the use of a symbol in the
tooltips. (and also the documentation)

Nice example - thanks!

-- Marshall

Marshall Clow Idio Software <mailto:mclow.lists@gmail.com>

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
        -- Yu Suzuki

I guess my point was not that such a tool hasn't been made (I was
aware of the chromium code search, DXR, and woboq code browser
mentioned in this thread and thought about them when writing that
response), but that it would have to be implemented essentially from
scratch, whereas the proposal seems to talk about "find all calls" as
a primitive operation that is taken for granted, and the complexity of
which isn't mentioned in the proposal.

Also, the "interactive" aspect, e.g. "if the user says: Make the
second parameter to function1 const" really suggests integration into,
say, an editor (something like how clang-format can be called from
vim), and AFAIK Chromium Code Search, DXR, and woboq all treat the
source code as a static object not subject to modification, and are
not prepared for interactive, incremental modification.

I'm not trying to say that Marshall's tool isn't a good idea (I think
it's a great idea!), but I think that the development of a tool with
its aims is quite an endeavor and that Marshall should consider
breaking it up into reusable pieces (preferably integrated into e.g.
Tooling). For example, some way to effectively reason and search
across translation units (which is a basic presupposition of the tool
in Marshall's proposal) would be useful to a large number of other
tools; one example is the Static Analyzer, which could use this
functionality to do cross-TU analyses.

Having said that: Marshall, I would really like to see the proposal
beefed up to delineate the current infrastructure that this tool will
be built on, and what infrastructure will need to be developed to
enable the creation of the tool.

-- Sean Silva