I tweaked my scripts to avoid removing includes when it doesn’t give any significant benefits, which made the patches significantly smaller. This time the patches should not try to remove includes of header files, which are transitively included from other included header files. The gains mostly remained the same (plus/minus noise), the tables are in the end of the email. I also included size of preprocessed files (measured in 1000 lines of code).
Nice to IWYU developers here:) I wonder how hard it would be to run IWYU on
LLVM/Clang (or, if it’s supposed to work, I wonder what I did wrong).
There are known problems with running IWYU over LLVM/Clang -- Zachary
Turner made an attempt a while back to get it up and running. Since
the LLVM tree uses all sorts of modern and moderately complex
patterns, we're struggling to keep up.
If we also can tweak it a bit to make it choose more human-like (~more
conservative) decisions, we would be able to just apply what it suggests!
Different humans appear to have different preferences
It'd be great to hear more about specifics, maybe you can bring them
up on the IWYU list: Redirecting to Google Groups?
Some of IWYU's controversial changes are by design, but some are just
bugs, and if you have ideas for a more principled design, we
appreciate all the input we can get.
Nice to IWYU developers here:) I wonder how hard it would be to run IWYU on
LLVM/Clang (or, if it’s supposed to work, I wonder what I did wrong).
There are known problems with running IWYU over LLVM/Clang – Zachary
Turner made an attempt a while back to get it up and running. Since
the LLVM tree uses all sorts of modern and moderately complex
patterns, we’re struggling to keep up.
I see.
If we also can tweak it a bit to make it choose more human-like (~more
conservative) decisions, we would be able to just apply what it suggests!
Different humans appear to have different preferences
True, what I meant hear is to make the changes more conservative: e.g. if we can replace #include “MyClass.h”
with
class MyClass;
then this change is probably desirable in every way: it documents the code better, it decreases coupling, it improves compile time.
But if for instance we suggest replacing it with #include “BaseClass1.h” // previously included from MyClass.h #include “ExtraStuffForMyClass.h”
then it’s less straight-forward, at least from the code-selfdocumentation point of view.
If we could make IWYU to only suggest changes of the first type, then we probably could’ve just blindly apply all the suggested changes.
There are known problems with running IWYU over LLVM/Clang – Zachary
Turner made an attempt a while back to get it up and running. Since
the LLVM tree uses all sorts of modern and moderately complex
patterns, we’re struggling to keep up
Re-implementing IWYU based on Clang tooling seems like it would help a lot with that kind of problem.
This is not a transform which is clearly "desirable in every way", because
it _increases_ coupling between the user of the class and the
implementation. The owner of the class can't add optional template
arguments, change a class into a typedef, change the namespace, or other
such refactorings. It also introduces the possibility of code which changes
behavior depending on whether the full or forward decl are available
(which, then, may be an ODR-violation).
Effectively the same reasons why the standard forbids users from
forward-declaring std:: names apply to doing so to user-defined names.
Of course you do have the upside is that it can improve compile time. Which
is certainly of value, and perhaps that's a worthwhile trade-off when the
implementation and forward-declare are both within a single project and
thus easy to coordinate. But, it's not by any means a _pure_ win.
That’s correct. I was speaking about the LLVM codebase though (I should’ve stated that clearer), and in LLVM I don’t remember many occasions of refactorings you mentioned. For LLVM forward declaration is recommended by the style guide: http://llvm.org/docs/CodingStandards.html#minimal-list-of-includes
I’m a little late to the party, but one observation that I haven’t seen mentioned is that simply removing #includes and testing that the program compiles is not guaranteed to be a correct transformation. Imagine, for example, that a header file provides an overload of a function that is a better match than one found elsewhere. It will compile either way, but without the #include, you will call a different function. Note that I’m not encouraging this kind of pattern, but the point is just to illustrate that this is not necessarily a correct transformation.
Another example is in the use of a macro definitions. Suppose a header defines certain macros, and other code uses ifdefs to test for the definition of those macros. If it is defined, one code path is taken, if it is not defined, another code path (which will compile but do the wrong thing) will be taken. Does the tool handle this?
Yes, that’s a valid concern, and that’s why I wanted to review all the changes before committing them (right now I’m going through all of them exactly to catch cases like you mention). It is actually happening sometimes, a good example is include of “config.h” - usually everything compiles even without it, but that’s not what we want.
My plan is to go through all the changes, remove undesired (like the ones you mentioned) and commit remaining changes in chunks (e.g. a folder at a time). Then people can do post-commit review, while the changes will get more exposure for testing. I hope it will go smoothly, but the time will show.
If there are any objections to my plan, I can put the patches for a pre-commit review first, though I consider them pretty safe.
Can you just compare CRC32s of the binaries before and after your changes? And then just don’t commit unless they’re the same. You’ll have to do this with debug info disabled, as white space changes will affect debug info.
That said, I’m not lgtm’ing this, just saying that if the community does decide this is a good plan, let’s make sure generated code is identical
Can you just compare CRC32s of the binaries before and after your changes? And then just don’t commit unless they’re the same. You’ll have to do this with debug info disabled, as white space changes will affect debug info.
No, I haven’t. Do you have any specific experiment in mind that you wanted to try?
When you configure clang with -DLLVM_ENABLE_MODULES=1 (like we do on the stage2 bots on green dragon) I suspect the win to be much smaller, since clang modules will eliminate most of the parsing overhead.
As an aside, there is a standard idiom used in some code bases which might be applicable. For each header “Header.h” which contains a class MyClass, you introduce a header called “Header.FwdDecls.h” which contains forward decls for all of the classes declared in the header. There are also variants which declare “MyClass.FwdDecl.h”. Both of these schemes have the advantage of only putting the forward decl in one place while reducing the transitive include set greatly. I’ve seen environments where these forward decl headers are automatically generated by the build system. Philip
Do you consider transitive includes as redundant? Why should we remove
these?
One drawback I see with removing these is that if I need `class Foo` but I
get it through `bar.h` which includes `foo.h`, it means that removing the
include to `foo.h` in `bar.h` would break the client of bar that relied on
this transitive includes.
That makes refactoring of a components harder since it may break some of
the clients of the components for no other reason than this transitive
include.
I suspect this is why IWYU would instead *add* these includes when they're
missing.
2017-12-09 12:54 GMT-08:00 Chris Lattner via llvm-dev <
llvm-dev@lists.llvm.org>:
Hi,
I tweaked my scripts to avoid removing includes when it doesn't give any
significant benefits, which made the patches significantly smaller. This
time the patches should not try to remove includes of header files, which
are transitively included from other included header files. The gains
mostly remained the same (plus/minus noise), the tables are in the end of
the email. I also included size of preprocessed files (measured in 1000
lines of code).
Do you consider transitive includes as redundant? Why should we remove
these?
One drawback I see with removing these is that if I need `class Foo` but I
get it through `bar.h` which includes `foo.h`, it means that removing the
include to `foo.h` in `bar.h` would break the client of bar that relied on
this transitive includes.
That makes refactoring of a components harder since it may break some of
the clients of the components for no other reason than this transitive
include.
I suspect this is why IWYU would instead *add* these includes when they're
missing.
+1, relying on transitive includes is fragile.
In our fork, a random #include of raw_ostream.h snuck into APInt.h at one
point. When I went to remove it (perhaps a few months after it was
introduced), there were already a handful of files that were relying on it
that were broken by my "remove debugging code that snuck in" change.
I tweaked my scripts to avoid removing includes when it doesn’t give any significant benefits, which made the patches significantly smaller. This time the patches should not try to remove includes of header files, which are transitively included from other included header files. The gains mostly remained the same (plus/minus noise), the tables are in the end of the email. I also included size of preprocessed files (measured in 1000 lines of code).
Do you consider transitive includes as redundant? Why should we remove these?
Hi Mehdi,
I don’t consider them redundant, and in the patches I’ve recently committed there shouldn’t be such removals - if you spot any, please let me know. All your arguments (as well as mentioned earlier by other people) are completely valid.
I tweaked my scripts to avoid removing includes when it doesn’t give any significant benefits, which made the patches significantly smaller. This time the patches should not try to remove includes of header files, which are transitively included from other included header files. The gains mostly remained the same (plus/minus noise), the tables are in the end of the email. I also included size of preprocessed files (measured in 1000 lines of code).