Who wants faster LLVM/Clang builds?

clattner · December 9, 2017, 8:54pm

Hi,

I tweaked my scripts to avoid removing includes when it doesn’t give any significant benefits, which made the patches significantly smaller. This time the patches should not try to remove includes of header files, which are transitively included from other included header files. The gains mostly remained the same (plus/minus noise), the tables are in the end of the email. I also included size of preprocessed files (measured in 1000 lines of code).

Just my opinion, but it is good to remove redundant header file includes, even if it doesn’t speed up the build.
http://llvm.org/docs/CodingStandards.html#minimal-list-of-includes

-Chris

kimgr · December 10, 2017, 3:39pm

Hi Michael,

Nice to IWYU developers here:) I wonder how hard it would be to run IWYU on
LLVM/Clang (or, if it’s supposed to work, I wonder what I did wrong).

There are known problems with running IWYU over LLVM/Clang -- Zachary
Turner made an attempt a while back to get it up and running. Since
the LLVM tree uses all sorts of modern and moderately complex
patterns, we're struggling to keep up.

If we also can tweak it a bit to make it choose more human-like (~more
conservative) decisions, we would be able to just apply what it suggests!

Different humans appear to have different preferences

It'd be great to hear more about specifics, maybe you can bring them
up on the IWYU list:
Redirecting to Google Groups?

Some of IWYU's controversial changes are by design, but some are just
bugs, and if you have ideas for a more principled design, we
appreciate all the input we can get.

Thanks,
- Kim

Michael_Zolotukhin · December 11, 2017, 8:37pm

Hi Kim,

Hi Michael,

Nice to IWYU developers here:) I wonder how hard it would be to run IWYU on
LLVM/Clang (or, if it’s supposed to work, I wonder what I did wrong).

There are known problems with running IWYU over LLVM/Clang – Zachary
Turner made an attempt a while back to get it up and running. Since
the LLVM tree uses all sorts of modern and moderately complex
patterns, we’re struggling to keep up.

I see.

If we also can tweak it a bit to make it choose more human-like (~more
conservative) decisions, we would be able to just apply what it suggests!

Different humans appear to have different preferences

True, what I meant hear is to make the changes more conservative: e.g. if we can replace
#include “MyClass.h”
with
class MyClass;
then this change is probably desirable in every way: it documents the code better, it decreases coupling, it improves compile time.

But if for instance we suggest replacing it with
#include “BaseClass1.h” // previously included from MyClass.h
#include “ExtraStuffForMyClass.h”
then it’s less straight-forward, at least from the code-selfdocumentation point of view.

If we could make IWYU to only suggest changes of the first type, then we probably could’ve just blindly apply all the suggested changes.

It’d be great to hear more about specifics, maybe you can bring them
up on the IWYU list:
https://groups.google.com/forum/#!forum/include-what-you-use?

Will do, thanks.

Michael

pogo59 · December 11, 2017, 10:36pm

There are known problems with running IWYU over LLVM/Clang – Zachary
Turner made an attempt a while back to get it up and running. Since
the LLVM tree uses all sorts of modern and moderately complex
patterns, we’re struggling to keep up

Re-implementing IWYU based on Clang tooling seems like it would help a lot with that kind of problem.

–paulr

chapuni · December 12, 2017, 3:14am

By the way, have you tried -fmodules ?

Takumi

Michael_Zolotukhin · December 12, 2017, 8:08pm

By the way, have you tried -fmodules ?

No, I haven’t. Do you have any specific experiment in mind that you wanted to try?

Thanks,
Michael

jyknight · December 12, 2017, 8:57pm

This is not a transform which is clearly "desirable in every way", because
it _increases_ coupling between the user of the class and the
implementation. The owner of the class can't add optional template
arguments, change a class into a typedef, change the namespace, or other
such refactorings. It also introduces the possibility of code which changes
behavior depending on whether the full or forward decl are available
(which, then, may be an ODR-violation).

Effectively the same reasons why the standard forbids users from
forward-declaring std:: names apply to doing so to user-defined names.

https://google.github.io/styleguide/cppguide.html#Forward_Declarations
lists some of the issues, and a recommendation not to do so.

Of course you do have the upside is that it can improve compile time. Which
is certainly of value, and perhaps that's a worthwhile trade-off when the
implementation and forward-declare are both within a single project and
thus easy to coordinate. But, it's not by any means a _pure_ win.

Michael_Zolotukhin · December 12, 2017, 9:38pm

That’s correct. I was speaking about the LLVM codebase though (I should’ve stated that clearer), and in LLVM I don’t remember many occasions of refactorings you mentioned. For LLVM forward declaration is recommended by the style guide:
http://llvm.org/docs/CodingStandards.html#minimal-list-of-includes

Thanks,
Michael

Zachary_Turner1 · December 13, 2017, 1:11am

I’m a little late to the party, but one observation that I haven’t seen mentioned is that simply removing #includes and testing that the program compiles is not guaranteed to be a correct transformation. Imagine, for example, that a header file provides an overload of a function that is a better match than one found elsewhere. It will compile either way, but without the #include, you will call a different function. Note that I’m not encouraging this kind of pattern, but the point is just to illustrate that this is not necessarily a correct transformation.

Another example is in the use of a macro definitions. Suppose a header defines certain macros, and other code uses ifdefs to test for the definition of those macros. If it is defined, one code path is taken, if it is not defined, another code path (which will compile but do the wrong thing) will be taken. Does the tool handle this?

Michael_Zolotukhin · December 13, 2017, 1:42am

Yes, that’s a valid concern, and that’s why I wanted to review all the changes before committing them (right now I’m going through all of them exactly to catch cases like you mention). It is actually happening sometimes, a good example is include of “config.h” - usually everything compiles even without it, but that’s not what we want.

My plan is to go through all the changes, remove undesired (like the ones you mentioned) and commit remaining changes in chunks (e.g. a folder at a time). Then people can do post-commit review, while the changes will get more exposure for testing. I hope it will go smoothly, but the time will show.

If there are any objections to my plan, I can put the patches for a pre-commit review first, though I consider them pretty safe.

Michael

Zachary_Turner1 · December 13, 2017, 1:47am

Can you just compare CRC32s of the binaries before and after your changes? And then just don’t commit unless they’re the same. You’ll have to do this with debug info disabled, as white space changes will affect debug info.

That said, I’m not lgtm’ing this, just saying that if the community does decide this is a good plan, let’s make sure generated code is identical

Michael_Zolotukhin · December 13, 2017, 1:53am

Can you just compare CRC32s of the binaries before and after your changes? And then just don’t commit unless they’re the same. You’ll have to do this with debug info disabled, as white space changes will affect debug info.

That’s a good idea, I’ll do that!

Thanks,
Michael

adrian.prantl · December 13, 2017, 7:35pm

By the way, have you tried -fmodules ?

No, I haven’t. Do you have any specific experiment in mind that you wanted to try?

When you configure clang with -DLLVM_ENABLE_MODULES=1 (like we do on the stage2 bots on green dragon) I suspect the win to be much smaller, since clang modules will eliminate most of the parsing overhead.

– adrian

preames · December 14, 2017, 8:04pm

As an aside, there is a standard idiom used in some code bases which might be applicable. For each header “Header.h” which contains a class MyClass, you introduce a header called “Header.FwdDecls.h” which contains forward decls for all of the classes declared in the header. There are also variants which declare “MyClass.FwdDecl.h”. Both of these schemes have the advantage of only putting the forward decl in one place while reducing the transitive include set greatly. I’ve seen environments where these forward decl headers are automatically generated by the build system. Philip

mehdi_amini · December 15, 2017, 6:58pm

Do you consider transitive includes as redundant? Why should we remove
these?
One drawback I see with removing these is that if I need `class Foo` but I
get it through `bar.h` which includes `foo.h`, it means that removing the
include to `foo.h` in `bar.h` would break the client of bar that relied on
this transitive includes.
That makes refactoring of a components harder since it may break some of
the clients of the components for no other reason than this transitive
include.
I suspect this is why IWYU would instead *add* these includes when they're
missing.

_sean_silva · December 15, 2017, 7:23pm

2017-12-09 12:54 GMT-08:00 Chris Lattner via llvm-dev <
llvm-dev@lists.llvm.org>:

Hi,

I tweaked my scripts to avoid removing includes when it doesn't give any
significant benefits, which made the patches significantly smaller. This
time the patches should not try to remove includes of header files, which
are transitively included from other included header files. The gains
mostly remained the same (plus/minus noise), the tables are in the end of
the email. I also included size of preprocessed files (measured in 1000
lines of code).

Just my opinion, but it is *good* to remove redundant header file
includes, even if it doesn’t speed up the build.
LLVM Coding Standards — LLVM 18.0.0git documentation

Do you consider transitive includes as redundant? Why should we remove
these?
One drawback I see with removing these is that if I need `class Foo` but I
get it through `bar.h` which includes `foo.h`, it means that removing the
include to `foo.h` in `bar.h` would break the client of bar that relied on
this transitive includes.
That makes refactoring of a components harder since it may break some of
the clients of the components for no other reason than this transitive
include.
I suspect this is why IWYU would instead *add* these includes when they're
missing.

+1, relying on transitive includes is fragile.

In our fork, a random #include of raw_ostream.h snuck into APInt.h at one
point. When I went to remove it (perhaps a few months after it was
introduced), there were already a handful of files that were relying on it
that were broken by my "remove debugging code that snuck in" change.

-- Sean Silva

Michael_Zolotukhin · December 15, 2017, 8:45pm

2017-12-09 12:54 GMT-08:00 Chris Lattner via llvm-dev <llvm-dev@lists.llvm.org>:

Hi,

I tweaked my scripts to avoid removing includes when it doesn’t give any significant benefits, which made the patches significantly smaller. This time the patches should not try to remove includes of header files, which are transitively included from other included header files. The gains mostly remained the same (plus/minus noise), the tables are in the end of the email. I also included size of preprocessed files (measured in 1000 lines of code).

Just my opinion, but it is good to remove redundant header file includes, even if it doesn’t speed up the build.
http://llvm.org/docs/CodingStandards.html#minimal-list-of-includes

Do you consider transitive includes as redundant? Why should we remove these?

Hi Mehdi,

I don’t consider them redundant, and in the patches I’ve recently committed there shouldn’t be such removals - if you spot any, please let me know. All your arguments (as well as mentioned earlier by other people) are completely valid.

Thanks,
Michael

Michael_Zolotukhin · December 15, 2017, 9:25pm

2017-12-09 12:54 GMT-08:00 Chris Lattner via llvm-dev <llvm-dev@lists.llvm.org>:

Hi,

I tweaked my scripts to avoid removing includes when it doesn’t give any significant benefits, which made the patches significantly smaller. This time the patches should not try to remove includes of header files, which are transitively included from other included header files. The gains mostly remained the same (plus/minus noise), the tables are in the end of the email. I also included size of preprocessed files (measured in 1000 lines of code).

Just my opinion, but it is good to remove redundant header file includes, even if it doesn’t speed up the build.
http://llvm.org/docs/CodingStandards.html#minimal-list-of-includes

Do you consider transitive includes as redundant? Why should we remove these?

Hi Mehdi,

I don’t consider them redundant, and in the patches I’ve recently committed there shouldn’t be such removals - if you spot any, please let me know.

For the record, they were committed in r320617-r320636 with a couple of follow-ups in r320645 and r320648. The clang part isn’t committed yet.

Michael

Topic		Replies	Views
llvm and clang are getting slower LLVM Dev List Archives	37	173	April 1, 2016
Who wants faster LLVM/Clang builds? LLVM Dev List Archives	1	85	December 7, 2017
too slow to build clang Clang Frontend	15	129	January 19, 2009
How do I improve time and optimization quality in LLVM projects? Clang Frontend riscv , llvm	1	185	July 15, 2024
llvm.org pre-built clang significantly slower than apple/xcode clang LLVM Dev List Archives	7	158	November 29, 2018

Who wants faster LLVM/Clang builds?

Related topics