Hi lto experts,
I am using bazel together with thinlto to build my program.
In my program, there is a.cc, b.cc and c.cc.
In the thin imports file of a.cc, there is b.cc.thinlto, c.cc.thinlto and a.cc.index.bc (.thinlto is the output of thincompile and .index.bc is the index file created by thinlink).
In two separate builds, b.cc.thinlto and c.cc.thinlto are the same. However, .index.bc has a slight difference. The number of blockcout changes. This causes the cache of a.cc.o (thinbackend) being invalidated even though there is no change to a.cc, b.cc and c.cc.
Two questions:
Why the blockcount could change between those two separate builds?
Could the behavior be fixed? So that the cache of a.cc.o could still be used if all its dependencies have no change.
BTW, besides a.cc, b.cc and c.cc, there are other source files in the program. Those files may have changed between those two builds. However, those files are not related to a.cc.
I don’t quite get why this patch would lead to non-determinism? That is it seems to count a number of blocks: either the count is broken and non-deterministic, or the number of blocks actually changes non-deterministically and the problem is only exposed by this patch.
Sorry for the a little bit misleading title.
The situation is that the change in an unrelated source will affect those unchanged source file, and cause the latter to be recompiled during thinbackend phase.
Think about this example: a.cc, b.cc and c.cc. a.cc depends on b.cc, but not on c.cc.
In c.cc, there is some change, and cause the number of total blocks (a.cc+b.cc+c.cc) to increase from 100 to 110. In the summary files of a.cc and b.cc, although a.cc and b.cc haven’t been changed, the total block count also changed from 100 to 110. And because bazel uses the summary file to calculate the hash value for cache, a.cc and b.cc also need to be re-processed in thinbackend.
I know the number of total blocks does take effect during thinbackend. However, this number doesn’t need to be accurate. If it could be rounded, e.g., from 110 to 100, then small changes like the example above won’t cause a.cc and b.cc to be re-processed, and the optimization in thinbackend shouldn’t be affected by a lot either.
From your description it seems than any non trivial change to a single source would force recompilation of the entire program / all the files in the thin backends? That seems … not good!
It isn’t clear to me if the patch author weighed this, at least I don’t see the tradeoff called out in the review?
I am looking right now. This is an unintended side effect of the change, which is used for only certain types of SamplePGO compiles with partial profiles. At the very least we should not emit this field in other case. Thinking about how else we could do this without this type of impact.
I am testing a patch that removes this except in the case of partial profiles, which is the only use, and which is a feature I believe is only in use at Google. I plan to send that for review today barring an unforeseen difficulty. I will also investigate a way to remove/replace it in the partial profile case as well, as it is definitely going to hurt incremental builds (my bad as I reviewed the patch that added it, and somehow missed the incremental build problem it introduces).
FYI mailed ⚙ D148746 [ThinLTO] Remove BlockCount for non partial sample profile builds for review yesterday to remove this for most ThinLTO builds. We’ll do some experimentation to figure out whether it can be completely removed or needs to be redesigned in a more incremental build friendly way for partial sample profiles.
@bo.yang.bd please see if this fixes your issues - which it should unless you are using partial sample profiles (for which we will need more time to investigate a fix).
That only adds the field to the index if it existed in the bitcode file, which after the earlier patch it should not (except for partial sample profiles where we still currently need it).
We use a distributed build system other than bazel, which has some issue with how the hash of artifacts is calculated. In short, the change of clang doesn’t trigger previous artifacts (generated by clang before disabling block count by default) to be re-generated. As a result, the file generated by thin compile still have the block count in it, and the combined index still accumulate them.
Is it possible to use the same flag in the previous change to disable the block count accumulation for the combined index?