Undeterministic thin index file

bo.yang.bd · April 14, 2023, 8:36pm

Hi lto experts,
I am using bazel together with thinlto to build my program.

In my program, there is a.cc, b.cc and c.cc.
In the thin imports file of a.cc, there is b.cc.thinlto, c.cc.thinlto and a.cc.index.bc (.thinlto is the output of thincompile and .index.bc is the index file created by thinlink).
In two separate builds, b.cc.thinlto and c.cc.thinlto are the same. However, .index.bc has a slight difference. The number of blockcout changes. This causes the cache of a.cc.o (thinbackend) being invalidated even though there is no change to a.cc, b.cc and c.cc.

Two questions:

Why the blockcount could change between those two separate builds?
Could the behavior be fixed? So that the cache of a.cc.o could still be used if all its dependencies have no change.

bo.yang.bd · April 14, 2023, 8:50pm

BTW, besides a.cc, b.cc and c.cc, there are other source files in the program. Those files may have changed between those two builds. However, those files are not related to a.cc.

bo.yang.bd · April 14, 2023, 10:30pm

github.com/llvm/llvm-project

[ThinLTO] Compute the basic block count across modules.

committed 05:33PM - 28 May 20 UTC

hjyamauchi

+77 -7

Summary: Count the per-module number of basic blocks when the module summary is …computed and sum them up during Thin LTO indexing. This is used to estimate the working set size under the partial sample PGO. This is split off of D79831. Reviewers: davidxl, espindola Subscribers: emaste, inglorion, hiraditya, MaskRay, steven_wu, dexonsmith, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80403

Seems to be related to this change

bo.yang.bd · April 14, 2023, 10:57pm

Could this count be rounded? so that files that are not changed in adjacent two builds (with slight changes) could keep their cache valid.

bo.yang.bd · April 14, 2023, 10:58pm

e.g., 10001 and 10002 both rounded to 10000

mehdi_amini · April 18, 2023, 4:30pm

I don’t quite get why this patch would lead to non-determinism? That is it seems to count a number of blocks: either the count is broken and non-deterministic, or the number of blocks actually changes non-deterministically and the problem is only exposed by this patch.

bo.yang.bd · April 18, 2023, 6:37pm

Sorry for the a little bit misleading title.
The situation is that the change in an unrelated source will affect those unchanged source file, and cause the latter to be recompiled during thinbackend phase.

Think about this example: a.cc, b.cc and c.cc. a.cc depends on b.cc, but not on c.cc.
In c.cc, there is some change, and cause the number of total blocks (a.cc+b.cc+c.cc) to increase from 100 to 110. In the summary files of a.cc and b.cc, although a.cc and b.cc haven’t been changed, the total block count also changed from 100 to 110. And because bazel uses the summary file to calculate the hash value for cache, a.cc and b.cc also need to be re-processed in thinbackend.

I know the number of total blocks does take effect during thinbackend. However, this number doesn’t need to be accurate. If it could be rounded, e.g., from 110 to 100, then small changes like the example above won’t cause a.cc and b.cc to be re-processed, and the optimization in thinbackend shouldn’t be affected by a lot either.

mehdi_amini · April 18, 2023, 7:18pm

From your description it seems than any non trivial change to a single source would force recompilation of the entire program / all the files in the thin backends? That seems … not good!
It isn’t clear to me if the patch author weighed this, at least I don’t see the tradeoff called out in the review?

mehdi_amini · April 18, 2023, 7:20pm

@hjyamauchi , @davidxl, @teresajohnson ?

teresajohnson · April 18, 2023, 7:29pm

I am looking right now. This is an unintended side effect of the change, which is used for only certain types of SamplePGO compiles with partial profiles. At the very least we should not emit this field in other case. Thinking about how else we could do this without this type of impact.

teresajohnson · April 19, 2023, 5:00pm

I am testing a patch that removes this except in the case of partial profiles, which is the only use, and which is a feature I believe is only in use at Google. I plan to send that for review today barring an unforeseen difficulty. I will also investigate a way to remove/replace it in the partial profile case as well, as it is definitely going to hurt incremental builds (my bad as I reviewed the patch that added it, and somehow missed the incremental build problem it introduces).

teresajohnson · April 20, 2023, 1:43pm

FYI mailed ⚙ D148746 [ThinLTO] Remove BlockCount for non partial sample profile builds for review yesterday to remove this for most ThinLTO builds. We’ll do some experimentation to figure out whether it can be completely removed or needs to be redesigned in a more incremental build friendly way for partial sample profiles.

teresajohnson · April 20, 2023, 7:38pm

Committed at rG3adc6e03080c.

@bo.yang.bd please see if this fixes your issues - which it should unless you are using partial sample profiles (for which we will need more time to investigate a fix).

bo.yang.bd · April 24, 2023, 5:21pm

@teresajohnson that commit fixed our issue. Thanks for help!

bo.yang.bd · August 2, 2023, 5:55pm

@teresajohnson one question, https://github.com/llvm/llvm-project/blob/315939fd61cc346f699a2e4468c7880d48d4eb5f/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7557, does this also need to be disabled?

teresajohnson · August 2, 2023, 6:00pm

That only adds the field to the index if it existed in the bitcode file, which after the earlier patch it should not (except for partial sample profiles where we still currently need it).

bo.yang.bd · August 2, 2023, 10:17pm

We use a distributed build system other than bazel, which has some issue with how the hash of artifacts is calculated. In short, the change of clang doesn’t trigger previous artifacts (generated by clang before disabling block count by default) to be re-generated. As a result, the file generated by thin compile still have the block count in it, and the combined index still accumulate them.
Is it possible to use the same flag in the previous change to disable the block count accumulation for the combined index?

Topic		Replies	Views
How to use distributed thin lto？ IR & Optimizations lto , thinlto	13	1313	November 16, 2023
[ThinLTO] RFC: ThinLTO distributed backend interface LLVM Dev List Archives	4	130	May 6, 2016
getting nowhere with thinLTO LLVM Dev List Archives	1	137	November 9, 2017
exploring possibilities for unifying ThinLTO and FullLTO frontend + initial optimization pipeline LLVM Dev List Archives	14	350	April 30, 2018
Is it possible to output the optimizations performed by thinlto IR & Optimizations lto , thinlto	1	252	November 29, 2023

Undeterministic thin index file

Related topics