Where does LTO remove unused functions?

Hi!

LLVM newbie here; I have mainly worked on the frontend so far. We had a small hackathon project idea: piggyback on LTO to detect dead code (unused functions). The basic idea is to compile the code for every target and dump the functions that LTO removes. Intersecting the removed function symbol names across all targets should yield the functions that are safe to remove from the source code (unless some configuration was not compiled). Is this reasonable?
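The intersection step could look roughly like the sketch below. It assumes the removed symbol names for each target have already been collected somehow; the function name removedInEveryTarget and the input layout are hypothetical, not part of LLVM.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: each element of removedPerTarget holds the symbol
// names that LTO dropped when linking one target. The result contains only
// the symbols that were dropped for every target.
std::set<std::string>
removedInEveryTarget(const std::vector<std::set<std::string>> &removedPerTarget) {
  if (removedPerTarget.empty())
    return {};

  std::set<std::string> common = removedPerTarget.front();
  for (std::size_t i = 1; i < removedPerTarget.size(); ++i) {
    std::set<std::string> next;
    // std::set iterates in sorted order, which set_intersection requires.
    std::set_intersection(common.begin(), common.end(),
                          removedPerTarget[i].begin(), removedPerTarget[i].end(),
                          std::inserter(next, next.begin()));
    common = std::move(next);
  }
  return common;
}
```

A symbol that is live in even one target drops out of the result, which is the conservative behavior wanted here. Configurations that were never compiled are still invisible to this check.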

I started playing around with some toy examples and got stuck at the very beginning: I could not figure out where the unused functions actually get removed.

Here is what I did:
tu1.cpp:
int unused(int a);
int probably_inlined(int a);
int main(int argc, const char *argv[]) {
  return probably_inlined(argc);
}

tu2.cpp:
int unused(int a) {
  return a + 1;
}
int probably_inlined(int a) {
  return a + 2;
}

I produced two object files with bitcode:
clang -c -flto tu1.cpp -o tu1.o
clang -c -flto tu2.cpp -o tu2.o

Then I ran LTO and attempted to dump the IR before each pass:
clang -O2 -Wl,-mllvm -Wl,-print-before-all tu1.o tu2.o -o optimized

In my dumps, the function unused was already gone before the first pass. Where did that happen? I tried invoking llvm-link manually, and that did not remove unused.

I also tried dumping optimization remarks, and there was no trace of a function being removed (I only saw a function being inlined).

Thanks in advance,
Gabor

By default, even regular LTO now has module summaries (like the kind used for ThinLTO). LTO then runs index-based dead symbol analysis here: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#923. Then, when linkRegularLTO is called here: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#935, it indicates that the index should be consulted for liveness, and that routine does not even add the dead symbols to the Keep set. So they never make it into the combined module.
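To illustrate the idea only, here is a toy model of index-based liveness, not the actual LLVM code. ToyIndex, computeLive, and deadSymbols are hypothetical names, and the "index" is just a map from each symbol to the symbols it references.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a combined summary index: symbol -> symbols it references.
using ToyIndex = std::map<std::string, std::vector<std::string>>;

// Mark every symbol reachable from the roots (e.g. main, exported symbols)
// as live, using a simple worklist traversal.
std::set<std::string> computeLive(const ToyIndex &index,
                                  const std::vector<std::string> &roots) {
  std::set<std::string> live;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string sym = worklist.back();
    worklist.pop_back();
    if (!live.insert(sym).second)
      continue; // already visited
    auto it = index.find(sym);
    if (it == index.end())
      continue; // no summary for this symbol
    for (const std::string &ref : it->second)
      worklist.push_back(ref);
  }
  return live;
}

// Symbols that have a summary but are not live. In this toy model these are
// the ones that would never be added to the keep set.
std::vector<std::string> deadSymbols(const ToyIndex &index,
                                     const std::set<std::string> &live) {
  std::vector<std::string> dead;
  for (const auto &entry : index)
    if (live.count(entry.first) == 0)
      dead.push_back(entry.first);
  return dead;
}
```

With the toy example from the first post (main references probably_inlined, nothing references unused) and main as the only root, unused is the only symbol that ends up dead. That matches it being absent before the first pass runs.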

Teresa

Thanks!

I looked into this, and printing the list of functions omitted from the combined module is pretty straightforward. My only problem is that I am not sure what the most idiomatic way to surface this information to the user is. I initially wanted to add an optimization remark, but optimization remarks are set up later, in the LTO backend (just before running opt).

Do you think it is ok to set the remarks up earlier or should I plumb the list of removed symbols and emit them later? Or is there another way to surface this information?

Thanks,
Gabor

+Peter for thoughts.

That’s a good question. A few possibilities come to mind. You could add a custom internal option to LTO.cpp to emit the list of dead functions. If you do want to use the opt remarks infrastructure, which isn’t a bad idea, you will need to move the setup earlier, as you noted. However, note that this is only possible for regular LTO modules, not for ThinLTO modules, since we don’t have those until later. What I would suggest in this case is to move the invocation of linkRegularLTO for modules with summaries into runRegularLTO and do the setup there, to minimize the handling along both LTO type paths. It looks like this should be doable.

Thanks, this approach did work! I created https://reviews.llvm.org/D73597 in case this is something we want to have upstream :)

You can easily find unused functions using static analysis tools; CodeQL comes to mind:

https://help.semmle.com/QL/learn-ql/cpp/function-classes.html

import cpp

from Function f
where not exists(FunctionCall fc | fc.getTarget() = f)
select f, "This function is never called."