[Clang] [lld] [llvm-link] Whole program / dead-code optimization

Hi All,

After the initial learning curve, we’re excited to have put together a completely gcc/binutils-free toolchain based on LLVM. Now that we have things working, we desperately need to optimize the resulting binaries. Our bin files are up to 10x the size of their fully optimized gcc equivalents (16k with LLVM vs 1.5k with gcc). This is for a bare-metal ARM-based system, so the difference is significant.

We’re using lld for linking and the following dead code elimination techniques seem to be dead ends:

  1. whole program optimization on our most egregious space waster (-fwhole-program not supported by clang)
  2. link time optimization (looks like this is only supported by lld for the COFF path not the ELF path)
  3. using a linker plugin like gold (-fuse-linker-plugin doesn’t seem to be supported by clang)

We have control over the whole codebase and could essentially compile all of our C/C++ code as a single file if there were a way to tell clang that it is seeing the whole program.

Any thoughts on how we could achieve this? This slide deck suggests using llvm-link to accomplish it: http://llvm.org/devmtg/2013-11/slides/Gao-LTO.pdf. Is this the most promising way forward?

Thanks,
Ed

Is there a reason why LLVM's link-time optimization won't work for you?

http://llvm.org/docs/GoldPlugin.html
http://llvm.org/docs/LinkTimeOptimization.html

Regards,
John Criswell

Well, the primary motivation to move to LLVM is licensing, which is why we
also ditched binutils: we can't package gcc for iOS due to the GPL.
So in the end the gold plugin wouldn't work for licensing reasons even if
we could get it to work technically. Thanks for the links, though. I'm still
trying to wrap my head around the problem and any info helps.

-Ed

The right future is a world where lld performs llvm lto for you.

Until then, the technique in Gao's PDF is what I would recommend.

Nick

Thanks Nick. I’ve been pursuing Gao’s technique but can’t seem to get opt to remove obviously dead code from even the following trivial example:

int mult(int a, int b) {
    return a * b;
}

int main(void) {
    return 0;
}

While mult is never called it still is not removed. I just can’t seem to get opt to understand it’s seeing the whole program so it can remove this globally accessible function. What am I missing? Seems related to the missing -fwhole-program flag in clang. Perhaps this is not even possible? If I can’t get any answers here I may repost that specific question since I didn’t list [opt] in the original question subject.

Thanks,

Ed

After digging a bit more it seems we can achieve the same as gcc’s -fwhole-program by simply marking the mult function as “static” which is all -fwhole-program does anyway. From https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html:

-fwhole-program

Assume that the current compilation unit represents the whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in effect are optimized more aggressively by interprocedural optimizers.

So we can accomplish that for now with a simple pass on the source. But that had me thinking: how do we accomplish the same for unused C++ classes or member functions within classes? I figured we could accomplish that by changing the linkage type within the LLVM IR, but it turns out these already get linkonce_odr linkage. http://llvm.org/docs/LangRef.html states “Unreferenced linkonce globals are allowed to be discarded”.

class num {
private:
    int number;

public:
    num(int n) : number(n) {}

    int mult(int other) {
        return number * other;
    }
};

int main(void) {
    return 0;
}

If I compile the above to LLVM IR, there is actually no trace of the num class, which kind of baffles me: what if I were compiling a library and this class was needed by the library consumer?

Either way with this knowledge I think we can get the results we’re looking for in the short term and will follow up if we find or come up with anything that could be generally useful to others.

Thanks,
Ed

ed@modk.it wrote:

    So we can accomplish that for now with a simple pass on the source.

The pass is called "-internalize". You can control its operation a bit with the -internalize-public-api-file=<filename> and -internalize-public-api-list=<list> flags.

    But that had me thinking, how do we accomplish the same for unused C++
    classes or member functions within classes. I figured we could
    accomplish that by changing the linkage type within the llvm IR. But it
    turns out these already get linkonce_odr linkage.
    http://llvm.org/docs/LangRef.html states "Unreferenced linkonce globals
    are allowed to be discarded"

The answer is still internalize. Don't include their names in the list of public APIs and they'll be switched to 'internal' linkage, then deleted by the LTO passes.
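
To make the two steps concrete, here is a toy model of the idea, not LLVM's actual implementation. The `Linkage`, `Symbol`, and `Module` types and the `helper` symbol are hypothetical names invented for illustration; only `main` and `mult` come from the example earlier in the thread.

```cpp
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are NOT LLVM's API.
enum class Linkage { External, Internal };

struct Symbol {
  Linkage linkage;
  std::vector<std::string> references;  // names this symbol uses
};

using Module = std::map<std::string, Symbol>;

// Step 1 (the "internalize" idea): any symbol not named in the public API
// list is switched to internal linkage.
void internalize(Module& m, const std::set<std::string>& public_api) {
  for (auto& entry : m) {
    if (public_api.count(entry.first) == 0) {
      entry.second.linkage = Linkage::Internal;
    }
  }
}

// Step 2 (the dead-code idea): keep everything reachable from a symbol that
// is still externally visible, and erase the rest.
void drop_unreachable(Module& m) {
  std::set<std::string> live;
  std::vector<std::string> worklist;
  for (const auto& entry : m) {
    if (entry.second.linkage == Linkage::External) {
      worklist.push_back(entry.first);
    }
  }
  while (!worklist.empty()) {
    const std::string name = worklist.back();
    worklist.pop_back();
    if (!live.insert(name).second) continue;  // already visited
    const auto found = m.find(name);
    if (found == m.end()) continue;  // defined elsewhere, nothing to walk
    for (const auto& ref : found->second.references) {
      worklist.push_back(ref);
    }
  }
  for (auto it = m.begin(); it != m.end();) {
    it = live.count(it->first) ? std::next(it) : m.erase(it);
  }
}

// Builds the thread's trivial example plus one hypothetical helper, runs
// both steps with "main" as the only public symbol, and returns what is left.
Module demo() {
  Module m;
  m["main"] = Symbol{Linkage::External, {"helper"}};
  m["helper"] = Symbol{Linkage::External, {}};
  m["mult"] = Symbol{Linkage::External, {}};
  internalize(m, {"main"});
  drop_unreachable(m);
  return m;
}
```

With only `main` in the public list, `mult` is internalized and then dropped, while `helper` survives because `main` still references it.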

    If I compile the above to LLVM IR there is actually no trace of the num
    class which kind of baffles me because what if I was compiling a library
    and this class was needed in the library consumer?

There is no representation of a class in LLVM or .o files. Instead, there's a plain struct which represents the typed memory (i.e., storage for the non-static data members only), plus a pile of functions which take a pointer to that memory as their first argument (the member functions taking their 'this' argument).
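
As a rough illustration of that description, the sketch below puts the class from the thread next to a hand-written equivalent. The names `num_storage`, `num_ctor`, `num_mult`, `via_class`, and `via_lowered` are made up for this example; a real compiler emits mangled names and its own struct type instead.

```cpp
// The class as written earlier in the thread.
class num {
 private:
  int number;

 public:
  num(int n) : number(n) {}
  int mult(int other) { return number * other; }
};

// Hand-written approximation of what is left after compilation
// (hypothetical names): storage for the data members only...
struct num_storage {
  int number;
};

// ...plus free functions that take the object's address as their first
// argument, playing the role of 'this'.
void num_ctor(num_storage* self, int n) { self->number = n; }
int num_mult(num_storage* self, int other) { return self->number * other; }

// Same computation through the class interface.
int via_class(int n, int other) { return num(n).mult(other); }

// Same computation through the struct-plus-functions form.
int via_lowered(int n, int other) {
  num_storage s;
  num_ctor(&s, n);
  return num_mult(&s, other);
}
```

Both paths compute the same value, which is the sense in which the class itself leaves nothing behind beyond a struct layout and some functions.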

Note that 'static' functions also just change the linkage to internal. So does putting code in an anonymous namespace. The main thing you get out of linker integration into LLVM LTO is that the linker can look at the pile of non-llvm code and determine which symbols are required by the rest of the system and compute that public API list for internalize.
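
A small sketch of the three spellings, with hypothetical function names. The first function keeps external linkage; the other two are internal to the translation unit, which is what allows the optimizer to drop them once nothing references them.

```cpp
// External linkage: other translation units may call this, so it has to be
// kept unless something (such as internalize) establishes that it is private.
int mult_external(int a, int b) { return a * b; }

// Internal linkage via 'static': private to this translation unit.
static int mult_static(int a, int b) { return a * b; }

// Internal linkage via an anonymous namespace: same effect on linkage.
namespace {
int mult_anon(int a, int b) { return a * b; }
}  // namespace

// Externally visible entry point that uses both internal functions.
int compute() { return mult_static(2, 3) + mult_anon(4, 5); }
```

If `compute` stopped calling `mult_static` or `mult_anon`, the compiler could discard them without seeing any other file, whereas `mult_external` needs the whole-program knowledge discussed above.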

Nick

Nick,

    The answer is still internalize. Don't include their names in the list of
    public APIs and they'll be switched to 'internal' linkage, then deleted by
    the LTO passes.

Thanks. Someone else suggested this, but I didn't realize it affected the
link step vs. the opt step, so I thought it wasn't working.
I'm looking at this now, but to be clear: is this known to work with
llvm-lld or just gnu-ld? Your previous answer of "the right future is a
world where lld performs llvm lto for you" had me thinking not to expect
any LTO from lld, which is why I've been focused on opt.

    There is no representation of a class in llvm or .o files. Instead,
    there's a plain struct which represents the typed memory (i.e., storage for
    the non-static data members only), plus a pile of functions which take a
    pointer to that memory as their first argument (the member functions taking
    their 'this' argument).

Got it. You do see a lot of class.xyz references in the LLVM assembly, so
it's clear what it represents in the C++ code. But I came to C++ from C, so
I always think of classes that way anyway ;)

    Note that 'static' functions also just change the linkage to internal. So
    does putting code in an anonymous namespace. The main thing you get out of
    linker integration into LLVM LTO is that the linker can look at the pile of
    non-llvm code and determine which symbols are required by the rest of the
    system and compute that public API list for internalize.

What I was surprised about was the effect "inline" has on linkage in C++
and the fact that defining a member function or constructor within the
class body makes it inline. While I don't understand the choice of the
name inline semantically, it makes sense that member functions defined
within the class body can be removed when not referenced locally, since
libraries would certainly provide a header with the declarations and keep
the definitions in separate C++ files. (How the compiler knew to discard
all traces of an unused class without knowledge of the whole program is
what baffled me earlier, but it makes sense now.)
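
A minimal sketch of that difference, assuming a hypothetical `add` member and `demo` function that are not in the thread's code. Members defined inside the class body are implicitly inline, while a member defined outside the class without the `inline` keyword is an ordinary external function.

```cpp
class num {
 public:
  // Defined in the class body: implicitly inline. A compiler such as clang
  // typically emits these only if this translation unit actually uses them.
  explicit num(int n) : number(n) {}
  int mult(int other) { return number * other; }

  // Declared here, defined out of line below (hypothetical member).
  int add(int other);

 private:
  int number;
};

// Out-of-line, non-inline definition: external linkage, so it is emitted
// into the object file even when nothing in this file calls it.
int num::add(int other) { return number + other; }

// Uses only the in-class (inline) members.
int demo() { return num(6).mult(7); }
```

This is the split a library would normally use: the header carries the class declaration, and definitions like `num::add` live in a separate source file where they are always emitted for consumers to link against.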

Thanks,
Ed