Clang as a compiler-only tool

Hi,

In FreeBSD we recently switched to clang as the default compiler. We have
imported the clang/llvm sources into FreeBSD and build them with our own build
system. We use clang only as a compiler; we don't need ARCMT, StaticAnalyzer,
Rewriter, etc.

Currently there's no simple way to build clang as a compiler-only tool.
ARCMT, StaticAnalyzer etc. are always compiled and linked in, and there is no
way to disable them. For FreeBSD this only costs build time and disk space.

I would like your opinion on the attached trivial patch, which
introduces #ifndef CLANG_IS_COMPILER_ONLY in two places. This lets me
avoid compiling and linking:

ASTMatchers, StaticAnalyzer, Edit, Rewrite, ARCMigrate and Serialization

This results in a significant reduction in clang's compile time and shrinks
the clang binary by ~2MB.

I intend this to be used in the FreeBSD build system, but it could easily be
added to the autotools/cmake builds as well if people find it useful.

Do you think this is useful? Should it be done like this or in some other way?
Is the patch ok to be committed?

Thanks, Roman

clang-as-a-compiler-only.patch (1.33 KB)

--- lib/FrontendTool/ExecuteCompilerInvocation.cpp (revision 168286)
+++ lib/FrontendTool/ExecuteCompilerInvocation.cpp (working copy)
@@ -79,11 +81,13 @@
     return new PrintPreprocessedAction();
   }

+#ifndef CLANG_IS_COMPILER_ONLY
   case RewriteMacros: return new RewriteMacrosAction();
   case RewriteObjC: return new RewriteObjCAction();
   case RewriteTest: return new RewriteTestAction();
   case RunAnalysis: return new ento::AnalysisAction();
   case MigrateSource: return new arcmt::MigrateSourceAction();
+#endif
   case RunPreprocessorOnly: return new PreprocessOnlyAction();
   }
   llvm_unreachable("Invalid program action!");

Won't this cause Clang to crash if the user asks for an action that is
not compiled-in?

Dmitri

It will execute llvm_unreachable("Invalid program action!"); which is fine
for me. FreeBSD really needs just the compiler bits.

Note that I am not advocating making this the default, just having a way
to make this possible without modifying the sources.

Roman

Well, I am just suggesting that Clang crashing on a command-line
parameter that invokes an action disabled at compile time is
not the user experience we want to provide for any Clang
configuration. A proper error message would be much better.

Dmitri

+1. Even though the FreeBSD build only needs the compiler bits, users
may want more. They should be treated to a useful error message
("feature not available. Remove CLANG_IS_COMPILER_ONLY.") or the like.

Agreed. If this is user facing, there should be a proper, explanatory diagnostic message. Executing llvm_unreachable is a bug, not a feature!

Completely agree. It would, however, be nice if the ARCMT and similar tools could be made into plugins (or, now that the tooling infrastructure is in place, even into separate tools). The Apple people said that this would happen after the 3.0 release - is anyone planning on making it happen soon?

David

Ok, I made it emit an error like this:

witten ~/llvm$ ./Release+Asserts/bin/clang -cc1 -analyze -analyzer-checker=alpha.cplusplus.VirtualCall -analyzer-store region tools/clang/test/Analysis/virtualcall.cpp
error: action RunAnalysis not compiled in

This happens to uncover a small bug, though. We emit diagnostics in
ExecuteCompilerInvocation.cpp, which -verify then complains about:

witten ~/llvm$ ./Release+Asserts/bin/clang -cc1 -fsyntax-only -load /tmp/bah -plugin foobar -verify ~/hello.c
error: 'error' diagnostics seen but not expected:
  (frontend): unable to load plugin '/tmp/bah': '/tmp/bah: cannot open shared object file: No such file or directory'

but that's a separate issue, just fwiw :)

With the patch like that (#ifndef + emitting diagnostic), is it ok?

Roman

clang-as-a-compiler-only.patch (2.74 KB)

I don’t personally think this diagnostic is clear enough. What is a RunAnalysis? Clang is known for its good, readable diagnostics - I’d prefer something more human-readable.

error: Using functionality that has been compiled out of Clang
note: Recompile without -DXXXX to enable this functionality
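
A minimal, self-contained sketch of the behaviour being asked for here: when an action has been compiled out, hand back a readable error plus a note instead of falling through to llvm_unreachable. Everything in it is an assumption made for illustration. FrontendAction, ActionResult and createAction are hypothetical stand-ins rather than clang's real classes, and the message wording is only a suggestion.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for clang's frontend actions (not the real classes).
struct FrontendAction {
  virtual ~FrontendAction() = default;
  virtual std::string name() const = 0;
};

struct PreprocessOnlyAction : FrontendAction {
  std::string name() const override { return "preprocess-only"; }
};

#ifndef CLANG_IS_COMPILER_ONLY
struct AnalysisAction : FrontendAction {
  std::string name() const override { return "static-analysis"; }
};
#endif

enum class ProgramAction { RunPreprocessorOnly, RunAnalysis };

// Either an action, or a human-readable diagnostic explaining why there is none.
struct ActionResult {
  std::unique_ptr<FrontendAction> action;
  std::string diagnostic;
};

ActionResult createAction(ProgramAction kind) {
  switch (kind) {
  case ProgramAction::RunPreprocessorOnly:
    return {std::make_unique<PreprocessOnlyAction>(), ""};
  case ProgramAction::RunAnalysis:
#ifndef CLANG_IS_COMPILER_ONLY
    return {std::make_unique<AnalysisAction>(), ""};
#else
    // Compiled out: report it instead of crashing (wording is illustrative).
    return {nullptr,
            "error: the static analyzer has been compiled out of this build\n"
            "note: rebuild without -DCLANG_IS_COMPILER_ONLY to enable it\n"};
#endif
  }
  // Only reached for a value outside the enum, which is a genuine bug.
  assert(false && "Invalid program action!");
  return {nullptr, "error: invalid program action\n"};
}
```

A caller would print `diagnostic` and exit with a non-zero status whenever `action` is null, so the unreachable path stays reserved for real programming errors.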

Much better now!

How will you pass this #define to the source file? A special
./configure argument or just CXXFLAGS? And what about the change to
exclude unneeded directories from compiling/linking? And, of course,
will you be able to run regression tests with such a compiler? (How
to exclude tests for functionality that was not built?)

Dmitri

As I said, I intend this to be used in the FreeBSD build system; I don't plan
to use it with configure/cmake at all, so I don't have an opinion
on that.

I don't know how to exclude such tests, nor even how to detect that something
was not compiled in. I just see value in having a clang-lite for FreeBSD.

That's why I asked on cfe-dev@ :) Is there value in adding a
./configure option that doesn't compile/link StaticAnalyzer/ARCMT etc.?

Roman

> As I said, I intend this to be used in the FreeBSD build system; I don't plan
> to use it with configure/cmake at all, so I don't have an opinion
> on that.

So the FreeBSD build system will have completely custom Makefiles for LLVM
and Clang?

> I don't know how to exclude such tests, nor even how to detect that something
> was not compiled in. I just see value in having a clang-lite for FreeBSD.

Why don't you see value in having tests for the FreeBSD system
compiler? I think that running tests for such an important piece of
system software is extremely important.

> That's why I asked on cfe-dev@ :) Is there value in adding a
> ./configure option that doesn't compile/link StaticAnalyzer/ARCMT etc.?

It might be useful for someone else only if it is *useable*: so that
one can actually select this mode with configure or cmake, and run
regression tests afterwards.

Otherwise, if you are not planning to use Clang's ./configure, I don't
think it makes sense for you to waste your time on implementing it.
This option will have zero users. (If the FreeBSD build system used it,
that would count as lots of users IMHO.)

Dmitri
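
One way the "how to even detect it was not compiled in" question could be approached is sketched below: the build reports which optional components it contains, and a test runner skips any test that requires a missing one. This is only a sketch under stated assumptions. The feature names and the functions compiledInFeatures, hasFeature and shouldRunTest are hypothetical and are not part of clang or its test suite.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical list of components present in this build; names are illustrative.
std::vector<std::string> compiledInFeatures() {
  std::vector<std::string> features;
  features.push_back("compiler");
#ifndef CLANG_IS_COMPILER_ONLY
  features.push_back("static-analyzer");
  features.push_back("arcmt");
  features.push_back("rewriter");
#endif
  return features;
}

// True if the named component was built into this binary.
bool hasFeature(const std::string &name) {
  assert(!name.empty() && "feature names must be non-empty");
  const std::vector<std::string> features = compiledInFeatures();
  return std::find(features.begin(), features.end(), name) != features.end();
}

// A test is runnable only if every feature it requires was built.
bool shouldRunTest(const std::vector<std::string> &required) {
  return std::all_of(required.begin(), required.end(), hasFeature);
}
```

With something along these lines, an analyzer test would declare that it requires "static-analyzer" and be reported as unsupported, rather than failed, on a compiler-only build.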

I'd appreciate such option(s) if it were possible to keep StaticAnalyzer and Rewriter
and remove everything else from your list.

I believe the purpose of this is to build a base system, which also includes a full version of clang. In that case nobody would care about the error message or "unreachable" or whatever, because it's not going to be the final compiler.

The FreeBSD compiler doesn't change often, so once it's validated as a system compiler it doesn't need to run self-tests each time someone builds the world.

On my FreeBSD box, the majority of time spent in "buildworld" is spent building clang. I don't know if it's built twice or not, so I'm not sure if I'm right here. If my guesses are correct, having a cut-down version of clang to build the base system would save some time.

-Krzysztof

> Agreed. If this is user facing, there should be a proper, explanatory diagnostic message. Executing llvm_unreachable is a bug, not a feature!
>
> Completely agree. It would, however, be nice if the ARCMT and similar tools could be made into plugins (or, now that the tooling infrastructure is in place, even into separate tools).

This is, of course, the right way to implement this functionality: the static analyzer and ARCMT should be plugins.

> The Apple people said that this would happen after the 3.0 release - is anyone planning on making it happen soon?

I don't know of any plans to work on it.

  - Doug

My team looked into making the static analyzer a plug-in. One obstacle we found was that enabling plug-ins increases the start-up time of clang. On the Mac at least, we use export maps to restrict the set of symbols that are exported from the clang executable. Enabling plug-ins means we must export far more symbols from the executable, which would have a direct impact on the start-up time of clang. I do not remember the exact numbers, but this regression in start-up time would be unacceptable.

One possible direction is to create a clang service. There has been some discussion before of doing this. A clang service could amortize the cost of starting up over many separate compilations, and possibly provide a better way to enable plug-ins anyway.

> My team looked into making the static analyzer a plug-in. One obstacle we found was that enabling plug-ins increases the start-up time of clang. On the Mac at least, we use export maps to restrict the set of symbols that are exported from the clang executable. Enabling plug-ins means we must export far more symbols from the executable, which would have a direct impact on the start-up time of clang. I do not remember the exact numbers, but this regression in start-up time would be unacceptable.
>
> One possible direction is to create a clang service. There has been some discussion before of doing this. A clang service could amortize the cost of starting up over many separate compilations, and possibly provide a better way to enable plug-ins anyway.

Yes, unfortunately the way plugins are currently implemented (giving
them access to all of the clang binary) is very expensive. This is
even more noticeable when using LTO on ELF systems: when plugins are
enabled (linking with -export-dynamic), LTO cannot internalize much.

Another possibility that might work is to build a shared library that
exports only the symbols that plugins can use (just lib/AST maybe?)
and have both clang and plugins link with it.

In any case, these are all long-term solutions, so I think something
along the lines Roman is proposing is a reasonable compromise for
now.

Cheers,
Rafael
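
The idea of exporting only the symbols plugins may use can be illustrated with GCC/Clang symbol visibility attributes. The sketch below is a toy under stated assumptions: the macro names PLUGIN_API and PLUGIN_INTERNAL and both functions are hypothetical, and nothing here reflects how clang itself is built.

```cpp
#include <cassert>
#include <cstddef>

// Visibility annotations understood by GCC and Clang; no-ops elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define PLUGIN_API __attribute__((visibility("default")))
#define PLUGIN_INTERNAL __attribute__((visibility("hidden")))
#else
#define PLUGIN_API
#define PLUGIN_INTERNAL
#endif

// Internal helper: kept out of the dynamic symbol table, so the linker and
// LTO remain free to inline or discard it.
PLUGIN_INTERNAL std::size_t countChar(const char *text, char wanted) {
  std::size_t count = 0;
  for (; *text != '\0'; ++text)
    if (*text == wanted)
      ++count;
  return count;
}

// Exported entry point: the only symbol a plugin is meant to call.
extern "C" PLUGIN_API std::size_t plugin_count_statements(const char *source) {
  assert(source != nullptr);
  return countChar(source, ';');
}
```

Building a shared library with -fvisibility=hidden makes hidden the default, so only the explicitly annotated entry points end up in the dynamic symbol table. How large that exported set would have to be for real plugins is exactly the open question in this thread.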

I am doubtful that this would be any better. Plug-ins will likely want to use almost every API that Clang has. That means that the shared library would need to export a ton of symbols, putting a lot of work on the dynamic linker.

To use LLVM plugins I have to do a build with shared libraries; otherwise you get a problem where two handlers for command-line options get registered (or, rather, the same one twice). I tried doing builds of LLVM and FreeBSD using the shared and statically linked versions of clang and found that there was a 5-10% slowdown for the shared version on x86-64, which didn't seem too bad. Not ideal, but a smaller change than going from -O2 to -O3 and a much smaller slowdown than enabling LTO.

The problem with only exporting the symbols that you think a plugin would use is that it's very hard to determine exactly what those would be. For example, it sounds like a lot of people want to write plugins that modify IR generation, so they'd need access to all of the clang IRGen and a load of the LLVM stuff as well.

David

> Yes, unfortunately the way plugins are currently implemented (giving
> them access to all of the clang binary) is very expensive. This is
> even more noticeable when using LTO on ELF systems: when plugins are
> enabled (linking with -export-dynamic), LTO cannot internalize much.

To put some numbers, I bootstrapped clang at 168705 in Release mode
(no asserts) on linux x86_64:

plain bootstrap:
  bin/clang is 32572488 bytes

bootstrap with CLANG_IS_PRODUCTION:
  bin/clang is 30528584 bytes

bootstrap with LTO:
  bin/clang is 36843384 bytes!! (Looks like the inliner needs some lto logic)

bootstrap with LTO and CLANG_IS_PRODUCTION:
  bin/clang is 32055896 bytes.

I can benchmark startup if anyone has a suggestion on how to do it.

Cheers,
Rafael
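
One rough way to compare start-up cost between two builds is to time many runs of a trivial invocation and take the mean. The sketch below does that with std::system. The function name meanRunSeconds is hypothetical, and the measurement includes the cost of spawning a shell for each run, so it is only meaningful as a relative comparison between binaries measured the same way.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <string>

// Mean wall-clock seconds per run of `command`, or a negative value if any
// run exits with a non-zero status. Shell start-up overhead is included.
double meanRunSeconds(const std::string &command, int runs) {
  assert(runs > 0);
  using Clock = std::chrono::steady_clock;
  const Clock::time_point start = Clock::now();
  for (int i = 0; i < runs; ++i) {
    if (std::system(command.c_str()) != 0)
      return -1.0;
  }
  const std::chrono::duration<double> elapsed = Clock::now() - start;
  return elapsed.count() / runs;
}
```

For example, calling it with "bin/clang --version > /dev/null" for each of the four bootstrapped binaries, with a few hundred runs each, would give comparable figures; subtracting the result for a no-op command such as "true" removes most of the shell overhead.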