RFC: Integrating clang-cc functionality into clang (the driver)

Hello again,

All the major pieces of the clang / clang-cc integration project are
now in place, which means it's time to put them to work! :)

The next step is to eliminate the clang-cc binary. I am going to do
this pretty soon, but I wanted to give people a brief heads up in case
it is likely to conflict with some changes in flight. Please let me
know if you think this will disrupt you.

This is (the disruptive part of) what is going to happen:
1. clang will get the -cc1 mode, which will be equivalent to calling
clang-cc, and clang will recursively invoke itself (for now). Linking
clang will become very slow. :/
2. All the tests are going to be rewritten to use 'clang -cc1'
instead of 'clang-cc'.
3. clang-cc will be removed.

Here are some comments on what actually has been done, if you are interested:

Hi all,

I've been thinking lately about how we can push forward with our goal of
integrating the 'clang-cc' functionality into the 'clang' executable, so that we
have a single compiler binary. This will also unblock future work on clang APIs,
and hopefully make it easier to support new interesting uses of clang.

Here's my proposal:

--

Goals
--

1. Make it easier to build clang based tools (from an API perspective).

We still have a ways to go, but I think the new CompilerInstance class
at least helps with this.

2. Avoid unnecessary fork/exec of clang-cc.
a. Makes it easier to debug!
b. Makes driver / compiler interaction more obviously a private
implementation detail.

This will be an easy incremental change once the -cc1 mode is in, and
shouldn't be disruptive. You will no longer have to cut-and-paste the
clang-cc command line to debug compiler crashes, though, which might
make you happy. :)

Non-Goals
--

1. Add a general purpose mechanism for extending 'clang' (e.g., a plugin
model). This work will make that easier, however.

This is still a non-goal, but I did add plugin support to 'clang-cc',
for people interested in quickly writing a new ASTConsumer and
plugging it in. It doesn't work that well yet given the static
constructors in the backend (for example, pass manager complaining
about passes being registered multiple times).

Proposal (user level)
--

1. Driver gets a new option -cc1, which must be the leading argument (after any
-ccc arguments, but those are "internal" and not supposed to be used by users
anyway). This is a "mode": the remaining arguments will be processed "like"
clang-cc arguments. This is just for debuggability, and for use in -v or -###.

In practice, the arguments will be processed by hand or by reusing the driver
argument parsing functionality instead of using LLVM's command line library.

The driver argument parsing got reused, and TableGenified in the process.

2. 'clang' gets a new option -no-integrated-cc1 which would just execute
'clang' recursively passing the -cc1 argument. This is primarily for testing;
users shouldn't have a good reason to use it.

3. We'll take some steps to still be friendly if clang crashes (currently the
driver tries to at least print a canonical "error: clang-cc failed" type of
message).

This will be done once we drop fork/exec.
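
To make the "-cc1 must be the leading argument" rule from item 1 concrete, here is a minimal, self-contained sketch of how an entry point might detect the mode. It is not clang's actual code: the name `isCC1Mode` is hypothetical, and the sketch assumes that every internal driver option starts with `-ccc`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not clang's implementation. `args` excludes argv[0].
// Returns true when the first argument after any leading "-ccc..." options
// is exactly "-cc1".
bool isCC1Mode(const std::vector<std::string> &args) {
  std::size_t i = 0;
  // Assumption: every internal driver option starts with "-ccc".
  while (i < args.size() && args[i].compare(0, 4, "-ccc") == 0)
    ++i;
  return i < args.size() && args[i] == "-cc1";
}
```

With this rule, a `-cc1` that appears after an ordinary option such as `-fsyntax-only` does not switch modes; only the leading position counts.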

Proposal (implementation)
--

1. There will be a new class CompilerInstance (suggestions for a better name
welcome) which holds all of the state needed for running Clang. That is, this
will wrap the source manager, the file manager, the preprocessor factory, the
AST context, the AST consumer, and all that horrible stuff. This will probably
actually be constructed via a builder.

Done.

2. Internally there will be a CompilerInvocation object which maintains the
various bits of state that form a single invocation of clang-cc (include
paths, target options, triple, code generation options, etc.).

a. The CompilerInvocation object will have two important methods: the first
converts the invocation into a list of 'clang -cc1' arguments, and the second
"executes" the invocation and returns a CompilerInstance instance.

Done.

b. The Driver will get a new CompilerJob class which just wraps a
CompilerInvocation. The Driver's Clang tool implementation will be changed to
construct an instance of this object instead of constructing a list of
arguments. This job will take care of running the clang compiler in/out-of
process depending on -no-integrated-cc1, but otherwise is just an adaptor for
CompilerInvocation.

Not done.

c. There will be a method to turn a 'clang -cc1' argument list into a
CompilerInvocation object.

Done.
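
The round trip described in 2a and 2c (invocation to argument list and back) can be illustrated with a toy model. This is only a sketch: `ToyInvocation`, its three fields, and the method names `toArgs` and `fromArgs` are invented for illustration and are not the real CompilerInvocation interface, which tracks far more state.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the proposed CompilerInvocation (illustrative only).
struct ToyInvocation {
  std::string triple;
  std::vector<std::string> includePaths;
  bool freestanding = false;

  // Convert the invocation into a 'clang -cc1'-style argument list.
  std::vector<std::string> toArgs() const {
    std::vector<std::string> args;
    if (!triple.empty()) {
      args.push_back("-triple");
      args.push_back(triple);
    }
    for (const std::string &path : includePaths) {
      args.push_back("-I");
      args.push_back(path);
    }
    if (freestanding)
      args.push_back("-ffreestanding");
    return args;
  }

  // Rebuild an invocation from such a list. Returns false (leaving `out`
  // untouched) on an unknown option or an option that is missing its value.
  static bool fromArgs(const std::vector<std::string> &args,
                       ToyInvocation &out) {
    ToyInvocation parsed;
    for (std::size_t i = 0; i < args.size(); ++i) {
      const std::string &arg = args[i];
      if (arg == "-ffreestanding") {
        parsed.freestanding = true;
      } else if ((arg == "-triple" || arg == "-I") && i + 1 < args.size()) {
        const std::string &value = args[++i];
        if (arg == "-triple")
          parsed.triple = value;
        else
          parsed.includePaths.push_back(value);
      } else {
        return false;
      }
    }
    out = parsed;
    return true;
  }
};
```

The useful property is that `fromArgs(inv.toArgs(), copy)` reproduces `inv`, which is what lets the same object serve both the in-process path and the "print it for -v or -###" path.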

3. The Driver will get a new API for parsing a "gcc-like" argument list which
corresponds to a single "compile only" task (-fsyntax-only, -S, etc.), and
returns a CompilerJob. This API will return an error for argument vectors which
would do something more complicated, for example executing multiple
compilations or running the linker or assembler.

Not done, but this turned out to not really be necessary for the
functionality I wanted (in ASTUnit). It would still be nice from a
code cleanliness perspective.

4. Move "standard" tests to use 'clang -cc1' instead of 'clang-cc'.

That's what this email is about!
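
As a rough illustration of the check described in item 3, the sketch below accepts a "gcc-like" argument list only when it describes one compile-only task. Everything here is an assumption made for the example: the names `TaskKind` and `classifyCommand` are hypothetical, only `-fsyntax-only` and `-S` are treated as compile-only modes, `-o` is the only option modelled as taking a separate value, and any other argument that does not start with `-` is counted as an input file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class TaskKind { SingleCompile, Unsupported };

// Hypothetical classifier, not a real Driver API.
TaskKind classifyCommand(const std::vector<std::string> &args) {
  bool compileOnly = false;
  int inputs = 0;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string &arg = args[i];
    if (arg == "-fsyntax-only" || arg == "-S")
      compileOnly = true;
    else if (arg == "-o")
      ++i; // skip the output file name that follows -o
    else if (!arg.empty() && arg[0] != '-')
      ++inputs; // assumption: a bare argument is an input file
  }
  // Multiple inputs mean multiple compilations; no compile-only flag means
  // the command would go on to assemble or link.
  return (compileOnly && inputs == 1) ? TaskKind::SingleCompile
                                      : TaskKind::Unsupported;
}
```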

The Future of clang-cc
--

clang-cc is kind of a mess, so at least initially I'd rather just move the
driver and appropriate tests to using the 'clang' executable. Once that's done
we can reevaluate and see what the next step is. One option is to keep clang-cc
around as a dumping/play ground for tools or other features that don't fit into
the "compiler" model of functionality. Another option is to extend 'clang' to
support the main features of clang-cc we care about (i.e., the ones we test) and
move everything else into separate tools (which would probably only be
optionally built -- these would amount to examples).

I ended up deciding I didn't want to leave clang-cc as a wasteland,
and so it got refactored along the way. It is now largely just the
setup of a CompilerInstance object.

- Daniel

Hi,

So can I now invoke the driver directly (after forking, in case it
crashes), without the need for execve(), and have the ability to turn
all the clang-cc commandline flags on/off?
Is there a way now to get diagnostics directly, without the need to
redirect/parse the output?

Also will cl::ParseCommandLineOptions still work for LLVM commandline
options?

I am currently using some rather low-level switches for clang-cc, so I
might as well ask now whether these are going away in the future or not:
-ffreestanding -nostdinc -disable-free -fdiagnostics-show-option
-fmessage-length=80
-fcolor-diagnostics -triple clambc-generic-generic -include bytecode.h
-Wall -warn-dead-stores -warn-security-syntactic -analyzer-eagerly-assume
-v -g -E -S

Also, I experimented at some point with writing a simple editor
that uses clang for syntax highlighting/completion.
There were 2 issues:
- creating the Preprocessor object involved setting a lot of
language-related stuff, like implicitint, accesscontrol, bool support, and so on.
Is there a way to just tell it to create with the language defaults that
the clang driver would use? (and eventually tell it about
-std=c99, and it automatically sets up whatever clang sets up for c99).

- This isn't necessarily related to your driver work, but I didn't see
any support for reusing previous parse results, like reparsing only the
portion of the file/membuffer that changed. Can that somehow be
accomplished with the new driver infrastructure? (i.e. tell it that
you've previously compiled this file with the same driver, and now you
only want to reparse/rebuild the AST for the changed part).

Best regards,
--Edwin

Hi,

So can I now invoke the driver directly (after forking, in case it
crashes), without the need for execve(), and have the ability to turn
all the clang-cc commandline flags on/off?

Yes. Look at ASTUnit::LoadFromSource for example.
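
For readers unfamiliar with the "fork without execve()" pattern the question refers to, here is a generic POSIX sketch. It does not use any clang API: `runIsolated` is a hypothetical helper, and the callable passed to it stands in for whatever in-process compile the client wants to protect itself against.

```cpp
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (POSIX only). Runs `work` in a forked child without
// calling execve(). Returns the child's exit status on a normal exit, or -1
// if fork/waitpid failed or the child was killed by a signal (a crash).
int runIsolated(const std::function<int()> &work) {
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit(work()); // child: do the work, then exit without unwinding
  int status = 0;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent survives even if the work aborts, at the cost of having to ship any results back across the process boundary.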

Is there a way now to get diagnostics directly, without the need to
redirect/parse the output?

Yes.

Also will cl::ParseCommandLineOptions still work for LLVM commandline
options?

It works, but it isn't called by default. We could add an -mllvm for
clang-cc for this, although I'd rather not.

I am currently using some rather low-level switches for clang-cc, so I
might as well ask now whether these are going away in the future or not:
-ffreestanding -nostdinc -disable-free -fdiagnostics-show-option
-fmessage-length=80
-fcolor-diagnostics -triple clambc-generic-generic -include bytecode.h
-Wall -warn-dead-stores -warn-security-syntactic -analyzer-eagerly-assume
-v -g -E -S

No, although it would be better for you to use the driver to construct
a CompilerInvocation, and then tweak the resulting object. That way
you are insulated from changes to the clang / 'clang -cc1' /
CompilerInvocation API.

Also, I experimented at some point with writing a simple editor
that uses clang for syntax highlighting/completion.
There were 2 issues:
- creating the Preprocessor object involved setting a lot of
language-related stuff, like implicitint, accesscontrol, bool support, and so on.
Is there a way to just tell it to create with the language defaults that
the clang driver would use? (and eventually tell it about
-std=c99, and it automatically sets up whatever clang sets up for c99).

You can do this easily now via CompilerInvocation and
CompilerInstance. You can look at how the FrontendAction wrapper
implements this, for example.

- This isn't necessarily related to your driver work, but I didn't see
any support for reusing previous parse results, like reparsing only the
portion of the file/membuffer that changed. Can that somehow be
accomplished with the new driver infrastructure? (i.e. tell it that
you've previously compiled this file with the same driver, and now you
only want to reparse/rebuild the AST for the changed part).

This doesn't really have anything to do with the driver, and we don't
have the underlying feature support for this.

It's great to hear someone is working on this kind of stuff, please
consider packing your work up as an example we can include with clang!

- Daniel

Daniel Dunbar wrote:

This is (the disruptive part of) what is going to happen:
1. clang will get the -cc1 mode, which will be equivalent to calling
clang-cc, and clang will recursively invoke itself (for now). Linking
clang will become very slow. :/
2. All the tests are going to be rewritten to use 'clang -cc1'
instead of 'clang-cc'.
3. clang-cc will be removed.

This would be less disruptive if clang-cc became a symlink to clang, and being called as clang-cc would make it assume -cc1.
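
A symlink scheme like this usually comes down to inspecting the name the program was invoked under. The following is a small sketch of that check, not code from clang: `invokedAsClangCC` is a hypothetical helper, and it assumes POSIX-style `/` path separators.

```cpp
#include <string>

// Hypothetical helper: true if the final path component of argv[0] is
// exactly "clang-cc", in which case the caller would behave as if -cc1 had
// been passed. Assumes '/' is the path separator.
bool invokedAsClangCC(const std::string &argv0) {
  std::string::size_type slash = argv0.find_last_of('/');
  std::string name =
      (slash == std::string::npos) ? argv0 : argv0.substr(slash + 1);
  return name == "clang-cc";
}
```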

Sebastian

That is true, but I don't see why we would want to maintain clang-cc;
it's just extra gunk in the build system and extra complexity for the
user. I'll do it if outvoted, though.

- Daniel

I'd rather just kill clang-cc outright. It's much cleaner from the user's perspective to have a single "clang" that does everything.

Of course, those of us who *work* on Clang will have to retrain our fingers not to use clang-cc :)

  - Doug

I'd rather just kill clang-cc outright.

Me too.

Of course, those of us who *work* on Clang will have to retrain our fingers not to use clang-cc :)

My fingers are looking forward to that day.

  

Hi,

So can I now invoke the driver directly (after forking, in case it
crashes), without the need for execve(), and have the ability to turn
all the clang-cc commandline flags on/off?
    
Yes. Look at ASTUnit::LoadFromSource for example.

Is there a way now to get diagnostics directly, without the need to
redirect/parse the output?
    
Yes.
  
Ok.

Also will cl::ParseCommandLineOptions still work for LLVM commandline
options?
    
It works, but it isn't called by default. We could add an -mllvm for
clang-cc for this, although I'd rather not.
  
That's fine, I can call that in my code before calling the driver class.

I am currently using some rather low-level switches for clang-cc, so I
might as well ask now whether these are going away in the future or not:
-ffreestanding -nostdinc -disable-free -fdiagnostics-show-option
-fmessage-length=80
-fcolor-diagnostics -triple clambc-generic-generic -include bytecode.h
-Wall -warn-dead-stores -warn-security-syntactic -analyzer-eagerly-assume
-v -g -E -S
    
No, although it would be better for you to use the driver to construct
a CompilerInvocation, and then tweak the resulting object. That way
you are insulated from changes to the clang / 'clang -cc1' /
CompilerInvocation API.
  
Yes, that makes sense.

Also, I experimented at some point with writing a simple editor
that uses clang for syntax highlighting/completion.
There were 2 issues:
- creating the Preprocessor object involved setting a lot of
language-related stuff, like implicitint, accesscontrol, bool support, and so on.
Is there a way to just tell it to create with the language defaults that
the clang driver would use? (and eventually tell it about
-std=c99, and it automatically sets up whatever clang sets up for c99).
    
You can do this easily now via CompilerInvocation and
CompilerInstance. You can look at how the FrontendAction wrapper
implements this, for example.

- This isn't necessarily related to your driver work, but I didn't see
any support for reusing previous parse results, like reparsing only the
portion of the file/membuffer that changed. Can that somehow be
accomplished with the new driver infrastructure? (i.e. tell it that
you've previously compiled this file with the same driver, and now you
only want to reparse/rebuild the AST for the changed part).
    
This doesn't really have anything to do with the driver, and we don't
have the underlying feature support for this.

It's great to hear someone is working on this kind of stuff, please
consider packing your work up as an example we can include with clang!
  
Ok, I'll do that when I have some free time to update it to the new
clang API.
What license is acceptable for the examples? Is GPL2 good enough?

Best regards,
--Edwin