Integrating "-distribute" into clang's Driver

Hi cfe-dev,

I've added an option to -cc1: -distribute. The option takes as input a
single source file, and produces object code, by distributing the
source to slaves. I'm now trying to make '-distribute' a
non-cc1-option as well, so that a user can use -distribute in their
CFLAGS to get projects to build in a distributed manner without much
hassle. I have several questions regarding this:

1. Since I'm skipping the assembler (I'm doing assembly on the slaves)
but still going on to the linker, I'm confused about how to integrate
the -distribute option into the Action pipeline in Driver.cpp. What's
the best way to do this? I'd also like to handle a user typing
"clang -distribute -E myFile.c" sensibly, by not passing -distribute
to -cc1 when no object code is required.

2. Since I'm skipping the assembler, I need to know where to save the
object code to on disk. Is there an easy way to get clang to pass -cc1
the expected location of the object file, so that the linker will be
able to find the object file?

3. Is there any way to invoke (or does clang already invoke) multiple
-cc1s in parallel where possible? If not, would this be easy to add?
When called with -distribute, clang will just connect via a UNIX socket
to another process, send over the source+args, and receive the diags.
The object file will be written out to disk by the process at the
other end of the socket, so I'm not worried about thread safety at
all.

Thanks,
Mike

> 1. Since I'm skipping the assembler (I'm doing assembly on slaves), but
> still going on to the linker, I'm confused about how to integrate the
> -distribute option into the Action pipeline in Driver.cpp. What's the
> best way to do this? I'd like to be able to smartly handle a user
> typing "clang -distribute -E myFile.c" by not invoking -distribute in
> -cc1 if no object code is required.

The -integrated-as option is pretty similar to what you need; try
taking a look at how that is implemented. As for -E, you can check
for it explicitly in Clang::ConstructJob in lib/Driver/Tools.cpp.

> 2. Since I'm skipping the assembler, I need to know where to save the
> object code to on disk. Is there an easy way to get clang to pass -cc1
> the expected location of the object file, so that the linker will be
> able to find the object file?

See above.

> 3. Is there any way to invoke (or does clang already invoke) multiple
> -cc1s in parallel where possible? If not, would this be easy to add?
> When called with -distribute, clang will just connect via a UNIX socket
> to another process, send over the source+args, and receive the diags.
> The object file will be written out to disk by the process at the
> other end of the socket, so I'm not worried about thread safety at
> all.

It probably wouldn't be that difficult to implement; the driver
already invokes separate -cc1 instances when it is passed multiple
files. That said, the majority of popular build systems don't call the
compiler in this way, so it isn't a very high priority.

-Eli

Thanks for the quick reply!

It looks like -integrated-as is handled in SelectToolForJob, where the
default assembler is overridden with the compiler's. This is slightly
different from what I want to do: it replaces a stage in the pipeline,
whereas I want to remove one and bridge the gap between the two
bordering stages.

Right now, my plan is to check in Driver::BuildActions whether the args
contain the option OPT_distribute and, if they do, leave out the
assembly stage. The -E case I mentioned was just one example of a
larger class of problems (e.g. what if the user passes -emit-llvm and
-distribute?), so I'm concerned about handling these.
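
To make the plan concrete, here is a minimal standalone sketch of that
phase selection. It does not use clang's real Driver classes: Phase,
DriverFlags, shouldDistribute and planPhases are hypothetical names
made up for illustration. It also assumes that at most one of -E, -S
and -c is given, and that -emit-llvm always stops after the compile
step.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the driver's pipeline stages; these are not
// clang's real types.
enum class Phase { Preprocess, Compile, Assemble, Link };

// Hypothetical summary of the flags the user passed to the driver.
struct DriverFlags {
  bool Distribute = false;     // -distribute
  bool PreprocessOnly = false; // -E
  bool EmitLLVM = false;       // -emit-llvm
  bool AssemblyOnly = false;   // -S
  bool CompileOnly = false;    // -c
};

// -distribute is only honoured when the compile step is expected to end in
// native object code; otherwise the work is done locally.
bool shouldDistribute(const DriverFlags &F) {
  return F.Distribute && !F.PreprocessOnly && !F.EmitLLVM && !F.AssemblyOnly;
}

// Build the list of phases, leaving out the assembly stage when the compile
// step already produces object code remotely.
std::vector<Phase> planPhases(const DriverFlags &F) {
  // Assumption of this sketch: the user gave at most one "stop here" flag.
  int StopFlags = F.PreprocessOnly + F.AssemblyOnly + F.CompileOnly;
  assert(StopFlags <= 1 && "sketch assumes at most one of -E, -S, -c");
  (void)StopFlags;

  std::vector<Phase> Phases{Phase::Preprocess};
  if (F.PreprocessOnly)
    return Phases;

  Phases.push_back(Phase::Compile);
  if (F.AssemblyOnly || F.EmitLLVM)
    return Phases;

  if (!shouldDistribute(F))
    Phases.push_back(Phase::Assemble);
  if (!F.CompileOnly)
    Phases.push_back(Phase::Link);
  return Phases;
}
```

With only Distribute set, planPhases returns Preprocess, Compile, Link.
With Distribute and PreprocessOnly set, it returns just Preprocess and
shouldDistribute is false, so -distribute would not be forwarded.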

I'm still unsure about how to bridge the gap between the -cc1
invocation and the linker though. Any ideas on how I'd do that? Also,
any ideas on how I would pass -distribute down to -cc1?

Thanks!
Mike

Okay, did a little more digging, and I think I'm making the situation
more complicated than it is :).

If I leave out the assembly stage, clang produces a -cc1 command that
outputs (in my example) to cc-IpYybF.s, and a linker command that
takes cc-IpYybF.s as input. So it looks like getting the filename into
-cc1 won't be an issue (even if the file extension is a little
misleading)!

As for passing the arguments on, in Clang::ConstructJob, I simply
check if Args contains "-distribute", and if it does, I push
"-distribute" onto CmdArgs.

There is one small snag I encountered (and worked around, in an ugly
way). Because clang expects assembly output, it passes "-S" to the
-cc1 invocation, which overrides the "-distribute" option. To work
around this, I check whether "-distribute" and "-S" are both present.
If they are, I drop the "-S".
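
The workaround described above can be sketched roughly as follows.
This is a standalone illustration that models argument lists as plain
vectors of strings; ArgList, hasArg and forwardDistribute are
hypothetical names, not clang's real ArgList API or the actual patch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an argument list; clang's real ArgList is richer.
using ArgList = std::vector<std::string>;

// True if Flag appears anywhere in Args.
bool hasArg(const ArgList &Args, const std::string &Flag) {
  return std::find(Args.begin(), Args.end(), Flag) != Args.end();
}

// Forward -distribute from the driver arguments to the cc1 arguments and drop
// any -S, since -S would otherwise override -distribute in cc1.
ArgList forwardDistribute(const ArgList &DriverArgs, ArgList CmdArgs) {
  if (!hasArg(DriverArgs, "-distribute"))
    return CmdArgs; // Nothing to do for a normal local compile.

  CmdArgs.erase(
      std::remove(CmdArgs.begin(), CmdArgs.end(), std::string("-S")),
      CmdArgs.end());
  CmdArgs.push_back("-distribute");

  assert(!hasArg(CmdArgs, "-S") && "-S should have been dropped");
  return CmdArgs;
}
```

Note that forwardDistribute only sees the final cc1 argument list, so
it cannot tell whether "-S" was added by the driver or requested by the
user.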

This is unfortunate, because if a user passes both -distribute and -S,
the compile will not be done locally as they asked. Instead, an object
file is written to the path meant for the assembly output. Any ideas on
elegant workarounds?

Thanks,
Mike

Here's what I was thinking: your "clang -cc1 -distribute" mode is
roughly equivalent to "clang -cc1 -emit-obj" in the sense that it
takes a C source code file and outputs an object file, right?
Therefore, I think you should be able to make "-distribute" act like
"-integrated-as" in job creation, and just modify Clang::ConstructJob
slightly to pass "-distribute" instead of "-emit-obj" in "-distribute"
mode. Or maybe make a unique job type for it. Does that sound like
it could work?
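
A rough standalone sketch of that idea, choosing the cc1 action flag
from the kind of output the job should produce. OutputKind and
cc1ActionFlag are hypothetical names, not part of clang; "-distribute"
is the new option from this thread, and the other flags are the usual
cc1 actions.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of what a compile job is expected to produce.
enum class OutputKind { Preprocessed, Assembly, Object };

// Pick the cc1 action flag. -distribute only replaces -emit-obj, so a user
// asking for -E or -S still gets the normal local action.
std::string cc1ActionFlag(OutputKind Out, bool Distribute) {
  switch (Out) {
  case OutputKind::Preprocessed:
    return "-E";
  case OutputKind::Assembly:
    return "-S";
  case OutputKind::Object:
    return Distribute ? "-distribute" : "-emit-obj";
  }
  assert(false && "unhandled output kind");
  return "";
}
```

Because the job is created with an object-file output, as with
-integrated-as, the driver never adds "-S" on its own, so a "-S" seen
here can only come from the user and is honoured locally.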

-Eli