Order of host + device flags created by clang-linker-wrapper

Dear all,

I have a question about the behavior of clang-linker-wrapper. For compiling an OpenMP offloading binary, I use e.g.

clang++ ... -Xoffload-linker-x86_64-unknown-unknown-elf -L<device_path> -L<host_path>

Here, the order is important, since I want <device_path> to be searched before <host_path> during device linking.

Previously, this strategy worked for us when we used -Xopenmp-target=x86_64-unknown-unknown-elf -L<device_path> with an older version of LLVM. However, after merging current upstream changes into our code base and switching to the new driver, we need to use -Xoffload-linker- (as far as I know), since clang-linker-wrapper is now called to do the linking. Our linking no longer works as before, because the order of the flags is reversed.

While clang-linker-wrapper gets the same order of linker flags as shown above, i.e., it is called like

clang-linker-wrapper ... --device-linker=x86_64-unknown-unknown-elf=-L<device_path> -L<host_path>

when it actually executes the link job for the device, the order is reversed and host flags appear before device flags as in

clang -o test.x86_64.native.img ... -L <host_path> -Wl,-L<device_path>

Due to this, the wrong libraries are found during device linking and our build does not work anymore.

My question is whether there is a special reason for the linker wrapper reversing the order of flags. Could this also be done as before, preserving the order?

Any help would be greatly appreciated.

There’s no special reason for the ordering. Looking back at the code, the above should simply work without the -Xoffload-linker option, so I wonder what’s going on there.

  // If this is CPU offloading we copy the input libraries.
  if (!Triple.isAMDGPU() && !Triple.isNVPTX()) {
    ArgStringList LinkerArgs;
    for (const opt::Arg *Arg : Args.filtered(OPT_library, OPT_library_path))
      Arg->render(Args, LinkerArgs);
    for (const opt::Arg *Arg : Args.filtered(OPT_rpath))
      LinkerArgs.push_back(
          Args.MakeArgString("-Wl,-rpath," + StringRef(Arg->getValue())));
    llvm::copy(LinkerArgs, std::back_inserter(CmdArgs));
  }

So if I run clang input.c -fopenmp-targets=x86_64-pc-linux-gnu -fopenmp -L ~/somepath -v, I can see that it’s being forwarded to the device link job:

"clang" -o /tmp/a.out.x86_64.native-f64a61.img --target=x86_64-pc-linux-gnu -march=native -O2 -Wl,--no-undefined /tmp/input-977a57-x86_64-pc-linux-gnu--6d232d.o -Wl,-Bsymbolic -shared -L /home/jhuber/somepath

Could you make a reproducer for the error you’re seeing and open a bug?

Thanks for your answer!

I need to use -Xoffload-linker-, since my above description was a bit simplified. In reality, I have two different offload targets, and I need to pass specific linker flags to each device linking step and make sure each target only receives the flags intended for it. Furthermore, I must make sure that the host linker does not see any of these flags. Hence, I cannot simply use -L but need to specify for which target I want to use a specific flag.
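To illustrate the per-target requirement, here is a minimal standalone sketch of routing "<triple>=<value>" entries so that each device link job only receives its own flags. It is not the linker wrapper's actual code: the function name argsForTarget is hypothetical, the riscv64-unknown-elf triple in the usage note is only an illustrative second target, and skipping entries that carry no triple is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the values of all "<triple>=<value>" entries
// whose triple matches the requested one, in their original order.
std::vector<std::string>
argsForTarget(const std::vector<std::string> &deviceLinkerArgs,
              const std::string &triple) {
  assert(!triple.empty() && "a target triple is required");
  std::vector<std::string> out;
  for (const std::string &entry : deviceLinkerArgs) {
    // Split at the first '=' only, so values may contain '=' themselves.
    std::size_t eq = entry.find('=');
    if (eq == std::string::npos)
      continue; // Assumption of this sketch: entries without a triple are skipped.
    if (entry.compare(0, eq, triple) == 0)
      out.push_back(entry.substr(eq + 1));
  }
  return out;
}
```

Called with the entries x86_64-unknown-unknown-elf=-L/dev/a and riscv64-unknown-elf=-L/dev/b, the x86_64 link job would only see -L/dev/a, and a host link that never calls this helper sees neither.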

In the code snippet from Clang linker wrapper you posted above, the --device-linker=x86_64-unknown-unknown-elf=-L<device_path> argument is skipped since it is of type OPT_linker_arg_EQ and added a few lines later by

for (StringRef Arg : Args.getAllArgValues(OPT_linker_arg_EQ))
    CmdArgs.push_back(Args.MakeArgString("-Wl," + Arg));

Moving this into the code above and iterating over all arguments only once, checking each argument's type while keeping the original order, could be a solution for me. It would restore the behavior of older Clang versions, which preserved the flag order from the initial clang++ call.
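The difference between the two approaches can be sketched in standalone C++. This is only a model of the idea, not a patch: Kind, Arg, renderOne, renderTwoPass and renderInOrder are hypothetical names, and the real code works on llvm::opt::Arg objects rather than these stand-ins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for parsed options; these are NOT the LLVM opt::Arg types.
enum class Kind { LibraryPath, Library, DeviceLinkerArg };

struct Arg {
  Kind kind;
  std::string value;
};

// Render one option roughly as the quoted snippets do: -L/-l as two tokens,
// device linker arguments wrapped in -Wl,.
void renderOne(const Arg &a, std::vector<std::string> &cmd) {
  assert(!a.value.empty() && "every option in this sketch carries a value");
  switch (a.kind) {
  case Kind::LibraryPath:
    cmd.push_back("-L");
    cmd.push_back(a.value);
    break;
  case Kind::Library:
    cmd.push_back("-l");
    cmd.push_back(a.value);
    break;
  case Kind::DeviceLinkerArg:
    cmd.push_back("-Wl," + a.value);
    break;
  }
}

// Two passes: all -L/-l options first, then all device linker arguments.
// The relative order between the two groups is lost.
std::vector<std::string> renderTwoPass(const std::vector<Arg> &args) {
  std::vector<std::string> cmd;
  for (const Arg &a : args)
    if (a.kind != Kind::DeviceLinkerArg)
      renderOne(a, cmd);
  for (const Arg &a : args)
    if (a.kind == Kind::DeviceLinkerArg)
      renderOne(a, cmd);
  return cmd;
}

// One pass: every option is rendered where it appeared on the command line.
std::vector<std::string> renderInOrder(const std::vector<Arg> &args) {
  std::vector<std::string> cmd;
  for (const Arg &a : args)
    renderOne(a, cmd);
  return cmd;
}
```

With a device linker argument -L/dev followed by a host -L /host, renderTwoPass puts the host path first, while renderInOrder keeps the device path in front, which is the ordering I need.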

If you could tell me what exactly it means to make a reproducer, I could open an issue. Does it mean e.g. providing some Compiler Explorer example showing what happens?

In addition, I have another question. In older Clang versions, linker flags generated by addOpenMPRuntime() in CommonArgs.cpp were not passed to device linking steps. Only flags that were explicitly passed to the initial clang++ call were passed to the device linker. Now, I see that e.g. -l omptarget -l omp is passed to the device linker, since it is present in the clang-linker-wrapper call and forwarded to the device. Also, there are flags like -l gcc_s -l gcc -l pthread and system search paths forwarded.

For us, this is problematic, since we need to make sure those flags do not arrive at the device linking step. For example, we have specific flags we need to use on the device for using a corresponding OpenMP (offloading) runtime. Also, we have specific C/C++ runtime libraries we need to use.

Hence, I would like to get rid of these flags, at least in our downstream compiler. (I don’t know whether this could also be a problem for other users.) Do you have an idea what would be a simple solution for making sure these flags do not arrive at the device linker, i.e., return to the old behavior where only the linker flags passed in the initial clang++ call are used?

So those flags being passed by default was mostly born from the x64 target needing all the basic system utilities to run. I don’t know if there’s an easy way to not pass those implicit arguments, since it would mean the thing wouldn’t run without user intervention. For some of the other issues we could probably add -Xhost-linker or something to do the opposite of -Xoffload-linker, but I don’t know how we would deal with the implicit arguments unless we want the user to manually link -lc -lomptarget -lomp -lgcc etc.

I don’t have a complete proposal for a solution yet, but FYI what I am currently investigating is the following.

I introduced a new command line option --offload-link-explicit-host-flags-only. If this is present, only the linker flags explicitly given by the user are forwarded to the clang-linker-wrapper in LinkerWrapper::ConstructJob().

Furthermore, I want to change clang() in clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp, which creates the command line arguments for the issued clang invocation. I want to change this so that the order of flags is preserved, i.e., host and device specific flags are forwarded to clang in the given order instead of putting device flags after host flags.

This way, when --offload-link-explicit-host-flags-only is used, I would practically get the old behavior. Nevertheless, by omitting the flag, I would be able to forward linker flags generated for the host to the devices if I want.

For us, this could fix the problems. However, I have to dig a bit deeper, since the x86_64 case is just for debugging and test scenarios, while we have different kinds of (nested) OpenMP offloading setups for RISC-V targets, e.g., with certain externally provided C libraries from RISC-V GNU toolchains and special OpenMP offloading library builds.

Here, I think I have to check whether everything works again, because during such compilations, corresponding toolchains are used, and I cannot precisely say how the interplay between the different toolchains, linker wrapper, and different kinds of general and linker related flags will behave. I am currently working on this and can inform you how this works out if you want.

Changing the order is easy. We could also have a special-use flag that requires the user to manually pass the libraries. But I wouldn’t put that in clang; make it a linker wrapper flag and pass it via -Wl,. You can make some patches that meet your needs if you want, or I could take a stab at it when time allows.

Thanks for your proposal! Since I need to get this running quickly, I am already trying to fix this for us. I have changed the idea a bit and introduced, roughly as you suggested, a linker wrapper option one can pass via -Wl,. It simply suppresses host library search paths and libraries for device link jobs. It seems that this fixes most of the problems we had.
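Roughly, the filtering behaves like the following standalone sketch. The names LinkArg, isLibraryOrSearchPath, deviceLinkArgs and the suppressHostLibs switch are hypothetical placeholders for illustration, not the actual option or code in our downstream tree, and the prefix check on -L/-l is a simplification.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one linker argument and where it came from.
struct LinkArg {
  bool fromHost;    // true if it was generated for / passed to the host link
  std::string text; // e.g. "-L/usr/lib", "-lomp", "-Wl,--no-undefined"
};

// Simplification: treat anything starting with -L or -l as a search path or library.
bool isLibraryOrSearchPath(const std::string &s) {
  return s.rfind("-L", 0) == 0 || s.rfind("-l", 0) == 0;
}

// Build the argument list for a device link job. When suppressHostLibs is set,
// host library search paths and libraries are dropped; everything else keeps
// its original order.
std::vector<std::string> deviceLinkArgs(const std::vector<LinkArg> &args,
                                        bool suppressHostLibs) {
  std::vector<std::string> out;
  for (const LinkArg &a : args) {
    assert(!a.text.empty() && "empty arguments are not expected here");
    if (suppressHostLibs && a.fromHost && isLibraryOrSearchPath(a.text))
      continue;
    out.push_back(a.text);
  }
  return out;
}
```

With the switch off, the device link job still receives the host's -L and -l arguments as it does upstream today; with it on, only the explicitly device-specific ones remain.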

However, let me try to get it running completely first. Then I can maybe give you better feedback on what we really needed, which might simplify what you could do upstream.