Mysterious soft-float output in LTO cache

I’m trying to compile and link a 300k codebase for RISCV64 using our internal 15.0.7-based toolchain. It almost works, but it fails to link:

Hard-float 'd' ABI can't be used for a target that doesn't support the D instruction set extension (ignoring target-abi)
(... plenty of these ...)
Hard-float 'd' ABI can't be used for a target that doesn't support the D instruction set extension (ignoring target-abi)
ld.lld: error: /home/tamas/work/xx/build/lto_cache/llvmcache-8D3B2369E12D4E19EEDA96FFB8F16EF08BB1D893: cannot link object files with different floating-point ABI
ld.lld: error: /home/tamas/work/xx/build/lto_cache/llvmcache-5B904B7F5E30E1103D236BFB9D9748FC6AF7A62C: cannot link object files with different floating-point ABI
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)

Inspecting these two files, I see

ELF 64-bit LSB relocatable, UCB RISC-V, soft-float ABI, version 1 (SYSV), not stripped

and

ELF 64-bit LSB relocatable, UCB RISC-V, soft-float ABI, version 1 (SYSV), with debug_info, not stripped

I’m using a compiler config file with the following contents:

--sysroot <CFGDIR>/../targets/riscv64-xx-linux-musl
-march=rv64imafdc 
-mabi=lp64d
-mfloat-abi=hard
-Wl,--no-dynamic-linker
-Wl,-rpath,<CFGDIR>/../targets/riscv64-xx-linux-musl/lib
-Wl,-rpath,<CFGDIR>/../lib/riscv64-xx-linux-musl
-Qunused-arguments
-pie
-fPIC

-march, -mabi, -mfloat-abi were added later in an effort to fix the issue but did not make a difference (these are the desired values but also what I would expect to be the default without specifying them). The full invocation that I reconstructed from the ninja build looks like this:

#! /bin/bash
/home/tamas/.xx_toolchain/15.0.7+bced4da4f/bin/riscv64-xx-linux-musl-clang \
  -Os \
  -g \
  -DNDEBUG \
  -fvisibility=hidden \
  -Wall \
  -Wsign-compare \
  -Wno-reorder-ctor \
  -Wno-delete-non-virtual-dtor \
  -Wunused-variable \
  -Wuninitialized \
  -fno-omit-frame-pointer \
  -fno-ident \
  -g2 \
  -glldb \
  -gdwarf-aranges \
  -ggnu-pubnames \
  -fdata-sections \
  -ffunction-sections \
  -flto=thin \
  -Wl,--lto-O2 \
  -fwhole-program-vtables \
  -Wl,--thinlto-cache-dir=/home/tamas/work/xx/build/lto_cache/ \
  -MD -MT test.o -MF test.d -o test.o -c ../test.c
(plus a PCH include, which I didn’t bother to reproduce here)

Inspecting all files in the LTO cache, I see that they are all one of these four:

ELF 64-bit LSB relocatable, UCB RISC-V, double-float ABI, version 1 (GNU/Linux), with debug_info, not stripped
ELF 64-bit LSB relocatable, UCB RISC-V, double-float ABI, version 1 (SYSV), with debug_info, not stripped
ELF 64-bit LSB relocatable, UCB RISC-V, soft-float ABI, version 1 (SYSV), not stripped
ELF 64-bit LSB relocatable, UCB RISC-V, soft-float ABI, version 1 (SYSV), with debug_info, not stripped
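
For anyone repeating this survey, here is a minimal sketch of how the cache entries could be grouped by the float ABI that file(1) reports. The function names float_abi_of and scan_cache are made up for this illustration, and the matching relies only on the two ABI strings that appear in the listing above; anything else is reported as "other".

```shell
#!/bin/bash
# Classify one line of `file` output by the RISC-V float ABI it reports.
# Only the two strings seen in this thread are recognised.
float_abi_of() {
  case "$1" in
    *"double-float ABI"*) echo double ;;
    *"soft-float ABI"*)   echo soft ;;
    *)                    echo other ;;
  esac
}

# Print "<abi> <path>" for every llvmcache-* entry in a ThinLTO cache
# directory. The directory is passed in by the caller (hypothetical helper).
scan_cache() {
  local cache_dir="$1" f
  for f in "$cache_dir"/llvmcache-*; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when the dir is empty
    echo "$(float_abi_of "$(file -b "$f")") $f"
  done
}

# Usage:
#   scan_cache /path/to/lto_cache | cut -d' ' -f1 | sort | uniq -c
```

Dropping the cut/sort/uniq part of the usage line lists the offending paths directly, which makes it easy to map the soft-float entries back to their source files.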

Notably, the two ELF files that end up soft-float contain C source filenames as strings. Compiling those sources separately with identical flags produces an object file containing IR (as expected). But that is the output of the compilation step, not the artifact that ends up in the LTO cache.

Any idea what I might be doing wrong or how I could troubleshoot this further? Disabling LTO should be a last resort as it would require significant effort to do it in all deps.

Passing -Wl,-plugin-opt=-target-abi=lp64d makes it link, though I’m still getting the Hard-float 'd' ABI can't be used for a target that doesn't support the D instruction set extension (ignoring target-abi) warnings. I gathered this from the following thread:

[llvm-dev] Encode target-abi into LLVM bitcode for LTO.

It seems like there are some open reviews that might be related, such as ⚙ D71387 pass -mabi to LTO linker only in RISC-V targets, enable RISC-V LTO (llvm.org)

Is this warning safe to ignore?

@teresajohnson is probably a good person to weigh in.

Just to confirm my understanding, we are selecting cache entries that use the soft-float ABI, rather than generating a newer object file for the same source file compiled with the double-float ABI?

When were these cache entries created, and does the error persist if you remove everything from your cache before the LTO link? I.e. are we picking up stale cache entries, or is the first LTO link actually creating such cached object files?

These cache entries are freshly generated and the behavior 100% reproduces with an empty cache (unless I pass -Wl,-plugin-opt=-target-abi=lp64d). So I don’t think the cache itself is a factor here.

@zakk0610 @lenary can you help here? This is related to the RISC-V target-abi LTO work you did.

Ok, yes, the cache seems uninvolved. Basically, it seems that this information is not encoded in the IR, which is why it gets dropped in the LTO link unless you pass that flag.
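
One way to check what the bitcode actually records is to disassemble an input object and look for a "target-abi" entry among the module flags. The sketch below assumes that the patches discussed in the llvm-dev thread store the ABI as a module flag named "target-abi", and that an llvm-dis matching the toolchain is available; target_abi_flag is a made-up helper name.

```shell
#!/bin/bash
# Read textual LLVM IR on stdin and print the value of the "target-abi"
# module flag, or "none" if the module does not carry one.
# Assumption: the flag appears as  !{i32 1, !"target-abi", !"lp64d"}
target_abi_flag() {
  local abi
  abi=$(sed -n 's/.*!"target-abi", !"\([^"]*\)".*/\1/p' | head -n 1)
  echo "${abi:-none}"
}

# Usage (test.o is a bitcode object produced with -flto=thin):
#   llvm-dis test.o -o - | target_abi_flag
```

If this prints none for the inputs, the ABI never made it into the IR at all. If it prints lp64d and the cache entries are still soft-float, the flag is recorded but not consulted when the LTO backend sets up code generation.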

Looking through that llvm-dev discussion you pointed to, this appears to be the email outlining the decision: [llvm-dev] Encode target-abi into LLVM bitcode for LTO.

It looks like the 2 patches mentioned there were submitted, so I wonder why you aren’t seeing the error from D72768? But it also seems that the follow-on change mentioned in the above email, which would make LTO automatically use the module flag when creating the TargetMachine, either didn’t get committed yet or isn’t kicking in here. Ah, it looks like this was ⚙ D78035 [PoC][RISCV] enable LTO/ThinLTO on RISCV, which was a PoC patch. @zakk0610, what is the status of that? Was it waiting on a full review? In that case it shouldn’t be marked PoC.

What I’m also not sure about is why passing -target-abi=lp64d to the LTO link results in the warnings you are seeing.

I think there was also an issue where -mcpu or -march needs to be passed via plugin-opt as well. There may be a patch for that too. I’m not at my computer, so I’ll try to find it later.

I don’t believe the issues around ISA and ABI with LTO were ever fully fixed, though IIRC @preames seemed to think so when we talked a few months back?

You may need to pass the corresponding target features via a linker option.
⚙ D132843 [RISCV] Ensure target features get passed to the LTO linker for RISC-V describes the downstream workaround and the issues we have.

I’m sorry that my patches caused confusion; I will close them as well.

Thanks! I passed -Wl,-plugin-opt=-target-abi=lp64d and that made it link, but my build crashed. I suppose passing the arch might help. I can’t find the right spelling for the flag; is it -target-cpu?

IIRC, it may be something like -Wl,--plugin-opt=-mattr=+64bit,+a,+c,+d,+f,+m.
I’m not sure whether it’s allowed to concatenate all target features into one string.
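
One thing to watch with the comma-separated form: the compiler driver splits everything after -Wl, at commas, so the linker would receive +a, +c and so on as separate arguments rather than as part of the -mattr value. A hedged sketch that sidesteps this by emitting one flag per feature is below. It assumes that repeated -mattr options accumulate in the LTO backend, which I have not verified for this toolchain; plugin_opt_flags is a made-up helper name.

```shell
#!/bin/bash
# Turn an ABI name and a feature list such as "+m,+a,+f,+d,+c" into linker
# flags, one per line, so that no flag contains a comma after the "-Wl,"
# prefix. Assumption: repeated -mattr options are merged by the LTO backend.
plugin_opt_flags() {
  local abi="$1" features="$2" feat
  local IFS=','                      # split the feature list at commas
  echo "-Wl,--plugin-opt=-target-abi=${abi}"
  for feat in $features; do
    echo "-Wl,--plugin-opt=-mattr=${feat}"
  done
}

# Usage:
#   plugin_opt_flags lp64d "+m,+a,+f,+d,+c"
```

The output can be appended to the link flags in the build configuration. Whether this also resolves the warnings or the crash mentioned above is something only a test on the real build can tell.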

Thank you, that’s a great starting point. I’ll play around with it and document my progress in this thread. I appreciate your help.

Ah yes, the perennial problem where command line flags aren’t recorded in IR, so LTO is broken.

Making codegen decisions based on cl::opt is also LTO-hostile.

See also:
commit fc018ebb608e (“[IR] make -warn-frame-size into a module attr”)
commit 3787ee457173 (“reland [IR] make -stack-alignment= into a module attr”)

To be clear, if -Wl,--plugin-opt= fixes your problem, that is a defect in LLVM. The whole point of IR is to include the necessary info to compile, and if some command line flag doesn’t work with LTO, that’s a sin of omission in the IR.

Thanks.

I actually reverted 3787ee457173 in my tree because it was causing problems between x264 and ffmpeg (x264 uses a different stack alignment internally).

“To be clear, if -Wl,--plugin-opt= fixes your problem, that is a defect in LLVM”

I agree, but I’m happy there is a viable workaround. BTW, it’s a little unclear to me why the ABI couldn’t be encoded in the triple, as it is for some other targets.

Encoding the ABI in the triple creates a combinatorial explosion, since there’s just the one field that covers both libc and other ABI details. It also still doesn’t account for ISA choices that can present incompatibilities (there are various conflicting extensions, because the embedded space wants more choice and the ability to ignore things it doesn’t need); by that logic, those would also need to be encoded in the triple.

Ah that makes a lot of sense, thank you.

I’ve been investigating some ABI/target-features related issues on RISC-V for a while, and would like to get this solved. Since this post seems the most closely related, I think further discussion can be had here without introducing too much additional context.

Looking at the issue tracker, there seem to be a number of related issues:

On Discourse, Encode target-abi into LLVM bitcode for LTO. seems particularly relevant to the core issues, but it doesn’t seem like any decision was reached on how to bridge the gap or solve the problem.

There are also some earlier attempts to fix the problem, like ⚙ D132843 [RISCV] Ensure target features get passed to the LTO linker for RISC-V and ⚙ D71387 pass -mabi to LTO linker only in RISC-V targets, enable RISC-V LTO, though it isn’t clear to me what the status of those is, given that the authors have moved on from RISC-V related work.

Within the issues and discussions, I found several comments from maintainers noting that this is a known problem and providing a workaround.

However, a workaround is normally a temporary solution, so I’d like to understand the problem in more detail so that we can decide how to solve it. Or, if we already know, what steps are left to have a working solution before the next major release.

I’ve updated LTO plugin uses wrong ABI for LTO objects on riscv · Issue #50591 · llvm/llvm-project · GitHub with most of this information, and will hopefully track related work on the larger problem there, but Discourse is a better place for discussion, so I’m hoping we can reach some kind of consensus here on the status/design/approach and then finish up the technical work on GitHub.

CC: @MaskRay @petrhosek @mysterymath @kito-cheng
