Generating RISCV code from MLIR

In our tpp-mlir project, we generate LLVM IR from MLIR and then code-gen to the native platform. It currently works for x86 and Arm, but when adding RISCV, we’re getting the following error:

$ ./build/bin/tpp-run -n 10 -e entry -entry-point-result=void gemm_fp32_const_small.mlir

LLVM ERROR: 'zvl*b' requires 'v' or 'zve*' extension to also be specified

This is running under QEMU (edk2-riscv64) on both Fedora 40 and Ubuntu 24, with the following QEMU args:

 <qemu:commandline>
   <qemu:arg value='-cpu'/>
   <qemu:arg value='rv64,zba=true,zbb=true,v=true,vlen=256,vext_spec=v1.0,rvv_ta_all_1s=true,rvv_ma_all_1s=true'/>
 </qemu:commandline>

We have tried multiple fpuName options, including rv64imafdcv, RV64IMAFDCZicsr_Zifencei and even the full splat from cpuinfo: rv64imafdcvh_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zfa_zba_zbb_zbc_zbs_sstc, to no avail.

I’m not sure how feature strings work with the RISCV back-end in LLVM, so I’m looking for pointers on how to work out whether the problem is in how we build the target machine, in the feature string, or in some other option that we’re missing.

Thanks!

The zvl*b extension is for specifying the minimum VLEN of the platform.

You will need the v extension specified when using the RISC-V vector extension. The spec [0] lists the extensions that v implies (depends on). Since your QEMU configuration sets vlen=256, you will also need to specify zvl256b during compilation.

[0] riscv-v-spec/v-spec.adoc at master · riscvarchive/riscv-v-spec · GitHub
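To make the link between the QEMU setting and the extension name concrete, here is a minimal Python sketch that derives the zvl*b name from a QEMU -cpu value. The helper name zvl_from_qemu_cpu is hypothetical, and the sketch assumes vlen is given in bits and that v=true is spelled exactly as in the post above; it returns None rather than guessing a default when vlen is absent.

```python
from typing import Optional


def zvl_from_qemu_cpu(cpu_arg: str) -> Optional[str]:
    """Return the zvl*b name matching a QEMU -cpu value, or None."""
    # The first field is the CPU model; the rest are key=value properties.
    fields = cpu_arg.split(",")[1:]
    props = dict(f.split("=", 1) for f in fields if "=" in f)
    # Only report a zvl*b name when the vector extension is enabled
    # and an explicit vlen is present (no default is assumed here).
    if props.get("v") != "true" or "vlen" not in props:
        return None
    return f"zvl{int(props['vlen'])}b"


cpu = ("rv64,zba=true,zbb=true,v=true,vlen=256,vext_spec=v1.0,"
       "rvv_ta_all_1s=true,rvv_ma_all_1s=true")
print(zvl_from_qemu_cpu(cpu))
```

For the -cpu value quoted earlier in the thread this yields zvl256b.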

Thanks!

Ok, I’ve tried:

  • rv64imafdcvh_zicsr_zifencei_zvl256b_zve64d
  • rv64imafdcvh_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zfa_zba_zbb_zbc_zbs_sstc_zvl256b_zve64d

and still get:

LLVM ERROR: 'zvl*b' requires 'v' or 'zve*' extension to also be specified

IIUC, both v and zve* are specified.
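As a sanity check, the rule can be sketched in Python exactly as the error message words it. This is not LLVM's implementation, and the function name satisfies_zvl_rule is made up for illustration. It only shows that the extension names listed above do satisfy the rule as stated, which hints that the set of features reaching the check may differ from the one being passed in.

```python
import re
from typing import Iterable


def satisfies_zvl_rule(extensions: Iterable[str]) -> bool:
    """Check "'zvl*b' requires 'v' or 'zve*'" as worded in the error."""
    exts = list(extensions)
    has_zvl = any(re.fullmatch(r"zvl\d+b", e) for e in exts)
    has_vector = any(e == "v" or e.startswith("zve") for e in exts)
    # The rule only constrains sets that contain a zvl*b extension.
    return (not has_zvl) or has_vector


tried = ["i", "m", "a", "f", "d", "c", "v", "h",
         "zicsr", "zifencei", "zvl256b", "zve64d"]
print(satisfies_zvl_rule(tried))        # the names from the first bullet
print(satisfies_zvl_rule(["zvl256b"]))  # zvl*b alone violates the rule
```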

Can you provide a brief LLVM IR for reproduction?

This is just wrapper code that calls our library functions. There should be no vector code in there, only in our library code. The library code works fine when called from C++. My guess is that this could be some ABI issue (i.e. the compiler calls the functions in the wrong way).

; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"

@__constant_4x32x32xf32 = private constant [4 x [32 x [32 x float]]] [[32 x [32 x float]] [[32 x float] [float 1.000000e+00, ...  float 1.000000e+00]]]], align 128

; Function Attrs: mustprogress nounwind willreturn allockind("free") memory(argmem: readwrite, inaccessiblemem: readwrite)
declare void @free(ptr allocptr nocapture noundef) local_unnamed_addr #0

; Function Attrs: mustprogress nofree nounwind willreturn allockind("alloc,uninitialized") allocsize(0) memory(inaccessiblemem: readwrite)
declare noalias noundef ptr @malloc(i64 noundef) local_unnamed_addr #1

declare void @printNewline() local_unnamed_addr #2

declare void @printF64(double) local_unnamed_addr #2

define { ptr, ptr, i64, [4 x i64], [4 x i64] } @_entry(ptr nocapture readnone %0, ptr %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, i64 %7, i64 %8, i64 %9, i64 %10) local_unnamed_addr #2 {
.preheader5:
  %11 = tail call dereferenceable_or_null(32832) ptr @malloc(i64 32832)
  %12 = ptrtoint ptr %11 to i64
  %13 = add i64 %12, 63
  %14 = and i64 %13, -64
  %15 = inttoptr i64 %14 to ptr
  %16 = tail call i64 @xsmm_brgemm_dispatch(i64 1, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 1024, i64 1024, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %1, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 0, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %1, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 1024, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %1, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 2048, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %1, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 3072, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %1, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 4096, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %1, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 5120, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %1, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 6144, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %1, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 7168, i64 4)
  %17 = tail call dereferenceable_or_null(32832) ptr @malloc(i64 32832)
  %18 = ptrtoint ptr %17 to i64
  %19 = add i64 %18, 63
  %20 = and i64 %19, -64
  %21 = inttoptr i64 %20 to ptr
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %15, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %21, i64 0, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %15, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %21, i64 1024, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %15, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %21, i64 2048, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %15, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %21, i64 3072, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %15, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %21, i64 4096, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %15, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %21, i64 5120, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %15, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %21, i64 6144, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %15, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %21, i64 7168, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %21, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 0, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %21, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 1024, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %21, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 2048, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %21, i64 0, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 3072, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %21, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 4096, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %21, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 5120, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %21, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 6144, i64 4)
  tail call void @xsmm_brgemm_invoke(i64 1, i64 %16, ptr %21, i64 4096, ptr nonnull @__constant_4x32x32xf32, i64 0, ptr %15, i64 7168, i64 4)
  %22 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } undef, ptr %11, 0
  %23 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %22, ptr %15, 1
  %24 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %23, i64 0, 2
  %25 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %24, i64 2, 3, 0
  %26 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %25, i64 4, 3, 1
  %27 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %26, i64 32, 3, 2
  %28 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %27, i64 32, 3, 3
  %29 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %28, i64 4096, 4, 0
  %30 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %29, i64 1024, 4, 1
  %31 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %30, i64 32, 4, 2
  %32 = insertvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %31, i64 1, 4, 3
  tail call void @free(ptr %17)
  ret { ptr, ptr, i64, [4 x i64], [4 x i64] } %32
}

declare void @xsmm_brgemm_invoke(i64, i64, ptr, i64, ptr, i64, ptr, i64, i64) local_unnamed_addr #2

declare i64 @xsmm_brgemm_dispatch(i64, i64, i64, i64, i64, i64, i64, i64, i64, i64) local_unnamed_addr #2

declare double @perf_stop_timer(i64) local_unnamed_addr #2

declare i64 @perf_start_timer() local_unnamed_addr #2

define void @entry() local_unnamed_addr #2 {
  %1 = tail call i64 @perf_start_timer()
  %2 = tail call { ptr, ptr, i64, [4 x i64], [4 x i64] } @_entry(ptr nonnull poison, ptr nonnull @__wrapper_0, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison)
  %3 = extractvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %2, 0
  tail call void @free(ptr %3)
  %4 = tail call double @perf_stop_timer(i64 %1)
  %5 = tail call i64 @perf_start_timer()
  br label %6

6:                                                ; preds = %0, %6
  %7 = phi i64 [ 0, %0 ], [ %10, %6 ]
  %8 = tail call { ptr, ptr, i64, [4 x i64], [4 x i64] } @_entry(ptr nonnull poison, ptr nonnull @__wrapper_0, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison, i64 poison)
  %9 = extractvalue { ptr, ptr, i64, [4 x i64], [4 x i64] } %8, 0
  tail call void @free(ptr %9)
  %10 = add nuw nsw i64 %7, 1
  %11 = icmp samesign ult i64 %7, 9
  br i1 %11, label %6, label %12

12:                                               ; preds = %6
  %13 = tail call double @perf_stop_timer(i64 %5)
  %14 = fdiv double %13, 1.000000e+01
  tail call void @printF64(double %14)
  tail call void @printNewline()
  ret void
}

attributes #0 = { mustprogress nounwind willreturn allockind("free") memory(argmem: readwrite, inaccessiblemem: readwrite) "alloc-family"="malloc" "unsafe-fp-math"="true" }
attributes #1 = { mustprogress nofree nounwind willreturn allockind("alloc,uninitialized") allocsize(0) memory(inaccessiblemem: readwrite) "alloc-family"="malloc" "unsafe-fp-math"="true" }
attributes #2 = { "unsafe-fp-math"="true" }

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}

The createTargetMachine argument takes a comma separated list of extensions each starting with + or -. I’m surprised it doesn’t print a warning or error for the string not being recognized.

Well, I did try:

-fpu "+rv64imafdcvh,+zicsr,+zifencei,+zvl256b,+zve64d"

And I still get the same error:

LLVM ERROR: 'zvl*b' requires 'v' or 'zve*' extension to also be specified

I was confused, because on x86/Arm the feature string in cpuinfo is a comma-separated list, and on RISC-V it has underscores. But neither format seems to work.

In our program, both x86 and Arm do create different target machines when we specify the string (though, to be honest, we don’t pass a list, just one flag).

Edited to add: Any fpu string, even empty, gives me the same error, so it’s probably a bigger issue.

This is still not a feature name, it’s just a march string. I think you want +64bit,+m,+a,+f,+d,+c,+v,+h instead of this part of your feature string, the rest looks like a bunch of feature strings as LLVM knows them.

In future, I suggest invoking clang with the -march= flag that you need (which we’ve documented), and looking at how it translates that string into -target-feature arguments for clang -cc1 (I forget if these are also serialised on every function in the IR as a target-features attribute). This translation can also be done in code using the functions in llvm’s TargetParser component, which should be always available even if the risc-v backend is not enabled in your build.
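To illustrate the difference between a march string and a feature list, here is a rough Python sketch that splits an underscore-separated march string into +feature entries. It is not the TargetParser logic: the function name march_to_features is hypothetical, it drops the base 'i' on the assumption that it needs no flag (matching the list suggested above), and it does not expand 'g', version suffixes, or implied extensions. Clang or TargetParser remain the authoritative way to do this.

```python
def march_to_features(march: str) -> str:
    """Turn e.g. 'rv64imafdc_zicsr' into '+64bit,+m,+a,+f,+d,+c,+zicsr'."""
    march = march.lower()
    if not march.startswith(("rv32", "rv64")):
        raise ValueError(f"expected an rv32/rv64 string, got {march!r}")
    features = ["+64bit"] if march.startswith("rv64") else []
    head, *multi_letter = march[4:].split("_")
    # Multi-letter extensions start with z, s or x; one may be glued to the
    # single-letter run without an underscore (e.g. 'rv64imafdczicsr').
    cut = next((i for i, ch in enumerate(head) if ch in "zsx"), len(head))
    if cut < len(head):
        multi_letter.insert(0, head[cut:])
    for letter in head[:cut]:
        if letter != "i":  # assumption: the base ISA 'i' needs no flag
            features.append("+" + letter)
    features.extend("+" + ext for ext in multi_letter if ext)
    return ",".join(features)


print(march_to_features("rv64imafdcvh_zicsr_zifencei_zvl256b_zve64d"))
```

For the first string tried earlier in the thread, this prints +64bit,+m,+a,+f,+d,+c,+v,+h,+zicsr,+zifencei,+zvl256b,+zve64d.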

-fpu "+64bit,+m,+a,+f,+d,+c,+v,+h,+zicsr,+zifencei,+zvl256b,+zve64d" -triple riscv64imafd-linux-gnu
...
LLVM ERROR: 'zvl*b' requires 'v' or 'zve*' extension to also be specified

Clang emits a monster of a target-feature list :rofl:

$ clang -### -c file.cpp -march=rv64imafd
"-target-feature" "+m" "-target-feature" "+a" "-target-feature" "+f"
"-target-feature" "+d" "-target-feature" "+c" "-target-feature" "+zicsr" ... // everything else disabled

and

$ clang -### -c file.cpp -march=rv64imafdv
"-target-feature" "+m" "-target-feature" "+a" "-target-feature" "+f"
"-target-feature" "+d" "-target-feature" "+v" "-target-feature" "+zicsr"
"-target-feature" "+zve32f" "-target-feature" "+zve32x" "-target-feature" "+zve64d"
"-target-feature" "+zve64f" "-target-feature" "+zve64x" "-target-feature" "+zvl128b"
"-target-feature" "+zvl32b" "-target-feature" "+zvl64b" ... // everything else disabled

So, I used:

 -triple riscv64imafdv-linux-gnu
 -fpu "+64bit,+m,+a,+f,+d,+v,+zicsr,+zve32f,+zve32x,+zve64d,+zve64f,+zve64x,+zvl256b,+zvl128b,+zvl32b,+zvl64b"

...

LLVM ERROR: 'zvl*b' requires 'v' or 'zve*' extension to also be specified

:sob:

Edited to reflect that this error also happens on the BPi board, so it’s likely something we’re doing on our side.
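Copying the enabled features out of the clang -### output by hand is error-prone, so the step can be scripted. The Python sketch below keeps only the "+" entries that follow a -target-feature flag and joins them with commas. The helper name enabled_target_features and the shortened sample line are made up for illustration; real output is much longer.

```python
import shlex


def enabled_target_features(cc1_line: str) -> str:
    """Collect the '+' values of -target-feature flags, comma-separated."""
    tokens = shlex.split(cc1_line)  # strips the surrounding double quotes
    enabled = [
        value
        for flag, value in zip(tokens, tokens[1:])
        if flag == "-target-feature" and value.startswith("+")
    ]
    return ",".join(enabled)


# Shortened, hypothetical sample in the same shape as the output above.
sample = ('"-target-feature" "+m" "-target-feature" "+a" '
          '"-target-feature" "-e" "-target-feature" "+v"')
print(enabled_target_features(sample))
```

Disabled entries (those starting with "-") are dropped, so the result can be compared directly against the string passed to the tool.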

@rengolin can you get a backtrace for where that error is occurring? I’m wondering if it’s occurring somewhere separate from the TargetMachine path.

Apologies for the lack of line information, building LLVM in QEMU is very painful.

First part: error and signal handler:

$ ./build/bin/tpp-run -n 10 -e entry -entry-point-result=void gemm_fp32_const_small.mlir -fpu "+64bit,+m,+a,+f,+d,+c,+v,+h,+zicsr,+zifencei,+zvl256b,+zve64d" -triple riscv64imafdv-linux-gnu
LLVM ERROR: 'zvl*b' requires 'v' or 'zve*' extension to also be specified

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ./build/bin/tpp-run -n 10 -e entry -entry-point-result=void gemm_fp32_const_small.mlir -fpu +64bit,+m,+a,+f,+d,+c,+v,+h,+zicsr,+zifencei,+zvl256b,+zve64d -triple riscv64imafdv-linux-gnu
 #0 0x0000000001de5e6c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./build/bin/tpp-run+0x1de5e6c)
 #1 0x0000000001de4260 llvm::sys::RunSignalHandlers() (./build/bin/tpp-run+0x1de4260)
 #2 0x0000000001de64d4 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fff82d405b0 (linux-vdso.so.1+0x5b0)
 #4 0x00007fff8268d8ea __pthread_kill_implementation (/lib64/lp64d/libc.so.6+0x7b8ea)
 #5 0x00007fff8264f28e gsignal (/lib64/lp64d/libc.so.6+0x3d28e)
 #6 0x00007fff8263e218 abort (/lib64/lp64d/libc.so.6+0x2c218)
 #7 0x0000000001da33ea llvm::report_fatal_error(llvm::Twine const&, bool) (./build/bin/tpp-run+0x1da33ea)
 #8 0x0000000001da280e llvm::report_fatal_error(llvm::Error, bool) (./build/bin/tpp-run+0x1da280e)

Second part, where the problem lies:

 #9 0x0000000001fa3b0c llvm::RISCVABI::getTargetABI(llvm::StringRef) RISCVBaseInfo.cpp:0:0
#10 0x0000000001fc66f0 llvm::RISCVTargetELFStreamer::RISCVTargetELFStreamer(llvm::MCStreamer&, llvm::MCSubtargetInfo const&) RISCVELFStreamer.cpp:0:0
#11 0x0000000001fbfa7a createRISCVObjectTargetStreamer(llvm::MCStreamer&, llvm::MCSubtargetInfo const&) RISCVMCTargetDesc.cpp:0:0
#12 0x0000000005f059e2 llvm::Target::createMCObjectStreamer(llvm::Triple const&, llvm::MCContext&, std::unique_ptr<llvm::MCAsmBackend, std::default_delete<llvm::MCAsmBackend>>, std::unique_ptr<llvm::MCObjectWriter, std::default_delete<llvm::MCObjectWriter>>, std::unique_ptr<llvm::MCCodeEmitter, std::default_delete<llvm::MCCodeEmitter>>, llvm::MCSubtargetInfo const&) const (./build/bin/tpp-run+0x5f059e2)
#13 0x0000000004eb7978 llvm::CodeGenTargetMachineImpl::addPassesToEmitMC(llvm::legacy::PassManagerBase&, llvm::MCContext*&, llvm::raw_pwrite_stream&, bool) (./build/bin/tpp-run+0x4eb7978)
#14 0x0000000004b600ce llvm::orc::SimpleCompiler::operator()(llvm::Module&) (./build/bin/tpp-run+0x4b600ce)
#15 0x0000000004b955d2 decltype(auto) llvm::orc::ThreadSafeModule::withModuleDo<llvm::orc::IRCompileLayer::IRCompiler&>(llvm::orc::IRCompileLayer::IRCompiler&) IRCompileLayer.cpp:0:0
#16 0x0000000004b9534e llvm::orc::IRCompileLayer::emit(std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility>>, llvm::orc::ThreadSafeModule) (./build/bin/tpp-run+0x4b9534e)
#17 0x0000000004ba5b2a llvm::orc::IRTransformLayer::emit(std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility>>, llvm::orc::ThreadSafeModule) (./build/bin/tpp-run+0x4ba5b2a)
#18 0x0000000004ba5b2a llvm::orc::IRTransformLayer::emit(std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility>>, llvm::orc::ThreadSafeModule) (./build/bin/tpp-run+0x4ba5b2a)
#19 0x0000000004b981f0 llvm::orc::BasicIRLayerMaterializationUnit::materialize(std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility>>) (./build/bin/tpp-run+0x4b981f0)

Remaining stack:

#20 0x0000000004b6d0c8 llvm::orc::MaterializationTask::run() (./build/bin/tpp-run+0x4b6d0c8)
#21 0x0000000004b626c4 llvm::orc::ExecutionSession::dispatchTask(std::unique_ptr<llvm::orc::Task, std::default_delete<llvm::orc::Task>>) Core.cpp:0:0
#22 0x0000000004b6eb30 llvm::orc::ExecutionSession::dispatchOutstandingMUs() (./build/bin/tpp-run+0x4b6eb30)
#23 0x0000000004b707a8 llvm::orc::ExecutionSession::OL_completeLookup(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState>>, std::shared_ptr<llvm::orc::AsynchronousSymbolQuery>, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>> const&)>) (./build/bin/tpp-run+0x4b707a8)
#24 0x0000000004b8879a llvm::orc::InProgressFullLookupState::complete(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState>>) Core.cpp:0:0
#25 0x0000000004b647de llvm::orc::ExecutionSession::OL_applyQueryPhase1(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState>>, llvm::Error) (./build/bin/tpp-run+0x4b647de)
#26 0x0000000004b61fe0 llvm::orc::ExecutionSession::lookup(llvm::orc::LookupKind, std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>>> const&, llvm::orc::SymbolLookupSet, llvm::orc::SymbolState, llvm::unique_function<void (llvm::Expected<llvm::DenseMap<llvm::orc::SymbolStringPtr, llvm::orc::ExecutorSymbolDef, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>, llvm::detail::DenseMapPair<llvm::orc::SymbolStringPtr, llvm::orc::ExecutorSymbolDef>>>)>, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>> const&)>) (./build/bin/tpp-run+0x4b61fe0)
#27 0x0000000004b6ed28 llvm::orc::ExecutionSession::lookup(std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>>> const&, llvm::orc::SymbolLookupSet, llvm::orc::LookupKind, llvm::orc::SymbolState, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>> const&)>) (./build/bin/tpp-run+0x4b6ed28)
#28 0x0000000004b6f1f8 llvm::orc::ExecutionSession::lookup(std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>>> const&, llvm::orc::SymbolStringPtr, llvm::orc::SymbolState) (./build/bin/tpp-run+0x4b6f1f8)
#29 0x0000000004b9af00 llvm::orc::LLJIT::lookupLinkerMangled(llvm::orc::JITDylib&, llvm::orc::SymbolStringPtr) (./build/bin/tpp-run+0x4b9af00)
#30 0x000000000465f7d4 llvm::orc::LLJIT::lookupLinkerMangled(llvm::orc::JITDylib&, llvm::StringRef) ExecutionEngine.cpp:0:0
#31 0x000000000465ebcc mlir::ExecutionEngine::lookup(llvm::StringRef) const (./build/bin/tpp-run+0x465ebcc)
#32 0x000000000465bc78 mlir::ExecutionEngine::lookupPacked(llvm::StringRef) const (./build/bin/tpp-run+0x465bc78)
#33 0x000000000465ab2e compileAndExecute((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, void**, std::unique_ptr<llvm::TargetMachine, std::default_delete<llvm::TargetMachine>>) JitRunner.cpp:0:0
#34 0x0000000004658fb0 compileAndExecuteVoidFunction((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, std::unique_ptr<llvm::TargetMachine, std::default_delete<llvm::TargetMachine>>) JitRunner.cpp:0:0
#35 0x00000000046579a6 mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) (./build/bin/tpp-run+0x46579a6)
#36 0x0000000001c329ec main (./build/bin/tpp-run+0x1c329ec)
#37 0x00007fff8263e67c __libc_start_call_main (/lib64/lp64d/libc.so.6+0x2c67c)
#38 0x00007fff8263e728 __libc_start_main@GLIBC_2.27 (/lib64/lp64d/libc.so.6+0x2c728)
#39 0x0000000001c30a60 _start (./build/bin/tpp-run+0x1c30a60)
Aborted (core dumped)

The code for initializing the target machine looks deeply suspicious to me, to be honest.

This doesn’t look like it would be at all right for RISC-V, where cpus are not features and we don’t separate fpus into their own option.


@rengolin Can you try this and see what it prints?

diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp
index 6d2659aa1236..21f1c762c43d 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp
@@ -147,12 +147,16 @@ llvm::Expected<std::unique_ptr<RISCVISAInfo>>
 parseFeatureBits(bool IsRV64, const FeatureBitset &FeatureBits) {
   unsigned XLen = IsRV64 ? 64 : 32;
   std::vector<std::string> FeatureVector;
+  dbgs() << "Features: ";
   // Convert FeatureBitset to FeatureVector.
   for (auto Feature : RISCVFeatureKV) {
     if (FeatureBits[Feature.Value] &&
-        llvm::RISCVISAInfo::isSupportedExtensionFeature(Feature.Key))
+        llvm::RISCVISAInfo::isSupportedExtensionFeature(Feature.Key)) {
+      dbgs() << Feature.Key;
       FeatureVector.push_back(std::string("+") + Feature.Key);
+    }
   }
+  dbgs() << "\n";
   return llvm::RISCVISAInfo::parseFeatures(XLen, FeatureVector);
 }

I suspected as much, given any feature string I used has the same result. Do you have an example I can copy from?

This works fine for x86 and arm, so I’m a bit lost here.

You should remove the leading ‘+’ before “64bit” since the code that passes to TargetMachine already prepends a ‘+’. Otherwise the string looks fine.

The code needs to convert the Subtarget feature string into an ISA string and it seems something is going wrong with that. Can you try the debug prints I suggested?

I did do that, no change.

Yes, we’re getting to that part. We’ll have a whole-week meeting next week and will try to sort that out and will reply here when we have something.

I also wanted to get a debug build for more info on the stack trace, but we don’t have enough disk on the BPi and QEMU is just too slow. Instead, we’ll pepper the code with more debug messages and trace the TargetMachine construction until we’re sure we’re building the right one. (This brings me back to my first experiences on Arm boards a long time ago.)

Thanks for all the help!