clang SVN r303370 broke libclc

Hi Yaxun,

since clang SVN r303370 ("CodeGen: Cast alloca to expected address
space"), libclc[0] fails to build for me, see below.

[0] configured with ./configure.py --with-llvm-config=$HOME/src/llvm-git/llvm/build-amd64/bin/llvm-config -g ninja amdgcn-mesa-mesa3d amdgcn--amdhsa

[6/356] LLVM-CC amdgcn-mesa-mesa3d/lib/workitem/get_num_groups.cl.tahiti.bc
FAILED: amdgcn-mesa-mesa3d/lib/workitem/get_num_groups.cl.tahiti.bc
/home/daenzer/src/llvm-git/llvm/build-amd64/bin/clang -MMD -MF amdgcn-mesa-mesa3d/lib/workitem/get_num_groups.cl.tahiti.bc.d -target amdgcn-mesa-mesa3d -I`dirname amdgcn-amdhsa/lib/workitem/get_num_groups.cl` -I./generic/include -fno-builtin -D__CLC_INTERNAL -emit-llvm -mcpu=tahiti -c -o amdgcn-mesa-mesa3d/lib/workitem/get_num_groups.cl.tahiti.bc amdgcn-amdhsa/lib/workitem/get_num_groups.cl
clang-5.0: /home/daenzer/src/llvm-git/llvm/lib/IR/Instructions.cpp:2681: static llvm::CastInst *llvm::CastInst::Create(Instruction::CastOps, llvm::Value *, llvm::Type *, const llvm::Twine &, llvm::Instruction *): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
#0 0x00007f504a1c17e9 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/daenzer/src/llvm-git/llvm/build-amd64/../lib/Support/Unix/Signals.inc:398:13
#1 0x00007f504a1c17e9 PrintStackTraceSignalHandler(void*) /home/daenzer/src/llvm-git/llvm/build-amd64/../lib/Support/Unix/Signals.inc:461:0
#2 0x00007f504a1c1b36 bool __gnu_cxx::operator!=<std::pair<void (*)(void*), void*>*, std::vector<std::pair<void (*)(void*), void*>, std::allocator<std::pair<void (*)(void*), void*> > > >(__gnu_cxx::__normal_iterator<std::pair<void (*)(void*), void*>*, std::vector<std::pair<void (*)(void*), void*>, std::allocator<std::pair<void (*)(void*), void*> > > > const&, __gnu_cxx::__normal_iterator<std::pair<void (*)(void*), void*>*, std::vector<std::pair<void (*)(void*), void*>, std::allocator<std::pair<void (*)(void*), void*> > > > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/stl_iterator.h:880:27
#3 0x00007f504a1c1b36 llvm::sys::RunSignalHandlers() /home/daenzer/src/llvm-git/llvm/lib/Support/Signals.cpp:43:0
#4 0x00007f504a1c1b36 SignalHandler(int) /home/daenzer/src/llvm-git/llvm/build-amd64/../lib/Support/Unix/Signals.inc:242:0
#5 0x00007f504c6f40c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x110c0)
#6 0x00007f5049035fcf gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fcf)
#7 0x00007f50490373fa abort (/lib/x86_64-linux-gnu/libc.so.6+0x343fa)
#8 0x00007f504902ee37 (/lib/x86_64-linux-gnu/libc.so.6+0x2be37)
#9 0x00007f504902eee2 (/lib/x86_64-linux-gnu/libc.so.6+0x2bee2)
#10 0x00007f504a2a33ab llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*) /home/daenzer/src/llvm-git/llvm/lib/IR/Instructions.cpp:2697:12
#11 0x0000000000780e27 llvm::IRBuilder<llvm::ConstantFolder, clang::CodeGen::CGBuilderInserter>::CreateCast(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&) /home/daenzer/src/llvm-git/llvm/build-amd64/../include/llvm/IR/IRBuilder.h:0:19
#12 0x0000000000780e27 llvm::IRBuilder<llvm::ConstantFolder, clang::CodeGen::CGBuilderInserter>::CreateBitCast(llvm::Value*, llvm::Type*, llvm::Twine const&) /home/daenzer/src/llvm-git/llvm/build-amd64/../include/llvm/IR/IRBuilder.h:1426:0
#13 0x0000000000780e27 clang::CodeGen::CodeGenFunction::EmitLifetimeEnd(llvm::Value*, llvm::Value*) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CGDecl.cpp:939:0
#14 0x0000000000779839 EmitCleanup(clang::CodeGen::CodeGenFunction&, clang::CodeGen::EHScopeStack::Cleanup*, clang::CodeGen::EHScopeStack::Cleanup::Flags, clang::CodeGen::Address) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CGCleanup.cpp:562:3
#15 0x0000000000778679 llvm::IRBuilderBase::GetInsertBlock() const /home/daenzer/src/llvm-git/llvm/build-amd64/../include/llvm/IR/IRBuilder.h:122:47
#16 0x0000000000778679 clang::CodeGen::CodeGenFunction::PopCleanupBlock(bool) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CGCleanup.cpp:888:0
#17 0x00000000007771e6 clang::CodeGen::EHScopeStack::stable_begin() const /home/daenzer/src/llvm-git/llvm/build-amd64/../tools/clang/lib/CodeGen/EHScopeStack.h:380:28
#18 0x00000000007771e6 clang::CodeGen::CodeGenFunction::PopCleanupBlocks(clang::CodeGen::EHScopeStack::stable_iterator, std::initializer_list<llvm::Value**>) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CGCleanup.cpp:426:0
#19 0x00000000006759f2 clang::CodeGen::CodeGenFunction::FinishFunction(clang::SourceLocation) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CodeGenFunction.cpp:0:5
#20 0x000000000067af1f clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CodeGenFunction.cpp:1221:8
#21 0x000000000069016e clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CodeGenModule.cpp:3058:3
#22 0x000000000068a874 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CodeGenModule.cpp:1914:12
#23 0x0000000000692aaa clang::CodeGen::CodeGenModule::EmitTopLevelDecl(clang::Decl*) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CodeGenModule.cpp:3785:5
#24 0x0000000000a2334f (anonymous namespace)::CodeGeneratorImpl::HandleTopLevelDecl(clang::DeclGroupRef) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/ModuleBuilder.cpp:151:73
#25 0x0000000000a20b0a clang::BackendConsumer::HandleTopLevelDecl(clang::DeclGroupRef) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/CodeGen/CodeGenAction.cpp:138:11
#26 0x0000000000c4f9b2 clang::ParseAST(clang::Sema&, bool, bool) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/Parse/ParseAST.cpp:151:9
#27 0x000000000099a2fb clang::FrontendAction::Execute() /home/daenzer/src/llvm-git/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:841:10
#28 0x000000000095ee01 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/Frontend/CompilerInstance.cpp:971:11
#29 0x0000000000a1c6a3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /home/daenzer/src/llvm-git/llvm/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:249:25
#30 0x00000000005b81b8 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /home/daenzer/src/llvm-git/llvm/tools/clang/tools/driver/cc1_main.cpp:221:13
#31 0x00000000005b6aa8 ExecuteCC1Tool(llvm::ArrayRef<char const*>, llvm::StringRef) /home/daenzer/src/llvm-git/llvm/tools/clang/tools/driver/driver.cpp:299:12
#32 0x00000000005b6aa8 main /home/daenzer/src/llvm-git/llvm/tools/clang/tools/driver/driver.cpp:380:0
#33 0x00007f50490232b1 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b1)
#34 0x00000000005b3d6a _start (/home/daenzer/src/llvm-git/llvm/build-amd64/bin/clang-5.0+0x5b3d6a)
Stack dump:
0. Program arguments: /home/daenzer/src/llvm-git/llvm/build-amd64/bin/clang-5.0 -cc1 -triple amdgcn-mesa-mesa3d -emit-llvm-bc -emit-llvm-uselists -disable-free -main-file-name get_num_groups.cl -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -no-integrated-as -mconstructor-aliases -target-cpu tahiti -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/daenzer/src/llvm-git/libclc/amdgcn-mesa-mesa3d/lib/workitem/get_num_groups.cl.tahiti.gcno -resource-dir /home/daenzer/src/llvm-git/llvm/build-amd64/lib/clang/5.0.0 -dependency-file amdgcn-mesa-mesa3d/lib/workitem/get_num_groups.cl.tahiti.bc.d -MT amdgcn-mesa-mesa3d/lib/workitem/get_num_groups.cl.tahiti.bc -I amdgcn-amdhsa/lib/workitem -I ./generic/include -D __CLC_INTERNAL -fno-dwarf-directory-asm -fdebug-compilation-dir /home/daenzer/src/llvm-git/libclc -ferror-limit 19 -fmessage-length 0 -fno-builtin -fobjc-runtime=gcc -fdiagnostics-show-option -o amdgcn-mesa-mesa3d/lib/workitem/get_num_groups.cl.tahiti.bc -x cl amdgcn-amdhsa/lib/workitem/get_num_groups.cl
1. <eof> parser at end of file
2. amdgcn-amdhsa/lib/workitem/get_num_groups.cl:4:17: LLVM IR generation of declaration 'get_num_groups'
3. amdgcn-amdhsa/lib/workitem/get_num_groups.cl:4:17: Generating code for declaration 'get_num_groups'
clang-5.0: error: unable to execute command: Aborted
clang-5.0: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 5.0.0 (http://llvm.org/git/clang.git 3022dac388832e0bb669821e6677abd5cb8a8784) (llvm/trunk 303323)
Target: amdgcn-mesa-mesa3d
Thread model: posix
InstalledDir: /home/daenzer/src/llvm-git/llvm/build-amd64/bin
clang-5.0: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang-5.0: note: diagnostic msg:

I'm afraid that's a bit over my libclc expertise. Any volunteers?

Also, Vedran pointed out that r600-- (and amdgcn--) are broken as well.
What's the suggested solution for the r600 driver?

I figured out that you meant amdgcn-mesa-opencl instead of
amdgcn-mesa-mesa3d-opencl. r600-mesa-opencl builds as well, but note
that I still get a failure with amdgcn--amdhsa (or amdgcn-amd-amdhsa or
amdgcn-unknown-amdhsa).

For amdgcn target now it is required to use the environment component
opencl to compile OpenCL programs. For example, instead of using
triple amdgcn-mesa-mesa3d, you need to use
amdgcn-mesa-mesa3d-opencl.

Can you update libclc to use triple amdgcn-mesa-mesa3d-opencl ?

[...]

I figured out that you meant amdgcn-mesa-opencl instead of
amdgcn-mesa-mesa3d-opencl.

Unfortunately, running Tom Stellard's hello_world example program with
the result hangs my Kaveri GPU. Sam, can you look into this?

> For amdgcn target now it is required to use the environment component
> opencl to compile OpenCL programs. For example, instead of using
> triple amdgcn-mesa-mesa3d, you need to use
> amdgcn-mesa-mesa3d-opencl.
>
> Can you update libclc to use triple amdgcn-mesa-mesa3d-opencl ?

Hi Sam,

Can you explain more why this is necessary, and how hard would it be to
no break the existing triple? I would prefer not to have to update
all users if possible.

Thanks,
Tom

Seems to work fine, thanks!

also seems to fix most of EG 670 opencl regressions.

Jan

> > I have fixed the issue by https://reviews.llvm.org/rL303644
>
> Seems to work fine, thanks!

also seems to fix most of EG 670 opencl regressions.

while at the same time it introduces regression in
"set kernel argument for array" subtest on Turks.

Jan

Actually, I'm also seeing that test and a bunch of local atomic tests
fail. Reverting both of Sam's patches and rebuilding libclc and Mesa
fixes them. Not sure why I wasn't seeing these failures before, but it
looks like there's still an issue here.

you can find the tests here:
https://piglit.freedesktop.org/

it's fairly straightforward to setup. my run cmdline is:

OCL_ICD_VENDORS=/etc/OpenCL/vendors/mesa-git.icd python3 ./piglit run
tests/cl.py -o results/turks_cl_`date`

the to generate html report I run:

./piglit summary html summary/turks_cl results/turks_cl_*

it looks like this:
http://paul.rutgers.edu/~jv356/piglit/radeon-latest-5/problems.html

Jan

just fyi the 'set kernel argument for array' test is also local memory
related.

Hi Sam,

is there any progress on this regression? I still see the test failing
on my machine.

thanks,
Jan

I am trying to build mesa with opencl enabled on Ubuntu 16.04 so that
I can reproduce the issue, however I got some issue. After I build
and install it, I cannot see the OpenCL icd for mesa under
/etc/OpenCL/vendors.

Here is the command I use to build mesa:

./autogen.sh --prefix=/usr --libdir=/usr/lib/x86_64-linux-gnu --with-
llvm-prefix=/usr/llvm/x86_64-linux-gnu --enable-glx-tls --enable-
texture-float --disable-xvmc --disable-nine --with-gallium-
drivers=radeonsi,swrast --with-dri-drivers= --with-egl-
platforms=x11,drm --enable-gles1 --enable-gles2 -enable-opencl
CXXFLAGS="-O2" CFLAGS="-O2"
make
sudo make install

Can someone take a look if I missed something? Thanks.

I think you need --enable-opencl_icd as well, otherwise mesa provides
the root opencl library (LD_PRELOAD, or LD_LIBRARY_PATH should be
enough to make it work even without icd)

BTW my email was always bounced back from libclc-dev@lists.llvm.org.
I tried to subscribe to it for several times but never got response.
Can someone pass my email to the mailing list?

no idea how to do that.
AFAIK, Tom Stellard is the official maintainer of libclc. He might know
better.

regards,
Jan

Hi Jan,

What's the command to generate html like http://paul.rutgers.edu/~jv3
56/piglit/radeon-latest-5/problems.html with accumulated results?

I used ./piglit summary html summary/turks_cl results/turks_cl_* but
it only generates html with one result, which makes comparison of
different runs difficult.

there needs to be multiple directories with results in 'results/'
directory.
another problem might be that piglit does not overwrite existing
summaries (you should see an error message when that happens) so you
might be stuck with the first one you generated.
./piglit summary html summary/out results/* --overwrite

BTW I suspect the regression was due to
mesa/src/gallium/state_trackers/clover/llvm/codegen/common.cpp, line
131:

           if \(address\_space ==

address_spaces[clang::LangAS::opencl_local
-
compat::lang_as_offset]) {

After my change, this is no longer valid way to get target address
space. Should be:

           if \(address\_space ==

address_spaces[clang::LangAS::opencl_local]) {

compat::lang_as_offset is 0 for llvm >= 5 so those two statements
should be identical. Am I missing something?

thanks for working on this.
Jan

Hi Jan,

What's the command to generate html like http://paul.rutgers.edu/~jv3
56/piglit/radeon-latest-5/problems.html with accumulated results?

I used ./piglit summary html summary/turks_cl results/turks_cl_* but
it only generates html with one result, which makes comparison of
different runs difficult.

there needs to be multiple directories with results in 'results/'
directory.
another problem might be that piglit does not overwrite existing
summaries (you should see an error message when that happens) so you
might be stuck with the first one you generated.
./piglit summary html summary/out results/* --overwrite

BTW I suspect the regression was due to
mesa/src/gallium/state_trackers/clover/llvm/codegen/common.cpp, line
131:

               if (address_space ==
address_spaces[clang::LangAS::opencl_local
                                                   -
compat::lang_as_offset]) {

After my change, this is no longer valid way to get target address
space. Should be:

               if (address_space ==
address_spaces[clang::LangAS::opencl_local]) {

compat::lang_as_offset is 0 for llvm >= 5 so those two statements
should be identical. Am I missing something?

Now that I finally got my debugger setup working again, I can say that
'address_spaces', which is fetched by
c.getTarget().getAddressSpaceMap() in common.cpp:make_kernel_args, is
an empty collection which points to 'DefaultAddrSpaceMap' from
clang/lib/basic/TargetInfo.cpp

It seems like the clang target that clover is using doesn't have an
address space map initialized at all at this time... I'm not sure if
that's because we're setting up the clang instance incorrectly (wrong
triple), or if that map is just never initialized in the AMDGPU
target(s) in LLVM. I've found the AMDGPUAS struct in AMDGPU.h which
contains the address space mappings that are shared among subtargets
and the getAMDGPUAS(Triple) function in AMDGPUBaseInfo.cpp, but I'm
not sure where that information gets copied over to something that's
returned by TargetInfo.getAddressSpaceMap(). Hopefully someone who
knows that code better than I can tell me if I'm missing something.

--Aaron

There are some recent changes about address space mapping in Clang
for amdgcn target for better support C++ based kernel languages.

Currently, the constructor of AMDGPUTargetInfo does not initialize
AddrSpaceMap. Instead, it is initialized by virtual member
adjust(LangOptions &Opts) since the address space mapping of amdgcn
target depends on language.

For backward compatibility, I can let constructor of AMDGPUTargetInfo
initialize AddrSpaceMap to the old value. Hopefully this can fix the
regression.

Hi,

what's the expected situation going forward? will there be one
addrspacemap for all languages, or per language maps (set using
adjust())?
it'd be nice if we could adapt clover now and future proof it before
llvm 5 comes out.

Jan

There are some recent changes about address space mapping in Clang
for amdgcn target for better support C++ based kernel languages.

Currently, the constructor of AMDGPUTargetInfo does not initialize
AddrSpaceMap. Instead, it is initialized by virtual member
adjust(LangOptions &Opts) since the address space mapping of amdgcn
target depends on language.

For backward compatibility, I can let constructor of AMDGPUTargetInfo
initialize AddrSpaceMap to the old value. Hopefully this can fix the
regression.

Hi,

what's the expected situation going forward? will there be one
addrspacemap for all languages, or per language maps (set using
adjust())?
it'd be nice if we could adapt clover now and future proof it before
llvm 5 comes out.

Agreed. I wouldn't mind a fix to LLVM/AMDGPU that preserves backwards
compatibility, but if there's something that we can do to mesa/clover
to adapt/fix our usage of llvm, I'm interested in hearing about it.

--Aaron

What are the target triples used by mesa/clover for amdgpu targets?

clover uses 'r600--' and 'amdgcn-mesa-mesa3d'.

Currently amdgpu targets support 4 address space mappings:

OpenCL/private-is-zero
Non-OpenCL/private-is-zero
OpenCL/generic-is-zero
Non-OpenCL/generic-is-zero

The future plan is:
1. deprecate private-is-zero address space mappings and only support
generic-is-zero address space mappings

just out of curiosity, what's the use of private-is-zero mappings?
clover shouldn't really care (or is easily switched).

2. merge OpenCL and Non-OpenCL address space mapping so that it no
longer depends on language

It seems mesa/clover only uses OpenCL/private-is-zero address space
mapping. If that is the case, I can initialize AddrSpaceMap in
AMDGPUTargetInfo based on target triple and skip the adjustment by
Language options for mesa/clover. Then mesa/clover itself does not
need any change. This should still work after the planned two changes
in the future.

thanks, that sounds good to me. let me know if there is a patch to
test. I can test r600 (turks), my carrizo machine is stuck on llvm4.0
because of ROCm.

Jan

Between Jan and I we should be able to manage testing out the change
for both r600/amdgpu back-ends. Feel free to give fiji a test, of
course.

If there's something that we can do to specify the kernel language to
get the existing code to populate the address space map as well, I
wouldn't mind knowing. It probably means that we're missing something
else somewhere as well (e.g. I'm not entirely certain whether clang
compiler options are currently plumbed through correctly from clover
for things like '-cl-std=xx').

Also, if I'm reading the existing code correctly, the global/local
address spaces stay the same for amdgpu as 1 and 3? If not, we've got
a bit of work to do in libclc, since those are the only address spaces
numbers that currently have the atomic functions implemented (which
are then mapped by the r600/amdgpu targets. It's not the end of the
world, but it'd be nice to know if I should get started on that as
well.

The patch for AMDGPUTargetInfo is here https://reviews.llvm.org/D34987

I gave that patch a spin on both my r600 and GCN-generation cards. The
address-space mapping regression is resolved for both to the best of
my knowledge. My test case was to run the CL CTS atomic tests which
use local kernel arguments for the local atomic function tests. The
atomic local tests all failed before. The regressions are now resolved
(there are still a few unrelated test failures)

I suspect that it would be a good thing for us to fix up our clang
front-end creation code in invocation.cpp anyway, as I believe that it
is the root problem of a few issues that I've run into with clover in
the past as well (e.g. -cl-std not being respected).

--Aaron

Between Jan and I we should be able to manage testing out the change
for both r600/amdgpu back-ends. Feel free to give fiji a test, of
course.

If there's something that we can do to specify the kernel language to
get the existing code to populate the address space map as well, I
wouldn't mind knowing. It probably means that we're missing something
else somewhere as well (e.g. I'm not entirely certain whether clang
compiler options are currently plumbed through correctly from clover
for things like '-cl-std=xx').

Also, if I'm reading the existing code correctly, the global/local
address spaces stay the same for amdgpu as 1 and 3? If not, we've got
a bit of work to do in libclc, since those are the only address spaces
numbers that currently have the atomic functions implemented (which
are then mapped by the r600/amdgpu targets. It's not the end of the
world, but it'd be nice to know if I should get started on that as
well.

I think this might be quite a bit of work, so we might want to start
asap. libclc assumes specific addrspace numbers in atomics, vstore_half
helpers, and images. Moreover, we expect the mappings to be the same
for r600, amdgcn, and nvptx. The first one can be rewritten using clang
builtins (i haven't checked whether the semantics of ocl and clang
match), the other two cannot (at the moment).
nvptx includes target specific clang builtins for images, but not
store/load_half. amdgcn/r600 have neither. either way we'll no longer
be able to share the code in libclc.

Jan