[OpenCL] clang can't compile a simple enqueue_kernel with default opt level

Hi all,

I’ve noticed that clang fails to compile a simple OpenCL kernel containing an enqueue_kernel() call when no optimization level is specified:

$ cat test.cl

void foo(size_t id, __global int* out) {
  out[id] = id;
}

kernel void enqueue_foo(__global int* out) {
  size_t id = get_global_id(0);
  void (^fooBlock)(void) = ^{ foo(id, out); };
  queue_t queue = get_default_queue();
  ndrange_t ndrange = ndrange_1D(1);
  enqueue_kernel(queue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange, fooBlock);
}

$ build/bin/clang --version

clang version 7.0.0 (https://git.llvm.org/git/clang.git/ 08712fff7fba84b88e2e57b3c739d53b1aab1ed6)

$ clang -cc1 -emit-llvm -x cl -triple spir64-unknown-unknown -finclude-default-header -cl-std=CL2.0 test.cl

clang: ./llvm/include/llvm/Support/Casting.h:255: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = clang::BlockExpr; Y = const clang::Expr; typename llvm::cast_retty<X, Y*>::ret_type = const clang::BlockExpr*]: Assertion `isa<X>(Val) && "cast() argument of incompatible type!"' failed.

#9 llvm::cast_retty<clang::BlockExpr, clang::Expr const*>::ret_type llvm::cast<clang::BlockExpr, clang::Expr const>(clang::Expr const*) ./llvm/include/llvm/Support/Casting


#10 clang::CodeGen::CGOpenCLRuntime::emitOpenCLEnqueuedBlock(clang::CodeGen::CodeGenFunction&, clang::Expr const*) ./llvm/tools/clang/lib/CodeGen/CGOpenCLRuntime.cpp:125:0

#11 clang::CodeGen::CodeGenFunction::EmitBuiltinExpr(clang::FunctionDecl const*, unsigned int, clang::CallExpr const*, clang::CodeGen::ReturnValueSlot) ./llvm/tools/clang/


#12 clang::CodeGen::CodeGenFunction::EmitCallExpr(clang::CallExpr const*, clang::CodeGen::ReturnValueSlot) ./llvm/tools/clang/lib/CodeGen/CGExpr.cpp:4218:0

#13 (anonymous namespace)::ScalarExprEmitter::VisitCallExpr(clang::CallExpr const*) ./llvm/tools/clang/lib/CodeGen/CGExprScalar.cpp:517:0

#14 clang::StmtVisitorBase<clang::make_ptr, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) ./llvm/build/tools/clang/include/clang/AST/StmtNod


#15 (anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) ./llvm/tools/clang/lib/CodeGen/CGExprScalar.cpp:358:0

Clang compiles OpenCL sources at the '-O2' optimization level by default. However, the tests for enqueue_kernel() in clang are compiled with '-O0', so they do not exercise the default configuration.

So, it seems we have a bug here. Am I right?



Right. I am taking a look.



It seems the assumption in emitOpenCLEnqueuedBlock that E is a BlockExpr does not hold, so the assertion is triggered inside the cast. This can be fixed easily by calling the IgnoreImplicit() method first.
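To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use clang's real classes: Expr, BlockExpr, ImplicitWrapper, ignoreImplicit, asBlockExpr and castToBlock are all hypothetical stand-ins, and the wrapper node is only assumed to behave like an implicit AST node. It shows why a checked cast applied directly to a wrapped expression fails, and why stripping implicit wrappers first lets it succeed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy stand-ins for AST nodes; these are NOT clang's real classes.
struct Expr {
  virtual ~Expr() = default;
  // For implicit wrapper nodes, returns the wrapped expression; else nullptr.
  virtual const Expr *implicitSubExpr() const { return nullptr; }
};

struct BlockExpr : Expr {};

// Hypothetical implicit node that wraps another expression.
struct ImplicitWrapper : Expr {
  explicit ImplicitWrapper(std::unique_ptr<Expr> Sub) : Sub(std::move(Sub)) {}
  const Expr *implicitSubExpr() const override { return Sub.get(); }
  std::unique_ptr<Expr> Sub;
};

// Peel off all implicit wrappers, in the spirit of IgnoreImplicit().
const Expr *ignoreImplicit(const Expr *E) {
  while (const Expr *Sub = E->implicitSubExpr())
    E = Sub;
  return E;
}

// Non-asserting query: returns nullptr if E is not a BlockExpr.
const BlockExpr *asBlockExpr(const Expr *E) {
  return dynamic_cast<const BlockExpr *>(E);
}

// Checked cast in the spirit of llvm::cast<>: asserts on a type mismatch.
const BlockExpr *castToBlock(const Expr *E) {
  const BlockExpr *B = asBlockExpr(E);
  assert(B && "cast() argument of incompatible type!");
  return B;
}
```

In this toy model, calling castToBlock() on a wrapped block trips the assertion, while castToBlock(ignoreImplicit(E)) returns the underlying BlockExpr. Which implicit nodes actually wrap the block in the real AST is not stated in this thread, so the wrapper here is purely illustrative.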

But then there is another issue: the block is generated both during the AST parsing of the block declaration and again on the enqueue_kernel builtin generation path. I think we should avoid generating the block literal from the enqueue_kernel generation, since this doesn’t really match the original compilation flow.

@Sam, I was wondering whether we could avoid generating the literal inside emitOpenCLEnqueuedBlock and instead pass the name of the block and the number of its parameters to createEnqueuedBlockKernel, or perhaps pass the BlockExpr itself. The wrapper kernel function is pretty simple in the end and doesn’t require the block invoke function itself (it can just rebuild the prototype). What do you think?


Thanks Anastasia for investigating the issue.

The wrapper kernel needs to call the block invoke function, which is created by emitOpenCLEnqueuedBlock.


> The wrapper kernel needs to call the block invoke function, which is created by emitOpenCLEnqueuedBlock.

Yes, but do we actually need the block definition for emitting the call (i.e. llvm::Value for the block invoke function)?
auto *V = CGF.EmitBlockLiteral(cast<BlockExpr>(Block), &Invoke);

Could we just recreate the function prototype only, while emitting the kernel body inside createEnqueuedBlockKernel?

I have a fix for this issue: https://reviews.llvm.org/D43240