Adding a return statement into a conditional block causes a EXC_BAD_ACCESS (code=EXC_I386_GPFLT) using LLJIT

tommasopeduzzi · November 2, 2021, 7:57am

Hello!

I recently started making a JIT compiler for my own language and I have just gotten to the point of conditional statements. My code produces the following IR when putting a return statement inside the conditional block:

define double @fib(double %i) {
entry:
  %x = alloca double, align 8
  %i1 = alloca double, align 8
  store double %i, double* %i1, align 8
  store double 0.000000e+00, double* %x, align 8
  %0 = load double, double* %i1, align 8
  %1 = fcmp ult double %0, 2.000000e+00
  %2 = uitofp i1 %1 to double
  %3 = fcmp one double %2, 0.000000e+00
  br i1 %3, label %then, label %else

then:                                             ; preds = %entry
  %4 = load double, double* %i1, align 8
  store double %4, double* %x, align 8
  %5 = load double, double* %x, align 8           ; These two lines seem
  ret double %5**                                 ; to be the problem
  br label %continue

else:                                             ; preds = %entry
  %6 = load double, double* %i1, align 8
  %7 = fsub double %6, 1.000000e+00
  %8 = call double @fib(double %7)
  %9 = load double, double* %i1, align 8
  %10 = fsub double %9, 2.000000e+00
  %11 = call double @fib(double %10)
  %12 = fadd double %8, %11
  ret double %12
  br label %continue

continue:                                         ; preds = %else, %then
  %13 = load double, double* %x, align 8
  ret double %13
}

define double @entry() {
entry:
  %0 = call double @fib(double 1.700000e+01)
  ret double %0
}

These two lines in the then BasicBlock seem to be the problem:

  %5 = load double, double* %x, align 8
  ret double %5**

If I just set the value of %x and return it afterwards the code runs as expected.
When calling the lookup of the “entry” on the LLJIT I get an error like this:

EXC_BAD_ACCESS (code=EXC_I386_GPFLT)

The debugger tracks this down to some machine code, that from what I can see is my compiled program:

	vmovsd %xmm0, (%rsp)
	movq   $0x0, 0x8(%rsp)
	vmovsd 0xfe6(%rip), %xmm1         ; xmm1 = mem[0],zero
	vcmpnlesd %xmm0, %xmm1, %xmm0
	vmovsd 0xfe1(%rip), %xmm1         ; xmm1 = mem[0],zero
	vandpd %xmm1, %xmm0, %xmm0
	vxorpd %xmm1, %xmm1, %xmm1
	vucomisd %xmm1, %xmm0
	je     0x103f35043
	vmovsd (%rsp), %xmm0              ; xmm0 = mem[0],zero
	vmovsd %xmm0, 0x8(%rsp)
	retq                              ; The error occurs here 
	jmp    0x103f35078
	vmovsd (%rsp), %xmm0              ; xmm0 = mem[0],zero
	vaddsd 0xfc0(%rip), %xmm0, %xmm0
	callq  0x103f35000
	vmovsd %xmm0, 0x10(%rsp)
	vmovsd (%rsp), %xmm0              ; xmm0 = mem[0],zero
	vaddsd 0xfb0(%rip), %xmm0, %xmm0
	callq  0x103f35000
	vaddsd 0x10(%rsp), %xmm0, %xmm0
	addq   $0x18, %rsp
	retq
	vmovsd 0x8(%rsp), %xmm0           ; xmm0 = mem[0],zero
	addq   $0x18, %rsp
	retq
	nopw   %cs:(%rax,%rax)
	pushq  %rax
	vmovsd 0xf87(%rip), %xmm0         ; xmm0 = mem[0],zero
	callq  0x103f35000
	popq   %rax
	retq

Specifically to the first retq instruction.

When setting a variable and returning it after the block the problem doesn’t seem to happen.

Can someone explain why the above IR is incorrect and how I can fix the problem?

Any help is enormously appreciated!
Tom

TNorthover · November 2, 2021, 9:30am

A basic block is only allowed one terminator instruction, and both ret and br count, so %then and %else are invalid. I didn’t see anything at a glance wrong in the assembly you pasted, but really all bets are off if you’ve tried to compile invalid IR.

It’d probably be a good idea to run the verifier over any modules you’re producing, at least in debug mode. It’d have caught this issue.

tommasopeduzzi · November 2, 2021, 1:34pm

Thanks for the reply!
The problem here is that the %else block compiles without a problem, even with two terminator instructions. It’s just the %then block that causes the problem. Am I overseeing something?
Another problem I have is that I can’t run the Module Pass Manager over my Module, it tries accessing 0x8, as %rax is null. Do you by any chance know what is going wrong here?

fodinabor · November 2, 2021, 2:32pm

Having two terminators is really just wrong.
Probably all passes expect there to be a single terminator but it’s not necessarily checked by any of them, so it might happen to just work.
But as said, it’s just plain invalid IR, so please fix that before spending any more time investigating other issues.

tommasopeduzzi · November 2, 2021, 3:03pm

You’re right, I was just responding to his recommandations of running the verifier over the modules. Regarding the faulty IR: Shouldn’t the second terminator just be ignored, as it will just leave the block after the first terminator? Why isn’t this the case?

TNorthover · November 2, 2021, 3:13pm

I have no idea if these scenarios actually occur (first seems more likely), but just off the top of my head:

Pass wants to know what happens after %else, so it just grabs the last instruction in the block (the only terminator after all) and assumes flow will continue at %continue. Maybe it forwards values from one block to another or infers that would lead to some UB and wipes out the whole function after there.

Pass wants to calculate how big a basic block is so it iterates through the instructions until it hits a terminator (there can be only one) and stops. Maybe the wrong offset gets put into a jump instruction because of this.

It’s far easier in the long term to just define all those worries away (as well as hitting random assertions where people have been careful) by ensuring the IR is valid in the first place.

tommasopeduzzi · November 2, 2021, 5:46pm

Thanks for your help!
I fixed the problem in this specific case, by checking if the basic block already has a terminator in it before inserting the br instruction. This seems quite janky though.
If I had some pass (possibly the Verifier-Pass) go over the module, would it detect these kind of double terminator basic-blocks and correct them? If so, what Pass could do this?

TNorthover · November 2, 2021, 8:31pm

The verifier pass won’t fix your code (how would it know which terminator you wanted to keep?), but would print an error and return failure. It’s one of the most easy passes to run, it looks like there’s an llvm::verifyModule function you can call after including Verifier.h so you don’t even need to faff around with pass managers.

tommasopeduzzi · November 3, 2021, 3:15pm

Thanks for clearing this up. Indeed, the verifier tells me when the IR is faulty, allowing me to stop the JIT-execution and avoiding a SIGSEV.
Now, I want the user to be able to write multiple instructions, including multiple return instructions, even if the later ones are just ignored. Just blindly including them in the IR makes the frontend incredibly simple. My hope is that there is a pass that removes everything in a basic block after the first terminator instruction, deletes empty basic blocks or deletes unnecessary basic blocks that aren’t referenced anywhere. Do I have to write my own? Is this even a good idea? What would be an alternative that wouldn’t make the frontend too complicated?
If Phi-Nodes are better the mem2reg pass will include these when it’s more efficient. At least that’s what I understood from the official Kaleidoscope tutorial.
Thanks a lot for your help, I really appreciate it!!

fodinabor · November 3, 2021, 3:47pm

You could either just loop over everything that comes after the ret in a BB and erase those instructions or maybe try running llvm::simplifyCFG, which should resemble running the SimplifyCFG pass, which in your example, seems to remove this kind of dead code… see: Compiler Explorer

tommasopeduzzi · November 5, 2021, 8:26am

I tried running the SimplifyCFG pass, but it does not seem to do exactly what I want, which is to remove everything in a Basic Block after the first terminator. I have resorted to writing my own pass, which shouldn’t be too complicated. The problem that I’ve run into with this is that I cant iterate forward, and eraseFromParent() Instructions as I go, as this causes an bad memory access error. I could save when the first terminator instruction occurs and delete the last instruction until the last instruction is the terminator/the size of the basic block is the same as the length up to the first terminator. Is there a better way to do this?

fodinabor · November 5, 2021, 9:00am

The common pattern to do this in LLVM is to have one pass iterating over the instructions of the BB, filling the instructions to be removed into a worklist. Afterwards, you can iterate over the work list and just erase the instructions.

llvm::SmallVector<llvm::Instruction*> WL;
for(auto& I : BB)
  if(...)
    WL.push_back(&I);
for(auto *I : WL)
  I->eraseFromParent();

Topic		Replies	Views
Adding conditionals with LLVM Beginners	4	195	February 5, 2024
Basic block with two return instructions LLVM Dev List Archives	6	69	February 5, 2010
If Conversion and predicated returns LLVM Dev List Archives	3	107	April 10, 2013
JIT Stub Problem LLVM Dev List Archives	0	51	December 19, 2007
Adding an Object File to JIT Beginners	2	737	March 19, 2021

Adding a return statement into a conditional block causes a EXC_BAD_ACCESS (code=EXC_I386_GPFLT) using LLJIT

Related Topics