clang10 mis-compiles simple C program transpiled from brainfxxk

Hello,

I’m really amazed to find out that under -O3, a simple piece of C code generated from a brainfxxk-to-C transpiler is miscompiled.

As one probably know, the C code transpiled from brainfxxk only contains 3 kind of statements:

(1) ++(*ptr) / --(*ptr)

(2) ++ptr / --ptr

(3) while (*ptr) { … }

where ptr is a uint8_t*.

So it seems very clear to me that the code contains no undefined behavior (the pointer is uint8_t* and unsigned integer overflow is not UD).

After further investigation, it seems like clang compiled this loop:

while (*ptr) {
–(*ptr);
++ptr;
++(*ptr);
–ptr;
}

to an unconditional infinite loop under -O3, resulting in the bug. The code snippet above seems completely benign to me.

I attached the offending program. With

clang a.c -O0

it worked fine (it should print out an ASCII-art picture of mandelbrot fracture). However, with -O1 or -O3, it goes into a dead loop (in the code snippet above) after printing out a few characters.

I also tried UndefinedBehaviorSanitizer. Strangely, when compiling using

clang a.c -O3 -fsanitize=undefined

the code worked again, with no infinite loop, and no undefined behavior reported.

So it seems to me a LLVM optimizer bug. I would greatly appreciate if any one is willing to investigate.

Best,
Haoran

src.zip (4.64 KB)

I did a simple bisect on clang version, and it seems like clang 8.0.0 works correctly, but clang 9.0.0 failed to compile the code correctly.
https://godbolt.org/z/676Grr ← if you change the clang version to 8.0.0, you will see the expected output in ‘output’ section.
I don’t have the ability to bisect on clang git history. I would greatly appreciate it if any one is willing to do that.

Thanks!

Haoran Xu <haoranxu510@gmail.com> 于2020年10月21日周三 下午8:47写道:

A further bisect using opt’s -opt-bisect-limit option shows that the following pass is causing the issue:

BISECT: running pass (39) Early CSE w/ MemorySSA on function (main)

Haoran Xu <haoranxu510@gmail.com> 于2020年10月21日周三 下午9:00写道:

Might be worth running the c source file through creduce or similar to narrow it down a bit that way too.

I was just able to determine the offending IR code before and after the transformation. I’m now almost certain it’s a bug in LLVM.

Before transformation, we have the following IR (I renamed all %xxx for brevity):

%1 = load i8, i8* %0, align 1
%2 = add i8 %1, -1
store i8 %2, i8* %0, align 1

The above IR is inside a loop, so the value in %0 can be different in each run.

The optimization pass changed the IR above to the following:

store i8 %3, i8* %0, align 1

where %3 is defined by

%4 = load i8, i8* %0, align 1
%3 = add i8 %4, -1

in an earlier piece of IR.

Apparently the pass treated %3 the same thing as %2 and it fired CSE, without realizing that the content in %0 may have been changed by the loop.

David Blaikie <dblaikie@gmail.com> 于2020年10月21日周三 下午10:18写道:

Hi David,

I just tried creduce, but it generated a code with completely undefined behavior (out of range accesses, etc).
The “interesting” criteria I used is “timeout in -O1 but finishes in 20s in -O0”. However, it turns out that the undefined behavior in creduce-generated code just makes the criteria “happens to” pass.

Best,
Haoran

Haoran Xu <haoranxu510@gmail.com> 于2020年10月21日周三 下午10:32写道:

I managed to bisected out the diff introducing the bug: https://github.com/llvm/llvm-project/commit/bfc779e491099213d74c057b5727bde976c7ba02

And I managed to figured out a fix: it seems (to me) that the diff is mostly a code refactoring diff, but the author accidentally removed two lines of functional logic, resulting in the bug.
After adding the two lines back, the miscompiled code works fine now. However, I’m not certain of the performance implications of clearing the whole AAQI.

diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index 232132958f7…9a8ca46093d 100644
— a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -836,6 +836,8 @@ AliasResult BasicAAResult::alias(const MemoryLocation &LocA,
AliasResult Alias = aliasCheck(LocA.Ptr, LocA.Size, LocA.AATags, LocB.Ptr,
LocB.Size, LocB.AATags, AAQI);

  • AAQI.AliasCache.shrink_and_clear();
  • AAQI.IsCapturedCache.shrink_and_clear();
    VisitedPhiBBs.clear();
    return Alias;
    }

I also noticed that the author forgot to change a few function calls to use his new API, but I’m not sure if it is a correctness issue or not. At least it’s not causing the brainfxxk code miscompilation.

diff --git a/llvm/lib/Analysis/AliasAnalysis.cpp b/llvm/lib/Analysis/AliasAnalysis.cpp
index 06a33f659c1…94a9d12eb17 100644
— a/llvm/lib/Analysis/AliasAnalysis.cpp
+++ b/llvm/lib/Analysis/AliasAnalysis.cpp
@@ -231,7 +231,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase *Call,

// If Loc is a constant memory location, the call definitely could not
// modify the memory location.

  • if (isModSet(Result) && pointsToConstantMemory(Loc, /OrLocal/ false))
  • if (isModSet(Result) && pointsToConstantMemory(Loc, AAQI, /OrLocal/ false))
    Result = clearMod(Result);

return Result;
@@ -308,7 +308,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase *Call1,

// ModRefC1 indicates what Call1 might do to Call2ArgLoc, and we use
// above ArgMask to update dependence info.

  • ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc);
  • ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc, AAQI);
    ArgMask = intersectModRef(ArgMask, ModRefC1);

// Conservatively clear IsMustAlias unless only MustAlias is found.
@@ -349,7 +349,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase *Call1,
// might Mod Call1ArgLoc, then we care about either a Mod or a Ref by
// Call2. If Call1 might Ref, then we care only about a Mod by Call2.
ModRefInfo ArgModRefC1 = getArgModRefInfo(Call1, Call1ArgIdx);

  • ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc);
  • ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc, AAQI);
    if ((isModSet(ArgModRefC1) && isModOrRefSet(ModRefC2)) ||
    (isRefSet(ArgModRefC1) && isModSet(ModRefC2)))
    R = intersectModRef(unionModRef(R, ArgModRefC1), Result);

I’ll submit a patch for review, but I would really appreciate any comments here as well.

Thanks.

Best,
Haoran

Haoran Xu <haoranxu510@gmail.com> 于2020年10月22日周四 上午12:12写道:

I suspect that you found the same issue as https://github.com/llvm/llvm-project/commit/ddd0f083184feb489684f87cf28d5eb10e0a244e. We’re trying to find a solution to that, but it’s not entirely simple.

Nikita

I don’t know anything about LLVM internals, but solely from the commit message of your diff I think it’s very likely.

So my current understanding is that the shrink_and_clear() is required to produce correct aliasing result, but if we naively do it for AAQI (as my patch proposed), we probably lose most of the performance benefits of BatchedAA.

However, this is only my understanding after spending 2 hours reading the code from zero background knowledge, so I guess I can’t say much about it.

Nikita Popov <nikita.ppv@gmail.com> 于2020年10月22日周四 上午6:41写道:

Some clarifications to the description in https://reviews.llvm.org/D89991:
The clearing of the cache is not supposed to be done in BatchAA. The patch introducing BatchAA was not a refactoring patch, and it did not accidentally miss the lines to clear those caches.

Nikita flagged the issue and I looked into it. The condition that’s broken in BatchAA is the isValueEqualInPotentialCycles. This condition can be satisfied in a query and a result cached, but the condition broken in a subsequent query. Working out a fix is not straightforward.
Yes, clearing the whole cache is a big hammer we can use, and, yes, it will lose all the performance benefits.

For visibility, I uploaded a WIP fix:
https://reviews.llvm.org/D90062

There are still problems I have yet to solve in the above patch, hence the lack of reviewers. However, please feel free to test it and let me know if it resolves your failure.

Best,
Alina

Hi Alina,

Thanks for your reply! That makes sense to me.

Regarding the other point I mentioned (the missing AAQI parameters in some of the function calls), do you think they make sense, or is it intentional that you use the non-batchAA version there?
My current understanding is that changing them to use BatchAA might expose more issues that triggers the isValueEqualInPotentialCycles bug, but if you have the bug fixed, make them use BatchAA as well could probably improve performance a bit.

Best,
Haoran

Alina Sbirlea <asbirlea@google.com> 于2020年10月23日周五 上午11:21写道:

Hi Haoran,

Yes, you’re right, the instances you added in the patch can also pass along the AAQI param.
Do you have commit access to check in this change or should I do it on your behalf?

The fix Nikita proposed (https://reviews.llvm.org/D90066) should resolve the original issue in AA and BatchAA, but please let us know if you’re still seeing it with this patch.

Best,
Alina

Hi Alina,

Thanks for your reply. I don’t have commit access (it’s my first patch to LLVM), so please do it on my behalf.
It’s kind of hard for me to compile LLVM on my local machine, so can you just check that the failing code I attached in original email works (i.e. does not dead-loop and prints out a fracture-like ASCII art) in the original email after Nikita’s diff?

Thanks!
Haoran

Alina Sbirlea <asbirlea@google.com> 于2020年10月23日周五 下午3:48写道:

Hi Haoran,

I think you may have come across an issue that was fixed already. I cannot reproduce the failure at ToT.
In godbolt it reproduces with the clang11 release, but not with trunk.

Best,
Alina

That’s interesting. I didn’t try trunk since I thought LLVM 11 is already the latest. Thanks again!

Alina Sbirlea <asbirlea@google.com> 于2020年10月23日周五 下午5:15写道: