LLJIT: global constant string becomes invalid in generated code


Recently I hit an issue where LLJIT crashes when CodeGenOpt::Less or higher is given.

After some investigation, it turned out that a global constant string in the IR, such as

@.str.117 = private unnamed_addr constant [9 x i8] c"lineitem\00", align 1

becomes an invalid pointer in the generated code.

$1 = 0xf7fab054 <error: Cannot access memory at address 0xf7fab054>

The issue doesn’t show up when CodeGenOpt::None is given. It also doesn’t show up when the host program (which uses the LLVM library) is compiled with “-O0”. So it looks like some kind of use-after-free issue.

Has anyone seen similar issues before, or does anyone have an idea of how to fix this?


Hmm, it turns out that the issue actually lies in the CodeModel. If CodeModel::Small is specified, LLJIT seems to assume that data symbols lie in the first 2GB of memory. It then silently truncates the address of the constant string to a 32-bit pointer, without reporting any error. That’s why the address becomes invalid: the string actually lives at something like 0x7ffff7fab054, but the generated code ends up using 0xf7fab054.
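To make the arithmetic concrete, here is a small standalone sketch of what the truncation does to the address above. It does not use LLVM at all, and the helper names (truncate_to_32, fits_in_low_2gb, demo) are made up for illustration; whether LLJIT truncates in exactly this way internally is my assumption based on the observed addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low 32 bits of an address, which is
// what happens when a 64-bit address is stored into a 32-bit field.
constexpr std::uint32_t truncate_to_32(std::uint64_t addr) {
    return static_cast<std::uint32_t>(addr);
}

// Hypothetical helper: true when the address is below 2GB, so that a
// 32-bit field can hold it without losing information.
constexpr bool fits_in_low_2gb(std::uint64_t addr) {
    return addr < (std::uint64_t{1} << 31);
}

// Illustrative check using the two addresses quoted in this thread.
inline void demo() {
    constexpr std::uint64_t actual = 0x7ffff7fab054ULL;  // where the string really is
    assert(!fits_in_low_2gb(actual));                    // outside the first 2GB
    assert(truncate_to_32(actual) == 0xf7fab054u);       // the bad pointer seen in gdb
}
```

The low 32 bits of 0x7ffff7fab054 are exactly the 0xf7fab054 that gdb could not access, which is consistent with the truncation explanation.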

Given that LLJIT never attempts to allocate its executable code in the first 2GB of memory (I grepped for ‘MAP_32BIT’ in LLVM and found nothing), is it error-prone to allow CodeModel::Small to be specified at all?


On Wed, Nov 4, 2020 at 8:57 PM, Haoran Xu <haoranxu510@gmail.com> wrote: