crash with large structure values on the stack


This example input crashes if you run it through llc on x86.

[begin example]

; ModuleID = 'test'

%struct_2 = type { [90000 x %struct_1] }
%struct_1 = type { i8 }

define void @testFcn(%struct_2 %in1) {
testFcn_entry:
  %in1_ = alloca %struct_2
  store %struct_2 %in1, %struct_2* %in1_, align 8
  %localStruct_ = alloca %struct_2
  store %struct_2 %in1, %struct_2* %localStruct_, align 8
  br label %exit

exit:                                             ; preds = %testFcn_entry
  ret void
}

[end example]

It looks like at some stage of the backend compiler flow a "merge_values" instruction is generated whose number of inputs exceeds what fits in the field that stores it (the count is kept in an unsigned short, so it gets truncated). When this instruction is translated into x86 machine code, there is an out-of-bounds access:

~> llc bug-simple.bc
llc: /local/martind/oss/llvm-3.5.0.src/include/llvm/CodeGen/SelectionDAGNodes.h:649: llvm::EVT llvm::SDNode::getValueType(unsigned int) const: Assertion `ResNo < NumValues && "Illegal result number!"' failed.

Probably the truncation of NumOperands should be caught directly with an assertion in the SDNode constructor.

One other interesting aspect of it is that if you make the struct_2 type a smaller array, like:

%struct_2 = type { [65534 x %struct_1] }

Then you don’t get a crash; instead it takes 20+ minutes to process and you get a huge number of movb instructions out - around 200k or so. Callgrind says that 33% of the time consumed is directly in llvm::SUnit::ComputeHeight(). Clearly something has gone badly non-linear for this case.

I looked at what clang is doing for a similar construct and I see it generates a memcpy intrinsic instead of the direct load/store. I’m now doing that in my own front-end, but it seems like this is intended to be supported so I thought I’d report it. It seems like turning this construct into memcpy as an optimization or part of the back-end lowering might be a better long-term approach.
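For reference, roughly what the memcpy-based form looks like - this is a hand-written sketch in LLVM 3.5-era typed-pointer syntax, not actual clang output, and clang would also normally pass an aggregate this large byval rather than as a first-class value:

```llvm
%struct_1 = type { i8 }
%struct_2 = type { [90000 x %struct_1] }

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32, i1)

define void @testFcn(%struct_2* byval align 8 %in1) {
  %localStruct_ = alloca %struct_2
  %dst = bitcast %struct_2* %localStruct_ to i8*
  %src = bitcast %struct_2* %in1 to i8*
  ; copy the 90000-byte aggregate in one intrinsic call instead of a
  ; first-class aggregate load/store
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 90000, i32 8, i1 false)
  ret void
}
```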

I filed this as bug #21671 in bugzilla.

Let me know if you’d like more information or if I can help come up with a fix. (I’m not familiar at all with the backend so I’d need some guidance to be of much help.)

Dale Martin

While it would be good to get this bug fixed, I want to point out that frontends are generally discouraged from emitting large aggregate loads and stores. It’s considered “more canonical” for the frontend to use @llvm.memcpy to move this stuff around, and then load the individual elements as needed. See past discussions with deadalnix about optimizer problems in this area.

I believe this is the same as

I had a dumb fix for this which simply bumped up the size of SDNode::NumOperands and SDNode::NumValues.

Hal had very reasonable reservations with this approach because it increases the size of a structure which is used a lot.

I think an assertion that NumOperands doesn’t get truncated during construction would be a good first step. (From your review it looks like that has been added - great!) But then whatever is creating these merge_values nodes should probably be visited next - it doesn’t seem like a scalable way to generate code. I understand Reid’s statement that this isn’t a high priority.