InferAddrSpace] The operand with non-FLAT-address-space got Undefined when rewriting its user to new address space

Consider the following IR targeted to AMDGPU:

%agg.tmp1 = alloca %struct.float4, align 16, addrspace(5)
%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds %struct.float4, %struct.float4 addrspace(5)* %agg.tmp1, i64 0, i32 0

In InferAddrSpace pass, the %agg.tmp12.sroa.0.0..sroa_idx.i is considered to be in FLAT address space (0) so it’s specific address space
need to be inferred. From %agg.tmp1, the address space is inferred to be PRIVATE address space (5).

When rewriting %agg.tmp12.sroa.0.0..sroa_idx.i to the new address space, the operand %agg.tmp1 is not taken care of, so %agg.tmp1 is set to be Undef, which is wrong.

To resolve this issue, in the function of static Value *operandWithNewAddressSpaceOrCreatePoison, I think below code snippet need to be inserted:

static Value *operandWithNewAddressSpaceOrCreatePoison(
    ...

    if (Value *NewOperand = ValueWithNewAddrSpace.lookup(Operand))                                                                                                                                
        return NewOperand;

    // __Insert this code snippet below__
    // If the address space of the operand already meets the requirement, just return it                                                                                                                                                                                                                                                                                                    
    if (Operand->getType()->getPointerAddressSpace() == NewAddrSpace)                                                                                                                       
        return Operand;
    ...

Any ideas? Thanks in advance.

Can you provide a reproducer, you can try it out on godbold and include the link.

That said, this sounds more like a bug.
Feel free to file it, preferably with reproducer, on our issues tracker on GH.

Hi jdoerfert, thanks for the suggestion.

This is the godbolt link (Compiler Explorer). The input is a llvm-ir targeted to amdgpu, translated from a cuda-example. Actually, I am not quite sure if the input ir is alright in its syntax. However, on my local server, I can observe that the compilation failed after the pass “InferAddrSpace”.

I am new to llvm community, can you kindly tell me how to “fire it on issues tracker on GH”? I have fired an issue on github ([InferAddrSpace] The operand with non-FLAT-address-space got Undefined when rewriting its user to new address space · Issue #64851 · llvm/llvm-project · GitHub)

I am not quite sure if the input ir is alright in its syntax

From the reproducer it’s difficult to judge, it would be much more helpful if you can reduce the IR with llvm-reduce, also attach that to the github ticket.

However, on my local server, I can observe that the compilation failed after the pass “InferAddrSpace”.

Do you mean you get a different error locally compared to the one I see on godbolt?

I am new to llvm community, can you kindly tell me how to “fire it on issues tracker on GH”?

I think @jdoerfert meant to open an issue on GitHub, that you have done :slight_smile:

Hi, thanks for the reply.

  1. Yes you’re right, I will try llvm-reduce to make a minimal reproducible ir;

  2. I think both are of the same error. The difference is, on my local server, I can hack the code and dump all the modules before/after each pass, so I can locate the exact bug. However, on godbolt, it seems I cannot dump the modules before/after each pass if the compilations crashes (correct me if I am wrong);

  3. Well yes, I did not quite understand the abbreviation “GH”, Haha.

One more thing, it seems I cannot handwrite an error-prone llvm-ir and feed it to the llc since the Lexer would check against it and exit ASAP. It seems to me that the paragraph which causes the optimization passes to crash is generated in the fly.

When I say “error-prone llvm-ir”, I mean something look like this:

%agg.tmp1 = alloca %struct.float4, align 16, addrspace(5)             
%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds %struct.float4, %struct.float4 addrspace(5)* %agg.tmp1, i64 0, i32 0        
%agg.tmp12.sroa.0.0.copyload.i = load float, float* %agg.tmp12.sroa.0.0..sroa_idx.i, align 16

The suspicious bug is that when InferAddrSpace is inferring the address space of %agg.tmp12.sroa.0.0..sroa_idx.i, it would actually make %agg.tmp1 to Undef since %agg.tmp1 is not contained in the list of instructions that need to be inferred or predicated (since it has a non-flat address space).

However, if you feed this IR into llc, the Lexer would find that %agg.tmp12.sroa.0.0..sroa_idx.i must be labeled with address space 5 (same as its operand), so it just check fails and exists.

However, on godbolt, it seems I cannot dump the modules before/after each pass if the compilations crashes

You can pass -print-changed or -print-after-all but since the IR is huge you’ll get truncated output which won’t help in this case at least.

One more thing, it seems I cannot handwrite an error-prone llvm-ir and feed it to the llc since the Lexer would check against it and exit ASAP. It seems to me that the paragraph which causes the optimization passes to crash is generated in the fly.

I’m not sure what you mean by lexer here, if you want to custom write a test maybe you can modify the output from the module dump you have locally and only invoke a single pass that causes the crash.

For this case, I think you can try invoking infer-address-spaces pass on a small IR with this pattern to prove your case (see llvm/test/Transforms/InferAddressSpaces/AMDGPU/infer-address-space.ll).

Some updates:

  1. I failed to use llvm-reduce on my original IR since it crashed, the backtrace shows the same error as compiling with llc;

  2. I failed to use opt to run the single pass “infer-address-space” on my handwritten-tiny IR input, since the Lexer would also check against the syntax and exit if it complains.

  3. Godbolt shows different backtrace compared to my local llc version, both of them crashed.

  4. For now, the only way to reproduce this bug, is to compile the original huge IR with my local llc.

OK, so I added print-after-all to llc, the result is huge, but I can locate the error pattern in the dumped IR (attached file). Below is my observation:

On line 75288, we have:

%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds %struct.float4, %struct.float4 addrspace(5)* %agg.tmp1, i64 0, i32 0

%agg.tmp1 first appeared on line 75242, after the pass “Inliner for always_inline functions (always-inline)”:

%agg.tmp1 = alloca %struct.float4, align 16, addrspace(5)

Notice that %agg.tmp1 has addrspace(5), while %agg.tmp12.sroa.0.0..sroa_idx.i is not labeled with a address space, which makes it addrspace(0). This pattern would cause problem in the pass “infer-address-space”.

After “infer-address-space” pass, on line 111516, we have:

%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds

since the operand, %agg.tmp1 is wrongly considered by this pass to be undefined.

I would ask you to update your local llc to the latest version and attach your error-prone IR (reduced or not) as well as the crash backtrace.

This wouldn’t prevent you from using llvm-reduce.

The error is because you are trying to codegen an alloca with the wrong address space. This isn’t supposed to work. This IR is just PTX IR with the triple swapped, full of PTX intrinsics and features (and the wrong alloca addrspace). IR is not portable across targets. You have to produce as amdgcn IR to begin with, with the correct datalayout.

There perhaps is a bug in InferAddressSpaces where allocas using the wrong address space aren’t properly handled. This isn’t a practical concern, since non-addrspace(5) allocas are always going to fail to codegen anyway as you’ve observed

Not quite familiar with llvm-reduce. But as it crashes, no result file is written to the disk.

Well, I have translated the relevant intrinsics into amdgcn IR. The PTX intrinsics and features are just left uncleaned. You can trust me in this, since I have completed the compilation process with llc upon fixing the “bug” in InferAddrSpace.

I am not quite sure if it should be called as a “bug”, because the IR feed into InferAddrSpace actually does not meet the requirements by the Lexer. I am looking into this problem to locate the pass that generated this wrongly IR, which in turn, triggered the misbehavior of InferAddrSpace.

Update:

For now, I think the problem may be rooted from the function void PruningFunctionCloner::CloneBlock( const BasicBlock *BB, BasicBlock::const_iterator StartingInst, std::vector<const BasicBlock *> &ToClone) (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/CloneFunction.cpp)

// Eagerly remap operands to the newly cloned instruction, except for PHI
// nodes for which we defer processing until we update the CFG. Also defer
// debug intrinsic processing because they may contain use-before-defs.
if (!isa<PHINode>(NewInst) && !isa<DbgVariableIntrinsic>(NewInst)) {
    RemapInstruction(NewInst, VMap, ModuleLevelChanges ? RF_None : RF_NoModuleLevelChanges);
    ...
}

This code snippet is to clone an instruction to a new instruction with operands mapped to new ones. However, the caller failed to provide any TypeMapper.

For example, the old instruction is

%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds %struct.float4, %struct.float4* %bodyPos, i64 0, i32 0

Its first operand %bodyPos is to be replaced by %agg.tmp1.

But %agg.tmp1 is defined in address space 5, so %agg.tmp12.sroa.0.0..sroa_idx.i must also be defined in address space 5.

agg.tmp1 = alloca %struct.float4, align 16, addrspace(5)

But what returned by RemapInstruction is

%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds %struct.float4, %struct.float4 addrspace(5)* %agg.tmp1, i64 0, i32 0

while expecting:

%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds %struct.float4, %struct.float4 addrspace(5)* %agg.tmp1, i64 0, i32 0, addrspace(5)

Huh? That second thing isn’t valid syntax. A GEP only has one address space, and that’s the address space of the pointer given to it, which is correct in the first line.

You are right. I made a mistake.
The actual error is the 3rd line:

%agg.tmp1 = alloca %struct.float4, align 16, addrspace(5)
%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds %struct.float4, %struct.float4 addrspace(5)* %agg.tmp1, i64 0, i32 0
%agg.tmp12.sroa.0.0.copyload.i = load float, float* %agg.tmp12.sroa.0.0..sroa_idx.i, align 16

while expecting:

%agg.tmp1 = alloca %struct.float4, align 16, addrspace(5)
%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds %struct.float4, %struct.float4 addrspace(5)* %agg.tmp1, i64 0, i32 0
%agg.tmp12.sroa.0.0.copyload.i = load float, float addrspace(5)* %agg.tmp12.sroa.0.0..sroa_idx.i, align 16

If that’s how it manifests, it should be trivial to reduce the IR just by running the IR verifier on the output for the interestingness check.

That’s the wired thing. Given

%agg.tmp12.sroa.0.0..sroa_idx.i = getelementptr inbounds %struct.float4, %struct.float4 addrspace(5)* %agg.tmp1, i64 0, i32 0

If you try to do

V->getType()->getPointerAddressSpace()

what you get is

float*

And it causes the ‘Infer Address Space’ to go mad.

No you don’t, an address space is an integer, but float* is a type.

Typo, you get 0 in this case. float* is when you do V->getType()->dump()

Update:

I believe the root cause has been found. It is the translation from nvvm->amdgpu_llvm.

In nvvm, alloca insructions are defaulted to addrspace(0) (or llc believes so), so when llc parses nvvm, the llvm::values are created with addrspace(0).

Later in optimization passes (for example ‘inline function’), new allocas are created with addrspace(5) (to meet the requirement of AMDGPU DataLayout). But the type of the users (for example GEP) remains to be addrspace(0). So this pattern drives the ‘Infer Address Space’ to go mad.

Looking at the -print-after-all output it seems that AlwaysInliner is garbling things here when trying to handle byval arguments whose address spaces aren’t the alloca address space. When it creates the alloca it uses the default alloca address space (and that propagates to the direct uses of it, the GEPs), but the subsequent uses of it use the original alloca address space.

Compiler Explorer is a simple test to show the problem. LangRef doesn’t obviously have anything to say about this (and the IR verifier doesn’t balk at the input), but it is quite a strange thing to have.

Thanks for the compiler explorer demo, it’s great!

Not quite sure if this pattern (allocas exist with address space different from DataLayout requirement) is legal to AlwaysInliner.

If it is illegal, it would be better for AlwaysInliner to complain ASAP; otherwise, it is a bug need to be fixed in AlwaysInliner.

Infer Address Space pass has a chance to fix the IR (as I have shown in the first question of this post), but again, is this pattern legal to it (address space of user is flat, while address space of operand is non-flat) ?

Which is the pass need to be fixed?
Or at the end of the day, is the input IR legal?