I have been working on [flang][OpenMP] Parallel region failure · Issue #106297 · llvm/llvm-project · GitHub and believe this shows a broader problem with the LLVM IR generation for delayed privatization. This RFC is to agree on a solution to this problem.
Here is a minimal reproducer
program bug
implicit none
integer :: table(10)
!$OMP PARALLEL PRIVATE(table)
table = 50
if (any(table/=50)) then
stop 'fail 3'
end if
!$OMP END PARALLEL
print *,'ok'
End Program
I can only reproduce this at -O0
in this case, but I believe the same thing could happen for other code at different optimization levels.
Problem Description
Currently, the translation of privatization for omp.parallel
from MLIR to LLVMIR (OpenMPToLLVMIRTranslation.cpp
) works something like this
Initially the privatized variable is referenced via an MLIR block argument to the parallel region.
- Replace (in MLIR) all usages of this block argument with the non-private copy of the variable (constructor for
OmpParallelOpConversionManager
). In this case, the shared copy oftable
is a global variable. - Generate LLVMIR for the body of the parallel region (
bodyGenCB
) using this incorrect reference to the shared copy oftable
- Run the privatization callback (
privCB
) which inlines the region which allocates the private variable and sets itsllvmReplacementValue
argument with what the private variable instance should be replaced with (in LLVMIR). - Replace references to the shared copy of
table
with the value for the private copy oftable
provided by the privatization callback in step 3 (OpenMPIRBuilder::createParallel
). - Undo the changes to the MLIR (destructor for
OmpParallelOpConversionManager
)
The problem comes in step 4. This replacement does not work correctly if the value being replaced is an llvm::Constant
instead of an llvm::Instruction
. In the code example above, we are replacing the address of a global variable (treated as a constant) with a runtime value (an address returned by alloca). If the llvm::Constant
is used by an llvm::Instruction
we replace it successfully (e.g. where it is used in getelementptr
to lower table/=50
). However llvm::Constant
s can also be used as parts of other llvm::Constants
(this is used to create a constant struct initializer used to populate the box passed to the runtime call to Assign
for table = 50
). In this case we cannot insert the result of the alloca into the compound constant (a constant cannot depend upon a runtime value). The conversion silently skips these cases and so not all references to the shared copy of table
are converted.
The result of this is that the assignment table = 50
(or any other runtime call using the box descriptor) is performed on the shared copy of table
not the private copy.
But what about clang?
Why doesn’t clang face the same issue? The privCB
callback implemented by clang looks like a no-op (it sets the llvmReplacementValue
equal to the source value so that no replacement is performed). I haven’t dug into this but presumably clang generates its own code for privatization in the body of the parallel region without relying upon the system in OpenMPIRBuilder
. So far as I can tell, MLIR is the only real upstream usage of the privCB
argument.
Potential Solutions
1. Implement transformations for instructions using the compound constants
In this case, the compound constant is used in a store
. This store instruction can be rewritten to accept a runtime value for the descriptor’s data address field.
This is by far the simplest solution, but I do not think it is general enough. The replacement in step 4 happens in OpenMPIRBuilder
, which is intended to share OpenMP LLVMIR generation amongst all LLVM frontends. We therefore cannot make any assumptions about what LLVM IR is contained inside of the parallel region. It is not possible in general to replace all uses of an llvm::Constant
with a runtime value. Furthermore, it would be difficult to maintain a set of all of the possible transformations to allow instructions using constants to be changed to use runtime values. I have not found any existing helper in LLVM which already does this (but I am not very experienced in this area and might have missed something).
2. Don’t use privCB
in OpenMPToLLVMIRTranslation.cpp
The steps enumerated in the problem description are too error prone. Compare this to how we create private copies of reduction variables:
- Inline and convert to LLVMIR the allocation and initialization of the reduction variables
- Set up a mapping from the MLIR block argument for the reduction variable to the LLVM value for the allocated private reduction variable
- Generate LLVMIR for the body of the parallel region using the correct value for the private reduction variable from the start
Presumably this was not done for privatization because the privatization callback given to OpenMPIRBuilder
runs after the body generation callback and so the llvm values for the private copies of the variable are not available at the time when the body of the parallel region is translated.
This solution would be to move privatization out of privCB
and merge it with how reduction variables are processed. This could allow for some code sharing between privatization and reduction. This is my preferred solution.
The drawback of this approach is that it leaves the privCB
callback unchanged in OpenMPIRBuilder
, and this could lead to the same bug in other front ends.
3. Remove privCB
from OpenMPIRBuilder
Follow on from solution (2) by removing this privatization callback which is no longer used in upstream frontends.
This would require broader buy-in from maintainers of OpenMPIRBuilder and might reduce future code sharing between frontends.
4. Move calls to privCB
before bodyGenCB
in OpenMPIRBuilder
This would allow keeping this callback design, possibly allowing for more code sharing between frontends, but by calling privCB
before bodyGenCB
it can work more like reduction variable privatization (described in solution 2).
This would require broader buy-in from maintainers of OpenMPIRBuilder and would be a silent semantics change to downstream code (it would still build but do something different).