May IR types be merged by llvm-link?

Hi all,

The concept behind LLVM IR is somewhat ambiguous, which leads to unexpected IR in some cases. The problem is described here: https://reviews.llvm.org/D40567#943747. In short, llvm-link tries to merge an opaque type with its definition, matching them by type name. Clang uses the same name for all specializations of a class template, so in this case llvm-link picks an arbitrary type as the definition. As a result, the opaque type is mapped to the wrong type in the IR.
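
To make the failure mode concrete, here is a toy model in C++. It is only a sketch and not llvm-link's actual code: `ToyType`, `resolveByName` and `demoMismatch` are hypothetical names, and the type name `struct.S` with bodies written as strings is an assumption chosen for illustration. It shows that matching an opaque type to a definition by name alone succeeds even when the definition belongs to a different specialization.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an identified IR struct type.
struct ToyType {
    std::string name;  // e.g. "struct.S" (template arguments are not part of it)
    std::string body;  // e.g. "{ float }"; unused when the type is opaque
    bool opaque;       // true models "type opaque"
};

// Resolve an opaque type purely by name: return the first non-opaque
// definition with the same name, or nullptr if there is none.
const ToyType* resolveByName(const ToyType& opaqueType,
                             const std::vector<ToyType>& defs) {
    for (const ToyType& def : defs) {
        if (!def.opaque && def.name == opaqueType.name) {
            return &def;
        }
    }
    return nullptr;
}

// Module A only forward-declares S<int>; module B defines S<float>.
// Both carry the same toy name, so the name lookup pairs them up.
std::string demoMismatch() {
    ToyType opaqueSInt{"struct.S", "", true};

    std::vector<ToyType> defsInB;
    defsInB.push_back(ToyType{"struct.S", "{ float }", false});

    const ToyType* resolved = resolveByName(opaqueSInt, defsInB);
    // Copy the body out before defsInB goes out of scope.
    return resolved ? resolved->body : std::string();
}
```

Here `demoMismatch()` returns `{ float }`: the opaque stand-in for `S<int>` was resolved to the body of `S<float>`, because nothing but the name was compared.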

The question is whether the opaque type resolution performed by llvm-link is a correct operation, which in turn depends on which source-language entities are represented in IR. Variables and functions must exist in IR because they are directly represented in object files, but types have no such requirement. There are at least two viewpoints on which source-language entities should be represented in IR.

Case 1

LLVM IR is a functional equivalent of the compiled program. It tries to preserve information about externally visible objects (variables, functions) that may be used in operations on IR modules. In particular, since C++ defines rules of equivalence for types defined in different translation units, and these rules make opaque type resolution possible, the IR must have an equivalent for each C++ type.

In this case clang must provide appropriate IR type identification, so that the same type in different translation units can be recognized. This can be done by assigning each type a unique name.
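
One way to picture the unique-name idea is sketched below. This is an assumption for illustration, not what clang emits: the helper `specializationName` and the `struct.S<int>` spelling are hypothetical. The point is only that once template arguments are part of the name, two specializations can no longer be confused by a name comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build a hypothetical per-specialization type name such as "struct.S<int>".
// The exact spelling is an assumption; the only requirement is that distinct
// specializations get distinct names.
std::string specializationName(const std::string& templateName,
                               const std::vector<std::string>& args) {
    std::string name = "struct." + templateName;
    if (args.empty()) {
        return name;  // not a template specialization: plain name
    }
    name += "<";
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i != 0) {
            name += ",";
        }
        name += args[i];
    }
    name += ">";
    return name;
}

// Name-based matching is only safe when equal names imply equal types.
bool namesMatch(const std::string& a, const std::string& b) {
    return a == b;
}
```

With such names, `namesMatch(specializationName("S", {"int"}), specializationName("S", {"float"}))` is false, so an opaque `S<int>` would not be paired with the definition of `S<float>`.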

Case 2

LLVM IR is a low-level representation designed for code generation. Some information about externally visible objects may be lost, which is to be expected. In particular, IR types belong solely to the internal machinery and have no relation to the types used in the source language.

In this case the opaque type name resolution performed by llvm-link is an incorrect operation and must be removed. Only functions and variables may be merged, and opaque type resolution may occur only as a side effect of such a merge.

It looks like clang and llvm-link currently follow different concepts.

I wonder which viewpoint complies with the IR design.

This simply doesn’t reflect reality. Opaque and identified structure types exist to make IR easier to understand for humans; the names don’t affect the semantics of any IR instruction, and transformations frequently throw away types to perform optimizations. clang should try to do this anyway; not because we have to, but just to make IR easier to read.

-Eli

Thank you, Eli.

Hello Eli,

Nonetheless, it seems that the opaqueness of types affects analyses and code generation; see the isSized() calls. So maybe it’s not purely about readability.