Recursive types

I noticed that LLVM IR allows struct types to contain themselves, as in this test case: llvm-project/llvm/test/Verifier/recursive-type-1.ll at main · llvm/llvm-project · GitHub

%rt2 = type { i32, { i8, %rt2, i8 }, i32 }

Is there a real use case for this? If not, could we ban it at type construction time (and in StructType::setBody) so that other code (like StructType::isSized) would not have to worry about it?

Yes, we should forbid this. The only legitimate use of type recursion I’m aware of was via pointee types, and that went away with opaque pointers.

My preferred way of forbidding this would be to require all struct member types to be specified up-front for named structs, in the same way as they are for literal structs. This not only removes the recursion, but also the mutability from the type system. (The separate setting of the body used to be necessary for the pointee type recursion.)

1 Like

Do you think IR still needs all three of opaque, identified and literal structs?

Incidentally if we manage to remove setBody then I think LangRef should be updated to remove any suggestion that “opaque types” are a kind of structure type (even if we still implement them as a kind of llvm::StructType).

I think that identified structs may still be useful from an IR readability perspective, but I don’t think we need opaque structs anymore.

FWIW, I think there’s some merit to keeping the potential for recursion in target extension types, since those could be of a “typed pointer” nature.

I assume you mean recursion through the target extension types’ type parameters, not through their layout type. I agree that that sounds less problematic than types like %rt2 above that try to contain themselves in a layout sense. But, if we remove StructType::setBody then there will be no way to create recursion of any kind in the type system.

1 Like

Removing StructType::setBody doesn’t necessarily conflict with what I wrote. Recursion through target ext types would need some new mechanism / new entry point in the first place. I think it’s reasonable to simplify now and then maybe add complexity again later when/if it’s warranted.

+1 to removing recursive structs and type mutability, for what it’s worth

(There’s a whole pile of code I had to clone into LowerBufferFatPointers that would be way simpler if it could just take a recursive walk through a type without having to worry about cycles that can’t exist these days anyway)

I’ve been looking into what it would take to get rid of type mutability, and one annoying case I ran into is that the IR parser allows type forward references, so something like this is valid:

%a = type { %b }
%b = type { i8 }
@g = external global %a

However, we not just allow it, it’s also how the IR gets printed (all types grouped at the top, but with definitions after uses). So we can’t remove support for this without removing the ability to parse a lot of old IR.

It would be possible to work around this (by having our own type representation in LLParser for incomplete types), but it’s going to be rather ugly.

Does bitcode have a similar problem? I think forward references are allowed in bitcode, but are they commonly generated in practice?