Named metadata to represent language-specific logic

Hi all,

I was wondering if we could use named metadata to store some C++
logic without changing the IR. This is mainly for building
front-ends, and the resulting IR (with or without metadata) should be
the same as it is today (or better).

I say this because of the number of global variables front-ends need
to keep, since LLVM IR cannot represent all the information about
types, variables and functions (sizes, offsets, alignment, linkage
semantics, etc.). So, if we could generate some generic IR with
annotations, and run a pass before validation that converts all
those annotations into another, lower IR, coding front-ends would be
much simpler.

That would also allow back-ends to understand the named metadata and
possibly generate correct code without needing the final
pass, but I gather that some people find it repulsive to have metadata
with meaning in IR, so I won't go as far as to suggest that... ;)

Some examples below. Don't pay too much attention to the syntax or the
contents, I'm just brainstorming...

;====================================
; Unions & bitfields
; union U { int a; int b:3; int c:3; char d; }
%union.U = type { i32 }, !union;

!union = metadata { metadata !U.a, metadata !U.bc, metadata !U.d };
!U.a = metadata { metadata !intID, metadata !"align", i8 4 };
!U.bc = metadata { metadata !U.b, metadata !U.c };
!U.b = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3 };
!U.c = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3, metadata !"offset", i8 3 };
!U.d = metadata { metadata !charID, metadata !"align", i8 4 };
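
To show what `{ i32 }` alone no longer says about this union, here is a small C++ sketch. `MemberInfo`, `kMembersOfU` and `storage_bytes` are hypothetical names invented for illustration, not part of any front-end. One caveat about the brainstormed metadata: in C++ every member of a union, bit-fields included, starts at offset 0, so the `offset` of 3 on `c` would only apply if `b` and `c` were packed one after the other as in a struct.

```cpp
#include <climits>
#include <cstddef>

// The union from the example: all members overlap at offset 0.
union U {
    int a;
    int b : 3;
    int c : 3;
    char d;
};

// Hypothetical record of what the front-end knows about one member
// and what the IR type `{ i32 }` does not carry.
struct MemberInfo {
    const char *name;
    std::size_t align_bytes;  // alignment of the declared type
    std::size_t width_bits;   // declared width, in bits
};

constexpr MemberInfo kMembersOfU[] = {
    {"a", alignof(int), sizeof(int) * CHAR_BIT},
    {"b", alignof(int), 3},
    {"c", alignof(int), 3},
    {"d", alignof(char), CHAR_BIT},
};

// Minimum storage the union needs: its widest member, rounded up to bytes.
constexpr std::size_t storage_bytes(const MemberInfo *m, std::size_t n) {
    std::size_t widest = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (m[i].width_bits > widest) widest = m[i].width_bits;
    return (widest + CHAR_BIT - 1) / CHAR_BIT;
}
```

The actual `sizeof(U)` and `alignof(U)` are chosen by the target ABI; the table only records the declared widths that a lowering pass would need. This needs C++14 or later for the loop inside a `constexpr` function.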

;====================================
; Linkage information on a function
; extern inline f_() { return "const string"; } // "const string" HAS
; to be common to ALL comp.units
define linkonce_odr i8* @_Z2f_f() nounwind inlinehint, !extern {
entry:
  ret i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0)
}

!extern = metadata { metadata !"common group", metadata !"_Z2f_f" };

-> so, inside a function that has "extern" metadata, returning the
constant string should place the string into a common group, even
though the string itself is not declared as such.

;====================================
; Class size
; struct Base { char a[3]; Base() {} };
; struct Derived : Base { char b; }
%struct.Base = type { [3 x i8] }, !BasePadding;
%struct.Derived = type { %struct.Base, i8 }, !DerivedPadding;

!BasePadding = metadata { metadata !"size", i8 1 };
!DerivedPadding = metadata { metadata !"size", i8 3 };

So, Base's padding is only applied when inside Derived, and GEP can
still work on the element directly. Sizes could be relative to WORD
size, if one wanted a truly generic IR, but that would raise a lot of
questions... Ignore that for now.
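
As a rough illustration of the size information such metadata would carry, the sketch below asks the C++ compiler for the layout of the two classes above. `padding_bytes`, `SizeInfo` and `kClassSizes` are hypothetical names made up for this example, and the numbers they yield depend on the target ABI, so treat this as a way to inspect a layout rather than a statement of what the sizes are.

```cpp
#include <cstddef>

// The two classes from the example above.
struct Base { char a[3]; Base() {} };
struct Derived : Base { char b; };

// Bytes of sizeof(T) not covered by `payload_bytes` of declared data.
// The result is decided by the target ABI, not by the language.
template <typename T>
constexpr std::size_t padding_bytes(std::size_t payload_bytes) {
    return sizeof(T) - payload_bytes;
}

// Hypothetical record a front-end might attach to each IR type.
struct SizeInfo {
    const char *ir_type_name;
    std::size_t size_bytes;
    std::size_t padding;
};

constexpr SizeInfo kClassSizes[] = {
    {"struct.Base", sizeof(Base), padding_bytes<Base>(3)},       // 3 bytes in a[3]
    {"struct.Derived", sizeof(Derived), padding_bytes<Derived>(4)},  // a[3] plus b
};
```

On an ABI that pads `Base` up to a word, the first entry would report one byte of padding while `Derived` reports none, which is the case the `!BasePadding` annotation is meant to describe.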

The final pass would lower all GEPs into those classes and unions, and
all constant returns, into the confusing IR we have today.

I know each front-end could do that on its own, but if there is
interest among other front-end developers (especially C++) in having
such a feature, we could take a more generic approach, so we could
extend support for specific languages without drastically changing the
LangRef. (As a matter of fact, is that something we want in the long
run?)

Would that benefit other languages that cannot be properly represented
in IR? OpenCL?

Thoughts welcome, even harsh ones. ;)

Hi Renato,

I have also been applying extensible metadata in my project. :)

> ;====================================
> ; Unions & bitfields
> ; union U { int a; int b:3; int c:3; char d; }
> %union.U = type { i32 }, !union;
> !union = metadata { metadata !U.a, metadata !U.bc, metadata !U.d };
> !U.a = metadata { metadata !intID, metadata !"align", i8 4 };
> !U.bc = metadata { metadata !U.b, metadata !U.c };
> !U.b = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3 };
> !U.c = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3, metadata !"offset", i8 3 };
> !U.d = metadata { metadata !charID, metadata !"align", i8 4 };

I think a type cannot carry named metadata in the current LLVM code.
If you distinguish a type with named metadata from a type without
it, you will also have to change the type system and the code
related to it. For example, to distinguish "type { i32 } !union" from
"type { i32 }", StructValtype has to change, and then the type of a
value with "type { i32 } !union" becomes distinct from the type of a
value with "type { i32 }". This will cause confusion in the related code.

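To make the concern concrete, here is a simplified C++ model of struct types being identified purely by their element list. It is not LLVM's actual code: `StructKey`, its `annotation` field and both comparison functions are hypothetical names, and the only assumption taken from the discussion is that two struct types with the same elements are the same type today.

```cpp
#include <cstddef>

enum class Elem { I8, I32 };

// Simplified stand-in for the key that identifies a struct type.
struct StructKey {
    Elem elems[4];
    std::size_t count;
    int annotation;  // hypothetical: 0 = no metadata, otherwise a metadata id
};

constexpr bool same_elements(const StructKey &a, const StructKey &b) {
    if (a.count != b.count) return false;
    for (std::size_t i = 0; i < a.count; ++i)
        if (a.elems[i] != b.elems[i]) return false;
    return true;
}

// Today: only the element list decides type identity.
constexpr bool same_type_today(const StructKey &a, const StructKey &b) {
    return same_elements(a, b);
}

// If metadata became part of the type, identity would have to include it.
constexpr bool same_type_with_metadata(const StructKey &a, const StructKey &b) {
    return same_elements(a, b) && a.annotation == b.annotation;
}

constexpr StructKey kPlain = {{Elem::I32}, 1, 0};  // type { i32 }
constexpr StructKey kUnion = {{Elem::I32}, 1, 1};  // type { i32 } !union
```

Under the first rule `kPlain` and `kUnion` are one type; under the second they are two, and every piece of code that compares or casts between them would have to know about the difference.
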
I suggest a single named metadata node that lists all union types, as follows:

%union.U = type { i32 };

!llvm.uniontypes = metadata !{!0}

!0 = metadata !{metadata !"union.U", metadata !1, metadata !2, metadata !5 };
!1 = metadata !{ metadata !intID, metadata !"align", i8 4 };
!2 = metadata !{ metadata !3, metadata !4 };
!3 = metadata !{ metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3 };
!4 = metadata !{ metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3, metadata !"offset", i8 3 };
!5 = metadata !{ metadata !charID, metadata !"align", i8 4 };

"!0" must point to its own IR type, for example through its first argument (metadata !"union.U").
Another method may be needed to point to the IR type, because the
initializer of a union type sometimes has a temporary type.
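
A minimal C++ sketch of how a pass might consume such a table, looking a record up by the IR type name stored in its first operand. `MemberRecord`, `UnionRecord`, `kUnionTypes` and `find_union` are hypothetical names, and a value of 0 for `size` or `offset` simply means the metadata above did not give one.

```cpp
#include <cstddef>

// One member entry, mirroring !1, !3, !4 and !5 above.
struct MemberRecord {
    const char *type_id;
    unsigned align;
    unsigned size;    // 0 = not given
    unsigned offset;  // 0 = not given
};

// One entry of the table, keyed by the IR type name (first operand of !0).
struct UnionRecord {
    const char *ir_type_name;
    const MemberRecord *members;
    std::size_t member_count;
};

constexpr MemberRecord kMembersOfUnionU[] = {
    {"intID", 4, 0, 0},
    {"charID", 4, 3, 0},
    {"charID", 4, 3, 3},
    {"charID", 4, 0, 0},
};

constexpr UnionRecord kUnionTypes[] = {
    {"union.U", kMembersOfUnionU, 4},
};

constexpr bool str_eq(const char *a, const char *b) {
    while (*a && *a == *b) { ++a; ++b; }
    return *a == *b;
}

// Returns the record for `ir_type_name`, or nullptr if the table has none.
constexpr const UnionRecord *find_union(const char *ir_type_name) {
    for (const UnionRecord &r : kUnionTypes)
        if (str_eq(r.ir_type_name, ir_type_name)) return &r;
    return nullptr;
}
```

The lookup only works while the type has a stable name, which is exactly the weak point with temporary types mentioned above: a type with no name, or a different one, would come back as `nullptr`.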

> ;====================================
> ; Class size
> ; struct Base { char a[3]; Base() {} };
> ; struct Derived : Base { char b; }
> %struct.Base = type { [3 x i8] }, !BasePadding;
> %struct.Derived = type { %struct.Base, i8 }, !DerivedPadding;
> !BasePadding = metadata { metadata !"size", i8 1 };
> !DerivedPadding = metadata { metadata !"size", i8 3 };

%struct.Base = type { [3 x i8] }
%struct.Derived = type { %struct.Base, i8 };

!llvm.classtypes = metadata !{!0, !1}

!0 = metadata !{ metadata !"struct.Base", metadata !"size", i8 1, and other information };
!1 = metadata !{ metadata !"struct.Derived", metadata !"size", i8 3, and other information };

I agree with using extensible metadata to store high-level language
information without changing the IR. I think changing the IR itself
would be very hard work for me because of the many side effects and
compatibility issues.

Thanks,
Jin-Gu Kang