Hi all,
Howdy Matthijs,
I've also been developing an interest in using IR annotations for my compiler.
Some discussion with Bart revealed that he has implemented some code to parse
the llvm.global.annotations array, but it is in no way integrated or reusable.
We've given some thought to how this could be done properly, and I'll share
our ideas here.
Ok, cool. Annotations are tricky to do right.
First, however, I was wondering about the format of the
llvm.global.annotations array. It does not seem to be defined in the LLVM
language reference; shouldn't it be? Its name suggests that it is a reserved
variable name with a fixed type (similar to intrinsic functions?).
Yes, we should document it. It is a convention established by the __builtin_annotate function in the C compilers. We should standardize it and document it.
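As a rough guide for that documentation, here is a sketch of what each element of the array appears to hold in current C front-end output, as far as I can tell: the annotated global (cast to i8*), a pointer to the annotation string, a pointer to the source file name, and a line number, i.e. an element type of { i8*, i8*, i8*, i32 }. That layout is a reading of existing output, not a specification, and the GlobalAnnotationEntry struct and describe() helper are hypothetical names used only for illustration.

```cpp
#include <string>

// Hypothetical mirror of one element of @llvm.global.annotations.
// Assumed (unspecified) IR element type: { i8*, i8*, i8*, i32 }
struct GlobalAnnotationEntry {
  std::string global;      // field 0: the annotated global value, bitcast to i8*
  std::string annotation;  // field 1: pointer to the annotation string
  std::string file;        // field 2: pointer to the source file name
  unsigned line;           // field 3: source line number
};

// Renders an entry roughly as it reads in the IR initializer,
// e.g. { @foo, "hot", "foo.c", 12 }. A debugging aid only; no escaping.
std::string describe(const GlobalAnnotationEntry &e) {
  return "{ @" + e.global + ", \"" + e.annotation + "\", \"" + e.file +
         "\", " + std::to_string(e.line) + " }";
}
```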
Furthermore, it seems that the AnnotationManager that is currently implemented
is capable of keeping a list of Annotations for any Annotatable (currently
only Function). These annotations are kept in memory only and have nothing to
do with the annotations in the IR.
Yes, this is a really old mechanism that we should rip out. MachineFunction should be moved to be an analysis that is preserved as an actual part of the PassManager, instead of being a thing we tack onto the Function object. We have killed all uses of this old annotation mechanism except MachineFunction.
Still, using the AnnotationManager to make the IR annotations accessible
seems like a decent approach.
I agree that *having* an AnnotationManager makes sense, but the existing one should die and be replaced.
The way I see it, some pass (or, more likely, the assembly reader or the
AnnotationManager itself) would parse the llvm.global.annotations variable and
add annotations to the corresponding GlobalValues. This would leave the
annotations in the IR as well, so that transformation passes would properly
preserve them (and, just as with debug info, passes would sometimes be
prevented from modifying annotated global values until they are taught how to
preserve the annotations).
Makes sense. This is similar to how the MachineDebugInfo stuff deserializes debug info out of the LLVM IR and presents it for easy consumption by the code generator.
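In the same spirit, the deserialization step could be as small as the sketch below, assuming the array has already been read into a list of (global, annotation) entries. AnnotationEntry and collectAnnotations() are hypothetical names, not existing LLVM APIs. The function only builds an index from global name to annotations; the input entries are left untouched, so the IR remains the source of truth.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory copy of one @llvm.global.annotations element.
struct AnnotationEntry {
  std::string global;      // name of the annotated global value
  std::string annotation;  // the annotation string
};

// Builds a read-only index: global name -> annotations on that global,
// in the order they appear in the array. The input is not modified.
std::map<std::string, std::vector<std::string>>
collectAnnotations(const std::vector<AnnotationEntry> &entries) {
  std::map<std::string, std::vector<std::string>> index;
  for (const AnnotationEntry &e : entries)
    index[e.global].push_back(e.annotation);
  return index;
}
```

For entries {foo, "hot"}, {bar, "pure"}, {foo, "inline"}, the index maps foo to ["hot", "inline"] and bar to ["pure"].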
By using a subclass of Annotation (say, GlobalAnnotation) we can distinguish
between annotations that are (or should be) in the IR and (the existing)
annotations that should be in memory only. This would also allow newly added
annotations to be added to the IR immediately, ensuring that the
AnnotationManager's view remains consistent with the IR.
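A minimal sketch of the GlobalAnnotation idea, using a simplified stand-in for the IR array. IRArray, AnnotationStore and livesInIR() are hypothetical names, and this is not the existing AnnotationManager interface. Adding a GlobalAnnotation writes it through to the IR at once, while plain Annotations stay in memory only.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: annotations kept in memory only.
class Annotation {
public:
  explicit Annotation(std::string name) : name_(std::move(name)) {}
  virtual ~Annotation() = default;
  const std::string &name() const { return name_; }
  // Plain annotations are never mirrored into the IR.
  virtual bool livesInIR() const { return false; }

private:
  std::string name_;
};

// Hypothetical subclass for annotations that are (or should be) in the IR.
class GlobalAnnotation : public Annotation {
public:
  using Annotation::Annotation;
  bool livesInIR() const override { return true; }
};

// Stand-in for @llvm.global.annotations: (global, annotation) pairs.
struct IRArray {
  std::vector<std::pair<std::string, std::string>> entries;
};

// Writes GlobalAnnotations through to the IR as soon as they are added,
// so the in-memory view and the IR cannot drift apart.
class AnnotationStore {
public:
  explicit AnnotationStore(IRArray &ir) : ir_(ir) {}

  void add(const std::string &global, std::unique_ptr<Annotation> a) {
    if (a->livesInIR())
      ir_.entries.emplace_back(global, a->name());
    annotations_.emplace_back(global, std::move(a));
  }

  std::size_t size() const { return annotations_.size(); }

private:
  IRArray &ir_;
  std::vector<std::pair<std::string, std::unique_ptr<Annotation>>> annotations_;
};
```

Adding a GlobalAnnotation("hot") and a plain Annotation to the same store leaves two annotations in memory but only one entry in the IR array.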
I think we need to distinguish between two forms of annotation:
1. There are some "annotations" like "readonly", "nounwind", etc. that are baked into the LLVM IR and are/should be documented in LangRef.
2. There are annotations that are really "cheap extensions" of the LLVM IR that are either experimental, very domain specific, or that are just metadata about the code.
For #1, the current "parameter attributes" we have work reasonably well, and Devang is actually cooking up a proposal to extend them a bit (to fix some issues with LTO). #2 is something that llvm.annotate handles reasonably well, but I agree it would be great to have a nice interface to update/read them.
The advantage of #1 is that the compiler as a whole knows about the attributes, but this means that adding one is "hard". The advantage of #2 is that they are easy to add, but they have limitations and can impact codegen (e.g. they disable IPO in some cases).
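To make the trade-off concrete, a toy sketch of the two forms: a closed bitmask of attributes that passes can test directly, versus an open set of strings that anyone can extend but that passes only see as opaque text. The Attr enum, mayUnwind() and hasAnnotation() are hypothetical and do not reflect LLVM's actual parameter-attribute encoding.

```cpp
#include <cstdint>
#include <set>
#include <string>

// Form #1 (sketch): a closed set of attributes the whole compiler knows
// about. Adding one means touching this enum and every pass that cares.
enum Attr : std::uint32_t {
  ReadOnly = 1u << 0,
  NoUnwind = 1u << 1,
};

// A pass can act on a known attribute directly, e.g. to decide whether
// a call needs an unwind edge.
bool mayUnwind(std::uint32_t attrs) { return (attrs & NoUnwind) == 0; }

// Form #2 (sketch): an open set of strings. New ones need no compiler
// changes, but passes can only compare them as text.
using Annotations = std::set<std::string>;

bool hasAnnotation(const Annotations &a, const std::string &name) {
  return a.count(name) != 0;
}
```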
A problem I could imagine with this approach is name conflicts. Since any
annotation name could come from the IR, these could conflict with other names
already in use (such as "CodeGen::MachineFunction", IIRC). This could be
solved by using a "GlobalAnnotation::" prefix for the name, or something
similar.
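The prefix approach amounts to a reversible key mapping. A sketch, where the "GlobalAnnotation::" prefix comes from the suggestion above and both helper names are hypothetical:

```cpp
#include <string>

// Hypothetical prefix reserving a namespace for IR-derived annotations.
const std::string kIRAnnotationPrefix = "GlobalAnnotation::";

// Maps an annotation string read from the IR to the key used in memory,
// so it cannot collide with built-in keys like "CodeGen::MachineFunction".
std::string irAnnotationKey(const std::string &irName) {
  return kIRAnnotationPrefix + irName;
}

// Recovers the IR string from a key, or returns an empty string if the key
// is outside the IR-derived namespace. (An empty IR name is ambiguous here.)
std::string irNameFromKey(const std::string &key) {
  if (key.compare(0, kIRAnnotationPrefix.size(), kIRAnnotationPrefix) != 0)
    return "";
  return key.substr(kIRAnnotationPrefix.size());
}
```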
Could it also be solved by making them completely string-based and just providing a simple string interface? That way you don't need classes for each attribute.
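A sketch of such a string-only interface, with hypothetical class and method names: annotations are just (global name, string) pairs, so no per-annotation classes are needed.

```cpp
#include <map>
#include <set>
#include <string>

// Purely string-based annotation store: global name -> annotation strings.
class StringAnnotations {
public:
  void add(const std::string &global, const std::string &annotation) {
    byGlobal_[global].insert(annotation);
  }

  bool has(const std::string &global, const std::string &annotation) const {
    auto it = byGlobal_.find(global);
    return it != byGlobal_.end() && it->second.count(annotation) != 0;
  }

  void remove(const std::string &global, const std::string &annotation) {
    auto it = byGlobal_.find(global);
    if (it == byGlobal_.end())
      return;
    it->second.erase(annotation);
    if (it->second.empty())  // drop globals with no annotations left
      byGlobal_.erase(it);
  }

private:
  std::map<std::string, std::set<std::string>> byGlobal_;
};
```

Storing the strings in a std::set collapses duplicates; if the IR array may legitimately list the same annotation twice on one global, a std::multiset or std::vector would preserve that instead.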
-Chris