[RFC] Pretty Printing Immediate Arguments in LLVM Intrinsics


This is a follow-up with a more focused scope on the proposal discussed here.

LLVM is seeing an influx of target-specific intrinsics, each tailored to expose hardware-specific features. As these intrinsics grow in both number and complexity, the readability of LLVM IR becomes increasingly important. One thing that hinders readability is the use of immediate arguments to encode compile-time options such as instruction or operation modifiers. This RFC proposes adding LLVM infrastructure that prints self-explanatory inline comments for immediate arguments, making the IR easier to read and debug.

This proposal extends LLVM’s intrinsic framework by adding a new PrettyPrintImmArg<ArgIndex<N>, "functionName"> property to the intrinsic definitions in the TableGen file. This generates an automatic mapping between immediate arguments and target-specific print functions. All formatting logic and mappings are resolved during TableGen processing and embedded in the generated implementation (.inc) files. The AsmWriter is enhanced to detect pretty-print-enabled intrinsics and call the appropriate print functions during IR emission, inserting human-readable comments before the immediate arguments.

Core Features

  1. Opt-In Feature: Enabled by the user via explicit command-line flags in llvm-dis (e.g., --print-formatted-intrinsics)
  2. Low Overhead: Intrinsics that do not use this feature pay only an O(1) check that the feature is not enabled, and no other overhead.
  3. Comment-Only Addition: All enhancements are pure comments with zero semantic impact.
  4. No Bitcode Impact: Pretty printing impacts only textual IR representation by addition of comments, and none of the pretty print artifacts go into LLVM bitcode.
  5. Complete Roundtrip Compatibility: Full llvm-as and llvm-dis roundtrip works smoothly (.bc ↔ .ll)
  6. Full Backward Compatibility: Existing IR, tools, and workflows remain unaffected.

Example

To illustrate the feature, consider this intrinsic from the tcgen05.mma.* family.

def int_nvvm_tcgen05_mma_tensor : DefaultAttrsIntrinsic<[], 
                                                       [llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty, llvm_i32_ty, llvm_i1_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
                                                       [IntrWriteMem, IntrArgMemOnly,
                                                       ImmArg<ArgIndex<5>>,
                                                       ImmArg<ArgIndex<6>>, 
                                                       ImmArg<ArgIndex<7>>,
                                                       PrettyPrintImmArg<ArgIndex<5>, "printTcgen05MMAKind">,
                                                       PrettyPrintImmArg<ArgIndex<6>, "printTcgen05CTAGroup">,
                                                       PrettyPrintImmArg<ArgIndex<7>, "printTcgen05CollectorUsageOp">]>;

Without pretty printing (by default)

call void @llvm.nvvm.tcgen05.mma.tensor(ptr addrspace(6) %dtmem, ptr addrspace(6) %atensor, i64 %b, i32 %idesc, i1 %enable_inp_d, i32 3, i32 2, i32 3)
call void @llvm.nvvm.tcgen05.mma.tensor(..., i32 0, i32 1, i32 2)
call void @llvm.nvvm.tcgen05.mma.tensor(..., i32 1, i32 1, i32 1)
call void @llvm.nvvm.tcgen05.mma.tensor(..., i32 2, i32 2, i32 0)

With pretty printing (enabled by --print-formatted-intrinsics)

call void @llvm.nvvm.tcgen05.mma.tensor(..., /* kind::i8 */     i32 3, /* cta_group::2 */ i32 2, /* collector::a::use */     i32 3)
call void @llvm.nvvm.tcgen05.mma.tensor(..., /* kind::f16 */    i32 0, /* cta_group::1 */ i32 1, /* collector::a::fill */    i32 2)
call void @llvm.nvvm.tcgen05.mma.tensor(..., /* kind::tf32 */   i32 1, /* cta_group::1 */ i32 1, /* collector::a::lastuse */ i32 1)
call void @llvm.nvvm.tcgen05.mma.tensor(..., /* kind::f8f6f4 */ i32 2, /* cta_group::2 */ i32 2, /* collector::a::discard */ i32 0)
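
To make the mapping concrete, here is a minimal standalone C++ sketch of how an (argument index, print function) table could drive the comments shown above. It is not the code from the PR: the PrettyPrintEntry struct, the Tcgen05MMATensorTable array, the ImmPrinter signature, and formatImmArgs are hypothetical stand-ins for what TableGen and the AsmWriter would provide. Only the printer names and the value-to-name mappings (e.g. 3 prints as kind::i8) are taken from the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical printer signature: writes a readable form of one immediate.
using ImmPrinter = void (*)(std::ostream &OS, uint64_t Imm);

// Value-to-name mappings read off the example output in this post.
static void printTcgen05MMAKind(std::ostream &OS, uint64_t Imm) {
  static const char *const Names[] = {"f16", "tf32", "f8f6f4", "i8"};
  assert(Imm < 4 && "unexpected kind immediate");
  OS << "kind::" << Names[Imm];
}

static void printTcgen05CTAGroup(std::ostream &OS, uint64_t Imm) {
  OS << "cta_group::" << Imm;
}

static void printTcgen05CollectorUsageOp(std::ostream &OS, uint64_t Imm) {
  static const char *const Names[] = {"discard", "lastuse", "fill", "use"};
  assert(Imm < 4 && "unexpected collector immediate");
  OS << "collector::a::" << Names[Imm];
}

// Stand-in for a TableGen-generated table: one entry per annotated argument.
struct PrettyPrintEntry {
  unsigned ArgNo;
  ImmPrinter Print;
};

static const PrettyPrintEntry Tcgen05MMATensorTable[] = {
    {5, printTcgen05MMAKind},
    {6, printTcgen05CTAGroup},
    {7, printTcgen05CollectorUsageOp}};

// Formats consecutive i32 immediates starting at argument FirstArgNo.
// When PrettyPrint is set, a comment precedes each argument that has a
// registered printer; otherwise the plain operands are emitted.
std::string formatImmArgs(const std::vector<uint64_t> &Imms,
                          unsigned FirstArgNo, bool PrettyPrint) {
  std::ostringstream OS;
  for (std::size_t I = 0; I < Imms.size(); ++I) {
    if (I != 0)
      OS << ", ";
    if (PrettyPrint) {
      for (const PrettyPrintEntry &E : Tcgen05MMATensorTable) {
        if (E.ArgNo == FirstArgNo + I) {
          OS << "/* ";
          E.Print(OS, Imms[I]);
          OS << " */ ";
        }
      }
    }
    OS << "i32 " << Imms[I];
  }
  return OS.str();
}
```

Calling formatImmArgs({3, 2, 3}, 5, true) yields the annotated operand list of the first call above (without the alignment padding), and passing false yields the plain i32 3, i32 2, i32 3 form, so the comments never change the operands themselves.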

Although this RFC focuses on the pretty-printing of immediate arguments of the intrinsic, further improvements such as the inclusion of argument names can be explored in future work.

For those interested in the implementation details, a draft PR is available here: [LLVM-Tablegen] Pretty Printing Immediate Arguments in LLVM Intrinsics by DharuniRAcharya · Pull Request #162629 · llvm/llvm-project.


Adding a few folks for feedback,

@nikic , @arsenm , @Artem-B , @jurahul

Thanks @DharuniRAcharya . Just expressing my support here. As demonstrated by the test change in the draft PR, this opt-in feature makes the intrinsic calls a lot more readable, and we are hoping other upstream and downstream targets may find this useful.


I think this is a good idea. It’d be great if this were not an llvm-dis-only flag but a command-line option defined in the library, as I also want this feature to be available when generating textual LLVM IR from frontends like clang (so that you can do something like -mllvm -print-formatted-intrinsics).


Makes sense. It can be a cl::opt defined in AsmWriter.cpp.

I like the general idea.

I wonder what people feel about potentially going even further and allowing not just pretty-printing, but also parsing of more human-readable IR?

That was part of the earlier proposal from last year that is linked above (or at least considered at that time). We had even discussed printing the “modifiers” as suffixes of the intrinsic name and eliding them from the argument list (internally at least). One concern/takeaway from that discussion was that hand-modifying LLVM IR may be too niche a use case to warrant the !/$, but that can be debated. Also, in terms of staging, it may make sense to add support for this first, then name intrinsic args to provide more semantic information when reading LLVM IR, and then consider parsing support.

Another +1 here for the proposal in general.

Emphatic “yes” from me. E.g. my workflow would typically look like this:

  • Compile CUDA code.
  • Stare at IR. Annotated arguments help a lot.
  • Feed that IR to llvm-reduce or other tools.
  • Run through a no-op opt to annotate args. Stare at the result.
  • Go back to the un-annotated IR, edit, repeat.

Right now, I’ll have to regenerate the IR w/o annotations if they can’t be parsed, and I’d rather skip that extra step and be able to just use annotated IR as is, and avoid adding extra steps and files.

Bonus points for improving IR test readability, where inlined comments could be conveniently placed next to the parts they describe, without having to do creative line splitting.

Thanks. I think we can stage things as follows:

  1. Get immarg pretty-printing/formatting (this RFC) in.
  2. Extend to support named intrinsic arguments for additional readability. Related to this, I was also wondering whether we can augment the Intrinsic API to support querying arguments by name, similar to MIR's getNamedOperandIndex(). I ran into cases where having such an API would help simplify the code.
  3. Parsing support, that is, Feature 3b in [RFC] Pretty printing for LLVM Intrinsic arguments (Short) - IR & Optimizations - LLVM Discussion Forums.

All of these are covered in the earlier RFC from last year. Looks like there is interest in pursuing all of them.

Sounds great, ship it. :slight_smile:

Has anyone given serious consideration to backporting limited subsets of MLIR functionality to LLVM? Target-specific intrinsics with immediate-only arguments sure sound a lot like MLIR Operation attributes.

I don’t want to expand scope or make light of a serious undertaking. I just want to point out that I think there is a potential large open project here, somewhat in the vein of what @jurahul was doing some time back to make target intrinsics more pluggable. We were discussing how we’d make it possible to disable targets, and I think there was no appetite for the increased complexity at the time, but if the number of target intrinsics were to double, maybe it becomes worth it.

Generally in favor of this. Some notes on the specifics:

  • I agree with what was already mentioned above: This should be a global option for all tools, not llvm-dis specific.
  • Is there any reason not to enable this by default?
  • I find your specific printing format a bit weird, and I think it also indicates that we really should be splitting up the “argument name” and the “value printing” part of this into separate features.

Looking at this example:

call void @llvm.nvvm.tcgen05.mma.tensor(..., /* kind::f16 */ i32 0, /* cta_group::1 */ i32 1, /* collector::a::fill */ i32 2)

Why the “namespace” style notation for the argument name? Why is this kind::f16 and not kind=f16?

The cta_group bit really just needs the argument name; the bit after cta_group:: just matches the argument value.

Without getting too hung up on the specifics, I’d structure the feature something like this:

def int_nvvm_tcgen05_mma_tensor : DefaultAttrsIntrinsic<[], 
                                                       [llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty, llvm_i32_ty, llvm_i1_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
                                                       [IntrWriteMem, IntrArgMemOnly,
                                                       ImmArg<ArgIndex<5>>,
                                                       ImmArg<ArgIndex<6>>, 
                                                       ImmArg<ArgIndex<7>>,
                                                       ArgInfo<ArgIndex<5>, "kind", "printTcgen05MMAKind">,
                                                       ArgInfo<ArgIndex<6>, "cta_group">,
                                                       ArgInfo<ArgIndex<7>, "collector", "printTcgen05CollectorUsageOp">]>;

Where ArgInfo accepts an argument name and an optional argument printing function. And then the printing would be something like this:

call void @llvm.nvvm.tcgen05.mma.tensor(..., /*kind=f16*/ i32 0, /*cta_group=*/ i32 1, /* collector=fill*/ i32 2)

The advantage here is that a) you can define argument names without also defining a printing function (which means this would also work for non-immarg arguments) and b) you can reuse printing functions more broadly. If you have two arguments that take values from the same set, you don’t need two printing functions.
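
As a rough illustration of that split, the following standalone C++ sketch pairs each argument with a name and an optional value printer. It is only a sketch under assumptions: the ArgInfo struct here is a hypothetical C++ mirror of the TableGen ArgInfo suggested above, and Tcgen05MMATensorArgs, ValuePrinter, and annotateArg are made-up names. The printers emit only the value (f16, fill), so the same function could serve any argument drawing from the same value set.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical value printer: emits only the value, never the argument name.
using ValuePrinter = void (*)(std::ostream &OS, uint64_t Imm);

static void printTcgen05MMAKind(std::ostream &OS, uint64_t Imm) {
  static const char *const Names[] = {"f16", "tf32", "f8f6f4", "i8"};
  assert(Imm < 4 && "unexpected kind immediate");
  OS << Names[Imm];
}

static void printTcgen05CollectorUsageOp(std::ostream &OS, uint64_t Imm) {
  static const char *const Names[] = {"discard", "lastuse", "fill", "use"};
  assert(Imm < 4 && "unexpected collector immediate");
  OS << Names[Imm];
}

// Hypothetical mirror of the suggested ArgInfo: a name plus an optional
// printer. A null printer means "name only".
struct ArgInfo {
  unsigned ArgNo;
  const char *Name;
  ValuePrinter Print;
};

static const ArgInfo Tcgen05MMATensorArgs[] = {
    {5, "kind", printTcgen05MMAKind},
    {6, "cta_group", nullptr},
    {7, "collector", printTcgen05CollectorUsageOp}};

// Returns "/*name=value*/ i32 N" for annotated arguments and plain "i32 N"
// for arguments without an ArgInfo entry.
std::string annotateArg(unsigned ArgNo, uint64_t Imm) {
  std::ostringstream OS;
  for (const ArgInfo &Info : Tcgen05MMATensorArgs) {
    if (Info.ArgNo != ArgNo)
      continue;
    OS << "/*" << Info.Name << "=";
    if (Info.Print)
      Info.Print(OS, Imm);
    OS << "*/ ";
  }
  OS << "i32 " << Imm;
  return OS.str();
}
```

With this table, annotateArg(5, 0) gives /*kind=f16*/ i32 0 and annotateArg(6, 1) gives /*cta_group=*/ i32 1, while an argument with no entry is printed unchanged.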


As a side note, I feel like our current way of specifying argument-specific information (like attributes) in TableGen is not great because we have to repeat the ArgIndex all the time. It would be nicer if we could just put everything together like Arg<5, ImmArg, Name<"kind">, PrintingFunction<"printFoo">>…

We initially used the “namespace-style” notation (e.g., kind::f16) to align with how modifiers like kind, cta_group, etc., are represented in the PTX ISA (kind::f16, cta_group::1). We wanted to share the same printer function (for a given set) between pretty-printing and the InstPrinter in the (NVPTX) backend.

However, I see that we can still reuse the same printer function even with the name–value split style (kind=f16) you suggested. So, we can adopt the kind=f16 notation.

Thank you all for your feedback! I’ll try the changes suggested (mainly the ArgInfo implementation, global option for all tools) and update here.

The cl::opt objects are thread-unsafe, so they cannot be used correctly in LLVM-based JIT compilers in the general case. However, this feature is useful for debugging such compilers. So, I would ask for an optional AsmWriter constructor parameter, in addition to the cl::opt, to make the pretty printing configurable on a fine-grained basis in multithreaded environments.
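
One possible shape for that request, as a standalone C++17 sketch: a writer takes an optional per-instance override and falls back to the process-wide default only when none is given. Everything here is hypothetical, including IRWriterSketch, formatImm, and GlobalPrintFormattedIntrinsics (a plain bool standing in for the cl::opt); it does not describe the actual AsmWriter interface.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-in for the process-wide command-line option (plays the role of the
// cl::opt in this sketch; hypothetical name).
static bool GlobalPrintFormattedIntrinsics = false;

// Hypothetical writer that can be configured per instance.
class IRWriterSketch {
  bool PrettyPrint;

public:
  // The global default is read once, at construction, and only when the
  // caller supplies no override. Later printing never touches shared state.
  explicit IRWriterSketch(std::optional<bool> PrettyPrintOverride = std::nullopt)
      : PrettyPrint(PrettyPrintOverride.value_or(GlobalPrintFormattedIntrinsics)) {}

  // Formats one i32 immediate, optionally preceded by the given comment text.
  std::string formatImm(const char *Comment, unsigned Imm) const {
    assert(Comment && "comment text must not be null");
    std::string Out;
    if (PrettyPrint)
      Out += std::string("/*") + Comment + "*/ ";
    Out += "i32 " + std::to_string(Imm);
    return Out;
  }
};
```

A JIT thread that wants annotated dumps would construct IRWriterSketch(true) and get /*kind=f16*/ i32 0 from formatImm("kind=f16", 0), regardless of what other threads or the command line have set.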

I tried the suggested changes, particularly the ArgInfo implementation, and they seem to work well. I would appreciate it if you could review the latest version of the PR: [LLVM-Tablegen] Pretty Printing Arguments in LLVM Intrinsics by DharuniRAcharya · Pull Request #162629 · llvm/llvm-project.

Unless I’ve missed something, /* comment */ isn’t IR syntax, we’ve only got ; comment. Adding an inline comment syntax seems legitimate to me independent of this though, (* whichever *) it is.

I think I’m in favour of more meaningful strings than i32 42 on arguments to intrinsics and opposed to splattering comments across the IR dump. How about enumerations as an actual thing instead? As in at the top level introduce enumeration types, where the arguments are usable as immediates to intrinsics and anything else, and semantically it’s a constant of whatever type?

Then we pretty-print as the enumeration value and it’s still an i32 immediate or whatever for the existing code paths.

C-style inline comment support was added last year: [LLVM][AsmParser] Add support for C style comments (#111554) · llvm/llvm-project@b8ac87f · GitHub


Right, I added C style comments in preparation for implementing this feature at that time.

The change has been merged - [LLVM-Tablegen] Pretty Printing Arguments in LLVM Intrinsics by DharuniRAcharya · Pull Request #162629 · llvm/llvm-project.
Many thanks to everyone for your valuable reviews and feedback!
