[RFC] Pretty printing for LLVM Intrinsic arguments

This RFC proposes adding LLVM infrastructure to support pretty printing and parsing of intrinsics arguments. The motivation is to increase the readability/grokability and hackability of LLVM intrinsics. Although LLVM intrinsics have target defined semantics, when intrinsics use a long list of arguments and/or use immediate arguments with opaque values to encode their behavior, it may become difficult for a human reader to decode/understand a specific call to such an intrinsic. The end goal of this RFC is to enable a human who is reading or modifying LLVM IR to more easily work with LLVM intrinsics. This mostly includes compiler engineers working with and debugging the compiler, and frontend/higher level compiler engineers who want to interface with LLVM and use these intrinsics. In a production compiler flow, I’d assume that LLVM IR rarely gets printed or parsed, so in that context, LLVM IR printing and parsing is a debug tool for compiler developers, and the goal of this RFC is to increase their productivity by making intrinsics less opaque than what they are today in LLVM assembly.

Additionally, as a byproduct, it may offer additional flexibility for backends when designing their intrinsics.

Summary

Broadly, this RFC is proposing three independent features:

  1. Support for C style comments /*...*/ in LLVM’s assembly syntax.
  2. Ability to define names for intrinsic arguments and have them printed in LLVM IR as inline comments.
  3. Ability to pretty-print the value of immediate arguments and parse the pretty-printed value (this is a hackability aid). These are referred to as formatted immediate arguments in this RFC.

Together, they should help improve the readability of LLVM intrinsics by providing additional information that will reduce the cognitive load on the human working with intrinsics. Having the ability to parse pretty-printed values of immediate arguments also frees up engineers from needing to know the exact encoding of immediate args if they want to change it; they just need to know the pretty printed/formatted encoding.

A TL;DR summary of the proposal is as follows:

; Feature (1): C style comments.
; -----------------------------------------------
/* LLVM assembly will support C style comments that span multiple lines.
 * - Below we want to test fadd(fmul)=>fma transform with FTZ enabled
 * - Additionally, we need to use rounding mode rtz to exercise the special
 *   handling of this mode in function HandleRoundingModes.
 */
%t = llvm.target.fmul.ftz(float %x, float %y)
%w = llvm.target.fadd.rtz(float %t, float %w)

; LLVM assembly will support "inline" comments that do not span till end of line.
%y = /*ignored*/ call @log(/*num=*/ %x, /*base=*/float 10.0) /* ignored */

; Feature (2): Argument names for intrinsics.
; -----------------------------------------------
; Intrinsic declarations will optionally print argument names using inline comments.
declare void @llvm.target.some_op(/*ftz=*/i1, /*rnd=*/i8, /*value=*/float)

; Argument names will be printed in intrinsic calls.
call void @llvm.target.some_op(/*ftz=*/i1 0, /*rnd=*/i8 3, /*value=*/float %x)

; Feature (3): Formatted immediate arguments.
; -----------------------------------------------
; Immediate argument values will be pretty printed as 1 or more '.' prefixed tokens
; using inline comment following the argument name.
call void @llvm.target.some_op(/*ftz=.ftz*/i1 0, /*rnd=.rtz*/i8 3, /*value=*/float %x)

; Argument names and pretty printing for immarg are orthogonal and can be opted in
; on a per argument basis. Here:
; - arg0 just uses arg name feature.
; - arg1 uses arg name and formatted imm arg.
; - arg2 uses just formatted imm arg
; - arg3 does not use arg name (and is not an immarg)
call void @llvm.target.some_other_op(/*ftz=*/i1 0, /*rnd=.rtz*/i8 3, /*.low_precision*/i8 0, float %x)

; IR parser will support reading formatted immediate arg values, so that IR can
; be hand-edited to tweak the immarg values.
call void @llvm.target.some_op(/*ftz=*/.ftz, /*rnd=.rtz*/i8 3, /*value=*/float %x)
; edit to =>
call void @llvm.target.some_op(/*ftz=*/.ftz, /*rnd=*/.rtne, /*value=*/float %x)

Background

LLVM intrinsics support 2 kinds of arguments currently: regular runtime inputs to the intrinsic (including varargs), and immediate args which is a way to encode compile time parameterization of an intrinsic. Immediate arguments are required to be compile time constants (either integer or floating-point scalar values currently) and the intent is that generally (though not always) these get encoded in the final machine instructions and hence need to be compile time constants. A very simple example is X86 interrupt intrinsic, which is defined as follows:

def int_x86_int : Intrinsic<[], [llvm_i8_ty], [ImmArg<ArgIndex<0>>]>;

The single input to the intrinsic is an 8-bit interrupt vector and is required to be a compile time immediate since it is encoded as an 8-bit immediate into the instruction. Without immediate arguments, we would need to define up to 256 different intrinsics, one for each possible interrupt vector value. This would add a lot of boilerplate code in both the intrinsic definitions as well as actual LLVM code (IR transforms or backend code) to handle all these intrinsics, as well as the frontend which is going to generate these intrinsic calls. With immediate arguments, we can define just one LLVM intrinsic. The downside is that the immediate arguments to these intrinsics are just integers and are printed out as such in the LLVM IR. If we had to instead define 256 (or a subset) of different intrinsics, we could choose to name them as llvm.x86.int.dos (or even llvm.x86.int.21h for DOS interrupts) and llvm.x86.int.video (instead of llvm.x86.int.16 for BIOS video interrupts) for better readability, but at the cost of dealing with a large number of intrinsic ID enums.

At the LLVM IR level, LLVM’s IR verifier checks that the arguments marked as ImmArg in the TD definitions of intrinsics are actually constant values, but beyond that LLVM does not seem to do anything more with this information. That means that frontends that generate these intrinsics have to use constant values for immediate arguments and LLVM transformations and backends that handle these intrinsics in LLVM IR can assume that the immediate arg values will be compile time constants and do not need to handle cases where these could be runtime values (like generating switch case code over all possible values).

In short, immediate arguments help improve the compiler implementation ergonomics and convenience, at the potential cost of readability. Immediate arguments are used extensively in LLVM upstream and downstream backends. They play a role similar to operation attributes in MLIR, but limited to integer and float types.

All intrinsic arguments (immediate or otherwise) are specified in the intrinsic definition by providing a list of parameter types when defining an intrinsic. No names are associated with either the input(s) or the output(s) of the intrinsic.

Motivation

GPUs and other accelerators are rapidly evolving to support new features and performance improvements for critical workloads like ML inference and training. As an example, NVIDIA GPUs have added special purpose accelerators called Tensor Cores to accelerate matrix multiplication, and these accelerators keep evolving drastically with each GPU generation, either to support new use cases or improve performance. Compiler support at the lowest levels of the compiler stack for these generally relies on intrinsics since the programming model for these accelerators is nuanced, with several degrees of freedom and restrictions to configure, and LLVM intrinsics are the only available mechanism to introduce target dependent extensions to LLVM IR (unlike MLIR that allows adding custom operations). Built on top of this LLVM support are libraries like cuDNN/CUTLASS and higher-level compilers like XLA and Triton.

One challenge in supporting such accelerators with intrinsics is that the myriad variety of ways to configure them can lead to issues similar to what immediate arguments attempt to address. These configuration options are in several cases encoded in the instruction, so we need to use immediate arguments for these configuration options. As a result, these intrinsics can end up with a dozen or more of such immediate arguments, leading to unreadable LLVM IR. Additionally, as these intrinsics potentially evolve over time (either internally during the HW/SW codesign phase or across GPU generations), we may need to add new variants of these intrinsics with just minor changes in the configuration options available. However, the cost of doing these changes in the compiler may be non-trivial due to lots of argument position changes for these intrinsics.

Note that even without tensor cores, some GPU specific intrinsics can have a long set of immargs for various configuration bits. This includes things like texture related intrinsics, or load/store intrinsics with additional qualifiers for caching behavior, ordering etc. For example, llvm.amdgcn.ds.ordered.add has 6 immediate arguments and NVIDIA has some internal load/store intrinsics that have ~10 immediate arguments. The same readability and extensibility issues apply to these intrinsics as well. In addition to immediate arguments, several such intrinsics accept a number of runtime arguments as well. These long lists of runtime arguments hinder readability as well. It may be possible to logically sequence these runtime arguments so that their meaning can be decoded from the position, but that may not always be possible and is additional cognitive load on anyone looking at a call to one of these intrinsics.

As an example, most GPUs support a texture read/write intrinsic that can read/write a given array slice and level-of-detail (LOD) of a mipmapped texture. Such intrinsics may have one i32 argument for the array slice and another i32 argument for the LOD level to read/write. Additionally, 1D variants of these will also have an i32 argument for the texture coordinate. So a call to such an intrinsic will have 3 i32 scalar values as arguments and it may not be immediately obvious which i32 corresponds to which value. As an example, the Metal shading language specification defines a texture array read as:

Tv read(uint coord, uint array, uint lod = 0) const;  // for 1DArray
Tv read(uint2 coord, uint array, uint lod = 0) const; // for 2DArray

When using LLVM intrinsics to represent this builtin, it would help to have some hints as to which argument is which value.

This proposal is an attempt to address some of these issues, both from a LLVM IR readability and hackability POV as well as the C++ code that handles these intrinsics.

With the current LLVM intrinsic design, we have 3 choices to represent any intrinsic with some compile-time parameterization. We will consider a running example of a floating-point unary operation that takes 1 float input and whose rounding mode and FTZ (flush denorm to zero) behavior can be configured at compile time. FTZ can be on or off and rounding mode can have 4 possible values (rne, rtz, rup, rdn). The design choices for such an intrinsic are:

  1. Have a different intrinsic for each possible combination of configurations, and only have runtime values as intrinsic arguments. This would lead to 8 different intrinsics: some_op.{ftz/noftz}.{rne/rtz/rup/rdn}. There is obvious readability advantage for these intrinsics, but compiler implementation is expensive as we need to define all 8 variants in TD files and handle them in codegen. Note that C++ code/TD defs can be structured to mitigate these effects of duplication in some cases, but maybe not always and at the end of the day, we have code dealing with 8 different but very similar intrinsics.

  2. Have one i1 immarg for ftz, and an i8 immarg for rounding mode. This reduces the number of intrinsics to 1, but LLVM IR is less readable as one has to decode the meaning of each immarg when reading the IR (which depends on its position in the arg list and its actual value). Mutating the IR by hand is also not as easy for experimentation as you need to know the encoding of these immarg values. Additionally, with a large number of args, the IR dumps may become unwieldy and easy to get lost in.

  3. Have both the FTZ and Rounding mode packed into a single 8/16/32-bit config word. This keeps the number of immargs down to one/few, but with packed immargs, the IR dumps become even more opaque. However, for some backends this form may offer better implementation ergonomics in terms of dealing with the variety of configuration options available and evolving them over time (tensor core related intrinsics are a motivating example here). As an example, in the packed immediate argument mode, we can repurpose one of the existing unused bits in the packing for a new configuration bit. This keeps the existing code working (including existing LLVM IR) and adding support for the new configuration at select places in the backend is much easier than say supporting a new immarg (which results in potential rearrangement of argument position as well as breaking existing LLVM IR unless we also implement an auto-upgrade path). It does not obviate the need to make sure the new modifier is handled correctly everywhere and design the packing with some foresight in mind, but can definitely reduce unnecessary churn (and resulting bugs) in the code.
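To make option 3 concrete, here is a minimal sketch of packing the running example's FTZ bit and rounding mode into one 8-bit config word. The bit layout, the numeric values of the rounding modes, and the names `SomeOpConfig`, `packConfig` and `unpackConfig` are all assumptions made up for illustration; the RFC does not prescribe any particular encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical rounding mode encoding (not defined by the RFC).
enum class RoundingMode : uint8_t { RNE = 0, RDN = 1, RUP = 2, RTZ = 3 };

// Unpacked view of the compile-time configuration of the example intrinsic.
struct SomeOpConfig {
  bool FTZ;
  RoundingMode Rnd;
};

// Hypothetical layout: bit 0 = FTZ, bits 1-2 = rounding mode, bits 3-7 unused.
constexpr uint8_t FTZMask = 0x1;
constexpr unsigned RndShift = 1;
constexpr uint8_t RndMask = 0x3;

// Pack the configuration into the single i8 immediate argument.
inline uint8_t packConfig(SomeOpConfig C) {
  uint8_t Rnd = static_cast<uint8_t>(C.Rnd);
  assert(Rnd <= RndMask && "rounding mode does not fit in its bitfield");
  return static_cast<uint8_t>((C.FTZ ? FTZMask : 0) | (Rnd << RndShift));
}

// Unpack the immediate. Unused high bits are ignored, which is what lets a
// later revision claim one of them for a new option without breaking old IR.
inline SomeOpConfig unpackConfig(uint8_t Word) {
  return {(Word & FTZMask) != 0,
          static_cast<RoundingMode>((Word >> RndShift) & RndMask)};
}
```

With this layout, a call with FTZ enabled and rounding mode rtz would carry the opaque value `i8 7`, which is exactly the kind of immediate that benefits from pretty printing.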

Ideally, we would like good IR ergonomics of #1 (easy to read and modify the IR) coupled with flexibility for backends to choose either option #2 or #3 as a way of encoding the immediate arguments. Additionally, decoding the meaning of each argument from the position is error-prone, so having argument names attached to intrinsic arguments and being able to print them can help readability as well. Together these are essentially proposals for in-built pretty printing support for intrinsics that individual intrinsics can opt-in.

Requirements

If we were to support something like this in LLVM, below are some requirements that need to be satisfied to be able to incorporate this feature in LLVM:

  1. These pretty printing features need to be opt-in, so existing intrinsics work without any changes and even intrinsics that adopt these features should be printed in “raw” mode by default (i.e., pretty printing should be disabled by default). LLVM’s AsmWriter should accept a flag to turn it on (which will be wired to command line arguments for various tools like llvm-dis/opt etc).
  2. Any intrinsic that does not make use of these features should not pay any significant compile time cost in terms of printing and parsing that intrinsic. Implementation wise, any potentially expensive code path to support this should be exercised only after a cheap check to see if the intrinsic in question actually has opted into these features.
  3. Intrinsics should be able to opt in gradually, and any modifications to the intrinsic definitions should be incremental changes to current Intrinsic.td definitions.
  4. [?] Similar to current “raw” mode intrinsics, the pretty printed syntax should enable intrinsic upgrade. That means unknown/unrecognized syntax should not result in a parse error and should be handed over to the auto upgrader to give it a chance to auto upgrade. And if auto-upgrade fails, these failed-to-parse intrinsics should survive in the IR as unknown intrinsics. Note that this is [?], in the sense that the output that LLVM’s AsmWriter will produce will always have arg names and pretty printed immarg values in comments, and one has to explicitly edit these to delete the “raw” value of an immarg and “expose” the pretty printed value. We could say that in such cases, since the input is non-standard, upgrade support will not work for this path and the parser is expected to parse only the current versions of such pretty printed immarg values.
  5. We should have end-to-end unit tests to test various aspects of this RFC. What that means is either volunteering a couple of existing LLVM intrinsics to use these features and serve as a test vehicle, or having a new set of test intrinsics, built into LLVM for the sole purpose of e2e testing (less desirable). It seems we would still need to have test intrinsics that use these features while it’s being developed, and once ready, we can adopt some existing intrinsics to use these features and then deprecate the test intrinsics (and switch over the unit tests to the existing intrinsics).
  6. (Stretch goal) LLVM’s intrinsic infrastructure and LLVM’s Asm Printer and Parser will now have additional code and static data to help support this feature. The size of that depends on how many intrinsics actually adopt this feature. When code/data size are of concern for a particular deployment, it should be possible to disable these features at build time without any other changes. So, it should be possible to enable/disable a build time option to strip out pretty printing support, say when building the final deployment version of the compiler, but keep it enabled when building internal versions used for debugging (Note: This is different from being enabled/disabled in different build configurations like Assert/Release/Debug). This could rely on either a complete stripping out of this feature using the C++ preprocessor or similar features, or on a combination of that and compiler DCE (for instance, if all code to exercise this is guarded by a 1-bit per-intrinsic query and that query always returns false, the compiler could DCE all the pretty printing code).
  7. (Stretch goal) Related to the stretch goal above, it might be good to isolate the support for this into a new IntrinsicFormat component in LLVM and only AsmParser and AsmWriter components link with it. Currently, AsmWriter is part of LLVM Core, so that will require some restructuring of dependencies, so just mentioned here for completeness.
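Requirements 2 and 6 both hinge on a cheap per-intrinsic check guarding the pretty printing path. The sketch below shows one way that guard could look. The macro `ENABLE_INTRINSIC_FORMAT`, the three made-up intrinsic IDs, and the toy `printArg` function are hypothetical stand-ins and not part of LLVM; a real implementation would use the emitter-generated table instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical build-time switch; a real build system would define it.
#ifndef ENABLE_INTRINSIC_FORMAT
#define ENABLE_INTRINSIC_FORMAT 1
#endif

// Stand-in for the generated 1-bit-per-intrinsic table (IDs 0..2 are made up).
constexpr bool IsFormattedTable[] = {false, true, false};
constexpr unsigned NumIntrinsics = 3;

// The cheap check. When the feature is compiled out this is a constant false,
// so an optimizing compiler can drop every code path guarded by it.
constexpr bool isFormatted(unsigned ID) {
#if ENABLE_INTRINSIC_FORMAT
  return ID < NumIntrinsics && IsFormattedTable[ID];
#else
  return (void)ID, false;
#endif
}

// Toy printer for a single i8 argument of intrinsic `ID`.
std::string printArg(unsigned ID, unsigned RawValue) {
  assert(RawValue <= 255 && "value must fit in i8");
  std::string Raw = "i8 " + std::to_string(RawValue);
  if (isFormatted(ID))
    return "/*rnd=*/" + Raw; // Pretty path: only reached for opted-in intrinsics.
  return Raw;                // Raw path: unchanged for every other intrinsic.
}
```

Intrinsics that have not opted in only pay for the table lookup, and building with the switch set to 0 makes the pretty path unreachable.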

Proposal (1): Support C Style comment in LLVM IR

This is fairly straightforward to implement (see draft PR: [LLVM][AsmParser] Add support for C style comments by jurahul · Pull Request #111554 · llvm/llvm-project · GitHub). If there are concerns about allowing this generally, it may be possible to make the support modal, disabled by default, and LLVM’s AsmParser will enable it only when parsing a call instruction’s argument list if the callee is a function that could be an intrinsic (name starts with llvm.) and disable it after argument list parsing. However, being able to add inline comments could have its own utility, say when writing .ll test cases. So the preference is to add this generically.
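As a rough illustration of what the lexer change amounts to, the following standalone sketch skips `/*...*/` comments in a buffer while leaving `;` line comments alone. It is not the code from the draft PR; the function names are invented here, string constants are deliberately not handled, and an unterminated comment is simply left in place where a real lexer would report an error.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// If a C style comment starts at Pos, return the index just past its closing
// "*/". Return std::nullopt if no comment starts there or it is unterminated.
std::optional<size_t> skipCStyleComment(std::string_view Buf, size_t Pos) {
  if (Pos >= Buf.size() || Buf.substr(Pos, 2) != "/*")
    return std::nullopt;
  size_t End = Buf.find("*/", Pos + 2);
  if (End == std::string_view::npos)
    return std::nullopt;
  return End + 2;
}

// Replace every C style comment with a single space, so that the tokens on
// either side do not merge. Existing ';' comments are copied verbatim, which
// means a "/*" inside a line comment does not open a C style comment.
std::string stripCStyleComments(std::string_view Buf) {
  std::string Out;
  size_t Pos = 0;
  while (Pos < Buf.size()) {
    if (Buf[Pos] == ';') {
      size_t EOL = Buf.find('\n', Pos);
      if (EOL == std::string_view::npos)
        EOL = Buf.size();
      Out.append(Buf.substr(Pos, EOL - Pos));
      Pos = EOL;
      continue;
    }
    if (std::optional<size_t> End = skipCStyleComment(Buf, Pos)) {
      Out.push_back(' ');
      Pos = *End;
      continue;
    }
    Out.push_back(Buf[Pos++]);
  }
  return Out;
}
```

For example, `call @log(/*num=*/ %x)` comes out as `call @log(  %x)`, which is what the existing parser already accepts.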

Proposal (2): Support named intrinsic arguments

This feature will allow specifying argument names for intrinsic arguments.

  1. LLVM’s Intrinsic class (in Intrinsics.td ) will support specifying names for arguments. This will be supported by a new ArgName list that can be specified with the intrinsic, as follows:

    class ArgName<ArgIndex idx, string name> {
      int ArgNo = idx.Value;
      string Name = name;
    }
    
    class Intrinsic<list<LLVMType> ret_types,
                    list<LLVMType> param_types = [],
                    list<IntrinsicProperty> intr_properties = [],
                    string name = "",
                    list<SDNodeProperty> sd_properties = [],
                    bit disable_default_attributes = true,
                    list<ArgName> arg_names = []> : SDPatternOperator {
    ...
    }
    

    This optional list can be used to specify names for all or a subset of intrinsic arguments. The names need to be unique and conform to the following syntax: [a-zA-Z][0-9a-zA-Z_]* (essentially, all valid un-escaped LLVM identifiers but no $ or . which we will use for special purposes during intrinsic upgrade, assuming support for that is needed). If an intrinsic argument has an argument name specified, it will be printed before that argument using the inline comment syntax as: /* ArgName= */. LLVM’s IR parser will ignore these comments. That also means that there is no linting for these names if, say, they mismatch the actual specified names in the intrinsic definitions, and they are treated truly as comments.

    Additionally, the AsmWriter can also print argument names in intrinsic declarations. Given that they are printed with the call, printing argument names with intrinsic declarations has questionable utility, but can be done for consistency [AI: Decide if needed or not]

  2. LLVM’s AsmWriter will support a bool arg that will enable or disable intrinsic pretty printing. Unlike AsmParser, which is a single class that drives LLVM assembly parsing, the LLVM IR printer entry points are scattered throughout the code in the form of per-class print functions that take arguments like IsForDebug for some control of what is printed. So one option is to do some refactoring beforehand to replace all IsForDebug with an AsmPrintOptions struct, which will include both IsForDebug and EnableFormattedIntrinsics to control pretty printing of intrinsics. The value of EnableFormattedIntrinsics will be false by default, and llvm-dis, opt (and any other LLVM tool) will take a new command line option to enable formatted intrinsics. This single bool will control both argument name and imm arg formatting.

  3. Implementation wise, LLVM’s intrinsic emitter will generate a 1-bit isFormatted table similar to isOverloaded. The bit will be set if the intrinsic has any named argument or any immediate argument with formatting enabled. In the AssemblyWriter::printInstruction code that handles CallInst, we will add this 1-bit check, and then execute the formatted intrinsic arg code path, where the code will first query that intrinsic’s argument names (by calling a new intrinsic emitter generated Intrinsic::getArgumentNames(Intrinsic::ID) function) and print any non-null entries as comments preceding the argument values. This function will look like void Intrinsic::getArgumentNames(Intrinsic::ID, SmallVectorImpl<const char *> &Names) (instead of returning an ArrayRef<const char *>) so that we can optimize the storage of arg names internally using classes like StringToOffsetTable and SequenceToOffsetTable to dedupe same arg names used across multiple intrinsics.

  4. Note that once we have argument names for intrinsics, it’s also possible to generate enumerations from those argument names, to use instead of magic argument numbers in the code. As an example, for an arg named intvec of intrinsic llvm.x86.int, we can generate an enum

    enum Intrinsic::x86::int_args { // enums for argument indexes for a specific intrinsic.
       intvec = 0,
    }
    

    And use that instead of 0 in the code. This might be useful for intrinsics with several arguments, however, we do not plan to implement this as a part of this RFC. If there is enough interest, we can start another RFC later down the road to discuss this. Additionally something like Value* IntrinsicInst::getArgOperand(StringRef Name) is also a possibility that can be explored later. This uses the intrinsic argument names for things beyond formatting, so need to consider how it interacts with Requirement (6) above where we would like to strip out this support conditionally for code/data size reasons.
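The deduplicated name storage mentioned in item 3 can be sketched with plain arrays. This is only an approximation of what the emitter might generate: the table names, the two made-up intrinsic IDs, and the use of `std::vector` in place of `SmallVectorImpl` are assumptions for the sake of a self-contained example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Every distinct argument name is stored once, NUL separated.
// Offset 0 is the empty string and is used to mean "no name".
// Offsets: 1 = "ftz", 5 = "rnd", 9 = "value".
static constexpr char ArgNameStrings[] = "\0ftz\0rnd\0value";

// Per-argument offsets into ArgNameStrings, concatenated for all intrinsics.
static constexpr uint16_t ArgNameOffsets[] = {
    1, 5, 9,    // hypothetical intrinsic 0: (ftz, rnd, value)
    1, 5, 0, 0, // hypothetical intrinsic 1: (ftz, rnd, <unnamed>, <unnamed>)
};

// Where each intrinsic's run of offsets starts and how many arguments it has.
struct ArgNameRange {
  uint16_t Start;
  uint16_t NumArgs;
};
static constexpr ArgNameRange ArgNameRanges[] = {{0, 3}, {3, 4}};
static constexpr unsigned NumIntrinsics = 2;

// Fill Names with one entry per argument; nullptr means the argument is unnamed.
void getArgumentNames(unsigned ID, std::vector<const char *> &Names) {
  assert(ID < NumIntrinsics && "unknown intrinsic ID");
  Names.clear();
  const ArgNameRange &R = ArgNameRanges[ID];
  for (unsigned I = 0; I < R.NumArgs; ++I) {
    uint16_t Offset = ArgNameOffsets[R.Start + I];
    Names.push_back(Offset ? &ArgNameStrings[Offset] : nullptr);
  }
}
```

Both intrinsics share the single stored copy of "ftz" and "rnd", which is the reason the proposed API fills a caller-provided vector rather than returning an ArrayRef into per-intrinsic storage.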

This effectively concludes proposal (2).

Proposal (3): Support for formatted immediate arguments

This feature will allow specifying an optional ImmArgFormat object for each ImmArg in the intrinsic definition. The ImmArgFormat object essentially captures the printer and parser for a given ImmArg.

  1. LLVM’s ImmArg property, which is used to define immediate arguments, will support an optional format defined as follows:

    class ImmArgFormat<string name, LLVMType type, string cpp_namespace = ""> {
      string printerName = !strconcat("print", name);
      string parserName = !strconcat("parse", name);
      LLVMType Type = type;
      string CppNameSpace = cpp_namespace;
    }
    
    // A record used as the default value when no printing support is opted in.
    def NoFormat : ImmArgFormat<"", llvm_void_ty>;
    
    class ImmArg<AttrIndex idx, ImmArgFormat F = NoFormat> : IntrinsicProperty {
      int ArgNo = idx.Value;
      ImmArgFormat Fmt = F;
    }
    
    def RoundingModeFmt : ImmArgFormat<"RoundingMode", llvm_i8_ty>;
    
    def int_target_some_op : Intrinsic<[], [llvm_i8_ty, ...],
                                       [ImmArg<ArgIndex<0>, RoundingModeFmt>, ...]>;
    

    LLVM’s intrinsic emitter backend will use the NoFormat object as a marker to infer that no immediate arg formatting was enabled for that particular ImmArg. Otherwise, it will generate a declaration of the printer and parser function for this formatter in the Intrinsic::<CppNameSpace> namespace as follows:

    namespace llvm::Intrinsic::<CppNameSpace> {
      // print the value of the imm arg. Return false if printing failed.
      // (unknown value encountered for instance)
      bool print##Name(raw_ostream &OS, <type> Val);
    
      // Parse the formatted value. Return std::nullopt if parsing failed.
      std::optional<type> parse##Name(StringRef FormattedValue); 
    }
    

    where <type> is the C++ type corresponding to the LLVMType specified in the ImmArgFormat class. Note that at least initially, only a subset of integer typed ImmArg will support formatting (i.e, only i1, i8, i16, i32, i64, and i128 maybe). If desired, this could be extended later on.

    This could be generated as a part of the gen-intrinsic-enums command and in the IntrinsicEnums.inc file. It might be better to rename the llvm-tblgen option to gen-intrinsic-decl and the file to IntrinsicDecls.inc. This could be done as one of the preparatory steps.

  2. Developers who add a new ImmArgFormat to their TableGen definitions need to provide implementation of the print and parse functions whose declarations are generated by TableGen. We will add a new IntrinsicFormat.cpp file to host definitions of such functions. It may also make sense to have a per target file to host any target specific intrinsic format handling code. As an example, IntrinsicFormat.cpp can host printers and parsers for target independent intrinsics and IntrinsicFormatNVVM.cpp can host printers and parsers for any NVVM intrinsics. These files can be created on demand as various targets adopt this feature. Using the <CppNameSpace> = Target, we can get different namespaces for different targets, and/or potentially share formatters among different targets, as well as reuse target independent formatters in target specific intrinsics if it makes sense.

  3. The intrinsic emitter will also generate 2 functions to lookup the printer and parser for a given intrinsic. These functions will use lookup tables that the intrinsic emitter will generate. The 2 functions will have the following prototype:

    // Return printer/parser for immediate arguments for intrinsic `ID`.
    // If arg #i of the intrinsic has a formatter specified, Printers[i]/Parsers[i]
    // will contain the pointer to the print/parse function for that argument.
    // An absence of a printer/parser will be inferred for arg #j if
    // j >= Printers.size() || Printers[j] == nullptr. This will help the potentially
    // common case of ImmArgs at the start of the argument list and runtime args at end.
    // We could also optimize for the case of ImmArgs at end of arg list by returning
    // a vector and a `StartIndex`, so entry j in vector corresponds to Arg# StartIndex + j.
    void Intrinsic::getArgPrinters(Intrinsic::ID ID, SmallVectorImpl<void *> &Printers);
    void Intrinsic::getArgParsers(Intrinsic::ID ID, SmallVectorImpl<void *> &Parsers);
    

    Internally, these functions can be supported by a simple per-intrinsic LUT, or something more sophisticated to exploit sparsity and other properties. These LUTs need to be statically initialized, so cannot be std::map<std::pair<Intrinsic::ID, unsigned>, void *>. One simple idea is to have a linearized array of printers (only for intrinsics with at least one formatted immarg) and then have an IID->uint16_t offset into this linearized array as another table. Given that immediate argument formatting is not compile time critical, we should prioritize reducing the size of any static data to back up these functions. Assuming that, a simple linearized array will have a lot of null pointers. So the proposed encoding of this will be as follows:

    static constexpr std::pair<void *, void *> IntrinsicFormatters[] = {
       {nullptr, nullptr},                                     // slot 0: end of list.
       {nvvm::printRoundingMode, nvvm::parseRoundingMode},     // slot 1
       {x86::printInterruptVector, x86::parseInterruptVector}, // slot 2
       ...
    }; // Assume this table has <= 2^16 slots, so can use a uint16_t to index.
    
    // Define a per-intrinsic linked list of formatters for arguments of that
    // intrinsic. The "data" in that linked list is ArgNo and the formatter
    // index (both 16-bit). The entries for one intrinsic are stored
    // contiguously, so the "next" entry is implicitly the following array
    // element, and each list is terminated by a sentinel entry with
    // FormatterIndex = 0 (the end-of-list slot above).
    struct IntrinsicFormatterLinkedListEntry {
      uint16_t ArgNo;
      uint16_t FormatterIndex; // Index into IntrinsicFormatters array.
    };
    static constexpr IntrinsicFormatterLinkedListEntry IntrinsicFormatterLinkedList[] = {
        {0xFFFF, 0}, // This encodes the end of the list.
        {0, 1},      // This encodes a LL entry for ImmArg<0> formatted as rounding mode.
        {0xFFFF, 0}, // This encodes the end of the list.
    };
    
    // Per-intrinsic table for index of head of the list of formatters for that
    // intrinsic. 0 indicates that the intrinsic has no formatters.
    static constexpr uint16_t IntrinsicFormatterLinkedListHead[] = {
         0, // Intrinsics that do not use formatters will directly point to 0, which is EOL.
         ...
         1, // For this intrinsic, the list starting at index 1 encodes a single
            // ImmArg Arg0 formatted as rounding mode.
    };
    

    As can be seen above, all of this data can be statically initialized, and then getArgPrinters and getArgParsers can decode the list and fill any holes with nullptr.

  4. AsmWriter will have a top-level function to print formatted args, which will handle both argument names as well as formatted immediate arguments. This will be called from the handling of CallInst in printInstruction if the call being printed is an intrinsic call that uses pretty printing. This function will then get argument names and immediate argument printers for that intrinsic and pretty print the arguments. Since the printer can fail, the function will, for each arg, call the printer on a temporary raw_string_ostream and then commit that only if the printing succeeds, so that the individual print functions need not have that logic (for example, when an ImmArg is a composite one with packed bitfields). This function could look like:

    bool AsmWriter::PrintFormattedArgs(raw_ostream &OS, const IntrinsicInst &I) {
      Intrinsic::ID ID = I.getIntrinsicID();
      assert(Intrinsic::isFormatted(ID));
      SmallVector<const char *> ArgNames;
      SmallVector<void *> ImmArgPrinters;
      AttributeList PAL = I.getAttributes();
    
      Intrinsic::getArgumentNames(ID, ArgNames); // null if no arg name.
      Intrinsic::getArgPrinters(ID, ImmArgPrinters); // null if not formatted.
    
      ListSeparator LS;
      for (unsigned Op = 0, EOp = I.arg_size(); Op < EOp; ++Op) {
        OS << LS;
        if (Op < ArgNames.size() && ArgNames[Op])
          OS << "/*" << ArgNames[Op] << "=*/";
        Value *Arg = I.getArgOperand(Op);
        if (Op < ImmArgPrinters.size() && ImmArgPrinters[Op]) {
          // Print to a temporary string in case the formatting fails.
          std::string Buffer;
          raw_string_ostream SS(Buffer);
          ConstantInt *ImmArg = cast<ConstantInt>(Arg);
          // The printers return false on failure (see their declaration above).
          bool Failed = true;
          switch (ImmArg->getBitWidth()) {
          case 1: {
            using printer_ty = bool (*)(raw_ostream &, bool);
            auto Printer = reinterpret_cast<printer_ty>(ImmArgPrinters[Op]);
            Failed = !Printer(SS, !ImmArg->isZero());
            break;
          }
          case 8: {
            using printer_ty = bool (*)(raw_ostream &, uint8_t);
            auto Printer = reinterpret_cast<printer_ty>(ImmArgPrinters[Op]);
            Failed = !Printer(SS, static_cast<uint8_t>(ImmArg->getZExtValue()));
            break;
          }
          default:
            // The 16/32/64-bit cases are analogous and elided here.
            break;
          } // end switch
          if (!Failed) {
            OS << SS.str(); // Commit the formatted value.
            continue;
          }
          // If formatted immarg printing failed, fall back to raw/default
          // printing.
        }
        writeParamOperand(Arg, PAL.getParamAttrs(Op));
      }
      return true; // All arguments were printed.
    } // end AsmWriter::PrintFormattedArgs.
    
  5. For tweaking LLVM IR by hand, we also want the parser to be able to parse formatted immediate arg values. Currently, each intrinsic argument has the following syntax: <type> <attributes> <value>, where <type> is an LLVM type. LLVM types never start with a . (see LLParser::ParseType), so we can use the presence of a ‘.’ as our cue to infer that the value is a formatted immediate arg and use the intrinsic's parser function for that arg to interpret the value. Currently, LLVM’s lexer does not recognize something like .ftz.rtz as a valid token, so we’d need to extend the lexer to recognize this as a new string-valued token, say FormattedImmArgVal, similar to MetadataVar or LocalVar, which are ! and @ prefixed names. Assuming this, LLParser::parseParameterList can be factored into 2 functions: an outer-level driver and one that parses a single parameter. This will enable LLParser::parseCall to check if the next token is a FormattedImmArgVal and, if so, use the intrinsic's immarg parser for the current arg to parse it. That would need a mapping from the parsed CalleeID to Intrinsic::ID, but if we do it lazily, we will incur that cost only if we encounter this syntax. If we do not encounter the FormattedImmArgVal token, the code will call the function to parse a single “regular” parameter.

  6. For the parser support, one question is what happens if we are unable to parse the formatted immediate arg. One option is to fail the entire parse (for example, if we are parsing an unknown intrinsic, the associated immarg is not formatted, or parsing fails). This seems OK if we expect the parser to only be able to parse the “current” formatting. However, if we want to support intrinsic upgrade in the presence of the formatted syntax, we somehow need to capture the unparsed value and then let the intrinsic upgrade take care of upgrading it. For unparsed immarg values, we will append a $arg<N><FormattedImmArgVal> string to the name of the intrinsic and establish that as a handshake between the parser and intrinsic auto-upgrade. As an example, if arg2 (.what) fails to parse in the following input, the parser’s output will have a function as below:

    ; Input .ll assembly:
    call void @llvm.target.foo(/*edge_x=*/.modeX, /*edge_y=*/.modeX, /*edge_z=*/.what, i32 %x)
    
    ; Parsed LLVM IR:
    call void @llvm.target.foo$arg2.what(/*edge_x=*/i32 0, /*edge_y=*/i32 0, i32 %x)
    

    The expectation is that the intrinsic auto-upgrader can then decode the $arg2.what in the intrinsic name and map it to the appropriate value. Whether to incur the additional complexity of supporting upgrades of this syntax depends on use cases. If folks want to keep around .ll files with the formatted imm arg syntax exposed to the parser and have them continue to work, this is required; otherwise it is optional.
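
As a rough illustration of the handshake described in step 6, the sketch below splits a mangled name such as llvm.target.foo$arg2.what back into the base intrinsic name and the (argument index, unparsed value) pairs that an auto-upgrader would then interpret. The struct and function names are hypothetical and not existing LLVM APIs, the code uses std::string instead of StringRef to stay self-contained, and it assumes that several $arg suffixes may be chained on one name, which the RFC does not specify.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type (not an LLVM API): the pieces recovered from a
// name like "llvm.target.foo$arg2.what".
struct UnparsedImmArgs {
  std::string BaseName;                               // e.g. "llvm.target.foo"
  std::vector<std::pair<unsigned, std::string>> Args; // e.g. {2, ".what"}
};

// Splits Name at every "$arg<N>" marker. Returns false if a marker has no
// decimal index or is not followed by a value starting with '.'.
bool splitUnparsedImmArgs(const std::string &Name, UnparsedImmArgs &Out) {
  const std::string Marker = "$arg";
  std::size_t Pos = Name.find(Marker);
  Out.BaseName = Name.substr(0, Pos); // Whole name if there is no marker.
  Out.Args.clear();
  while (Pos != std::string::npos) {
    std::size_t Cur = Pos + Marker.size();
    // Parse the decimal argument index that follows "$arg".
    std::size_t DigitsBegin = Cur;
    unsigned Index = 0;
    while (Cur < Name.size() &&
           std::isdigit(static_cast<unsigned char>(Name[Cur]))) {
      Index = Index * 10 + static_cast<unsigned>(Name[Cur] - '0');
      ++Cur;
    }
    if (Cur == DigitsBegin)
      return false;
    // The unparsed value runs until the next marker or the end of the name.
    std::size_t Next = Name.find(Marker, Cur);
    std::string Value = Name.substr(
        Cur, Next == std::string::npos ? std::string::npos : Next - Cur);
    if (Value.empty() || Value[0] != '.')
      return false;
    Out.Args.emplace_back(Index, Value);
    Pos = Next;
  }
  return true;
}
```

An upgrader built along these lines would look up BaseName, then hand each recorded value to whatever legacy parser it keeps for that argument index.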

This concludes proposal (3).
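
To make the lexer extension from step 5 a bit more concrete, here is a minimal standalone sketch of recognizing a FormattedImmArgVal token such as .ftz.rtz. The function name is hypothetical, and the token grammar (one or more '.'-prefixed C-style identifiers) is an assumption, since the RFC does not pin down which characters are allowed. It is not based on the actual LLLexer code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical lexer helper: if a formatted immediate value starts at
// Input[Pos], returns the number of characters it spans; otherwise returns 0.
// Assumed grammar: ('.' [A-Za-z_][A-Za-z0-9_]*)+
std::size_t lexFormattedImmArgVal(const std::string &Input, std::size_t Pos) {
  std::size_t Cur = Pos;
  while (Cur < Input.size() && Input[Cur] == '.') {
    std::size_t IdentBegin = Cur + 1;
    if (IdentBegin >= Input.size())
      break;
    // Each component must start with a letter or underscore, so "..." and
    // ".5" are not treated as formatted values.
    unsigned char First = static_cast<unsigned char>(Input[IdentBegin]);
    if (!std::isalpha(First) && First != '_')
      break;
    std::size_t End = IdentBegin + 1;
    while (End < Input.size() &&
           (std::isalnum(static_cast<unsigned char>(Input[End])) ||
            Input[End] == '_'))
      ++End;
    Cur = End; // Consume this ".ident" component and look for another.
  }
  return Cur - Pos;
}
```

A zero return tells the caller to fall through to the regular "<type> <attributes> <value>" parameter path, which matches the lazy behavior described in step 5.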

Discussion/future ideas

  1. With this approach, backends can now more readily choose to pack different configuration bits into a single immediate argument, assuming the appropriate formatting support is added as well (i.e., LLVM IR readability concerns can be addressed using formatted immediate arguments). C++ code that handles the printing and parsing of packed immediate arguments will likely use structs and unions to codify the packing, and we can establish some convention of where they go. Each target can have an Intrinsics.h and Intrinsics.cpp file to host any intrinsic-specific code for that target (in llvm/include/IR and llvm/lib/IR). The .h file will define structs and declare any helper functions for dealing with intrinsics, and that code can be expected to be in the target’s namespace. So instead of the IntrinsicFormatNVVM.cpp suggested earlier, we will have just IntrinsicsNVVM.h for any struct/union and helper function declarations, and IntrinsicsNVVM.cpp that will define these helper functions as well as print/parse functions for any formatted imm args used by NVVM intrinsics.

  2. Extending immediate args to support automatic generation of printing and parsing code: The proposal as above supports “manual” printing/parsing of immediate arguments, where the code to print and parse is written separately in C++. This mode gives backend developers complete flexibility in how they want their immediate arguments to be printed, so this “manual mode” is a must-have. However, in several common cases, the code to print and parse imm args might be a simple per-immarg printing and parsing. To support that, we could potentially extend the intrinsic support in TableGen to auto-generate such printing and parsing code. As a very basic example, for each immediate arg, we may be able to specify a simple enumeration as follows:

    // Immediate arg enums generate the following code:
    // enum {class?} EnumName : uint<NumBits>_t {
    //   EnumValueNames[0] = EnumValues[0],
    //   EnumValueNames[1] = EnumValues[1],
    // };
    //
    // print<EnumName>(raw_ostream &OS, uint<NumBits>_t Value);
    // parse<EnumName>(..., ConstantInt *RetVal...); // RetVal will be of type uint<NumBits>_t.
    
    class ImmArgEnum {
      list<string> EnumValueNames;
      list<int> EnumValues;
      int NumBits; // Number of bits to use for this enum.
      bit IsClass; // Generate a regular enum or an enum class.
      string EnumName; // Name of the enum generated in the code.
      ImmArgFormat Fmt = ...; // An ImmArgFormat for this enum.
    }
    
    // For i1 types, prints true value or false value based on 0/1 value of the i1.
    class ImmArgBool {
      string TrueVal;
      string FalseVal;
    }
    
    def RoundingMode : ImmArgEnum {
      let EnumValueNames = ["rne", "rtz", "rup", "rdn"];
      let EnumValues = [0, 1, 2, 3]; // Could be auto assigned if not specified.
      let NumBits = 8;
      let EnumName = "RoundingMode"; // May be auto assigned based on the record name.
    }
    
    def FtzMode : ImmArgBool {
      let TrueVal = "ftz";
      let FalseVal = "noftz";
    }
    
    // specify that arg0 needs to be the rounding mode enum printer.
    ImmArg<ArgIndex<0>, RoundingMode.Fmt>
    
    // May be there is a way to make this simpler as follows:
    ImmEnumArg<ArgIndex<0>, RoundingMode>
    ImmBoolArg<ArgIndex<2>, FtzMode>
    

    Note that we are not proposing this as part of this RFC, but it could be considered in the future.
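
For a sense of what such auto-generated code might amount to, here is a hand-written sketch of a printer and parser pair for the RoundingMode enum above. Everything in it is an assumption for illustration: the function signatures, the leading '.' in the printed form, the use of std::ostream in place of raw_ostream (to keep the snippet self-contained), and the convention that a true return value means failure.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Mirrors the RoundingMode TableGen record: NumBits = 8, values 0 through 3.
enum class RoundingMode : uint8_t { rne = 0, rtz = 1, rup = 2, rdn = 3 };

static const char *const RoundingModeNames[] = {"rne", "rtz", "rup", "rdn"};
static const uint8_t NumRoundingModes = 4;

// Prints ".<name>" for a known value. Returns true on failure (unknown
// value), in which case nothing is written to OS.
bool printRoundingMode(std::ostream &OS, uint8_t Value) {
  if (Value >= NumRoundingModes)
    return true;
  OS << '.' << RoundingModeNames[Value];
  return false;
}

// Parses ".<name>" into Result. Returns true on failure and leaves Result
// unchanged.
bool parseRoundingMode(const std::string &Text, uint8_t &Result) {
  if (Text.empty() || Text[0] != '.')
    return true;
  for (uint8_t I = 0; I < NumRoundingModes; ++I) {
    // Compare everything after the leading '.' against the enumerator name.
    if (Text.compare(1, std::string::npos, RoundingModeNames[I]) == 0) {
      Result = I;
      return false;
    }
  }
  return true;
}
```

Because the printer writes nothing for an out-of-range value, a caller can safely fall back to the raw integer form when printing fails.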

Staging

Implementation of this RFC will need to happen in stages. We propose the following staging:

  1. Implement proposal (1) (C style comments). There is already a draft PR for this here: [LLVM][AsmParser] Add support for C style comments by jurahul · Pull Request #111554 · llvm/llvm-project · GitHub.
  2. Prep #0: [NFC] Introduce AsmPrintOptions with a single IsForDebug field in it.
  3. Prep #1: [NFC] Rename gen-intrinsic-enums to gen-intrinsic-decls and IntrinsicEnums.inc to IntrinsicDecs.inc.
  4. Add support for intrinsic arg names. Can be split into 2 parts:
    • Add Intrinsics.td, intrinsic emitter, and Intrinsic::getArgNames function, with test intrinsics for e2e testing, and unit tests that query arg names and verify them
    • Add AsmWriter support for argument names (add new flag to AsmPrintOptions and print names, extend llvm-dis to accept option -print-formatted-intrinsics and add LLVM LIT tests to test the formatted intrinsic syntax with arg names)
  5. Add support for formatted immediate args. Can be split into 3 parts:
    • Add Intrinsics.td, intrinsic emitter, and Intrinsic::getArgPrinters/getArgParsers, and unit tests to query parsers and verify printing and parsing (unit test will query and call these functions, so no hookup in AsmPrinter, parser yet)
    • Add Asm printer support.
    • Add Asm parser support (no intrinsic upgrade support).
  6. Add intrinsic upgrade support if desired. Again, need e2e testing of some form here.
  7. Add support for build time stripping out of this feature.
  8. Adopt these features for some existing intrinsics (may be a few NVVM intrinsics), migrate existing unit tests to them, and drop the test intrinsics that were added for e2e testing.

I wonder if folks had time to go through this at least at a high level and if there is any initial feedback. I was hoping to start working on steps 1-3 (in Staging) to begin with as they are prep work for later changes and may have some merit on their own.

My initial feedback is that this RFC is way too long. It took me nearly half an hour of pure reading time just to get through this block of text. Rather than focusing on the high-level direction, your RFC focuses on lots of little details that are not relevant to the wider community. This kind of RFC does not need (and should not) give a detailed accounting of which data structures exactly you plan to use to implement a particular aspect of the proposal. Please try to keep future RFCs more to the point. I don’t know if you’ll get much feedback for this RFC as it stands now.

Maybe @nhaehnle has some opinions on this – I think he has spent more time than most thinking about how we can move LLVM IR in a more extensible direction.

For example, I believe that his structured data proposal provides a somewhat generalized alternative for some of your motivating cases.

It’s been a while since I looked at that in detail, but I think it might enable something along the lines of @llvm.foo.bar(...) { flushToZero: i1 false }. If we also allow something enum-like in the structured data that would also give a clean representation for things like roundingMode.

As that proposal outlines, the problem of “giving meaningful names to things” is not unique to intrinsics.


There is a fourth choice missing here: metadata arguments. For example, this is what constrained FP intrinsics look like:

call double @llvm.experimental.constrained.fadd.f64(
  double %a,
  double %b,
  metadata !"round.dynamic",
  metadata !"fpexcept.strict")

We currently use such string metadata arguments in cases where we essentially want an immarg with a readable textual representation.

Thanks @nikic. I am happy to rework the RFC to be shorter and touch on high level directions (and refer back to this thread for potential details, but I imagine they might change during the course of implementation if this is accepted). Let me work on that.

Metadata arguments are something we did consider internally but decided not to pursue because (a) metadata arguments are not as widely used as immediate arguments, (b) there are potential IR memory footprint issues, and (c) we have a lot of existing intrinsics that we cannot migrate to metadata. The proposal above will allow retrofitting those intrinsics as well.

I have posted a shorter version of the RFC here: [RFC] Pretty printing for LLVM Intrinsic arguments (Short) - IR & Optimizations - LLVM Discussion Forums