[RFC] Pretty printing for LLVM Intrinsic arguments
This RFC proposes adding LLVM infrastructure to support pretty printing and parsing of intrinsic arguments. The motivation is to increase the readability/grokability and hackability of LLVM intrinsics. Although LLVM intrinsics have target defined semantics, when intrinsics use a long list of arguments and/or use immediate arguments with opaque values to encode their behavior, it may become difficult for a human reader to decode/understand a specific call to such an intrinsic. The end goal of this RFC is to enable a human who is reading or modifying LLVM IR to more easily work with LLVM intrinsics. This mostly includes compiler engineers working with and debugging the compiler, and frontend/higher level compiler engineers who want to interface with LLVM and use these intrinsics. In a production compiler flow, I'd assume that LLVM IR rarely gets printed or parsed, so in that context, LLVM IR printing and parsing is a debug tool for compiler developers, and the goal of this RFC is to increase their productivity by making intrinsics less opaque than they are today in LLVM assembly.
Additionally, as a byproduct, it may offer additional flexibility for backends when designing their intrinsics.
Summary
Broadly, this RFC is proposing three independent features:
- Support for C style comments /*...*/ in LLVM's assembly syntax.
- Ability to define names for intrinsic arguments and have them printed in LLVM IR as inline comments.
- Ability to pretty-print the value of immediate arguments and parse the pretty-printed value (this is a hackability aid). These are referred to as formatted immediate arguments in this RFC.
Together, they should help improve the readability of LLVM intrinsics by providing additional information that reduces the cognitive load on the human working with intrinsics. Having the ability to parse pretty-printed values of immediate arguments also frees engineers from needing to know the exact encoding of an immediate arg if they want to change it; they just need to know the pretty-printed/formatted encoding.
A TL;DR summary of the proposal is as follows:
; Feature (1): C style comments.
; -----------------------------------------------
/* LLVM assembly will support C style comments that span multiple lines.
 * - Below we want to test fadd(fmul)=>fma transform with FTZ enabled
 * - Additionally, we need to use rounding mode rtz to exercise the special
 *   handling of this mode in function HandleRoundingModes.
 */
%t = call float @llvm.target.fmul.ftz(float %x, float %y)
%w = call float @llvm.target.fadd.rtz(float %t, float %z)
; LLVM assembly will support "inline" comments that do not span till end of line.
%y = /*ignored*/ call float @log(/*num=*/ float %x, /*base=*/ float 10.0) /* ignored */
; Feature (2): Argument names for intrinsics.
; -----------------------------------------------
; Intrinsic declarations will optionally print argument names using inline comments.
declare void @llvm.target.some_op(/*ftz=*/i1, /*rnd=*/i8, /*value=*/float)
; Argument names will be printed in intrinsic calls.
call void @llvm.target.some_op(/*ftz=*/i1 0, /*rnd=*/i8 3, /*value=*/float %x)
; Feature (3): Formatted immediate arguments.
; -----------------------------------------------
; Immediate argument values will be pretty printed as 1 or more '.' prefixed tokens
; using inline comment following the argument name.
call void @llvm.target.some_op(/*ftz=.ftz*/i1 0, /*rnd=.rtz*/i8 3, /*value=*/float %x)
; Argument names and pretty printing for immarg are orthogonal and can be opted in
; on a per argument basis. Here:
; - arg0 just uses arg name feature.
; - arg1 uses arg name and formatted imm arg.
; - arg2 uses just formatted imm arg
; - arg3 does not use arg name (and is not an immarg)
call void @llvm.target.some_other_op(/*ftz=*/i1 0, /*rnd=.rtz*/i8 3, /*.low_precision*/i8 0, float %x)
; IR parser will support reading formatted immediate arg values, so that IR can
; be hand-edited to tweak the immarg values.
call void @llvm.target.some_op(/*ftz=*/.ftz, /*rnd=.rtz*/i8 3, /*value=*/float %x)
; edit to =>
call void @llvm.target.some_op(/*ftz=*/.ftz, /*rnd=*/.rtne, /*value=*/float %x)
Background
LLVM intrinsics currently support 2 kinds of arguments: regular runtime inputs to the intrinsic (including varargs), and immediate args, which are a way to encode compile time parameterization of an intrinsic. Immediate arguments are required to be compile time constants (currently either integer or floating-point scalar values). The intent is that generally (though not always) these get encoded in the final machine instructions and hence need to be compile time constants. A very simple example is the X86 interrupt intrinsic, which is defined as follows:
def int_x86_int : Intrinsic<[], [llvm_i8_ty], [ImmArg<ArgIndex<0>>]>;
The single input to the intrinsic is an 8-bit interrupt vector and is required to be a compile time immediate since it is encoded as an 8-bit immediate in the instruction. Without immediate arguments, we would need to define up to 256 different intrinsics, one for each possible interrupt vector value. This would add a lot of boilerplate code in both the intrinsic definitions as well as actual LLVM code (IR transforms or backend code) to handle all these intrinsics, as well as the frontend which is going to generate these intrinsic calls. With immediate arguments, we can define just one LLVM intrinsic. The downside is that the immediate arguments to these intrinsics are just integers and are printed out as such in the LLVM IR. If we had to instead define 256 (or a subset) of different intrinsics, we could choose to name them as llvm.x86.int.dos (or even llvm.x86.int.21h for DOS interrupts) and llvm.x86.int.video (instead of llvm.x86.int.16 for BIOS video interrupts) for better readability, but at the cost of dealing with a large number of intrinsic ID enums.
At the LLVM IR level, LLVM's IR verifier checks that the arguments marked as ImmArg in the TD definitions of intrinsics are actually constant values, but beyond that LLVM does not seem to do anything more with this information. That means that frontends that generate these intrinsics have to use constant values for immediate arguments, and LLVM transformations and backends that handle these intrinsics in LLVM IR can assume that the immediate arg values will be compile time constants and do not need to handle cases where these could be runtime values (like generating switch case code over all possible values).
In short, immediate arguments help improve the compiler implementation ergonomics and convenience, at the potential cost of readability. Immediate arguments are used extensively in LLVM upstream and downstream backends. They play a role similar to operation attributes in MLIR, but limited to integer and float types.
All intrinsic arguments (immediate or otherwise) are specified in the intrinsic definition by providing a list of parameter types when defining an intrinsic. No names are associated with either the input(s) or the output(s) of the intrinsic.
Motivation
GPUs and other accelerators are rapidly evolving to support new features and performance improvements for critical workloads like ML inference and training. As an example, NVIDIA GPUs have added special purpose accelerators called Tensor Cores to accelerate matrix multiplication, and these accelerators keep evolving drastically with each GPU generation, either to support new use cases or improve performance. Compiler support at the lowest levels of the compiler stack for these generally relies on intrinsics since the programming model for these accelerators is nuanced, with several degrees of freedom and restrictions to configure, and LLVM intrinsics are the only available mechanism to introduce target dependent extensions to LLVM IR (unlike MLIR, which allows adding custom operations). Built on top of this LLVM support are libraries like cuDNN/CUTLASS and higher-level compilers like XLA and Triton.
One challenge in supporting such accelerators with intrinsics is that the myriad variety of ways to configure them can lead to issues similar to what immediate arguments attempt to address. These configuration options are in several cases encoded in the instruction, so we need to use immediate arguments for these configuration options. As a result, these intrinsics can end up with a dozen or more of such immediate arguments, leading to unreadable LLVM IR. Additionally, as these intrinsics potentially evolve over time (either internally during the HW/SW codesign phase or across GPU generations), we may need to add new variants of these intrinsics with just minor changes in the configuration options available. However, the cost of doing these changes in the compiler may be non-trivial due to lots of argument position changes for these intrinsics.
Note that even without tensor cores, some GPU specific intrinsics can have a long set of immargs for various configuration bits. This includes things like texture related intrinsics, or load/store intrinsics with additional qualifiers for caching behavior, ordering etc. For example, llvm.amdgcn.ds.ordered.add has 6 immediate arguments and NVIDIA has some internal load/store intrinsics that have ~10 immediate arguments. The same readability and extensibility issues apply to these intrinsics as well. In addition to immediate arguments, several such intrinsics accept a number of runtime arguments as well. These long lists of runtime arguments hinder readability too. It may be possible to logically sequence these runtime arguments so that their meaning can be decoded from the position, but that may not always be possible and is additional cognitive load on anyone looking at a call to one of these intrinsics.
As an example, most GPUs support a texture read/write intrinsic that can read/write a given array slice and level-of-detail (LOD) of a mipmapped texture. Such intrinsics may have one i32 argument for the array slice and another i32 argument for the LOD level to read/write. Additionally, 1D variants of these will also have an i32 argument for the texture coordinate. So a call to such an intrinsic will have 3 i32 scalar values as arguments and it may not be immediately obvious which i32 corresponds to which value. As an example, the Metal shading language specification defines a texture array read as:
Tv read(uint coord, uint array, uint lod = 0) const; // for 1DArray
Tv read(uint2 coord, uint array, uint lod = 0) const; // for 2DArray
When using LLVM intrinsics to represent this builtin, it would help to have some hints as to which argument is which value.
This proposal is an attempt to address some of these issues, both from a LLVM IR readability and hackability POV as well as the C++ code that handles these intrinsics.
With the current LLVM intrinsic design, we have 3 choices to represent any intrinsic with some compile-time parameterization. We will consider a running example of a floating-point unary operation that takes 1 float input and whose rounding mode and FTZ (flush denorm to zero) behavior can be configured at compile time. FTZ can be on or off and rounding mode can have 4 possible values (rne, rtz, rup, rdn). The design choices for such an intrinsic are:
- Have a different intrinsic for each possible combination of configurations, and only have runtime values as intrinsic arguments. This would lead to 8 different intrinsics: some_op.{ftz/noftz}.{rne/rtz/rup/rdn}. There is an obvious readability advantage for these intrinsics, but compiler implementation is expensive as we need to define all 8 variants in TD files and handle them in codegen. Note that C++ code/TD defs can be structured to mitigate these effects of duplication in some cases, but maybe not always, and at the end of the day we have code dealing with 8 different but very similar intrinsics.
- Have one i1 immarg for ftz, and an i8 immarg for rounding mode. This reduces the number of intrinsics to 1, but LLVM IR is less readable as one has to decode the meaning of each immarg when reading the IR (which depends on its position in the arg list and its actual value). Mutating the IR by hand is also not as easy for experimentation as you need to know the encoding of these immarg values. Additionally, with a large number of args, the IR dumps may become unwieldy and easy to get lost in.
- Have both the FTZ and rounding mode packed into a single 8/16/32-bit config word. This keeps the number of immargs down to one/few, but with packed immargs, the IR dumps become even more opaque. However, for some backends this form may offer better implementation ergonomics in terms of dealing with the variety of configuration options available and evolving them over time (tensor core related intrinsics are a motivating example here). As an example, in the packed immediate argument mode, we can repurpose one of the existing unused bits in the packing for a new configuration bit. This keeps the existing code working (including existing LLVM IR), and adding support for the new configuration at select places in the backend is much easier than, say, supporting a new immarg (which results in potential rearrangement of argument positions as well as breaking existing LLVM IR unless we also implement an auto-upgrade path). It does not obviate the need to make sure the new modifier is handled correctly everywhere and to design the packing with some foresight, but it can definitely reduce unnecessary churn (and resulting bugs) in the code.
Ideally, we would like good IR ergonomics of #1 (easy to read and modify the IR) coupled with flexibility for backends to choose either option #2 or #3 as a way of encoding the immediate arguments. Additionally, decoding the meaning of each argument from the position is error-prone, so having argument names attached to intrinsic arguments and being able to print them can help readability as well. Together these are essentially proposals for in-built pretty printing support for intrinsics that individual intrinsics can opt-in.
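To make the packed config word of option #3 concrete, here is a minimal sketch for the running example. The bit layout (bit 0 for FTZ, bits 1-2 for the rounding mode), the numeric values of the rounding modes, and the names SomeOpConfig, packConfig and unpackConfig are all assumptions made for illustration; the RFC does not prescribe any particular packing.

```cpp
#include <cstdint>

// Hypothetical encoding for the running example. The numeric values and the
// bit layout are assumptions for illustration only.
enum class RoundingMode : uint8_t { RNE = 0, RTZ = 1, RUP = 2, RDN = 3 };

struct SomeOpConfig {
  bool FTZ;
  RoundingMode Rnd;
};

// Bit 0 holds FTZ, bits 1-2 hold the rounding mode, bits 3-7 are unused and
// could later be repurposed for new configuration bits.
constexpr uint8_t packConfig(SomeOpConfig C) {
  return static_cast<uint8_t>((C.FTZ ? 1u : 0u) |
                              (static_cast<unsigned>(C.Rnd) << 1));
}

// Inverse of packConfig; ignores the unused upper bits.
constexpr SomeOpConfig unpackConfig(uint8_t Word) {
  return {(Word & 1u) != 0, static_cast<RoundingMode>((Word >> 1) & 3u)};
}
```

With this layout, FTZ on with rounding mode rtz shows up in a raw IR dump as the single immediate i8 3, which is exactly the kind of opaque value that formatted immediate arguments are meant to make readable.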
Requirements
If we were to support something like this in LLVM, below are some requirements that need to be satisfied to be able to incorporate this feature in LLVM:
- These pretty printing features need to be opt-in, so existing intrinsics work without any changes, and even intrinsics that adopt these features should be printed in "raw" mode by default (i.e., pretty printing should be disabled by default). LLVM's AsmWriter should accept a flag to turn it on (which will be wired to command line arguments for various tools like llvm-dis/opt etc).
- Any intrinsic that does not make use of these features should not pay any significant compile time cost in terms of printing and parsing that intrinsic. Implementation wise, any potentially expensive code path to support this should be exercised only after a cheap check to see if the intrinsic in question actually has opted into these features.
- Intrinsics should be able to opt in gradually, and any modifications to the intrinsic definitions should be incremental changes to current Intrinsic.td definitions.
- [?] Similar to current "raw" mode intrinsics, the pretty printed syntax should enable intrinsic upgrade. That means unknown/unrecognized syntax should not result in a parse error and should be handed over to the auto upgrader to give it a chance to auto upgrade. And if auto-upgrade fails, these failed-to-parse intrinsics should survive in the IR as unknown intrinsics. Note that this is [?] in the sense that the output that LLVM's AsmWriter will produce will always have arg names and pretty printed immarg values in comments, and one has to explicitly edit these to delete the "raw" value of an immarg and "expose" the pretty printed value. We could say that in such cases, since the input is non-standard, upgrade support will not work for this path and the parser is expected to parse only the current versions of such pretty printed immarg values.
- We should have end-to-end unit tests to test various aspects of this RFC. What that means is either volunteering a couple of existing LLVM intrinsics to use these features and serve as a test vehicle, or having a new set of test intrinsics built into LLVM for the sole purpose of e2e testing (less desirable). It seems we would still need to have test intrinsics that use these features while it is being developed, and once ready, we can adopt some existing intrinsics to use these features and then deprecate the test intrinsics (and switch over the unit tests to the existing intrinsics).
- (Stretch goal) LLVM's intrinsic infrastructure and LLVM's Asm Printer and Parser will now have additional code and static data to help support this feature. The size of that depends on how many intrinsics actually adopt this feature. When code/data size are of concern for a particular deployment, it should be possible to disable these features at build time without any other changes. So, it should be possible to enable/disable a build time option to strip out pretty printing support, say when building the final deployment version of the compiler, but keep it enabled when building internal versions used for debugging (Note: This is different from being enabled/disabled in different build configurations like Assert/Release/Debug). This could rely on either a complete stripping out of this feature using the C++ preprocessor or similar features, or on a combination of that and compiler DCE (for instance, if all code to exercise this is guarded by a 1-bit per-intrinsic query and that query always returns false, the compiler could DCE all the pretty printing code).
- (Stretch goal) Related to the stretch goal above, it might be good to isolate the support for this into a new IntrinsicFormat component in LLVM, with only the AsmParser and AsmWriter components linking with it. Currently, AsmWriter is part of LLVM Core, so that will require some restructuring of dependencies; it is mentioned here just for completeness.
Proposal (1): Support C Style comments in LLVM IR
This is fairly straightforward to implement (see draft PR: [LLVM][AsmParser] Add support for C style comments by jurahul · Pull Request #111554 · llvm/llvm-project · GitHub). If there are concerns about allowing this generally, it may be possible to make the support modal, disabled by default, and LLVM's AsmParser will enable it only when parsing a call instruction's argument list if the callee is a function that could be an intrinsic (name starts with llvm.) and disable it after argument list parsing. However, being able to add inline comments could have its own utility, say when writing .ll test cases. So the preference is to add this generically.
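As a rough illustration of what the lexer side of this involves, the sketch below skips a single C style comment in an input buffer. It uses only the C++ standard library rather than LLVM's LLLexer, the function name skipCStyleComment is hypothetical, and it assumes (as in C) that comments do not nest. It is not taken from the draft PR.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// If a C style comment starts at Input[Pos], return the index just past its
// closing "*/". If no comment starts at Pos, return Pos unchanged. Return
// std::nullopt for an unterminated comment, which a real lexer would report
// as an error.
std::optional<std::size_t> skipCStyleComment(std::string_view Input,
                                             std::size_t Pos) {
  if (Pos >= Input.size() || Input.substr(Pos, 2) != "/*")
    return Pos;
  // Start searching after the opening "/*" so that "/*/" is not treated as
  // a complete comment.
  std::size_t End = Input.find("*/", Pos + 2);
  if (End == std::string_view::npos)
    return std::nullopt;
  return End + 2;
}
```

A lexer would call something like this wherever it currently skips whitespace, which is what lets a comment appear between any two tokens rather than only at the end of a line.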
Proposal (2): Support named intrinsic arguments
This feature will allow specifying argument names for intrinsic arguments.
-
LLVM's Intrinsic class (in Intrinsics.td) will support specifying names for arguments. This will be supported by a new ArgName list that can be specified with the intrinsic, as follows:
class ArgName<ArgIndex idx, string name> {
  int ArgNo = idx.Value;
  string Name = name;
}
class Intrinsic<list<LLVMType> ret_types,
                list<LLVMType> param_types = [],
                list<IntrinsicProperty> intr_properties = [],
                string name = "",
                list<SDNodeProperty> sd_properties = [],
                bit disable_default_attributes = true,
                list<ArgName> arg_names = []> : SDPatternOperator {
  ...
}
This optional list can be used to specify names for all or a subset of intrinsic arguments. The names need to be unique and conform to the following syntax: [a-zA-Z][0-9a-zA-Z_]* (essentially, all valid un-escaped LLVM identifiers but without $ or ., which we will use for a special purpose during intrinsic upgrade, assuming support for that is needed). If an intrinsic argument has an argument name specified, it will be printed before that argument using the inline comment syntax as /* ArgName= */. LLVM's IR parser will ignore these comments. That also means that there is no linting for these names if, say, they mismatch the actual specified names in the intrinsic definitions; they are treated truly as comments.
Additionally, the AsmWriter can also print argument names in intrinsic declarations. Given that they are printed with the call, printing argument names with intrinsic declarations has questionable utility, but can be done for consistency [AI: Decide if needed or not]
-
LLVM's AsmWriter will support a bool argument that will enable or disable intrinsic pretty printing. Unlike AsmParser, which is a single class that drives LLVM assembly parsing, LLVM IR printer entry points are scattered throughout the code in the form of per-class print functions that take arguments like IsForDebug for some control of what is printed. So one option is to do some refactoring beforehand to replace all IsForDebug arguments with an AsmPrintOptions struct, which will include both IsForDebug and EnableFormattedIntrinsics to control pretty printing of intrinsics. The value of EnableFormattedIntrinsics will be false by default, and llvm-dis and opt (and any other LLVM tool) will take a new command line option to enable formatted intrinsics. This single bool will control both argument name and immediate argument formatting.
-
Implementation wise, LLVM's intrinsic emitter will generate a 1-bit isFormatted table similar to isOverloaded. The bit will be set if the intrinsic has any named argument or any immediate argument with formatting enabled. In the AssemblyWriter::printInstruction code that handles CallInst, we will add this 1-bit check and then execute the formatted intrinsic arg code path, where the code will first query that intrinsic's argument names (by calling a new intrinsic emitter generated Intrinsic::getArgumentNames(Intrinsic::ID) function) and print any non-null entries as comments preceding the argument values. This function will look like void Intrinsic::getArgumentNames(Intrinsic::ID, SmallVectorImpl<const char *> &Names) (instead of returning an ArrayRef<const char *>) so that we can optimize the storage of arg names internally using classes like StringToOffsetTable and SequenceToOffsetTable to dedupe the same arg names used across multiple intrinsics.
-
Note that once we have argument names for intrinsics, it's also possible to generate enumerations from those argument names, to use instead of magic argument numbers in the code. As an example, for an arg named intvec of intrinsic llvm.x86.int, we can generate an enum
enum Intrinsic::x86::int_args {
  // enums for argument indexes for a specific intrinsic.
  intvec = 0,
};
and use that instead of 0 in the code. This might be useful for intrinsics with several arguments; however, we do not plan to implement this as a part of this RFC. If there is enough interest, we can start another RFC later down the road to discuss this. Additionally, something like Value *IntrinsicInst::getArgOperand(StringRef Name) is also a possibility that can be explored later. This uses the intrinsic argument names for things beyond formatting, so we need to consider how it interacts with Requirement (6) above, where we would like to strip out this support conditionally for code/data size reasons.
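The naming rules for argument names in this proposal (unique names matching [a-zA-Z][0-9a-zA-Z_]*) could be checked by the intrinsic emitter along these lines. This is only a sketch using the C++ standard library; the function names isValidArgName and areValidArgNames are hypothetical, and a real check would live in TableGen backend code and report errors against the record location.

```cpp
#include <set>
#include <string>
#include <vector>

// ASCII-only character classes, so the result does not depend on the locale.
static bool isAsciiAlpha(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z');
}
static bool isAsciiDigit(char C) { return C >= '0' && C <= '9'; }

// Returns true if Name matches [a-zA-Z][0-9a-zA-Z_]*. In particular, '$' and
// '.' are rejected because they are reserved for intrinsic upgrade.
bool isValidArgName(const std::string &Name) {
  if (Name.empty() || !isAsciiAlpha(Name[0]))
    return false;
  for (char C : Name)
    if (!isAsciiAlpha(C) && !isAsciiDigit(C) && C != '_')
      return false;
  return true;
}

// Returns true if every name is valid and no name is used twice within the
// same intrinsic.
bool areValidArgNames(const std::vector<std::string> &Names) {
  std::set<std::string> Seen;
  for (const std::string &Name : Names)
    if (!isValidArgName(Name) || !Seen.insert(Name).second)
      return false;
  return true;
}
```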
This effectively concludes proposal (2).
Proposal (3): Support for formatted immediate arguments
This feature will allow specifying an optional ImmArgFormat object for each ImmArg in the intrinsic definition. The ImmArgFormat object essentially captures the printer and parser for a given ImmArg.
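To give a feel for what such a printer/parser pair could look like, here is a minimal sketch for a rounding mode immediate. It uses std::ostream and std::string_view in place of LLVM's raw_ostream and StringRef so that it is self-contained, and the numeric encoding (0 = rne, 1 = rtz, 2 = rup, 3 = rdn) is an assumption for illustration; an actual target would use whatever encoding its intrinsic defines. The formatToBuffer helper is likewise hypothetical and mirrors the idea of printing into a temporary buffer and keeping the text only if printing succeeds.

```cpp
#include <cstdint>
#include <iterator>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical encoding: the index into this table is the immediate value.
static constexpr std::string_view RoundingModeNames[] = {".rne", ".rtz",
                                                         ".rup", ".rdn"};

// Print the formatted value. Returns false (and prints nothing) if the value
// is not a known rounding mode.
bool printRoundingMode(std::ostream &OS, uint8_t Val) {
  if (Val >= std::size(RoundingModeNames))
    return false;
  OS << RoundingModeNames[Val];
  return true;
}

// Parse the formatted value. Returns std::nullopt if it is not recognized.
std::optional<uint8_t> parseRoundingMode(std::string_view Formatted) {
  for (uint8_t I = 0; I < std::size(RoundingModeNames); ++I)
    if (Formatted == RoundingModeNames[I])
      return I;
  return std::nullopt;
}

// Print into a temporary buffer and hand back the text only on success, so a
// caller can fall back to raw printing without emitting partial output.
std::optional<std::string> formatToBuffer(uint8_t Val) {
  std::ostringstream SS;
  if (!printRoundingMode(SS, Val))
    return std::nullopt;
  return SS.str();
}
```

Keeping the names in a single table means the printer and parser cannot drift apart, which matters because hand-edited IR relies on the parser accepting exactly what the printer produces.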
-
LLVM's ImmArg property, which is used to define immediate arguments, will support an optional format defined as follows:
class ImmArgFormat<string Name, LLVMType type, string cppNameSpace = ""> {
  string printerName = !strconcat("print", Name);
  string parserName = !strconcat("parse", Name);
  LLVMType Type = type;
  string CppNameSpace = cppNameSpace;
}
// A record used as the default value when no printing support is opted in.
def NoFormat : ImmArgFormat<"", llvm_void_ty>;
class ImmArg<AttrIndex idx, ImmArgFormat F = NoFormat> : IntrinsicProperty {
  int ArgNo = idx.Value;
  ImmArgFormat Fmt = F;
}
def RoundingModeFmt : ImmArgFormat<"RoundingMode", llvm_i8_ty>;
def int_target_some_op : Intrinsic<[], [llvm_i8_ty, ...], [ImmArg<ArgIndex<0>, RoundingModeFmt>, ...]>;
LLVM's intrinsic emitter backend will use the NoFormat object as a marker to infer that no immediate arg formatting was enabled for that particular ImmArg. Otherwise, it will generate a declaration of the printer and parser function for this formatter in the Intrinsic::<CppNameSpace> namespace as follows:
namespace llvm::Intrinsic::<CppNameSpace> {
// Print the value of the imm arg. Return false if printing failed
// (unknown value encountered, for instance).
bool print##Name(raw_ostream &OS, <type> Val);
// Parse the formatted value. Return std::nullopt if parsing failed.
std::optional<<type>> parse##Name(StringRef FormattedValue);
}
where <type> is the C++ type corresponding to the LLVMType specified in the ImmArgFormat class. Note that at least initially, only a subset of integer typed ImmArgs will support formatting (i.e., only i1, i8, i16, i32, i64, and maybe i128). If desired, this could be extended later on.
This could be generated as a part of the gen-intrinsic-enums command and in the IntrinsicEnums.inc file. It might be better to rename the llvm-tblgen option to gen-intrinsic-decl and the file to IntrinsicDecls.inc. This could be done as one of the preparatory steps.
-
Developers who add a new ImmArgFormat to their TableGen definitions need to provide implementations of the print and parse functions whose declarations are generated by TableGen. We will add a new IntrinsicFormat.cpp file to host definitions of such functions. It may also make sense to have a per-target file to host any target specific intrinsic format handling code. As an example, IntrinsicFormat.cpp can host printers and parsers for target independent intrinsics and IntrinsicFormatNVVM.cpp can host printers and parsers for any NVVM intrinsics. These files can be created on demand as various targets adopt this feature. Using <CppNameSpace> = Target, we can get different namespaces for different targets, and/or potentially share formatters among different targets, as well as reuse target independent formatters in target specific intrinsics if it makes sense.
-
The intrinsic emitter will also generate 2 functions to look up the printers and parsers for a given intrinsic. These functions will use lookup tables that the intrinsic emitter will generate. The 2 functions will have the following prototypes:
// Return printers/parsers for immediate arguments of intrinsic `ID`.
// If arg #i of the intrinsic has a formatter specified, Printers[i]/Parsers[i]
// will contain the pointer to the print/parse function for that argument.
// An absence of a printer/parser will be inferred for arg #j if
// j >= Printers.size() || Printers[j] == nullptr. This will help the potentially
// common case of ImmArgs at the start of the argument list and runtime args at end.
// We could also optimize for the case of ImmArgs at end of arg list by returning
// a vector and a `StartIndex`, so entry j in vector corresponds to Arg# StartIndex + j.
void Intrinsic::getArgPrinters(Intrinsic::ID ID, SmallVectorImpl<void *> &Printers);
void Intrinsic::getArgParsers(Intrinsic::ID ID, SmallVectorImpl<void *> &Parsers);
Internally, these functions can be supported by a simple per-intrinsic LUT, or something more sophisticated to exploit sparsity and other properties. These LUTs need to be statically initialized, so they cannot be std::map<std::pair<Intrinsic::ID, unsigned>, void *>. One simple idea is to have a linearized array of printers (only for intrinsics with at least one formatted immarg) and then have an IID->uint16_t offset into this linearized array as another table. Given that immediate argument formatting is not compile time critical, we should prioritize reducing the size of any static data backing these functions. With that priority, a simple linearized array is not ideal since it will have a lot of null pointers. So the proposed encoding of this will be as follows:
static constexpr std::pair<void *, void *> IntrinsicFormatters[] = {
  {nullptr, nullptr},                                     // End of list.
  {nvvm::printRoundingMode, nvvm::parseRoundingMode},     // slot 1
  {x86::printInterruptVector, x86::parseInterruptVector}, // slot 2
  ...
};
// Assume this table has <= 2^16 slots, so can use a uint16_t to index.
// Define a per-intrinsic list of formatters for arguments of that intrinsic.
// The "data" in each list entry is ArgNo and the formatter index (both
// 16-bit). Entries of one list are stored contiguously, so the "next" entry
// is simply the following array element, and the list is terminated by an
// entry with FormatterIndex = 0.
struct IntrinsicFormatterLinkedListEntry {
  uint16_t ArgNo;
  uint16_t FormatterIndex; // Index into IntrinsicFormatters array.
};
static constexpr IntrinsicFormatterLinkedListEntry IntrinsicFormatterLinkedList[] = {
  {0xFFFF, 0}, // This encodes the end of the list.
  {0, 1},      // This encodes a LL entry for ImmArg<0> formatted as rounding mode.
  {0xFFFF, 0}, // This encodes the end of the list.
};
// Per-intrinsic table for the index of the head of the list of formatters for
// that intrinsic. 0 indicates that the intrinsic has no formatters.
static constexpr uint16_t IntrinsicFormatterLinkedListHead[] = {
  0, // Intrinsics that do not use formatters will directly point to 0, which is EOL.
  ...
  1, // For this intrinsic, the list starting at index 1 encodes a single
     // ImmArg Arg0 formatted as rounding mode.
};
As can be seen above, all of this data can be statically initialized, and then getArgPrinters and getArgParsers can decode the list and fill any holes with nullptr.
-
AsmWriter will have a top-level function to print formatted args, which will handle both argument names as well as formatted immediate arguments. This will be called from the handling of CallInst in printInstruction if the call being printed is an intrinsic call that uses pretty printing. This function will then get argument names and immediate argument printers for that intrinsic and pretty print the arguments. Since the printer can fail, the function will, for each arg, call the printer on a temporary raw_string_ostream and then commit that only if the printing succeeds, so that the individual print functions need not have that logic (for example, when an ImmArg is a composite one with packed bitfields). This function could look like:
void AsmWriter::PrintFormattedArgs(raw_ostream &OS, const IntrinsicInst &I) {
  Intrinsic::ID ID = I.getIntrinsicID();
  assert(Intrinsic::isFormatted(ID));
  const AttributeList &PAL = I.getAttributes();
  SmallVector<const char *> ArgNames;
  SmallVector<void *> ImmArgPrinters;
  Intrinsic::getArgumentNames(ID, ArgNames);     // null if no arg name.
  Intrinsic::getArgPrinters(ID, ImmArgPrinters); // null if not formatted.
  ListSeparator LS;
  for (unsigned Op = 0, EOp = I.arg_size(); Op < EOp; ++Op) {
    OS << LS;
    const char *Name = Op < ArgNames.size() ? ArgNames[Op] : nullptr;
    Value *Arg = I.getArgOperand(Op);
    // Print to a temporary string in case the formatting fails.
    std::string Formatted;
    if (Op < ImmArgPrinters.size() && ImmArgPrinters[Op]) {
      raw_string_ostream SS(Formatted);
      ConstantInt *ImmArg = cast<ConstantInt>(Arg);
      // Printers return false on failure (see the generated declarations).
      bool Succeeded = false;
      switch (ImmArg->getBitWidth()) {
      case 1: {
        using printer_ty = bool (*)(raw_ostream &, bool);
        auto Printer = reinterpret_cast<printer_ty>(ImmArgPrinters[Op]);
        Succeeded = Printer(SS, !ImmArg->isZero());
        break;
      }
      case 8: {
        using printer_ty = bool (*)(raw_ostream &, uint8_t);
        auto Printer = reinterpret_cast<printer_ty>(ImmArgPrinters[Op]);
        Succeeded = Printer(SS, static_cast<uint8_t>(ImmArg->getZExtValue()));
        break;
      }
      // 16, 32, and 64-bit immediates are handled similarly.
      } // end switch
      // If formatted immarg printing failed, drop any partial output and
      // fall back to raw/default printing only.
      if (!Succeeded)
        Formatted.clear();
    }
    // Emit "/*name=.value*/" (either part may be absent), followed by the
    // raw operand, matching the form shown in the summary.
    if (Name || !Formatted.empty()) {
      OS << "/*";
      if (Name)
        OS << Name << "=";
      OS << Formatted << "*/";
    }
    writeParamOperand(Arg, PAL.getParamAttrs(Op));
  }
} // end AsmWriter::PrintFormattedArgs.
-
For LLVM IR tweaking by hand, we also want the parser to be able to parse formatted immediate arg values. Currently, each intrinsic argument has the following syntax: <type> <attributes> <value>, where <type> is an LLVM type. LLVM types never start with a . (see LLParser::parseType), so we can use the presence of a "." as our cue to infer that the value is a formatted immediate arg and use the intrinsic's parser function for that arg to interpret the value. Currently, LLVM's lexer does not recognize something like .ftz.rtz as a valid token, so we'd need to extend the lexer to recognize this as a new string valued token, say FormattedImmArgVal, similar to MetadataVar or GlobalVar which are ! and @ prefixed names. Assuming this, LLParser::parseParameterList can be factored into 2 functions, one outer level driver and one to parse a single parameter. This will enable LLParser::parseCall to check if the next token is a FormattedImmArgVal and if so use the intrinsic's immarg parser for the current arg to parse it. That would need a mapping from the CalleeID that was parsed to Intrinsic::ID, but if we do it lazily, we will incur that cost only if we encounter this syntax. And if we do not encounter the FormattedImmArgVal token, the code will call the function to parse a single "regular" parameter.
-
For the parser support, one question is what happens if we are unable to parse the formatted immediate arg. One option is to fail the entire parsing (for ex, if we are parsing an unknown intrinsic, or the associated immarg is not formatted, or pasing fails). This seems ok if we expect the parser to be only able to parse the âcurrentâ formatting. However, if we want to support intrinsic upgrade in the presence of the formatted syntax, we somehow need to capture the parsed value and then let the intrinsic upgrade take care of upgrading it. For unparsed immarg values, we will append a
$arg<N><FormattedImmArgVal>
string to the name of the intrinsic and establish that as a handshake between the parser and intrinsic auto-upgrade. As an example, if in the following input, if arg2 fails to parse (.what
), the parserâs output will have a function as below:; Input .ll assembly: call void @llvm.target.foo(/*edge_x=*/.modeX, /*edge_y=*/.modeX, /*edge_z=*/.what, i32 %x) ; Parsed LLVM IR: call void @llvm.target.foo$arg2.what(/*edge_x=*/i32 0, /*edge_y=*/i32 0, i32 %x)
The expectation is that the intrinsic auto upgrader can then decode the
$arg2.what
in the intrinsc name and map it to the appropriate value. Whether to incur this additional complexity of supporting upgrading of this syntax depends on use cases. If folks want to keep around .ll files with the formatted imm arg syntax exposed to the parser and have it continue to work, this is required, else its optional.
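The lexing rule and the name handshake described above can be sketched in standalone C++. This is only an illustration under stated assumptions, not the proposed LLVM implementation: the helper names (lexFormattedImmArgVal, appendUnparsedImmArg, splitUnparsedImmArg) are hypothetical, the token grammar (one or more groups of '.' followed by letters, digits, or underscores) is a guess at what something like .ftz.rtz would need, and only a single unparsed argument per intrinsic name is handled.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the length of a formatted immediate token such
// as ".ftz.rtz" at the start of Input, or 0 if Input does not start with one.
// Assumed grammar: one or more groups of '.' followed by [A-Za-z0-9_]+.
std::size_t lexFormattedImmArgVal(std::string_view Input) {
  std::size_t Pos = 0;
  while (Pos < Input.size() && Input[Pos] == '.') {
    std::size_t Start = Pos + 1, End = Start;
    while (End < Input.size() &&
           (std::isalnum(static_cast<unsigned char>(Input[End])) ||
            Input[End] == '_'))
      ++End;
    if (End == Start)
      break; // A '.' with no name after it is not part of the token.
    Pos = End;
  }
  return Pos;
}

// Pieces recovered from a name such as "llvm.target.foo$arg2.what".
struct UnparsedImmArg {
  std::string BaseName; // "llvm.target.foo"
  unsigned ArgIndex;    // 2
  std::string Text;     // ".what"
};

// Parser side of the handshake: append "$arg<N><FormattedImmArgVal>".
std::string appendUnparsedImmArg(std::string_view Name, unsigned ArgIndex,
                                 std::string_view Text) {
  std::string Result(Name);
  Result += "$arg" + std::to_string(ArgIndex);
  Result += Text;
  return Result;
}

// Auto-upgrade side: split the suffix back out, or return nullopt if the name
// carries no well-formed suffix.
std::optional<UnparsedImmArg> splitUnparsedImmArg(std::string_view Name) {
  constexpr std::string_view Marker = "$arg";
  std::size_t MarkerPos = Name.find(Marker);
  if (MarkerPos == std::string_view::npos)
    return std::nullopt;
  std::size_t Pos = MarkerPos + Marker.size();
  std::size_t DigitsStart = Pos;
  unsigned Index = 0;
  while (Pos < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[Pos]))) {
    Index = Index * 10 + static_cast<unsigned>(Name[Pos] - '0');
    ++Pos;
  }
  if (Pos == DigitsStart)
    return std::nullopt; // "$arg" without an argument index.
  std::string_view Text = Name.substr(Pos);
  if (Text.empty() || lexFormattedImmArgVal(Text) != Text.size())
    return std::nullopt; // The remainder is not a formatted value.
  return UnparsedImmArg{std::string(Name.substr(0, MarkerPos)), Index,
                        std::string(Text)};
}
```

Because '$' does not otherwise appear in intrinsic names, searching for the first "$arg" is enough in this sketch. A real implementation would also have to decide how several unparsed arguments in one call are encoded.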
This concludes proposal (3).
Discussion/future ideas
- With this approach, backends can more readily choose to pack different configuration bits into a single immediate argument, assuming the appropriate formatting support is added as well (i.e., LLVM IR readability concerns can be addressed using formatted immediate arguments). C++ code that handles the printing and parsing of packed immediate arguments will likely use structs and unions to codify the packing, and we can establish a convention for where they go. Each target can have its own intrinsics .h and .cpp file to host any intrinsic-specific code for that target (in llvm/include/llvm/IR and llvm/lib/IR). The .h file will define structs and declare any helper functions for dealing with intrinsics, and that code can be expected to be in the target's namespace. So instead of the IntrinsicFormatNVVM.cpp suggested earlier, we will have just IntrinsicsNVVM.h for any struct/union and helper function declarations, and IntrinsicsNVVM.cpp, which will define these helper functions as well as the print/parse functions for any formatted immargs used by NVVM intrinsics.
- Extending immediate args to support automatic generation of printing and parsing code: the proposal above supports "manual" printing/parsing of immediate arguments, where the code to print and parse is written separately in C++. This mode gives backend developers complete flexibility in how their immediate arguments are printed, so the "manual mode" is a must-have. However, in several common cases, the code to print and parse immargs might be a simple per-immarg mapping. To support that, we could potentially extend the intrinsic support in TableGen to auto-generate such printing and parsing code. As a very basic example, for each immediate arg, we may be able to specify a simple enumeration as follows:

      // Immediate arg enums generate the following code:
      //   enum {class?} EnumName : uint<NumBits>_t {
      //     EnumValueNames[0] = EnumValues[0],
      //     EnumValueNames[1] = EnumValues[1],
      //   };
      //
      //   print<EnumName>(raw_ostream &OS, uint<NumBits>_t Value);
      //   parse<EnumName>(..., ConstantInt *RetVal...); // RetVal will be of type uint<NumBits>_t.
      class ImmArgEnum {
        list<string> EnumValueNames;
        list<int> EnumValues;
        int NumBits;            // Number of bits to use for this enum.
        bit IsClass;            // Generate a regular enum or an enum class.
        string EnumName;        // Name of the enum generated in the code.
        ImmArgFormat Fmt = ...; // An ImmArgFormat for this enum.
      }

      // For i1 types, prints TrueVal or FalseVal based on the 0/1 value of the i1.
      class ImmArgBool {
        string TrueVal;
        string FalseVal;
      }

      def RoundingMode : ImmArgEnum {
        let EnumValueNames = ["rne", "rtz", "rup", "rdn"];
        let EnumValues = [0, 1, 2, 3]; // Could be auto assigned if not specified.
        let NumBits = 8;
        let EnumName = "RoundingMode"; // May be auto assigned based on the record name.
      }

      def FtzMode : ImmArgBool {
        let TrueVal = "ftz";
        let FalseVal = "noftz";
      }

      // Specify that arg0 uses the rounding mode enum printer.
      ImmArg<ArgIndex<0>, RoundingMode.Fmt>
      // Maybe there is a way to make this simpler, as follows:
      ImmEnumArg<ArgIndex<0>, RoundingMode>
      ImmBoolArg<ArgIndex<2>, FtzMode>

  Note that we are not proposing this as part of this RFC, but it could be considered in the future.
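To make the two ideas above concrete, here is a minimal standalone C++ sketch of what hand-written (or generated) code for a packed immediate could look like. Everything in it is an assumption for illustration: the bit layout (bits [1:0] hold the rounding mode, bit 2 holds ftz), the function names packFpMode, formatFpMode, and parseFpMode, and the use of std::string instead of raw_ostream and ConstantInt so that the example compiles without LLVM.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical enum, in the shape the RoundingMode record might generate.
enum class RoundingMode : uint8_t { rne = 0, rtz = 1, rup = 2, rdn = 3 };

// Printed names, indexed by enum value.
constexpr std::string_view RoundingModeNames[] = {"rne", "rtz", "rup", "rdn"};

// Assumed packing: bits [1:0] = rounding mode, bit 2 = ftz.
constexpr uint8_t packFpMode(RoundingMode R, bool Ftz) {
  return static_cast<uint8_t>(static_cast<uint8_t>(R) | (Ftz ? 4 : 0));
}

// Formats a packed value as, for example, ".ftz.rtz". Returns nullopt when
// unknown bits are set, so a caller can fall back to raw printing.
std::optional<std::string> formatFpMode(uint8_t Value) {
  if (Value > 7)
    return std::nullopt;
  std::string Out = (Value & 4) ? ".ftz" : ".noftz";
  Out += '.';
  Out += RoundingModeNames[Value & 3];
  return Out;
}

// Removes Prefix from the front of Text if present.
static bool consumePrefix(std::string_view &Text, std::string_view Prefix) {
  if (Text.substr(0, Prefix.size()) != Prefix)
    return false;
  Text.remove_prefix(Prefix.size());
  return true;
}

// Parses the text produced by formatFpMode back into the packed value.
// Returns nullopt for anything it does not recognize.
std::optional<uint8_t> parseFpMode(std::string_view Text) {
  uint8_t Value = 0;
  if (consumePrefix(Text, ".ftz"))
    Value |= 4;
  else if (!consumePrefix(Text, ".noftz"))
    return std::nullopt;
  if (!consumePrefix(Text, "."))
    return std::nullopt;
  for (uint8_t I = 0; I < 4; ++I)
    if (Text == RoundingModeNames[I])
      return static_cast<uint8_t>(Value | I);
  return std::nullopt;
}
```

With this layout, packFpMode(RoundingMode::rtz, true) yields 5, which formats as .ftz.rtz and parses back to 5. Someone editing the IR only needs to know the printed spelling, not the bit layout.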
Staging
Implementation of this RFC will need to happen in stages. We propose the following staging:
- Implement proposal (1) (C style comments). There is already a draft PR for this: [LLVM][AsmParser] Add support for C style comments by jurahul · Pull Request #111554 · llvm/llvm-project · GitHub.
- Prep #0: [NFC] Introduce AsmPrintOptions with a single IsForDebug field in it.
- Prep #1: [NFC] Rename gen-intrinsic-enums to gen-intrinsic-decls and IntrinsicEnums.inc to IntrinsicDecs.inc.
- Add support for intrinsic arg names. Can be split into 2 parts:
  - Add Intrinsics.td and intrinsic emitter support and the Intrinsic::getArgNames function, with test intrinsics for e2e testing and unit tests that query arg names and verify them.
  - Add AsmWriter support for argument names (add a new flag to AsmPrintOptions and print names, extend llvm-dis to accept the option -print-formatted-intrinsics, and add LLVM LIT tests for the formatted intrinsic syntax with arg names).
- Add support for formatted immediate args. Can be split into 3 parts:
  - Add Intrinsics.td and intrinsic emitter support and Intrinsic::getArgPrinters/getArgParsers, with unit tests that query the printers and parsers and verify printing and parsing (the unit tests will query and call these functions directly, so there is no hookup in the Asm printer or parser yet).
  - Add Asm printer support.
  - Add Asm parser support (no intrinsic upgrade support).
- Add intrinsic upgrade support if desired. Again, this needs e2e testing of some form.
- Add support for stripping this feature out at build time.
- Adopt these features for some existing intrinsics (maybe a few NVVM intrinsics), migrate existing unit tests to them, and drop the test intrinsics that were added for e2e testing.