[RFC] Compress Intrinsic Name Table

This RFC proposes changes to the intrinsic emitter and LLVM Core to reduce the size of the intrinsic name table. The scope is reasonably small, but I think the proposed IntrinsicName class below warrants review before we implement this (as does the question of whether this is worth doing in the first place).

Background and Motivation

LLVM’s TableGen intrinsic emitter backend generates a table of intrinsic names that is #included into Intrinsics.cpp. It is used to get the base name of an intrinsic given its Intrinsic::ID and to look up an intrinsic by name (i.e., the name → Intrinsic::ID mapping). LLVM intrinsic names follow a naming convention:

   intrinsic_name ::= "llvm" ['.' target_prefix] '.' suffix

The intrinsic name table is an array of C strings, sorted by Intrinsic::ID and containing the base name of each intrinsic. The only exception is intrinsic #0, which is the invalid intrinsic and whose name is defined to be "not_intrinsic". (For overloaded intrinsics, the base name is the name without the mangling suffix; for example, the base name of llvm.memcpy.p0.p0.i64 is llvm.memcpy. For non-overloaded intrinsics, the name is the same as the base name.)

This name table is currently ~385 KB of static data. The motivation for this RFC is that the name table contains redundant data: the llvm. prefix in every string, and the target prefix in every intrinsic name for a given target. If we can eliminate these fixed, known pieces from the table, the name table can potentially be reduced to ~232 KB, a ~40% reduction in size.
Additionally, once the llvm.target. prefix is stripped, there could be further opportunities to compress the name table by using either the StringToOffsetTable or the SequenceToOffsetTable class to emit the string table (which deduplicate identical strings and share strings with a common suffix, respectively).
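
To make the redundancy concrete, here is an illustrative before/after of a few table entries (the entries shown are made up for illustration; the real table is generated and much larger):

  // Illustrative only: the current table stores full base names per Intrinsic::ID.
  static const char *const IntrinsicNameTable[] = {
      "not_intrinsic",        // ID 0
      "llvm.abs",             // target independent
      "llvm.x86.sse2.pause",  // target prefix "x86"
  };
  // With the fixed "llvm." and per-target prefixes stripped, only the
  // per-intrinsic suffix remains in static data:
  static const char *const IntrinsicSuffixTable[] = {
      "not_intrinsic",  // see the open question below
      "abs",
      "sse2.pause",
  };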

Proposal

The proposal is essentially to do the above: the emitted intrinsic name table will contain just the suffix portion of each name. On the LLVM Core side, this entails changes to the functions that consume the IntrinsicNameTable.

  1. For lookupIntrinsicID and lookupLLVMIntrinsicByName, we already skip over the llvm.target prefix, so it should be straightforward to adopt this change in those functions (by first stripping llvm.target. from Name).
  2. The Intrinsic::getName and Intrinsic::getBaseName functions return a StringRef that points to one of the entries in the IntrinsicNameTable, so that would need to change. One option is to change them to return a std::string, but that has the runtime overhead of materializing the entire name of the intrinsic every time the function is called. The other option is to defer that materialization until needed by adding a new Twine-like class that represents the intrinsic name via its components (we cannot use Twine itself here, as explained in the comment in Twine.h, since it holds pointers to temporary stack objects).
   class IntrinsicName {
     const StringRef TargetPrefix;
     const StringRef Suffix; // First '.' stripped out.

   public:
     // Allow implicit conversion to std::string.
     operator std::string() const {
       if (TargetPrefix.empty())
         return ("llvm." + Suffix).str();
       return ("llvm." + TargetPrefix + "." + Suffix).str();
     }
   };
   raw_ostream &operator<<(raw_ostream &OS, const IntrinsicName &);

Both getBaseName() and getName() will now return an IntrinsicName instead of a StringRef. They will look up the Suffix by indexing into the IntrinsicNameTable array, and the TargetPrefix with a binary search in the TargetInfos table, which is also generated by the intrinsic emitter and captures, per target, the range of Intrinsic::IDs that belong to that target (with TargetPrefix as one of the fields in each table entry).

That means the cost of getBaseName will increase from O(1) to O(log(T)) where T is the number of targets, which is generally small (currently 20).
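
A minimal sketch of that lookup; the entry type, field names, and helper here are illustrative assumptions, not the actual generated table layout:

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/StringRef.h"
  using namespace llvm;

  struct TargetInfoEntry {
    StringRef Prefix;  // "" for target-independent intrinsics
    unsigned FirstID;  // first Intrinsic::ID belonging to this target
  };

  // O(log T) lookup of the target prefix for an intrinsic ID. Assumes the
  // entries are sorted by FirstID and that the first entry starts at ID 0.
  static StringRef lookupTargetPrefix(ArrayRef<TargetInfoEntry> TargetInfos,
                                      unsigned ID) {
    const auto *It = upper_bound(TargetInfos, ID,
                                 [](unsigned Id, const TargetInfoEntry &E) {
                                   return Id < E.FirstID;
                                 });
    return (It - 1)->Prefix; // last entry with FirstID <= ID
  }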

The automatic std::string conversion is the deferred materialization that will kick in for code like:

  M.getFunction(Intrinsic::getName(Intrinsic::type_test))

The << operator will help avoid materializing the name for code like:

OS << "intrinsic(@" << Intrinsic::getBaseName(ID) << ')';

This means that the change should mostly be limited to the Intrinsics.cpp and Intrinsics.h files. If need be, additional functions can be added to this class (for example, equality and other comparisons with StringRef).

Open question

Handling of “not_intrinsic”: the name of the invalid intrinsic does not fit the naming convention assumed above. To accommodate it, we can add a bool to IntrinsicName that encodes "not_intrinsic" and preserves the current behavior. However, the name "not_intrinsic" does not seem to be used anywhere, so under the new scheme another option is to change it to "llvm.not_intrinsic", but that would change the behavior of code like:

auto *F = M.getFunction(Intrinsic::getName(Intrinsic::not_intrinsic));
errs() << F->isIntrinsic() << '\n';

With the current code, this prints 0, since the name returned does not have a “llvm.” prefix, so it’s not considered an intrinsic. With the proposed change, it will be considered an intrinsic. Note that in both cases F->getIntrinsicID() will still return Intrinsic::not_intrinsic.

Staging

If the proposal is accepted, we can stage the implementation as follows.

  1. [Prep]: Address the open question by changing the current “not_intrinsic” to “llvm.not_intrinsic” and see if that works. If not, we need another bool in IntrinsicName to encode the "not_intrinsic" string.
  2. Implement this proposal as is.
  3. Add support for additional opportunistic compression, so that identical strings are deduplicated (using StringToOffsetTable) and a string that is a suffix of another is shared (using SequenceToOffsetTable). We will experiment with both and decide (use either one of them, or neither if it does not provide additional benefit). This should not need any changes to LLVM Core.

Have you looked at the possibility of maintaining a table per target?

I like the goal here.

Two observations. First of all, I really don’t like the introduction of an IntrinsicName class just for this. It’s clunky. Is it really necessary for performance?

Skimming over calls to Intrinsic::getName, it looks like pretty much all of them are either directly converted to a std::string anyway or are fed into Module::getFunction, which expects a StringRef. Since you’d have to add a new overload to getFunction anyway, it seems wiser to instead add a Module::getIntrinsic(IntrinsicID) method. Any optimizations to that method can then happen under the hood.

Second, the C ABI has an LLVMIntrinsicGetName function that returns a pointer to a string, i.e. it expects a static allocation owned by LLVM. What I would support as an immediate change is to introduce an LLVMIntrinsicCopyName function analogous to LLVMIntrinsicCopyOverloadedName, and to deprecate LLVMIntrinsicGetName. I suppose you can still make progress by changing the implementation of LLVMIntrinsicGetName to create static allocations of concatenated intrinsic names on demand (with proper locking to support the multi-threaded case).
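
For reference, the suggested addition might look roughly like the declaration below; the function name comes from the comment above, but the exact signature and ownership convention are assumptions, modeled on LLVMIntrinsicCopyOverloadedName without the overload parameters:

// Hypothetical declaration: returns a heap-allocated copy of the intrinsic
// name owned by the caller, instead of a pointer into LLVM-owned static data
// as LLVMIntrinsicGetName returns today.
char *LLVMIntrinsicCopyName(unsigned ID, size_t *NameLength);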

I was thinking about this RFC yesterday, and I just couldn’t convince myself that this is worth the effort. We’re trying to save ~150kB of read-only memory by introducing compile-time cost. All in all, personally I’d be more inclined to use 150kB more to simplify some code.

What do you think we are gaining here?

@nhaehnle Thanks, so I guess you are suggesting to just have getBaseName()/getName() return std::string? That may be ok; I don’t know if this affects compile time in a significant manner. I did think about the Module::getFunction overload that takes an intrinsic ID, but I am not sure what optimization I’d do inside it, so we may consider that overload just for convenience.

@madhur13490 are you suggesting having only the intrinsics from enabled backends (as defined by LLVM_TARGETS_TO_BUILD) present in the table? No, I did not consider that, given that its scope would be much broader. Also, it seems a past attempt at doing something like this ([RFC] Stripping unusable intrinsics - Project Infrastructure / LLVM Dev List Archives - LLVM Discussion Forums) did not go anywhere beyond a POC.

@kparzysz the gains are really in size. The size of LLVM binaries is a legitimate concern for several use cases. But yes, whether this ~150 KB reduction is worth it was a doubt in my mind as well, hence this thread.

I do have one immediate thing I can do that will help here, which is to change the IntrinsicNameTable from an array of pointers to an array of offsets into a single string (generated with StringToOffsetTable). That saves ~7 KB and is essentially free, compile-time wise.
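
To illustrate the shape of that change (reusing the made-up entries from earlier; StringToOffsetTable would emit the real, much larger table), the pointer array becomes a single character blob plus an integer offset per intrinsic, which also drops one relocation per entry:

// Before: one pointer (plus a relocation) per intrinsic.
static const char *const IntrinsicNameTable[] = {
    "not_intrinsic", "llvm.abs", "llvm.x86.sse2.pause"};

// After: a single string blob plus a small offset per intrinsic.
static const char IntrinsicNameStrings[] =
    "not_intrinsic\0llvm.abs\0llvm.x86.sse2.pause";
static const unsigned IntrinsicNameOffsets[] = {0, 14, 23};

// Name of intrinsic ID:
//   StringRef(IntrinsicNameStrings + IntrinsicNameOffsets[ID])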

One other thing I realized is that the intrinsic emitter generated functions getIntrinsicForClangBuiltin and getIntrinsicForMSBuiltin are linked into LLVM Core, but for a lot of LLVM deployments they might be unused (for example, when LLVM is linked into a GPU driver/compiler that does not use Clang as a frontend). I’d assume that with an LTO-enabled build these should get stripped out in any case (so no need to do anything more here)?

About the RFC though, it looks like there may be some more prep work needed for the C API, but are we otherwise ok to pursue this? I can start by just changing getName()/getBaseName() to return std::string along with the associated C API changes (add the copy variant, mark the existing one deprecated, and provide an implementation that uses ConcurrentHashTable to store the strings in a thread-safe manner), and then do the changes described here.

I have a draft PR for the first part (changing getName to return std::string); I will start the review once checks complete. Additionally, we can add the convenience function as well, as a replacement for M.getFunction(Intrinsic::getName(Intrinsic::type_test)). There already is an Intrinsic::getDeclaration that gets or creates an intrinsic declaration, so I am planning to rename it to Intrinsic::getOrCreateDeclaration (a large but mostly mechanical change) and then add an Intrinsic::getDeclaration that does just a lookup, with no creation.

It would be best to change uses of getFunction(Intrinsic::getName) first, so we don’t unnecessarily eat the cost of creating the std::string temporarily.

I committed one change related to this: [NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration by jurahul · Pull Request #111752 · llvm/llvm-project (github.com), and I have seen multiple comments about whether we should revert it, as it caused several out-of-tree codebases to break. My understanding was that LLVM’s C++ API is susceptible to such changes and that there are no stability guarantees for the C++ API. I also did not preserve the old getDeclaration, as my intent was to repurpose it immediately with different semantics, consistent with Module::getFunction.

But based on the comments on the PR, I wanted to get some consensus on the next steps before I proceed further here. It seems the choices are:

  1. Revert the PR, restore getDeclaration, and then add a new getDeclarationIfExists (suggested by @nikic as an option). Long term, we would have similar APIs in Module:: and Intrinsic:: but with inconsistent naming. At this point, though, at least some out-of-tree folks have already migrated to the new API.
  2. Go ahead with adding the new getDeclaration with semantics similar to Module::getFunction. Out-of-tree code that has not yet migrated to getOrInsertDeclaration will compile but potentially fail at runtime.
  3. Add a staging API findDeclaration, just so that out-of-tree uses of getDeclaration continue to fail to compile for a longer time, and at some point down the road (4 weeks?) rename it to getDeclaration (assuming little or no out-of-tree adoption of findDeclaration happens in that timeframe).

It was also suggested in the PR comments that deprecating and repurposing getDeclaration should happen across LLVM releases (i.e., over a ~6 month period), which does not seem consistent with the C++ API stability guarantees that LLVM offers (see Explicitly spelling out the lack of stability for the C++ API in the Developer Policy? - Project Infrastructure / LLVM Dev List Archives - LLVM Discussion Forums, though I don’t think anything got documented eventually).

Can folks (@nikic, @nhaehnle, @mehdi_amini) confirm whether (3) above seems reasonable: friendly to out-of-tree code, while still giving a reasonable timeframe to migrate to the new name?

Thanks
Rahul

The C++ API does not have stability guarantees. At the same time, if you are performing large scale changes, it is courteous to make them in a way that reduces the amount of inconvenience caused to other developers and users of LLVM.

If your PR needs to change 150 files, that’s a good indication that you should take additional care on how to phase in your change.


The reason why I would prefer not to reuse the same name with completely different (opposite!) semantics in the same release is that many users of LLVM do not have the luxury of following tip-of-tree development. They will only update to the next version when it is released. Changing the semantics of a method in this way seems like an unnecessary discourtesy to me. Even long term, the change will be confusing when looking at code targeting different LLVM versions.

I don’t think there is anything bad about an end state where getDeclaration has been replaced with getOrInsertDeclaration and getDeclarationIfExists. Arguably, it’s a better outcome because the intent is more explicit, and no implicit assumptions are made by the API.

Thanks. So I guess what you are suggesting is a 4th option: re-add getDeclaration() (which ideally should have been part of the rename PR) to help out-of-tree code, mark it deprecated and slated for deletion in the next LLVM release (i.e., it can be deleted in the LLVM upstream repo as soon as the next release branch is cut), name the new function getDeclarationIfExists, and then delete getDeclaration once that next release branch is created. Does that sound right?

That sounds reasonable to me.

Thanks. I will start working on the next steps based on this.

Related: is there a way to tag stuff we want to delete in the next release? Like a keyword in a comment or something like that in the code? I can always file an issue and keep it assigned to me, I guess, but I can imagine something like this easily falling through the cracks.

LLVM_DEPRECATED

Thanks. I did use that when I added back getDeclaration(). I guess after each release branch is created (and stabilized) there is some effort to remove all functions marked deprecated (in the main branch). I’ll create an issue to keep track of removing it.
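
For completeness, the re-added compatibility shim looks roughly like this (a sketch; the exact signature and deprecation message in the tree may differ slightly):

// Deprecated shim kept only until the next release branch is cut.
LLVM_DEPRECATED("Use getOrInsertDeclaration instead", "getOrInsertDeclaration")
Function *getDeclaration(Module *M, ID id, ArrayRef<Type *> Tys = {});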

While I am making progress on the prep work for this, multiple folks on this thread and on the review have suggested that instead of this we should pursue stripping out intrinsics for targets that have not been enabled. I looked at the earlier attempt at this in [RFC] Stripping unusable intrinsics - Project Infrastructure / LLVM Dev List Archives - LLVM Discussion Forums, where the discussion seemed to end without any conclusion or action.

After reading through that thread, it seems there are 2 sources of bloat caused by the presence of unused intrinsics in the code:

  1. Intrinsic emitter generated code and data to support these intrinsics. Generally, its size is proportional to the number of intrinsics.
  2. Common code that refers to target intrinsics by their Intrinsic::ID enum.

I am wondering if it’s worth pursuing something that is smaller in scope, but can address (1) and potentially/partially (2). My idea was:

  1. We will always generate all intrinsic enums, for both enabled and disabled targets, just like we do today. However, we will sort them such that intrinsics for enabled targets come before those for disabled targets. This will partition the intrinsic ID enums into an active and an inactive set, as follows:
0, // not_intrinsic
<target independent IDs>,
<enabled target 0 IDs>,
<enabled target 1 IDs>,
...
<enabled target N IDs>,
num_active_intrinsics = ...,
<disabled target 0 IDs>,
<disabled target 1 IDs>,
...
<disabled target N IDs>,
num_intrinsics = ...,

This will keep code that refers to the intrinsic ID enums compiling.

  2. As far as LLVM Core is concerned, any intrinsic ID >= num_active_intrinsics will be considered an unknown intrinsic. That means any intrinsic code or data generated by the emitter will only handle the first num_active_intrinsics intrinsic IDs. The rest will be treated similarly to not_intrinsic. For example, Intrinsic::getName might return a pre-canned “unknown_intrinsic” string for such intrinsics, and creating an intrinsic using one of these IDs will instead create an unknown intrinsic with ID = 0. For example, the code below:
Function *F = Intrinsic::getOrInsertDeclaration(M, Intrinsic::x86_int);
assert(F->getIntrinsicID() == Intrinsic::x86_int);

will fail, since getIntrinsicID will return 0 when the X86 target is not enabled.

  3. Most (if not all) common code refers to target-dependent IDs in switch statements over Function::getIntrinsicID(). This function is already inlined. We will add a __builtin_assume(IntID < Intrinsic::num_active_intrinsics) to hint to the compiler that the ID it returns is within the active range, as in the sketch below. With that, clang might be able to optimize out the cases in common-code switches that refer to inactive intrinsics.
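
A sketch of that hint (the accessor body is simplified, and num_active_intrinsics is the hypothetical marker from the enum layout above):

// Sketch only: the real accessor is the inlined member in Function.h.
Intrinsic::ID Function::getIntrinsicID() const {
  // Inactive (disabled-target) IDs can never be produced at runtime, so switch
  // cases over the result that mention them become dead code.
  __builtin_assume(IntID < Intrinsic::num_active_intrinsics);
  return IntID;
}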

This will change the existing behavior of code when dealing with target intrinsics that are not enabled, and we will likely need to fix and update target-independent tests that use such target intrinsics, but I wanted to check whether this is an approach worth considering.

@madhur13490 @RKSimon @nhaehnle

Changing the order of intrinsic enums would create ABI incompatibility between LLVM versions with different sets of enabled targets, and I’d prefer to avoid that. I would leave the intrinsic numbering alone.

What you suggested about compiling out the intrinsic id recognition logic for disabled targets, so they always receive id 0, not_intrinsic, seems reasonable.

Is that a common use case? If we want stable enum values and still want to trim the rest of the data for inactive intrinsics, we will need an Intrinsic::ID → compressed ID map (say, a 16-bit table) and use that indirection. But that can all be contained within the Intrinsics.cpp file. However, under this model, if you have one LLVM component with targets A, B, C enabled and another LLVM component with targets X, Y, Z enabled, only intrinsics that are active in both components (i.e., target-independent ones or those in the intersection of the enabled targets of the 2 components) will function correctly (that is, stable enum values alone will not make them functional across the 2 components). Will that be ok?

Another way to achieve enum stability is to stash a target ID (say 8 bits) and an intrinsic ID within the target (say 16 bits) into the Intrinsic::ID enum value. That will make the compressed ID computable without needing an extra table and might have other advantages.
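
A sketch of that bit-stashing idea with the widths above (8-bit target index, 16-bit ID within the target); the names and exact layout are illustrative assumptions:

// Intrinsic::ID value = [ 8-bit target index | 16-bit ID within the target ].
constexpr unsigned PerTargetIDBits = 16;

constexpr unsigned makeIntrinsicID(unsigned TargetIdx, unsigned IDInTarget) {
  return (TargetIdx << PerTargetIDBits) | IDInTarget;
}
// Both components are recoverable without any lookup table:
constexpr unsigned targetIndex(unsigned IID) { return IID >> PerTargetIDBits; }
constexpr unsigned idInTarget(unsigned IID) {
  return IID & ((1u << PerTargetIDBits) - 1);
}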

This sounds like it might break the upgrade path for LTO. It is expected that new LTO dylibs are able to link old bitcode, and if you were to add a new target in a later release, re-numbering these because of that would break this workflow if it happened arbitrarily. If these were preserved for the intersection of enabled backends, I think it might be okay?

Intrinsic::ID enums are not encoded in the bitcode file, so I am not sure why it would break. Ignoring the new target, a later release may add new intrinsics to existing targets, which also renumbers all intrinsic IDs, and I assume that does not break the LTO flow. It seems to me that Intrinsic::ID enums are only required to be consistent within an LLVM build. Even the C API does not expose them directly; you need to query the ID of an intrinsic using its name first (or via an existing Function).