Accelerating TLI getLibFunc lookups

TLDR: Figuring out whether a declaration is a TLI LibFunc is slow. We hammer that path in CGP. I’m proposing storing the ID of a TLI LibFunc in the same IntID field in Function we use for IntrinsicID to make that fast.

Looking into a compile time issue during codegen (LLC) for a large IR file, I came across something interesting. Due to the presence of a very large number of intrinsics in the particular example, we were spending almost 30% of time in CodeGenPrep::optimizeCallInst, and within that, almost all of it in the FortifiedLibCallSimplifier. Now, since the IR file in question has no fortified libcalls, that seemed a bit odd.

Looking into, it turns out that figuring out that an arbitrary direct call is not a call to a LibCall requires a full name normalization and table lookup that a successful one does. We could simply make the lookup itself faster - it looks like we could probably tablegen a near optimal character switch lookup table - but that still leaves us with the normalization. We could cache the lookup, but then we have an analysis invalidation problem for all users of TLI. Not unsolvable, but not fun if we have a better option.

Instead, I noticed that we have no overlap between intrinsics, and target library functions. Assuming we’re happy with that, and don’t see that changing in the future, that gives us an opportunity. We could cache the libfunc ID into the Function itself, just like we do for intrinsics today.

What would that look like in practice you ask?

  • We’d move the definition of LibFunc into include/IR/TargetLibraryFunctions.h/def (only the enum, not the rest of TLI)
  • We’d change IntID field in GlobalValue to be union of IntrinsicID and LibFunc.
  • We’d change Function::getIntrisicID to check the HasLLVMReservedName flag (already existing), and return Intrinsic::not_inrinsic value if not set.
  • We’d add a corresponding getLibFuncID, and isLibFunc function to Function.
  • We’d modify recalculateIntrinsicID to compute the libfunc enum as well.

The tradeoff is that function construction and renaming would become slightly slower, but determining whether a function was a library function would become fast. We could also populate the value lazily, but that seems like complexity with little benefit.

Thoughts? Objections? Better ideas?

If folks are on board with this, I’m happy to prepare a patch.

Philip

So if we know that an there are no intrinsic calls being simplified
by the FortifiedLibCallSimplifier, then I guess that we could
early out and return false inside the if

  IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
  if (II) {
     ...
  }

that is before the FortifiedLibCallSimplifier simplifications.

That would at least reduce the amount of time spend in the
FortifiedLibCallSimplifier trying to lookup intrinsics.

And if there are some intrinsics that should be dealt with by
the FortifiedLibCallSimplifier we could make sure we fallthrough
by adding some cases to the switch inside that if statement.

Such a solution is ofcourse not as general as the one you are
suggesting, but it might be a simple solution if the problem
is that we try to lookup intrinsic calls inside the
FortifiedLibCallSimplifier.

/Björn

From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of Philip Reames
via llvm-dev
Sent: den 24 april 2019 02:50
To: llvm-dev <llvm-dev@lists.llvm.org>
Subject: [llvm-dev] Accelerating TLI getLibFunc lookups

TLDR: Figuring out whether a declaration is a TLI LibFunc is slow. We
hammer that path in CGP. I'm proposing storing the ID of a TLI LibFunc in
the same IntID field in Function we use for IntrinsicID to make that fast.
Looking into a compile time issue during codegen (LLC) for a large IR file,
I came across something interesting. Due to the presence of a very large
number of intrinsics in the particular example, we were spending almost 30%
of time in CodeGenPrep::optimizeCallInst, and within that, almost all of it
in the FortifiedLibCallSimplifier. Now, since the IR file in question has
no fortified libcalls, that seemed a bit odd.
Looking into, it turns out that figuring out that an arbitrary direct call
is *not* a call to a LibCall requires a full name normalization and table
lookup that a successful one does. We could simply make the lookup itself
faster - it looks like we could probably tablegen a near optimal character
switch lookup table - but that still leaves us with the normalization. We
could cache the lookup, but then we have an analysis invalidation problem
for all users of TLI. Not unsolvable, but not fun if we have a better
option.
Instead, I noticed that we have no overlap between intrinsics, and target
library functions. Assuming we're happy with that, and don't see that
changing in the future, that gives us an opportunity. We could cache the
libfunc ID into the Function itself, just like we do for intrinsics today.
What would that look like in practice you ask?
• We'd move the definition of LibFunc into
include/IR/TargetLibraryFunctions.h/def (only the enum, not the rest of
TLI)
• We'd change IntID field in GlobalValue to be union of IntrinsicID and
LibFunc.
• We'd change Function::getIntrisicID to check the HasLLVMReservedName flag
(already existing), and return Intrinsic::not_inrinsic value if not set.
• We'd add a corresponding getLibFuncID, and isLibFunc function to
Function.
• We'd modify recalculateIntrinsicID to compute the libfunc enum as well.
The tradeoff is that function construction and renaming would become
slightly slower, but determining whether a function was a library function
would become fast. We could also populate the value lazily, but that seems
like complexity with little benefit.
Thoughts? Objections? Better ideas?
If folks are on board with this, I'm happy to prepare a patch.
Philip

So if we know that an there are no intrinsic calls being simplified
by the FortifiedLibCallSimplifier, then I guess that we could
early out and return false inside the if

  IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
  if (II) {
     ...
  }

that is before the FortifiedLibCallSimplifier simplifications.

That would at least reduce the amount of time spend in the
FortifiedLibCallSimplifier trying to lookup intrinsics.

And if there are some intrinsics that should be dealt with by
the FortifiedLibCallSimplifier we could make sure we fallthrough
by adding some cases to the switch inside that if statement.

Such a solution is ofcourse not as general as the one you are
suggesting, but it might be a simple solution if the problem
is that we try to lookup intrinsic calls inside the
FortifiedLibCallSimplifier.

Your framing does work for this particular use case. This is actually
how I figured out what was going on. But while CGP is the codepath
which showed up hot in this example, I'm sure there are others.
Filtering within getLibFunc is also a possibility, which is slightly
more general. But both variants leave open the same problem for a
module which is heavy on non-intrinsic non-libfunc calls. The advantage
of my proposed scheme is that all calls are treated equally.