[PATCH] strlen -> strnlen optimization

This addition converts strlen() calls to strnlen() when the result is
compared to a constant. For example, the following:

strlen(s) < 5

Becomes:

strnlen(s, 5) < 5

That way, we don't have to walk through the entire string. There is the
added overhead of maintaining a counter when using strnlen(), but I
thought I'd start with the general case. It may make sense to only use
this optimization with small constants. On the other hand, the impact of
the counter may be negligible in many or most cases due to
hardware-level optimizations.

I often notice the idiom of comparing a packet or buffer against a
minimum length as a sanity check. I was surprised to see that this isn't
optimized, so I thought I'd give it a try.

nlewycky on IRC seemed to think it was a good idea. I'm interested to
hear what others think.

I have little C++ experience, so there are likely improvements to be
made to my patch. I tried to adhere to the style and conventions of the
surrounding code.

This reintroduces emitStrNLen(), which was removed a couple months ago
in r253514. The only change is a getParent()->getParent() -->
getModule() conversion, which was done in emitStrLen() after
emitStrNLen() was removed.

This tests successfully for me on Ubuntu 14.04.3.

Thanks for your time,
Michael

Index: lib/Transforms/Utils/BuildLibCalls.cpp

Hi,

(llvm-dev to BCC)

Thanks for this!

Patches are supposed to be sent to llvm-commits and not llvm-dev (http://llvm.org/docs/DeveloperPolicy.html ).
Also it seems that your patch was inline in the email and not attached? I couldn't apply it for some reason.

First off, you're missing test cases for each ICMP possibility, and one for the wrapping case (see test/Transforms/InstCombine/simplify-libcalls.ll if you need examples).

Also, see other comments inline below.

This addition converts strlen() calls to strnlen() when the result is
compared to a constant. For example, the following:

strlen(s) < 5

Becomes:

strnlen(s, 5) < 5

That way, we don't have to walk through the entire string. There is the
added overhead of maintaining a counter when using strnlen(), but I
thought I'd start with the general case. It may make sense to only use
this optimization with small constants. On the other hand, the impact of
the counter may be negligible in many or most cases due to
hardware-level optimizations.

I often notice the idiom of comparing a packet or buffer against a
minimum length as a sanity check. I was surprised to see that this isn't
optimized, so I thought I'd give it a try.

nlewycky on IRC seemed to think it was a good idea. I'm interested to
hear what others think.

I have little C++ experience, so there are likely improvements to be
made to my patch. I tried to adhere to the style and conventions of the
surrounding code.

This reintroduces emitStrNLen(), which was removed a couple months ago
in r253514. The only change is a getParent()->getParent() -->
getModule() conversion, which was done in emitStrLen() after
emitStrNLen() was removed.

This tests successfully for me on Ubuntu 14.04.3.

Thanks for your time,
Michael

Index: lib/Transforms/Utils/BuildLibCalls.cpp

--- lib/Transforms/Utils/BuildLibCalls.cpp (revision 260011)
+++ lib/Transforms/Utils/BuildLibCalls.cpp (working copy)
@@ -52,6 +52,28 @@
  return CI;
}

+Value *llvm::emitStrNLen(Value *Ptr, Value *MaxLen, IRBuilder<> &B,
+ const DataLayout &DL, const TargetLibraryInfo *TLI) {

Interestingly it was removed from the implementation but not from the header...

+ if (!TLI->has(LibFunc::strlen))
+ return nullptr;
+
+ Module *M = B.GetInsertBlock()->getModule();
+ AttributeSet AS[2];
+ AS[0] = AttributeSet::get(M->getContext(), 1, Attribute::NoCapture);
+ Attribute::AttrKind AVs[2] = { Attribute::ReadOnly, Attribute::NoUnwind };
+ AS[1] = AttributeSet::get(M->getContext(), AttributeSet::FunctionIndex, AVs);
+
+ LLVMContext &Context = B.GetInsertBlock()->getContext();
+ Constant *StrNLen = M->getOrInsertFunction(
+ "strnlen", AttributeSet::get(M->getContext(), AS),
+ DL.getIntPtrType(Context), B.getInt8PtrTy(), DL.getIntPtrType(Context), nullptr);
+ CallInst *CI = B.CreateCall(StrNLen, {castToCStr(Ptr, B), MaxLen}, "strnlen");
+ if (const Function *F = dyn_cast<Function>(StrNLen->stripPointerCasts()))
+ CI->setCallingConv(F->getCallingConv());
+
+ return CI;
+}
+
Value *llvm::emitStrChr(Value *Ptr, char C, IRBuilder<> &B,
                        const TargetLibraryInfo *TLI) {
  if (!TLI->has(LibFunc::strchr))
Index: lib/Transforms/Utils/SimplifyLibCalls.cpp

--- lib/Transforms/Utils/SimplifyLibCalls.cpp (revision 260011)
+++ lib/Transforms/Utils/SimplifyLibCalls.cpp (working copy)
@@ -555,6 +555,49 @@
  if (isOnlyUsedInZeroEqualityComparison(CI))
    return B.CreateZExt(B.CreateLoad(Src, "strlenfirst"), CI->getType());

+ // strlen(x) < y --> strnlen(x, y+1) < y
+ //
+ // We ensure that there is only one user to avoid interfering with
+ // CSE.
+ if (!CI->hasOneUse() || !TLI->has(LibFunc::strnlen))
+ return nullptr;

You are adding this optimization inside a function that try multiple optimizations in sequence. An early return is not nice in this context because it prevent from adding other transformation in the same function afterwards. The cleanest way to keep early exits and avoid deep nesting is to extract this transformation in a static helper function.

+ User *U = CI->user_back();
+ ICmpInst *IC;
+ if (!(IC = dyn_cast<ICmpInst>(U)))
+ return nullptr;

ICmpInst *IC = dyn_cast<ICmpInst>(U);
if (!IC)
   return nullptr;

+ IntegerType *SizeType = DL.getIntPtrType(B.GetInsertBlock()->getContext());
+ Value *LHS = IC->getOperand(0), *RHS = IC->getOperand(1);
+
+ ConstantInt *Con;
+ if (!((Con = dyn_cast<ConstantInt>(LHS)) || (Con = dyn_cast<ConstantInt>(RHS))))
+ return nullptr;

I think you can assume the constant to be on the right (and remove the swapOperands):

ConstantInt *Con = dyn_cast<ConstantInt>(IC->getOperand(1));
if(!Con)
  return nullptr;

+ uint64_t con_val = Con->getZExtValue();
+
+ if (RHS == CI)
+ IC->swapOperands();
+
+ switch (IC->getPredicate()) {
+ case ICmpInst::ICMP_EQ:
+ case ICmpInst::ICMP_NE:
+ case ICmpInst::ICMP_UGT:
+ case ICmpInst::ICMP_UGE:
+ case ICmpInst::ICMP_ULE:
+ case ICmpInst::ICMP_SGT:
+ case ICmpInst::ICMP_SGE:
+ case ICmpInst::ICMP_SLE:
+ // XXX: check for wrapping
+ if (con_val == UINT64_MAX)
+ return nullptr;

ISTM that the wrapping check assumes that "sizeof(size_t) == 8", which is not always true.

Best,

Mehdi

This is an optimisation I am very vary of two two reasons:
(1) strnlen is only POSIX2008, so missing on many slightly older
systems.
(2) I do believe that the counting is quite harmful to the performance.
Additionally, I wouldn't be surprised if many systems don't consider
strnlen to be the fast path, so it would be even worse than using e.g.
memchr for this.

Joerg

Joerg Sonnenberger wrote:

> This addition converts strlen() calls to strnlen() when the result is
> compared to a constant. For example, the following:
>
> strlen(s) < 5
>
> Becomes:
>
> strnlen(s, 5) < 5
>
> That way, we don't have to walk through the entire string. There is the
> added overhead of maintaining a counter when using strnlen(), but I
> thought I'd start with the general case. It may make sense to only use
> this optimization with small constants. On the other hand, the impact of
> the counter may be negligible in many or most cases due to
> hardware-level optimizations.

This is an optimisation I am very vary of two two reasons:
(1) strnlen is only POSIX2008, so missing on many slightly older
systems.

That's why we check whether it exists.

glibc has had strnlen since 1996, OpenBSD since 2010, FreeBSD since
2009, OS X since ~2010. I expect that the vast majority of Unix-like
systems have it. Regardless, I don't think this optimization does any
significant harm to systems that lack strnlen.

(2) I do believe that the counting is quite harmful to the performance.

I'll benchmark. If we limit the optimization to cases where the constant
is (for example) < 20, a small performance hit from the counter may not
matter.

Additionally, I wouldn't be surprised if many systems don't consider
strnlen to be the fast path, so it would be even worse than using e.g.
memchr for this.

The strnlen implementations I've looked at don't seem much different
from their strlen counterparts in the common libc's I looked at. For
example, glibc has asm implementations of both for common platforms.
FreeBSD only has asm implementations of strlen, but those are only for
ARM and MIPS. OpenBSD doesn't use asm for either. I wouldn't be
surprised if the extremely simple nature of the functions means that asm
implementations aren't noticeably faster.

Joerg Sonnenberger wrote:
> > This addition converts strlen() calls to strnlen() when the result is
> > compared to a constant. For example, the following:
> >
> > strlen(s) < 5
> >
> > Becomes:
> >
> > strnlen(s, 5) < 5
> >
> > That way, we don't have to walk through the entire string. There is the
> > added overhead of maintaining a counter when using strnlen(), but I
> > thought I'd start with the general case. It may make sense to only use
> > this optimization with small constants. On the other hand, the impact of
> > the counter may be negligible in many or most cases due to
> > hardware-level optimizations.
>
> This is an optimisation I am very vary of two two reasons:
> (1) strnlen is only POSIX2008, so missing on many slightly older
> systems.

That's why we check whether it exists.

glibc has had strnlen since 1996, OpenBSD since 2010, FreeBSD since
2009, OS X since ~2010. I expect that the vast majority of Unix-like
systems have it. Regardless, I don't think this optimization does any
significant harm to systems that lack strnlen.

Breaking correct code is the poster child of significant harm.

> (2) I do believe that the counting is quite harmful to the performance.

I'll benchmark. If we limit the optimization to cases where the constant
is (for example) < 20, a small performance hit from the counter may not
matter.

> Additionally, I wouldn't be surprised if many systems don't consider
> strnlen to be the fast path, so it would be even worse than using e.g.
> memchr for this.

The strnlen implementations I've looked at don't seem much different
from their strlen counterparts in the common libc's I looked at. For
example, glibc has asm implementations of both for common platforms.
FreeBSD only has asm implementations of strlen, but those are only for
ARM and MIPS. OpenBSD doesn't use asm for either. I wouldn't be
surprised if the extremely simple nature of the functions means that asm
implementations aren't noticeably faster.

I'm a bit surprised by FreeBSD, since even on amd64 the trivial lowering
is almost always quite a bit slower than optimized versions...

Joerg

> Joerg Sonnenberger wrote:
> > > This addition converts strlen() calls to strnlen() when the result is
> > > compared to a constant. For example, the following:
> > >
> > > strlen(s) < 5
> > >
> > > Becomes:
> > >
> > > strnlen(s, 5) < 5
> > >
> > > That way, we don't have to walk through the entire string. There is
the
> > > added overhead of maintaining a counter when using strnlen(), but I
> > > thought I'd start with the general case. It may make sense to only
use
> > > this optimization with small constants. On the other hand, the
impact of
> > > the counter may be negligible in many or most cases due to
> > > hardware-level optimizations.
> >
> > This is an optimisation I am very vary of two two reasons:
> > (1) strnlen is only POSIX2008, so missing on many slightly older
> > systems.
>
> That's why we check whether it exists.
>
> glibc has had strnlen since 1996, OpenBSD since 2010, FreeBSD since
> 2009, OS X since ~2010. I expect that the vast majority of Unix-like
> systems have it. Regardless, I don't think this optimization does any
> significant harm to systems that lack strnlen.

Breaking correct code is the poster child of significant harm.

This concern is important but is within LLVM's realm to handle. If we do
not believe the target has strnlen, then TLI should return false for
TLI->has(LibFunc::strnlen).

We have an existing mechanism for determining whether a given target platform has a particular common library function. There is nothing new about the transform being proposed here. This is a problem we know how to solve.

Philip