Is the clang-tidy performance-faster-string-find check wrong for std::string when the small string optimization is not used?

Hi,

There is a clang-tidy check, performance-faster-string-find, that detects
calls to std::basic_string::find (and related methods) with a
single-character string literal as the argument. According to the check,
using a character literal instead is more efficient.
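
For illustration, the two forms in question look roughly like this. It is a minimal sketch, and the function names are hypothetical and exist only for this example:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper names, for illustration only.

// Form flagged by performance-faster-string-find: the needle is a
// one-character string literal, so the const char* overload is called.
std::size_t find_with_string_literal(const std::string& s)
{
    return s.find("b");
}

// Form suggested by the check: the needle is a char literal, so the
// single-character overload is called.
std::size_t find_with_char_literal(const std::string& s)
{
    return s.find('b');
}
```

Both functions return the same position (or std::string::npos); only the overload that gets called differs.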

However, I ran a benchmark and found that this holds only for
small strings (when the small string optimization is used).

Here is my code:

#include <benchmark/benchmark.h>
#include <string>

static void BM_string_literal(benchmark::State& state)
{
    // Build a string of state.range(0) 'a' characters followed by one 'b'.
    std::string s;

    for (int i = 0; i < state.range(0); i++)
        s += 'a';

    s += 'b';

    benchmark::DoNotOptimize(s.data());
    benchmark::ClobberMemory();
    size_t pos;

    for (auto _ : state)
    {
        // "b" is a string literal, it should be slower
        benchmark::DoNotOptimize(pos = s.find("b"));
        benchmark::ClobberMemory();
    }
}

BENCHMARK(BM_string_literal)->RangeMultiplier(2)->Range(8, 8<<10);

static void BM_char_literal(benchmark::State& state)
{
    // Build a string of state.range(0) 'a' characters followed by one 'b'.
    std::string s;

    for (int i = 0; i < state.range(0); i++)
        s += 'a';

    s += 'b';

    benchmark::DoNotOptimize(s.data());
    benchmark::ClobberMemory();
    size_t pos;

    for (auto _ : state)
    {
        // 'b' is a char literal, it should be faster
        benchmark::DoNotOptimize(pos = s.find('b'));
        benchmark::ClobberMemory();
    }
}

BENCHMARK(BM_char_literal)->RangeMultiplier(2)->Range(8, 8<<10);

BENCHMARK_MAIN();

According to clang-tidy, I should prefer the code in BM_char_literal,
which should be faster. However, the benchmark results are the following:

Benchmark                                           Time      CPU  Time Old  Time New  CPU Old  CPU New
-------------------------------------------------------------------------------------------------------
[BM_string_literal vs. BM_char_literal]/8        -0.0760  -0.0760         9         8        9        8
[BM_string_literal vs. BM_char_literal]/16       -0.0757  -0.0767         9         8        9        8
[BM_string_literal vs. BM_char_literal]/32       +0.3812  +0.3809         4         5        4        5
[BM_string_literal vs. BM_char_literal]/64       +0.1609  +0.1602         4         5        4        5
[BM_string_literal vs. BM_char_literal]/128      +0.1946  +0.1944         4         5        4        5
[BM_string_literal vs. BM_char_literal]/256      +0.1616  +0.1623         6         6        6        6
[BM_string_literal vs. BM_char_literal]/512      +0.2225  +0.2211         7         9        7        9
[BM_string_literal vs. BM_char_literal]/1024     +0.1052  +0.1051        11        12       11       12
[BM_string_literal vs. BM_char_literal]/2048     +0.0789  +0.0781        18        20       18       20
[BM_string_literal vs. BM_char_literal]/4096     +0.0349  +0.0348        31        32       31       32
[BM_string_literal vs. BM_char_literal]/8192     +0.0053  +0.0042        56        57       56       57

We can see that using a string literal is faster once the std::string is
at least 32 characters long (I can reproduce these results consistently,
so it is not a variance issue).
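
To check which of the benchmarked sizes still fit in the small string buffer, something like the following could be used. This is only a sketch: uses_inline_storage is a hypothetical helper, and it assumes that the implementation stores short strings inside the std::string object itself, which the standard does not guarantee:

```cpp
#include <functional>
#include <string>

// Hypothetical helper: returns true when s.data() points inside the
// std::string object itself, which is how the common implementations
// store short strings. The standard does not guarantee any of this.
bool uses_inline_storage(const std::string& s)
{
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const char* data = s.data();
    // std::less gives a total order even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, begin) && before(data, end);
}
```

Calling it on the strings built in the benchmark (for each value of state.range(0)) would show where the switch to heap storage happens with a given standard library.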

Is clang-tidy wrong or is there a bug in libc++? Or is my benchmark
wrong somewhere?

To reproduce my case, here are the commands I used (on Debian stable):

apt-get -y install clang libc++-dev libc++abi-dev git cmake python python-pip
git clone https://github.com/google/benchmark.git
git clone https://github.com/google/googletest.git benchmark/googletest
pushd benchmark
cmake -E make_directory "build"
cmake -E chdir "build" cmake -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CXX_FLAGS="-stdlib=libc++" -DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON ../
cmake --build "build" --config Release --target install
popd
pip install scipy
clang++ -stdlib=libc++ -O3 bench.cpp -lbenchmark -lpthread -o bench
./benchmark/tools/compare.py filters ./bench BM_string_literal BM_char_literal

Thanks.