There is a desire to be able to create constexpr StringRefs to avoid
static initializers for global tables of/containing StringRefs.
Creating constexpr StringRefs isn't trivial as strlen isn't portably
constexpr and std::char_traits<char>::length is only constexpr in
C++17.
Alp Toker tried to create constexpr StringRefs for string literals by
subclassing StringRef: https://reviews.llvm.org/rL200187
This is a verbose change where needed at string literal call sites.
Mehdi AMINI tried to add a constexpr constructor for string literals
by making the constructor from const char * explicit: https://reviews.llvm.org/D25639
This is a verbose change at every non-literal call site.
This only works with assignment syntax.
I've suggested using a user-defined literal: https://reviews.llvm.org/D26332
This is a small change where needed at string literal call sites.
C++17 adds a UDL for std::string_view, so it's not an unusual idea.
There is resistance to using a UDL as they can introduce a surprising
and novel syntax for calling functions.
From: "Malcolm Parsons via llvm-dev" <llvm-dev@lists.llvm.org>
To: llvm-dev@lists.llvm.org
Sent: Thursday, November 24, 2016 8:59:25 AM
Subject: [llvm-dev] RFC: Constructing StringRefs at compile time
Hi all,
Why don't we just create our own traits class that has a constexpr length, and then we can switch over to the standard one when we switch to C++17?
GCC and Clang treat __builtin_strlen as constexpr.
MSVC 2015 doesn't support C++14 extended constexpr. I don't know how
well it optimises a recursive strlen.
This works as an optimisation for GCC and Clang, and doesn't make
things worse for MSVC:
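The following is a minimal sketch that combines the two suggestions above: use `std::char_traits` once C++17 is available, use `__builtin_strlen` on GCC and Clang, and otherwise fall back to a recursive form. The name `llvm_strlen` is taken from later in the thread. The body is an illustrative reconstruction of the idea and may not match the code that was actually posted. The fallback uses a single `return` so that it remains valid C++11 `constexpr`, since MSVC 2015 lacks C++14 extended constexpr.

```cpp
#include <cstddef>
#include <string>

// Illustrative sketch: a strlen that can be used in constant expressions.
constexpr std::size_t llvm_strlen(const char *Str) {
#if __cplusplus >= 201703L
  // C++17: std::char_traits<char>::length is constexpr.
  return std::char_traits<char>::length(Str);
#elif defined(__GNUC__) || defined(__clang__)
  // GCC and Clang can evaluate __builtin_strlen in constant expressions.
  return __builtin_strlen(Str);
#else
  // Portable C++11 fallback (e.g. MSVC 2015): single-return recursion.
  // How well this is optimised at run time is the open question above.
  return *Str == '\0' ? 0 : 1 + llvm_strlen(Str + 1);
#endif
}

// The length of a literal is computed at compile time.
static_assert(llvm_strlen("hello") == 5, "length known at compile time");
```

A StringRef constructor that calls such a helper would work unchanged for every call site. Only the cost of the fallback path differs between compilers.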
You’d at least want an assert in there (that N - 1 == strlen(Str)) in case a StringRef is ever constructed from a non-const char buffer that’s only partially filled.
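The sketch below shows the templated array constructor under discussion, with the suggested assert. `ArrayStringRef` is a hypothetical stand-in for StringRef that has been reduced to this one constructor. The constructor is deliberately not `constexpr` here, because `std::strlen` cannot appear in a constant expression. That is the same tension between the safety check and the goal of compile-time construction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for StringRef, reduced to the constructor in question.
class ArrayStringRef {
  const char *Data;
  std::size_t Length;

public:
  // The length is taken from the array bound N rather than by scanning for
  // the terminator. That is right for a string literal ("abc" has N == 4),
  // but a partially filled char buffer also binds here, so check it.
  template <std::size_t N>
  ArrayStringRef(const char (&Str)[N]) : Data(Str), Length(N - 1) {
    assert(N - 1 == std::strlen(Str) && "array is not a complete string");
  }

  const char *data() const { return Data; }
  std::size_t size() const { return Length; }
};
```

Without the assert, `char Buf[16] = "abc"; ArrayStringRef S(Buf);` would silently produce a length of 15. With the assert, that mistake becomes a failure in debug builds.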
But if we can write this in such a way that it performs well on good implementations - that seems sufficient. If getting good performance out of the compiler means bootstrapping - that’s pretty much the status quo already, as I understand it.
So I wouldn’t personally worry too much about performance degradation when built with MSVC - if, when building a stage 2 on Windows (building Clang with an MSVC-built Clang), you do end up with a compiler with the desired performance characteristics, then that’s probably sufficient.
The only reason I didn’t go with this solution was that an MSVC-built Clang would take a long time to start up if StringRefs are present in global tables.
Hold on there—we deliver an MSVC-built Clang to our licensees, and I would really rather not pessimize it.
Jumping in on Paul’s post, but we work on the same product so I can give at least one answer here, which is debugging, including post-mortem debugging of minidumps. We keep the PDBs from our build server so we can ship an executable without any embedded debug info but can still get a decent(ish) debugging experience with symbols and watch window values from minidumps.
Nothing would please me more than to switch to shipping a selfhost (subject to quite a thorough comparison/evaluation of all the factors) so I’m watching the PDB/codeview work with interest.
OK - good to know. (not sure we’re talking about pessimizing it - just not adding a new/possible optimization, to be clear)
Okay, glad to hear it. I admit I wasn’t following the thread all that closely.
Just out of curiosity - are there particular reasons you prefer or need to ship an MSVC built version, rather than a bootstrapped Clang?
We experiment with a bootstrapped Clang from time to time. The benefit has never been clearly worth the additional cost of internally supporting a Windows-target Clang. (Which is non-trivial; yes it’s still Clang, but it’s a different target OS, different object-file format, different debug-info format, etc.)
This does not seem that clear to me. The motivation seems to be the ability to create global tables of StringRef, which we don’t do right now because, without constexpr, they would require static initializers.
Going forward, taking this route would make Clang a lot slower when built with MSVC.
So if opportunities came up that provided greater benefit to a self-host, then you’d have a motivation to switch… - not sure that should motivate the rest of the project to work hard to make MSVC performance important.
(but this is just me - I doubt my opinion changes the way others will behave all that much)
Ah, fair - perhaps I misunderstood/misrepresented, apologies. Figured this was just an attempt to reduce global initializers in arrays we already have. Any pointers on where the motivation is described/discussed?
This thread started with: “There is a desire to be able to create constexpr StringRefs to avoid static initializers for global tables of/containing StringRefs.”
I don’t have more information, but maybe Malcolm can elaborate?
The fact that the templatized constructor falls down because of the possibility of initializing StringRef with a stack-allocated char array kills that idea in my mind.
I feel like the only two reasonable solutions are:
Allow a UDL for this case, document that this is an exception and that UDLs are still not permitted anywhere else, and require (by policy, since I don’t know of a way to have the compiler enforce it) that this UDL be used only in global constructors. One idea to help “enforce” this policy would be to give the UDL a ridiculously convoluted name, like string_ref_literal, so that one would have to write "foo"_string_ref_literal, and then provide a macro like `#define LITERAL(x) x##_string_ref_literal`, so that the user writes `StringRef s[] = { LITERAL("a"), LITERAL("b") };`. I'm not sure whether that's better or worse than `StringRef s[] = { "a"_sr, "b"_sr };`, but at least it’s greppable this way.
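Here is a sketch of the UDL-plus-macro idea. `LitStringRef` and the literal operator are hypothetical illustrations, and the suffix follows the example name above. A string-literal operator receives the length directly from the compiler, so no strlen is needed and the result can be `constexpr`. The macro relies on `##` token pasting. A replacement list spelled `x_string_ref_literal` would instead be a single identifier in which `x` is never substituted.

```cpp
#include <cstddef>

// Hypothetical stand-in for StringRef with a constexpr (pointer, length) constructor.
class LitStringRef {
  const char *Data;
  std::size_t Length;

public:
  constexpr LitStringRef(const char *D, std::size_t L) : Data(D), Length(L) {}
  constexpr const char *data() const { return Data; }
  constexpr std::size_t size() const { return Length; }
};

// The compiler passes the literal's length, so this is a constant expression.
constexpr LitStringRef operator""_string_ref_literal(const char *Str,
                                                    std::size_t Len) {
  return LitStringRef(Str, Len);
}

// Token pasting attaches the suffix: LITERAL("a") becomes "a"_string_ref_literal.
#define LITERAL(x) x##_string_ref_literal

// A global table that needs no static initializer.
constexpr LitStringRef Table[] = {LITERAL("a"), LITERAL("bc")};
static_assert(Table[1].size() == 2, "computed at compile time");
```

Searching for `LITERAL(` or `_string_ref_literal` finds every use, which is the greppability argument made above.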
I prefer constexpr llvm_strlen() over StringLiteral because it doesn't
require code changes outside StringRef - all StringRefs constructed
from a literal can benefit. But there are concerns about MSVC.
I prefer StringLiteral over a UDL because only the declared type needs
to change; the values (the literals themselves) don't.
I prefer StringLiteral over explicit StringRef constructor because it's safer.
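The next sketch shows what a StringLiteral-style subclass could look like, following the subclassing approach mentioned at the start of the thread. The class names are hypothetical, and this is not LLVM's actual implementation. It illustrates the point above: the table's element type changes, but each initializer stays an ordinary string literal.

```cpp
#include <cstddef>

// Hypothetical StringRef stand-in with a constexpr (pointer, length) constructor.
class SimpleStringRef {
protected:
  const char *Data;
  std::size_t Length;

public:
  constexpr SimpleStringRef(const char *D, std::size_t L) : Data(D), Length(L) {}
  constexpr const char *data() const { return Data; }
  constexpr std::size_t size() const { return Length; }
};

// StringLiteral-style subclass: constructible only from a char array, with
// the length taken from the array bound at compile time.
class SimpleStringLiteral : public SimpleStringRef {
public:
  template <std::size_t N>
  constexpr SimpleStringLiteral(const char (&Str)[N])
      : SimpleStringRef(Str, N - 1) {}
};

// Only the element type changes; each value is still a plain literal.
constexpr SimpleStringLiteral Names[] = {"alpha", "beta"};
static_assert(Names[0].size() == 5, "computed at compile time");

// The subclass can be passed wherever the base type is expected.
constexpr std::size_t lengthOf(const SimpleStringRef &R) { return R.size(); }
```

Because the constructor still accepts any `const char (&)[N]`, it would also accept a partially filled `char` buffer. That is the same weakness raised against the templatized constructor.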