[Proposal]: Fast non-portable extension to retrieve a type name from a pointer as a GV: __builtin_type_name(T* Ptr, [int Style])

The concept itself is pretty trivial: the builtin takes one or two arguments, the first of
which has to be a pointer to an arbitrary type. This makes it particularly useful with C++
auto declarations, or even for printing a type by simply casting a null pointer to
the type in question. The type's qualifiers are retained for the sake of printing them,
depending on the requested style.

Implementation
^^^^^^^^^^^^^^

const char* TypeName = __builtin_type_name(T* Ptr, [int Style])

After validating either the 1- or 2-argument form (i.e. the pointed-to type, and whether
it's a record declaration), SemaChecking will set the return type with TheCall->setType(
Context.getPointerType(Context.CharTy.withConst())), leaving the rest for Clang's CodeGen
to deal with.

The second argument controls the output format in the form of a bitmask
passed down to PrintingPolicy (as I port this I will decouple the values so the
builtin's behavior isn't dependent on further changes in PrintingPolicy). At
that point the type name is retrieved using `getAsString` and stored in the CGM
with `GetAddrOfConstantCString`, which allows those strings to be coalesced later
during linking. The resulting address is cast to an Int8Ptr.
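A minimal sketch of what decoupling the style values from PrintingPolicy could look like. Everything here is hypothetical: the `TNS_*` names, their bit values, and `PrintOptionsSketch` are placeholders for illustration only. They are not Clang's actual fields, and they do not correspond to the 0x1/0x4/0x8 values used elsewhere in this thread.

```cpp
#include <cstdint>

// Hypothetical, stable style bits owned by the builtin itself, so that user
// code does not depend on the layout of Clang's PrintingPolicy.
enum TypeNameStyle : std::uint32_t {
  TNS_Default      = 0,
  TNS_NoTagKeyword = 1u << 0, // drop "class"/"struct" prefixes
  TNS_NoScope      = 1u << 1, // drop namespace qualifiers
  TNS_NoQualifiers = 1u << 2, // drop const/volatile
};

// Stand-in for the printer's options. This is NOT clang::PrintingPolicy.
struct PrintOptionsSketch {
  bool SuppressTagKeyword = false;
  bool SuppressScope = false;
  bool SuppressQualifiers = false;
};

// Translate the public bitmask into printer options in one place, so later
// changes to the printer only require touching this function.
inline PrintOptionsSketch translateStyle(std::uint32_t Style) {
  PrintOptionsSketch Opts;
  Opts.SuppressTagKeyword = (Style & TNS_NoTagKeyword) != 0;
  Opts.SuppressScope = (Style & TNS_NoScope) != 0;
  Opts.SuppressQualifiers = (Style & TNS_NoQualifiers) != 0;
  return Opts;
}
```

The point of the indirection is that the numeric values seen by users stay fixed even if the printer's own switches are renamed or reordered.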

This is all done in Clang without needing to add anything to the LLVM core.

Things left to do before submitting for code review
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* There is no test coverage, so I need to write up a test file that comprehensively
  tests cases I haven't considered before, like Objective-C pointers, any
  pointer to a special Clang type that may behave unexpectedly, and complex
  C++ templates (or simple ones, which was my main use of it). The target is IR
  since Clang fully lowers it, so in theory it's platform agnostic provided there is
  enough space (the original use was for Embedded C++ with no RTTI).
* While this is out of scope, a GUID buffer as a style would provide a form
  of type IDs in the absence of RTTI, or alternatively smaller types like u32 or
  u16 (aimed at single-module builds where they can remain unique, i.e. embedded
  systems with a kernel using those to represent kernel object types).
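To illustrate the u32 type ID idea from the last bullet, here is a rough sketch that works without RTTI. It does not use the proposed builtin. It swaps in a different technique: hashing the `__PRETTY_FUNCTION__` string with 32-bit FNV-1a. The names `fnv1a32` and `type_id32` are hypothetical, GCC or Clang is assumed, and hash collisions are possible, so uniqueness would still have to be checked for a given build.

```cpp
#include <cstdint>

// 32-bit FNV-1a over a NUL-terminated string.
inline std::uint32_t fnv1a32(const char *Str) {
  std::uint32_t Hash = 2166136261u; // FNV offset basis
  for (; *Str; ++Str) {
    Hash ^= static_cast<unsigned char>(*Str);
    Hash *= 16777619u; // FNV prime
  }
  return Hash;
}

// Assumes GCC or Clang: __PRETTY_FUNCTION__ embeds the spelling of T, so each
// instantiation hashes a different string. Works with -fno-rtti.
template <typename T>
std::uint32_t type_id32() {
  static const std::uint32_t Id = fnv1a32(__PRETTY_FUNCTION__);
  return Id;
}
```

A kernel could then tag objects with `type_id32<Thread>()` or similar, at the cost of computing the hash once per type at first use rather than at compile time.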

Rationale
^^^^^^^^^
It's clear that this functionality is desired outside of the embedded domain,
typically with hacks involving a lambda and __PRETTY_FUNCTION__, one case
being in `lib/Support/TypeName.h` in LLVMSupport. Many non-portable hacks
that depend on compiler versions exist. This doesn't aim to be portable,
just to be compact, not have a runtime cost, and provide this information
either for debugging or even for more practical reasons.
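For readers who have not seen the kind of hack referred to above, here is a minimal sketch of the general technique. It is not the code from LLVMSupport. It assumes C++17 and GCC or Clang, and `pretty_type_name` is a hypothetical name. The format of `__PRETTY_FUNCTION__` is unspecified, which is exactly why such code depends on compiler versions.

```cpp
#include <string_view>

// Assumes GCC or Clang. Both currently render the template argument as
// "T = <type>" inside the bracketed part of __PRETTY_FUNCTION__.
template <typename T>
std::string_view pretty_type_name() {
  std::string_view Sig = __PRETTY_FUNCTION__;
  const std::string_view Key = "T = ";
  const auto Start = Sig.find(Key);
  if (Start == std::string_view::npos)
    return Sig; // Unknown layout: hand back the raw signature.
  Sig.remove_prefix(Start + Key.size());
  // GCC appends "; <alias> = ..." entries after the type; Clang ends with ']'.
  auto End = Sig.find(';');
  if (End == std::string_view::npos)
    End = Sig.rfind(']');
  return Sig.substr(0, End);
}
```

Unlike the proposed builtin, this parses the string at run time, and the spelling of anything beyond simple types differs between compilers.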

I wanted to find out if there was interest in this kind of thing, since
I have developed a variety of useful and not so useful extensions,
originally for embedded use, that I want to upstream for general use.
I want to know what the consensus is on 1) this particular extension and
2) further extensions that were originally developed for embedded use
but have turned out to be useful in a lot of contexts where you are willing
to sacrifice portability (now with `__has_builtin` this is extremely easy to
test for and fall back on something else).
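A sketch of how user code might test for the builtin and fall back, assuming the extension keeps the name `__builtin_type_name`. The macro `TYPE_NAME_OF` and the function `fallback_type_name` are hypothetical, and the fallback assumes GCC or Clang and simply returns the raw signature string.

```cpp
// Older compilers lack __has_builtin; treat every query as "not available".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_type_name)
// Proposed builtin from this thread, used only when the compiler reports it.
#define TYPE_NAME_OF(PtrExpr) __builtin_type_name(PtrExpr)
#else
// Fallback: the unparsed __PRETTY_FUNCTION__ string, which contains the
// spelling of T somewhere inside it.
template <typename T>
const char *fallback_type_name(T *) {
  return __PRETTY_FUNCTION__;
}
#define TYPE_NAME_OF(PtrExpr) fallback_type_name(PtrExpr)
#endif
```

Either branch accepts the null-pointer idiom from the proposal, e.g. `TYPE_NAME_OF((int *)nullptr)`, though the exact text returned differs between the two.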

On the other hand it's a lot to do overall, so I would prefer to get a consensus
on whether each feature (even this small one) is worth cleaning up and putting
up for code review, since I understand that something like builtin bloat
or non-portability may be of concern. As for the formal name, I'd like to call
it extensions for Embedded C++ 2, gating it behind an opt-in flag such as
`-fecxx2-extensions`.

Other things involve limited reflection retrieving the bare minimum that's
needed, an llvm::formatv-style formatter accelerator, and getting names of record
scope at different levels, with 0 being similar to the __CLASSNAME__ macro
desired (at least by some).

Looking forward to any feedback whether positive or negative.
Thank you for your time.
- Kristina

[+Chandler]

Hi, Kristina, can you provide a couple of examples of what this would
print for various cases?

-Hal

I put together a very minimal example of what it produces (the
prefix will be `__builtin` instead of `__os`; this is just an example
built with a toolchain where this was an extension in a personal
fork of Clang).

With an ELF target, all strings end up in `.rodata`. Tiny test case below,
includes omitted for brevity.

Minimal Test Case
^^^^^^^^^^^^^^^^^

class MyType
{
public:
  template <typename T>
  auto CallFunc(const T& TheFun) {
    auto StrStk = TheFun();
    return new decltype(StrStk)(StrStk);
  }

  MyType()
  {
    auto LocalType = [](){ return std::string{"foo"}; };
    OSLogger Log{"tiny_example"};

    Log.info(
      "\n{0}\n{1}\n{2}\n{3}\n{4}\n{5}\n{6}\n{7}",
      __os_type_name((std::vector<std::string>*)nullptr),
      __os_type_name(&LocalType, 0x1),
      __os_type_name(CallFunc(LocalType), 0x1),
      __os_type_name(&Log, 0x4),
      __os_type_name(&LocalType, 0x8),
      __os_type_name(&LocalType),
      __os_type_name(this, 0x4),
      __os_type_name(this, 0)
    );
  }
};

int main(int argc, const char** argv)
{
  MyType M{};
}

Runtime
^^^^^^^

[tiny_example ][INFO ]:
  std::vector<std::string>
  (lambda at /q/src/test_os_type_name/main.cc:46:20)
  std::__1::basic_string<char>
  OSLogger
  (lambda at /q/src/test_os_type_name/main.cc:46:20)
  class (lambda at /q/src/test_os_type_name/main.cc:46:20)
  MyType
  class MyType

Hopefully that demonstrates what I had in mind. All strings are unique
within `.rodata`, and there are no heavy structures that would usually be
associated with RTTI; in fact this is built without RTTI or exceptions.

I overused auto a bit intentionally in this case. An example from a more
complex application which uses a Postgres client in C++ and a lot of
generics can print out types like these (newlines inserted by me):

"std::__1::tuple<pg::internal::ColumnImpl<pg::internal::oid_info<pg:
:internal::tag [1043]>, PGMeta::VarChar>, pg::internal::ColumnImpl<pg::internal
::oid_info<pg::internal::tag [1043]>, PGMeta::VarChar> >"

Thank you.
- Kristina

As discussed on IRC, ISTM this would be better spelled as:
typeid().name();

The issue with that at the moment is that typeid() is an error if you build with -fno-rtti. However, there appears to be no reason why we cannot support typeid(<typename>) even with -fno-rtti. Unlike typeid(<variable>), it requires no extra data to be emitted, since there’s no possibility that dynamic dispatch is required. Therefore, similarly to how exception support still functions with -fno-rtti by emitting the explicitly required typeinfo data on demand, so too can typeid().

I note also that when .name() is the only value used, the remainder of the typeinfo data is already omitted from the output when compiling with optimizations.

It looks like supporting this would only be a few-line change in clang, since all the underlying infrastructure is there already to support the EH with -fno-rtti use-case.
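For reference, a minimal sketch of the spelling being suggested, as it works today with RTTI enabled (with current compilers this does not build under -fno-rtti, which is the limitation discussed above). `static_type_name` is a hypothetical helper name. The string returned by `name()` is implementation-defined and is typically a mangled name on Itanium-ABI targets.

```cpp
#include <string>
#include <typeinfo>
#include <vector>

// The operand of typeid is a type, not an expression, so no object is
// inspected at run time and no dynamic dispatch is involved.
template <typename T>
const char *static_type_name() {
  return typeid(T).name();
}

// Example instantiation with the type used earlier in the thread.
inline const char *vectorOfStringName() {
  return static_type_name<std::vector<std::string>>();
}
```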

If this proves useful, I could amend my in-flight paper ( http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1105r0.html#rtti ) to allow for typeid(type) in freestanding, while still allowing typeid(variable) to be ill formed. I would need to carefully evaluate how much of and would make sense.

If you go down this path, it would make more sense to only disallow the dynamic form of typeid(expr) (that is, when the operand is a glvalue of polymorphic class type). There’s also implementation experience of that: that’s what MSVC’s /GR- flag does.
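The distinction between the static and dynamic forms can be sketched as follows, compiled with RTTI enabled. The struct and function names are made up for illustration. Only the second function needs information looked up at run time.

```cpp
#include <typeinfo>

struct Plain {};                // non-polymorphic
struct PlainDerived : Plain {};

struct Poly {                   // polymorphic: has a virtual function
  virtual ~Poly() = default;
};
struct PolyDerived : Poly {};

// Static form: the operand is not a glvalue of polymorphic class type, so
// typeid yields the declared type (Plain) without inspecting the object.
inline bool plainRefNamesStaticType() {
  PlainDerived D;
  Plain &Ref = D;
  return typeid(Ref) == typeid(Plain);
}

// Dynamic form: the operand is a glvalue of polymorphic class type, so
// typeid inspects the object and yields its most derived type.
inline bool polyRefNamesDynamicType() {
  PolyDerived D;
  Poly &Ref = D;
  return typeid(Ref) == typeid(PolyDerived);
}
```

Both functions return true; disallowing only the second kind of use is the behavior described above.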

Which TU(s) will the type_info object go in, and what will the linkage of those objects be? If you call typeid(SomeClass), and all of SomeClass’s functions are out-of-lined in another TU, will you generate a type_info object of weak_linkage in the current TU? Or will you expect the TU that has SomeClass’s functions in it to define the type_info object?

I think if you want things to work, then you will need to generate a weak linkage type_info in each TU that calls typeid(SomeClass). I’m not sure if that’s ok or not though.

Yes, the last. It is what we do already – only in special cases can we assume that there is typeinfo available externally (certain standard types, and when RTTI is enabled, types with an externally available vtable). In other cases, the typeinfo is already emitted where required, with weak linkage.

What happens if you…

Yes – and again, those are the exact same behaviors you’ll see if you instead test “throw ClassWithExternalVtable()” today, in GCC and Clang.

Just to be clear, I’m not trying to shoot down anything, just trying to make sure I understand what’s happening. I’m totally fine with those tradeoffs.

Hi,

Well, arguments like these, and changing the semantics of the language, are essentially what I was
trying to avoid by having it as a builtin; besides, it's much simpler in terms of linkage
and all the other related issues. I don't see why we would go down the complex route and
change language semantics, worry about linking and all the possible test cases,
and add a feature test for it, when a trivial builtin just requires changing a few files. And it does
seem to have genuine uses: just browsing GitHub, or even looking at LLVM's own support lib,
there are hacks to do this precise thing.

And hey, it's even more useful if you can easily test for whether it's available, should a customer
decide to use it in code that they intend to make portable (though again, I proposed this
as a Clang extension, from which there are fairly graceful fallback mechanisms). It is already
implemented in my work tree; I'd just have to rename a few things and put the diff up for review, and
I'm fine maintaining such an extension.

The typeid(,) proposal is fine, but it does seem to cause additional complications, which were just
discussed. With a builtin you have the option of A) using it if available and falling back to something
like lambda + __PRETTY_FUNCTION__, or B) flat out writing non-portable code if you only want to target
Clang as a C/C++ compiler.

And even taking that into account, static reflection is never coming to C. So that's another issue that
this wouldn't address. So while I know that the proposal got shot down pretty quickly on IRC, I would
urge everyone to take a moment to reconsider, as we already have a couple of Clang-only extensions,
and that's ignoring the point that any other compiler vendor could implement the same thing with likely
very little effort if they wanted to do so.

Thanks.
- Kristina