Using TBAA type descriptor's name as hints for optimizations

Hello all,

There was a discussion about whether it is okay to use the type name in TBAA type descriptor metadata as a hint for optimizations.

The patch uses the type name to decide whether converting a memcpy to an integer load/store is beneficial.
If the memcpy copies a struct containing a pointer member, the conversion becomes a source of many inttoptr casts, because later store forwarding needs them.
The introduced inttoptr casts are later removed by cast elimination (inttoptr(ptrtoint p) -> p), but that fold has issues that are deeply related to its validity.
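
To make the inttoptr problem concrete, here is a rough C-level model of it, not the actual patch or the IR it produces. `struct node`, `copy_bytes`, `copy_as_integer` and `forwarded_payload` are hypothetical names, and the sketch assumes a conventional target where a pointer and `uintptr_t` have the same size and representation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct with a single pointer member, for illustration only. */
struct node {
    int *payload;
};

/* Assumption of this sketch: the struct is exactly one pointer-sized word. */
static_assert(sizeof(struct node) == sizeof(uintptr_t),
              "sketch assumes a pointer-sized struct");

/* Type-oblivious copy: bytes move, nothing is claimed about their type. */
void copy_bytes(struct node *dst, const struct node *src) {
    memcpy(dst, src, sizeof *dst);
}

/* Models the memcpy after it has been turned into an integer load/store.
   The loaded integer is returned so a caller can "forward" it. */
uintptr_t copy_as_integer(struct node *dst, const struct node *src) {
    uintptr_t bits;
    memcpy(&bits, src, sizeof bits);  /* stands in for the integer load  */
    memcpy(dst, &bits, sizeof bits);  /* stands in for the integer store */
    return bits;
}

/* Models store forwarding: a later pointer-typed read of the destination
   is satisfied from the integer, which needs an integer-to-pointer cast
   (the C analogue of inttoptr). */
int *forwarded_payload(uintptr_t bits) {
    return (int *)bits;
}
```

With `copy_bytes` the pointer member never passes through an integer type; with `copy_as_integer` every forwarded read of `payload` goes through the cast in `forwarded_payload`, which is where the extra inttoptr casts come from.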

A clarification to the LangRef has been suggested as well;
feel free to leave comments in either place.



I think it's clear that using TBAA or anything else for *hinting* optimisation is fine. The discussion in the review was rooted in two problems:

  - In some cases the transform is not valid. `memcpy` must be a type-oblivious copy. On some platforms (e.g. CHERI, including Arm's Morello, and some language VMs), turning a `memcpy` into a typed load/store pair is not valid, because these environments then lose track of pointers. It is *always* valid to delete metadata, so an optimisation must not become unsound if metadata is elided.

  - The transform was introducing things that made later analyses worse and so was not improving the optimisation pipeline overall.

Of these, the first is more significant. If a transform is valid in the absence of metadata and generates better code in the presence of metadata, this is fine (and follows from the description of metadata in the LangRef today). If an optimisation becomes unsound when metadata is omitted, this is categorically incorrect.

I don't object to the clarification in the LangRef, but all of this follows from the general rules about metadata:

  - Removing all metadata from a module should not cause miscompilations.

  - Adding metadata may alter code generation in any way that does not violate the extra semantics defined by the metadata.
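
As a loose illustration of these two rules (plain C, not LLVM code), the sketch below treats a hint as something that only steers the choice between two strategies that are both valid on their own. `struct copy_hint`, `choose_strategy` and `copy_with_hint` are hypothetical names, a `NULL` hint stands in for stripped metadata, and the sketch assumes a conventional target where the word-sized copy is itself valid (which, per the first point above, is not the case on CHERI).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for metadata; NULL means the metadata was stripped. */
struct copy_hint {
    int has_pointer_member;
};

enum copy_strategy { COPY_BYTES, COPY_AS_INTEGER };

/* The hint only picks between strategies. With no hint, fall back to the
   conservative byte copy. */
enum copy_strategy choose_strategy(const struct copy_hint *hint) {
    if (hint == NULL)
        return COPY_BYTES;
    return hint->has_pointer_member ? COPY_BYTES : COPY_AS_INTEGER;
}

/* Both branches leave the same bytes in dst, so dropping the hint can
   change how the copy is done but not what it produces. */
void copy_with_hint(void *dst, const void *src, size_t n,
                    const struct copy_hint *hint) {
    if (choose_strategy(hint) == COPY_BYTES) {
        memcpy(dst, src, n);
        return;
    }
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s, sizeof word);  /* integer-sized load  */
        memcpy(d, &word, sizeof word);  /* integer-sized store */
        d += sizeof word;
        s += sizeof word;
        n -= sizeof word;
    }
    memcpy(d, s, n);                    /* remaining tail bytes */
}
```

Calling `copy_with_hint` with `NULL`, with a hint that reports a pointer member, or with one that reports none gives the same destination bytes each time. A design in which the no-hint path produced a different or wrong result would be the "unsound when metadata is omitted" case.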


Random comment, but sometimes clang or other frontends will *generate* a memcpy due to source language semantics (e.g. copying a large struct), not just as a result of a user writing a call to memcpy in C. It seems useful to attach type metadata to these at least.
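
For example, a plain struct assignment like the one below contains no call to `memcpy` in the source, yet a frontend such as clang will typically lower it to a memcpy of the whole aggregate (the exact lowering depends on compiler version and flags). `struct big` and `assign` are hypothetical names used only for this sketch.

```c
#include <assert.h>

/* Hypothetical large struct that also carries a pointer member. */
struct big {
    double samples[64];
    int *owner;
};

/* "Large" here just means much bigger than a register. */
static_assert(sizeof(struct big) >= 512, "sketch assumes a large struct");

/* No memcpy call appears in the source; the aggregate copy is implied
   by the language semantics of struct assignment. */
void assign(struct big *dst, const struct big *src) {
    *dst = *src;
}
```

Here the frontend still knows the full type of what is being copied, which is the information it could attach to the generated memcpy.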