Hi,
Thank you all for keeping this going. Indeed, I was not aware that the discussion was going on; I am really sorry for this late reply.
Nice to hear from you again! Thank you for starting this thread.
I understand Chris' point about metadata design. Either the metadata becomes stale or removed (if we do not teach transformations to preserve it), or we end up modifying many (if not all) transformations to keep the data intact.
Currently in the IR, I feel like the default behavior is to ignore/remove the metadata, and only a limited number of transformations know how to maintain and update it, which is a best-effort approach.
That being said, my initial thought was to adopt this approach to the MIR, so that we can at least have a minimal mechanism to communicate additional information to various transformations, or even dump it to the asm/object file.
In other words, it is the responsibility of the users who introduce/use the metadata in the MIR to teach the transformations they selected how to preserve their metadata. A common API to abstract this would definitely help, just as combineMetadata() from lib/Transforms/Utils/Local.cpp does.
Unfortunately, I have never worked with LLVM-IR metadata (I have focused almost exclusively on
the back-end and have only scratched the surface of LLVM's middle-end), but I see your point.
Clearly, applying the needed modifications to all the back-end transformations/optimizations
is infeasible and, probably, not worth it -- different users may have different requirements/needs
regarding a specific pass.
I like the idea of a common API to handle the MIR metadata, leaving it to the end user to manage
such data. Of course, if the community encounters common cases while handling the metadata, such
cases may be integrated into the upstream project.
Nonetheless, the main point of this thread is to preserve middle-end metadata down to the
back-end, right after the Instruction Selection phase. Hence, regardless of the end user's needs, a
"preserve-all" policy during the lowering stage is required, which will involve some changes,
in particular in the DAGCombine pass.
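To make the "common API" idea a bit more concrete, here is a toy sketch in plain C++. It is not
LLVM code and does not use any LLVM API: MetadataMap, MergeRule, MetadataMerger, keepIfEqual and
keepEither are all hypothetical names I made up for illustration. The assumption is that a pass
merging two instructions asks a shared helper what to keep, that users register one rule per
metadata kind, and that kinds without a rule are dropped (the best-effort default).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the metadata attached to one machine instruction:
// metadata kind -> payload. This is NOT an LLVM type.
using MetadataMap = std::map<std::string, std::string>;
using MaybeValue = std::optional<std::string>;

// User-supplied policy: given the payloads found on the two instructions being
// merged (either may be absent), return the payload to keep, or nullopt to drop.
using MergeRule = std::function<MaybeValue(const MaybeValue&, const MaybeValue&)>;

// Conservative rule: keep the payload only if both sides carry the same one.
inline MaybeValue keepIfEqual(const MaybeValue& a, const MaybeValue& b) {
  if (a && b && *a == *b) return a;
  return std::nullopt;
}

// Permissive rule: keep whichever payload exists, preferring the first operand.
inline MaybeValue keepEither(const MaybeValue& a, const MaybeValue& b) {
  return a ? a : b;
}

class MetadataMerger {
public:
  // Register the rule used for one metadata kind.
  void registerRule(const std::string& kind, MergeRule rule) {
    assert(rule && "a merge rule must be callable");
    rules_[kind] = std::move(rule);
  }

  // Merge the metadata of two instructions that a transformation combines.
  // Kinds without a registered rule are dropped (the best-effort default).
  MetadataMap merge(const MetadataMap& a, const MetadataMap& b) const {
    MetadataMap out;
    for (const auto& [kind, rule] : rules_) {
      if (MaybeValue kept = rule(lookup(a, kind), lookup(b, kind)))
        out[kind] = *kept;
    }
    return out;
  }

private:
  // Return the payload stored under `kind`, or nullopt if it is absent.
  static MaybeValue lookup(const MetadataMap& m, const std::string& kind) {
    auto it = m.find(kind);
    if (it == m.end()) return std::nullopt;
    return it->second;
  }

  std::map<std::string, MergeRule> rules_;
};
```

In this sketch, a user who cares about a security tag would register keepIfEqual for that kind,
so the tag survives a merge only when both instructions agree on it; a "preserve-all" policy for
the lowering stage would roughly correspond to registering keepEither for every kind.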
As for my use case, it is also security-related. However, I do not consider the metadata to be a compilation "correctness" criterion: metadata, by definition (from the LLVM IR), can be safely removed without affecting the program's correctness.
If possible, I would like to have more details on Lorenzo's use case in order to see how metadata would interfere with the program's correctness.
I would really like to discuss the details here but, unfortunately, I am working on a publication
and thus cannot disclose any of them yet.
However, by "correctness" I do not mean "I/O correctness", but the preservation of a
security property expressed in the front-end (e.g., specified in the source code) or in the
middle-end (e.g., specified in the LLVM-IR, for instance by a transformation pass).
From a security point of view, removing or altering metadata does not interfere with the I/O
functionality of the code (although it may affect performance), but it may introduce
vulnerabilities.
As for the RFC, I can definitely try to write one, but this would be my first time doing so. But maybe it is better to start with Lorenzo's proposal, as you have already been working on this? Please tell me if you prefer me to start the RFC though.
It is the first time for me too, so do not worry!
We could just use any other RFC as a template to get started.
I think that a structure like the following would be fine:
1. Background
1.1 Motivation
1.2 Use-cases
1.3 Other approaches
2. Goal(s)
3. Requirements
4. Drawbacks and main bottlenecks
5. Design sketch
6. Roadmap sketch
7. Potential future development
It may be a bit overkill; you are warmly invited to cut/refine these points!
And... no, I still have no sketch of the RFC; sorry, I have had quite a workload these
days.
Yes, you can start writing up the RFC.
Quoting David:
"Since you first raised the topic [...] I want to give you right of first refusal."
Have a nice day!
-- Lorenzo