I am a security researcher at MIT Lincoln Laboratory, and I work on software compartmentalization. My team and I have created an out-of-tree LLVM pass that is used for analysis and for adding compartment enforcement (see our paper here). We are interested in upstreaming our pass, but before we put resources into getting it up to your high standards, we would like to know whether our approach is acceptable to the other LLVM developers, and we would appreciate your input.
We plan on requiring a configuration file as input when using our pass. This is for several reasons. Currently, we exclusively target the Linux kernel, and our pass is tailored to that, including hard-coded source file and function names, and architecture-specific functionality. Encoding this information in a configuration file would allow us to support compartmentalizing general applications. More fundamentally, though, compartmentalization involves enforcing a compartmentalization policy, which is different for each application, so a configuration file is needed to define that policy per application.
My question is: is a pass that requires an input file acceptable?
What would be the exact contents of the input? Just source file names and function names?
I have seen passing a config file to a pass done in a downstream LLVM, so I think it would be reasonable to allow this.
Could you ask this question in the “LLVM Project” category, and put the issue (i.e. config file for a pass) in the title? That should attract more attention; I’m not sure how many people will see this RFC here.
In principle, I think your proposal sounds reasonable, but we typically prefer to know more details about potential new passes in an RFC.
If there is a prototype implementation available that folks can reference, I’d suggest linking to that directly.
The use of a config file is probably fine. Doubly so if it can reuse a format that we already support, like the SanitizerSpecialCase files.
That said, if you’re serious about upstreaming, I’d suggest fleshing out the RFC with more detail. I imagine a major concern will be complexity and what additional burdens this pass poses for maintenance. IMO you don’t need an RFC on the input file, just one for your proposed pass.
I agree that you’re likely to get more attention there.
What would be the exact contents of the input? Just source file names and function names?
Right now, our pass performs two tasks: 1) analysis of symbols defined and used in compilation units for determining compartment boundaries after all source is compiled; and 2) consumption of a compartment policy to instrument the code to enforce the policy. Each task would use different configurations, with some intersection. For example, the compartment policy contains the definition of compartments, and a mapping of every symbol to a single compartment. This file would be used in task 2, but not task 1. However, we might want to exclude a few specific global variables from analysis or compartment enforcement, and the list of those global variables would be used in both tasks.
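To illustrate the split described above, here is a minimal sketch of how the two configurations might be modeled. It is not the actual format, which is still undefined; the class names, field names, and the example symbols are all hypothetical. The exclusion list is shared by both tasks, while the policy (compartment definitions plus a symbol-to-compartment mapping) is only needed for enforcement.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CompartmentPolicy:
    """Hypothetical policy: compartment definitions plus a symbol mapping."""
    compartments: Set[str]
    # A dict gives each symbol at most one compartment by construction.
    symbol_to_compartment: Dict[str, str]

    def validate(self, symbols: Set[str], excluded: Set[str]) -> List[str]:
        """Return a list of problems; an empty list means the policy is usable."""
        problems = []
        for sym in sorted(symbols - excluded):
            comp = self.symbol_to_compartment.get(sym)
            if comp is None:
                problems.append(f"{sym}: no compartment assigned")
            elif comp not in self.compartments:
                problems.append(f"{sym}: unknown compartment {comp!r}")
        return problems


@dataclass
class PassConfig:
    """Hypothetical top-level configuration for both tasks."""
    # Shared: consulted by analysis (task 1) and enforcement (task 2).
    excluded_globals: Set[str] = field(default_factory=set)
    # Task 2 only: analysis runs before any policy exists.
    policy: Optional[CompartmentPolicy] = None

    def symbols_to_analyze(self, symbols: Set[str]) -> Set[str]:
        """Task 1 view: every symbol except the excluded globals."""
        return symbols - self.excluded_globals

    def check_enforceable(self, symbols: Set[str]) -> List[str]:
        """Task 2 view: requires a policy covering all non-excluded symbols."""
        if self.policy is None:
            return ["no compartment policy supplied"]
        return self.policy.validate(symbols, self.excluded_globals)
```

In this sketch, running `symbols_to_analyze` needs no policy at all, while `check_enforceable` reports every non-excluded symbol that lacks a compartment or names an undefined one.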
We do not have a full implementation of our configurable pass yet, so the full definition is unknown. As I said, we currently only target the Linux kernel, and we use object subclassing to hold all the relevant information. But those subclasses also override methods to change functionality based on architecture, so that would also need to be figured out and could influence the final configuration spec.