[GlobalISel][RFC] Thoughts on MachineModulePass

Hi,

In the initial thread of the GlobalISel proposal, I mentioned that it may be interesting to have a kind of MachineModulePass.
Marcello mentioned this would be useful for their current pipeline.

I am interested in knowing:
1. Is anyone else interested in such a concept?
2. What kind of information should we make accessible in a hypothetical MachineModule? I.e., how do you plan to use the MachineModulePass, so that we make the right design decisions for the MachineModule feeding those passes?
3. Who would be willing to work on that?

Thanks,
-Quentin

Nearly perfect timing. I just wrote a grant proposal requesting funding to do just such a thing. :)

My research group is interested in a MachineModulePass because we are using LLVM's MachineInstr infrastructure for analyzing machine code. Specifically, we are attempting to build an infrastructure for measuring how well various defenses work against code reuse attacks. We are analyzing both data flow and control flow, and it would be handy for us to be able to analyze an entire program's assembly code (because we're looking for every last reusable instruction that an attacker could use and how those instructions can be strung together). We want to analyze after everything has been done (register allocation, instruction selection and scheduling, etc.).

At the very least, we'll be doing analysis, though it is conceivable that we would want to do transformation in the future (e.g., if we can determine that breaking certain data flows would stop an attack, we could transform the code to change the data flow).

Ethan, can you add anything more specific on what would be on our wish list?

As for resources, we're early enough in the project that we don't yet need the inter-procedural analysis, and if we do need it, it may be quicker for us to hack something together than to enhance LLVM properly. The point of the proposal is to seek additional funding so that we can afford to do things properly instead of hacking something together just to meet our own research needs. That said, if it makes sense to join forces, we'd certainly be open to doing that.

Regards,

John Criswell

+1

The pass, from the perspective of our use case, would need to act as some kind of synchronization point in the pipeline between MF passes (such that all the MF passes before it have run on all the MachineFunctions in the Module before the MachineModule pass runs).
Currently each MachineFunction is processed from the beginning to the end of the pipeline before the next one starts, which makes it difficult to do Machine-level analysis across functions without awful tricks (where we get information from multiple functions and merge it together in some way).
Immutable passes would also need to be accessible.
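
To illustrate the synchronization point described above, here is a minimal sketch. It is a toy model and not LLVM code: `ToyFunction`, `ToyModule`, `Pipeline`, `addFunctionPass` and `addModulePass` are all hypothetical names invented for this example. The assumption is that a module-level pass acts as a barrier, so every function pass queued before it has run on every function by the time it executes.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for MachineFunction / MachineModule (hypothetical, not LLVM types).
struct ToyFunction {
  std::string Name;
  std::vector<std::string> Log; // names of the passes that have run on this function
};

struct ToyModule {
  std::vector<ToyFunction> Functions;
};

using FunctionPass = std::function<void(ToyFunction &)>;
using ModulePass = std::function<void(ToyModule &)>;

class Pipeline {
  // A stage is either a batch of function passes or a single module pass.
  struct Stage {
    std::vector<FunctionPass> FunctionPasses;
    ModulePass Barrier; // empty for a function-pass stage
  };
  std::vector<Stage> Stages;

public:
  // Function passes queued back to back share one stage.
  void addFunctionPass(FunctionPass P) {
    if (Stages.empty() || Stages.back().Barrier)
      Stages.emplace_back();
    Stages.back().FunctionPasses.push_back(std::move(P));
  }

  // A module pass always opens a new stage, which makes it a barrier.
  void addModulePass(ModulePass P) {
    Stages.emplace_back();
    Stages.back().Barrier = std::move(P);
  }

  void run(ToyModule &M) const {
    for (const Stage &S : Stages) {
      if (S.Barrier) {
        // Every earlier function pass has already run on every function.
        S.Barrier(M);
        continue;
      }
      // Within a stage, each function goes through the whole batch in turn.
      for (ToyFunction &F : M.Functions)
        for (const FunctionPass &P : S.FunctionPasses)
          P(F);
    }
  }
};
```

Within a stage the functions are still processed one at a time, as today; only the module pass forces the whole module to catch up. Whether a real implementation would keep all MachineFunctions alive at that point is exactly the kind of design question raised in this thread.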

An important design decision would be to understand what a MachineModulePass can and cannot do to a hypothetical MachineModule (can it add MachineFunctions that are not connected to IR functions, like a MachineFunction pass can add MachineBasicBlocks not connected to IR BasicBlocks?).

Marcello

Hi Marcello,

Can you elaborate here a bit more? Mostly what it sounds like is just a direct port of Module pass to run over MFs, similar to what AsmPrinter does today, but also being able to have some analysis passes as well?

I realize my question seems a bit vague, but I’m trying to get a better idea of what the use case is here as it sounds like something that we could have the “MF pipeline” have and use and be (possibly) a better place to stick the AsmPrinter (we’d need to figure out things like globals, but that might be it).

Thanks!

-eric

Yeah, the features needed would probably match pretty closely what an IR Module Pass would do.
It is funny that you mentioned the AsmPrinter, because that is actually where one of our hacks lives right now. As it is the last thing that happens and it always executes, it can be used to collect and accumulate statistics, or to emit binary metadata for groups of functions once it detects that a bunch of related functions have already been processed.

Another use for a “transformation” module pass is global allocation of resources and insertion of the handling code for those resources at the machine level. If we have multiple functions, then being able to do the allocation on all of them at once could potentially have an advantage over doing them one at a time without information about the others (because synchronization in that case needs to consider the worst-case scenario).
A MachineModulePass could be a natural candidate to do something like that.

Having something that can modify functions, blocks and instructions would be enough for these cases, but having something more powerful (adding MachineFunctions in the pipeline) could also be useful.

Marcello

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of John Criswell via llvm-dev
Sent: Friday, January 22, 2016 9:39 PM
To: Quentin Colombet <qcolombet@apple.com>; llvm-dev <llvm-dev@lists.llvm.org>; Ethan Johnson <ethanjohnson89@gmail.com>
Subject: Re: [llvm-dev] [GlobalISel][RFC] Thoughts on MachineModulePass

Ethan, can you add anything more specific on what would be on our wish list?

The main thing that comes to mind is that it would be useful to have access to call graph information at the machine code level. Since this is already being tracked at the IR level, a lot of that information could probably be "inherited" by a MachineModule during code generation.

In particular, taking advantage of LLVM's "inside knowledge" about the semantics of call instructions would be helpful in identifying the targets of indirect calls. Right now, the only way to determine the target of a machine-level call is to look at the instruction's operands, and if any of them refer to globals, try to dyn_cast them to a Function. This, of course, only works for direct calls. For indirect calls, the best we could do is try to use data-flow analysis to determine what's in the pointer being called, and attempt to match that to the known address of a MachineFunction. As I understand it (and please correct me if I'm wrong), the existing IR-level call graph analysis already "knows" where the function pointer came from (unless the code is calling a "wild" function pointer created through an unsafe cast, but that's another story).
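
As a rough illustration of the direct-call lookup described above, here is a self-contained toy model. `ToyGlobal`, `ToyOperand`, `ToyCall` and `directCallee` are hypothetical names for this sketch; real code would presumably walk the call's operands with `MachineOperand::isGlobal()` / `getGlobal()` and `dyn_cast<Function>` on the result, as described. The sketch only shows why that approach stops at direct calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a global value (hypothetical, not an LLVM type).
struct ToyGlobal {
  std::string Name;
  bool IsFunction; // false for a global variable
};

// Toy stand-in for a machine operand.
struct ToyOperand {
  enum Kind { Register, Immediate, Global } K;
  const ToyGlobal *GV; // meaningful only when K == Global
};

// Toy stand-in for a machine-level call instruction.
struct ToyCall {
  std::vector<ToyOperand> Operands;
};

// Returns the callee if some operand refers to a global that is a function,
// or nullptr otherwise. A nullptr result covers every indirect call: the
// target then lives in a register or in memory, and only data-flow analysis
// (or call graph information inherited from the IR) could recover it.
const ToyGlobal *directCallee(const ToyCall &Call) {
  for (const ToyOperand &Op : Call.Operands)
    if (Op.K == ToyOperand::Global && Op.GV && Op.GV->IsFunction)
      return Op.GV;
  return nullptr;
}
```

A call through a register yields `nullptr` here even if one of its other operands refers to a global variable, which is the gap that machine-level call graph information would fill.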

Ethan Johnson

This could be interesting from a GC perspective. The current RS4GC pass, which exists as an IR-to-IR transform, would (potentially) be nice to push back into the backend. It's inherently a module-level pass, so having an MI-layer equivalent could be useful. Note that I'm not saying "required" or even "definitely useful"; I suspect we can find a practical engineering compromise today by splitting RS4GC into two or more pieces that run at different times.

Also, just an FYI, we already have Module level analysis passes over MI, but only the immutable variety. One of those is used by the gc.root infrastructure. See GCModuleInfo.

In an LTO context with most functions internalized, aggressively changing
calling conventions in a fine-grained way would be interesting (or, in
other words, doing some amount of interprocedural register allocation).

-- Sean Silva

I think it would be useful to be able to modify the module after the machine instructions have been selected. I don't have any specific examples that would demonstrate the need for it, but being able to create functions/globals from the MI level may come in handy.

-Krzysztof