global control flow graph at machine code level

Hi all,
I am trying to build a global control flow graph at machine code level. Essentially, I need the handles to the MachineFunction’s corresponding to every call site inside a MachineFunction in order to get the handles to MachineBasicBlock’s with return statements inside the callee. Currently, the codegen module processes one MachineFunction at a time and hence I can’t find a way to gather information about other MachineFunction’s and their MachineBasicBlock’s.

It seems like I may have to modify the way MachineFunction is instantiated in MachineFunctionAnalysis. Instead of doing it per Function, it may have to be done for the entire Module by instantiating MachineFunction objects for every Function inside the Module. This might require major changes to the PassManager framework as well. Is there some work in this direction and code that someone can share? Or an alternative solution?

Hi Abhishek,

It seems like I may have to modify the way MachineFunction is instantiated in MachineFunctionAnalysis. Instead of doing it per Function, it may have to be done for the entire Module by instantiating MachineFunction objects for every Function inside the Module. This might require major changes to the PassManager framework as well. Is there some work in this direction and code that someone can share? Or an alternative solution?

yes, the MachineFunctionAnalysis creates the MachineFunctions. unfortunately
a MachineFunction is destroyed along the MachineFunctionAnalysis that created
it. this happens for instance when you schedule a module pass (where you could
operate on the global control-flow) somewhere during code generation.

A workaround could be to modify the MachineFunctionAnalysis such that it
stores the MachineFunction in a look-up table inside the MachineModuleInfo
instead of destroying it before your module pass is run. once your module
pass is finished, a new MachineFunctionAnalysis is scheduled by the pass
manager. Now, instead of creating a new function, you could check the look-up
table of the MachineModuleInfo to get the original MachineFunction.

it does not appear to be a very complicated change, but there might be some
dirty details that could make this approach hard to implement, e.g., information
stored with the MachineFunctionAnalysis itself. you could move this information
to a new class and see the MachineFunctionAnalysis as a wrapper to this class.

a nice property of this solution is that code generation still proceeds on a
per-function-basis, unless you explicitly insert a module pass.

another problem is, that there is little (or no) support to construct the
global control-flow from the machine code. for instance, the call graph is
based on LLVM-IR. depending on the target architecture, you might have call
sites in the machine code that were not visible in the LLVM-IR, e.g., when you
implement floating point operations using library calls. so there might be some
extra work needed to get this infrastructure too.

I have not yet worked on this yet, but I plan to implement something like this
at some point sooner or later. if you find another (better) solution let me
know.

best,
Florian

Florian Brandner <flbr <at> imm.dtu.dk> writes:

Hi Abhishek,
> It seems like I may have to modify the way MachineFunction is instantiated

in MachineFunctionAnalysis.

Instead of doing it per Function, it may have to be done for the entire Module

by instantiating

MachineFunction objects for every Function inside the Module. This might

require major changes to the

PassManager framework as well. Is there some work in this direction and code

that someone can share? Or an

alternative solution?

yes, the MachineFunctionAnalysis creates the MachineFunctions. unfortunately
a MachineFunction is destroyed along the MachineFunctionAnalysis that created
it. this happens for instance when you schedule a module pass (where you could
operate on the global control-flow) somewhere during code generation.

A workaround could be to modify the MachineFunctionAnalysis such that it
stores the MachineFunction in a look-up table inside the MachineModuleInfo
instead of destroying it before your module pass is run. once your module
pass is finished, a new MachineFunctionAnalysis is scheduled by the pass
manager. Now, instead of creating a new function, you could check the look-up
table of the MachineModuleInfo to get the original MachineFunction.

it does not appear to be a very complicated change, but there might be some
dirty details that could make this approach hard to implement, e.g.,

information

stored with the MachineFunctionAnalysis itself. you could move this

information

to a new class and see the MachineFunctionAnalysis as a wrapper to this class.

a nice property of this solution is that code generation still proceeds on a
per-function-basis, unless you explicitly insert a module pass.

another problem is, that there is little (or no) support to construct the
global control-flow from the machine code. for instance, the call graph is
based on LLVM-IR. depending on the target architecture, you might have call
sites in the machine code that were not visible in the LLVM-IR, e.g., when you
implement floating point operations using library calls. so there might be

some

extra work needed to get this infrastructure too.

I have not yet worked on this yet, but I plan to implement something like this
at some point sooner or later. if you find another (better) solution let me
know.

I implemented a workaround solution for now but I think eventually I might go
with your solution. My solution generated the global control flow graph (GCFG)
in 2 passes (llc needs to be run twice). In the first pass, I printed the
necessary information to a file, like the Machine Function (MF) Name, it's ID
number, the IDs of MBBs inside the MF, the Call Sites inside the MF in the order
in which they are called from the respective MBBs, the predecessors and
successors of MBBs and whether a MBB has a return statement.
   In the second pass which should be immediately after the 1st pass in the
CodeGen flow (otherwise MFs and MBBs can get updated by the CodeGen flow in
between), I read back the file. This again is done in 2 passes. The first pass
parses the file and populates the skeleton of GCFG data structure with the MF
Nodes and MBB nodes inside those MF nodes (note that each Call Site in an MBB
splits the MBB into two). Here the nodes are identified by IDs and a DenseMap is
maintained to establish the MF name to ID correlation. The second pass parses
the file again and attaches the predecessor, and successor links to the MBB
nodes.

Hi Abhishek,

I finally got some time to try the approach I proposed in July in our compiler
(see https://github.com/t-crest/patmos-llvm). I extended the MachineModuleInfo
and MachineFunctionAnalysis machinery to support module passes at the
codegen-level. So far, it is tested using a simple pass that constructs the call
graph at the machine-level. It appears to work, i.e., the generated .s files
are identical and all our tests run correctly (i.e., MiBench).

I tried to avoid touching the LLVM core-infrastructure as much as possible, so
this patch is intended as a starting point for those who would like to do
something similar (or for a discussion on how to integrate this properly into
mainline). The attached patch contains the LLVM-specific bits only.

Best,
Florian

codegen-modulepasses.diff (4.31 KB)