Extracting libmachine from libcodegen (bug 1121)

Hi,

One of the long-standing code clean-up bugs in Bugzilla is to extract
the Machine* code from the CodeGen library into a separate one, on
which CodeGen depends
(http://llvm.org/bugs/show_bug.cgi?id=1121).

I'd like to start working on this. The general approach I'm planning to take is:

1. Identify which code to move.
2. Eliminate all dependencies that the Machine code has on the CodeGen
code (details to be fleshed out on llvmdev as they are encountered).
3. Create a library project for the Machine code.
4. Move the code to the new library.

Do you have any concerns about the feasibility of this approach?

As to step 1, I haven't dug too deeply into the problem yet, but I'm
starting from the assumption that we'll be moving only the modules
that start with 'Machine' into the new library. These are:

- MachineBasicBlock
- MachineBlockFrequencyInfo
- MachineBlockPlacement
- MachineBranchProbabilityInfo
- MachineCodeEmitter
- MachineCopyPropagation
- MachineCSE
- MachineDominators
- MachineFunctionAnalysis
- MachineFunction
- MachineFunctionPass
- MachineFunctionPrinterPass
- MachineInstrBundle
- MachineInstr
- MachineLICM
- MachineLoopInfo
- MachineModuleInfo
- MachineModuleInfoImpls
- MachinePassRegistry
- MachinePostDominators
- MachineRegisterInfo
- MachineScheduler
- MachineSink
- MachineSSAUpdater
- MachineTraceMetrics
- MachineVerifier

Are there any files in this list that should not be moved? Any others
that should be added? Any suggestions on which of these modules would
be a good place to start?

One question of procedure... Back in 2010 and 2011, I had commit
rights on the LLVM svn repository because of work I was doing on
Clang. I'm not sure whether my account is still active, but if you'd
like me to re-qualify myself as a contributor by submitting patches to
llvm-commits instead of making changes directly to the repository,
just let me know.

-Ken

You should still be good to go, with commit access to all repos. Particularly for this sort of change, you should get buy-in from the community on the general approach, and have patches reviewed.

-Chris

Hi Ken,

This is awesome! It's a long-needed refactoring. One place I suggest you start is by reading, if you haven't already, the recent global instruction selection and MachineFunction serialization threads. This refactoring is a prerequisite to a lot of that, so there's some related discussion there you may find useful.

-Jim

I'd like to suggest something for #1: documenting what's already there. The
doxygen documentation is ok-ish, but there really needs to be real,
long-form documentation in our Sphinx docs (i.e. docs/) about what
functionality is there and especially what role it plays in the compiler
(there's probably more than one document's worth to write about). The
Sphinx quickstart template should have most of the info you need to get up
and running <http://llvm.org/docs/SphinxQuickstartTemplate.html>, but don't
hesitate to email llvmdev and CC me personally if you have any other
questions about the Sphinx docs.

AFAIK the only documentation we have about the Machine* code is the short
blurbs about a couple classes in
<http://llvm.org/docs/CodeGenerator.html#machine-code-description-classes>.

Especially the serialization work will benefit greatly from something akin
to a "MachineLangRef".

-- Sean Silva

I *think* the goal here is to minimize the amount of code that gets linked into certain machine-code level tools.

If that is the goal, then you only want the modules for Machine IR, and maybe some core analysis passes. The “Machine” modules you listed above include machine code analysis or transform passes that you probably don’t want. Pruning the list to basic IR support:

- MachineBasicBlock
- MachineBranchProbabilityInfo
- MachineCodeEmitter
- MachineDominators
- MachineFunctionAnalysis
- MachineFunction
- MachineFunctionPass
- MachineFunctionPrinterPass
- MachineInstrBundle
- MachineInstr
- MachineLoopInfo
- MachineModuleInfo
- MachineModuleInfoImpls
- MachinePassRegistry
- MachinePostDominators
- MachineRegisterInfo
- MachineSSAUpdater
- MachineVerifier

Note that Passes.h would need to be provided in include/llvm/Machine so that targets can configure their pass pipeline. The pass IDs are extern declared for use with TargetPassConfig, but are defined as references to the ID field inside the pass. Those IDs might all need to live inside libMachine if you do this. To get the layering right, there would need to be two libraries in each target, lib<target>Machine for the target description, and lib<target>Codegen for the passes--but we clearly don’t want this, hence the Pass ID workaround.

-Andy
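
For readers unfamiliar with the pass ID pattern mentioned above, here is a minimal, self-contained sketch of the idea: a shared header exposes only an `extern char &` reference, and the pass's own source file binds that reference to the pass's static `ID` member. Everything in the `sketch` namespace (`ExampleMachinePass`, `ExampleMachinePassID`, `refersToExamplePass`, `selfCheck`) is a hypothetical stand-in, not actual LLVM code, and the real Passes.h layout may differ.

```cpp
#include <cassert>

namespace sketch {

// What a shared header (a hypothetical include/llvm/Machine/Passes.h) might
// expose: an opaque reference, so targets never need the pass's class.
extern char &ExampleMachinePassID;

// What would stay private to the pass's own .cpp file (hypothetical pass).
struct ExampleMachinePass {
  static char ID; // The address of ID, not its value, identifies the pass.
};

char ExampleMachinePass::ID = 0;

// The exported reference is bound to the ID field inside the pass.
char &ExampleMachinePassID = ExampleMachinePass::ID;

// Stand-in for a pipeline-configuration hook that receives a pass by ID.
inline bool refersToExamplePass(const char &PassID) {
  return &PassID == &ExampleMachinePass::ID;
}

// Small self-check mirroring how a target would hand the ID around.
inline void selfCheck() {
  char Unrelated = 0;
  assert(refersToExamplePass(ExampleMachinePassID));
  assert(!refersToExamplePass(Unrelated));
}

} // namespace sketch
```

Because identity is the address of `ID`, whichever library defines `ExampleMachinePassID` must also be able to see the pass's `ID` field, which is why the definitions may all have to live in the same library as the declarations.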

Thanks for whittling the list down, Andrew.

One question that has been nagging at me since I started looking into
this is what to do about the dependency on libAnalysis? Some of these
classes (e.g. MachineFunctionPass and MachineLoopInfo) seem to have a
hard dependency on the Pass class in libAnalysis. Is it acceptable to
carry this dependency along to the new library? Or should we be aiming
to eliminate this dependency like we are with the one on libCodeGen?

Note that Passes.h would need to be provided in include/llvm/Machine so
that targets can configure their pass pipeline. The pass IDs are extern
declared for use with TargetPassConfig, but are defined as references to
the ID field inside the pass. Those IDs might all need to live inside
libMachine if you do this. To get the layering right, there would need to be
two libraries in each target, lib<target>Machine for the target description,
and lib<target>Codegen for the passes--but we clearly don’t want this,
hence the Pass ID workaround.

Thanks for pointing those out.

-Ken

Hi,
I implemented the Pass ID workaround by providing Passes.h in
include/llvm/Machine. The question holding me up is the one Ken Dyck
was kind enough to point out: what should be done about the dependency
on libAnalysis? The Machine passes depend not only on the Pass class
but also on other pieces such as ConstantFolding.h. Should we break this
dependency, or carry it over to the Machine library as well?

Regards,
Nitish B.

Ideally, libLLVMMachine should not depend on libLLVMAnalysis. I can understand if you need to do that as an intermediate step. ConstantFolding is not a good reason to do that though. What do people think about moving ConstantFolding into libLLVMCore? Makes sense to me.

-Andy
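
The layering goal discussed in this thread can be stated as a reachability check over library dependencies. The sketch below is only an illustration under stated assumptions: the library names are simplified, and the edges in `proposedLayering` are a hypothetical end state (Machine depending only on Core, with ConstantFolding assumed to have moved into Core), not the actual LLVM build graph.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Library name -> libraries it links against directly.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Returns true if `From` reaches `To` through direct or transitive dependencies.
inline bool dependsOn(const DepGraph &G, const std::string &From,
                      const std::string &To) {
  std::set<std::string> Seen;
  std::vector<std::string> Work{From};
  while (!Work.empty()) {
    std::string Cur = Work.back();
    Work.pop_back();
    if (!Seen.insert(Cur).second)
      continue; // Already visited.
    auto It = G.find(Cur);
    if (It == G.end())
      continue; // Unknown library: treat as having no dependencies.
    for (const std::string &Dep : It->second) {
      if (Dep == To)
        return true;
      Work.push_back(Dep);
    }
  }
  return false;
}

// Hypothetical target layering (assumed edges, for illustration only).
inline DepGraph proposedLayering() {
  return {
      {"CodeGen", {"Machine", "Analysis", "Core"}},
      {"Machine", {"Core"}},
      {"Analysis", {"Core"}},
      {"Core", {}},
  };
}
```

With this graph, `dependsOn(proposedLayering(), "Machine", "Analysis")` and `dependsOn(proposedLayering(), "Machine", "CodeGen")` are both false, while `dependsOn(proposedLayering(), "CodeGen", "Machine")` is true. An intermediate step that keeps the libAnalysis dependency would simply add `"Analysis"` to the `"Machine"` entry.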