[PATCH] Refactoring the DFA generator

Hi,

I've refactored the DFA generator in TableGen because it takes too much time to build the table of our BE and I'd like to share it.
We have 15 functional units and 13 different itineraries which, in the worst case, can produce 13! states. Fortunately, many of those states are reused :slight_smile: but it still takes up to 11min to build the entire table. This patch reduces the build time to 5min, giving a speed-up factor greater than 2.

It contains small changes:
- Transitions are stored in a set for quicker searches
- canAddInsnClass() API is split in two API's:
   - canAddInsnClass() which perform a quick verification about the possibility of having new states for a given InsnClass
   - AddInsnClass() performs the actual computation of possible states.

I've regenerated the DFA table of Hexagon and all seems to be ok.

What do you think about these changes ?

Ivan

DFAPacketizerEmitter.speedup.patch (4.99 KB)

Thank you for using a deterministic sort order for the std::set :slight_smile:

One nit: please make as much as possible private to the translation unit. Run

nm DFAPacketizerEmitter.cpp.o | awk ‘$2 == “T”’ | c++filt

to ensure that only the “emit” function is being exported.

I can’t comment on the validity of this optimization since I’m not familiar with the algorithm that this code uses.

–Sean Silva

Ivan,

Thanks for working on the DFA generator. I'll take a look at the changes in detail but from your description, I like the general nature of the modifications.

-Anshu

Hi Anshu, Sean,

Thanks for your quick feedbacks!

Sean, I ran your command and I had the following output:

$nm DFAPacketizerEmitter.o | awk ‘$2 == “T”’ | c++filt
0000000000000000 T llvm::EmitDFAPacketizer(llvm::RecordKeeper&, llvm::raw_ostream&)

which I think it’s correct.

Ivan

Hi Ivan,

The patch looks good to me. I have a couple of minor comments:

+void State::AddInsnClass(unsigned InsnClass,
Add a top level comment describing the function

+ std::map<State*, std::set<Transition*, ltTransition>, ltState> stateTransitions;
You should be able to use SmallSet here. Also, this line exceeds 80 columns.

On a related note, is the CachedTable mechanism in DFAPacketizer.h useful for your architecture? Currently the DFA generator generates one table for a given architecture. I had originally added the CachedTable mechanism since for a given compilation and subtarget, the DFA only uses the subset of the states and transitions. Using a "cache" made sense. At one point, I'd like to change the code so that it can generate one DFA for every subtarget and remove the CachedTable mechanism. Given the size of the DFA for your architecture, however, it may make sense to keep it around even if it generates separate tables for each subtarget.

Thanks
-Anshu

Hi Anshu,

Thanks for reviewing this. I added a top comment for AddInsnClass and I fixed the violation of column numbers.

I tried but SmallSet is not iterable. SmallSetPtr could be useful here but it doesn’t allow custom sorting.

I liked the cachedtable idea but I can’t tell you if it’s really useful in our case, I didn’t make any performance tests in that regard.
IMO, it wil be nice to keep it alive for performance comparisons. Given the overall performance is rather determined by transition searches on the current state, for small DFA tables may not be a win and it may still be the case for greater ones. We have a huge number of states but the number of distinct itineraries, or maximum possible transitions, remains quite small (11, it wasn’t 13, my mistake).

Ivan

DFAPacketizerEmitter.speedup.patch (5.08 KB)

Thanks for reviewing this. I added a top comment for AddInsnClass and

I fixed the violation of column numbers.

Great. Looks good to me.

> IMO, it wil be nice to keep it alive for performance comparisons. Given the overall performance
> is rather determined by transition searches on the current state, for small DFA tables may not be a win
> and it may still be the case for greater ones.

I agree; let's keep it around for now.

-Anshu

Hi Anshu,

Just in case you have forgotten this thread ;-). Is this patch ok to commit or does it not apply to trunk properly ?
I can fix it if that’s the problem.

Ivan

Hi Ivan,

Sorry, I should have been more explicit in my last email. The patch looks
good to me. Please check that it applies on trunk and go ahead and commit.

Thanks
-Anshu

Hi Anshu,

I don't have commit access. It applies correctly on trunk, I've just checked it. Could you please commit it?

Ivan

I don't have commit access. It applies correctly on trunk,

> I've just checked it. Could you please commit it?

Ivan: Sure, I'll commit the patch today.

-Anshu

Committed in r159281.

-Anshu

Thanks Anshu!

I've recently added more functional units to our Schedule.td and the generation time became painfully long. In fact, the main problem was in writeTableAndAPI(). I propose another patch to fix it:
- Fixed memory leaks.
- Transitions are attached to states.

I've regenerated the DFA table of Hexagon and everything is ok.
Please review it.

Thanks,
Ivan

DFAPacketizerEmitter.cpp.patch2 (5.18 KB)

I missed last 2 commits made by Alexey. Following his advices, I updated the patch. It should be ok now.

Ivan

DFAPacketizerEmitter.cpp.patch2 (5.24 KB)

Ivan,

Unfortunately I won't be able to review the new patch in the next few days but I plan on taking a look at it next week.

-Anshu

Hi Ivan,

I missed last 2 commits made by Alexey. Following his advices, I updated the patch. It should be ok now.
Thanks Anshu!

I've recently added more functional units to our Schedule.td and the generation time became painfully long. In fact, the main problem was in writeTableAndAPI(). I propose another patch to fix it:
- Fixed memory leaks.
- Transitions are attached to states.

I've regenerated the DFA table of Hexagon and everything is ok.
Please review it.

I had a few comments about the design change that you're introducing with the patch:

This patch adds multiple points of control for adding a transition: there's now an addTransition() for DFA and another addTransition() for State that populate different data structures. We shouldn't need both. I am okay with transitions being folded into the State class if it results in significant speedup in DFA generation. But then please remove the Transition mechanism for the DFA class (stateTransitions).

For transitions being folded into State:
> std::map<unsigned, Transition *> inputToTrans;

We don't need a map to Transition* here; we can directly map from input to stateNum. You should be able to use a LLVM data structure.

-Anshu

Hi Anshu,

Thanks again for your feedbacks.

Hi Ivan,

I missed last 2 commits made by Alexey. Following his advices, I
updated the patch. It should be ok now.
Thanks Anshu!

I've recently added more functional units to our Schedule.td and the
generation time became painfully long. In fact, the main problem was
in writeTableAndAPI(). I propose another patch to fix it:
- Fixed memory leaks.
- Transitions are attached to states.

I've regenerated the DFA table of Hexagon and everything is ok.
Please review it.

I had a few comments about the design change that you're introducing
with the patch:

This patch adds multiple points of control for adding a transition:
there's now an addTransition() for DFA and another addTransition() for
State that populate different data structures. We shouldn't need both. I
am okay with transitions being folded into the State class if it results
in significant speedup in DFA generation.

Yes, it gives a significant speedup to the emitter. My main concern is to address this:

357: for (unsigned j = 0; j <= LargestInput; ++j) {

LargestInput becomes too large. The more resources we introduce in the td file the larger it will be. Given an architecture with n resources, LargestInput will take the maximum value, i.e. 2^(n-1), but valid inputs are just a small subset of [0, LargestInput].

But then please remove the

Transition mechanism for the DFA class (stateTransitions).

Done!

For transitions being folded into State:
> std::map<unsigned, Transition *> inputToTrans;

We don't need a map to Transition* here; we can directly map from input
to stateNum. You should be able to use a LLVM data structure.

In that case,
(1) Should I completely remove Transition and create a map structure in State (input, state) to replace them?
(2) Or are you proposing to go though DFA in order look for valid transitions?

After doing some cleanup to match the new design, these are the main changes:
- Transition folded in states.
Each state keeps a set of transitions.
- Each transition contains the input to match in order to take it and the destination state 'To'.
- Factoring up redundant information.
Transitions doesn't contains 'From' state anymore because they are folded into it.
- Using STLExtras functions.
- Removed old and unused API's (e.g. addTransition in DFA)

The new patch is attached but if you prefer (2) I can remake it.

I failed to use SmallSet to store the transitions in State because I needed to iterate on it. What kind of llvm structure may I use?

Ivan

DFAPacketizerEmitter.new.patch (8.72 KB)

Hi Anshu,

Thanks again for your feedbacks.

Hi Ivan,

I missed last 2 commits made by Alexey. Following his advices, I
updated the patch. It should be ok now.
Thanks Anshu!

I've recently added more functional units to our Schedule.td and the
generation time became painfully long. In fact, the main problem was
in writeTableAndAPI(). I propose another patch to fix it:
- Fixed memory leaks.
- Transitions are attached to states.

I've regenerated the DFA table of Hexagon and everything is ok.
Please review it.

I had a few comments about the design change that you're introducing
with the patch:

This patch adds multiple points of control for adding a transition:
there's now an addTransition() for DFA and another addTransition() for
State that populate different data structures. We shouldn't need both. I
am okay with transitions being folded into the State class if it results
in significant speedup in DFA generation.

Yes, it gives a significant speedup to the emitter. My main concern is
to address this:

357: for (unsigned j = 0; j <= LargestInput; ++j) {

LargestInput becomes too large. The more resources we introduce in the
td file the larger it will be. Given an architecture with n resources,
LargestInput will take the maximum value, i.e. 2^(n-1), but valid inputs
are just a small subset of [0, LargestInput].

ERRATA: LargestInput will take some value in [2^(n-1), 2^n-1]. That's depend on the resource combinations. Then it may be even larger! :slight_smile:

Ivan,

Thanks for working on the patch. It looks good to me except for the removal of the Transition class:

> (1) Should I completely remove Transition and create a map structure in State (input, state) to replace them?

Yes, please remove the Transition class and create a map structure in State instead of TransitionSet.

Thanks
-Anshu

Ivan,

Thanks for working on the patch. It looks good to me except for the
removal of the Transition class:

> (1) Should I completely remove Transition and create a map structure
in State (input, state) to replace them?

Yes, please remove the Transition class and create a map structure in
State instead of TransitionSet.

Sure. Please, take a look at the new patch.

Ivan

DFAPacketizerEmitter.new.patch (8.2 KB)