Data structures for better hashing

Dear guys,

I am implementing Nuutila’s algorithm to find the strongly connected
components of the dependence graph of a program. The original algorithm is
linear in the number of edges in the graph. My implementation, however, is
not, because I am having problems with the data structures.

Nuutila uses an array to check whether a variable has been visited or
not; however, I am visiting llvm::Value objects, and I do not know how to
build a perfect hash for this data. I am using, instead, the following data
structures:

DenseMap<Value*, int> dfs;
SmallPtrSet<Value*, 2048> inComponent;

My target programs are large. For instance, gcc’s dependence graph
gives me >450,000 nodes. Which data structures would you recommend for
marking the visited variables?



Why? LLVM has fully generic graph algorithms that are quite efficient. Look at the SCC iterator and some of its users to see how this works. In essence, you provide a GraphTraits specialization for whatever data structure you have, and the SCC-building logic provides the rest.

The result of the SCC iterator is that you can iterate cleanly over all the SCCs in your graph, bottom up. At each SCC you get a vector of the nodes in that SCC.

If you have performance problems with these generic algorithms, then by all means let’s discuss it here, but I don’t think the solution should initially be to implement your own algorithm… But maybe I don’t understand what you’re actually trying to do. Is your use case not served by these generic libraries?