Clang Plugin Parallel Compilation Data Output Consolidation

Dear all,

I am not sure whether this is the right mailing list for my question, but I hope you will correct me if it’s not.

I am developing a small Clang plugin that collects various data about a legacy codebase during compilation. Unfortunately, compiling the legacy code is slow, so the build runs with many parallel processes (anywhere from -j24 to -j144 or more).

My problem is as follows: say I want to enumerate all the functions encountered in the codebase. Say further that Clang process 1 is compiling a file that includes header 1.h, and process 2 is compiling a different file that also includes 1.h. The header defines an inline function.

When my plugin writes each visited function to an output file, I end up with the same entry twice: once written by process 1 and once by process 2. What’s the standard way to eliminate this sort of duplication? Is there Clang functionality I can use, or do I have to resort to shared memory or something similar? I can’t just keep a map of “functions already recorded”, because each process starts its own instance of the plugin.

So far, I have dealt with this by removing duplicate entries from the output file after the plugin has run. But the file now grows to multiple terabytes, and this method is no longer sustainable.

If you have any advice, or know a better place to ask, I would really appreciate it.

Thank you so much,