Circular library dependencies

While looking for ways to reduce LLDB’s link time, I wrote a small Python script last night to get a rough idea of what LLDB’s dependency graph looks like. A lot of this is CMake-specific, but the idea isn’t much different either way.

It does this by walking the source and include folders. For each folder it enters, it determines which .a the folder compiles to by finding the corresponding CMakeLists.txt and doing a rudimentary parse of it.

Then, for each .cpp and .h file it finds, it scans the file for #include statements, maps each included file to its location under the source tree, and uses that location to determine which .a the included file is a primary member of. Each .a found this way is added to the list of dependencies of the .a that contains the scanned file.
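The two steps above can be sketched in Python roughly as follows. This is not the original script. It assumes that each library directory declares its target with an add_lldb_library(<name> ...) call in its CMakeLists.txt, that internal headers are included as "lldb/<Dir>/<File>.h", and that a header under include/lldb/<Dir> belongs to the library built from source/<Dir>. Directories without a target of their own are folded into their parent's library.

```python
import os
import posixpath
import re
from collections import defaultdict

# Assumption: each library directory declares its target with
# add_lldb_library(<name> ...) in its CMakeLists.txt.
LIB_RE = re.compile(r"add_lldb_library\s*\(\s*([\w-]+)")
# Assumption: internal headers are included as "lldb/<Dir>/<File>.h".
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]lldb/(.+?)[>"]', re.MULTILINE)


def find_libraries(source_root):
    """Map each directory under source_root (a '/'-separated relative path,
    '.' for the root) to the library it compiles into, or None."""
    dir_to_lib = {}
    for dirpath, _dirnames, filenames in os.walk(source_root):  # top-down
        rel = os.path.relpath(dirpath, source_root).replace(os.sep, "/")
        lib = None
        if "CMakeLists.txt" in filenames:
            with open(os.path.join(dirpath, "CMakeLists.txt")) as f:
                match = LIB_RE.search(f.read())
            if match:
                lib = match.group(1)
        if lib is None and rel != ".":
            # No target declared here: assume the sub-directory is
            # compiled into its parent's target.
            lib = dir_to_lib.get(posixpath.dirname(rel) or ".")
        dir_to_lib[rel] = lib
    return dir_to_lib


def library_for_dir(rel_dir, dir_to_lib):
    """Nearest library declared at or above rel_dir."""
    while rel_dir not in ("", "."):
        if dir_to_lib.get(rel_dir):
            return dir_to_lib[rel_dir]
        rel_dir = posixpath.dirname(rel_dir)
    return dir_to_lib.get(".")


def add_edges(path, owner, dir_to_lib, graph):
    """Record that `owner` depends on every library whose headers `path` includes."""
    with open(path, errors="replace") as f:
        for include in INCLUDE_RE.findall(f.read()):
            dep = library_for_dir(posixpath.dirname(include), dir_to_lib)
            if dep and dep != owner:
                graph[owner].add(dep)


def build_dependency_graph(source_root, include_root):
    dir_to_lib = find_libraries(source_root)
    graph = defaultdict(set)
    # .cpp/.h files under source/<Dir> belong to that directory's library.
    for rel_dir, lib in dir_to_lib.items():
        if lib is None:
            continue
        abs_dir = os.path.join(source_root, rel_dir)
        for name in os.listdir(abs_dir):
            if name.endswith((".cpp", ".h")):
                add_edges(os.path.join(abs_dir, name), lib, dir_to_lib, graph)
    # Headers under include/lldb/<Dir> are attributed to source/<Dir>'s library.
    lldb_include = os.path.join(include_root, "lldb")
    for dirpath, _dirnames, filenames in os.walk(lldb_include):
        rel = os.path.relpath(dirpath, lldb_include).replace(os.sep, "/")
        owner = library_for_dir(rel, dir_to_lib)
        if owner is None:
            continue
        for name in filenames:
            if name.endswith(".h"):
                add_edges(os.path.join(dirpath, name), owner, dir_to_lib, graph)
    return graph
```

Calling build_dependency_graph("lldb/source", "lldb/include") returns a dict mapping each library name to the set of libraries whose headers it includes. An #include does not always imply a link-time dependency (the header may be declarations or inline code only), so a graph built this way overstates link dependencies somewhat.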

Basically what I found is that every folder more or less depends on every other folder, which is pretty unfortunate.
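In graph terms, "every folder depends on every other folder" means that most libraries end up in a single strongly connected component. Any library in that component transitively needs all the others. The following is a minimal sketch of finding those components with Tarjan's algorithm. It assumes the graph is a dict mapping each library name to the set of libraries it depends on. The example edges are hypothetical, not measured from LLDB.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm: return a list of SCCs, each a set of node names."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    nodes = set(graph)
    for deps in graph.values():
        nodes.update(deps)

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop all of its members.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return components


def find_cycles(graph):
    """Keep only the components that actually contain a dependency cycle."""
    return [c for c in strongly_connected_components(graph)
            if len(c) > 1 or any(n in graph.get(n, ()) for n in c)]


# Hypothetical example graph.
EXAMPLE = {
    "lldbBase": {"lldbCore", "lldbTarget"},
    "lldbCore": {"lldbBase", "lldbUtility"},
    "lldbTarget": {"lldbCore"},
    "lldbUtility": set(),
}
print(find_cycles(EXAMPLE))
```

Recursion depth is bounded by the number of libraries, which is small enough here. A much larger graph would call for an iterative version.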

I have some ideas for improvements, but I want to see what people think first. Aside from the obvious benefit of making the code better layered and more separable, I think it would also reduce link times quite a bit. And there are lots of cases, such as lldb-server or unit tests, where we want to link in as little as possible, as opposed to the monolithic LLDB executable, which wants to link in pretty much everything.

Thoughts?

On Mac OS X the linking process is quick; I am not sure how fast linking is on other platforms.

I really want the code organized by functionality not optimized for linking. Everything in LLDB links against just about everything else and there should be no limits imposed on what can link with what.

Linking static libraries is only for tools that want to use the dangerous internals of LLDB. Really most tools should link against our public API. I know that lldb-server is one of the tools that needs the internals and it should stay that way.

I would rather not see any changes or reorganization made to optimize linking unless they make a lot of sense organizationally, and I think we have decent organization in our source tree.

Greg

I agree that it should be organized by functionality, but I think it’s possible to organize it by functionality in a way that better linking falls out naturally as a consequence. Also I’m not really talking about a major restructuring of every project, but just hitting a few key points. A good example is how source/lldb has 2 random cpp files dropped in, lldb-log.cpp and lldb.cpp. The way the CMake build works is that we treat all source files in a particular directory (and sometimes in sub-directories) as belonging to the same target, and each target is compiled to a .a file (or .lib file on Windows). So source/lldb.cpp and source/lldb-log.cpp get put into something I’ve called lldbBase.a / .lib. Now, since everything in the codebase depends on logging, we have to link everything against lldbBase, which also means lldb.o. And lldb.o depends on pretty much everything in the entire project.

The way this came up in context is that our unit test runner builds multiple executables, one for each component being tested. This makes the link time scale really horribly, because it’s having to look at every .lib in all of LLDB just to link against one class.

In any case, the organization also doesn’t make much sense. You can’t use either source/lldb-log.cpp or source/Core/Log.cpp without the other, so it seems to me they should both be in source/Core. And lldb.cpp is kind of its own thing that is only necessary for global LLDB initialization, so it could be off by itself in something like source/Initialize.
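The effect of the proposed move can be checked on a toy version of the graph. The edges below are hypothetical and heavily simplified. lldb.o is modeled as depending on every subsystem because it performs global initialization, and Log.cpp needs lldb-log.cpp. lldbInitialize and lldbPlugins are assumed target names used only for illustration. link_closure computes everything a target drags in transitively.

```python
def link_closure(graph, target):
    """Every library that must be linked to satisfy `target` (transitive deps)."""
    seen = set()
    pending = [target]
    while pending:
        lib = pending.pop()
        if lib not in seen:
            seen.add(lib)
            pending.extend(graph.get(lib, ()))
    return seen


# Current layout: lldb.cpp and lldb-log.cpp share lldbBase.
BEFORE = {
    "lldbBase": {"lldbCore", "lldbTarget", "lldbPlugins"},  # via lldb.o
    "lldbCore": {"lldbBase"},  # Log.cpp needs lldb-log.cpp
    "lldbTarget": {"lldbCore"},
    "lldbPlugins": {"lldbTarget"},
}

# Proposed layout: lldb-log.cpp moves into Core, lldb.cpp into its own target.
AFTER = {
    "lldbInitialize": {"lldbCore", "lldbTarget", "lldbPlugins"},  # lldb.o
    "lldbCore": set(),  # Log.cpp + lldb-log.cpp
    "lldbTarget": {"lldbCore"},
    "lldbPlugins": {"lldbTarget"},
}

for name, graph in (("before", BEFORE), ("after", AFTER)):
    print(name, sorted(link_closure(graph, "lldbCore")))
# before ['lldbBase', 'lldbCore', 'lldbPlugins', 'lldbTarget']
# after ['lldbCore']
```

In this toy model, a unit test that only exercises Log goes from linking every library to linking one. The monolithic executable, which links lldbInitialize, still gets everything it needs.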

So that’s one example of something I wanted to fix. So to sum up, I agree we shouldn’t do anything that’s purely an optimization and has no benefit to code organization, but I think there’s some places where the organization could improve and the rest would happen naturally.

> I agree that it should be organized by functionality, but I think it's possible to organize it by functionality in a way that better linking falls out naturally as a consequence. Also I'm not really talking about a major restructuring of every project, but just hitting a few key points. A good example is how source/lldb has 2 random cpp files dropped in, lldb-log.cpp and lldb.cpp. The way the CMake build works is that we treat all source files in a particular directory (and sometimes in sub-directories) as belonging to the same target, and each target is compiled to a .a file (or .lib file on Windows). So source/lldb.cpp and source/lldb-log.cpp get put into something I've called lldbBase.a / .lib. Now, since *everything* in the codebase depends on logging, we have to link everything against lldbBase, which also means lldb.o. And lldb.o depends on pretty much everything in the entire project.

Yes, this kind of change to better serve make/CMake is fine.

> The way this came up in context is that our unit test runner builds multiple executables, one for each component being tested. This makes the link time scale really horribly, because it's having to look at every .lib in all of LLDB just to link against one class.

You could make a DLL that contains all of the LLDB internals so that you don't have to relink stuff over and over again.

> In any case, the organization also doesn't make much sense. You can't use either source/lldb-log.cpp or source/Core/Log.cpp without the other, so it seems to me they should both be in source/Core.

Again, this kind of change is fine.

> And lldb.cpp is kind of its own thing that is only necessary for global LLDB initialization, so it could be off by itself in like source/Initialize or something.

Fine with that too.

> So that's one example of something I wanted to fix. So to sum up, I agree we shouldn't do anything that's purely an optimization and has no benefit to code organization, but I think there's some places where the organization could improve and the rest would happen naturally.

I agree. But I would try to fix your test runner to build against an lldb-internal.dll so you don't have to statically link tons of binaries over and over and pay a large cost each time. Just make sure we don't expose lldb-internal.dll to the outside world. We don't want people shipping tools that link against our internal C++ APIs, but it is quite OK for internal tests.

Greg

Sadly, DLLs don’t work quite the same way on Windows as they do on other platforms. On Windows a symbol is not exported unless you explicitly tag it with a compiler attribute. So using a class this way from a test would require exporting it and everything exposed by its public interface, and tagging every function and class this way would quickly grow out of control. See the class definitions in the public API for examples of where we do this.

Another option is to make one unit test executable so at least we don’t have to pay for the link multiple times. That seems like the best compromise if we can’t figure out a better idea.

> Sadly, DLLs don't work quite the same way on Windows as they do on other platforms. On Windows a symbol is not exported unless you explicitly tag it with a compiler attribute. So using a class this way from a test would require exporting it and everything exposed by its public interface, and tagging every function and class this way would quickly grow out of control. See the class definitions in the public API for examples of where we do this.

Yeah, we don't want to pollute all private classes with preprocessor macros that allow them to be exported under certain circumstances, as we don't want people seeing this and thinking it is OK to export anything from the private layer.

> Another option is to make one unit test executable so at least we don't have to pay for the link multiple times. That seems like the best compromise if we can't figure out a better idea.

Agreed.