Layering Requirements in the LLVM Coding Style Guide

Context: I’ve been experimenting with using Modular Code Generation (my talk at last year’s LLVM dev meeting, https://www.youtube.com/watch?v=lYYxDXgbUZ0, is about the best reference at the moment) when building the LLVM project, as a good test of the feature. This can/does enforce a stronger layering invariant than LLVM has historically enforced. So I’m curious to get buy-in and maybe document this if it’s something people like the idea of.

I’m starting this discussion here rather than in an actual code review on llvm-commits since it seems like it could do with a bit of a wider discussion, but once/if the general direction is agreed on, I’ll send a patch for review of specific wording for the LLVM Coding Standards.

Currently the LLVM Coding Standards don’t say much, if anything, about layering. The ‘A Public Header File is a Module’ section talks about modules of functionality, mostly trying to describe why a header file should be self-contained - but it uses anachronistic language about modules that doesn’t line up with the implicit or explicit modules concepts in use today, I think.

I propose making this wording a bit more explicit, including:

  1. Headers should be standalone (include all their dependencies - this is mentioned in the “is a Module” piece, by way of a technique to help ensure this, but not explicit as a goal itself).

  2. Files intended to be included in a particular context (that aren’t safe/benign to include multiple times, in multiple .cpp files, etc) should use a ‘.inc’ or ‘.def’ (.def specifically for those “define a macro, include the header which will reference that macro” style setups we have in a few places).
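For readers who haven't met the “define a macro, include the file which references that macro” setup from point 2, here is a minimal sketch. Everything in it is hypothetical (the shape list, `HYPOTHETICAL_SHAPES`, `shapeName`, `cornerCount`); to keep it to a single file, a list macro stands in for what would normally be the contents of a separate `.def` file.

```cpp
#include <string>

// Stand-in for the body of a hypothetical "Shapes.def". In a real tree the
// entries would live in their own file as bare SHAPE(Name, Corners) lines,
// and each use site would define SHAPE, #include "Shapes.def", then #undef it.
// Such a file is only meaningful in that context, hence not a normal header.
#define HYPOTHETICAL_SHAPES(X)                                                 \
  X(Circle, 0)                                                                 \
  X(Square, 4)                                                                 \
  X(Triangle, 3)

// Use 1: expand the list into enumerators.
enum class ShapeKind {
#define SHAPE_ENUM(Name, Corners) Name,
  HYPOTHETICAL_SHAPES(SHAPE_ENUM)
#undef SHAPE_ENUM
};

// Use 2: expand the same list into a name lookup.
inline std::string shapeName(ShapeKind K) {
  switch (K) {
#define SHAPE_CASE(Name, Corners)                                              \
  case ShapeKind::Name:                                                        \
    return #Name;
    HYPOTHETICAL_SHAPES(SHAPE_CASE)
#undef SHAPE_CASE
  }
  return "unknown";
}

// Use 3: expand the same list into a per-kind property.
inline int cornerCount(ShapeKind K) {
  switch (K) {
#define SHAPE_CASE(Name, Corners)                                              \
  case ShapeKind::Name:                                                        \
    return Corners;
    HYPOTHETICAL_SHAPES(SHAPE_CASE)
#undef SHAPE_CASE
  }
  return -1;
}
```

The point of the sketch is only that the list is expanded several times with different macro definitions, which is why such a file isn't safe to treat as a standalone header.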

And the actual layering issue:

  3. Each library should only include headers or otherwise reference entities from libraries it depends on, including in headers and inline functions. A simple/explicit way to put this: every inline function should be able to be moved into a .cpp file and the build (with a unix linker - one that cannot handle circular library dependencies) should still succeed.
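As a rough illustration of what checking the rule in point 3 could look like, here is a small sketch. It is not an existing LLVM tool: `DepGraph`, `mayReference` and `findLayeringViolations` are made-up names, the library names used with it are only examples, and it assumes that transitive dependencies count as “libraries it depends on”.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: each library maps to the libraries it declares as
// direct dependencies (roughly what a build file would list).
using DepGraph = std::map<std::string, std::set<std::string>>;

// A reference from one library to another, e.g. an #include or a call made
// from an inline function in a header. first = from, second = to.
using Reference = std::pair<std::string, std::string>;

// Returns true if From may reference To: they are the same library, or To is
// reachable from From through declared dependencies (assumption: transitive
// dependencies are allowed).
inline bool mayReference(const DepGraph &Deps, const std::string &From,
                         const std::string &To) {
  if (From == To)
    return true;
  std::set<std::string> Seen{From};
  std::vector<std::string> Worklist{From};
  while (!Worklist.empty()) {
    std::string Lib = Worklist.back();
    Worklist.pop_back();
    auto It = Deps.find(Lib);
    if (It == Deps.end())
      continue;
    for (const std::string &Dep : It->second) {
      if (Dep == To)
        return true;
      if (Seen.insert(Dep).second)
        Worklist.push_back(Dep);
    }
  }
  return false;
}

// Collects every reference that points at a library the referencing library
// does not depend on.
inline std::vector<Reference>
findLayeringViolations(const DepGraph &Deps,
                       const std::vector<Reference> &Refs) {
  std::vector<Reference> Violations;
  for (const Reference &R : Refs)
    if (!mayReference(Deps, R.first, R.second))
      Violations.push_back(R);
  return Violations;
}
```

With a graph where Core depends on Support and X86 depends on Core, a reference from X86 to Support passes, while a reference from Support to X86 is reported, whether or not it happens to sit in an inline function.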

This last point is the most interesting - and I hope one that people generally find desirable, so it might not be immediately obvious why it may be contentious or difficult:

LLVM violates this constraint by using inline functions in headers to avoid certain layering constraints that might otherwise cause the build to fail. A couple of major examples I’ve hit are:

TargetSelect.h and similar: This one’s especially tricky - the header is part of libSupport, but each function in here depends on a different subset of targets (creating a circular dependency) - to call the given function the programmer needs to choose the right dependencies to link to or the program will not link.
Clang Diagnostics (work in progress): The diagnostics for each component are in their own component directories, but are then all included from libClangBasic, a library none of those components depends on. (so this isn’t so much an inlining case as #include based circular dependency)

Generally I’d like to get buy-in that stricter layering is desirable, and that these few cases are at least sub-optimal, if accepted for now.

Happy to go into more details about any of this, examples, etc, but I realize this is already a bit long.

  • Dave

I have found layering to be a particularly useful and beneficial model in past large software projects.

Is LLVM’s layering actually written down anywhere? Last time I went looking, there was nothing. If there’s no spec, there’s no verifiable conformance; you have to guess based on what other files do.

–paulr

Fair point - Google’s build system is pretty specific about this & so we’ve got it codified there, and the open source build system has to know some of this to get the link order right - otherwise LLVM programs couldn’t successfully link (if the libraries weren’t placed in the right order on the link command)

I think the LLVMBuild.txt files contain the library dependency lists for the CMake build.

  • Dave

I would describe it from this angle: LLVM is layered just fine. Usually the layering is enforced as we don't link all libraries to all targets and you will notice missing symbols if you violate it. It just happens that you can violate the layering with header-only implementations of features, which are not caught this way, and sure enough we have a handful of cases that violate the layering this way, as David nicely explained here.

I don't think there is a reason not to fix those layering violations. We just need a plan on how to fix them.

- Matthias

I would describe it from this angle: LLVM is layered just fine.

Yeah, in most cases/in general I agree.

Usually the layering is enforced as we don’t link all libraries to all targets and you will notice missing symbols if you violate it.

Actually even more than that - on unix, linkers don’t resolve circular dependencies (they start with a list of unresolved symbols and walk the link line left to right - resolving any symbols they can iteratively while looking at one library, then moving on to the next - never going back to an earlier library to resolve a later dependency), this enforces quite a lot of the layering constraints we usually think about. Google’s build system enforces things a bit more strictly, I think (even without modular codegen) & that’s generally been a good/accepted thing - though I don’t have any precise examples to point to, unfortunately.
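The left-to-right behaviour described above can be sketched as a toy model. This is a deliberate simplification, not how any particular linker is implemented: `ObjectFile`, `Library` and `linkLeftToRight` are hypothetical names, and symbols are just strings.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model of one object file inside a static library.
struct ObjectFile {
  std::set<std::string> Defines; // symbols this object provides
  std::set<std::string> Needs;   // symbols this object references
};

using Library = std::vector<ObjectFile>;

// Walks the link line left to right. Within one library, objects are pulled
// in repeatedly for as long as they define a currently undefined symbol; once
// the walk moves on, earlier libraries are never revisited. Returns the
// symbols that are still undefined at the end (empty means the link works).
inline std::set<std::string>
linkLeftToRight(std::set<std::string> Undefined,
                const std::vector<Library> &LinkLine) {
  std::set<std::string> Defined;
  for (const Library &Lib : LinkLine) {
    std::vector<bool> Pulled(Lib.size(), false);
    bool Changed = true;
    while (Changed) {
      Changed = false;
      for (std::size_t I = 0; I < Lib.size(); ++I) {
        if (Pulled[I])
          continue;
        const ObjectFile &Obj = Lib[I];
        bool Wanted = false;
        for (const std::string &Sym : Obj.Defines)
          if (Undefined.count(Sym)) {
            Wanted = true;
            break;
          }
        if (!Wanted)
          continue;
        Pulled[I] = true;
        Changed = true;
        for (const std::string &Sym : Obj.Defines) {
          Undefined.erase(Sym);
          Defined.insert(Sym);
        }
        for (const std::string &Sym : Obj.Needs)
          if (!Defined.count(Sym))
            Undefined.insert(Sym);
      }
    }
  }
  return Undefined;
}
```

In this model, if library A needs a symbol from library B, the order "A B" resolves everything while "B A" leaves that symbol undefined. If A and B each need something from the other, neither order works, which is the sense in which such a linker enforces layering.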

It just happens that you can violate the layering with header-only implementations of features, which are not caught this way, and sure enough we have a handful of cases that violate the layering this way, as David nicely explained here.

I don’t think there is a reason not to fix those layering violations. We just need a plan on how to fix them.

Yep - my goal is just to enshrine that understanding in documentation so it’s a bit more of an explicit/clear goal going forward.

I may be able to fix some of the existing violations - but mostly I want to shore things up for future changes.

  • Dave

Everything up to here seems non-controversial. We should document this and ideally identify tooling suitable to enforce it.

I have no strong opinion on this topic. My experience has been that it’s often far harder to unwind these types of inline dependencies than it first seems and that the value in doing so is often unclear. I’m not opposed, but I’m also not signing up to help. :)

Unless your linker is LLD, in which case they do by default.

David

+1

cheers,
--renato

Thanks David for bringing that up.

FWIW, I think this is a totally reasonable approach and I am supportive of this.

I have no strong opinion on this topic. My experience has been that it’s often far harder to unwind these types of inline dependencies than it first seems and that the value in doing so is often unclear. I’m not opposed, but I’m also not signing up to help. :)

Oh, yeah - mostly I’m looking for community agreement (enough for me to change the Coding Standards and to push for adherence when these issues come up in future changes) about the general principle.

For existing violations - I’m not expecting people to sign up to help, and I’m not sure how many I’ll fix/get through before I get tired and just whitelist them in as “old quirky LLVM” with a note that if someone gets deep into any of that code for other reasons, they might want to keep in mind how these issues could be fixed while they’re there.

  • Dave

Looking at build-procedure files for link-order hints is technically “written down” but not really human-friendly and not at all what I had in mind. :)

I get that writing it down on a doc page will have the usual bit-rot problems, but if you want to tell developers (especially newer developers) “get the layering right” you really need to point to a place that says what the layering is. Maybe you were agreeing to do that, but I’m not sure.

–paulr

I wasn’t planning on writing it down any more than it’s already necessarily enshrined in the build system. I’m not expecting to yell at/complain to people who violate it unknowingly (Apple/Windows developers in general don’t get layering checked today because their linkers do resolve circular dependencies - but they get failures on Linux buildbots and fix them when they arise) - but rather to point out “oh, hey, maybe you didn’t notice but this introduces a layering violation - please fix it”.

I’m not sure what makes LLVMBuild.txt not human-friendly - they’re very terse text files, contain little other than a list of dependencies. They’re probably easier to read/maintain than CMakeLists.txt which have to be updated whenever a new source file is added. These lists have to be updated/maintained when new libraries or dependencies are introduced & yeah, for the most part we don’t have to think about them - and most changes won’t impact layering/violate layering constraints, but then rarely we will hit these things & check or update the layering, etc.

  • Dave

I can wait to see what you propose for actual coding-standard wording, as guidance for figuring out layering.

–paulr

Everything up to here seems non-controversial. We should document this and ideally identify tooling suitable to enforce it.

+1

I have no strong opinion on this topic. My experience has been that it’s often far harder to unwind these types of inline dependencies than it first seems and that the value in doing so is often unclear. I’m not opposed, but I’m also not signing up to help. :)

While I’m also not in a position to help a lot, I think there is a question we should ask here:

Should we hold new code to this standard? Should we declare that this is what we want?

For me, I say emphatically “yes” and we should put it into the coding standards. I think cleaning up the existing code is a good thing to do and we can let people who have a reason actually drive that, but I don’t want that to be necessarily finished in order for us to establish reasonable guidelines going forward.

Yep, that’s where I am too - I want it to be our standard going forward but, like naming conventions and other things, realize that not all existing code in the project will conform to this constraint.

  • Dave

Sent out the code review: https://reviews.llvm.org/D42771 & CC’d everyone who commented on this thread so they can easily follow along there.