[PSA] Annotating LLVM Public Interface

Hi everyone! I’m a new contributor to LLVM, and I have been looking into building LLVM as a DLL (shared library) on Windows. To support this option, we are adding annotations to LLVM’s public headers to explicitly describe the set of symbols that should be visible externally.

Details

Code changes will primarily consist of annotating LLVM’s public symbols with the LLVM_ABI macro already defined in llvm/Support/Compiler.h. There are similar macros for annotating C++ template instantiations which are used in some less common situations. A portion of the codebase is already annotated.
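As a rough illustration of the template case, the sketch below pairs an explicit instantiation declaration (as it might appear in a header) with the matching explicit instantiation definition (as it might appear in one source file). `DEMO_TEMPLATE_ABI`, `DEMO_EXPORT_TEMPLATE`, `Box`, and `roundTrip` are hypothetical names invented for this example, not the macros from llvm/Support/Compiler.h. They expand to nothing here so the snippet compiles anywhere.

```cpp
#include <cassert>

// Hypothetical stand-ins for template annotation macros (assumption for this
// sketch). They are empty unless the build system defines them.
#ifndef DEMO_TEMPLATE_ABI
#define DEMO_TEMPLATE_ABI
#endif
#ifndef DEMO_EXPORT_TEMPLATE
#define DEMO_EXPORT_TEMPLATE
#endif

// --- would live in a public header ---
template <typename T> class Box {
public:
  explicit Box(T V) : Value(V) {}
  T get() const { return Value; }
  void set(T V) { Value = V; }

private:
  T Value;
};

// Explicit instantiation declaration: tells clients that Box<int> is
// instantiated elsewhere (in the library) rather than in their own code.
extern template class DEMO_TEMPLATE_ABI Box<int>;

// --- would live in exactly one library source file ---
// Explicit instantiation definition: emits the Box<int> members here.
template class DEMO_EXPORT_TEMPLATE Box<int>;

// Small helper that exercises the instantiated template.
inline int roundTrip(int V) {
  Box<int> B(V);
  assert(B.get() == V && "Box should return the value it was built with");
  return B.get();
}
```

Keeping the declaration and the definition on separate macros lets a build apply a different attribute on each side, which is the reason two macros are used in this sketch.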

Because the macros are inactive by default, adding them throughout the codebase is low-risk and can be done incrementally. Annotations will not become mandatory until the entire codebase has been annotated and there are CI jobs, documentation, and tools in place to catch regressions.

Generally, annotations will be added to individual symbols rather than to entire classes. This method is preferred for a couple of reasons:

  • It leads to exporting only the symbols that are truly needed. Limiting the exported surface area enables improved LTO and will help mitigate the risk of hitting Windows’ 64K DLL export limit.
  • It avoids issues that arise from exporting generated copy constructors and operators. While these issues can be solved by explicitly deleting compiler-generated methods, they will be difficult for unfamiliar engineers to diagnose and fix.
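A minimal sketch of what member-level annotation could look like is shown here. `DEMO_ABI` and `Counter` are hypothetical names made up for this example. The macro is empty unless the build defines it, which mirrors the "inactive by default" behavior described above.

```cpp
#include <cassert>

// Hypothetical stand-in for an export macro (assumption for this sketch).
#ifndef DEMO_ABI
#define DEMO_ABI
#endif

// --- would live in a public header ---
class Counter {
public:
  // Defined out of line in the library, so each one carries its own annotation.
  DEMO_ABI explicit Counter(int Start);
  DEMO_ABI int advance(int Step);

  // Defined in the header: clients compile it themselves, so no export needed.
  int value() const { return Value; }

private:
  // Only called from inside the library, so it is left unannotated.
  void checkStep(int Step) const;

  int Value;
};

// --- would live in the library's source file ---
Counter::Counter(int Start) : Value(Start) {}

void Counter::checkStep(int Step) const {
  (void)Step; // avoids an unused-parameter warning when asserts are disabled
  assert(Step > 0 && "Step must be positive");
}

int Counter::advance(int Step) {
  checkStep(Step);
  Value += Step;
  return Value;
}
```

Only two symbols are annotated here. The inline accessor and the private helper stay out of the export table, which is the trade-off the two bullets describe.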

The bulk of annotations will be added mechanically using the Interface Definition Scanner tool, which leverages clang’s AST and rewriter libraries.

Previous Efforts

This LLVM Discourse thread from 2021 covers the original proposal in detail and is still mostly relevant. Following that discussion, there was some initial work in 2023 which identified issues and proved out viability. This work resulted in this Discord discussion.

In 2024, the effort was resumed as part of a GSoC project to support clang plugins on Windows. This work is primarily tracked in this issue on GitHub. The project added build options to build LLVM as a DLL, introduced the macros to annotate LLVM’s public surface area, and annotated a portion of the codebase. The work to get LLVM fully building as a DLL is incomplete.

Maintainability

Most LLVM developers do not build on Windows locally, so they may not immediately catch breaks caused by missing symbol annotations. There are a number of things we will do to help identify issues earlier in the development cycle. Annotations will not be mandatory until these pieces are in place.

1. Documentation and Examples

The use cases for LLVM_ABI and related macros will be documented and discoverable. We will document, with examples, patterns and situations that may occur to make it easy for developers to address related issues that arise during development.

2. Windows LLVM DLL CI build job

Adding a Windows LLVM DLL build job to CI will catch unannotated symbols at link time. This job can run either pre- or post-merge. However, it will not catch unannotated LLVM symbols that are referenced only by projects that don’t get built.

We may also consider changing the default Windows build to LLVM DLL. This change would let all existing Windows build jobs catch missing export issues.

3. Approximate DLL export behavior on other shared-library builds

We can achieve similar behavior to Windows DLL exports in other environments by building ELF and Mach-O shared libraries with default hidden symbol visibility. This result is achieved by setting -fvisibility=hidden and re-defining the LLVM_ABI annotation to __attribute__((__visibility__("default"))). The existing annotations in llvm/Support/Compiler.h already behave this way when configured for a non-Windows shared library build.

This mechanism will produce similar behavior to the Windows DLL build and could catch most issues without building for Windows. However, since most developers are using static library builds locally, this change won’t necessarily result in catching missing annotations earlier.
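The hidden-by-default approach can be sketched roughly as follows. `DEMO_ABI`, `demoPublicBrightness`, and `demoInternalClamp` are hypothetical names for this example, and the macro is a simplified assumption rather than a copy of what llvm/Support/Compiler.h does.

```cpp
#include <cassert>

// Hypothetical macro for this sketch: on GCC/Clang targeting ELF or Mach-O,
// keep a symbol visible even when the file is compiled with
// -fvisibility=hidden. Elsewhere it expands to nothing.
#if defined(__GNUC__) && !defined(_WIN32)
#define DEMO_ABI __attribute__((__visibility__("default")))
#else
#define DEMO_ABI
#endif

// Annotated: intended to be part of the shared library's public interface.
DEMO_ABI int demoPublicBrightness(int Raw);

// Unannotated: gets hidden visibility under -fvisibility=hidden, so code
// outside the shared library cannot link against it.
int demoInternalClamp(int X, int Lo, int Hi);

int demoInternalClamp(int X, int Lo, int Hi) {
  assert(Lo <= Hi && "range must not be empty");
  return X < Lo ? Lo : (X > Hi ? Hi : X);
}

int demoPublicBrightness(int Raw) { return demoInternalClamp(Raw, 0, 255); }

// One way to inspect the result, assuming a GCC or Clang toolchain on Linux:
//   c++ -shared -fPIC -fvisibility=hidden demo.cpp -o libdemo.so
//   nm -D -C --defined-only libdemo.so
// The annotated function should be listed in the dynamic symbol table and the
// unannotated helper should not.
```

A client that calls the unannotated helper would then fail at link time, which is the same class of failure a missing annotation produces in a Windows DLL build.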

4. Static analysis with the Interface Definition Scanner tool

The Interface Definition Scanner tool will be run on PRs to flag newly introduced symbols that are not properly annotated for export. It can run much faster than a full Windows build of all projects, and can suggest exact fixes to address missing exports.

Once the bulk of symbol annotations have been merged, we can enable IDS to run on all LLVM PRs; there is no need to wait until building LLVM as a Windows DLL is a complete or fully supported configuration.

Additional Background

LLVM can already be built as a shared library on ELF- and Mach-O-based systems; however, building it as a Windows DLL is more involved for several reasons:

  • Symbols are not exported from a DLL by default, similar to building ELF shared libraries with -fvisibility=hidden. To make a symbol externally visible, it must be explicitly exported when building the DLL. A symbol can be exported by annotating it with __declspec(dllexport) or by adding its name to a module definition (.def) file.
  • Symbols imported from a Windows DLL may be annotated with __declspec(dllimport) when compiling clients to remove a level of runtime indirection. This annotation is not strictly required; however, if the symbol is not annotated with __declspec(dllimport), it is the responsibility of the developer to dereference the pointer to use the symbol.
  • CMake v3.4 introduced support to automatically export all symbols from a DLL with the WINDOWS_EXPORT_ALL_SYMBOLS target property. LLVM currently requires minimum CMake version 3.20.
  • A single Windows DLL can export a maximum of 65,535 symbols. This limitation most likely prevents us from brute-force exporting everything using CMake’s WINDOWS_EXPORT_ALL_SYMBOLS.
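The export/import split from the bullets above is commonly handled with a single macro that switches meaning depending on who is compiling the header. The following is a sketch of that pattern only. `DEMO_ABI`, `DEMO_BUILDING_DLL`, `DEMO_USING_DLL`, `DemoCallCount`, and `demoNextId` are hypothetical names, not LLVM’s.

```cpp
#include <cassert>

// Hypothetical configuration macros for this sketch: DEMO_BUILDING_DLL would
// be defined only while compiling the library itself, and DEMO_USING_DLL only
// while compiling clients of the DLL.
#if defined(_WIN32) && defined(DEMO_BUILDING_DLL)
#define DEMO_ABI __declspec(dllexport) // producer: add to the export table
#elif defined(_WIN32) && defined(DEMO_USING_DLL)
#define DEMO_ABI __declspec(dllimport) // consumer: reference the import table
#else
#define DEMO_ABI // static build or non-Windows: no annotation
#endif

// --- public header: the same declarations serve producer and consumer ---
DEMO_ABI extern int DemoCallCount; // exported data symbol
DEMO_ABI int demoNextId();         // exported function

// --- library source file ---
int DemoCallCount = 0;

int demoNextId() {
  ++DemoCallCount;
  assert(DemoCallCount > 0 && "call counter overflowed");
  return 100 + DemoCallCount;
}
```

With neither configuration macro defined, the annotation disappears and the code builds as an ordinary static library, so the same header works for every build mode.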

Exporting C++ Classes

When defining DLL exports, it is possible to annotate entire C++ classes and structs, rather than their individual members, with __declspec(dllexport). Annotating a class will export every method and static field in the class, including:

  • Compiler-generated methods, such as copy/move constructors and assignment operators
  • Methods defined entirely in the header
  • Private methods
  • RTTI/vtable as appropriate

Annotating a class does not implicitly export nested classes/structs or any friend class or function declarations. A class with a class-level annotation cannot also have annotated members; it will fail to compile.

The advantage of annotating at the class level is that new members will be automatically exported. However, exporting entire classes can cause significantly more methods to be exported than necessary, and it can lead to tricky-to-debug problems with compiler-generated methods.
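One commonly reported form of the compiler-generated-method problem involves a class that holds a container of move-only objects. The sketch below shows the explicit-deletion workaround mentioned earlier in this post. `DEMO_ABI` and `Registry` are hypothetical, and the macro is empty here so the snippet builds on any platform. The failure described in the comment is my understanding of the MSVC behavior, not something the snippet itself demonstrates.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a class-level export macro (assumption).
#ifndef DEMO_ABI
#define DEMO_ABI
#endif

class DEMO_ABI Registry {
public:
  Registry() = default;

  // Without these two lines the class has an implicit copy constructor that is
  // not deleted, because std::vector always declares one. A class-level export
  // asks the compiler to define it, and that definition fails to compile
  // because std::unique_ptr cannot be copied.
  Registry(const Registry &) = delete;
  Registry &operator=(const Registry &) = delete;

  // Moving is still fine and can remain defaulted.
  Registry(Registry &&) = default;
  Registry &operator=(Registry &&) = default;

  void add(int V) { Items.push_back(std::make_unique<int>(V)); }
  std::size_t size() const { return Items.size(); }

private:
  std::vector<std::unique_ptr<int>> Items;
};
```

With member-level annotations the copy operations are never forced into existence, so the explicit deletions would not be needed just to make the export compile.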


@compnerd , @vvassilev


Googletest does something similar, with a macro to identify classes/APIs that should be exported. When I was attempting to implement Rotten Green Tests in Googletest, the Windows build easily caught the cases where I’d omitted the annotation macro, and it wasn’t hard to identify how to fix things. It was nice that the default Windows configuration did that in my local builds. I strongly suggest that the DLL mode be included in pre-commit CI, because a majority of people in the LLVM community do not have access to a Windows environment, and it is a very easy thing to miss adding the annotation.

Tagging more people interested in Windows: @rnk @hansw2000 @jmorse


Definitely makes sense. My only concern is increasing pre-commit build times. I think we may be able to achieve something similar with static analysis that will run faster, but that still needs to be demonstrated.

Thanks for posting the update. This looks reasonable to me.

I’m a bit confused by this point. At least in the previous patches I’ve seen, annotations were placed either on classes, or on freestanding functions, not on individual class members. Are you going to use a different approach now, or am I misunderstanding something?


How do you plan to land these changes? Assuming that the changes are fully auto-generated by IDS, there’s not really much need for manual review, so is the plan to just mass-add all the annotations to llvm/include at once?

Right, we can either annotate entire classes or individual class members. The few existing instances in the codebase are indeed annotated at the class level. I am recommending we annotate individual class members instead. Both options have pros and cons:

Class Annotations
+ Adds fewer annotation points to the codebase: one per class rather than one per exported class symbol.
+ New class members are automatically exported.
- Exports significantly more symbols than necessary, including private class members, inline function definitions, and type info.
- Exporting a class forces templates to be fully instantiated, which can lead to confusing compilation errors described here. I am concerned these errors will be difficult for developers to debug and solve.

Class Member Annotations
+ Limits exported symbols to only those that are actually required; type info, private members, and functions defined inline are not exported.
- New class members must be explicitly annotated.
- Adds significantly more annotation points to the codebase.

Additionally, the IDS tool I’m using to code-mod in the annotations doesn’t currently support class-level annotations. This support could be added, so it really shouldn’t be a deciding factor.

My recommendation to annotate class members instead of classes is meant to minimize the number of exported symbols, so that we stay under the 64K DLL export limit, and to avoid confusing compilation errors caused by fully instantiating templates in exported classes. I think it is the better choice for these reasons, but would definitely appreciate any counterpoints.

Using primarily class member annotations locally, I’ve been able to build clang against an LLVM DLL on Windows. The LLVM DLL exports just over 17K symbols, which is well below the 64K DLL export limit. I do not know how many exports there would be if we were to export classes instead, but could find out with some work to modify the IDS tool to support it. I suspect it will blow past the 64K limit or at least put us a lot closer to it.

Is there a preferred process for merging codemods? I definitely appreciate any input here from those who have gone through similar.

My current plan is to put up one PR for each library under llvm for review. It would be something like 30-40 PRs. After those merge, I expect a tail of follow-up PRs to add annotations that may have been missed and to annotate newly merged code until we get automated builds and IDS jobs running.


FWIW, /Zc:dllexportInlines- in clang-cl will prevent inline functions from being exported even if you annotate the class. MSVC doesn’t support it though, so it might not be the best option to rely on.


Thanks for the detailed analysis. Given the Windows limitations involved, I agree that using class member annotations is preferable.

Personally I’d be fine if you did a PR for one library to make sure we have agreement on the approach, and then did the rest in a bulk change. I don’t think there is a lot of benefit in splitting it up by library. But I think that approach is fine too.


That was something I updated the LLVM CMake system to use, but I have to admit that the maintenance burden of fixing new MSVC compile errors, which appear whenever dllexport on a class causes its special members to be instantiated in headers, might be too much hassle to constantly deal with.


This is cool to see!

You probably know this (I learned through painful experience), but for the benefit of other readers: despite the name, this option does not, in fact, export all symbols. For example, global data symbols still need to be annotated with __declspec(dllexport).

I recently ran into this with building a shared library for grpc, which (in conjunction with CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS) blew through the limit[1]. After some digging, I opened this issue with Microsoft – feel free to upvote if you care about this; a previous issue on this subject had gotten the response that “We are thinking of ways to get around this”, so not completely impossible (though I’ll be first to admit that chances are slim).


  1. Since grpc is not fully annotated and I don’t have the time to convince upstream to support it, I ended up with a PRE_LINK script that does some basic symbol filtering based on dumpbin /symbols of the objects. I don’t recommend it though, I consider it a disgusting hack.


To be more precise, it’s not the exporting side that needs __declspec(dllexport) for data symbols; that side works just fine even when the symbol is exported through CMake’s WINDOWS_EXPORT_ALL_SYMBOLS. It’s the side accessing the data symbol that needs __declspec(dllimport). (Which the link also says.)

FWIW, with mingw tooling, one avoids the need for explicit dllimport annotations (thanks to mechanisms called autoimport and runtime pseudo relocations), and without dllexport annotations, it exports all symbols. This also means that it’s been possible to build the LLVM and Clang dylibs for a long time already.

But the limit of 65536 exported symbols is indeed an issue. By omitting symbols that are marked hidden for ELF builds from the mingw autoexporting of all symbols ([llvm] Use hidden visibility when building for MinGW with Clang · llvm/llvm-project@2c2fb0c · GitHub), we currently have a decent margin to the limit (current builds end up with around 43k exports). But getting even more precise annotations would of course be even better, as the number of symbols is growing all the time; around LLVM 16, the number of symbols was around 37k.


@cdacamar, I think we briefly discussed this a couple of months ago, but as this seems important, FYI :wink:


A few different customers have bumped up against the 64k limitation. It is indeed caused by the PE section encoding just a 16-bit number, but lifting that limitation isn’t the only thing to consider as the Windows loader might also need some changes to support larger sets of exports.

The scope of work required to lift the 64k limitation isn’t just on the tools, it’s on the larger ecosystem. Unless that issue gets major traction and buy-in from multiple teams, it is highly unlikely that we would ever fix it.
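To put the 16-bit encoding in numbers, here is a small sketch of the headroom arithmetic. It assumes the cap equals the largest 16-bit unsigned value, which matches the 65,535 figure quoted earlier in the thread. The two export counts are the approximate figures other posters reported above, and all identifiers are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Assumption for this sketch: the export cap is the maximum value of a 16-bit
// unsigned integer.
constexpr std::uint32_t MaxDllExports =
    std::numeric_limits<std::uint16_t>::max();

// Remaining export slots for a given export count; 0 once the cap is reached.
constexpr std::uint32_t exportHeadroom(std::uint32_t ExportCount) {
  return ExportCount >= MaxDllExports ? 0 : MaxDllExports - ExportCount;
}

// Approximate counts reported in this thread (rounded, not measured here).
constexpr std::uint32_t MemberAnnotatedExports = 17000; // "just over 17K"
constexpr std::uint32_t MinGWAutoExports = 43000;       // "around 43k"

constexpr std::uint32_t MemberAnnotatedHeadroom =
    exportHeadroom(MemberAnnotatedExports);
constexpr std::uint32_t MinGWAutoHeadroom = exportHeadroom(MinGWAutoExports);
```

Under these rounded inputs, member-level annotation leaves roughly 48K slots free while the mingw auto-export build leaves roughly 22K.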

That’s terrible news (even if expected). For affected cases – which will only grow in number over time – there’s literally no alternative; not even switching toolchain helps, one is forced to either drop support for shared libraries or Windows entirely[1].

Rather than doing some basic extrapolation of the consequences and fixing it out of foresight before it becomes an even bigger problem, it seems we’ll have to wait until some important enough (to MSFT) clients/projects are affected.

https://developercommunity.visualstudio.com is designed (perhaps not intentionally, but effectively) in such a way that getting major traction is impossible – the most upvoted issues are in the 100s, which is hardly major anything. Also, compiler issues get completely swamped by wishes for IDE improvements.

If MSFT moved the issue tracking to GitHub (which they own too…), there’d be real engagement, and more importantly, lots of cross-references from projects affected by compiler bugs or constraints. But I doubt MSFT actually wants more visibility, because then such issues become much harder to ignore, and so they’d be forced to spend resources on stuff that’s not hand-picked by the product managers.


  1. or spend substantial effort to split the library, which just kicks the can down the road
