I would like to propose the following change to the external_source_symbol attribute in Clang to support better indexing of Swift symbols in C++. We should allow the user to specify a concrete USR in the attribute. The Clang indexer will then use it when computing the USR of the symbol in the emitted index data. For example, given this Swift code:
public struct TestStruct {
  public func method() {
  }
  ...
}
Then the generated C++ header for the Swift module will contain the following external_source_symbol attribute:
class __attribute__((external_source_symbol(language="Swift", defined_in="Test",USR="s:4Test10TestStructV",generated_declaration))) TestStruct final {
public:
  inline void __attribute__((external_source_symbol(language="Swift", defined_in="Test",USR="s:4Test10TestStructV6methodyyF",generated_declaration))) method() const;
};
This will allow a language indexer that recognizes external_source_symbol to know that any references to TestStruct or TestStruct::method actually refer to the underlying Swift symbols.
To allow the sources to conditionally enable this new extension I would like to propose adding a new case to __has_feature in Clang:
I think the idea is reasonable but I did have some questions that could be concerns.
Do we document the format of USRs somewhere? If not, how do non-compiler-developers use this new functionality?
Is the encoding for USRs stable? If we allow them to be used in the interface, it seems to me like we’re promising stability so that we don’t silently break users later.
I don’t think we should use __has_feature for testing attribute functionality, we have __has_attribute and friends for that and should be reusing that machinery these days.
The USR format specified in this attribute is the USR format of the foreign language that the external symbol came from, so it wouldn’t necessarily be documented in Clang’s docs. I don’t think Swift specifically has a documentation page that describes its USR format, as it just corresponds to the mangled symbol name with an “s:” prefix. For our use case, the header file that provides C++ bindings for a Swift module will be generated by the compiler, so the user wouldn’t have to specify the USR manually.
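To make the "mangled symbol name with an “s:” prefix" relationship concrete, here is a minimal sketch of how a generator might derive such a USR. It assumes that Swift mangled names begin with "$s" and that the USR replaces that prefix with "s:"; that detail is an assumption, not something confirmed in this thread. exampleSwiftUSRFromMangledName and exampleCheck are hypothetical names, not part of Clang or Swift.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: derives a Swift USR from a mangled symbol name.
// ASSUMPTION: the mangled name starts with "$s" and the USR swaps that
// prefix for "s:". Returns an empty string for any other input.
inline std::string exampleSwiftUSRFromMangledName(const std::string &mangled) {
  const std::string manglingPrefix = "$s"; // assumed Swift mangling prefix
  if (mangled.compare(0, manglingPrefix.size(), manglingPrefix) != 0)
    return std::string();
  return "s:" + mangled.substr(manglingPrefix.size());
}

// Usage check against the USR shown in the example header above.
inline void exampleCheck() {
  assert(exampleSwiftUSRFromMangledName("$s4Test10TestStructV") ==
         "s:4Test10TestStructV");
}
```

In the intended workflow nobody calls anything like this by hand; the compiler that generates the header already knows the USR of each declaration.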
That depends on the specific compiler of the foreign language that the external symbol came from. When it comes to Swift specifically, its USRs correspond to mangled symbol names. This means that ABI-stable Swift interfaces will have stable USRs. Once the Swift compiler generates a header file for an ABI-stable Swift interface with such USRs, the header file will be compatible with future versions of the indexer in the Swift toolchain, as the toolchain will respect the ABI stability guarantees. Uses from C++ of Swift interfaces that don’t opt in to ABI stability will require regeneration of the header file whenever the Swift compiler that built the Swift module changes, so any USR format changes will be accounted for in the newly generated header.
Sounds good, I will update my patch to use __has_attribute instead.
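A __has_attribute-based check could look roughly like the sketch below. This thread does not settle how the check distinguishes the new USR clause from the existing attribute, so the sketch assumes that a Clang which understands the clause reports a value greater than 1 from __has_attribute(external_source_symbol), while older Clangs report 1. EXAMPLE_HAS_ESS_USR and exampleHasUsrClause are hypothetical names.

```cpp
// Compilers without __has_attribute are treated as not supporting the attribute.
#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

// ASSUMPTION: support for the USR clause is signalled by a value greater
// than 1; a plain 1 means the attribute exists without the clause.
#if __has_attribute(external_source_symbol) > 1
#  define EXAMPLE_HAS_ESS_USR 1
#else
#  define EXAMPLE_HAS_ESS_USR 0
#endif

// Hypothetical helper so ordinary code can query the preprocessor result.
constexpr bool exampleHasUsrClause() { return EXAMPLE_HAS_ESS_USR != 0; }
```

On a compiler that does not match the check, the flag is simply 0 and a generated header can fall back to omitting the USR.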
Thank you for the details! I’m of two minds on this proposal. On the one hand, I think this adds some useful functionality that fits in line with the design of the original attribute. On the other hand, I don’t know how anyone would use this new feature in practice aside from through code generation where the USR is automatically generated (as in Swift). The USR is going to depend on the source language, which we can basically assume will be undocumented except for rare occurrences, and the C interface using the attribute has no way to validate anything to help the user catch mistakes with the USR, so this is a user unfriendly proposal.
I did some searching of open source code bases and I do not have the impression this attribute is being used by users in practice; it seems to only be used by Swift. So if this is intended as compiler glue code rather than a user-facing attribute people would write by hand, then I think it’s reasonable. However, I also notice that a lot of the uses of this attribute are with #pragma clang attribute blasting it onto every declaration it can (context:global external_… - Sourcegraph), so I assume Swift will change those to instead write the attributes on individual declarations so that the USR can be generated correctly?
Yep, the current use case is exactly that. We do not expect users of a mixed-language Swift and Objective-C/C/C++ project to write this attribute by hand anywhere, as the interoperability header that bridges Swift interfaces into Objective-C/C/C++ is always generated by the compiler. This is codified by the Swift-to-C++ interoperability vision document in the Swift project.
Yep, exactly. For Objective-C interop, there was no need to specify USRs explicitly (as Swift actually uses Clang USRs for its @objc declarations), so we could apply this attribute using the pragma in the generated header. However, now that we want to access Swift native APIs in C++, we need to specify the USRs on the generated declarations, and thus we will attach this attribute to each individual declaration in the generated header (using some macro).
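As an illustration of the "using some macro" part, a generated header might wrap the attribute along these lines. This is only a sketch: EXAMPLE_SWIFT_SYMBOL is a hypothetical macro name, the "> 1" support check is an assumption about how the USR clause would be detected, and the method returns int (instead of void, as in the original example) purely so the usage check has something to observe.

```cpp
#include <cassert>

#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

// Expands to the attribute with a per-declaration USR when the compiler is
// assumed to support the USR clause, and to nothing otherwise.
#if __has_attribute(external_source_symbol) > 1
#  define EXAMPLE_SWIFT_SYMBOL(usr)                                          \
     __attribute__((external_source_symbol(language="Swift",                 \
                                           defined_in="Test", USR=usr,       \
                                           generated_declaration)))
#else
#  define EXAMPLE_SWIFT_SYMBOL(usr)
#endif

// Each generated declaration carries its own USR, which a blanket
// #pragma clang attribute cannot express.
class EXAMPLE_SWIFT_SYMBOL("s:4Test10TestStructV") TestStruct final {
public:
  inline int EXAMPLE_SWIFT_SYMBOL("s:4Test10TestStructV6methodyyF") method() const;
};

// Placeholder body; a real generated header would call into the Swift module.
inline int TestStruct::method() const { return 0; }

// Usage: C++ callers see an ordinary class; the attribute only affects indexing.
inline void exampleUse() {
  const int result = TestStruct().method();
  assert(result == 0);
  (void)result;
}
```

Implementation-detail declarations in the header would just leave the macro off.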
I am still working on a patch for the Swift compiler that updates the header generation for this change, but once I have that PR up I will link it in this thread.
That’s good to know, but the situation I’m a bit more concerned by is non-Swift and Objective-C/C/C++ (for example, this seems like the sort of functionality SWIG might use). But it still seems like that’s going to be automatically generated USRs and not hand-written.
Good to know! I’m curious to hear how that goes in practice though, as I understand the big reason to add pragma clang attribute in the first place was to drastically improve parse times for those headers.
Please do. Do you think it might make sense to hold off on landing the attribute in Clang until after Swift has validated that this approach still gives palatable compile times?
The parse time of the generated header with C++ bindings will really depend on the APIs that a particular Swift module is exposing. A lot of the time there will be a lot of additional C++ boilerplate specifically for Swift value types, class types, and Swift generics. Thus the majority of the parse time is going to be spent parsing the actual C++ boilerplate, especially parsing and instantiating the C++ templates that model Swift generics. I believe that adding a new attribute to the generated declarations is not going to impact the parse time of those headers in any significant manner because of that, especially because a lot of the generated boilerplate is going to be without this attribute, as it’s essentially an implementation detail of the Swift-to-C++ binding.
Today I did some local testing with both changes to see how it would impact compile time for some sample mixed-language projects that use the generated header. I did not observe any negative compile time impact outside of noise.
Thanks for the extra details and linking the other patch! I’m surprised to hear about the compile time impacts being negligible given that the whole reason we have the pragma was the promise it was necessary for reasonable parsing times (which made sense to me given that some system headers will have multiple attributes on almost every function declaration). However, perhaps those scenarios are different than the ones you’re hitting with Swift. I dunno. If you’re happy with the performance, then I’ll be content with that. So this RFC looks reasonable to me.