RFC: Python callback for Source File Resolution

Problem

LLDB currently does not support fetching source files from arbitrary source servers, due to security concerns, and debuginfod support for fetching source files is yet to be integrated. Users currently need to ensure that their local source tree matches the source state the debugged process was built from in order to resolve source files correctly and hit breakpoints during debugging.

Proposal

I’d like to propose a new feature to address this problem, similar to the Python callback for custom module resolution.

A Python callback for Platform CallResolveSourceFileCallbackIfSet.

This new feature has negligible performance impact when not used.

When it is used, this Python callback serves as the implementation for getting source files from a stack frame, in ApplyFileMappings in LineEntry.cpp. The callback takes the build ID of the module and the original source file resolved by LLDB as input arguments, and populates the resolved source file spec. The method’s signature is as follows:

  void CallResolveSourceFileCallbackIfSet(const char* build_id, const FileSpec& original_source_file_spec, FileSpec &resolved_source_file_spec, bool *did_create_ptr);

If the callback fails, or something goes wrong, CallResolveSourceFileCallbackIfSet falls back to the LLDB implementation for getting source files. If the callback succeeds in returning a source file path, CallResolveSourceFileCallbackIfSet will use it in the same way as the LLDB implementation does.
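To illustrate the fallback behaviour described above, here is a minimal sketch in plain Python with no lldb import. The names `resolve_source_file` and `SourceFileCallback` are hypothetical and only model the control flow; they are not LLDB APIs.

```python
from typing import Callable, Optional

# Hypothetical callback type: (build_id, original_path) -> resolved path or None.
SourceFileCallback = Callable[[str, str], Optional[str]]


def resolve_source_file(
    callback: Optional[SourceFileCallback],
    build_id: str,
    original_path: str,
) -> str:
    """Return the callback's path if it yields one, else the original path."""
    if callback is None:
        return original_path  # No callback set: default behaviour.
    try:
        resolved = callback(build_id, original_path)
    except Exception:
        return original_path  # Callback failed: fall back to the default.
    # An empty or missing result is also treated as "not resolved".
    return resolved if resolved else original_path
```

The point of the sketch is that a misbehaving callback can never make resolution worse than it is today.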

This will let users write their own source file caching system for LLDB and fetch from arbitrary source servers. Since the Python callback runs in userland, LLDB does not need to deal with authentication against different source servers or worry about the security concerns of exposing source code.
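As a rough idea of what such a user-side caching system could look like, here is a sketch in plain Python. Everything in it is an assumption for illustration: `make_caching_callback`, the `fetch` parameter (a user-supplied function that talks to the source server and handles authentication), and the cache layout are hypothetical and not part of LLDB.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def make_caching_callback(
    cache_dir: Path,
    fetch: Callable[[str, str], Optional[bytes]],
) -> Callable[[str, str], Optional[str]]:
    """Build a callback that downloads each source file at most once.

    `fetch(build_id, original_path)` is a hypothetical user-supplied function
    returning the file contents, or None if the file is unavailable.
    """

    def callback(build_id: str, original_path: str) -> Optional[str]:
        # Key the cache on build id plus path so different builds do not collide.
        digest = hashlib.sha256(original_path.encode("utf-8")).hexdigest()[:16]
        cached = cache_dir / build_id / f"{digest}-{Path(original_path).name}"
        if cached.is_file():
            return str(cached)
        data = fetch(build_id, original_path)
        if data is None:
            return None  # Let the debugger fall back to its default lookup.
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)
        return str(cached)

    return callback
```

Keeping the network and credential handling inside `fetch` is what keeps that responsibility out of the debugger itself.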

Users will be able to use a new SBPlatform API to set the callback function.

Performance benefits

Currently, we check out an entire repo and/or the source commit corresponding to the built process, even if we only want to resolve a few source files for debugging. In my scenario, checking out the repo takes close to 10 minutes. This is overkill, whereas fetching source files on demand should be very fast, with each file taking 1-2 seconds.
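For a rough sense of scale, the figures above imply the following break-even point. This is only back-of-the-envelope arithmetic on the numbers quoted in this post, assuming files are fetched one at a time.

```python
checkout_seconds = 10 * 60   # ~10 minutes for a full checkout
per_file_seconds = (1, 2)    # 1-2 seconds per fetched file

# Number of files at which on-demand fetching costs as much as the checkout.
break_even_files = [checkout_seconds // s for s in per_file_seconds]
print(break_even_files)  # [600, 300]
```

So on-demand fetching stays ahead unless a debugging session touches several hundred distinct source files.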

Draft Implementation

Commit


@clayborg @splhack

I think this is an interesting feature, and we’d most likely start using it as soon as it is implemented. I have two questions about the implementation though:

  • should this really be a platform API? My thinking is that the file names come from the debug info, which are housed by the Module class, and while Platforms can help with finding modules, once they are created, the modules are very much independent. I don’t really have an alternative proposal here (however @bulbazord is getting ready to refactor the Platform class, so he might), I’d just like to hear what you think about this.
  • instead of the build-id, would it be possible to pass the callback an (SB)Module instead? My reasoning here is that if the callback wants to get the build id, it can always get it from the Module, but this way maybe it can get some additional information that wouldn’t be available otherwise. For example, a callback may not be able to provide files for every module, but it may not be able to tell from the build id (an opaque string) whether this is one of the supported modules. Or it may need the module file name (or something else) in order to locate the file.
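A small sketch of the second point, in plain Python: with only an opaque build ID the callback cannot tell whether it supports a module, while with module-level information it can decline early. `ModuleInfo`, `SUPPORTED_PREFIXES`, and the `/srv/sources` layout are hypothetical stand-ins, not LLDB types or real paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModuleInfo:
    """Hypothetical stand-in for the module object handed to the callback."""
    file_name: str
    build_id: str


SUPPORTED_PREFIXES = ("libmyapp",)  # Hypothetical: modules this callback knows.


def resolve_with_module(module: ModuleInfo, original_path: str) -> Optional[str]:
    # The build id alone is opaque; the file name tells us whether we can help.
    if not module.file_name.startswith(SUPPORTED_PREFIXES):
        return None  # Not one of ours: leave resolution to the default path.
    return f"/srv/sources/{module.build_id}/{original_path.lstrip('/')}"
```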

That makes sense, as a ModuleSpec lets us infer different properties such as the build_id. I can change it to pass a ModuleSpec.

Just a small correction: the API exposed to Python is

SetResolveSourceFileCallback(const char* build_id, const FileSpec& original_source_file_spec, FileSpec &resolved_source_file_spec)

The one mentioned in the RFC is an internal API in Platform.cpp, used in ApplyFileMappings.

I guess that could help someone, but why not pass the actual module instead of (just) the ModuleSpec? The reason I’m asking for this is that the information I would need (in the majority of cases I need to support, anyway) is actually in a global (constant) variable inside the module. I couldn’t get what I need from a ModuleSpec. I’d need an actual module so I can look up that constant.

Sure, I can make SBPlatform a friend class in SBModule.h so that it can do the SBModule to ModuleSP conversion here.

Cool, thanks. (Don’t be afraid to make the SB classes friends of each other. The methods are private just so that the outside world can’t access them.)

Submitted 3 PRs for review. Please review when you get a chance.

A somewhat similar concern was raised for custom module resolution (RFC), where it was decided to make it a Platform API, since PlatformSP is always retained per Target and should be able to resolve modules associated with the debugger target. I took a similar approach for source files and would be happy to explore any alternative solutions.

@labath Happy New Year!!

Hope you are doing well. Thanks for reviewing the first PR, [lldb][ResolveSourceFileCallback] Update SBModule by rchamala · Pull Request #120832 · llvm/llvm-project · GitHub. I am waiting on my request for merge access (Request Commit Access For rchamala · Issue #121244 · llvm/llvm-project · GitHub) and would appreciate it if you could approve it so that I can complete the PR.

Is this a limitation of debuginfod (i.e. it doesn’t support this yet) or a limitation of LLDB’s support for debuginfod (i.e. we don’t know how to ask debuginfod for source files)? I’m assuming the former because otherwise it seems like we should invest in the latter?

The mention of debuginfod makes me wonder if we should make this a property of the SymbolLocator plugin. I have a very similar use case where (in the near future) I will have to teach the symbol locator how to fetch source files.

If that’s the route we want to go, should we consider making a scripted symbol file plugin? With all the work @mib did for scripted processes, it’s pretty straightforward to add a scripted-anything, and I think that might provide more flexibility down the line than a callback.

I am not sure about the former (debuginfod support), but I believe LLDB does not currently have support for debuginfod. From what I gather, there have been discussions around implementing custom symbol downloads and source file downloads using debuginfod, but it is not present yet. In particular, source file support with debuginfod needs to consider security concerns, as it deals with sensitive source code information.

The changes required in LLDB to support debuginfod would also touch the same areas as the submitted PRs. Until that is implemented, we have the custom module callback (RFC: Python callback for Target get module) already implemented in LLDB. In addition, we wanted a custom source file callback as a way for users to specify their own logic for fetching source files without worrying about security concerns.

A scripted symbol file plugin is new to me but sounds interesting. Does it allow the same flexibility for users to override how source files are fetched?

Using platform callbacks for finding modules does not sound particularly surprising to me because platforms are already intimately involved in locating modules. Using platforms for finding source files seems a bit more fuzzy, since I think there’s no precedent for something like that. It also limits your options somewhat since modules (by design) don’t know which platform created them, so you can never find the source file callback from inside the module (only from things like target which have a platform available).

Note I’m not saying this is a bad design, just that it’s worth giving it a second thought.

I think it’s mostly the latter. I am not sure about the implementations, but I believe the protocol itself does support source file downloads.

That said, I think it’d still be useful to support an extensible method of downloading/locating source files, as not everything runs on debuginfod. (I mean, I suppose you could make a fake debuginfod server which talks to lldb and implements the custom logic under the hood, but that seems somewhat convoluted.) I think that sort of aligns with your idea of putting this inside a ScriptedSomething, except that I don’t think that “Something” should be a SymbolFile. Symbol file plugins are incredibly complicated, and most (if not all) of that complexity does not have anything to do with finding source files. (It’s true that DWARF>=5 can include source code inside the line table, but I’d argue that this discussion is only relevant for files which don’t do that.) I’d probably say it should be a ScriptedSymbolLocator, since that would be the natural place to put the debuginfod source file downloading code as well. (Bonus points for making it general enough so that one can implement a DebuginfodSymbolLocator in python.)
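To make the "ScriptedSymbolLocator" idea a bit more concrete, here is one possible shape for it, sketched in plain Python. This is purely illustrative: `SourceLocator`, `DictLocator`, and `locate` are hypothetical names, no such interface exists in LLDB, and the first-answer-wins ordering is an assumption of the sketch.

```python
from typing import Dict, Optional, Protocol, Sequence, Tuple


class SourceLocator(Protocol):
    """Hypothetical interface; not an existing LLDB class."""

    def locate_source_file(self, build_id: str, original_path: str) -> Optional[str]:
        ...


class DictLocator:
    """Toy locator backed by an in-memory table, standing in for a real backend."""

    def __init__(self, table: Dict[Tuple[str, str], str]) -> None:
        self._table = table

    def locate_source_file(self, build_id: str, original_path: str) -> Optional[str]:
        return self._table.get((build_id, original_path))


def locate(
    locators: Sequence[SourceLocator], build_id: str, original_path: str
) -> Optional[str]:
    # Ask each locator in turn; the first one that answers wins.
    for locator in locators:
        found = locator.locate_source_file(build_id, original_path)
        if found is not None:
            return found
    return None
```

With an interface like this, a debuginfod-backed locator and a custom in-house one would simply be two entries in the same list.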

Sorry for the confusion… I meant SymbolLocator, not SymbolFile. I turned the former into a plugin when the debuginfod work was being done, knowing I’d need something similar in the future (i.e. the thing I hinted at earlier). I totally agree that SymbolFile would be the wrong place to put this.


Do you think TargetSP is a better place for this callback? Apart from that, I can’t think of any alternatives. If you have any alternative solutions, I would love to explore them.

I don’t think that would address Pavel’s concern. You also can’t get to the Target that’s using the Module from the Module itself (since the Module might be shared by many Targets.)

Based on the discussion, I see 2 approaches. I would appreciate your thoughts on whether either of the following seems reasonable:

  1. Make the callback a Module API: since Modules are independent, they can be used alone to locate source files instead of relying on Platform.

  2. Use SymbolLocator plugins: as Jonas mentioned, I could make it part of this plugin.

Jim is correct. Target is in the same boat as platform. For it to work the way I want(ed) to, this would have to be a module-level API. OTOH, I don’t think we have any module-level callbacks right now, and you don’t seem to need it for what you’re trying to do, so it does need to be balanced against that as well.

That said, I think Jonas’s idea of doing a scripted symbol locator plugin is the most principled approach, and it’s the one I’d recommend.
