RFC: Python callback for Source File Resolution

rahulvn389 · December 8, 2024, 4:17am

Problem

LLDB currently does not support fetching source files from arbitrary source servers due to security concerns, and debuginfod support for fetching source files is yet to be integrated. Users currently need to ensure that their local source tree matches the source state the debugged process was built from in order to resolve source files correctly and hit breakpoints during debugging.

Proposal

I’d like to propose a new feature to address this problem, similar to the Python callback for custom module resolution.

A Python callback for Platform CallResolveSourceFileCallbackIfSet.

This new feature has negligible performance impact when not used.

When it is used, this Python callback serves as the implementation for getting source files for a stack frame, in LineEntry.cpp - ApplyFileMappings. The callback takes the build ID of the module and the original source file resolved by LLDB as input arguments, and populates the resolved source file spec. The method’s signature is as follows:

  void CallResolveSourceFileCallbackIfSet(const char* build_id, const FileSpec& original_source_file_spec, FileSpec &resolved_source_file_spec, bool *did_create_ptr);

If the callback fails, or something goes wrong, CallResolveSourceFileCallbackIfSet falls back to the LLDB implementation for getting source files. If the callback succeeds and returns a source file path, CallResolveSourceFileCallbackIfSet will use it in the same way as the LLDB implementation.
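The fallback behaviour described above can be modelled in a few lines of plain Python. This is only an illustrative sketch, not LLDB code: `resolve_source_file`, `default_resolver`, and the use of plain path strings in place of FileSpec are assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical callback type: (build_id, original_path) -> resolved path or None.
ResolveCallback = Callable[[str, str], Optional[str]]


def resolve_source_file(
    build_id: str,
    original_path: str,
    callback: Optional[ResolveCallback],
    default_resolver: Callable[[str], str],
) -> str:
    """Model of the proposed behaviour: try the callback, else fall back."""
    if callback is not None:
        try:
            resolved = callback(build_id, original_path)
        except Exception:
            # A failing callback must not break source lookup.
            resolved = None
        if resolved:
            return resolved
    # No callback set, the callback failed, or it returned nothing:
    # use the existing (here: stand-in) resolution logic.
    return default_resolver(original_path)
```

The point of the sketch is that a missing, failing, or empty-handed callback all end up on the same default path, so setting a callback can never make source lookup worse than it is today.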

This will let users write their own source file caching system for LLDB and allow fetching from arbitrary source servers. Since the Python callback runs in userland, LLDB does not need to deal with authentication against different source servers, nor worry about the security concerns of exposing source code.
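As a rough idea of what such a user-side caching system could look like, here is a minimal sketch. Everything in it is an assumption for illustration: the `SourceFileCache` class, the `Fetcher` signature, and the cache layout are hypothetical, and the fetcher stands in for whatever authenticated source server a user has. It does not guard against `..` components in the original path, which a real cache would need to do.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical fetcher: (build_id, original_path) -> file contents, or None.
Fetcher = Callable[[str, str], Optional[bytes]]


class SourceFileCache:
    """On-demand source cache keyed by build id and original path."""

    def __init__(self, cache_dir: Path, fetch: Fetcher) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch

    def resolve(self, build_id: str, original_path: str) -> Optional[str]:
        # Mirror the original path underneath a per-build directory.
        relative = original_path.lstrip("/")
        local = self.cache_dir / build_id / relative
        if local.is_file():
            return str(local)  # cache hit, no fetch needed
        data = self.fetch(build_id, original_path)
        if data is None:
            return None  # let the debugger fall back to its default lookup
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(data)
        return str(local)


if __name__ == "__main__":
    # Stand-in fetcher; a real one would talk to a source server.
    def fake_fetch(build_id: str, path: str) -> Optional[bytes]:
        return b"int main() { return 0; }\n" if path.endswith("main.c") else None

    with tempfile.TemporaryDirectory() as tmp:
        cache = SourceFileCache(Path(tmp), fake_fetch)
        print(cache.resolve("deadbeef", "/build/src/main.c"))
        print(cache.resolve("deadbeef", "/build/src/missing.c"))  # None
```

Keying the cache on the build ID keeps files from different builds of the same module apart, which is the property the proposal relies on.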

Users will be able to use a new SBPlatform API to set the callback function.

Performance benefits

Currently, we check out an entire repo, and/or the source commit corresponding to the built process, even if we only want to resolve a few source files for debugging. In my scenario, checking out the repo takes close to 10 minutes. This can be overkill, whereas fetching source files on demand should be much faster, with each file taking 1-2 seconds to fetch.
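A back-of-the-envelope check using only the figures quoted above (about 10 minutes for a checkout, 1-2 seconds per fetched file), and assuming files are fetched one after another with no caching:

```python
def break_even_files(checkout_seconds: float, per_file_seconds: float) -> float:
    """Number of sequential file fetches that cost as much as one full checkout."""
    return checkout_seconds / per_file_seconds


checkout = 10 * 60  # roughly 10 minutes, as reported above
for per_file in (1, 2):  # 1-2 seconds per file, as reported above
    n = break_even_files(checkout, per_file)
    print(f"{per_file}s per file: checkout only wins beyond about {n:.0f} files")
```

With these numbers, on-demand fetching stays ahead unless a debugging session touches roughly 300 to 600 distinct source files.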

Draft Implementation

Commit

rahulvn389 · December 8, 2024, 10:40pm

@clayborg @splhack

labath · December 9, 2024, 4:59pm

I think this is an interesting feature, and we’d most likely start using it as soon as it is implemented. I have two questions about the implementation though:

1. Should this really be a platform API? My thinking is that the file names come from the debug info, which are housed by the Module class, and while Platforms can help with finding modules, once they are created, the modules are very much independent. I don’t really have an alternative proposal here (however @bulbazord is getting ready to refactor the Platform class, so he might), I’d just like to hear what you think about this.
2. Instead of the build-id, would it be possible to pass the callback an (SB)Module instead? My reasoning here is that if the callback wants to get the build id, it can always get it from the Module, but this way maybe it can get some additional information that wouldn’t be available otherwise. For example, a callback may not be able to provide files for every module, but it may not be able to tell from the build id (an opaque string) whether this is one of the supported modules. Or it may need the module file name (or something else) in order to locate the file.
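To illustrate the second point, here is a minimal sketch of a callback that receives a module-like object rather than a build id. The callback signature, `SUPPORTED_MODULES`, and `fetch_source` are hypothetical. `GetUUIDString()` and `GetFileSpec().GetFilename()` mirror the existing SBModule and SBFileSpec accessors, and the `Fake*` classes are stand-ins so the example runs without LLDB.

```python
from typing import Optional

# Hypothetical: names of the modules this callback knows how to serve.
SUPPORTED_MODULES = {"libfoo.so", "myapp"}


def fetch_source(build_id: str, path: str) -> Optional[str]:
    """Hypothetical fetcher; a real one would download and return a local path."""
    return f"/tmp/src-cache/{build_id}{path}"


def resolve_source_file(module, original_path: str) -> Optional[str]:
    """Hypothetical callback taking a module rather than an opaque build id."""
    name = module.GetFileSpec().GetFilename()
    if name not in SUPPORTED_MODULES:
        return None  # not ours: let LLDB use its default lookup
    # The build id is still available from the module when needed.
    return fetch_source(module.GetUUIDString(), original_path)


class FakeFileSpec:
    """Stand-in for lldb.SBFileSpec."""

    def __init__(self, filename: str) -> None:
        self._filename = filename

    def GetFilename(self) -> str:
        return self._filename


class FakeModule:
    """Stand-in for lldb.SBModule."""

    def __init__(self, filename: str, uuid: str) -> None:
        self._spec, self._uuid = FakeFileSpec(filename), uuid

    def GetFileSpec(self) -> FakeFileSpec:
        return self._spec

    def GetUUIDString(self) -> str:
        return self._uuid
```

With only a build id, the early-return check on the module name would not be possible without an extra lookup table from build ids to module names.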

rahulvn389 · December 9, 2024, 6:17pm

That makes sense, since a ModuleSpec lets us infer different properties such as the build_id. I can change it to pass a ModuleSpec.

rahulvn389 · December 9, 2024, 6:47pm

Just a small correction, the API exposed to python is

SetResolveSourceFileCallback(const char* build_id, const FileSpec& original_source_file_spec, FileSpec &resolved_source_file_spec)

The one mentioned in the RFC is an internal API in Platform.cpp, used in ApplyFileMappings

labath · December 16, 2024, 1:14pm

I guess that could help someone, but why not pass the actual module instead of (just) the ModuleSpec ? The reason I’m asking for this is because the information that I would need (in the majority of cases I need to support anyway) is actually in a global (constant) variable inside the module. I couldn’t get what I need from a ModuleSpec. I’d need an actual module so I can look up that constant.

rahulvn389 · December 16, 2024, 5:39pm

Sure, I can add the friend class SBPlatform to SBModule.h so that it can do the SBModule to ModuleSP conversion here

labath · December 20, 2024, 2:33pm

Cool, thanks. (Don’t be afraid to make the SB classes friends of each other. The methods are private just so that the outside world can’t access them.)

rahulvn389 · December 21, 2024, 9:16am

Submitted 3 PRs for review. Please review when you get a chance.

github.com/llvm/llvm-project

[lldb][ResolveSourceFileCallback] Update SBFileSpec/SBModule

llvm:main ← rchamala:users/rachamal/lldb_api_1

opened 08:59AM - 21 Dec 24 UTC

rchamala

+9 -0

Summary: RFC https://discourse.llvm.org/t/rfc-python-callback-for-source-file-r…esolution/83545
SBFileSpec and SBModule will be used for resolve source file callback as Python function arguments. This diff allows these things.
Can be instantiated from SBPlatform.
Can be passed to/from Python.
Test Plan: N/A. The next set of diffs in the stack have unittests and shell test validation

github.com/llvm/llvm-project

[lldb][ResolveSourceFileCallback] Call resolve source file callback

llvm:main ← rchamala:users/rachamal/lldb_api_2

opened 09:01AM - 21 Dec 24 UTC

rchamala

+302 -14

Summary: RFC https://discourse.llvm.org/t/rfc-python-callback-for-source-file-r…esolution/83545
Updated LineEntry::ApplyFileMappings to call resolve source file callback if set.
include/lldb/Target/Platform.h, source/Target/Platform.cpp: Implemented SetResolveSourceFileCallback and GetResolveSourceFileCallback
include/lldb/Symbol/LineEntry.h, Source/Symbol/LineEntry.cpp: Implemented CallResolveSourceFileCallbackIfSet
Source/Target/StackFrame.cpp, Source/Target/StackFrameList.cpp, Source/Target/ThreadPlanStepRange.cpp: Updated the caller to ApplyFileMappings
unittests/Symbol/TestLineEntry.cpp: Added comprehensive ResolveSourceFileCallback tests.
Test Plan: Added unittests for LineEntry. ``` ninja check-lldb-unit ```

github.com/llvm/llvm-project

[lldb][ResolveSourceFileCallback] Implement API, Python interface

llvm:main ← rchamala:users/rachamal/lldb_api_3

opened 09:03AM - 21 Dec 24 UTC

rchamala

+347 -2

Summary: RFC https://discourse.llvm.org/t/rfc-python-callback-for-target-get-mo…dule/71580
Use SWIG for the resolve source file callback the same as other Python callbacks. TestResolveSourceFileCallback.py verifies the functionalities.
Test Plan: Added shell tests for validation ``` ./llvm-lit -sv TestResolveSourceFileCallback.py ```
Differential Revision: https://phabricator.intern.facebook.com/D67541203

rahulvn389 · December 22, 2024, 7:08pm

A somewhat similar concern was raised for custom module resolution (RFC), where it was decided to make it a Platform API, since a PlatformSP is always retained per Target and should be able to resolve modules associated with the debugger target. I took a similar approach for source files, and would be happy to explore any alternative solutions.

rahulvn389 · January 2, 2025, 9:10pm

@labath Happy New Year!!

Hope you are doing well. Thanks for reviewing the first PR, [lldb][ResolveSourceFileCallback] Update SBModule by rchamala · Pull Request #120832 · llvm/llvm-project · GitHub. I am waiting on my request for merge access (Request Commit Access For rchamala · Issue #121244 · llvm/llvm-project · GitHub) and would appreciate it if you could approve it, so that I can complete the PR.

JDevlieghere · January 6, 2025, 7:01pm

Is this a limitation of debuginfod (i.e. it doesn’t support this yet) or a limitation of LLDB’s support for debuginfod (i.e. we don’t know how to ask debuginfod for source files)? I’m assuming the former because otherwise it seems like we should invest in the latter?

The mention of debuginfod makes me wonder if we should make this a property of the SymbolLocator plugin. I have a very similar use case where (in the near future) I will have to teach the symbol locator how to fetch source files.

If that’s the route we want to go, should we consider making a scripted symbol file plugin? With all the work @mib did for scripted processes, it’s pretty straightforward to add a scripted-anything, and I think that might provide more flexibility down the line than a callback.

rahulvn389 · January 6, 2025, 11:19pm

I am not sure about the former (debuginfod support), but I believe LLDB does not currently have support for debuginfod. From what I gather, there have been discussions around implementing custom symbol downloads and source file downloads using debuginfod, but it is not present yet. In particular, source file support with debuginfod needs to take security concerns into account, as it deals with sensitive source code information.

The changes required in LLDB to support debuginfod would also touch the same areas as the submitted PRs. Until that is implemented, we already have the custom module callback (RFC: Python callback for Target get module) implemented in LLDB. In addition, we wanted a custom source file callback as a way for users to specify their own logic for fetching source files without worrying about security concerns.

A scripted symbol file plugin is new to me but sounds interesting. Does it allow the same flexibility for users to override how source files are fetched?

labath · January 7, 2025, 3:21pm

Using platform callbacks for finding modules does not sound particularly surprising to me because platforms are already intimately involved in locating modules. Using platforms for finding source files seems a bit more fuzzy, since I think there’s no precedent for something like that. It also limits your options somewhat since modules (by design) don’t know which platform created them, so you can never find the source file callback from inside the module (only from things like target which have a platform available).

Note I’m not saying this is a bad design, just that it’s worth giving it a second thought.

labath · January 7, 2025, 3:34pm

I think it’s mostly the latter. I am not sure about the implementations, but I believe the protocol itself does support source file downloads.

That said, I think it’d be still useful to support an extensible method of downloading/locating source files, as not everything runs on debuginfod. (I mean, I suppose you could make a fake debuginfod server which talks to lldb and implements the custom logic under the hood, but that seems somewhat convoluted.) I think that sort of aligns with your idea of putting this inside a ScriptedSomething, except that I don’t think that “Something” should be a SymbolFile. Symbol files plugins are incredibly complicated and most (if not all) of that complexity does not have anything to do with finding source files. (It’s true that DWARF>=5 can include source code inside the line table, but I’d argue that this discussion is only relevant for files which don’t do that.) I’d probably say it should be a ScriptedSymbolLocator, since that would be the natural place to put the debuginfod source file downloading code as well. (Bonus points for making it general enough so that one can implement a DebuginfodSymbolLocator in python.)

JDevlieghere · January 7, 2025, 4:37pm

Sorry for the confusion… I meant SymbolLocator, not SymbolFile. I turned the former into a plugin when the debuginfod work was being done, knowing I’d need something similar in the future (i.e. the thing I hinted at earlier). I totally agree that SymbolFile would be the wrong place to put this.

rahulvn389 · January 8, 2025, 3:15am

Do you think TargetSP is a better place for this callback? Apart from that, I can’t think of any alternatives. If you have any alternative solutions, I would love to explore them.

jingham · January 9, 2025, 3:48pm

I don’t think that would address Pavel’s concern. You also can’t get to the Target that’s using the Module from the Module itself (since the Module might be shared by many Targets.)

rahulvn389 · January 11, 2025, 7:38pm

Based on the discussion, I see two approaches. I would appreciate your thoughts on whether either of them seems reasonable:

1. Make the callback a Module API, since Modules are independent and can be used on their own to locate source files instead of relying on Platform.
2. Use SymbolLocator plugins: as Jonas has mentioned, I could make it part of this plugin.

labath · January 13, 2025, 1:20pm

Jim is correct. Target is in the same boat as platform. For it to work the way I want(ed) to, this would have to be a module-level API. OTOH, I don’t think we have any module-level callbacks right now, and you don’t seem to need it for what you’re trying to do, so it does need to be balanced against that as well.

That said, I think Jonas’s idea of doing a scripted symbol locator plugin is the most principled approach, and it’s the one I’d recommend.
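To make the idea a bit more concrete, here is a rough sketch of what a scripted symbol locator could look like from the Python side. None of this is an existing LLDB interface: `ScriptedSymbolLocator`, `locate_source_file`, `PrefixRemapLocator`, and `locate` are all hypothetical names, and the chaining behaviour (first locator to answer wins) is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Optional


class ScriptedSymbolLocator(ABC):
    """Hypothetical base class; not an existing LLDB interface."""

    @abstractmethod
    def locate_source_file(self, module_uuid: str, original_path: str) -> Optional[str]:
        """Return a local path for the file, or None to defer to other locators."""


class PrefixRemapLocator(ScriptedSymbolLocator):
    """Example locator: rewrite a build-machine prefix to a local checkout."""

    def __init__(self, build_prefix: str, local_prefix: str) -> None:
        self.build_prefix = build_prefix
        self.local_prefix = local_prefix

    def locate_source_file(self, module_uuid, original_path):
        if original_path.startswith(self.build_prefix):
            return self.local_prefix + original_path[len(self.build_prefix):]
        return None


def locate(locators, module_uuid: str, original_path: str) -> Optional[str]:
    """Ask each locator in turn; the first non-None answer wins."""
    for locator in locators:
        found = locator.locate_source_file(module_uuid, original_path)
        if found is not None:
            return found
    return None
```

Compared with a single callback, a class-based locator leaves room for more methods later (for example, locating symbol files), which is the extra flexibility Jonas mentioned.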
