Heterogeneous debugging and HSA

Hi everyone,

I am a member of the HSA Foundation tools group, who are attempting to standardise debugging and profiling interfaces for heterogeneous systems.

HSA is a low-level heterogeneous system architecture and software platform. There are standards describing the architectural requirements, an LLVM-IR-like intermediate language for portable HSA applications (HSAIL), and a runtime interface for interacting with the system. We are currently developing a standard for debugging HSA applications in a vendor-independent manner.

By working together with other communities, we are hoping to extend existing debuggers and standards to support such systems. To this end, I’m hoping to start a conversation with you all about how LLDB could be augmented to support heterogeneous debugging, and to ask for input about the standards we are writing.

To aid these discussions and provide a testbed for ideas, I have modified LLDB to support debugging HSA programs on AMD’s hardware. My aim is to adapt this port to the standard HSA debugging interface once that interface is finalised, so that it can debug HSA applications on any vendor’s implementation.

Currently, my changes to LLDB are in no state for upstreaming to trunk, as I have made some HSA-specific alterations to core parts of the code base. My hope is that, aided by discussion with the community, I can abstract out these modifications to support the necessary features in an elegant and extensible manner. This would most likely involve changing LLDB's threading model to support groups of threads which run on different devices, along with a “some-stop” model in which some thread groups can continue to execute whilst others are stopped. I plan on starting work on this soon, but I wonder if anyone else is already exploring this avenue?
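
To make the some-stop idea a little more concrete, here is a minimal Python sketch. It is not taken from the LLDB port and does not use any LLDB API; ThreadGroup, DebugSession, the device names and the thread IDs are all hypothetical, chosen only to show thread groups on different devices being stopped and resumed independently.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"


@dataclass
class ThreadGroup:
    """Hypothetical: a set of threads that all execute on one device."""
    device: str
    thread_ids: list
    state: RunState = RunState.RUNNING


@dataclass
class DebugSession:
    """Hypothetical: tracks thread groups and stops or resumes each one independently."""
    groups: dict = field(default_factory=dict)

    def add_group(self, group):
        self.groups[group.device] = group

    def stop(self, device):
        # Only the named group stops; every other group keeps its current state.
        self.groups[device].state = RunState.STOPPED

    def resume(self, device):
        self.groups[device].state = RunState.RUNNING

    def running_devices(self):
        return sorted(d for d, g in self.groups.items()
                      if g.state is RunState.RUNNING)


session = DebugSession()
session.add_group(ThreadGroup("cpu", [1, 2]))
session.add_group(ThreadGroup("gpu", [100, 101, 102]))

# Stop the GPU work-items (say, at a kernel breakpoint) while the host threads run on.
session.stop("gpu")
```

In an all-stop model, stop() would instead have to mark every group as stopped; keeping the run state per group is the only point this sketch is trying to make.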

If you would like to examine the code for my LLDB port or try it out, you can get the code here:

For our debugging standard, we are trying to work out the best way to express HSA's multiple address spaces in DWARF, and to get an idea of how GDB and LLDB handle them. Our current thinking is to use DW_AT_address_class for pointer types and DW_OP_xderef for variables. I couldn’t find anywhere in LLDB where these are handled; is there any ongoing work to add them? Or would you recommend a different approach?
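
For anyone less familiar with DW_OP_xderef, the following Python sketch shows roughly how a debugger's expression evaluator might treat it: the top of the stack is taken as an address, the entry beneath it as an address space identifier, and both are replaced by the value read from that space. The opcode values are the ones defined by DWARF; the address space numbering, the memory layout and the evaluate() helper are assumptions for illustration only, and nothing here reflects how HSA will actually number its segments.

```python
DW_OP_constu = 0x10  # push an unsigned constant operand
DW_OP_xderef = 0x18  # pop address, pop address-space id, push the value read

# Hypothetical address-space identifiers; the real mapping is undecided.
GLOBAL_SPACE = 0
GROUP_SPACE = 1


def evaluate(expr, memory):
    """Evaluate a tiny subset of DWARF expression ops.

    expr is a list of (opcode, *operands) tuples; memory maps an
    address-space id to a dict of address -> value.
    """
    stack = []
    for op, *operands in expr:
        if op == DW_OP_constu:
            stack.append(operands[0])
        elif op == DW_OP_xderef:
            address = stack.pop()  # top of stack: the address
            space = stack.pop()    # next entry: the address-space id
            stack.append(memory[space][address])
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
    return stack[-1]


# The same numeric address holds different data in each address space.
memory = {
    GLOBAL_SPACE: {0x1000: 7},
    GROUP_SPACE: {0x1000: 42},
}

expr = [
    (DW_OP_constu, GROUP_SPACE),
    (DW_OP_constu, 0x1000),
    (DW_OP_xderef,),
]
value = evaluate(expr, memory)
```

The example reads address 0x1000 from the hypothetical group space rather than the global one, which is the distinction a plain DW_OP_deref cannot express.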

Feel free to send me any queries or issues regarding the LLDB implementation or the HSA standards work.

Thanks,

Simon Brand