Has anyone already submitted a form for a Debug Info round table at the upcoming EuroLLVM conference?
If not and anyone is interested in attending one, I’m happy to do it. There are a few things I think would be good candidates to discuss to start with:
Removing debug intrinsics. Various subtopics include deprecation timeline, how this has impacted downstream projects etc.
Enabling encoding address spaces in LLVM debug info and DWARF (AMDGPU folks’ work, cc @slinder1)
The future of line tables. Attribution (of instructions to line numbers) & stepping (marking some instructions that are attributed a line number as “uninteresting” from a debugger’s perspective). Consideration for profilers (better attribution means we probably need a way to signal to profiling tools to ignore particular lines), etc.
I’m personally curious about others’ usage of call site info and entry_value too.
Does anyone want to be put down as a contact/co-organizer for the round table (please provide an email address)? And would anyone like to volunteer to take notes?
Does anyone have anything else they’d like to discuss?
Can’t say I’ll be around for EuroLLVM (I’ve never actually made it out for EuroLLVM - I really should some time… ) - look forward to reading the notes, though
Thanks to everyone who attended. The session felt productive and… hopeful?
Special thanks to @jryans and @StephenTozer for taking notes (please do share those when you get the time - it hasn’t been very long, I just wanted to say I’m happy to help if they need sifting through).
It was great to see both familiar and new people in the debug info space this year! I have included my notes below. Please let me know if you spot anything that should be edited or clarified.
Attendees
J. Ryan Stinnett (King’s College London)
Orlando Cazalet-Hyams (SN Systems / Sony)
Greg Bedwell (SN Systems / Sony)
Stephen Livermore-Tozer (SN Systems / Sony)
Lukáš Korenčik (Trail of Bits)
Artem G (Intel)
Djordje Todorovic (Syrmia)
John R (VMS)
Alexis Engelke (Technical University of Munich)
Mohamed Ismail Bennani (Apple)
Michael Buch (Apple)
Adrian Prantl (Apple)
Tom J (Siemens)
Keith Walker (Arm)
Walter Erquinigo (Modular)
Billy Zhu (Modular)
Matt Arsenault (AMD)
Hans Wennborg (Google)
Reid Kleckner (Google)
Notes
OCH: Replacing debug intrinsics with debug records
Very close to turning this on everywhere
Speed increase at compile time
SLT: Working everywhere in optimisation and code gen
In the process of using records everywhere front to back
JR: Should I change to records in my frontend?
SLT: DIBuilder should handle this for you automatically
DT: Is there anything backend developers need to know?
OCH: If you have a downstream code gen, then perhaps, but otherwise it should be okay
SLT: Docs for new IR syntax
RK: Do you have a measurement to show compile time benefit?
SLT: Don’t have a full benchmark yet…
OCH: On the order of 5% compile time improvement
OCH: Should generally avoid cases of debug info changing code gen (in optimisation)
SLT: Don’t have an equivalent in debug records in MIR yet
GB: Used to have so many bugs where debug info would affect code gen, but this really helps with a lot of them
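The point about debug info no longer changing code gen can be illustrated with a deliberately simplified model. The sketch below uses hypothetical toy types (`ToyInst`, `ToyInstWithRecords`), not LLVM’s real `Instruction` or `DbgRecord` classes. It only shows why a pass that walks the instruction stream cannot trip over debug info once that info is stored as records attached to instructions, rather than as instructions itself.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR, intrinsic style: debug info is an instruction
// (a call to dbg.value) sitting in the instruction stream.
struct ToyInst {
  std::string opcode;              // e.g. "add", "store", "dbg.value"
  bool isDebugIntrinsic = false;
};

// Hypothetical toy IR, record style: debug records are attached to the
// instruction that follows them, so they are not instructions at all.
struct ToyInstWithRecords {
  std::string opcode;
  std::vector<std::string> debugRecords;  // e.g. "#dbg_value(...)"
};

// A naive size heuristic (say, for a duplication threshold) that forgets to
// skip debug intrinsics: its answer differs between -g and non -g builds.
std::size_t naiveSizeIntrinsics(const std::vector<ToyInst> &block) {
  return block.size();  // bug: dbg.value calls are counted as well
}

// The same heuristic over the record-style model: there is nothing to forget,
// because debug records are never visited as instructions.
std::size_t naiveSizeRecords(const std::vector<ToyInstWithRecords> &block) {
  return block.size();
}
```

With intrinsics, every such heuristic has to remember to filter out debug instructions. With records, the instruction stream looks the same with or without `-g`.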
SLT: Instruction referencing
MIR feature that changes how debug info works there
Instead of referring to a virtual register, we reference the instruction that produces the value
Has been turned on for x86 for a while and seems to work well
Is it something people are interested in for other targets?
OCH: Open work for multiple variables
AP: Apple looking at porting this to AArch64
SLT: Mostly target independent, but may need to know about target-specific value moves
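Here is a rough sketch of the instruction-referencing idea under simplifying assumptions. `InstrRef` and `ValueLocationTracker` are hypothetical names, not LLVM’s MIR classes. A debug user names the defining instruction (by a stable number and operand index) instead of a register. A late analysis then works out where that value currently lives. In a real backend, recognising copies and spills is the part that plausibly needs target knowledge.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical: a debug user refers to "operand N of instruction number M"
// rather than to the register the value happened to be in.
struct InstrRef {
  unsigned instrNum;
  unsigned operandIdx;
};

// Hypothetical late tracking pass: records where each defined value lives
// as copies and spills move it, then resolves debug references at the end.
class ValueLocationTracker {
public:
  // Called when the value is defined, copied, or spilled (in a real backend,
  // detecting copies and spills is where target-specific hooks come in).
  void defineOrMove(InstrRef ref, const std::string &loc) {
    locs_[key(ref)] = loc;
  }

  // Resolve a debug reference to the value's current location, if known.
  std::optional<std::string> resolve(InstrRef ref) const {
    auto it = locs_.find(key(ref));
    if (it == locs_.end())
      return std::nullopt;
    return it->second;
  }

private:
  static std::pair<unsigned, unsigned> key(InstrRef r) {
    return {r.instrNum, r.operandIdx};
  }
  std::map<std::pair<unsigned, unsigned>, std::string> locs_;
};
```

The design point is that optimisations which change registers do not have to update debug users. Only the defining instruction needs to keep its number.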
AP: Effort to integrate CAS
Want to build caching compilers
Deduplicate data naturally and easily
Debug info contains tons of redundant data
Looking at partitioning debug info to expose redundancy
Found a scheme that doesn’t need many DWARF changes
Built drop-in replacement for ccache on top of this CAS, much better in terms of performance and cache size
TJ: CAS upstreaming progress
AP: Needs more people to review
AP: Very sure LLVM CAS will make it upstream, just a matter of working through the process
LLVM CAS is a framework that could be used for these caching and debug info features
OCH: Do you have any numbers?
AP: See the previous talk; there are some numbers there
RK: Seems like great technology
Windows linkers use CAS for deduplication
DT: Anything specific to Swift?
AP: No, it’s actually Clang first
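This is a toy illustration of why content-addressed storage deduplicates “naturally”. Objects are keyed by a hash of their contents, so identical debug info fragments from different compilations collapse into one stored object. `ToyCAS` is a hypothetical class, not the LLVM CAS API. FNV-1a is used only to keep the sketch short, and a real system would likely use a cryptographic hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// FNV-1a 64-bit hash; chosen for brevity, not collision resistance.
std::uint64_t fnv1a64(const std::string &data) {
  std::uint64_t h = 14695981039346656037ull;
  for (unsigned char c : data) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

// Hypothetical content-addressed store: the ID of an object is derived from
// its bytes, so storing the same bytes twice yields the same ID and one copy.
class ToyCAS {
public:
  std::uint64_t put(const std::string &blob) {
    std::uint64_t id = fnv1a64(blob);
    auto [it, inserted] = objects_.emplace(id, blob);
    if (!inserted && it->second != blob)
      throw std::runtime_error("hash collision");  // sketch: just give up
    return id;
  }
  const std::string &get(std::uint64_t id) const { return objects_.at(id); }
  std::size_t size() const { return objects_.size(); }

private:
  std::unordered_map<std::uint64_t, std::string> objects_;
};
```

If debug info is partitioned so that a commonly repeated piece (for example, the description of a shared type) becomes its own object, every translation unit that emits it gets the same ID and the data is stored once.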
DT: Mojo presentation about MLIR debug info
RK: No value tracking in all MLIR…?
BZ: Yes, only in LLVM dialect for now, but could be core
WE: Want all compiler engineers to care about debugging
RK: How does debug info work in MLIR?
BZ: Uses dbg.value, DIExpression for now
RK: Worried about duplication
SLT: On the contrary, it’s hopeful to see the same answers showing up in both
JRS: Hopefully we can see more sharing between communities
SLT: Location views
Could be helpful for a variety of line coverage improvements
AP: What about cases where you have one location or the other, how do you visualise it?
SLT: When hoisting out of a block, could encode that an instruction belongs to several lines
Could be doing much better in terms of line info with more expressivity
OCH: DWARF expressivity and debugger visualisation are the main obstacles
KW: Two level line tables not on the DWARF backlog
SLT: Not entirely surprised, it really inflates the line tables
SLT: Location views should be enough with better conceptual design
AP: Even if LLVM supports location views, there’s no point if debuggers don’t consume them
RK: Would love to have it for PGO, more applications than just debugging
AP: For the “either/or” example, we need to handle it properly
Different cases: Hoisting from different blocks vs. multiple source lines merged into one instruction
SLT: Use LBR in DWARF expression to work out where you came from
OCH: Needs a DWARF extension to say these things need relocation
SLT: Misplaced instructions
With speculation, instructions are hoisted out of their blocks
Currently we drop debug info for line table correctness
Would like to extend the line table to mark instructions as misplaced, so as not to confuse profilers and debuggers
This would at least allow stepping
AP: Frontier definitely being pushed forward
SLT: Key instructions
Looking at using this to give real meaning to is_stmt flag
Find instructions that primarily produced the value
JR: Had this kind of “semantic” event in Alpha compiler
GB: What do debuggers do currently with is_stmt?
AP: Debuggers currently use complex heuristics
RK: Which line do you use?
SLT: Line that produces user-visible state change?
SLT: Really mean source coordinates (including column info) for sub-expressions where they also produce user-visible state changes
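Here is a simplified sketch of how giving `is_stmt` real meaning could change where a debugger stops. A debugger placing a breakpoint on a line typically prefers addresses whose line-table row has `is_stmt` set. If only the “key” instruction for each line keeps the flag, the breakpoint lands on the instruction that actually performs the visible effect. `LineRow`, `breakpointFor`, and `markKeyInstructions` are hypothetical names. Real line tables also carry file, column, and other state.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified DWARF line-table row.
struct LineRow {
  std::uint64_t address;
  unsigned line;
  bool isStmt;
};

// Breakpoint placement: take the first row for `line` that has is_stmt set.
std::optional<std::uint64_t> breakpointFor(const std::vector<LineRow> &table,
                                           unsigned line) {
  for (const LineRow &row : table)
    if (row.line == line && row.isStmt)
      return row.address;
  return std::nullopt;
}

// Hypothetical key-instruction marking: is_stmt is kept only on rows whose
// instruction was flagged as producing its line's user-visible state change.
std::vector<LineRow> markKeyInstructions(std::vector<LineRow> table,
                                         const std::vector<bool> &isKey) {
  for (std::size_t i = 0; i < table.size() && i < isKey.size(); ++i)
    table[i].isStmt = isKey[i];
  return table;
}
```

Consider a table where a load for line 10 was scheduled before work from line 11, and line 10’s store comes last. With every row marked `is_stmt`, the breakpoint for line 10 lands on the load. After marking only the store as key, it lands on the store.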
SLT: Og
Extend lifetimes feature
Inserts fake uses to keep variables alive for better debug info
Seems to work well
Decent payoff in terms of debug info gained vs. performance change
“O2g”
A mode like O2 with a few passes removed plus extend lifetimes
TJ: We’ll certainly use it
JR: We used to have something like this, and would use it too
Sony has exposed it downstream since ~2016 or so
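This toy model shows the trade-off behind extended lifetimes. It does not describe how Clang or LLVM implement the feature. Without extension, a variable’s location typically ends after its last real use, because its register may be reused afterwards. A fake use on the last instruction of the scope acts as one more use, so the value (and its location) survives to the end of the scope. The cost is that it occupies a register or stack slot for longer. `LiveRange` and `variableRange` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Half-open range [begin, end) of instruction indices where the
// variable's location is valid.
struct LiveRange {
  unsigned begin;
  unsigned end;
};

// Hypothetical model of a variable defined at index `def` and used at
// `uses`, inside a scope covering indices [0, scopeEnd).
LiveRange variableRange(unsigned def, const std::vector<unsigned> &uses,
                        bool extendLifetime, unsigned scopeEnd) {
  unsigned last = def;
  for (unsigned u : uses)
    last = std::max(last, u);
  // A fake use on the scope's last instruction counts as a final use.
  if (extendLifetime && scopeEnd > 0)
    last = std::max(last, scopeEnd - 1);
  return {def, last + 1};
}
```

For a variable defined at index 2 and used at 3 and 5 in a 10-instruction scope, the plain range is [2, 6) and the extended range is [2, 10).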
WM: How do you measure?
SLT: We have the Dexter tool for integration testing
More lines covered, more entry value availability
GB: Ryan’s coverage tool could also measure this
GB: Please add our O2g to your measurements as well
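A common way to quantify “more lines covered” for variables is the fraction of a variable’s enclosing scope for which a location is available. Tools such as `llvm-dwarfdump --statistics` report related numbers. This sketch computes that fraction from a scope’s address range and a list of location ranges, merging overlaps. It is a simplified illustration, not the method used by Dexter or by Ryan’s coverage tool. `Range` and `coverage` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<std::uint64_t, std::uint64_t>;  // [lo, hi)

// Fraction of `scope` covered by the union of `locs` (clipped to the scope).
double coverage(Range scope, std::vector<Range> locs) {
  std::uint64_t scopeSize = scope.second - scope.first;
  if (scopeSize == 0)
    return 0.0;
  // Clip each location range to the scope; ranges outside become empty.
  for (Range &r : locs) {
    r.first = std::max(r.first, scope.first);
    r.second = std::min(r.second, scope.second);
  }
  std::sort(locs.begin(), locs.end());
  std::uint64_t covered = 0, curLo = 0, curHi = 0;
  bool open = false;
  for (const Range &r : locs) {
    if (r.first >= r.second)
      continue;  // empty after clipping
    if (!open || r.first > curHi) {
      if (open)
        covered += curHi - curLo;  // close the previous merged interval
      curLo = r.first;
      curHi = r.second;
      open = true;
    } else {
      curHi = std::max(curHi, r.second);  // overlapping or adjacent: extend
    }
  }
  if (open)
    covered += curHi - curLo;
  return static_cast<double>(covered) / static_cast<double>(scopeSize);
}
```

For a scope of [0x100, 0x200) with locations [0x100, 0x140), [0x120, 0x180), and [0x1c0, 0x240), the merged coverage is 0x80 + 0x40 = 192 of 256 bytes, i.e. 0.75.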