Approximating LLDB's breakpoint location decisions

Hi LLDB devs,

@pogo59 and I are putting together a tool for a hackathon that computes a statistic similar to the one from llvm-locstats. llvm-locstats uses llvm-dwarfdump --statistics to calculate “%PC bytes in scope covered by variable locations”, whereas we’re calculating “% of breakpoints in scope covered by variable locations”. Part of this involves knowing all the breakpoint locations in the program being measured. The tool has two modes: A) accept a list of pre-generated breakpoint locations, B) approximate breakpoint locations by looking at the line table.
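To make the statistic concrete, here is a minimal sketch of one plausible reading of it for a single variable. The exact definition the tool uses is not spelled out in this thread, so the AddrRange and Coverage types and the breakpointCoverage function are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [Low, High).
struct AddrRange {
  uint64_t Low;
  uint64_t High;
};

// Hypothetical result: breakpoints inside the variable's scope, and how many
// of those also fall inside one of the variable's location ranges.
struct Coverage {
  size_t InScope;
  size_t Covered;
};

static bool containsAddress(const std::vector<AddrRange> &Ranges,
                            uint64_t Address) {
  for (const AddrRange &R : Ranges)
    if (Address >= R.Low && Address < R.High)
      return true;
  return false;
}

// Assumption: a breakpoint is "covered" when its address lies in both the
// scope ranges and the location ranges of the variable.
Coverage breakpointCoverage(const std::vector<uint64_t> &Breakpoints,
                            const std::vector<AddrRange> &Scope,
                            const std::vector<AddrRange> &Locations) {
  Coverage Result = {0, 0};
  for (uint64_t Address : Breakpoints) {
    if (!containsAddress(Scope, Address))
      continue;
    ++Result.InScope;
    if (containsAddress(Locations, Address))
      ++Result.Covered;
  }
  return Result;
}
```

The percentage would then be 100 * Covered / InScope, summed over variables in whatever way the tool chooses.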

For mode A, the list passed to the tool is assumed to be generated by a debugger. We’re (perhaps naively) using lldb -o "breakpoint set -p .* --all-files" -o "b" -o "quit" and just reading the addresses of the set breakpoints to build the list. For mode B we currently have this simple code to build the list of breakpoint locations:

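  for (const DWARFDebugLine::Row &R : LT->Rows) {
    // An end_sequence row marks the first address past a sequence, not an
    // instruction, so it can never be a breakpoint location.
    if (R.EndSequence)
      continue;
    // Only is_stmt rows are recommended breakpoint locations. Skip a row
    // whose address matches the previously recorded one.
    if (R.IsStmt && (BreakpointLocations.empty() ||
                     BreakpointLocations.back() != R.Address.Address))
      BreakpointLocations.push_back(R.Address.Address);
  }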

We’re seeing a 4x increase in breakpoint locations when using this method compared to mode A and are wondering what could be causing such a consistently large increase.

The code above clearly doesn’t account for prologue_end; it will “set breakpoints” in the prologues. LLDB is not setting breakpoints in prologues so that will account for some of the difference. But 4x seems like a lot so we figure there might be something else at play.
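A minimal sketch of how the loop could skip prologues, assuming the function entry addresses are available from elsewhere (for example from DW_AT_low_pc of each subprogram). LineRow is a hypothetical stand-in for the DWARFDebugLine::Row fields used here, and collectPastPrologue is an illustrative name. This is not how LLDB computes prologue sizes; in particular, a function with no prologue_end row contributes no locations at all in this version.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for the DWARFDebugLine::Row fields used below.
struct LineRow {
  uint64_t Address;
  bool IsStmt;
  bool EndSequence;
  bool PrologueEnd;
};

// Collects is_stmt addresses, skipping rows that lie between a function's
// entry address and its first prologue_end row.
std::vector<uint64_t>
collectPastPrologue(const std::vector<LineRow> &Rows,
                    const std::set<uint64_t> &FunctionStarts) {
  std::vector<uint64_t> Locations;
  const uint64_t *CurrentFn = nullptr; // Entry address of the current function.
  bool InPrologue = false;
  for (const LineRow &R : Rows) {
    if (R.EndSequence) {
      CurrentFn = nullptr;
      InPrologue = false;
      continue;
    }
    // Entering a new function: assume we are in its prologue until a
    // prologue_end row says otherwise.
    auto It = FunctionStarts.find(R.Address);
    if (It != FunctionStarts.end() && (!CurrentFn || *CurrentFn != *It)) {
      CurrentFn = &*It;
      InPrologue = true;
    }
    if (R.PrologueEnd)
      InPrologue = false;
    if (InPrologue || !R.IsStmt)
      continue;
    if (Locations.empty() || Locations.back() != R.Address)
      Locations.push_back(R.Address);
  }
  return Locations;
}
```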

In addition, modifying this code to ensure that we only get one breakpoint per line has no effect on the total number of breakpoints.

Does anyone have any ideas for why the number of breakpoints set by our simple code is much higher than LLDB, or have any troubleshooting tips?

Many thanks,
Orlando

Aha - it appears that breakpoint set -p doesn’t set breakpoints in headers for whatever reason (I’m unsure if this is intended or a bug). Even when a header is specified explicitly with -f <filename> rather than using --all-files, headers seem to be skipped.

This results in inline function and class method definitions in headers getting no breakpoints using that method!

I’ve got a work-around: use lldb --source cmds where cmds contains a list of b <file>:<line> commands, constructed by reading the file names from DWARF and adding a breakpoint for each line in [1, num_lines_in_file].
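For illustration, a minimal sketch of generating that command file. The function name buildBreakpointCommands and the idea of passing in a map from file name to line count are assumptions; how the file names and line counts are gathered from DWARF and the sources is left out.

```cpp
#include <map>
#include <sstream>
#include <string>

// Builds the contents of an lldb command file: one `b <file>:<line>` command
// for every line in [1, num_lines_in_file] of each file.
// LineCounts maps a file name to its number of lines (hypothetical input).
std::string
buildBreakpointCommands(const std::map<std::string, unsigned> &LineCounts) {
  std::ostringstream Out;
  for (const auto &Entry : LineCounts)
    for (unsigned Line = 1; Line <= Entry.second; ++Line)
      Out << "b " << Entry.first << ":" << Line << "\n";
  return Out.str();
}
```

The returned string can be written to cmds and passed to lldb --source cmds. The number of commands grows with the total line count of all files, which is consistent with this approach being slow on large programs.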

There’s still a 25%-ish difference in breakpoint count between using lldb and reading the line-table directly to guess breakpoint locations, but at that scale I’m more comfortable with the idea that it’s probably just down to breakpoints being set in prologues in the latter.

I’m fairly happy that I’ve answered my original question. But this approach isn’t viable for large programs as it is very slow.

Could anyone suggest a less hacky (and faster) way to generate a list of all the locations at which lldb would set breakpoints for a given program?

Note that lldb’s file and line breakpoint resolver’s goal is to make useful breakpoints for a given source entity. Setting a location on each entry that matches the given file & line in the line table does not achieve that purpose. For instance, it’s not uncommon for there to be multiple contiguous line entries for the same line, and usually if lldb were to stop at each of them it would be annoying and unhelpful. There are other slightly less trivial instances of this same problem. So to that end, lldb coalesces locations when two line table entries are in the same block, and does a bunch of other heuristic tricks to try to avoid making “too many” stops for a given source entry. This logic is fairly complicated and changes (including just recently) as we discover better heuristics.
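As a rough illustration of only the simplest case above (multiple contiguous line entries for the same line), a line-table-based approximation could keep just the first entry of each run. This is a sketch, not lldb's resolver: it ignores blocks and all the other heuristics, and the LineEntry type and coalesceContiguous name are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified line-table entry.
struct LineEntry {
  uint64_t Address;
  uint32_t File;
  uint32_t Line;
};

// Keeps the address of the first entry in every run of contiguous entries
// that share the same file and line. Entries are assumed to be in
// address order within a single sequence.
std::vector<uint64_t>
coalesceContiguous(const std::vector<LineEntry> &Entries) {
  std::vector<uint64_t> Locations;
  const LineEntry *Prev = nullptr;
  for (const LineEntry &E : Entries) {
    if (!Prev || Prev->File != E.File || Prev->Line != E.Line)
      Locations.push_back(E.Address);
    Prev = &E;
  }
  return Locations;
}
```

A line that reappears after an intervening different line still gets a second location here, since only adjacent duplicates are merged.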

So if you want to track all the possible source locations, lldb’s file and line breakpoints are not the right tool. It is explicitly trying not to do that… But if you want to know all the places lldb would actually set breakpoint locations, you’re probably better off asking lldb rather than trying to reimplement the resolver’s logic.

Jim

Hi @jingham, thanks for the response!

> But if you want to know all the places lldb would actually set breakpoint locations, you’re probably better off asking lldb

Do you have any tips on how to achieve this? I tried a few different approaches (mentioned in comment 1) but ran into trouble.

The “trouble” I remember is that for a large binary, with Nx100k breakpoints, it took many hours to finish. Were there other issues that I have blotted from my memory?

Now that we’ve identified headers as the primary cause of the discrepancy between what lldb reported and what’s in the line table, I’m not fussed about the remaining difference. If those are the breakpoints that lldb sets, then those are the points of interest for coverage statistics (at least related to lldb).