I’m preparing to propose some additional error checks for the DWARF debug line parser which we have locally. However, there have been some valid comments in the past that some errors really shouldn’t prevent parsing for certain situations. For example, a line_range of 0 would only be an issue if anything needed that information (and would consequently divide by zero), whilst the include directories and file names table should be terminated with null bytes, but the parser could use the header length field to identify when the tables have ended instead.
After a brief discussion with Paul Robinson offline, I think it would be good to add different parsing modes to the debug line parser. I’ve got three possible categories:
“Dump” mode - this would be the most lenient mode. It is intended for tools like llvm-dwarfdump that merely want to print the contents to the best of their ability, even if the table is malformed/dodgy in some other way. It might print warnings, but these should not prevent it continuing to parse what it can.
“Strict” or “Verify” mode - this would be the strictest mode, which would emit an error if it sees things like the aforementioned bad line_range or filenames tables. Optionally, we could even extend this to also cover things like addresses that don’t make sense (possibly with or without special handling for fixups for GC-sections-processed output), or that could be a further mode.
“Consumer” mode - this would be somewhere in between the two above modes and would essentially do whatever a consumer such as LLDB wants. It might not need to be as strict as the “Strict” mode, but some guidance here from the consumers would be best.
I haven’t yet worked out exactly how this would work, but I think it would probably fit into the existing error handling mechanism, either with some extra conditionals or similar that say whether to report an error or not. I will hopefully be putting up some proposed patches later this week that illustrate all this.
I’d appreciate any thoughts people have on the different modes, whether we would need more/fewer, what they’d each cover etc.
(mailing lists can be busy - best to include any known-relevant parties on the ‘to’ line to help highlight the discussion for them)
My take on it: I’d be OK with only a strict mode with good error messages lazily created. (so you don’t get an error if you never needed to parse/lookup a file name, for instance - don’t want the strict mode to be inefficient by aggressively parsing portions of the file you don’t end up needing) If there are cases where consumers just need to be able to read invalid DWARF, yeah, could jump that hurdle when it comes up in a few ways I guess.
This sounds really interesting. A few things that come to mind:
- I'm curious what kind of errors you'd be okay with in dump mode but
not in consumer mode. From LLDB's perspective, we probably want to
extract as much information as possible from a malformed line table,
as long as we don't end up lying of course.
- I need to take another look at the code, but don't we already have
something similar where the consumer can decide how to deal with these
kinds of issues? I'm bringing this up because I don't think we really
need different parsing modes. I think we need to let the consumer
decide what to do with the potential errors. The verifier in dwarfdump
would presumably stop after the first error (~strict mode) while the
dumper would just move on. Would the parsing modes then be a set of
"presets" with regards to how to handle these issues?
- I'm in favor of creating these error messages lazily, especially
for LLDB where we care about responsiveness. However, that does
conflict with the behavior you want for the DWARF verifier. We
probably want a way to control this?
This sounds really interesting. A few things that come to mind:
I’m curious what kind of errors you’d be okay with in dump mode but
not in consumer mode. From LLDB’s perspective, we probably want to
extract as much information as possible from a malformed line table,
as long as we don’t end up lying of course.
I need to take another look at the code, but don’t we already have
something similar where the consumer can decide how to deal with these
kinds of issues? I’m bringing this up because I don’t think we really
need different parsing modes. I think we need to let the consumer
decide what to do with the potential errors. The verifier in dwarfdump
would presumably stop after the first error (~strict mode) while the
dumper would just move on. Would the parsing modes then be a set of
“presets” with regards to how to handle these issues?
I’m in favor of creating these error messages lazily, especially
for LLDB where we care about responsiveness. However, that does
conflict with the behavior you want for the DWARF verifier. We
probably want a way to control this?
For my part - I’d imagine the verifier would be an aggressive reader of DWARF - it’d use the same lazy/high quality error API, but the verifier code itself would try to walk all of the parts of the DWARF to manifest any lazy error cases, rather than needing any other codepath in the parser.
That would be my ideal setup as well (disclaimer: I work on lldb) --
have the parser provide sufficient information so that the verification
can be done from the "outside".
That is, if the goal is to have stronger verification of generated line
tables -- it's not fully clear (to me) whether that's your main
motivation for adding these checks.
Thanks for the comments. I should have done a bit more experimenting before writing up this RFC, it seems After sending it off, I played around with the existing code, tor refresh my memory of what I did a while back in adding the Recoverable/Unrecoverable Error callbacks. These are written with the intent that the client could decide how to handle errors, as Jonas points out, making the additional strictness level a little unnecessary. Unrecoverable errors stop the parsing, whereas recoverable allow it to continue as best it can. I think the current parser is stricter than it needs to be to provide useful information, so I’m going to put a patch up soon to loosen it up a bit before I start adding more checks.
The Errors reported by the parser only contain strings, which means that it’s difficult for a client to use them in some other way, for example to identify issues that it might need to know/completely ignore etc. Short of adding extra data to the Error, such as an enum, it’s unclear to me how to improve that situation. Of course, if we did add the extra data, it would allow LLDB and other clients to decide what errors to ignore/pay attention to on a finer grained basis, which partially achieves the lazy creation goal (only report the problems that apply to the things the client cares about), but doesn’t improve responsiveness since the whole program will still be parsed.
Lazy creation of errors is tricky, at least for some cases, given the current code state. The problem is that the parser reads the name table, calculates the rows etc as it parses a program, and that’s the only way of getting this data through the current interface. The only laziness available as things stand is to skip line tables that we don’t want (e.g. to find the line table at a given offset). This ignores the body of the skipped programs. The alternative is to refactor the parser completely into much smaller pieces, which provide access to the desired bits individually, and delays reading until that point. I can definitely see benefits in this approach, and I think it would solve the needs of things at both ends of the scale (verifiers and dumpers) but it’s a lot more work, which I’m not convinced I personally can take on, although I’d be happy to review it if others fancy attempting that option.
Having recently done something similar for .debug_loc(lists) (a much
simpler data structure) I can certainly appreciate the complexity and
commitment necessary to pull something like that off.
Anyway, I just wanted to say that I think that adding new fields to the
returned Errors (or even creating completely new ErrorInfo types) seems
perfectly reasonable to me -- I think that's one of the main reasons for
introducing the Error class in the first place.