DWARF debug line error handling changes

Hi all,

Since December, I’ve made several changes to the DWARF debug line parser to improve its error handling. Some examples include removing redundant error checks[1] and making address size mismatches non-fatal to parsing[2], with several more about to land or being planned.

David Blaikie recommended I post something to give a bit more context to these changes. I am a member of Sony’s Linker and Binutils team, and as part of the linking process, we make updates to the debug line section to handle GC-section processing and similar modifications. For these operations to be safe, we need the input line program to be in a good state, so, in addition to the existing LLVM error checks, we have added several further checks downstream. These were added downstream at the time for expediency, but we have always intended to bring them to the wider community. We now have the time to do this.

There are, broadly speaking, two kinds of changes we are making:

  1. Adding new checks to make sure the table is valid. Where possible, these are performed as late as is reasonable, so that we don’t emit errors or warnings until necessary, and potentially not at all in some cases.

  2. Making existing error checks less fatal, i.e. changing the code to allow the parser to continue even though something doesn’t look right. This allows consumers to gather more information, whether for display or otherwise. In several cases, the change is to assume that the length recorded in a field is correct, even if it doesn’t match the length the standard says it should have. This won’t always be the right thing to do (it might be that it is the length field itself that is incorrect), but providing more information, together with a warning indicating that something may not be quite right, is surely more useful than refusing to give the extra information. I picked the recorded length because other areas of the code already assume it to be correct, and use it to iterate to other parts of the line table section (see the sketch below).
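
To make that concrete, here is a rough sketch of the kind of iteration I mean (hand-written for this post with invented names, DWARF32 only, and ignoring the DWARF64 escape value; the real parser handles much more):

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Walk .debug_line, trusting each contribution's recorded unit_length
    // to find the next contribution, and warn rather than bail out when
    // the length runs off the end of the section.
    void walkDebugLine(const std::vector<uint8_t> &Section) {
      uint64_t Offset = 0;
      while (Offset + 4 <= Section.size()) {
        uint32_t UnitLength = 0;
        for (int I = 0; I < 4; ++I) // Little-endian unit_length field.
          UnitLength |= uint32_t(Section[Offset + I]) << (8 * I);

        uint64_t End = Offset + 4 + UnitLength;
        if (End > Section.size()) {
          fprintf(stderr,
                  "warning: contribution at 0x%llx has length 0x%x, which "
                  "runs off the end of the section\n",
                  (unsigned long long)Offset, (unsigned)UnitLength);
          break; // No trustworthy length, so we can't find the next table.
        }

        // ... parse (and possibly warn about) the table in [Offset, End) ...

        // Trust the recorded length to locate the next contribution, even
        // if the parsed content didn't exactly fill the claimed space.
        Offset = End;
      }
    }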

I hope this provides greater context for these changes!

Regards,

James

[1] https://github.com/llvm/llvm-project/commit/418cd8216b41f4c08e0e1b22feda381d9b2345da
[2] https://github.com/llvm/llvm-project/commit/07804f75a6cc506fada40c474f1e60840ce737d8

James wrote:
> 2. Making existing error checks less fatal, i.e. changing the code to allow the parser to continue even though something doesn’t look right.

Thanks for all your work. I’m sure that the additional error handling and error resilience will benefit other clients, such as LLDB, which also have to deal with inputs from a variety of producers.

> In several cases, the change is to assume that the length recorded in a field is correct, even if it doesn’t match what the standard says the length should be. [...] providing more information, together with a warning that should indicate that something may not be quite right, is surely more useful than refusing to give the extra information.

This is something we’ll need to decide on a case-by-case basis. If there are producers out there with known bugs that we want to support, this may be the right call. Generally though, I’d always prefer to err on the side of correctness over feeding potentially wrong or misleading information to the user. But let’s discuss this separately for each case.

– adrian

Thanks for all the work & context, James!

James wrote:
> 2. Making existing error checks less fatal [...] In several cases, the change is to assume that the length recorded in a field is correct, even if it doesn’t match what the standard says the length should be.

If the standard conflicts with the length at the point we read the length (without reading further), then yes, I think we should fail/warn/do something there (as I think we do; if you have a length that’s not long enough for the header we’re expecting to read, for instance). And if that means this contribution can’t be parsed (e.g. the header’s truncated), we treat it that way (this contribution is broken) and can still go on to the next contribution and read that.

> I picked the recorded length because other areas of the code already assume it to be correct, and use it to iterate to other parts of the line table section.

Hmm, I thought I saw somewhere (in the many active reviews at the moment, some from Sony, some from other folks; I haven’t got the full picture myself) that someone, maybe Paul, said something about stopping/not continuing when we find an error in a previous contribution - which doesn’t sound right (and is different from what you’ve just said here). Ah well, maybe I’m hallucinating.

It might be interesting to have some harder errors if contributions ever overlap (if two CUs point to two different parts of debug_line, for instance, and one of those pointers plus its parsed length overlaps with the other pointer). That’s a bad place to be and may mean we should give up on parsing any of debug_line, or at least any unit that has such an overlap - but it’s harder to detect.

The alternative, I suppose, is to go the other way (and that’s something that might be good/important to do at some point) and not parse any contributions that aren’t pointed to by a CU. (Well, debug_line in DWARFv5 is designed to be parsed standalone too - and maybe everything in DWARFv5 is designed to be parsed standalone - but we don’t know whether a section contains only v5 contributions. debug_line maybe should’ve changed name so we’d know it only contained v5 contributions, like loclists and rnglists. Ah well…)

Just rambling at this point.

- Dave

Adrian wrote:
> This is something we’ll need to decide on a case-by-case basis. [...] Generally though, I’d always prefer to err on the side of correctness over feeding potentially wrong or misleading information to the user. But let’s discuss this separately for each case.

I agree, a case-by-case basis is probably right. My thinking is that clients that care about correctness over a best guess can change the error callbacks to ignore data from a line table/section where something has gone wrong. Consequently, the parser should provide as much information as it possibly can (some of which might be wrong), as long as it uses one of the callbacks appropriately when something looks suspicious (see the sketch below).
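
As a rough illustration of the division of responsibility I have in mind (invented names and signatures; the real LLVM callbacks deal in llvm::Error rather than strings):

    #include <functional>
    #include <string>

    struct LineTable { /* ... parsed rows ... */ };

    // The parser reports anything suspicious through the callback and keeps
    // going, returning as much data as it could extract.
    LineTable
    parseLineTable(/* data, */
                   const std::function<void(const std::string &)> &Warn) {
      LineTable Table;
      // ... on detecting a recoverable problem:
      Warn("unit_length doesn't match the parsed size; trusting unit_length");
      // ... continue parsing and filling in Table ...
      return Table;
    }

    // A lenient client (e.g. a dumper) just prints the warning and uses the
    // data anyway. A strict client notes that a warning fired and discards
    // the whole table:
    void strictClient() {
      bool Suspicious = false;
      LineTable Table =
          parseLineTable([&](const std::string &) { Suspicious = true; });
      if (Suspicious)
        return;    // Ignore data from a table where something went wrong.
      (void)Table; // ... otherwise, use Table ...
    }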

Thanks for the reviews!

David wrote:
> If the standard conflicts with the length at the point we read the length (without reading further), then yes, I think we should fail/warn/do something there [...] and if that means this contribution can’t be parsed (e.g. the header’s truncated), we treat it that way (this contribution is broken) and can still go on to the next contribution and read that.

I think the code now uses the error callbacks for all potential length conflicts (or will do once I land some more warnings; I haven’t checked the current state). By default, this results in warnings, but clients can do extra things, such as ignoring the table and continuing from the next one. With my latest changes, the parser continues reading the header until the actual end according to the file format, even if that goes beyond the claimed header end, reporting a warning if it does so, and filling in zeroes if it runs out of data, as per the DataExtractor API. The parser then parses the body starting from the claimed header end until the unit length is reached. This provides the maximum possible information (useful for dumpers), but still with the ability to stop (useful for cases where correctness is more important). The sketch below illustrates the header behaviour.
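
(A hand-written sketch of that recovery strategy, with invented names; the real DWARFDebugLine code differs in detail.)

    #include <cstdint>
    #include <cstdio>

    // Minimal cursor whose reads yield zeroes once past the end, in the
    // spirit of LLVM's DataExtractor. Invented for this sketch.
    struct Cursor {
      const uint8_t *Data;
      uint64_t Size;
      uint64_t Offset = 0;

      uint8_t getU8() {
        return Offset < Size ? Data[Offset++] : (++Offset, 0);
      }
    };

    void parseContribution(Cursor &C, uint64_t ClaimedHeaderEnd,
                           uint64_t UnitEnd) {
      // 1. Read every field the format says the header contains, even if
      //    that takes us past the claimed header end.
      //    ... C.getU8() etc. for each header field ...
      if (C.Offset > ClaimedHeaderEnd)
        fprintf(stderr, "warning: parsing the header ran past its recorded "
                        "header_length\n");

      // 2. Wherever header parsing actually stopped, take the body to start
      //    where the recorded header_length says it does.
      C.Offset = ClaimedHeaderEnd;

      // 3. Parse the line-number program up to the recorded unit length.
      while (C.Offset < UnitEnd)
        C.getU8(); // ... decode one opcode at a time in reality ...
    }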

> I thought someone, maybe Paul, said something about stopping/not continuing when we found an error in a previous contribution - which doesn’t sound right (and is different from what you’ve just said here).

Paul actually said in D71255 that we try to continue from the next contribution, although I did previously make an incorrect statement about length errors preventing this from happening. That being said, it is entirely up to the client whether or not to continue to the next contribution when they see something bad.

> It might be interesting to have some harder errors if contributions ever overlap [...] That’s a bad place to be and may mean we should give up on parsing any of debug_line, or at least any unit that has such an overlap - but it’s harder to detect.

This sounds like it’s a little outside the scope of the low-level debug line parser, and more something that belongs in DWARFContext or the verifier. That being said, some sort of warning would make sense (though it would take some pretty crafty line table building to get two valid line tables that overlap). A check along the lines of the sketch below would do it.
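
(A rough, invented sketch of such a verifier-style check over the claimed (offset, length) pairs; nothing like this exists today, as far as I know.)

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Contribution {
      uint64_t Offset; // As referenced from a CU's DW_AT_stmt_list.
      uint64_t Length; // Recorded unit_length plus the size of that field.
    };

    // Warn about any pair of claimed contributions whose ranges overlap.
    void reportOverlaps(std::vector<Contribution> Contribs) {
      std::sort(Contribs.begin(), Contribs.end(),
                [](const Contribution &A, const Contribution &B) {
                  return A.Offset < B.Offset;
                });
      for (size_t I = 1; I < Contribs.size(); ++I) {
        const Contribution &Prev = Contribs[I - 1];
        if (Prev.Offset + Prev.Length > Contribs[I].Offset)
          fprintf(stderr,
                  "warning: line table at 0x%llx (length 0x%llx) overlaps "
                  "the one at 0x%llx\n",
                  (unsigned long long)Prev.Offset,
                  (unsigned long long)Prev.Length,
                  (unsigned long long)Contribs[I].Offset);
      }
    }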

> The alternative, I suppose, is to go the other way [...] and not parse any contributions that aren’t pointed to by a CU.

I think in some contexts it may be worth only parsing contributions pointed to by CUs, but again, this would be a higher-level decision. Indeed, Sony’s linker modifications don’t parse the .debug_info at all, only the .debug_line, so this wouldn’t work for us.

Thanks for the reviews!

James wrote:
> With my latest changes, the parser continues reading the header until the actual end according to the file format, even if that goes beyond the claimed header end, reporting a warning if it does so, and filling in zeroes if it runs out of data, as per the DataExtractor API.

Oh… which change caused that to happen? Because I don’t think that should happen. I’d much prefer that, once the length is read and checked to be in bounds of the underlying section, only the bytes within that specified length are read. I was/am actually looking at tidying up the repeated complications of parsing the DWARF32/DWARF64 headers, and one idea I had was to have the DWARFDataExtractor return a new DWARFDataExtractor that was specifically bounded to the parsed length, for exactly this reason.

> Indeed, Sony’s linker modifications don’t parse the .debug_info at all, only the .debug_line, so this wouldn’t work for us.

*nod* Figured that might be the case - good to know!

David wrote:
> one idea I had was to have the DWARFDataExtractor return a new DWARFDataExtractor that was specifically bounded to the parsed length

I think this would be great. In fact, I was getting ready to propose something like that myself. :)

FWIW, lldb DataExtractors already support this kind of slicing.

pl

David wrote:
> Oh… which change caused that to happen? Because I don’t think that should happen.

Actually, I had a slightly faulty memory. To a large extent, the behaviour has for a while been to parse until the end of the header as per the standard format, with a check once parsing is finished to make sure the offset reached matches the expected end. Recently, I removed[1] a few extra checks in the V5 file/directory table parsing that ensured we hadn’t run off the end, as the later check makes them unnecessary; however, those were the only other checks against the prologue length.

I think your suggestion to use a data extractor that only knows about the bytes we expect is a good idea. Combined with the changes that made it return 0 when running off the end, this will fit nicely into the rest of the code, without any worry about reading bytes from other tables: reading would continue as far as needed, the error check would still happen, but no spurious bytes would be read. Something like the sketch below, perhaps.
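
(Illustrative only; the real DataExtractor interface is richer, and the class name here is just for this sketch.)

    #include <cstdint>

    // Stand-in for a DataExtractor-style reader whose out-of-bounds reads
    // yield zeroes. Invented for this sketch.
    class Extractor {
      const uint8_t *Data;
      uint64_t Size;

    public:
      Extractor(const uint8_t *Data, uint64_t Size)
          : Data(Data), Size(Size) {}

      uint8_t getU8(uint64_t &Offset) const {
        return Offset < Size ? Data[Offset++] : 0;
      }

      // The idea under discussion: hand back a new extractor that only
      // knows about the bytes covered by the recorded length, so header
      // parsing physically cannot read bytes from the next contribution.
      Extractor slice(uint64_t Offset, uint64_t Length) const {
        uint64_t Begin = Offset < Size ? Offset : Size;
        uint64_t Avail = Size - Begin;
        return Extractor(Data + Begin, Length < Avail ? Length : Avail);
      }
    };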

[1] https://github.com/llvm/llvm-project/commit/418cd8216b41f4c08e0e1b22feda381d9b2345da

Oh, great. I’m sort of inundated with code reviews and things these days, so could I prevail upon you to implement that and pick up some of these line table reviews, to either coax the use of such an entity there or port a few places (at least all the DWARF32/64 length parsing duplicates; there are 3-4 copies of the code that does the 32/64 length dance and validates the length of a header or similar in there)?

Tagged you in a few reviews (some committed already/approved, but places that could benefit from cleanup with such a device): https://reviews.llvm.org/D72155, https://reviews.llvm.org/D73618, https://reviews.llvm.org/D72900

Thanks.

Well, I am afraid that in this case, "getting ready" is a pretty long-term process :/, and I don't think I'll be able to get to this next week, because I'll be travelling. But I think I should be able to squeeze it in the week after that.

pl

I don’t think it’s vitally pressing or anything; I’m just trying to delegate some of the deluge of code review/coaxing for all these DWARF changes (at least the line table improvements and the DWARF64 improvements, and I think there’s one other group/project happening too? DWARF5 features like the defaulted template parameters, implicit_pointer, etc.).