[Bug 15038] New: LLDB does not support printing wide-character variables on Linux

http://llvm.org/bugs/show_bug.cgi?id=15038

             Bug #: 15038
           Summary: LLDB does not support printing wide-character
                    variables on Linux
           Product: lldb
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: All Bugs
        AssignedTo: lldb-dev@cs.uiuc.edu
        ReportedBy: daniel.malea@intel.com
    Classification: Unclassified

Printing a variable of wchar_t type does not behave as expected and results in
garbage being printed on the screen.

To reproduce, remove the @expectedFailureLinux decorator from TestChar1632T.py
and TestCxxWCharT.py and run:

python dotest.py --executable <path-to-lldb> lang/cpp/char1632_t
lang/cpp/wchar_t
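
For reference, the kind of source those tests exercise boils down to something like the following (illustrative only, not the actual test sources):

// repro.cpp -- sketch of what the failing tests exercise
#include <wchar.h>

int main ()
{
    const wchar_t *ws = L"hello wide world"; // 'frame variable ws' prints garbage
    wchar_t wc = L'A';                       // same for a single wchar_t
    char16_t u16 = u'B';                     // char1632_t covers these as well
    char32_t u32 = U'C';
    return 0;                                // break here and print the variables
}

Build with -g (e.g. clang++ -std=c++11 -g repro.cpp), break at the return statement, and try frame variable or expression on ws, wc, u16 and u32.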

The root cause of Bug 15038 is a ptrace EIO error that can occur because we don't know the size of a (UTF) string and so read a fixed number of characters. The attached fix accepts EIO in the case where some data has been read and a null terminator of the correct size and alignment was found.
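
To make the acceptance check concrete, here is a minimal sketch (names such as AcceptPartialRead, buffer, bytes_read and type_size are illustrative, not the identifiers in the attached diff):

#include <cerrno>
#include <cstddef>

// Sketch: accept EIO/EFAULT only if some data was read and a null terminator
// of the character width is present at an aligned offset in that data.
static bool AcceptPartialRead(const char *buffer, size_t bytes_read,
                              size_t type_size, int err)
{
    if ((err != EIO && err != EFAULT) || bytes_read == 0)
        return false;
    for (size_t i = 0; i + type_size <= bytes_read; i += type_size)
    {
        bool is_terminator = true;
        for (size_t j = 0; j < type_size; ++j)
            if (buffer[i + j] != '\0')
                is_terminator = false;
        if (is_terminator)
            return true;
    }
    return false;
}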

- Ashok

read-strings.diff (3.6 KB)

Hi Ashok,
thanks for submitting an LLDB patch :)

I have looked at the code and while I see where the patch is coming from, I see a potential issue with it.

The code in ReadUTFBufferAndDumpToStream, which is what you are editing, is meant to work on a partial string (i.e. one that does not have a NULL terminator at the end). The reason for that is that you might have a wstring that is 1GB long and we would not want to try and read all of it and then display it. What we do is pick a size and only extract that much data. For obvious reasons, your string might be longer than our upper boundary, so you would get a chunk of valid bytes and then no end-of-buffer marker.

It looks like your code would fail in that case and produce no summary for a string if an EIO was received before \0.

Is there any reason why you can’t just check if (error == EIO and data_read > 0) and if so treat this as a “partial string” condition and keep going?
Would that break/crash anything?
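
Concretely, something along these lines (just a sketch; error and data_read are the locals ReadUTFBufferAndDumpToStream already has):

     if (error.Fail() || data_read == 0)
     {
         // Sketch: on Linux a bounded read that runs past the end of the
         // mapping can come back with EIO even though the leading bytes are
         // valid; treat that as a partial string instead of bailing out.
         if (!(data_read > 0 && error.GetError() == EIO))
             return false;
         // ... fall through and format the data_read bytes we did get ...
     }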

Best,

Enrico Granata
:email: egranata@.com
✆ 27683

Thanks for the feedback, Enrico,

I’m relying on the fact that when the wstring is larger than the default size (1GB in your example), we won’t get a ptrace error like EIO or EFAULT. So, there will be no search for a null terminator.

If we can't rely on the null terminator, it's hard to know whether a different error occurred. The trouble is that the PTRACE_PEEK implementation on Linux is more or less arbitrary about which error it reports (sigh), so the out-of-bounds read can generate EIO or EFAULT. In particular, EIO can mean a number of things:

"request is invalid, or an attempt was made to read from or write to an invalid area in the parent's or child's memory, or there was a word-alignment violation, or an invalid signal was specified during a restart request." – http://linux.die.net/man/2/ptrace

In addition, the Linux ProcessMonitor reads in 8-byte chunks, so ReadUTFBufferAndDumpToStream will probably fail (for the right reasons) in the case where at least part of the null terminator was in that block. I see that as an implementation detail that can be resolved in ProcessMonitor by reading smaller chunks until PTRACE_PEEK succeeds (i.e. a good follow-up patch).
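
To illustrate that follow-up (a sketch under my assumptions, with a hypothetical helper name; this is not part of the attached patch): when the final word-sized peek fails because it straddles the end of a mapping, retry a word that ends exactly at the page boundary and keep only the still-mapped bytes.

#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>
#include <algorithm>
#include <cerrno>
#include <cstring>

// Sketch: read 'size' bytes with word-sized PTRACE_PEEKDATA calls; if the
// final word fails because it crosses a page boundary, retry a word that
// ends exactly at that boundary and keep only the still-mapped bytes.
static size_t PeekBytes(pid_t pid, unsigned long addr, void *buf, size_t size)
{
    const unsigned long page = static_cast<unsigned long>(sysconf(_SC_PAGESIZE));
    size_t done = 0;
    while (done < size)
    {
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid,
                           reinterpret_cast<void *>(addr + done), nullptr);
        if (word == -1 && errno != 0)
        {
            const unsigned long fail_addr = addr + done;
            const unsigned long boundary = (fail_addr + page) & ~(page - 1);
            const unsigned long retry = boundary - sizeof(long);
            if (retry >= fail_addr)
                break;                  // the failing word does not straddle this boundary
            errno = 0;
            word = ptrace(PTRACE_PEEKDATA, pid,
                          reinterpret_cast<void *>(retry), nullptr);
            if (word == -1 && errno != 0)
                break;
            const size_t keep = std::min<size_t>(boundary - fail_addr, size - done);
            memcpy(static_cast<char *>(buf) + done,
                   reinterpret_cast<char *>(&word) + (fail_addr - retry), keep);
            done += keep;
            break;
        }
        const size_t chunk = std::min(sizeof(long), size - done);
        memcpy(static_cast<char *>(buf) + done, &word, chunk);
        done += chunk;
    }
    return done;
}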

Do let me know if you feel this is okay to commit as is. Cheers,

- Ashok

Hi,
now I pretty much see where you are coming from.
What we have here is a Linux-specific implementation issue (which of course is not going to get fixed anytime soon :)

What troubles me a little with the overall idea of this patch is that we are putting a workaround for a lower-level issue in higher-level code.
If we go this route, the same thing might have to be done for ValueObject::ReadPointedString() which does the same kind of “unbounded read”
I would prefer, if we do this, that at least we put the code around #if defined (__linux__), as in:
     if (error.Fail() || data_read == 0)
     {
+        bool found = false;
#if defined (__linux__)
+        if (data_read && error.GetType() == eErrorTypePOSIX) {
+            // Determine if a null terminator was found in spite of an out-of-bounds read
+            if (error.GetError() == EIO || error.GetError() == EFAULT) {
+                char* buffer = reinterpret_cast<char *>(buffer_sp->GetBytes());
+                char terminator[4] = {'\0', '\0', '\0', '\0'};
+                assert(sizeof(terminator) >= type_size && "Attempting to read a wchar with more than 4 bytes per character!");

Hi Enrico,

> If we go this route, the same thing might have to be done for ValueObject::ReadPointedString() which does the same kind of “unbounded read”

I see. Looking through the code base for methods that call GetMaximumSizeOfStringSummary, I see that there are a number of code paths to handle. The attached patch lowers the fix so that this functionality can be used as needed via a helper method in Process to validate an unbounded read.

> But I am inclined to believe the best option would be to adopt your ProcessMonitor fix that reads smaller and smaller chunks and then returns the number of bytes read and no error

Well, the unbounded read is a characteristic of the use case (i.e. string reads), not the implementation. The behavior of the plug-in under this type of request is implementation specific, and an error is always reasonable since there was a failure to read the requested size in bytes. My point is that the corner case of null terminator in the last 8-byte read needs to be tested and fixed.

> What troubles me a little with the overall idea of this patch is that we are putting a workaround for a lower-level issue in higher-level code.

> I would prefer, if we do this, that at least we put the code around #if defined (__linux__), as in:

> What we have here is a Linux-specific implementation issue

Note that the issue can appear with any ptrace implementation. So, I lowered the fix to the ProcessPOSIX plugin, which is consistent with the case of eErrorTypePOSIX.

Thanks again for the feedback,

- Ashok

read-strings-2.diff (6.78 KB)

Hi Enrico,

>> If we go this route, the same thing might have to be done for ValueObject::ReadPointedString() which does the same kind of “unbounded read”
> I see. Looking through the code base for methods that call GetMaximumSizeOfStringSummary, I see that there are a number of code paths to handle. The attached patch lowers the fix so that this functionality can be used as needed via a helper method in Process to validate an unbounded read.

We already have a Process::ReadCStringFromMemory - that does read & validation.
You might instead want to consider making a Process::ReadWStringFromMemory.
The mere act of validating that a buffer is 0-terminated is not a Process-specific operation at all.

>> But I am inclined to believe the best option would be to adopt your ProcessMonitor fix that reads smaller and smaller chunks and then returns the number of bytes read and no error
> Well, the unbounded read is a characteristic of the use case (i.e. string reads), not the implementation. The behavior of the plug-in under this type of request is implementation specific, and an error is always reasonable since there was a failure to read the requested size in bytes. My point is that the corner case of null terminator in the last 8-byte read needs to be tested and fixed.

>> What troubles me a little with the overall idea of this patch is that we are putting a workaround for a lower-level issue in higher-level code.
>> I would prefer, if we do this, that at least we put the code around #if defined (__linux__), as in:
>> What we have here is a Linux-specific implementation issue
> Note that the issue can appear with any ptrace implementation. So, I lowered the fix to the ProcessPOSIX plugin, which is consistent with the case of eErrorTypePOSIX.

I don't think it is a Process responsibility to validate a buffer.
If you want to make it a new type of request, ReadWString, and pass it a size so it knows how many consecutive bytes must be 0 to mean EndOfData, that would be fine.
It would look like
size_t Process::ReadWStringFromInferior(const void* buffer, size_t max_size, Error& error, size_t sizeofitem);
and there you could check the end-of-string
Then the WString data formatter can be moved to use this new read request, but you will have to ensure that partial strings are accepted :)
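
To sketch the shape I have in mind (the parameter list and details are only illustrative, Process::ReadMemory is the existing bounded read, and the formatter would still need to accept a partial string when no terminator shows up):

size_t
Process::ReadWStringFromInferior (lldb::addr_t addr, void *buf, size_t max_size,
                                  Error &error, size_t sizeofitem)
{
    // Bounded read first; a short read is not an error for our purposes.
    const size_t bytes_read = ReadMemory (addr, buf, max_size, error);
    // End-of-data means sizeofitem consecutive zero bytes at an aligned offset.
    const char zeros[8] = { 0 };                  // assumes sizeofitem <= 8
    const char *data = static_cast<const char *> (buf);
    for (size_t i = 0; i + sizeofitem <= bytes_read; i += sizeofitem)
        if (memcmp (data + i, zeros, sizeofitem) == 0)
            return i + sizeofitem;                // end of data, terminator included
    return bytes_read;                            // partial string: caller keeps going
}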

Thanks,
Enrico Granata
:email: egranata@.com
✆ 27683

Hi Enrico,

Your suggestions got me thinking that the version of ReadCStringFromMemory that takes a char* can be extended to support null terminators of any width. The attached patch is therefore a variation on what you suggested that consolidates the read and validation into one method, and handles strings of any type width. In addition, this implementation provides a null terminator when a partial read occurs.

Incidentally, in the old implementation, strlen would always succeed on the first call, because curr_dst is set to dst, which is memset to 0. The current implementation will scan for a null terminator within bytes_read and stop reading when one is found.
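
The shape of the new implementation is roughly as follows (a simplified sketch with illustrative names and chunk size, not the exact code in the attached patch; read_chunk stands in for a small wrapper around the existing memory-read API and is assumed to return at most the number of bytes requested):

#include <algorithm>
#include <cstddef>
#include <cstring>

// Sketch: read in chunks, stop at the first terminator of 'type_width' zero
// bytes found at an aligned offset, and always terminate the destination,
// even when the read stops early. Assumes the string start is aligned and
// that the chunk size is a multiple of type_width.
template <typename ReadChunkFn>
static size_t ReadStringSketch (ReadChunkFn read_chunk, char *dst,
                                size_t dst_size, size_t type_width)
{
    if (dst_size < type_width)
        return 0;
    const size_t max_bytes = dst_size - type_width;  // reserve room to terminate
    size_t total = 0;
    while (total < max_bytes)
    {
        const size_t want = std::min<size_t> (64, max_bytes - total);
        const size_t got = read_chunk (dst + total, want);
        if (got == 0)
            break;                                   // read error: keep what we have
        // Scan only the newly read bytes for a width-sized terminator.
        for (size_t i = total; i + type_width <= total + got; i += type_width)
        {
            bool term = true;
            for (size_t j = 0; j < type_width; ++j)
                if (dst[i + j] != '\0')
                    term = false;
            if (term)
                return i;                            // length in bytes, terminator excluded
        }
        total += got;
        if (got < want)
            break;                                   // partial chunk: stop reading
    }
    memset (dst + total, 0, type_width);             // terminate even after a partial read
    return total;
}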

If this is the right approach, let me know if I should modify SBProcess to match (i.e. to add the new parameter type_width). Also, I noticed that this code is largely duplicated in methods of the same name in class Target. Can I assume that a complete patch would include changes to all of those methods? Thanks in advance,

- Ashok

read-strings-4.diff (8.07 KB)

Ping!

A recent change for the LLDB community is the addition of buildbots for Linux that use clang and gcc. This has been a quiet change, and I suspect that the status isn't being monitored by most of the community. Consistent with llvm and clang, I'd like to make the case for working together to keep the buildbots green, and specifically for a policy of fixing or reverting commits that break the Linux buildbots.

First, test coverage is getting much closer to Darwin for the x86-64 targets using clang and gcc. In addition, the test decorators for skipOnLinux, expectedFailureLinux and expectedFailureGcc are cross-referenced with bugzilla tickets (i.e. to help distinguish between new failures and known failures).

As the new buildbots trigger on llvm, clang and lldb commits, they provide a useful tool to root cause regressions when llvm/clang is out of step with lldb (i.e. changes to DWARF or AST record layout). Historically these have been time-consuming problems that affect many tests at once.

Given monitoring and community support, these improvements make it possible to encourage adoption of Linux LLDB packages on Debian, Ubuntu, Arch and Solus. Because the current packages are built from trunk, stability is a barrier to adoption. This point also suggests the need for wider pre-commit testing of common code on Linux.

In turn, we would also benefit from a stable Darwin buildbot. Thoughts?

- Ashok Thirumurthi

Intel of Canada

Hey there,
sorry for the delay. I have been busy over the last couple of days with some high priority work.
I have discussed this with a coworker, and we agree that your “extended reader” is a good idea.
What we would prefer to see is committing it as a new function (e.g. Process::ReadNullTerminatedStringFromMemory); that would allow an experimentation phase where we can use the old and the new function side by side to make sure things work correctly. Eventually, once we are all fully satisfied with the new API, the existing CString reader would simply become a wrapper for the NullTerminated reader, passing a size of 1.
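
That last step would be trivial, something like this (sketch only; ReadNullTerminatedStringFromMemory is the name suggested above, and the parameter order is just illustrative):

size_t
Process::ReadCStringFromMemory (lldb::addr_t addr, char *dst, size_t dst_max_size,
                                Error &error)
{
    // Once the new API has proven itself, the char case is just width 1.
    return ReadNullTerminatedStringFromMemory (addr, dst, dst_max_size, error, 1);
}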
Feel free to change your patch in this way and check that in. We will then take it from there.

Thanks,
Enrico Granata
:email: egranata@.com
✆ 27683

FAIL: LLDB :: (TestCPPStaticMethods.py)

   AssertionError: False is not True : Command 'expression -- A::getStaticValue()' returns successfully

It’s possible that the above regression triggered a buildbot e-mail that identifies the offending revision. So far, my bisection shows that the regression was introduced after r178608 and before r178794, but it would be handy to have a single commit to look for, in case that e-mail is sitting in an undisclosed inbox. Thanks,

- Ashok

> Feel free to change your patch in this way and check that in. We will then take it from there.

Now that the Linux buildbots provide some test coverage, I committed ReadStringFromMemory in r179857 and included a comment for the deprecated sibling. I hooked up the new API to ReadUTFBufferAndDumpToStream, which fixes wchar failures on Linux on the buildbots. Once stable, I assume we want a carbon copy in Target. Cheers,

- Ashok

Ashok Thirumurthi changed bug 15038

What            Removed    Added
Status          NEW        RESOLVED
CC                         ashok.thirumurthi@intel.com
Resolution                 FIXED

Comment # 2 on bug 15038 from Ashok Thirumurthi

The attempt to perform an unbounded string read ran up against ptrace EIO
errors when PTRACE_PEEK was used to read the default number of wide
characters. Fixed by searching for a null terminator.