[RFC] The future of pexpect

As you probably know (I didn’t), lldb embeds its own version of
`pexpect-2.4`, which doesn’t support python3.
This is the (relatively short) list of tests relying on pexpect:

testcases/tools/lldb-mi/syntax/TestMiSyntax.py: import pexpect
                        # 7 (EOF)
testcases/tools/lldb-mi/lldbmi_testcase.py: import pexpect
testcases/tools/lldb-mi/signal/TestMiSignal.py: import pexpect
testcases/tools/lldb-mi/signal/TestMiSignal.py: import pexpect
testcases/lldbtest.py: import pexpect
testcases/driver/batch_mode/TestBatchMode.py: import pexpect
testcases/driver/batch_mode/TestBatchMode.py: import pexpect
testcases/driver/batch_mode/TestBatchMode.py: import pexpect
testcases/driver/batch_mode/TestBatchMode.py: import pexpect
testcases/lldbpexpect.py: import pexpect
testcases/terminal/TestSTTYBeforeAndAfter.py: import pexpect
testcases/darwin_log.py: import pexpect
testcases/macosx/nslog/TestDarwinNSLogOutput.py: import pexpect
testcases/benchmarks/stepping/TestSteppingSpeed.py: import pexpect
testcases/benchmarks/frame_variable/TestFrameVariableResponse.py: import pexpect
testcases/benchmarks/turnaround/TestCompileRunToBreakpointTurnaround.py: import pexpect
testcases/benchmarks/turnaround/TestCompileRunToBreakpointTurnaround.py: import pexpect
testcases/benchmarks/expression/TestExpressionCmd.py: import pexpect
testcases/benchmarks/expression/TestRepeatedExprs.py: import pexpect
testcases/benchmarks/expression/TestRepeatedExprs.py: import pexpect
testcases/benchmarks/startup/TestStartupDelays.py: import pexpect
testcases/functionalities/command_regex/TestCommandRegex.py: import pexpect
testcases/functionalities/single-quote-in-filename-to-lldb/TestSingleQuoteInFilename.py: import pexpect
testcases/functionalities/format/TestFormats.py: import pexpect

(I count 14, but I might have missed some.)

I audited all of them, and from what I can see they're almost all testing the driver.
I had a chat with my coworkers and we agreed it's reasonable to
replace them with lit tests (as they're just running commands).
This would allow us to get rid of an external dependency, which
has been a source of trouble in the past.
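
For illustration, a converted test could look roughly like this (a hedged sketch; the %lldb substitution, the exact flags, and the CHECK line are assumptions about the lit configuration, not an existing test):

# RUN: %lldb -b -o 'settings show prompt' | FileCheck %s
# CHECK: prompt (string) = "(lldb) "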

Are there any objections?

Thanks,

+1

Thanks for bringing this up. I'd like to see this happen!

- Alex

This would be great. All of these tests have always been disabled on Windows, so converting them to lit tests would increase test coverage there as well.

I'm not a fan of pexpect, and if these tests can be converted to lit, then I'm all for it. But I do have a question.

There is a class of tests that cannot be written in the current lit framework, but they can with pexpect. A couple of weeks ago we had a patch fixing a bug where pressing up arrow while searching through the command history caused a crash. In the end a test for this was not included because it was hard for a reason unrelated to pexpect, but without pexpect (or something equivalent) writing a test for this would be impossible.

What's our story for testing interactive command-line functionalities? The way I see it, if we don't use pexpect, we'll either have to use some other tool which simulates a realistic terminal, or write our own. (We already have one attempt for this in unittests/Editline/EditlineTest.cpp, but this would need more work to be fully functional.)

pl

PS: Does anyone actually use the benchmark tests? Can we just delete them?

> I'm not a fan of pexpect, and if these tests can be converted to lit,
> then I'm all for it. But I do have a question.
>
> There is a class of tests that cannot be written in the current lit
> framework, but they can with pexpect. A couple of weeks ago we had a
> patch fixing a bug where pressing up arrow while searching through the
> command history caused a crash. In the end a test for this was not
> included because it was hard for a reason unrelated to pexpect, but
> without pexpect (or something equivalent) writing a test for this would
> be impossible.

I don't know about this, to be honest. Maybe lit should grow an
interactive mode somehow to accommodate this functionality?
I'm not an expert in how it's implemented, so that could be hard to achieve.
FWIW, I haven't seen anything that really requires interactivity, but
I have to admit I haven't looked really deeply.

> What's our story for testing interactive command-line functionalities?
> The way I see it, if we don't use pexpect, we'll either have to use some
> other tool which simulates a realistic terminal, or write our own. (We
> already have one attempt for this in
> unittests/Editline/EditlineTest.cpp, but this would need more work to be
> fully functional.)
>
> pl
>
> PS: Does anyone actually use the benchmark tests? Can we just delete them?

I don't know. Maybe Jim knows. I personally don't use them.

Was the test failing specifically in the keyboard handler for up arrow, or was it failing in the command history searching code? Because if it’s the latter, then we could have a command which searches the command history.

I don't think anybody uses these tests. They are all time-based benchmarks, and in the end there was just too much variability for them to be really useful. We really need to do more work tracking performance, but I think a better approach is to focus on how much work we do (how many DIEs did you have to parse to do X, how many lookups did it take to compile an expression, how many memory requests did a task take, things like that). Those seem to me likely to be more stable.

That said, I'm unclear why any of the benchmark tests would need pexpect to function. The Driver used to have a lot more functionality in it that has since been moved into the SB APIs. Maybe there was a good reason for doing performance testing through the driver back then, but I can't think of one nowadays.

Jim

Even if it was the keyboard handler, lldb feeds characters to editline through the IOHandler, so it should be possible to emulate the up arrow as well. If there are reasons why that's not feasible today, we should be able to make it work. This seems a tractable problem to me, and a better place to put effort than something like pexpect.

Jim

The patch is r351313, if you want to look at it in detail. But, I don't think this one example matters too much, since we will always have some code which deals with the interactivity of the terminal. That will need to be tested somehow.

Another example: we have a fairly complex piece of code that makes sure our (lldb) prompt comes out in color. How do we write a test for that?

FileCheck'ing the ANSI escape codes seems like one possibility.

In general I think you don't actually need to test true interactivity, because the 2-3 lines of code that convert the keyboard press into something else in LLDB are very unlikely to be a problem, and the rest can be mocked.

> Was the test failing specifically in the keyboard handler for up arrow, or was it failing in the command history searching code? Because if it's the latter, then we could have a command which searches the command history.
>
> The patch is r351313, if you want to look at it in detail. But, I don't think this one example matters too much, since we will always have some code which deals with the interactivity of the terminal. That will need to be tested somehow.
>
> Another example: we have a fairly complex piece of code that makes sure our (lldb) prompt comes out in color. How do we write a test for that?

All the traffic back and forth with the terminal happens in the IOHandlerEditline. We should be able to get our hands on the Debugger's IOHandler and feed characters directly to it, and read the results. So we should be able to write this kind of test by driving the debugger to whatever state you need with the SB API, then running one command and getting the output string directly from the IOHandler. We could then scan that output for color codes. I don't think we need an external process inspection tool to do this sort of thing.
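
(For concreteness, a rough sketch of that direction through the Python SB API; it is hypothetical and untested, and it assumes the color codes are still emitted when color is forced on without a real terminal, which is exactly the point questioned below:)

import lldb

# Hedged sketch: drive the command interpreter with redirected input and
# output, then scan the captured output for ANSI escape codes.
lldb.SBDebugger.Initialize()
debugger = lldb.SBDebugger.Create()
debugger.SetUseColor(True)
with open("/tmp/lldb-cmds.txt", "w") as f:
    f.write("version\nquit\n")
debugger.SetInputFileHandle(open("/tmp/lldb-cmds.txt", "r"), True)
debugger.SetOutputFileHandle(open("/tmp/lldb-out.txt", "w"), True)
debugger.RunCommandInterpreter(True, False)  # auto_handle_events, spawn_thread
with open("/tmp/lldb-out.txt", "r") as f:
    assert "\x1b[" in f.read(), "no ANSI escape codes in the output"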

Jim

Libedit expects to work with a real terminal, so to test the code that interacts with libedit (and there's more than 3 lines of that), you'll need something that can create a pty and read and write characters to it, regardless of whether you drive the test through FileCheck or the SB API.

"creating a pty, and reading and writing to it" is pretty much the definition of pexpect.

I am not saying either of these approaches can't be made to work, but I am not sure who is going to do it. I fear that we are shooting ourselves in the foot by banning pexpect and then pushing patches without tests because "it's hard".

Just for fun, I tried to write a test to check the coloring of the prompt via pexpect. It was _literally_ three lines long:

def test_colored_prompt_comes_out_right(self):
    child = pexpect.spawn(lldbtest_config.lldbExec)
    child.expect_exact("(lldb) \x1b[1G\x1b[2m(lldb) \x1b[22m\x1b[8G")

BTW: I am not proposing we spend heroic efforts trying to port pexpect 2.4 to python3. But I would consider using a newer version of pexpect to write tests ***where it makes sense to do so***. At least until someone comes up with a better (and not vapourware) alternative...

pl

It’s worth mentioning that pexpect is basically unusable on Windows, so there’s still that.

Our interactive command line is basically unusable on windows, so there isn't anything to test anyway.

I expect (pun intended) that getting a working pexpect on windows will be much easier than getting a working interactive command line.

I found out that there are tests that effectively require
interactivity. Some of the lldb-mi ones are an example.
A common use case is sending SIGTERM in a loop to make sure
`lldb-mi` doesn't crash and handles the signal correctly.
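
(For concreteness, the scenario is roughly the following; a hedged sketch using plain subprocess, not the actual pexpect-based test:)

import signal
import subprocess
import time

# Keep sending SIGTERM to a running lldb-mi and check that it neither
# crashes nor exits. Hypothetical sketch, not the existing test.
proc = subprocess.Popen(["lldb-mi", "--interpreter"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
for _ in range(5):
    proc.send_signal(signal.SIGTERM)
    time.sleep(0.2)
    assert proc.poll() is None, "lldb-mi died after SIGTERM"
proc.stdin.write(b"-gdb-exit\n")
proc.stdin.flush()
proc.wait()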

This functionality is really hard to replicate in lit _as is_.
Any ideas on how we could handle this case?

Thanks,

How hard is it to import a new version of pexpect which supports python3 and stuff?

I'm not sure how the situation is on darwin, but I'd expect (:P) that most linux systems either already have it installed, or have an easy way to do so. So we may even be able to get away with just using the system one and skipping tests when it's not present.

BTW, for lldb-mi I would actually argue that it should *not* use pexpect :D. Interactivity is one thing, and I'm very much in favour of keeping that ability, but pexpect is not a prerequisite for that. For me, the main advantage of pexpect is that it emulates a real terminal. However, lldb-mi does not need that stuff. It doesn't have any command line editing capabilities or similar. It's expecting to communicate with an IDE over a pipe, and that's it.

Given that, it should be fairly easy to rewrite the lldb-mi tests to work on top of the standard python "subprocess" library. While we're doing that, we might actually fix some of the issues that have been bugging everyone in the lldb-mi tests. At least for me, the most annoying thing was that when lldb-mi fails to produce the expected output, the test does not fail immediately; instead the implementation of self.expect("^whatever") waits until the timeout expires, optimistically hoping that it will find some output that matches the pattern.

If we change this to something like self.expect_reply("^whatever"), and make the "expect_reply" function smart enough to know that lldb-mi's response should come as a single line, and to abort if the first line doesn't match, this problem would be fixed. While we're at it, we could also tune the failure message so that it's more helpful than the current one. Plus, that would solve the issue of not being able to run the lldb-mi tests on windows.
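
A minimal sketch of what that could look like on top of subprocess (the names expect_reply, runCmd, and the base-class shape are hypothetical, just to make the idea concrete):

import re
import subprocess

class MiTestCaseBase(object):
    """Hypothetical subprocess-based replacement for the pexpect harness."""

    def spawn_lldbmi(self, lldbmi_path):
        self.child = subprocess.Popen([lldbmi_path, "--interpreter"],
                                      stdin=subprocess.PIPE,
                                      stdout=subprocess.PIPE,
                                      universal_newlines=True, bufsize=1)

    def runCmd(self, command):
        self.child.stdin.write(command + "\n")
        self.child.stdin.flush()

    def expect_reply(self, pattern):
        # In MI syntax the result record starts with '^' (e.g. "^done",
        # "^error"), optionally preceded by a numeric token. Skip async and
        # stream output, then fail right away if the result record doesn't
        # match, instead of waiting for a timeout to expire.
        while True:
            line = self.child.stdout.readline()
            if not line:
                raise AssertionError("lldb-mi exited before replying")
            line = line.rstrip("\n")
            if line.lstrip("0123456789").startswith("^"):
                break
        if not re.search(pattern, line):
            raise AssertionError("expected a reply matching %r, got %r"
                                 % (pattern, line))
        return line

Since expect_reply would take a regex and use re.search, partial and regex matches would come for free.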

Anyway, that's what I'd do. I was actually planning to look into that soon, but then I roped myself into writing a yaml (de)serialization tool for minidumps, so I have no idea when I will get back to that. I hope some of this is helpful nonetheless.

cheers,
pavel

This would be OK, I think, as long as "expect_reply" has the option to do a partial match,
or a regex match. Some of the lldb-mi tests only look for certain parts of the reply.

Also, until Python2 is declared dead and not supported at all by lldb, we should be able
to run this under 2 or 3.

> This would be OK, I think, as long as "expect_reply" has the option to do a partial match,
> or a regex match. Some of the lldb-mi tests only look for certain parts of the reply.

Yes, except for the difference in treating "messages" independently, the function could/should have the same matching capabilities as the current one.

I do see an opportunity to improve this to do some kind of structure-aware matching (expect_reply("library-loaded", target_name="foo.so", loaded_addr=0x47000)). IMO, that would make these tests superior even to the current lit tests, but I'm not an lldb-mi developer, so I'll probably stop short of doing that. :)
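
(A hypothetical illustration of the idea; it only handles flat key="value" pairs, ignores nested MI tuples/lists, and doesn't normalize numeric fields:)

import re

def match_mi_record(line, record_class, **expected):
    # e.g. match_mi_record('=library-loaded,id="i1",target-name="foo.so",...',
    #                      "library-loaded", target_name="foo.so")
    # Python keyword names use '_' where the MI field names use '-'.
    if record_class not in line:
        return False
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', line))
    return all(fields.get(key.replace("_", "-")) == str(value)
               for key, value in expected.items())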

> Also, until Python2 is declared dead and not supported at all by lldb, we should be able
> to run this under 2 or 3.

Yes, of course.

Pavel, I think yours is a really nice idea.
I'm no python expert, but I found that making the conversion is
relatively simple.
I propose a proof-of-concept API and implementation here:

Comments appreciated! Once we agree on what this should look like, I
recommend adding a new lldbMITest base class and incrementally
moving the tests over to it.
Once we're done, we can delete the old class.

Does this sound reasonable?