Update on Linux work.

Hi all!

This is a quick update on some work I have been doing on the Linux
front, specifically a Process plugin for that platform.

The current effort is still at an early stage. In short, we can launch
an inferior process and set/step-over breakpoints.

However, I am thinking about merging this initial work into the tree
sooner rather than later to avoid dumping a huge patch on everyone at a
later time. Although *not quite ready*, you can see the current plugin
code here:

lldb/source/Plugins/Process/Linux at swilson-process · ice799/lldb · GitHub

A few issues I came across while working on this which should be
resolved before the Linux plugin can be merged:

- We need to make the DWARF debug section names (as defined in
   SymbolFileDwarf.cpp) selectable by platform. I implemented a simple
   macro hack to work around this for now. A better solution might be
   to have a header #define the section names. Any suggestions
   appreciated:

     http://github.com/ice799/lldb/commit/284c444cae1feedc39f48ffed1265abf3fea291c

- There is a good bit of sharable code in the MacOSX-User plugin with
   respect to RegisterContext specializations. Please find attached two
   files which provide a partial x86_64 RegisterContext specialization.
   The idea is to introduce
   lldb/include/Target/X86/RegisterContext_x86_64.h so all plugins
   can share common definitions and code. Again, any thoughts or
   suggestions on this approach appreciated!

Take care,
Steve

RegisterContext_x86_64.h (7.51 KB)

RegisterContext_x86_64.cpp (7.87 KB)

Hi all!

This is a quick update on some work I have been doing on the linux
front, specifically a Process plugin for that platform.

The current effort is still at an early stage. In short, we can launch
an inferior process and set/step-over breakpoints.

That is great news!

However, I am thinking about merging this initial work into the tree
sooner rather than later to avoid dumping a huge patch on everyone at a
later time. Although *not quite ready*, you can see the current plugin
code here:

lldb/source/Plugins/Process/Linux at swilson-process · ice799/lldb · GitHub

A few issues I came across while working on this which should be
resolved before the linux plugin can be merged:

- We need to make the DWARF debug section names (as defined in
  SymbolFileDwarf.cpp) selectable by platform. I implemented a simple
  macro hack to work around this for now. A better solution might be
  to have a header #define the section names. Any suggestions
  appreciated:

    http://github.com/ice799/lldb/commit/284c444cae1feedc39f48ffed1265abf3fea291c

Sections already contain a "SectionType" enumeration. I _just_ added new definitions for the DWARF section types (r109041) and added the ability to search a SectionList by SectionType (r109040).

So if you make the ELF ObjectFile parser set the SectionType for DWARF sections correctly, you can use the new function I just added:

lldb::SectionSP SectionList::FindSectionByType (lldb::SectionType sect_type, uint32_t start_idx) const;

This will allow us to insulate the LLDB core logic from having to know the naming conventions imposed by the object file format (MachO and ELF). I will take care of changing the MachO ObjectFile parser to set the section types correctly, and will also modify the DWARF plug-in to search for the sections by type.
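
A minimal, self-contained sketch of the lookup-by-type idea described above. The `SectionType` values, `Section`, `SectionList`, and `ClassifyELFSectionName` below are simplified stand-ins invented for illustration, not the actual LLDB classes; only the shape of `FindSectionByType` follows the signature quoted in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the real LLDB types.
enum class SectionType { Other, Code, DWARFDebugInfo, DWARFDebugAbbrev, DWARFDebugLine };

struct Section {
    std::string name;  // format-specific name, e.g. ".debug_info" in ELF
    SectionType type;  // format-independent tag set by the object file parser
};

using SectionSP = std::shared_ptr<Section>;

class SectionList {
public:
    void Add(std::string name, SectionType type) {
        m_sections.push_back(std::make_shared<Section>(Section{std::move(name), type}));
    }

    // Return the first section of the given type at or after start_idx,
    // or an empty pointer when no such section exists.
    SectionSP FindSectionByType(SectionType type, std::size_t start_idx = 0) const {
        for (std::size_t i = start_idx; i < m_sections.size(); ++i)
            if (m_sections[i]->type == type)
                return m_sections[i];
        return SectionSP();
    }

private:
    std::vector<SectionSP> m_sections;
};

// An ELF-style parser maps its own section names onto the shared enumeration,
// so callers never need to know the names themselves.
SectionType ClassifyELFSectionName(const std::string &name) {
    if (name == ".debug_info")   return SectionType::DWARFDebugInfo;
    if (name == ".debug_abbrev") return SectionType::DWARFDebugAbbrev;
    if (name == ".debug_line")   return SectionType::DWARFDebugLine;
    if (name == ".text")         return SectionType::Code;
    return SectionType::Other;
}
```

With this arrangement the DWARF reader would ask for `SectionType::DWARFDebugInfo` and never see the format-specific name.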

- There is a good bit of sharable code in the MacOSX-User plugin with
  respect to RegisterContext specializations. Please find attached two
  files which provide a partial x86_64 RegisterContext specialization.
  The idea is to introduce
  lldb/include/Target/X86/RegisterContext_x86_64.h so all plugins
  can share common definitions and code. Again, any thoughts or
  suggestions on this approach appreciated!

This sounds like a nice idea, though I don't know how well this would work in practice. The idea behind the register context is to allow each plug-in to define its own register numbers as they make sense for the platform. The number of available registers for x86_64 might be different between Darwin and Linux. The register numbers can always be set by each plug-in to match as closely to the native OS register structures as possible to make the creation of the ReadRegister/WriteRegister very easy to code for that platform. For example the Darwin user threads have access to the following structure for the GPR registers on Mac OS X:

    struct GPR
    {
        uint64_t rax;
        uint64_t rbx;
        uint64_t rcx;
        uint64_t rdx;
        uint64_t rdi;
        uint64_t rsi;
        uint64_t rbp;
        uint64_t rsp;
        uint64_t r8;
        uint64_t r9;
        uint64_t r10;
        uint64_t r11;
        uint64_t r12;
        uint64_t r13;
        uint64_t r14;
        uint64_t r15;
        uint64_t rip;
        uint64_t rflags;
        uint64_t cs;
        uint64_t fs;
        uint64_t gs;
    };

Does Linux have the exact same registers? If not, I would vote to keep each implementation separate.

Also, different OS versions of the same system may have more or fewer registers available. So code could dynamically determine which registers are available, and different register contexts could be returned as a result.

There is also a rule that all register numbers defined by a context must start at zero and have no gaps. This makes a single register definition for a given architecture even harder, because you might have to zero out registers that don't exist.

So I would say keep these separate and allow each plug-in to define register contexts that exactly match the registers that each platform can provide.
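
As a rough sketch of why plug-in private numbering keeps the read path simple: the register numbers below start at zero, have no gaps, and follow the field order of a native register structure, so a read is a bounds check plus a table-driven copy. The `RegInfo` struct, the `gpr_*` enumerators, and `ReadRegister` are hypothetical names chosen for this example (and the structure is cut down to four fields); they are assumptions, not LLDB's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Cut-down native register structure (first four fields only, for brevity).
struct GPR {
    uint64_t rax;
    uint64_t rbx;
    uint64_t rcx;
    uint64_t rdx;
};

// Plug-in private register numbers: start at zero, no gaps,
// same order as the fields of GPR.
enum RegNum : uint32_t { gpr_rax = 0, gpr_rbx, gpr_rcx, gpr_rdx, k_num_registers };

// Hypothetical per-register description.
struct RegInfo {
    const char *name;
    uint32_t byte_size;
    uint32_t byte_offset;  // offset of the register inside GPR
};

// Table indexed directly by register number.
static const RegInfo g_reg_infos[k_num_registers] = {
    {"rax", sizeof(uint64_t), static_cast<uint32_t>(offsetof(GPR, rax))},
    {"rbx", sizeof(uint64_t), static_cast<uint32_t>(offsetof(GPR, rbx))},
    {"rcx", sizeof(uint64_t), static_cast<uint32_t>(offsetof(GPR, rcx))},
    {"rdx", sizeof(uint64_t), static_cast<uint32_t>(offsetof(GPR, rdx))},
};

// Copy one register out of the native structure.
// Returns false for a register number outside the table.
bool ReadRegister(const GPR &gpr, uint32_t reg, uint64_t &value) {
    if (reg >= k_num_registers)
        return false;
    const RegInfo &info = g_reg_infos[reg];
    std::memcpy(&value,
                reinterpret_cast<const unsigned char *>(&gpr) + info.byte_offset,
                info.byte_size);
    return true;
}
```

A platform with a different native layout would supply its own enumeration and table rather than padding a shared one with registers it cannot provide.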

Hi Greg,

Greg Clayton <gclayton@apple.com> writes:

- We need to make the DWARF debug section names (as defined in
  SymbolFileDwarf.cpp) selectable by platform. I implemented a simple
  macro hack to work around this for now. A better solution might be
  to have a header #define the section names. Any suggestions
  appreciated:

    http://github.com/ice799/lldb/commit/284c444cae1feedc39f48ffed1265abf3fea291c

Sections already contain a "SectionType" enumeration. I _just_ added
new definitions for the DWARF section types (r109041) and added the
ability to search a SectionList by SectionType (r109040).

So if you make the ELF ObjectFile parser set the SectionType
for DWARF sections correctly, you can use the new function I just
added:

lldb::SectionSP SectionList::FindSectionByType (lldb::SectionType
sect_type, uint32_t start_idx) const;

This will allow us to insulate the LLDB core logic from having to know
the naming conventions imposed by the object file format (MachO
and ELF). I will take care of making the changes for the MachO
ObjectFile parser to set the section types correctly and also modify
the DWARF plug-in to search for the sections by type.

This is great. Will update the ELF reader at the first opportunity.

- There is a good bit of sharable code in the MacOSX-User plugin with
  respect to RegisterContext specializations. Please find attached two
  files which provide a partial x86_64 RegisterContext specialization.
  The idea is to introduce
  lldb/include/Target/X86/RegisterContext_x86_64.h so all plugins
  can share common definitions and code. Again, any thoughts or
  suggestions on this approach appreciated!

This sounds like a nice idea, though I don't know how well this would
work in practice. The idea behind the register context is to allow
each plug-in to define its own register numbers as they make sense for
the platform.

OK. I misinterpreted the intended semantics for this class. My
impression was that we could have a base set of "known" registers that
would be translated into the internal encoding via a call to
ConvertRegisterKindToRegisterNumber (which would return
LLDB_INVALID_REGNUM if not available). Finally, to support additional
pseudo-register sets (like Mach exception state) the platform plugin
would provide additional RegisterSet's beyond that provided by the base
class.

But from what you write below I see there is more to consider :)

The number of available registers for x86_64 might be
different between Darwin and Linux. The register numbers can always be
set by each plug-in to match as closely to the native OS register
structures as possible to make the creation of the
ReadRegister/WriteRegister very easy to code for that platform. For
example the Darwin user threads have access to the following structure
for the GPR registers on Mac OS X:

    struct GPR
    {
        uint64_t rax;
        uint64_t rbx;
        uint64_t rcx;
        uint64_t rdx;
        uint64_t rdi;
        uint64_t rsi;
        uint64_t rbp;
        uint64_t rsp;
        uint64_t r8;
        uint64_t r9;
        uint64_t r10;
        uint64_t r11;
        uint64_t r12;
        uint64_t r13;
        uint64_t r14;
        uint64_t r15;
        uint64_t rip;
        uint64_t rflags;
        uint64_t cs;
        uint64_t fs;
        uint64_t gs;
    };

Does Linux have the exact same registers? If not, I would vote to keep
each implementation separate.

At first glance it looks like ES, DS and SS are available on Linux as
well, but IIRC they are always zero on 64-bit arch regardless.

Also, different OS versions of the same system may have more or fewer
registers available. So code could dynamically determine which
registers are available, and different register contexts could be
returned as a result.

OK. I did not consider this possibility.

There is also a rule that all register numbers defined by a context
must start at zero and have no gaps. This makes a single register
definition for a given architecture even harder, because you might
have to zero out registers that don't exist.

So I would say keep these separate and allow each plug-in to define
register contexts that exactly match the registers that each platform
can provide.

Thanks so much for the feedback! This is very helpful for a newcomer
such as myself! As I get the plugin ready for commit I will provide
private versions of the register context.

BTW, any hope some of the LLDB devs will show up on #llvm?

Thanks again,
Steve

This is great. Will update the ELF reader at the first opportunity.

I just did the Mach-O and ELF plugins: Committed revision 109054. This patch also fixes the DWARF plugin to get any needed DWARF sections by SectionType. So everything should be working from the DWARF perspective now.

BTW, any hope some of the LLDB devs will show up on #llvm?

That is on my todo list, I will try and get on there ASAP!

I look forward to seeing you land your patch. I will try and take a look at it after my 4PM (PST) meeting.

Greg Clayton

I got a chance to check out the start of your Process plug-in.

A few things:

LinuxThread::BreakNotify()

  You might want to call into your architecture-specific register context and let it
  know it hit a breakpoint so it can do whatever it needs to. That way,
  for i386 and x86_64 you can back up the PC, but for other architectures
  you don't have to.
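
One way to picture this suggestion is a virtual hook on the register context that the thread calls when it stops at a breakpoint. The class names and the `DidHitBreakpoint` method below are hypothetical, made up for this sketch rather than taken from LLDB. The x86 override relies on the fact that the one-byte INT3 trap is reported with the PC already past the breakpoint instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical interface for illustration; not LLDB's RegisterContext API.
class RegisterContextSketch {
public:
    virtual ~RegisterContextSketch() = default;

    uint64_t GetPC() const { return m_pc; }
    void SetPC(uint64_t pc) { m_pc = pc; }

    // Called by the thread when it stops at a software breakpoint.
    // Default: the architecture needs no fix-up.
    virtual void DidHitBreakpoint() {}

private:
    uint64_t m_pc = 0;
};

// i386/x86_64: the trap leaves the PC one byte past the INT3 opcode (0xCC),
// so move it back to the address of the breakpoint.
class RegisterContextX86Sketch : public RegisterContextSketch {
public:
    void DidHitBreakpoint() override { SetPC(GetPC() - 1); }
};

// The thread stays architecture-neutral and simply forwards the event.
class ThreadSketch {
public:
    explicit ThreadSketch(RegisterContextSketch &context) : m_context(context) {}
    void BreakNotify() { m_context.DidHitBreakpoint(); }

private:
    RegisterContextSketch &m_context;
};
```

The thread code then contains no per-architecture branches; adding an architecture means adding a register context subclass.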

Are you reaping your process after you launch/attach to it with waitpid anywhere?
There is a lldb_private::Host abstraction call to do this if you need that service.

Check out Host.h:
  Host::StartMonitoringChildProcess(...)

and search for its use in ProcessMacOSX for an example.
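
For readers unfamiliar with "reaping", here is a small POSIX-only sketch of what it means at the system-call level. It does not use or imitate Host::StartMonitoringChildProcess; `SpawnExitingChild` and `ReapChild` are hypothetical helpers written for this example, and a real debugger would wait for stop events as well as exits.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that exits immediately with the given code.
// Returns the child's pid in the parent, or -1 if fork failed.
pid_t SpawnExitingChild(int exit_code) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(exit_code);  // child: terminate without running parent cleanup
    return pid;
}

// Block until the child terminates and collect its status so that it does
// not linger as a zombie. Returns the exit code, or -1 if the wait failed
// or the child did not exit normally.
int ReapChild(pid_t pid) {
    if (pid <= 0)
        return -1;
    int status = 0;
    pid_t result;
    do {
        result = waitpid(pid, &status, 0);
    } while (result == -1 && errno == EINTR);  // retry if interrupted by a signal
    if (result != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A monitoring service runs this kind of wait on a dedicated thread and reports the result through a callback, which is the job the Host abstraction mentioned above is meant to take over.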

ProcessLinux.cpp:74

  Is there a reason for the g_process global? You can have more than one process
  at a time in LLDB, so it seems like a dangerous variable to keep around. If you
  need a global way to locate your process by pid, you can use a static function
  in lldb_private::Debugger:

  static lldb::TargetSP FindTargetWithProcessID (lldb::pid_t pid);

  If you need to look up a target by any other means globally, let me know and
  we can add the needed functionality to a static call in Debugger.
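
To make the concern about a single global concrete, the sketch below keys live instances by pid instead of holding one pointer. `InferiorSketch` and `InferiorRegistry` are hypothetical names for this illustration; they do not describe how lldb_private::Debugger implements FindTargetWithProcessID. The sketch is also not thread-safe, so a real version would need locking.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for one debugged process.
struct InferiorSketch {
    explicit InferiorSketch(long p) : pid(p) {}
    long pid;
};
using InferiorSP = std::shared_ptr<InferiorSketch>;

// Pid-keyed lookup that works with any number of live processes.
// Weak references are stored so the registry never keeps a process alive.
class InferiorRegistry {
public:
    void Register(const InferiorSP &inferior) { m_by_pid[inferior->pid] = inferior; }

    void Unregister(long pid) { m_by_pid.erase(pid); }

    // Returns an empty pointer if the pid is unknown or the object is gone.
    InferiorSP Find(long pid) const {
        auto it = m_by_pid.find(pid);
        return it == m_by_pid.end() ? InferiorSP() : it->second.lock();
    }

private:
    std::map<long, std::weak_ptr<InferiorSketch>> m_by_pid;
};
```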

ProcessLinux.cpp:79

  Your ProcessLinux constructor shouldn't need to call UpdateLoadedSections()
  since when a process is created it isn't alive yet, nor does it have
  any connection to a valid live process. DoLaunch will need to be called,
  or DoAttach before you would need to call this function. Also there are
  functions that get called prior to, and after DoLaunch and DoAttach:

  When launching, the following functions will be called:

  virtual Error WillLaunch (...);
  virtual Error DoLaunch (...);
  virtual void DidLaunch (...);

  Likewise with DoAttach:

  virtual Error WillAttachToProcessWithID (lldb::pid_t pid);
  virtual Error DoAttachToProcessWithID (lldb::pid_t pid);
  virtual void DidAttach ();
  
  virtual Error WillAttachToProcessWithName (const char *process_name, bool wait_for_launch);
  virtual Error DoAttachToProcessWithName (const char *process_name, bool wait_for_launch);
  virtual void DidAttach ();

  If any of the Will* functions return an error, the process launch/attach
  will stop. If they return success, then the Do* functions will be called.
  If the Do* functions return success, then the Did* functions will be called.

  So a good place to do your call to "UpdateLoadedSections()" is in the
  DidLaunch() or DidAttach() functions.
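
  The Will*/Do*/Did* ordering described above can be sketched as a small
  template-method pattern. Everything in the block is a simplified assumption
  for illustration: `ErrorSketch`, `ProcessBaseSketch`, and
  `LinuxProcessSketch` are invented names, the real methods take arguments
  that are omitted here, and the `sections_loaded` flag merely stands in for
  a call to UpdateLoadedSections().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal error type; not lldb_private::Error.
struct ErrorSketch {
    bool ok;
};

class ProcessBaseSketch {
public:
    virtual ~ProcessBaseSketch() = default;

    // Driver: Will -> Do -> Did, stopping at the first failure.
    ErrorSketch Launch() {
        ErrorSketch error = WillLaunch();
        if (!error.ok)
            return error;
        error = DoLaunch();
        if (!error.ok)
            return error;
        DidLaunch();
        return error;
    }

protected:
    virtual ErrorSketch WillLaunch() { return ErrorSketch{true}; }  // default: no objection
    virtual ErrorSketch DoLaunch() = 0;                             // plug-in must implement
    virtual void DidLaunch() {}                                     // default: nothing to do
};

class LinuxProcessSketch : public ProcessBaseSketch {
public:
    explicit LinuxProcessSketch(bool launch_succeeds) : m_launch_succeeds(launch_succeeds) {}

    std::vector<std::string> calls;  // records the order of callbacks
    bool sections_loaded = false;    // stands in for UpdateLoadedSections()

protected:
    ErrorSketch DoLaunch() override {
        calls.push_back("DoLaunch");
        return ErrorSketch{m_launch_succeeds};
    }

    // Only reached after a successful launch, when a live process exists.
    void DidLaunch() override {
        calls.push_back("DidLaunch");
        sections_loaded = true;
    }

private:
    bool m_launch_succeeds;
};
```

  Note that the constructor does no section work at all; the flag only flips
  once DoLaunch has reported success.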

  There are similar Will*/Do*/Did* functions for detaching, and a few
  other things. Many of them have default implementations that do nothing,
  but are designed to be overridden so you can do just this kind of stuff.

  You will also want to make a DYLD plug-in at some point to take care of
  this if you plan to re-use the code that can locate where shared libraries
  are loaded in another Linux process plug-in (like for remote debugging
  using "debugserver"). This way you can just plug your Linux dynamic loader
  plug-in into ProcessGDBRemote and all should work (after "debugserver" has
  been modified to run on Linux, that is).

RegisterContextLinux_x86_64:
  You will want to fill in a static RegisterInfo array and return valid
  values for your registers (See how the other RegisterContext subclasses
  in ProcessMacOSX do this).

Other than that, overall it looks pretty good. Feel free to commit your
"source/Plugins/Process/Linux" whenever you can!

Greg

Greg Clayton <gclayton@apple.com> writes:

I got a chance to check out the start of your Process plug-in.

A few things:

LinuxThread::BreakNotify()

  You might want to call into your architecture-specific register context and let it
  know it hit a breakpoint so it can do whatever it needs to. That way,
  for i386 and x86_64 you can back up the PC, but for other architectures
  you don't have to.

Ah, good idea. Will do.

Are you reaping your process after you launch/attach to it with waitpid anywhere?
There is a lldb_private::Host abstraction call to do this if you need that service.

Check out Host.h:
  Host::StartMonitoringChildProcess(...)

and search for its use in ProcessMacOSX for an example.

Yes, I am calling waitpid from the so-called
ProcessMonitor::SignalThread. StartMonitoringChildProcess looks like it
will help with the job nicely. Thanks for pointing that one out!

ProcessLinux.cpp:74

  Is there a reason for the g_process global? You can have more than one process
  at a time in LLDB, so it seems like a dangerous variable to keep around. If you
  need a global way to locate your process by pid, you can use a static function
  in lldb_private::Debugger:

  static lldb::TargetSP FindTargetWithProcessID (lldb::pid_t pid);

  If you need to look up a target by any other means globally, let me know and
  we can add the needed functionality to a static call in Debugger.

Ah, no. There is no reason for g_process anymore -- just leftover cruft from
previous work. Will remove.

Regarding multiple processes: Is it guaranteed that a new Process
instance is created for every process launched, or is it possible that
the same instance be called upon to manage a different inferior (say via
a call sequence of the form Launch(process-1), Destroy(process-1),
Launch(process-2)) ?

ProcessLinux.cpp:79

  Your ProcessLinux constructor shouldn't need to call UpdateLoadedSections()
  since when a process is created it isn't alive yet, nor does it have
  any connection to a valid live process.

This is just a temporary hack to work around the lack of a DynamicLoader
plugin. But even if it is temporary I should still be calling it from
the right spot. Will fix.

      DoLaunch will need to be called,
  or DoAttach before you would need to call this function. Also there are
  functions that get called prior to, and after DoLaunch and DoAttach:

  When launching, the following functions will be called:

  virtual Error WillLaunch (...);
  virtual Error DoLaunch (...);
  virtual void DidLaunch (...);

  Likewise with DoAttach:

  virtual Error WillAttachToProcessWithID (lldb::pid_t pid);
  virtual Error DoAttachToProcessWithID (lldb::pid_t pid);
  virtual void DidAttach ();
  
  virtual Error WillAttachToProcessWithName (const char *process_name, bool wait_for_launch);
  virtual Error DoAttachToProcessWithName (const char *process_name, bool wait_for_launch);
  virtual void DidAttach ();

  If any of the Will* functions return an error, the process launch/attach
  will stop. If they return success, then the Do* functions will be called.
  If the Do* functions return success, then the Did* functions will be called.

  So a good place to do your call to "UpdateLoadedSections()" is in the
  DidLaunch() or DidAttach() functions.

OK. Will move the UpdateLoadedSections call. And thanks for the
clarification about the role of these methods.

  There are similar Will*/Do*/Did* functions for detaching, and a few
  other things. Many of them have default implementations that do nothing,
  but are designed to be overridden so you can do just this kind of stuff.

  You will also want to make a DYLD plug-in at some point to take care of
  this if you plan to re-use the code that can locate where shared libraries
  are loaded in another Linux process plug-in (like for remote debugging
  using "debugserver"). This way you can just plug your Linux dynamic loader
  plug-in into ProcessGDBRemote and all should work (after "debugserver" has
  been modified to run on Linux, that is).

There is another developer who is putting effort into a DYLD plugin, so
plans are in the works for adding this support.

RegisterContextLinux_x86_64:
  You will want to fill in a static RegisterInfo array and return valid
  values for your registers (See how the other RegisterContext subclasses
  in ProcessMacOSX do this).

This is on my todo list. Does not seem critical for the simple debug
sessions I can run with the plugin as is, but I will add this very soon.

Other than that, overall it looks pretty good. Feel free to commit your
"source/Plugins/Process/Linux" whenever you can!

Thanks so much for looking this over and letting me contribute to the
project! I should be able to get the plugin ready for commit within a
few days.

Thanks again!
Steve

Regarding multiple processes: Is it guaranteed that a new Process
instance is created for every process launched, or is it possible that
the same instance be called upon to manage a different inferior (say via
a call sequence of the form Launch(process-1), Destroy(process-1),
Launch(process-2)) ?

We currently always destroy the old process class and create a new one. So yes, you can expect a new Process instance for each process.

There is another developer who is putting effort into a DYLD plugin, so
plans are in the works for adding this support.

Great!

Other than that, overall it looks pretty good. Feel free to commit your
"source/Plugins/Process/Linux" whenever you can!

Thanks so much for looking this over and letting me contribute to the
project! I should be able to get the plugin ready for commit within a
few days.

I look forward to seeing your checkin!