ValueObjectChild and SetData

Hi LLDB devs,

short question. Since the method

  bool ValueObjectChild::SetData(DataExtractor &data, Status &error)

doesn't exist, what is the preferred way to update the contents of
scalar bitfields?

Is there any code in the repo demonstrating the technique?

I am interested because, for the language I am writing a plugin for,
certain datatypes are MSBit-aligned, e.g. a Nat16 occupies the upper
portion of the bits in a (32-bit) word.

Viewing and setting of such variables thus involves shifting bits, and
I'd expect that ValueObjectChild (in bitfield mode) would do that for
me.

Thanks in advance for any clues,

cheers,

    Gabor

What is the debug information format being used for these? If it is DWARF, the location expression for the variable should take care of extracting the value correctly.

A bit more info: Each ValueObject has a "Value m_value;" member variable that contains the value of the variable. This "Value" object has many member variables:

class Value {
  Scalar m_value;
  Vector m_vector;
  CompilerType m_compiler_type;
  void *m_context;
  ValueType m_value_type;
  ContextType m_context_type;
  DataBufferHeap m_data_buffer;
};

The "m_value_type" helps us to know how to interpret the value itself which is either contained in "Scalar m_value;" or "DataBufferHeap m_data_buffer;". ValueType is one of:

  enum ValueType {
    // m_value contains...
    // ============================
    eValueTypeScalar,      // raw scalar value
    eValueTypeVector,      // byte array of m_vector.length with endianness of
                           // m_vector.byte_order
    eValueTypeFileAddress, // file address value
    eValueTypeLoadAddress, // load address value
    eValueTypeHostAddress  // host address value (for memory in the process that
                           // is using liblldb)
  };

eValueTypeScalar means that the value itself is actually in "Scalar m_value;". This is the typical way that built in types (ints, floats, chars, etc) get resolved. For bitfields, this value should already be shifted around as necessary when the location information from the debug info was parsed and used to create the value at a specific location in the code.

eValueTypeFileAddress means that "m_value" contains a "file address" which points to the location of the variable in memory. This value will need to be converted to a "load address" when we extract the value and then we will read the value from process memory each time we get the location.

eValueTypeLoadAddress means that "m_value" contains a "load address" which points to the memory address in the process where we will read the value from.

eValueTypeHostAddress means that "m_value" contains an address in the LLDB process itself. This is typically used for variables that are constructed with complex location expressions that might say "2 bytes of my value are at XXX, 4 bytes of my value are in this register and 2 bytes are constant". So when we evaluate the location expression, it will hand us a buffer that contains the variable value.
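
To make the four cases concrete, here is a minimal sketch of how a value could be resolved depending on its kind. It is a simplified stand-in, not LLDB's actual Value class: ToyValue, ToyProcess, Resolve and the single "slide" used to turn a file address into a load address are all assumptions made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>

// Hypothetical, simplified mirror of the value kinds discussed above.
enum class ToyValueType { Scalar, FileAddress, LoadAddress, HostAddress };

// Stand-in for the debugged process: 32-bit words keyed by load address.
struct ToyProcess {
  std::map<std::uint64_t, std::uint32_t> memory;
  std::uint64_t slide = 0; // assumed: load address = file address + slide
};

struct ToyValue {
  ToyValueType type;
  std::uint64_t payload; // the scalar itself, or an address, depending on type
};

// Returns the 32-bit word that the value describes.
std::uint32_t Resolve(const ToyValue &v, const ToyProcess &proc) {
  switch (v.type) {
  case ToyValueType::Scalar:
    // The payload already is the value.
    return static_cast<std::uint32_t>(v.payload);
  case ToyValueType::FileAddress:
    // Convert to a load address first, then read process memory.
    return proc.memory.at(v.payload + proc.slide);
  case ToyValueType::LoadAddress:
    // Read process memory directly.
    return proc.memory.at(v.payload);
  case ToyValueType::HostAddress: {
    // The bytes live in a buffer inside the debugger's own address space.
    std::uint32_t word = 0;
    std::memcpy(&word,
                reinterpret_cast<const void *>(
                    static_cast<std::uintptr_t>(v.payload)),
                sizeof(word));
    return word;
  }
  }
  return 0;
}
```

The point of the sketch is only that the same payload field is interpreted differently depending on the type tag; the scalar case never touches memory at all.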

So your case seems like a standard bitfield case where the debug info should be adequately describing the bitfield and everything should just work. Are there any reasons why you think this might not be happening?

Greg

What is the debug information format being used for these? If it is DWARF,
the location expression for the variable should take care of extracting the
value correctly.

Hi Greg,

thanks for the very elaborate answer! Please find my replies inline, below.

In my case `lldb` is reading DWARF. Things are complicated a bit
by the fact that I am targeting a Wasm platform (WASI), and thus the
formal arguments are located in locals (but this is comparable to
registers on common architectures).

Are you suggesting that the location expression should massage the
formal parameter?

Currently I emit

0x000000db:     DW_TAG_formal_parameter
                  DW_AT_name	("n")
                  DW_AT_decl_line	(5)
                  DW_AT_decl_column	(0x09)
                  DW_AT_type	(0x0000002b "Nat16")
                  DW_AT_location	(DW_OP_WASM_location 0x0 +1, DW_OP_stack_value)

and `Nat16` is defined as:

0x0000002b:   DW_TAG_base_type
                DW_AT_name	("Nat16")
                DW_AT_bit_size	(0x20)
                DW_AT_data_bit_offset	(0x10)
                DW_AT_encoding	(DW_ATE_unsigned)

eValueTypeScalar means that the value itself is actually in "Scalar
m_value;". This is the typical way that built in types (ints, floats, chars,
etc) get resolved. For bitfields, this value should already be shifted
around as necessary when the location information from the debug info was
parsed and used to create the value at a specific location in the code.

I am not sure I can observe this. But I'll go hunting.

eValueTypeFileAddress means that "m_value" contains a "file address" which
points to the location of the variable in memory. This value will need to be
converted to a "load address" when we extract the value and then we will
read the value from process memory each time we get the location.

eValueTypeLoadAddress means that "m_value" contains a "load address" which
points to the memory address in the process where we will read the value
from.

While stepping around I have seen `eValueTypeLoadAddress`.

eValueTypeHostAddress means that "m_value" contains an address in the LLDB
process itself. This is typically used for variables that are constructed
with complex location expressions that might say "2 bytes of my value are at
XXX, 4 bytes of my value are in this register and 2 bytes are constant". So
when we evaluate the location expression, it will hand us a buffer that
contains the variable value.

So your case seems like a standard bitfield case where the debug info should
be adequately describing the bitfield and everything should just work. Are
there any reasons why you think this might not be happening?

I am new to these matters, and I'll find out. It would be awesome
not to have to patch LLDB.

Thanks again, I'll come back as soon as I have better facts,
gathered by poking around.

Cheers,

     Gabor

Thanks for the info. Comments inlined below!

Currently I emit

0x000000db:     DW_TAG_formal_parameter
                  DW_AT_name    ("n")
                  DW_AT_decl_line       (5)
                  DW_AT_decl_column     (0x09)
                  DW_AT_type    (0x0000002b "Nat16")
                  DW_AT_location        (DW_OP_WASM_location 0x0 +1, DW_OP_stack_value)

DW_OP_stack_value implies that after running this expression the value of this variable exists on the DWARF stack. This should mean that the “Value” would have a ValueType of eValueTypeScalar. I am guessing when you see these variables you always get the entire integer value of all bitfields that shared this integer. Is that correct?

and Nat16 is defined as:

0x0000002b:   DW_TAG_base_type
                DW_AT_name      ("Nat16")
                DW_AT_bit_size  (0x20)
                DW_AT_data_bit_offset   (0x10)
                DW_AT_encoding  (DW_ATE_unsigned)

Interesting, from reading the DWARF specification, it is legal for a DW_TAG_base_type to have a DW_AT_bit_size and DW_AT_data_bit_offset, but LLDB is currently not handling this situation. We handle these for bitfields, which are currently attached to DW_TAG_member of a struct.

So we have two options to fix these kinds of variables:

  • fix LLDB to handle the DW_AT_data_bit_offset and DW_AT_bit_size on DW_TAG_base_type types (requires LLDB fix)
  • fix the DW_AT_location expression to shift and mask the integer with extra DW_OP opcodes (no fix required in LLDB)

The expression could be modified to apply the data bit offset and the bit size. DW_OP_stack_value terminates a DWARF expression, so it goes last:

DW_AT_location (DW_OP_WASM_location 0x0 +1, DW_OP_const1u(0x10), DW_OP_shr, DW_OP_const4u(0xffffffff), DW_OP_and, DW_OP_stack_value)

To break this down:

This gets the integer value and places it on the stack:

DW_OP_WASM_location 0x0 +1

stack[0] = full_value

This pushes the data bit offset onto the stack:

DW_OP_const1u(0x10)

stack[0] = full_value
stack[1] = 0x10

This shifts the full_value to the right by 0x10:

DW_OP_shr

stack[0] = full_value >> 0x10

Now we need to push a mask that keeps the low DW_AT_bit_size bits:

DW_OP_const4u(0xffffffff)

stack[0] = full_value >> 0x10
stack[1] = (1 << 0x20) - 1 (which makes the mask of 0xffffffff)

Now we mask off the high bits using the mask we just pushed:

DW_OP_and

stack[0] = (full_value >> 0x10) & 0xffffffff

Finally, this says that the entry on top of the stack is the value of the variable:

DW_OP_stack_value

Now the value of the variable is correct.
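
The shift-and-mask steps above can be replayed on a toy stack to check the arithmetic. This is only a sketch: ShiftAndMask is a made-up helper that mimics the push/pop behaviour of DW_OP_shr and DW_OP_and, it is not LLDB's expression evaluator, and the word read from the Wasm local is simply passed in as a parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a DWARF expression stack; this is not LLDB's evaluator.
// `full_value` plays the role of the word read from the Wasm local.
std::uint64_t ShiftAndMask(std::uint64_t full_value, std::uint64_t bit_offset,
                           std::uint64_t mask) {
  assert(bit_offset < 64 && "shift count must be smaller than the word width");
  std::vector<std::uint64_t> stack;

  stack.push_back(full_value); // value of the local
  stack.push_back(bit_offset); // DW_OP_const1u(0x10)

  // DW_OP_shr: pop the shift count and the value, push value >> count.
  std::uint64_t count = stack.back();
  stack.pop_back();
  std::uint64_t value = stack.back();
  stack.pop_back();
  stack.push_back(value >> count);

  stack.push_back(mask); // DW_OP_const4u(0xffffffff)

  // DW_OP_and: pop two entries, push their bitwise AND.
  std::uint64_t rhs = stack.back();
  stack.pop_back();
  std::uint64_t lhs = stack.back();
  stack.pop_back();
  stack.push_back(lhs & rhs);

  return stack.back(); // the entry the expression leaves on top
}
```

For a word of 0x000A0000, i.e. a Nat16 holding 10 in the upper half, ShiftAndMask(0x000A0000, 0x10, 0xffffffff) leaves 10 on top of the stack.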

Again, thanks!

DW_OP_stack_value implies that after running this expression the value of
this variable exists on the DWARF stack. This should mean that the "Value"
would have a ValueType of eValueTypeScalar. I am guessing when you see these
variables you always get the entire integer value of all bitfields that
shared this integer. Is that correct?

Yes, without my modifications (grafting a ValueObjectChild into
ValueObjectVariable) I've been seeing 655360 (0xA0000), where I
wanted to see `10 : Nat16`.

and `Nat16` is defined as:

0x0000002b:   DW_TAG_base_type
               DW_AT_name	("Nat16")
               DW_AT_bit_size	(0x20)
               DW_AT_data_bit_offset	(0x10)
               DW_AT_encoding	(DW_ATE_unsigned)

Interesting, from reading the DWARF specification, it is legal for a
DW_TAG_base_type to have a DW_AT_bit_size and DW_AT_data_bit_offset, but

I'd hope so :slight_smile: This was my reading of the standard also.

LLDB is currently not handling this situation. We handle these for
bitfields, which are currently attached to DW_TAG_member of a struct.

Yes, C bitfields work fine when displayed and set, while Nat16 shows
up shifted (and in order to set it, I have to shift accordingly on the
lldb CLI).

So we have two options to fix these kinds of variables:
- fix LLDB to handle the DW_AT_data_bit_offset and DW_AT_bit_size on
DW_TAG_base_type types (requires LLDB fix)

This would be my preference. I guess the fix would be of limited
extent and I can probably backport it easily. I am basing my work on
a 10.0.1 LLDB snapshot.

- fix the DW_AT_location expression to shift and mask the integer with extra
DW_OP opcodes (no fix required in LLDB)

Hmmm, this could be a short-term alternative. I have to see how
`wasmtime` translates that to x86_64 DWARF. Also, how is setting of
locals supposed to work in such a workaround? Can LLDB run the
location expression in reverse?
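
For reference, the arithmetic that a write-back would need (the inverse of shift-and-mask) looks roughly like the sketch below. StoreField is a hypothetical helper, not an LLDB API, and whether LLDB can derive anything like it from a location expression is exactly the open question here. The sketch assumes a 32-bit word and takes the field position as explicit parameters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLDB API: returns `old_word` with the bits
// [bit_offset, bit_offset + bit_size) replaced by `new_value`.
std::uint32_t StoreField(std::uint32_t old_word, std::uint32_t new_value,
                         unsigned bit_offset, unsigned bit_size) {
  assert(bit_size > 0 && bit_offset < 32 && bit_size <= 32 - bit_offset);
  // Mask with the low `bit_size` bits set; a shift by 32 would be undefined.
  const std::uint32_t mask =
      bit_size == 32 ? 0xffffffffu : ((1u << bit_size) - 1u);
  // Clear the field in the old word, then merge in the shifted new value.
  const std::uint32_t cleared = old_word & ~(mask << bit_offset);
  return cleared | ((new_value & mask) << bit_offset);
}
```

With a 16-bit payload in the upper half of the word, StoreField(0x000A0000, 11, 16, 16) gives 0x000B0000, and the lower 16 bits of the old word are left untouched.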

Cheers,

    Gabor