[RFC] Extending llvm-mc .loc directive with labeling support

This RFC proposes an extension to the existing .loc directive provided by the llvm-mc assembler. The goal is to enable referencing specific entries within the debug_line section via labels; currently, only the beginning of the line table can be referenced via a label.

Proposed Extension

We propose adding an optional parameter debug_line_label to the .loc directive. This parameter would be followed by a user-defined label name.

Example:
.loc 1 1 13 debug_line_label .func_line_entries

When this label is specified, the assembler will emit the label name followed by the existing .loc information to the debug_line section of the assembly output.

Use Case
This functionality is needed for generating relative offsets into the debug_line section in order to support a new DWARF attribute - see the RFC for the DW_AT_LLVM_stmt_sequence attribute and the PR for the clang implementation of the attribute.
This extension to .loc will be used to generate the DW_AT_LLVM_stmt_sequence attribute as follows:

_main:
.Lfunc_begin0:
	.cfi_startproc
	.file 1 "test.cpp"
	.loc 1 1 13 debug_line_label .Lmain_line_entries
    [...]
	.byte	14                                           # Abbrev [14] DW_TAG_subprogram
	.long	.Lmain_line_entries - .Lline_table_start0    # DW_AT_LLVM_stmt_sequence
    [...]

Implementation Details
The implementation will involve modifying the llvm-mc parser to recognize the new debug_line_label parameter and updating the code generation logic to emit the label name followed by the existing .loc information to the debug_line section.

Backward Compatibility
Existing .loc directives will remain functional. The new parameter is entirely optional.

Discussion Points

  • Are there any potential drawbacks to this extension?
  • Are there alternative approaches to achieve similar functionality?

I take it the change to .loc syntax is because you don’t want to depend on -ffunction-sections. (The assembler could automatically emit labels derived from the section name in that case.) Would it be better to implement an entirely separate directive? Because the effect actually has 3 steps:

  1. Terminate any in-flight sequence
  2. Emit the label
  3. Start a new sequence

If .loc already has code to deal with that (e.g. when the new .loc is for a different section) then arguably a new clause on .loc would be preferable. I’m not actually sure which I prefer.
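To make the three steps concrete, here is a minimal Python sketch of a toy line table. The class `ToyLineTable`, its pseudo-opcodes, and the index-based label positions are all hypothetical and chosen only for illustration; this is not how MC represents the line table internally.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLineTable:
    """Toy stand-in for a line table (hypothetical, not MC's data structure)."""
    ops: list = field(default_factory=list)     # emitted pseudo-opcodes, in order
    labels: dict = field(default_factory=dict)  # label name -> index into ops
    in_sequence: bool = False

    def add_row(self, file, line, column):
        # A row opens a new sequence if none is currently in flight.
        if not self.in_sequence:
            self.ops.append("begin_sequence")
            self.in_sequence = True
        self.ops.append(("row", file, line, column))

    def add_label(self, name):
        # Step 1: terminate any in-flight sequence.
        if self.in_sequence:
            self.ops.append("end_sequence")
            self.in_sequence = False
        # Step 2: emit the label at the current position.
        self.labels[name] = len(self.ops)
        # Step 3: the new sequence starts lazily, on the next add_row call.


table = ToyLineTable()
table.add_row(1, 1, 13)
table.add_label(".Lmain_line_entries")
table.add_row(1, 2, 5)
print(table.ops)
print(table.labels)
```

In this sketch the label ends up pointing at the first opcode of the new sequence, which is the property the attribute needs; whether the real implementation starts the sequence eagerly or lazily is a detail the sketch does not try to settle.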

Changes to assembly syntax, especially when related to DWARF, kind of want to be coordinated with GNU binutils and GCC. In this case it’s an LLVM extension, so deferring that part until we get some solid experience with the feature would be the right move. But in the long run, and especially if this is standardized, that communication needs to happen. (That community might have different opinions on how to convey the requirement to the assembler, e.g., an entirely separate directive rather than adding yet another new trailing clause to .loc.)

I was thinking that auto-generating names would not be easily discoverable and would not be developer-friendly. So I think explicit syntax would work best - but this might just be a personal preference - I am OK either way.

Implementation-wise, adding another directive doesn’t seem much different from extending .loc. There is a slight advantage to .loc, but that shouldn’t be the blocking factor.

A separate directive seems better if we do end up needing to change it later depending on GCC opinions.

How about:
.locLabel THE_LABEL

Where the next line entry emitted after this instruction would be labeled with THE_LABEL and would also be the start of a new sequence.

A .locLabel instruction would invalidate the previous ones - meaning that in a stream of consecutive .locLabel statements, only the latest one would have any effect.
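A rough Python sketch of these semantics, in case it helps: the label is held as "pending" until the next line entry, and a later .locLabel overwrites an earlier one that has not been consumed yet. `ToyLocStream` and its method names are hypothetical, and the part about starting a new sequence is left out to keep it short.

```python
class ToyLocStream:
    """Toy model of the proposed .locLabel behaviour (hypothetical names)."""

    def __init__(self):
        self.pending_label = None
        self.entries = []  # tuples of (label or None, file, line, column)

    def loc_label(self, name):
        # A later .locLabel replaces any earlier one not yet attached to an entry.
        self.pending_label = name

    def loc(self, file, line, column):
        # The next line entry consumes the pending label, if there is one.
        self.entries.append((self.pending_label, file, line, column))
        self.pending_label = None


stream = ToyLocStream()
stream.loc_label(".Lfirst")
stream.loc_label(".Lsecond")  # overrides .Lfirst
stream.loc(1, 1, 13)
stream.loc(1, 2, 5)
print(stream.entries)
```

Under this model .Lfirst is never defined at all, which is one thing the real implementation would have to decide on: silently dropping it, or reporting an error.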

Example usage:

_main:
.Lfunc_begin0:
	.cfi_startproc
	.file 1 "test.cpp"
	.loc 1 1 13
	.locLabel .Lmain_line_entries
    [...]
	.byte	14                                           # Abbrev [14] DW_TAG_subprogram
	.long	.Lmain_line_entries - .Lline_table_start0    # DW_AT_LLVM_stmt_sequence
    [...]

I don’t have a strong opinion, and there’s a reasonable argument for adding it to .loc so go ahead.

Sounds good - I’ll go ahead with implementing the original proposal - debug_line_label argument addition to .loc.

I initially thought the label would be emitted into the .text stream, but after re-reading the DW_AT_LLVM_stmt_sequence use case, I understand the problem this is intended to solve, and I think this is a reasonably good solution.

One concern is that .loc is also implemented by the binutils assembler (as), so there is a risk of LLVM extending .loc in ways that are incompatible with future GNU .loc extensions. That could be an argument in favor of a separate directive like .locLabel.

cc @MaskRay for assembler compatibility concerns

I have read [RFC] New DWARF attribute for symbolication of merged functions about DW_AT_LLVM_stmt_sequence, but I am still having trouble following the assembly examples.
_main is defined while the proposed .loc or .locLabel references a magic symbol .Lmain_line_entries.
How does this work?

Can you describe the behavior in detail? I think rephrasing the first few paragraphs will help other readers as well.

About the assembler extension:

  • The assembler directives are case insensitive. The canonical spelling is lower case, so .locLabel should be changed.
  • Assembler directives should not define magic symbols.

Extending llvm-mc .loc directive

by the llvm-mc assembler …

This library is called the LLVM integrated assembler. You can use that term instead of llvm-mc, which is an internal testing tool.

with labeling support

This is too vague. Assemblers already support the label: syntax well.

_main is defined while the proposed .loc or .locLabel references a magic symbol .Lmain_line_entries .

The .loc or .locLabel will be defining the label so it can be used by

.long	.Lmain_line_entries - .Lline_table_start0

i.e., .loc 1 1 13 debug_line_label .func_line_entries means "Emit this location in the line table, and before you do, define the label ‘.func_line_entries’".

I will rephrase the original post to make it clearer. EDIT - it looks like I can’t edit the original post :\

Assembler directives should not define magic symbols.

Is the behavior as described in this comment OK ?

MCObjectStreamer::emitLabel defines symbols (labels). While directive handlers also call emitLabel, they only define temporary symbols. Therefore, I think it’s best not to make directive handlers perform label definition. You can do

xxx:
.loc ... xxx  # or .loclabel

instead.

Discourse users of Trust Level 4 can edit others’ posts. You can probably prepare an updated description and ask such a user (I am not) to edit the post on your behalf… Understanding Discourse Trust Levels

To be clear, you’re proposing that we define a label xxx normally via xxx:, and then, when this label is specified as an argument to the .loc directive, the label would be moved (redefined) to the debug_line section.
I.e., if the “xxx” label is not specified in the .loc directive, it will be a label referring to the .text section, but once the .loc directive specifies it, it will be moved (or redefined) into the debug_line section.
I am not sure if the assembler currently moves labels around or redefines them; I just want to confirm this is what you are proposing.

Also, looking at the existing patterns, I think the syntax for .loc needs to include the debug_line_label keyword as well, not just the xxx label. Does this sound right?

I’ve edited the OP here and will try to find a Trust Level 4 user to update it.

The gist is that a directive should not define a non-temporary label.
A directive could define temporary labels like DOT for internal use cases (e.g. .cfi_startproc), but these labels should be invisible to the user.
.long .Lmain_line_entries - .Lline_table_start0 is a clear signal that the defined label is visible and should not be defined automatically by a directive.

Whether you write xxx: .loc ... xxx or .loc ... xxx; xxx: does not matter. The label and the directive have the same location.

Note that the proposed .locLabel directive is defining a label within the magically-assembler-generated .debug_line section, not at the point in the code where the directive is placed.

I don’t think having the user write a label with xxx: syntax in the file would be reasonable, if the label doesn’t actually get placed where the user wrote it.

Sorry, I am not sure I understand. The goal of this extension is for the directive to define a non-temporary label, so that this label can be accessed by the user. If this is a bad pattern, then I am open to other ways of doing it.

Maybe a practical example would help clarify:

#!/bin/bash

# Create a temporary directory
temp_dir=$(mktemp -d)
cd "$temp_dir"

# Write the LLVM assembly test file
cat << EOF > test.s
    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 11, 0    sdk_version 11, 3
    .globl  _simple_function
    .p2align    2
    .file   1 "test.c"
_simple_function:
    .loc    1 10 0
    mov     x0, #1
label1:
    .loc    1 11 0
label2:
    mov     x0, #2
    ret
EOF

# Assemble the .s file into a binary
xcrun -sdk macosx clang -target arm64-apple-macos11 -c test.s -o test.o

# Disassemble the binary
xcrun -sdk macosx llvm-objdump --disassemble --line-numbers --source test.o

# Dump the line table of the binary
xcrun -sdk macosx llvm-dwarfdump --debug-line test.o -v

# Clean up
cd -
rm -rf "$temp_dir"

The above example produces the following output:

Disassembly of section __TEXT,__text:
[...]
; label2():
       4: d2800040     	mov	x0, #0x2                ; =2
[...]
.debug_line contents:
[...]
            0x0000000000000004     11      0      1   0             0       0  is_stmt
0x00000037: 02 DW_LNS_advance_pc (addr += 8, op-index += 0)
[...]

So here, both label1 and label2 point to __TEXT,__text:0x04.
In the DWARF attribute RFC - and also as requested in this PR comment - there is a requirement for the user to be able to reference .debug_line:0x37.

As the .debug_line:0x37 entry is generated by the .loc directive, I was thinking that the most straightforward way to allow the user to reference it is via a non-temporary label generated by the .loc directive.
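As a small illustration of why the label has to live in .debug_line, here is a Python sketch of the label-difference arithmetic behind `.long .Lmain_line_entries - .Lline_table_start0`. The symbol table and the `label_difference` helper are hypothetical; the 0x37 and 0x04 offsets are taken from the dump above, and a line table starting at offset 0x0 is an assumption.

```python
# Hypothetical symbol table: label name -> (section, byte offset).
# 0x37 and 0x04 come from the dump above; the 0x00 table start is assumed.
symbols = {
    ".Lline_table_start0": (".debug_line", 0x00),
    ".Lmain_line_entries": (".debug_line", 0x37),
    "label1": ("__TEXT,__text", 0x04),
}


def label_difference(symbols, minuend, subtrahend):
    """Return minuend - subtrahend, which is only meaningful within one section."""
    section_a, offset_a = symbols[minuend]
    section_b, offset_b = symbols[subtrahend]
    if section_a != section_b:
        raise ValueError(
            f"{minuend} is in {section_a} but {subtrahend} is in {section_b}"
        )
    return offset_a - offset_b


offset = label_difference(symbols, ".Lmain_line_entries", ".Lline_table_start0")
print(hex(offset))
```

A label such as label1, which sits in the text section, cannot give this offset no matter where it is written in the source, which is why the label needs to be placed by the directive that generates the line entry.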

I understand the proposal now.

The goal of the extension is to provide a way to reference the generated line entries in the debug_line section.

The proposal resembles .cfi_label: [MC] Support .cfi_label by MaskRay · Pull Request #97922 · llvm/llvm-project · GitHub

The label is emitted to .debug_line (or Mach-O __debug_line).
The label will be referenced by .debug_info with a DW_AT_LLVM_stmt_sequence attribute.

The feature resembles Mach-O .zerofill __DATA, __bss, sym_a, 1, which emits a label not in the current section.
I think this is ok.

For your “Maybe a practical example would help clarify:” example, label1 and label2 have no distinction in MC’s internal representation or the object file: they are located at the same offset in the same fragment.

I informed binutils and gcc: DW_TAG_compile_unit's references to .debug_line and .loc directive extension https://gcc.gnu.org/pipermail/gcc/2024-July/244277.html

For expediency, how about we revisit the .loclabel proposal? I think we have a freer hand in adding new assembler directives, and by all means, we should tell GNU binutils that we’ve added a directive for the purpose of inserting a label into the .debug_line section. I kind of think having a separate directive is more orthogonal, and maybe more readable. The documentation will be more discoverable, since a separate directive is more searchable.

.loc_label <label> makes sense to me.

I discovered that the GNU assembler supports .cfi_label for a label in .eh_frame/.debug_frame. I added support for it in the LLVM integrated assembler: LLVM's `MCParser` does not understand `.cfi_label` directives · Issue #97222 · llvm/llvm-project · GitHub

Yes, given that we have a very similar precedent with .cfi_label - I think .loc_label <label> is the way to go. I’ll go ahead with implementing the .loc_label version of this.