Obtaining the origin function for a local var after inlining

(I think I've asked a similar question off-list a couple of times, but
never got an answer)

Hi folks,

For [K]MSAN we need to figure out which inlined function a local var
originally belonged to in the source file.
E.g. when a local buffer %buf is declared in @bar(), but @bar() is
inlined into @foo(), then there's a local %buf.i in @foo(), but we
need to determine that the local came from @bar(). In the case of
nested inline functions we need the deepest one.

Is there any existing code for that? If not, which debug info
constructs do we need to look up to get this information?

https://llvm.org/docs/SourceLevelDebugging.html mentions
@llvm.dbg.addr as the source of information about a local var, but
ToT Clang doesn't emit it. There are calls to @llvm.dbg.declare in
the IR, but it's said to be deprecated, so I'm not sure if it's OK to
use it.

Thanks in advance,

I wanted to start a migration from dbg.declare to dbg.addr, but never finished it. For your purposes, they are probably equivalent. The slight semantic difference is that dbg.declare cannot be mixed with dbg.value to create a variable that is sometimes in memory and sometimes a pure value.

To distinguish between two inlined copies of the same variable, look at the inlinedAt field of the DILocation attached to the dbg intrinsic. The inlinedAt location will be distinct for every inlined call site. You can see how it is used in CodeGen/AsmPrinter/Dwarf* by looking for uses of InlinedEntity:

  using InlinedEntity = std::pair<const DINode *, const DILocation *>;

  for (const auto &VI : Asm->MF->getVariableDbgInfo()) {
    if (!VI.Var)
      continue;
    assert(VI.Var->isValidLocationForIntrinsic(VI.Loc) &&
           "Expected inlined-at fields to agree");

    InlinedEntity Var(VI.Var, VI.Loc->getInlinedAt());
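
To illustrate why the (variable, inlinedAt) pair is a useful key, here is a minimal self-contained sketch. The types `Variable` and `Location` and the helper `makeEntity` are toy stand-ins invented for this example; they only mimic the shape of DINode/DILocation and are not the LLVM classes.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-ins (assumptions for illustration, NOT the LLVM API).
struct Variable {
  std::string Name;
};

struct Location {
  unsigned Line;             // source line of this location
  const Location *InlinedAt; // call site this code was inlined at, or null
};

// Same shape as the InlinedEntity alias quoted above: the variable plus the
// call site it was inlined at.
using InlinedEntity = std::pair<const Variable *, const Location *>;

// Build the key for one debug record describing variable V at location Loc.
InlinedEntity makeEntity(const Variable &V, const Location &Loc) {
  return {&V, Loc.InlinedAt};
}
```

With one variable `buf` whose function is inlined at two different call sites, the two records yield different keys because their InlinedAt pointers differ, while two records from the same call site yield equal keys.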

If you are looking at an llvm.dbg.declare/value/addr intrinsic, then the DILocation attached to the intrinsic indirectly points there:

  DIScope *Scope = DILocation(dbg_intrinsic.getDebugLoc()).getScope();
  while (!isa<DISubprogram>(Scope))
    Scope = Scope->getScope();
  auto *origFunction = cast<DIFunction>(Scope);

If you want to find the function that it was inlined *into*, then you need to follow the inlinedAt link in the DILocation.

-- adrian
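
The two walks described above (up the scope chain to the enclosing subprogram, and along inlinedAt to the function the code ended up in) can be sketched on a toy model. `Scope`, `Location`, and the three helper functions are hypothetical names made up for this sketch; in LLVM the corresponding accessors are DILocation::getScope() and DILocation::getInlinedAt(), and the real classes carry much more state.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins (assumptions for illustration, NOT the LLVM API).
struct Scope {
  std::string Name;
  bool IsSubprogram;   // true for a function, false for e.g. a lexical block
  const Scope *Parent; // enclosing scope, or null
};

struct Location {
  const Scope *S;            // scope this location belongs to
  const Location *InlinedAt; // call site it was inlined at, or null
};

// Walk up the scope chain until a subprogram is found (null if there is none).
const Scope *enclosingSubprogram(const Scope *S) {
  while (S && !S->IsSubprogram)
    S = S->Parent;
  return S;
}

// Function the variable was originally declared in.
const Scope *originFunction(const Location *L) {
  return L ? enclosingSubprogram(L->S) : nullptr;
}

// Function the code finally lives in: follow inlinedAt to the outermost
// call site, then find its enclosing subprogram.
const Scope *containingFunction(const Location *L) {
  if (!L)
    return nullptr;
  while (L->InlinedAt)
    L = L->InlinedAt;
  return enclosingSubprogram(L->S);
}
```

For a variable declared in a lexical block of bar(), with bar() inlined into foo(), `originFunction` gives bar and `containingFunction` gives foo. With nested inlining, `originFunction` still gives the deepest function because it never follows inlinedAt.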

This works, thank you!

(I had to slightly modify the code FWIW:
    DILocation *DIL = dbg_intrinsic.getDebugLoc();
    if (DIL) {
      DIScope *Scope = DIL->getScope();
      while (Scope && !isa<DISubprogram>(Scope))
        Scope = Scope->getScope().resolve();
      // Scope may be null here if no enclosing subprogram was found.
      auto *origFunction = cast_or_null<DISubprogram>(Scope);
    }
)

I also thought that it would be natural if the AllocaInst
corresponding to the llvm.dbg.declare() call shared the same
DILocation as the debug intrinsic.
Does anyone have an idea why this isn't so?
Right now one needs to build a mapping between AllocaInst and
llvm.dbg.declare() in order to get the debug info for the allocation.

First off, this is up to the frontend to decide. But generally, an alloca is almost always part of the function prologue, and the DILocation assigned to it is almost meaningless because it won't (directly) get generated into any code that could be associated with a debug line table entry. The second reason is that the dbg.declare could be inlined and describe an sret value where the alloca belongs to the call site's stack frame.

Right now one needs to build a mapping between AllocaInst and
llvm.dbg.declare() in order to get the debug info for the allocation.

No, you can just call llvm::findDbgUsers() to find any debug intrinsics referring to any llvm::Instruction.

-- adrian

It also turns out that for certain AllocaInst instances there are no
llvm.dbg.declare intrinsics referring to them, only several
different llvm.dbg.value calls.
For most of them the DILocation matches that of the inlined local
variable, however some reference other code locations, e.g. places
where these allocas are passed as function parameters.

For example, in the attached IR file (ptrace.ll, generated from the
attached ptrace.c) the following llvm.dbg.value() calls reference
the |siginfo| variable declared at line 888 in ptrace.c:

  %siginfo = alloca %struct.siginfo, align 8
  call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6801, metadata !DIExpression()) #6, !dbg !7046
  call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !7022, metadata !DIExpression()) #6, !dbg !7027
  call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6793, metadata !DIExpression()) #6, !dbg !6967
  call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6937, metadata !DIExpression()) #6, !dbg !6942
  call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6556, metadata !DIExpression(DW_OP_deref)), !dbg !6931
  call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6556, metadata !DIExpression(DW_OP_deref)), !dbg !6931
  call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6556, metadata !DIExpression(DW_OP_deref)), !dbg !6931

E.g. here the second and the fourth intrinsics have debug info values
!7027 and !6942 pointing at lines 670 and 654 respectively (siginfo_t*
parameters of ptrace_setsiginfo() and ptrace_getsiginfo()).
The last three intrinsics indeed point to line 888, where |siginfo| is declared.

Is the DW_OP_deref tag enough to distinguish the right llvm.dbg.value?

ptrace.ll (1.64 MB)

ptrace.c (32.4 KB)

It also turns out that for certain AllocaInst instances there are no
llvm.dbg.declare intrinsics referring to them, only several
different llvm.dbg.value calls.

In clang, all variables that are stored in allocas are described by dbg.declares in the frontend, but later LLVM transformations may lower them to dbg.values. In optimized code it is expected that you will only see dbg.declares for variables whose address is actually taken.

- A dbg.declare declares that a variable lives in a particular memory location. Its DILocation usually points to the declaration of the variable.
- A dbg.value says that the result of a computation (an LLVM SSA value) is the current value of a source variable. A dbg.value's DILocation typically points to the location of that computation, though we aren't particularly consistent about that and the DILocation is really only used for its inlinedAt field in the backend.

For most of them the DILocation matches that of the inlined local
variable, however some reference other code locations, e.g. places
where these allocas are passed as function parameters.

For example, in the attached IR file (ptrace.ll, generated from the
attached ptrace.c) the following llvm.dbg.value() calls reference
the |siginfo| variable declared at line 888 in ptrace.c:

%siginfo = alloca %struct.siginfo, align 8
call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6801, metadata !DIExpression()) #6, !dbg !7046

this is
!6801 = !DILocalVariable(name: "from", arg: 2, scope: !6794, file: !6795, line: 14, type: !6798)

call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !7022, metadata !DIExpression()) #6, !dbg !7027

this is
!7022 = !DILocalVariable(name: "info", arg: 2, scope: !7016, file: !3, line: 670, type: !7019)

call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6793, metadata !DIExpression()) #6, !dbg !6967
call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6937, metadata !DIExpression()) #6, !dbg !6942
call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6556, metadata !DIExpression(DW_OP_deref)), !dbg !6931

the last three are describing
!6556 = !DILocalVariable(name: "siginfo", scope: !6546, file: !3, line: 888, type: !5009)

the last two look redundant (probably due to a bug in whatever transformation inserted them thrice).

call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6556, metadata !DIExpression(DW_OP_deref)), !dbg !6931
call void @llvm.dbg.value(metadata %struct.siginfo* %siginfo,
metadata !6556, metadata !DIExpression(DW_OP_deref)), !dbg !6931

E.g. here the second and the fourth intrinsics have debug info values
!7027 and !6942 pointing at lines 670 and 654 respectively (siginfo_t*
parameters of ptrace_setsiginfo() and ptrace_getsiginfo()).
The last three intrinsics indeed point to line 888, where |siginfo| is declared.

Is the DW_OP_deref tag enough to distinguish the right llvm.dbg.value?

I don't understand what you mean by "right" here. These intrinsics are describing different inlined variables that happen to share the same value at this point in the program.

-- adrian
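
One way to see that the seven dbg.value calls describe several distinct variables is to group them by the variable operand rather than by expression. The sketch below is only a toy model: `DbgValueRecord`, `exampleRecords`, and `groupByVariable` are hypothetical names, and the numbers are the metadata IDs copied from the IR excerpt in this thread. It does not use the LLVM API.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy record for one llvm.dbg.value call (assumption for illustration).
struct DbgValueRecord {
  unsigned VarId; // metadata ID of the DILocalVariable operand
  unsigned LocId; // metadata ID of the attached !dbg location
  bool HasDeref;  // whether the DIExpression contains DW_OP_deref
};

// The seven calls on %siginfo, in the order they appear in the excerpt.
std::vector<DbgValueRecord> exampleRecords() {
  return {
      {6801, 7046, false}, {7022, 7027, false}, {6793, 6967, false},
      {6937, 6942, false}, {6556, 6931, true},  {6556, 6931, true},
      {6556, 6931, true},
  };
}

// Group the records by the variable they describe.
std::map<unsigned, std::vector<DbgValueRecord>>
groupByVariable(const std::vector<DbgValueRecord> &Records) {
  std::map<unsigned, std::vector<DbgValueRecord>> Groups;
  for (const DbgValueRecord &R : Records)
    Groups[R.VarId].push_back(R);
  return Groups;
}
```

Grouping the example records gives five variables, with !6556 (the |siginfo| local) owning the three DW_OP_deref records. In this excerpt the deref flag happens to coincide with that variable, but the grouping key is the variable itself.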