Questions about how the StripDeadDebugInfo pass works

Hello everyone,

I’m investigating whether it is possible to reduce the debug info in a submodule that is extracted from a bigger one.
It would be nice if someone could help me understand a couple of points in how the StripDeadDebugInfo pass works.

Our tool splits the input module into submodules. Each submodule contains an entry function, the functions it calls, and the global variables it uses. Then several optimization passes are run on each submodule.
One of them is the StripDeadDebugInfo pass. However, when I look at a submodule’s IR, I see that the debug metadata includes not only the compile units of the entry function and its dependencies, but several other compile units as well.
I looked into the implementation of StripDeadDebugInfo and found two points that are unclear to me.

  1. The first condition for a compile unit to be live is that some DISubprogram references it:
  for (const DISubprogram *SP : F.subprograms())
    if (SP->getUnit())
      LiveCUs.insert(SP->getUnit());

The question here is: could we also check that a function definition present in the current module carries this DISubprogram?

  2. Another place where extra compile units may be kept is the check for a constant global variable expression in a compile unit:
  if (DIG->getExpression() && DIG->getExpression()->isConstant())
    LiveGVs.insert(DIG);
  ...
  if (LiveGVs.count(DIG))
    LiveGlobalVariables.push_back(DIG);
  ...
  if (!LiveGlobalVariables.empty())
    LiveCUs.insert(DIC);

Which case does this check for constant global variables cover? Can there be a situation where a constant global variable is defined in one translation unit but also used in another one?
Or is it only for the case where the processed module contains no functions?

I think these are good questions. StripDeadDebugInfo starts by instantiating a DebugInfoFinder for the module, so it’s worth looking there too. I haven’t looked at these passes before, but I’m intrigued enough by the questions to look through them.

DebugInfoFinder remembers all compile units in the module, and each compile unit’s “retained types” which might include DISubprograms. It also remembers every global variable and imported entity associated with each compile unit.
DebugInfoFinder then does the step you are wondering about, iterating all Functions in the module and remembering any DISubprogram attached to a function (and even looks through all instructions to find any DISubprogram for an inlined function).

StripDeadDebugInfo then works back from globals and DISubprograms to determine which compile units to retain. However, as DebugInfoFinder has already identified all these from all compile units, this means (for example) any compile unit that has a DISubprogram in its retained type list will be retained by StripDeadDebugInfo, even if those DISubprograms don’t have any attached Function.

What I see in StripDeadDebugInfo is that it is iterating the Module globals to find the relevant global variables; however, it is iterating DebugInfoFinder’s subprograms to decide which compile units to keep, and probably it should be iterating the Module (same as DebugInfoFinder) instead. Likewise, StripDeadDebugInfo is iterating DebugInfoFinder’s list of compile units to derive the list of global variables to keep, and probably it should be iterating the compile units that StripDeadDebugInfo has already identified as “live” instead (in LiveCUs).
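To make the difference between the two iteration strategies concrete, here is a minimal sketch in plain C++. It is a toy model under stated assumptions, not the pass itself: `CompileUnit`, `Subprogram`, `Function`, `Module` and both helper functions are hypothetical stand-ins rather than the real LLVM classes, and the model ignores subprograms that are only reachable through inlined instructions.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for the LLVM metadata classes; none of these are the real API.
struct CompileUnit { std::string Name; };

struct Subprogram {
  std::string Name;
  const CompileUnit *Unit; // nullptr models a declaration without a unit: field
};

struct Function {
  std::string Name;
  const Subprogram *SP; // debug info attached to the function, may be null
  bool IsDefinition;    // false for a bare declaration in this module
};

struct Module { std::vector<Function> Functions; };

// Behaviour as described above: every subprogram the finder recorded keeps
// its compile unit alive, whether or not a function still carries it.
std::set<const CompileUnit *>
liveCUsFromAllSubprograms(const std::vector<const Subprogram *> &Found) {
  std::set<const CompileUnit *> Live;
  for (const Subprogram *SP : Found) {
    assert(SP && "the recorded list is assumed to hold no null entries");
    if (SP->Unit)
      Live.insert(SP->Unit);
  }
  return Live;
}

// Suggested alternative: only a subprogram attached to a function that is
// defined in this module keeps its compile unit alive.
std::set<const CompileUnit *> liveCUsFromDefinedFunctions(const Module &M) {
  std::set<const CompileUnit *> Live;
  for (const Function &F : M.Functions)
    if (F.IsDefinition && F.SP && F.SP->Unit)
      Live.insert(F.SP->Unit);
  return Live;
}
```

With one defined function whose subprogram belongs to `a.cpp` and one orphaned subprogram belonging to `b.cpp`, the first helper keeps both units while the second keeps only `a.cpp`, which is the reduction the original question is after.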

I’d be curious what other debug-info experts think; I don’t really do much with the metadata side of things. Maybe @dexonsmith or @adrian.prantl could chime in.

I think so. There are two kinds of DISubprograms: Function declarations (they don’t have a unit: field) and function definitions, which are distinct and attached to an IR Function. There is no reason to keep a DISubprogram definition if it isn’t attached to anything. That said, an unused distinct DISubprogram will automatically become unreachable and thus should disappear when the Function is removed.

May there be a situation when constant global var is defined in one translation unit, but used also in another one?

I would assume that this can happen in an LTO build.

@pogo59 Thank you a lot for your thoughts.

@adrian.prantl Is it possible to provide a minimal example of sources and compile commands that reproduces such a scenario?

The code that I am investigating has many static const globals in different translation units. When we extract a function, all compile units that have these static const globals are also copied into the submodule, even if the extracted function uses none of those globals. This pass doesn’t strip those compile units. So I’d like to find additional conditions that might help to exclude compile units with constant globals as well.
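The effect can be reproduced in a small toy model of the global-variable rule quoted earlier. This is only a sketch under assumptions: `GlobalVarExpr`, `UnitWithGlobals` and `liveCUsFromGlobals` are hypothetical stand-ins, not LLVM classes, and `HasConstantExpr` stands for the `getExpression() && getExpression()->isConstant()` test.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins, not the real LLVM classes.
struct GlobalVarExpr {
  std::string Name;
  bool HasConstantExpr; // models the isConstant() check on the expression
};

struct UnitWithGlobals {
  std::string Name;
  std::vector<const GlobalVarExpr *> Globals; // the unit's globals: list
};

// A global variable expression counts as live if it is attached to a global
// that is still in the module, or if its expression is constant. A unit is
// live if at least one of its global variable expressions is live.
std::set<const UnitWithGlobals *>
liveCUsFromGlobals(const std::vector<UnitWithGlobals> &CUs,
                   const std::set<const GlobalVarExpr *> &AttachedToModule) {
  std::set<const UnitWithGlobals *> LiveCUs;
  for (const UnitWithGlobals &CU : CUs) {
    bool HasLiveGlobal = false;
    for (const GlobalVarExpr *GVE : CU.Globals) {
      assert(GVE && "the globals list is assumed to hold no null entries");
      if (AttachedToModule.count(GVE) || GVE->HasConstantExpr)
        HasLiveGlobal = true;
    }
    if (HasLiveGlobal)
      LiveCUs.insert(&CU);
  }
  return LiveCUs;
}
```

In this model a unit whose only global is a constant that nothing in the submodule uses is still kept, because the constant check alone marks it live, while a unit with a non-constant unused global is dropped.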

What I was thinking of was something like:

a.c:
int global;
b.c:
extern int global;
void f() { global++; }

clang -c -S -emit-llvm -o a.ll a.c
clang -c -S -emit-llvm -o b.ll b.c
llvm-link a.ll b.ll -S -o linked.ll

I’m not sure if that’s what you had in mind, though.

Yes, I tried a similar example, but with a const int global, because !DIExpression(DW_OP_constu, ... is generated for a const global variable.

file1.cpp:
extern const int GLOB_CONST;
void func1() {
  int a = GLOB_CONST;
}

file2.cpp:
extern const int GLOB_CONST;
const int GLOB_CONST = 4321 * 10;

void func2() {
  int b = GLOB_CONST;
}

Resulting IR after linkage:

@GLOB_CONST = dso_local constant i32 43210, align 4, !dbg !0

define dso_local void @_Z5func1v() !dbg !10 {
entry:
  %a = alloca i32, align 4
  call void @llvm.dbg.declare(metadata i32* %a, metadata !14, metadata !DIExpression()), !dbg !15
  %0 = load i32, i32* @GLOB_CONST, align 4, !dbg !16
  store i32 %0, i32* %a, align 4, !dbg !15
  ret void, !dbg !17
}

define dso_local void @_Z5func2v() !dbg !18 {
entry:
  %b = alloca i32, align 4
  call void @llvm.dbg.declare(metadata i32* %b, metadata !19, metadata !DIExpression()), !dbg !20
  store i32 43210, i32* %b, align 4, !dbg !20
  ret void, !dbg !21
}

!llvm.dbg.cu = !{!7, !2}
!llvm.module.flags = !{!9}

!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "GLOB_CONST", scope: !2, file: !3, line: 2, type: !5, isLocal: false, isDefinition: true)
!2 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !3, producer: "clang version 14.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, globals: !4, splitDebugInlining: false, nameTableKind: None)
!3 = !DIFile(filename: "module2_std.cpp", directory: "/localdisk/test")
!4 = !{!0}
!5 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !6)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !8, producer: "clang version 14.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
!8 = !DIFile(filename: "module1_std.cpp", directory: "/localdisk2/test")
!9 = !{i32 2, !"Debug Info Version", i32 3}
!10 = distinct !DISubprogram(name: "func1", linkageName: "_Z5func1v", scope: !8, file: !8, line: 3, type: !11, scopeLine: 3, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !7, retainedNodes: !13)
!11 = !DISubroutineType(types: !12)
!12 = !{null}
!13 = !{}
!14 = !DILocalVariable(name: "a", scope: !10, file: !8, line: 4, type: !6)
!15 = !DILocation(line: 4, column: 7, scope: !10)
!16 = !DILocation(line: 4, column: 11, scope: !10)
!17 = !DILocation(line: 5, column: 1, scope: !10)
!18 = distinct !DISubprogram(name: "func2", linkageName: "_Z5func2v", scope: !3, file: !3, line: 4, type: !11, scopeLine: 4, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !2, retainedNodes: !13)
!19 = !DILocalVariable(name: "b", scope: !18, file: !3, line: 5, type: !6)
!20 = !DILocation(line: 5, column: 7, scope: !18)
!21 = !DILocation(line: 6, column: 1, scope: !18)

So the DIGlobalVariableExpression loses its DW_OP_constu expression if the variable has the extern specifier.

Sorry for the long silence.
I’ve created a review, D122163 “[StripDeadDebugInfo] Drop dead CUs”, with the proposed changes.