output metadata for extern declared functions?

Hi,

I haven’t worked on Clang before and have a (supposedly) simple question for those who are familiar with metadata and LLVM bitcode generation. Assume that I have a function that is declared extern as follows:

extern int convert(unsigned u);

I want Clang to generate metadata nodes for it by adding a subprogram metadata node to the list of subprograms defined in the current compilation unit. The subprogram node and its associated nodes should carry the type signature. For example, I get the following set of metadata nodes for the function

int convert(unsigned u) {return 0;}

!3 = metadata !{metadata !4, metadata !10, metadata !33}
!4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"convert", metadata !"convert", metadata !"", i32 23, metadata !6, i1 false, i1 true, i32 0, i32 0, null, i32 256, i1 false, i32 (i32)* @convert, null, null, metadata !2, i32 23} ; [ DW_TAG_subprogram ] [line 23] [def] [convert]
!7 = metadata !{metadata !8, metadata !9}
!8 = metadata !{i32 786468, null, null, metadata !"int", i32 0, i64 32, i64 32, i64 0, i32 0, i32 5} ; [ DW_TAG_base_type ] [int] [line 0, size 32, align 32, offset 0, enc DW_ATE_signed]
!9 = metadata !{i32 786468, null, null, metadata !"unsigned int", i32 0, i64 32, i64 32, i64 0, i32 0, i32 7} ; [ DW_TAG_base_type ] [unsigned int] [line 0, size 32, align 32, offset 0, enc DW_ATE_unsigned]

which allows me to extract the source-level type signature for the function by using LLVM debug info APIs. I’d like to get the source-level type signature of the extern declared function, but Clang does not produce metadata for it.
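For readers decoding these nodes by hand, a small standalone model may help. It is a sketch, not the LLVM DIDescriptor API: the struct and function names are hypothetical. It assumes the LLVM 3.x metadata layout, in which the first field of each descriptor packs LLVMDebugVersion (12 << 16) together with the DWARF tag (786478 = (12 << 16) | 0x2e, DW_TAG_subprogram), and in which a subroutine type's element array lists the return type first and the parameter types after it (!7 above appears to be that array for the subroutine type !6, which isn't shown).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// In LLVM 3.x metadata, the first field of a descriptor is
// LLVMDebugVersion (12 << 16) OR'ed with the DWARF tag.
constexpr std::uint32_t kDebugVersion12 = 12u << 16;

// Low 16 bits: the DWARF tag, e.g. 0x2e for DW_TAG_subprogram.
std::uint32_t dwarfTag(std::uint32_t field) { return field & 0xFFFFu; }

// High 16 bits: the debug-info version number (12 here).
std::uint32_t debugVersion(std::uint32_t field) { return field >> 16; }

// Hypothetical, simplified stand-in for a DW_TAG_base_type node.
struct BaseType {
  std::string name;  // e.g. "int" or "unsigned int"
};

// Hypothetical stand-in for a subroutine type's element array (like !7):
// element 0 is the return type (nullptr models a void return) and the
// remaining elements are the parameter types, in order.
using TypeArray = std::vector<const BaseType *>;

// Rebuilds a source-level signature string such as "int (unsigned int)".
std::string signature(const TypeArray &types) {
  assert(!types.empty() && "the return-type slot is always present");
  std::string out = types[0] != nullptr ? types[0]->name : "void";
  out += " (";
  for (std::size_t i = 1; i < types.size(); ++i) {
    if (i > 1) out += ", ";
    out += types[i]->name;  // parameters are assumed non-null in this model
  }
  out += ")";
  return out;
}
```

For example, calling signature on the types described by !8 and !9 (in that order) yields "int (unsigned int)".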

By looking at the Clang AST for the extern declared function

-FunctionDecl 0x70598c0 <line:23:1, col:30> convert 'int (unsigned int)' extern
  -ParmVarDecl 0x7059800 <col:20, col:29> u 'unsigned int'

I know that Clang has the information I need, so I think I just need to turn off or remove the filter that ignores functions whose bodies are not available during metadata and/or code generation. Is there a simple switch that does this? If not, can anyone please explain how to do it by pointing me to the relevant code?

Thanks very much,

For those following this thread, a critical detail is that you want debug info metadata.

There’s no simple flag for this, as we don’t attach function debug info metadata to every declaration, only to definitions (there’s no filtering step).

But why do you want this anyway? If you’re performing optimizations/transformations based on debug info metadata, that’s not really the desired approach. Debug info is not meant to affect code generation.

I'm fixing an existing source-to-source translator, and it was unfortunately written by someone using LLVM as the intermediate step before Clang was mature. I don't need to generate object code at all, and performance is not a concern here.

It sounds like the only way to save this translator is to patch Clang so that it generates debug info metadata for declarations whose definitions are not available. Is this a feasible approach? How difficult would it be compared with working directly from the Clang AST? I’d guess it would be much easier than starting over from scratch with the Clang AST. Since I haven’t worked on Clang before, any suggestions and help would be greatly appreciated.

Generally we don’t encourage source-to-source translation via code generation, even from the AST; it’s best to use the AST to drive edits to the original source. The AST just doesn’t have the fidelity to regenerate the original source perfectly.

As for your current problem, I guess whether it’s cheaper to rewrite it properly depends on the complexity, but I assume it’s hard to justify a rewrite unless this tool is going to live much longer and need new features or maintenance.

That said, I’m not sure how much I can help you produce debug info for every function declaration. If I were doing this, I’d go back and look at the support I added for imported directives (using declarations and using directives) and look at other unhandled decls that could be supported.

Can you please be more specific about the imported directives, e.g. the names of the related classes? I’m totally new to Clang and have only used LLVM a little.

I took a look at clang::CodeGen::CGDebugInfo and it seems that it uses llvm::DIBuilder as the underlying horsepower. I played with the if statements related to “isDefinition” in DIBuilder::createFunction and DIBuilder::createMethod, and it didn’t work. Obviously, I’ve missed many things here.

Since debug info metadata is already generated for function definitions, generating the same kind of data for declarations should conceptually be easier. I might be wrong, though. The more specifically you can point me to the relevant code, the better.

Thanks very much,

I ran Clang in a debugger and traced how debug info metadata is emitted. It’s part of function code generation.

I have a question about when the declaration of an extern function is emitted. For example, I have this very simple code:

extern int convert(unsigned u);

void foo() {
  int x = convert(0);
}

The corresponding LLVM code is:

; Function Attrs: nounwind uwtable
define void @foo() #0 {
entry:
  %x = alloca i32, align 4
  call void @llvm.dbg.declare(metadata !{i32* %x}, metadata !8), !dbg !10
  %call = call i32 @convert(i32 0), !dbg !10
  store i32 %call, i32* %x, align 4, !dbg !10
  ret void, !dbg !11
}

declare i32 @convert(i32) #2 ; <-- when is this line emitted?

My question is where the “declare i32 @convert(i32) #2” line is emitted. I set breakpoints in many of the EmitXXX functions in CodeGenModule and noticed that this piece of code

// Ignore declarations, they will be emitted on their first use.
if (const FunctionDecl *FD = dyn_cast<FunctionDecl>(Global)) {
  // Forward declarations are emitted lazily on first use.
  if (!FD->doesThisDeclarationHaveABody()) {
    if (!FD->doesDeclarationForceExternallyVisibleDefinition())
      return;
    ...
  }
}

postpones emission of the declaration of convert, but I couldn’t figure out where and when the declaration is actually emitted. I also set a breakpoint in CodeGenModule::EmitDeferred(), but nothing happened in that function.
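The lazy behavior described above can be pictured with a toy model. None of these names are Clang APIs: emitGlobal and getOrCreateFunction only loosely mirror CodeGenModule::EmitGlobal and GetOrCreateLLVMFunction, and the model ignores deferred inline definitions, vtables, and attributes. It shows that a body-less declaration only reaches the module when a call site asks for it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of lazy function-declaration
// emission; these names are illustrative and are not Clang APIs.
struct ToyFunctionDecl {
  std::string name;
  bool hasBody;
  std::vector<std::string> callees;  // functions called from the body
};

struct ToyModule {
  // name -> true for a definition ("define"), false for a declaration ("declare")
  std::map<std::string, bool> functions;
};

class ToyCodeGen {
 public:
  explicit ToyCodeGen(ToyModule &module) : module_(module) {}

  // Loosely mirrors the early return in EmitGlobal: a declaration without a
  // body is skipped here and only materialized if something uses it.
  void emitGlobal(const ToyFunctionDecl &fd) {
    if (!fd.hasBody) return;
    getOrCreateFunction(fd.name);
    module_.functions[fd.name] = true;  // upgrade to a definition
    for (const std::string &callee : fd.callees)
      getOrCreateFunction(callee);  // each call site needs a callee
  }

  // Loosely mirrors GetOrCreateLLVMFunction: creates a declaration the first
  // time a name is requested; returns true only when it was newly created
  // (the point where a per-declaration hook could run).
  bool getOrCreateFunction(const std::string &name) {
    return module_.functions.emplace(name, false).second;
  }

 private:
  ToyModule &module_;
};
```

Feeding it convert (no body), an uncalled unused (no body), and foo (which calls convert) leaves foo as a definition, convert as a declaration, and no entry at all for unused.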

Any help is really appreciated.

Do you care about generating debug info for declarations of functions that aren’t even called? If so, the approach you’re taking will be insufficient, since we won’t even emit an IR declaration for such a function.

If not, you might want to look at where the IR for the call is constructed (I don’t know where this is, but you seem to be gaining proficiency in tracing through Clang/LLVM internals, which will serve you well here) and then see how the target of the call is built and passed into it.

No, I don’t care about those functions that aren’t called.

Okay, I walked through CodeGenFunction’s EmitCall family of functions but didn’t find much. I guess I’ll have to trace through them more carefully to see where it happens.

My idea is to take the debug info metadata emission logic used when generating the function prolog and apply it to function declarations. Do you think that would work?

Thanks,


The easiest place to start is getOrCreateFunctionType. You can probably hook into EmitCall if you want to emit a debug info declaration for the function.

-eric


I've finally figured out where a function declaration is lazily emitted: it happens in CodeGenModule::GetOrCreateLLVMFunction(), which is in turn called by the EmitCall functions. I wrote a function similar to CGDebugInfo::EmitFunctionStart() and hooked it into GetOrCreateLLVMFunction. It seems to work, since it passes all LLVM and Clang regression tests except two.

I now have two more questions.

(1) How do I add a Clang command-line option to control the call to my function? The hook-up point in my code is as follows:

llvm::Constant *
CodeGenModule::GetOrCreateLLVMFunction(StringRef MangledName,
                                       llvm::Type *Ty,
                                       GlobalDecl D, bool ForVTable,
                                       llvm::AttributeSet ExtraAttrs) {
  ...
  llvm::Function *F = llvm::Function::Create(FTy,
                                             llvm::Function::ExternalLinkage,
                                             MangledName, &getModule());
  ...
  if (ExtraAttrs.hasAttributes(llvm::AttributeSet::FunctionIndex)) {
    llvm::AttrBuilder B(ExtraAttrs, llvm::AttributeSet::FunctionIndex);
    F->addAttributes(llvm::AttributeSet::FunctionIndex,
                     llvm::AttributeSet::get(VMContext,
                                             llvm::AttributeSet::FunctionIndex,
                                             B));
  }

  // Emit a subprogram debug descriptor for this new declaration.
  // Goal: only call this when "-gg" is given (e.g. clang -gg).
  EmitFunctionDeclaration(D, F); // <-- hook-up line

  // This is the first use or definition of a mangled name. If there is a
  // deferred decl with this name, remember that we need to emit it at the end
  // of the file.
  llvm::StringMap<GlobalDecl>::iterator DDI = DeferredDecls.find(MangledName);
  ...
}

Adding such an option seems very complicated in Clang. I've spent hours on it and still haven't solved it.

Have you looked at how other command line arguments are handled in Clang?
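One way to picture the gating asked about in (1) is a boolean option that is parsed once and checked at the hook. In Clang of this era that roughly means declaring the flag in the driver/cc1 option tables, storing it in the code-generation options, and testing it next to the EmitFunctionDeclaration call; the exact files differ between versions, so treat that as an assumption. The sketch below is a standalone toy: ToyCodeGenOptions, parseArgs, and ToyModuleBuilder are hypothetical, and the -declaration-descriptor spelling is the one the poster ends up using at the end of this thread.

```cpp
#include <cassert>
#include <cstring>
#include <set>
#include <string>

// Hypothetical stand-in for a code-generation options struct.
struct ToyCodeGenOptions {
  bool emitDeclarationDescriptors = false;
};

// Toy command-line parsing: the flag is simply present or absent.
ToyCodeGenOptions parseArgs(int argc, const char *const argv[]) {
  ToyCodeGenOptions opts;
  for (int i = 1; i < argc; ++i)
    if (std::strcmp(argv[i], "-declaration-descriptor") == 0)
      opts.emitDeclarationDescriptors = true;
  return opts;
}

// Toy analogue of GetOrCreateLLVMFunction with the option-gated hook.
class ToyModuleBuilder {
 public:
  explicit ToyModuleBuilder(const ToyCodeGenOptions &opts) : opts_(opts) {}

  void getOrCreateFunction(const std::string &name) {
    if (!declared_.insert(name).second) return;  // created earlier; no new hook call
    if (opts_.emitDeclarationDescriptors)
      described_.insert(name);  // stands in for EmitFunctionDeclaration(D, F)
  }

  bool hasDescriptor(const std::string &name) const {
    return described_.count(name) != 0;
  }

 private:
  ToyCodeGenOptions opts_;
  std::set<std::string> declared_;
  std::set<std::string> described_;
};
```

Keeping the check at the hook, rather than inside the hooked function, means the default code path stays exactly as it was when the flag is absent.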

(2) I've run the LLVM and Clang regression tests on my code and there are two failures: debug-info-class.cpp and debug-info-template-quals.cpp. The failure message for debug-info-class.cpp is:

llvm-3.3.src/tools/clang/test/CodeGenCXX/debug-info-class.cpp:45:11: error: expected string not found in input
// CHECK: DW_TAG_class_type ] [B]
          ^
<stdin>:230:75: note: scanning from here
!49 = metadata !{i32 786445, metadata !1, metadata !46, metadata !"HdrSize", i32 17, i64 0, i64 0, i64 0, i32 4096, metadata !50, i32 52} ; [ DW_TAG_member ] [HdrSize] [line 17, size 0, align 0, offset 0] [static] [from ]
                                                                          ^
<stdin>:231:109: note: possible intended match here
!50 = metadata !{i32 786470, null, null, metadata !"", i32 0, i64 0, i64 0, i64 0, i32 0, metadata !23} ; [ DW_TAG_const_type ] [line 0, size 0, align 0, offset 0] [from int]
                                                                                                            ^
--

However, the generated LLVM assembly does contain "DW_TAG_class_type ] [B]". The related entry is:

!31 = metadata !{i32 786434, metadata !1, null, metadata !"B", i32 11, i64 64, i64 64, i32 0, i32 0, null, metadata !32, i32 0, metadata !31, null} ; [ DW_TAG_class_type ] [B] [line 11, size 64, align 64, offset 0] [from ]

I've attached the entire code to this message. What is going on, and how can I solve this?

FileCheck is order dependent. Your change probably caused the class type to be emitted earlier, before some other part of the file that's being checked. For example, the file might be "a b c" and the checks might be "a" and "c"; if your change reordered the output to "c a b", FileCheck would search for "c" after "a" and not find it. The FileCheck output tells you where it's searching from, so you can confirm this hypothesis.

The checks may need to be generalized, e.g. by using "CHECK-DAG" where the order doesn't matter (see the FileCheck documentation for details), or simply updated if there's no good generalization.
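David's "a b c" example can be made concrete. The functions below are a simplified, hypothetical model, not FileCheck's implementation: checkInOrder mimics plain CHECK lines (each pattern is searched for after the previous match), and checkDag mimics a run of CHECK-DAG lines (patterns may match in any order but may not overlap). Real FileCheck also supports regular expressions, CHECK-NOT, and rules for how CHECK-DAG groups interact with neighboring CHECK lines, none of which is modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of plain CHECK lines: each pattern must be found after
// the end of the previous match, so the order of the input matters.
bool checkInOrder(const std::string &input,
                  const std::vector<std::string> &patterns) {
  std::size_t pos = 0;
  for (const std::string &p : patterns) {
    std::size_t found = input.find(p, pos);
    if (found == std::string::npos) return false;  // "expected string not found"
    pos = found + p.size();
  }
  return true;
}

// Simplified model of a run of CHECK-DAG lines: every pattern must occur
// somewhere, in any order, without two patterns claiming the same text.
bool checkDag(const std::string &input,
              const std::vector<std::string> &patterns) {
  std::vector<bool> used(input.size(), false);
  for (const std::string &p : patterns) {
    bool matched = false;
    for (std::size_t at = input.find(p); at != std::string::npos;
         at = input.find(p, at + 1)) {
      bool overlaps = false;
      for (std::size_t i = at; i < at + p.size(); ++i) overlaps = overlaps || used[i];
      if (!overlaps) {
        for (std::size_t i = at; i < at + p.size(); ++i) used[i] = true;
        matched = true;
        break;
      }
    }
    if (!matched) return false;
  }
  return true;
}
```

With the output reordered to "c a b", checkInOrder fails on the checks {"a", "c"} while checkDag succeeds, which is the kind of relaxation the CHECK-DAG suggestion refers to.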

- David

Sure, I followed other flags and examples, especially one from here: . However, it only worked once I changed -gg to some other name, e.g. -declaration-descriptor. Clang doesn’t want other people to use “-gg”. :(