Function start address

Hi

I am using an LLVM pass combined with DWARF debug information to get the start address of every function. My steps are below:

First, I wrote a function pass to get the start line of each function; this part is finished.

Then, based on the start line of each function, I try to look up that line in the DWARF line table, which I dump with llvm-dwarfdump -debug-line.
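The lookup step can be sketched roughly as follows. This is a minimal illustration, not the actual tooling used here: the sample text only imitates the row layout that llvm-dwarfdump -debug-line prints (the exact columns vary between versions), the addresses, line numbers and function names are made up, and the File column is ignored, so it assumes a single source file.

```python
import re

# Illustrative excerpt in the style of `llvm-dwarfdump -debug-line` output.
# All addresses and line numbers are invented for this sketch.
SAMPLE_LINE_TABLE = """\
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000401110      4     10      1   0             0  is_stmt prologue_end
0x0000000000401118      5      3      1   0             0  is_stmt
0x0000000000401120     12      0      1   0             0  is_stmt
0x0000000000401127     13      5      1   0             0  is_stmt prologue_end
0x000000000040112d     13      5      1   0             0  is_stmt end_sequence
"""

# A row starts with a hex address followed by a decimal line number.
ROW = re.compile(r"^(0x[0-9a-fA-F]+)\s+(\d+)\s")


def lowest_address_per_line(dump_text):
    """Map each source line to the lowest address that has a row for it.

    Assumption: a single source file, so the File column is ignored.
    """
    table = {}
    for raw in dump_text.splitlines():
        match = ROW.match(raw.strip())
        if not match:
            continue  # header, separator, or unrelated text
        if "end_sequence" in raw:
            continue  # marks the end of a sequence, not an instruction
        addr, line = int(match.group(1), 16), int(match.group(2))
        if line not in table or addr < table[line]:
            table[line] = addr
    return table


def lookup(start_lines, table):
    """Split functions into those whose start line has a row and those without."""
    found = {name: table[line] for name, line in start_lines.items() if line in table}
    missing = sorted(name for name, line in start_lines.items() if line not in table)
    return found, missing


# Hypothetical start lines, as a function pass might report them.
table = lowest_address_per_line(SAMPLE_LINE_TABLE)
found, missing = lookup({"foo": 3, "bar": 12}, table)
print(found, missing)
```

In this made-up table the function starting at line 3 has no row of its own, because its first instruction is attributed to line 4, so a lookup keyed on the start line alone reports it as missing.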

However, the start line of about one third of all functions is not found in the line table, so I cannot get their start addresses. I know that the mapping between source locations and binary addresses is not bijective. I am compiling with -O1, and I know that some information can legitimately be lost because of optimization. But I don’t think DWARF would miss the start addresses of so many functions. Am I right? Any comments and suggestions are welcome. Many thanks.

Regards
Muhui

Any particular reason you’re using debug info to achieve this (and, if you are, why you’re using the line table)? You could query the object/executable file’s symbol table to find all the functions in an object or executable, and the address each one starts at. Or, if you are using debug info for some reason, you could look in debug_info rather than the line table, find the DW_TAG_subprogram for each function, and look at its low_pc.
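As a rough sketch of the symbol-table route, the following parses text in the default three-column layout printed by nm (or llvm-nm). The sample symbols and addresses are invented, and the sketch simply treats every T/t symbol (text section) as a function, which is an assumption; weak functions (W) are deliberately left out.

```python
# Illustrative text in the default `nm` layout: address, type letter, name.
# Symbols and addresses are invented for this sketch.
SAMPLE_NM = """\
0000000000401110 T foo
0000000000401120 t bar_static
0000000000404028 D some_global
                 U printf
"""


def function_symbols(nm_text):
    """Return {name: start address} for symbols defined in the text section.

    Assumption: T (global) and t (local) text symbols are functions.
    Undefined symbols have no address column and are skipped.
    """
    funcs = {}
    for raw in nm_text.splitlines():
        parts = raw.split()
        if len(parts) != 3:
            continue  # undefined symbol or unrelated line
        addr, kind, name = parts
        if kind in ("T", "t"):
            funcs[name] = int(addr, 16)
    return funcs


for name, addr in sorted(function_symbols(SAMPLE_NM).items()):
    print(f"{name}: {addr:#x}")
```

This avoids debug info entirely, so it also works for binaries built without -g.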

[Re-sending with llvm-dev included this time]

Hi Muhui,

Are the functions emitted to the final binary? If a function is not used, there might not be any object code for it in the final binary. Naturally there would be no entry in the line table in this case.

If the function does exist in the binary, it is entirely possible (I think) to have no instruction specifically associated with the function definition’s source line, even though other instructions are associated with other lines in the function. I (or someone) would need to look at a specific example before being able to say one way or the other if that is what you are running into.

Have you considered building a static array of function addresses? If you used weak references it would not interfere with optimizing away entire functions, which I mentioned above. Or would that be too intrusive into your use case? Apologies if this suggestion has come up before.

–paulr

Hi Paulr

Thanks for your very useful and quick reply. Below is my response.

> Are the functions emitted to the final binary? If a function is not used, there might not be any object code for it in the final binary. Naturally there would be no entry in the line table in this case.

Hi

Actually, there is no particular reason. I just thought this might be a solution, so I used this kind of method. Querying the symbol table would be a good choice, but I prefer to use LLVM and DWARF information. I am sorry that I am not familiar with debug_info, but thanks for your suggestion. I would like to try to solve it with debug_info; according to your comments, it seems it should work.

By the way, I am still curious why the DWARF line table would lose the start address information for so many functions. It would be great if you had any comments on this problem. Many thanks.

Regards
Muhui

Hi

I grepped for “DW_TAG_subprogram” in the debug_info output. However, the number of matches is still lower than the number of functions I found in the LLVM IR. Do you have any idea why? Many thanks.
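A plain grep counts every DW_TAG_subprogram entry, including entries without a DW_AT_low_pc such as the abstract entry for an inlined function. A minimal sketch of separating the two cases is below; the sample text only imitates the layout printed by llvm-dwarfdump (which differs between versions), all offsets, names and addresses are made up, and entries that carry their name only through DW_AT_abstract_origin are not handled.

```python
import re

# Illustrative text in the style of llvm-dwarfdump's debug_info dump.
# Offsets, names and addresses are invented for this sketch.
SAMPLE_DEBUG_INFO = """\
0x0000002a:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000401110)
                DW_AT_high_pc   (0x0000000000401120)
                DW_AT_name      ("foo")

0x00000043:   DW_TAG_subprogram
                DW_AT_name      ("inlined_helper")
                DW_AT_inline    (DW_INL_inlined)

0x00000050:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000401120)
                DW_AT_high_pc   (0x000000000040112d)
                DW_AT_name      ("bar")

0x00000069:     DW_TAG_inlined_subroutine
                  DW_AT_low_pc  (0x0000000000401124)

0x00000075:     NULL
"""

DIE_START = re.compile(r"^0x[0-9a-fA-F]+:\s+(DW_TAG_\w+|NULL)")
LOW_PC = re.compile(r"DW_AT_low_pc\b.*\((0x[0-9a-fA-F]+)\)")
NAME = re.compile(r'DW_AT_name\b.*\("([^"]*)"\)')


def subprograms(dump_text):
    """Return [(name, low_pc or None)] for every DW_TAG_subprogram entry."""
    dies = []
    current = None  # the subprogram entry being filled in, if any
    for raw in dump_text.splitlines():
        line = raw.strip()
        start = DIE_START.match(line)
        if start:
            current = None
            if start.group(1) == "DW_TAG_subprogram":
                current = {"name": None, "low_pc": None}
                dies.append(current)
            continue
        if current is None:
            continue  # attribute of some other kind of entry
        m = LOW_PC.search(line)
        if m:
            current["low_pc"] = int(m.group(1), 16)
        m = NAME.search(line)
        if m:
            current["name"] = m.group(1)
    return [(d["name"], d["low_pc"]) for d in dies]


entries = subprograms(SAMPLE_DEBUG_INFO)
with_code = [(name, pc) for name, pc in entries if pc is not None]
print(len(entries), "subprogram entries,", len(with_code), "with a low_pc")
```

Only the entries that have a low_pc give a usable start address, so that count, rather than the raw grep count, is the one to compare against the symbol table.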

Regards
Muhui

Hi Muhui,

> If the function does exist in the binary, it is entirely possible (I think) to have no instruction specifically associated with the function definition’s source line, even though other instructions are associated with other lines in the function. I (or someone) would need to look at a specific example before being able to say one way or the other if that is what you are running into.

Hi Muhui,

> I tried to grep the “DW_TAG_subprogram” from the debug_info . However, I noticed that the number I found is still less than the whole functions I found with LLVM IR. Do you have any experiences? Many Thanks

The only explanation that comes to mind is that the functions are not in the final binary object file. However, you said previously that you believed they were present. If that is the case, please provide us with an example source file and compiler command line to help diagnose the behavior.

Thanks,

–paulr

Hi Paulr

I think I now know the reason. I use -save-temps to save the LLVM IR at compile time.

There are four stages, and each stage produces one file per binary:

*.preopt.bc
*.internalize.bc
*.opt.bc
*.precodegen.bc

My LLVM pass was running on *.preopt.bc, where I get 376 functions. However, when I run the same pass on *.precodegen.bc, I get 266 functions, which matches the number in the symbol table. My mistake was not considering which bitcode file I should run the pass on. Thanks for your suggestions.
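To see which functions disappear between two stages, the function definitions in each file can be compared. The sketch below assumes the bitcode files were first converted to textual IR (for example with llvm-dis) and uses two tiny made-up IR snippets in place of the real files; the regular expression is a simplification that only looks at lines beginning with "define".

```python
import re

# Tiny made-up stand-ins for the textual IR of two stages.
PREOPT_IR = """\
define dso_local i32 @main() {
entry:
  %r = call i32 @helper(i32 1)
  ret i32 %r
}

define internal i32 @helper(i32 %x) {
entry:
  ret i32 %x
}

define internal void @unused() {
entry:
  ret void
}

declare i32 @puts(i8*)
"""

PRECODEGEN_IR = """\
define dso_local i32 @main() {
entry:
  ret i32 1
}
"""

# Matches the function name on a `define` line; `declare` lines are ignored
# because declarations have no body and produce no code.
DEFINE = re.compile(r'^define\b[^@\n]*@("[^"]*"|[-\w$.]+)', re.MULTILINE)


def defined_functions(ir_text):
    """Return the set of function names that have a definition in the IR text."""
    return {name.strip('"') for name in DEFINE.findall(ir_text)}


before = defined_functions(PREOPT_IR)
after = defined_functions(PRECODEGEN_IR)
removed = sorted(before - after)
print(len(before), "defined before,", len(after), "after; removed:", removed)
```

Functions in the removed set were inlined or dropped by the optimizer, so they have no start address in the final binary.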

Regards
Muhui

Hi

I would like to ask something more. From this experience, I think I don’t understand the generation of LLVM IR very well.

I am using autotools (configure, make) to compile the binaries, with LLVMgold.so and the -save-temps option to save the LLVM IR. If you have any other suggestions for keeping the LLVM IR, especially when compiling with autotools, please tell me. Many thanks.

Regards
Muhui

Hi

One more thing I would like to confirm. It seems that the DWARF info only contains the functions inside the .text section, not those in the .plt section. Am I right? You can check the attached file.

Regards
Muhui

new_cp (336 KB)