[MCJIT] messy call stack debug on x64 code in VisualStudio

Vivien_Millet · February 29, 2020, 4:14pm

Hi,

I’m using IR and MCJIT to compile a script language, and I debug it with PDB files generated on the fly. During debugging, almost every time I step into a function, I lose information about the calling function in the Visual Studio call stack view, or I get a bunch of raw addresses in the call stack between the current function and the calling function, for example:

MyJit.dll!MyCurrentFunction()
[0x1234567887654321]
[0x8765432112345678]
MyJit.dll!MyCallingFunction()
...

It looks like Visual Studio gets lost while walking up the stack.
Does anyone know where this could come from?

I have disabled all optimisations (frame pointer omission among them).

I have seen this bug here: https://bugs.llvm.org/show_bug.cgi?id=24233 which is quite similar, but it is quite old now. Since the proposed patch was posted, the code in RuntimeDyldCOFFX86_64.h has changed, and it is difficult for me to know whether it has actually been fixed since.

Could it be related to the way CreateAlloca calls are used in the IR to build local variables? Could it be related to missing information inside the PDB? (I don’t know whether PDB files contain stack-related information to ensure correct stack walking.)

Thanks.

Vivien

dblaikie · February 29, 2020, 10:28pm

+Lang Hames for JIT things
+Reid Kleckner for Windows things

(in case they’ve got any thoughts on this issue; no guarantees though. I didn’t even know there was any option to use PDBs through MCJIT, and I know MCJIT is mostly unmaintained in favor of ORC JIT)

Vivien_Millet · February 29, 2020, 10:56pm

Thank you David for forwarding my question.
There is no official support for PDB with MCJIT. I developed one myself as part of a project and, with the help of Zachary Turner, I will soon make it public on my GitHub.
(If the community is interested in official support for a JIT-PDB feature, I can discuss with the people in charge to help with its development.)

Anyway, to give a little more information on my issue :

Adding the “nounwind” attribute to jitted functions solved the problem for some functions, but I still have issues with others (I still have to find out why… and this is where I need help).
It seems that the register pushes in the prolog “mess up” the stack walk, because when breaking right at the beginning of the prolog or after the epilog, I can see the calling function correctly under the current one (RBP problem?).
When stepping inside a function, the mess in the call stack display evolves, as if Visual Studio were trying to walk the stack based on something that is not static (RSP?).
I don’t understand how the walk could be complicated, because frame pointers are preserved in the assembly (RBP is pushed)… unless RBP is used for something else, but that would be weird.

Regards,
Vivien

rnk · March 1, 2020, 4:06am

Yes, I think https://bugs.llvm.org/show_bug.cgi?id=24233 needs to be implemented to fix this.

The Windows x64 unwinder doesn’t generally look at frame pointers. We would need to register unwind info to make this work. What you see is fairly typical of attempting to unwind the stack when unwind info is missing.

PDBs shouldn’t generally enter into the picture.
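To illustrate why missing unwind info matters, here is a simplified, portable model of the lookup the x64 unwinder performs through a function table (conceptually what RtlLookupFunctionEntry does). It is a sketch, not the Windows implementation: RuntimeFunction only mirrors the documented three-field layout of RUNTIME_FUNCTION, lookupFunctionEntry is a hypothetical helper, and the table is assumed to be sorted by BeginAddress. According to Microsoft's x64 unwind documentation, when no entry covers an address the frame is treated as a leaf function whose return address sits at [RSP]; for a non-leaf JIT function that produces bogus frames, which would also explain why the call stack looked right when breaking at the very start of the prolog.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Mirrors the x64 RUNTIME_FUNCTION record stored in .pdata:
// three 32-bit RVAs relative to the image (or registered) base address.
struct RuntimeFunction {
    uint32_t BeginAddress;
    uint32_t EndAddress;  // exclusive
    uint32_t UnwindData;  // RVA of the UNWIND_INFO in .xdata
};

// Simplified model of a function-table lookup: binary search over a
// table sorted by BeginAddress. Returns the entry covering pc, or
// nothing if pc is not covered (the "missing unwind info" case).
std::optional<RuntimeFunction> lookupFunctionEntry(
        const std::vector<RuntimeFunction>& table,
        uint64_t base, uint64_t pc) {
    if (pc < base) return std::nullopt;
    const uint64_t rva = pc - base;
    // First entry whose BeginAddress is greater than rva.
    auto it = std::upper_bound(
        table.begin(), table.end(), rva,
        [](uint64_t value, const RuntimeFunction& f) {
            return value < f.BeginAddress;
        });
    if (it == table.begin()) return std::nullopt;
    --it;  // last entry with BeginAddress <= rva
    if (rva < it->EndAddress) return *it;
    return std::nullopt;
}
```

For example, with entries covering RVAs 0x1000-0x1040 and 0x1040-0x1100 registered at base 0x10000, a PC of 0x11020 resolves to the first entry, while 0x11100 resolves to nothing.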

vtjnash · March 1, 2020, 4:54am

I’ve always just hacked support for this into the various JITs (for JuliaLang, in our debuginfo.cpp file) by setting the no-frame-pointer-elim flag in the IR, then creating and populating a dummy unwind description object in the .text section, and registering that dynamically. Some day I hope to just register the .pdata/.xdata sections with the unwinder.
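As an illustration of what such an unwind description contains on x64 (this is not Julia's actual code), the sketch below hand-encodes the UNWIND_INFO record for the classic `push rbp` / `mov rbp, rsp` prolog, following Microsoft's documented x64 unwind data layout. The function name and the fixed prolog are assumptions made for the example; LLVM can emit such records into .xdata itself when unwind tables are enabled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encodes a minimal x64 UNWIND_INFO (as stored in .xdata) for the prolog:
//   offset 0: push rbp        (1 byte)
//   offset 1: mov  rbp, rsp   (3 bytes)
// Hypothetical helper; the prolog is hard-coded for illustration.
std::array<uint8_t, 8> encodeFramePointerUnwindInfo() {
    constexpr uint8_t kVersion = 1;
    constexpr uint8_t kFlags = 0;          // no handler, no chained info
    constexpr uint8_t kSizeOfProlog = 4;   // total bytes of the prolog above
    constexpr uint8_t kCountOfCodes = 2;   // number of UNWIND_CODE slots
    constexpr uint8_t kRegRbp = 5;         // x64 register number of RBP
    constexpr uint8_t kFrameOffset = 0;    // RBP == RSP right after the mov
    constexpr uint8_t kUwopPushNonvol = 0;
    constexpr uint8_t kUwopSetFpreg = 3;

    return {
        // Header: Version:3 | Flags:5, SizeOfProlog, CountOfCodes,
        //         FrameRegister:4 | FrameOffset:4
        static_cast<uint8_t>(kVersion | (kFlags << 3)),
        kSizeOfProlog,
        kCountOfCodes,
        static_cast<uint8_t>(kRegRbp | (kFrameOffset << 4)),
        // Unwind codes in descending order of prolog offset. Each code is
        // CodeOffset (offset just past the instruction), UnwindOp:4 | OpInfo:4.
        4, static_cast<uint8_t>(kUwopSetFpreg | (0 << 4)),          // mov rbp, rsp
        1, static_cast<uint8_t>(kUwopPushNonvol | (kRegRbp << 4)),  // push rbp
    };
}
```

The resulting bytes are `01 04 02 05 04 03 01 50`: the header declares RBP as the frame register, so an unwinder can recover the caller from the frame pointer once it finds this record.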

PDBs are a bit different, though: the above steps work well for gdb, but generally I find that WinDbg is less willing or able to be given JIT-frame information from LLVM. (I assume it can be done somehow, for .NET; I just don’t know how.)

Vivien_Millet · March 3, 2020, 11:06pm

Sorry for the late reply, I was investigating what you advised. With all your information and some (really) hard time, I succeeded in registering the unwinding information (not the way you think) and now have a clear call stack! It feels good!

@Reid Kleckner
After looking deeper, I don’t understand why this bug has been open for so long, since everything needed to fix it is already there: the unwinding process is completely implemented in LLVM and is enabled by adding llvm::Attribute::UWTable to llvm::Function instances.
The only thing missing is the calls to RtlAddFunctionTable. They should be added to the default MCJIT or OrcJIT memory manager (maybe with an option to enable/disable them), simply by tracking the memory allocated for the “.pdata” and “.xdata” sections and calling RtlAddFunctionTable in notifyObjectLoaded. I don’t know who is responsible for this, but it might be an easy win for a great feature completion (it would spare users from having to understand all this machinery by themselves…)
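A minimal sketch of the bookkeeping described above, written as a plain C++ class so it stands alone. UnwindSectionTracker, onDataSection and registration are hypothetical names; in a real memory manager, onDataSection would be called from the allocateDataSection override, and the values from registration would be passed to RtlAddFunctionTable(FunctionTable, EntryCount, BaseAddress) once the object is loaded and relocated. Choosing BaseAddress is left to the caller, since every RVA in .pdata must be relative to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper a custom JIT memory manager could own: it records
// where the ".pdata" and ".xdata" sections were placed.
class UnwindSectionTracker {
public:
    struct Registration {
        uint8_t* functionTable;  // start of .pdata
        uint32_t entryCount;     // number of RUNTIME_FUNCTION records
        uint64_t baseAddress;    // base the RVAs in .pdata are relative to
    };

    // An x64 RUNTIME_FUNCTION is three 32-bit fields: 12 bytes.
    static constexpr size_t kRuntimeFunctionSize = 12;

    // Intended to be called for every data section the JIT allocates.
    void onDataSection(const std::string& name, uint8_t* addr, size_t size) {
        if (name == ".pdata") {
            pdata_ = addr;
            pdataSize_ = size;
        } else if (name == ".xdata") {
            xdata_ = addr;
            xdataSize_ = size;
        }
    }

    // Fills the arguments for RtlAddFunctionTable.
    // Returns false if no usable .pdata section was seen.
    bool registration(uint64_t baseAddress, Registration& out) const {
        if (pdata_ == nullptr || pdataSize_ < kRuntimeFunctionSize) return false;
        out.functionTable = pdata_;
        out.entryCount = static_cast<uint32_t>(pdataSize_ / kRuntimeFunctionSize);
        out.baseAddress = baseAddress;
        return true;
    }

    uint8_t* xdataStart() const { return xdata_; }
    size_t xdataSize() const { return xdataSize_; }

private:
    uint8_t* pdata_ = nullptr;
    size_t pdataSize_ = 0;
    uint8_t* xdata_ = nullptr;
    size_t xdataSize_ = 0;
};
```

On Windows, the final step would look roughly like `RtlAddFunctionTable(reinterpret_cast<PRUNTIME_FUNCTION>(r.functionTable), r.entryCount, r.baseAddress)`, with a matching RtlDeleteFunctionTable call when the JIT-ed code is freed.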

In my case, I can’t call RtlAddFunctionTable because I inject my code into a fake .DLL (for PDB hot-reload purposes), which already has its static function table.
To explain my process (for other devs willing to suffer like I did):
If debugging is required by the user, I switch from MCJITMemoryManager to a home-made DllMemoryManager which :

Loads a dummy .DLL consisting of empty .text, .rdata (including .xdata) and .pdata sections, without relocations.
Allocates memory inside the loaded .DLL address range (the DLL has been hacked for WRITE access).
Unloads the .DLL.
Generates a PDB.
INTERESTING PART: rewrites the .pdata and .xdata sections with the ones emitted by LLVM (fixing the virtual addresses inside the image).
Writes the .DLL back to disk.
Makes the PDB file match the .DLL file (GUID).
Reloads the .DLL (it reloads at the same position 100% of the time in my case; I might be lucky, but I’m OK with it) so that Visual Studio detects it and loads the matching PDB.
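The “fixing virtual addresses” part of the rewrite step boils down to expressing every .pdata field as an RVA relative to the DLL's image base. The sketch below shows that conversion in isolation; AbsoluteUnwindEntry and toImagePdata are hypothetical names, and it assumes the JIT-ed code and its unwind info were placed inside the loaded DLL's address range, as in the allocation step. It also keeps the table sorted by BeginAddress so that a binary-search lookup over it works.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// x64 .pdata record: RVAs relative to the image base.
struct RuntimeFunction {
    uint32_t BeginAddress;
    uint32_t EndAddress;
    uint32_t UnwindData;
};

// Hypothetical input: one function as absolute process addresses.
struct AbsoluteUnwindEntry {
    uint64_t begin;
    uint64_t end;         // exclusive
    uint64_t unwindInfo;  // address of its UNWIND_INFO copy in .xdata
};

// Converts absolute addresses into image-relative RVAs. Returns nothing
// if any address lies outside the image, since this image's function
// table could not describe it.
std::optional<std::vector<RuntimeFunction>> toImagePdata(
        const std::vector<AbsoluteUnwindEntry>& entries,
        uint64_t imageBase, uint32_t imageSize) {
    // RVA of addr; an exclusive end address may equal imageBase + imageSize.
    auto toRva = [&](uint64_t addr, bool isEnd) -> std::optional<uint32_t> {
        if (addr < imageBase) return std::nullopt;
        const uint64_t rva = addr - imageBase;
        if (rva > imageSize || (!isEnd && rva == imageSize)) return std::nullopt;
        return static_cast<uint32_t>(rva);
    };

    std::vector<RuntimeFunction> table;
    for (const auto& e : entries) {
        auto b = toRva(e.begin, false);
        auto en = toRva(e.end, true);
        auto u = toRva(e.unwindInfo, false);
        if (!b || !en || !u || *en <= *b) return std::nullopt;
        table.push_back({*b, *en, *u});
    }
    // Sorted by BeginAddress so lookups can binary-search the table.
    std::sort(table.begin(), table.end(),
              [](const RuntimeFunction& x, const RuntimeFunction& y) {
                  return x.BeginAddress < y.BeginAddress;
              });
    return table;
}
```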

The whole process was painful, but it works: I can build and rebuild my language on the fly while debugging it alongside native code inside Visual Studio. You might wonder why I don’t generate a real .dll and reload it. Because I can’t predict where the functions will be reloaded in memory, and I need to keep “reflection” of the JIT symbols (plus it’s slower and requires a lot of work on the mangling/linking/PDB dependency side).

@Jameson: Thanks for your feedback, it helped me identify the .xdata and .pdata sections for the unwind stuff!
Are you sure that the unwinding info is not enough for you to make it debuggable? I personally removed “no-frame-pointer-elim” and it keeps working well; I still see my full call stack (maybe it is only useful on x86?). Apparently Win64 does not use RBP to walk the stack at all; everything is done with unwind info.
PDBs are not involved in any of this; I thought they might be, but no…

vtjnash · March 4, 2020, 3:45pm

I think it’s not enough, for reasons related to the pain you went through. Normally, the JIT doesn’t have a backing DLL, so it doesn’t support the relocation type required by the .xdata and .pdata sections. As you say, it generally doesn’t work very well in WinDBG anyway, since they removed the use of RBP to walk the stack. At the time I wrote my hack, LLVM didn’t even have emission code for those unwind sections, and it hasn’t been worth the effort for me to change it. Is your code public somewhere? It seems like it could be useful for all JIT users to have a drop-in option to enable debugging.

Vivien_Millet · March 5, 2020, 10:35pm

I would still try, because whether or not there is a backing DLL doesn’t matter here; the only thing that matters is whether Visual Studio finds a function table (RtlAddFunctionTable / RtlLookupFunctionEntry API). This post explains it: https://stackoverflow.com/a/58227575/809199.
Having a backing DLL is harder because it requires the .pdata and .xdata sections to be large enough to receive the jitted unwind info. That is easy with dynamically allocated memory, but not when you are a newbie in the DLL format like me!

Here it is, I just dropped it!
https://github.com/vlmillet/llvmjitpdb
A review from @Zachary Turner and one of you would be welcome !
I’ve reworked it so that it depends only on LLVM libraries and put it inside the llvm:: namespace.
I don’t know if it is the right way to distribute an extension to LLVM…
It would be easier for users if it could be integrated directly into the LLVM solution, but I don’t know what the process is for validating such an integration.

Just in case someone wants to implement something similar without a hacked DLL (which means no PDB debugging, or at least I don’t know how to do it), here are the steps I would follow:

Use VirtualAlloc to allocate memory outside of any module range (using a custom MemoryManager).
Keep track of the “.xdata” and “.pdata” section allocations in MemoryManager::allocateDataSection.
Fix (if not already correct) each RUNTIME_FUNCTION::UnwindData in .pdata so that it points inside the allocated .xdata memory range (newCurrDataPos = (oldCurrDataPos - oldFirstDataPos) + VirtualAllocStart).
Call RtlAddFunctionTable with BaseAddress = VirtualAllocStart, FunctionTable = the .pdata address, and EntryCount = the number of RUNTIME_FUNCTION entries in .pdata.
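The UnwindData fix-up in the third step can be sketched as follows. This is one reading of the formula above, not the author's code: the position inside .xdata is preserved, and because RUNTIME_FUNCTION::UnwindData holds an RVA, the sketch stores newCurrDataPos minus the BaseAddress later given to RtlAddFunctionTable. fixUnwindData and its parameters are hypothetical; oldXdataRva plays the role of oldFirstDataPos.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

struct RuntimeFunction {  // x64 .pdata record (RVAs from BaseAddress)
    uint32_t BeginAddress;
    uint32_t EndAddress;
    uint32_t UnwindData;
};

// Rewrites each UnwindData so it points into the relocated .xdata copy:
//   new position = (old - oldXdataRva) + newXdataStart
//   stored RVA   = new position - base
// All-or-nothing: pdata is only modified if every entry is valid.
void fixUnwindData(std::vector<RuntimeFunction>& pdata,
                   uint32_t oldXdataRva, uint32_t xdataSize,
                   uint64_t newXdataStart, uint64_t base) {
    if (newXdataStart < base)
        throw std::invalid_argument(".xdata copy must not start below base");
    const uint64_t newXdataRva = newXdataStart - base;

    std::vector<RuntimeFunction> fixed = pdata;
    for (auto& f : fixed) {
        if (f.UnwindData < oldXdataRva || f.UnwindData - oldXdataRva >= xdataSize)
            throw std::out_of_range("UnwindData does not point into .xdata");
        const uint64_t rva = newXdataRva + (f.UnwindData - oldXdataRva);
        if (rva > std::numeric_limits<uint32_t>::max())
            throw std::out_of_range("unwind info is too far from base");
        f.UnwindData = static_cast<uint32_t>(rva);
    }
    pdata.swap(fixed);
}
```

When the .xdata copy starts exactly at VirtualAllocStart and that address is also the BaseAddress, the stored RVA reduces to oldCurrDataPos - oldFirstDataPos.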

Kind regards,
Vivien
