[llvm-pdbutil] : merge not working properly

(BTW, I’m adding llvm-dev back to the list, I didn’t notice it got taken off. In general I try to keep the list on all emails, even if it’s extremely technical and specific, because someday someone else will try to do this, and it’ll be nice if they can read the whole thread).

I can think of a couple of things that might be wrong:

  1. If the string table is in a different order, then anything that refers to the string table needs to be changed to refer to the new offset. If the string “foo” is at offset 12 in the old PDB, but offset 15 in the new PDB, then somewhere there is a record which is going to look at offset 12 and expect to find something, and that will mess up. The main place this matters is the File Checksums table: each entry says which file it is a checksum for, and it does so by referring to the string table. However, it’s possible for certain symbol records to refer to the string table too. See lld/COFF/PDB.cpp and Ctrl+F for “PDBStrTab” and you will find some information about this.

  2. When you run llvm-pdbutil dump -streams on the copied PDB, do all of them show a reasonable description? Are there any streams that say (???)? If so, that’s a problem.
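
To make point 1 concrete, here is a minimal sketch of remapping string-table offsets when strings end up in a different order. It uses a toy model (NUL-terminated strings laid out back to back); a real PDB string table carries extra header and hash data that this ignores, and the function names are hypothetical, not part of LLVM.

```python
def build_string_table(strings):
    """Toy string table: NUL-terminated strings laid out back to back."""
    blob = bytearray(b"\0")      # offset 0 holds the empty string
    offsets = {"": 0}
    for s in strings:
        if s not in offsets:
            offsets[s] = len(blob)
            blob += s.encode("utf-8") + b"\0"
    return bytes(blob), offsets


def read_string(blob, offset):
    """Read the NUL-terminated string starting at the given offset."""
    end = blob.index(b"\0", offset)
    return blob[offset:end].decode("utf-8")


def remap_offsets(old_blob, new_offsets, old_refs):
    """Translate offsets into the old table to offsets into the new one."""
    return [new_offsets[read_string(old_blob, off)] for off in old_refs]


# Same strings, different order (plus one new string) in the new table.
old_blob, old_offsets = build_string_table(["a.cpp", "foo", "b.cpp"])
new_blob, new_offsets = build_string_table(["b.cpp", "jit.src", "foo", "a.cpp"])

# Records that pointed at "foo" and "a.cpp" in the old table.
old_refs = [old_offsets["foo"], old_offsets["a.cpp"]]
new_refs = remap_offsets(old_blob, new_offsets, old_refs)
print(old_refs, "->", new_refs)
```

Any record that is copied without this kind of translation keeps pointing at the old offset, which is the failure mode described in point 1.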

Will Visual Studio consider a symbol file broken if an address goes beyond the official module address range (the compiled one)? I ask because my JIT code is allocated after the end of the module with VirtualAlloc.
That is a good question, and part of why my job is so difficult, because I can’t look at their code. But I think the answer is “probably”. The debugger has to have some way to convert an address in your running process into a symbol and offset, because that’s how all debug info is represented in the PDB. So if there is no module, then there is no RVA (because the R in RVA means relative, and what would it be relative to?).
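
As a rough illustration of why an address outside every module is a problem, the sketch below converts an address to a (module, RVA) pair. The module table and addresses are made up for the example, and this is only a guess at the kind of lookup a debugger needs, not a description of what Visual Studio actually does.

```python
def locate(modules, address):
    """Return (module_name, rva) for the module containing address, or None."""
    for name, base, size in modules:
        if base <= address < base + size:
            return name, address - base
    return None


# Hypothetical layout: one loaded image, and a VirtualAlloc buffer placed after it.
modules = [("app.exe", 0x10000000, 0x00200000)]
inside = 0x10001234       # falls inside app.exe
jit_buffer = 0x10400000   # falls outside every module

print(locate(modules, inside))
print(locate(modules, jit_buffer))
```

The second lookup has nothing to be relative to, so no RVA can be computed for it.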

One idea to test this would be to create a DLL called jitted_code.dll, give it a huuuuuge .text section (probably just a .asm file that uses some assembly directives to allocate a very large series of null bytes), and then write your jit code into that area. This way you would not need to modify the existing PDB; you would only need to make a new PDB called jitted_code.pdb with 1 module, and those symbols could have meaningful RVAs. And you might not even need to detach the debugger if you do things this way, because you could just right click the jitted_code.dll module in the modules window and choose Load Symbols.

(Yes you are right this is my fault)

Regarding the string table, it only seems to contain file-related information in every PDB I am using, and it looks correct, but I will check it.
I looked at the PDB.cpp code about checksums and tables. I copied some of it and got things wrong according to cvdump; then I simplified the process of copying the table and it worked (in cvdump it finds the file matching each line, etc.), so I suspect this is also correct.

All the streams look good, but I will check deeper !

What you say about RVAs and modules seems right. This is what I’m afraid of: doing all of this for nothing, or almost…

Your idea of a .text section in a separate DLL looks good, but will it be executable memory? .text is where static strings go, right? When you say putting my jit in there, do you mean writing it once jitted_code.dll is loaded in memory, or writing it into the .dll file directly before loading it? In the first scenario I wonder if the section will be executable; in the second scenario I can’t do it, because it would require perfect linking with the other code my jit points to…

.text is where code goes, I don’t know why it’s called .text, it’s just been that way for many decades and the name stuck around. But actually you can call the section whatever you want. Maybe it’s even better to call it something other than .text, because .text is where your DllMain and other stuff will be. You could call it .jit if you wanted to. You should be able to create the section with whatever flags you want to. You’ll need to produce a jit_code.obj probably compiled from assembly that makes a section named .jit and sets the flags to be executable (you can just copy the flags from a normal .text section of some other program). Then link this file together along with a jitted_code_main.obj which you compiled from a simple source file with a DllMain function that does nothing. This would make jitted_code.dll, then have your program link against jitted_code.lib.

Right now you jit the code into some buffer that you created with VirtualAlloc. If you do the above, it will load jitted_code.dll into memory and the OS loader will allocate some memory for each section. So this would be like your VirtualAlloc, you can just find the address of the .jit section and use that buffer instead of the VirtualAlloc buffer as the target address of your jit operations.

Again, this is just an idea, no promises it will work, but unfortunately that’s kind of the best you can do when dealing with closed source things, just make guesses and hope for the best.

Hello Zachary,
Sorry for replying so late, but I’ve spent the past week thinking and working hard on your “dll memory buffer” idea to see if it works and give you feedback!
And it works pretty well so far.
Here is what I did:

  • Create a .ASM file full of “int 3” instructions (to ensure that if we execute past the boundaries we instantly break).
  • Compile this to a .DLL.
  • Use a hex editor to change the “.text” section Characteristics from Read/Execute to Read/Write/Execute.
  • Run my program, which does JIT compilation.
  • Get the start RVA of the .text section (which is always 0x1000 in my case).
  • Load the .DLL and use ModuleAddress+RVA as a memory buffer in a custom DllMemMgr I give to MCJIT.
  • On NotifyObjectEmitted, replace the DLL’s PDB with a custom one I build myself with your PDBFileBuilder.
  • On finalizing memory, first reload the DLL to trigger Visual Studio’s PDB reloading (not working, I don’t know why yet), ensure it goes into the same virtual address range, and protect the memory using VirtualProtect.
  • Place a breakpoint in my JIT file; it displays “loaded”. Execute the JIT code, and it breaks.
    …and… * drums *
    Visual Studio CRASHES when I open the Watch window or Locals/Auto/etc., every time, and I don’t know why…
    I noticed, when compiling the C++ equivalent of my JIT program, that a simple “int param” is written with size=20 in the C++ PDB and size=16 in my JIT PDB. Do you know what this “size” attribute represents in the S_LOCAL symbol section? I suspect the symbol section is the cause of the Watch issue, but I am not sure. If you have an idea…
    I also had an “illegal instruction” exception when stepping with F10 after a break, but when I’m not breaking, the code runs well…
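
The hex-editor step in the list above (making .text Read/Write/Execute) amounts to setting one bit in the section header’s Characteristics field. Here is a minimal sketch that works on a 40-byte PE section header held in memory; locating that header inside a real .dll file is left out, and the helper names are hypothetical.

```python
import struct

# Section flag values from the PE/COFF format.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics  (40 bytes, little-endian).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def make_section_header(name, virtual_size, rva, characteristics):
    """Build a section header for the example; unused fields are zero."""
    return SECTION_HEADER.pack(name.encode("ascii"), virtual_size, rva,
                               0, 0, 0, 0, 0, 0, characteristics)


def add_write_flag(header):
    """Return a copy of the header with IMAGE_SCN_MEM_WRITE set."""
    fields = list(SECTION_HEADER.unpack(header))
    fields[-1] |= IMAGE_SCN_MEM_WRITE
    return SECTION_HEADER.pack(*fields)


def describe(header):
    """Return (name, rva, permissions) for a section header."""
    fields = SECTION_HEADER.unpack(header)
    name = fields[0].rstrip(b"\0").decode("ascii")
    flags = fields[-1]
    perms = "".join(letter for letter, bit in (("R", IMAGE_SCN_MEM_READ),
                                               ("W", IMAGE_SCN_MEM_WRITE),
                                               ("X", IMAGE_SCN_MEM_EXECUTE))
                    if flags & bit)
    return name, fields[2], perms


text = make_section_header(".text", 0x100000, 0x1000,
                           IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ)
patched = add_write_flag(text)
print(describe(text), describe(patched))
```

The JIT buffer address then follows from the same header: module load address plus the section’s RVA (0x1000 in the list above).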

A lot of mysteries there again…

Visual Studio displays the disassembly correctly, with the debug lines in the right place, etc., so I don’t get why Visual Studio crashes…
Another issue I have is that I always have to remove and re-add my breakpoint before Visual Studio really breaks, even if it says “I’m a good breakpoint at that good address”. Is this related to file checksums? It seems mine has a “none” checksum, so I suspect this is the problem, but I don’t know how to fix it: I added the checksum with addChecksum using the right file name and I still get “none” in the dump…
So right now I’m quite hopeful, because I get something reacting in Visual Studio, but I have no idea why it crashes…

Have you already encountered this issue when testing your generated PDBs?
Do you know the role of Section Contributions in the PDB/debugging session?
Any tips for checking symbol record validity in the dump? It looks good to me: no ??? or Error anywhere…

Thank you !

To be more precise about the crash context: it only happens if I type into the “Watch” window a variable name that is unknown in the local context. For example, I type “foobar”, which does not exist in my program, and then Visual Studio freezes (the cursor shows busy), crashes/closes, and relaunches as usual. I suspect the Global/Public stream stuff is wrong in this case, or at least that there is a problem in my symbol records, but my method parameter displays fine in the “Watch” window… If you have an idea…
Could it be a mangling problem in my symbol records? I don’t use C++ mangling, so maybe parsing my symbols triggers bugs…
Is there a C++ mangler in LLVM I can use to produce correct names?

Two ideas for checking the validity of the records:

  1. Use llvm-pdbutil with the “pretty” subcommand. This will use DIA SDK, which as far as I know is the same as what VS Debugger uses. It won’t really tell you specifically what is wrong, but it can give you some hints because if it crashes somewhere, then at least you know where it’s crashing. For example, if it’s crashing on trying to get an address of a symbol, then you might check the symbol record’s section/offset, look at the section map and section contributions, compare those to your executable, and make sure everything matches up.

  2. Use cvdump. This can also be a hint about which specific records it doesn’t like.

If you’re not using C++ mangling, then that could definitely be a problem. We have MicrosoftMangle.cpp which is in clang, so it might take some work to hook it up in your case, as it’s meant to be used from the C++ compiler frontend, and not from the JIT. But if you can get that to work, that would be a good thing to try.

When it comes to things like this, my strategy is always that if I have to ask myself “I’m not doing X and I know normally the compiler does X, I wonder if I should do that?” then the answer is always “yes”. It’s only after I have no other ideas and everything looks identical that I really get stuck.

To answer your question about section contributions, it’s necessary because the debugger needs to be able to determine which “module” (i.e. source / object file) a symbol comes from. Imagine, for example, that the debugger is trying to find what symbol is at a particular address, say 0x12345678. Then it will subtract the load address of the module (let’s say 0x10000000) and get the address 0x02345678. Then in order to find the symbol for this, it has to first find the module. So it looks in the section headers (llvm-pdbutil dump -section-headers), and finds which section would contain the address 0x02345678. Pretend that it finds that it is section #1, the .text section, and the section header says the base address is 0x02300000. So now it knows that it is offset 0x45678 from section #1.

It can then binary search the section contributions, since they are sorted by module and offset, to find an entry that contains its size and section properties.

In the reverse direction, many symbol records contain a section/offset. So in this case, if it has one of these records, it can binary search in the section contribs to find the properties.
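
The lookup described above can be sketched as follows, using the same numbers as the example (address 0x12345678, load address 0x10000000, .text at 0x02300000). The record types and the object file names are invented for illustration, and the sketch assumes the contribution list is ordered so that a binary search on (section, offset) works; it is not how any particular debugger is implemented.

```python
from bisect import bisect_right
from collections import namedtuple

Section = namedtuple("Section", "number name rva size")
Contribution = namedtuple("Contribution", "section offset size module")


def find_section(sections, rva):
    """Return the section whose [rva, rva + size) range contains the RVA."""
    for sec in sections:
        if sec.rva <= rva < sec.rva + sec.size:
            return sec
    return None


def find_contribution(contribs, section, offset):
    """Binary search contributions sorted by (section, offset)."""
    keys = [(c.section, c.offset) for c in contribs]
    i = bisect_right(keys, (section, offset)) - 1
    if i < 0:
        return None
    c = contribs[i]
    if c.section == section and c.offset <= offset < c.offset + c.size:
        return c
    return None


def resolve(address, load_address, sections, contribs):
    """Map a process address to (section, offset, contribution), or None."""
    rva = address - load_address
    sec = find_section(sections, rva)
    if sec is None:
        return None
    offset = rva - sec.rva
    return sec, offset, find_contribution(contribs, sec.number, offset)


sections = [Section(1, ".text", 0x02300000, 0x00100000)]
contribs = [
    Contribution(1, 0x00000, 0x40000, "main.obj"),
    Contribution(1, 0x40000, 0x10000, "jit.obj"),
    Contribution(1, 0x50000, 0x01000, "util.obj"),
]

sec, offset, contrib = resolve(0x12345678, 0x10000000, sections, contribs)
print(sec.name, hex(offset), contrib.module)
```

If the contributions do not cover the offset, or no section contains the RVA, the lookup returns nothing, which is one plausible way for a debugger to fail to find a symbol.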

Thanks again for your detailed tips!
I tried with “pretty” and indeed it crashes while writing a symbol name inside the parameter list, so I’m already looking into the mangling code in clang, which is a bit discouraging… I will do it step by step, as I don’t understand everything. Luckily I have an AST and some visitors on my side, so I can build the mangling without having to parse more stuff.
Considering what you say about section contributions, it looks like mine are correct, since Visual Studio breaks where it should… right?

Hi Zachary,

It’s been a while, but I got some new results and issues I’d like to share with you and the community.

I fixed the crash in DIA/MSVC and indeed it was a naming problem: in the DIBuilder I forgot to give my symbols unique names, which DIA requires.
It was a really good step forward for me.

I had some problems with relocations too, which I barely understand (it is hard to know what a section offset is, compared to a section RVA, a symbol RVA, or a segment… and when each is used). Anyway, after some magic inspired by the LLD code, I think I have applied good relocations in my PDB.

I still have some “minor” issues which are :

  • In the Watch window I cannot display parameters by typing them manually, while in Auto they appear without problems.

  • The breakpoints only work if I disable them and re-enable them (even if Visual Studio sees them as valid in the first place). I added checksums to the DIBuilder but it changes nothing (they look good in the dump). Do you have an idea?

  • If I change a source file and then rebuild the PDB, the breakpoints say “source different from original” even though the checksum has changed in the PDB as well… I really don’t get how these checksums work.

  • Between each call in my callstack I have “[External Code]” frames, which are calls to the native __chkstk function. This changes when stepping in the disassembly, and sometimes the callstack is broken and skips all the parent calls. It seems to indicate that MSVC fails to walk the stack correctly. It is probably missing some frame information… Do you have an idea what it could be? (I work on x64.)
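
For the checksum bullets above, one low-tech check is to hash the source file yourself and compare the result with the value shown in the dump. This sketch assumes the PDB records an MD5 of the raw file bytes; whether Visual Studio compares exactly those bytes is an assumption, and the function names are hypothetical.

```python
import hashlib


def file_md5(path):
    """Return the MD5 of a file's raw bytes as uppercase hex."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def checksum_matches(path, recorded_hex):
    """Compare a file's MD5 with a hex string copied from a PDB dump."""
    return file_md5(path) == recorded_hex.replace(" ", "").upper()
```

A mismatch here after editing the source would suggest the PDB still carries the old checksum, or that the file on disk differs from what was hashed (for example in line endings or encoding).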

Thanks again !

PS: FPO data is not involved, because my CodeView has none in its debug subsections.