Extending AsmPrinterHandler

Hey everyone,

I’m looking into extending AsmPrinterHandler out of tree to communicate information back to the CoreCLR EE, but all of the headers are in the lib directory. Is there any way to extend it out of tree, or is that not supported? We’d like to do this out of tree because we also want to include CLR headers, and we don’t want to introduce those into the LLVM sources.

Thanks,

Michelle

We generally don’t extend the AsmPrinter… can you be more specific about what you’re trying to do?

-eric

I work on the LLILC team, and we are trying to send debug line info through to the CoreCLR EE without using an EventListener, because we need to send extra info (more than is available in DebugLoc), such as whether it’s a call instruction, a call site, etc. We thought extending AsmPrinterHandler would be useful since it seems to have information about debug locations, label offsets, and instruction-specific information.

You could write the debug information you want into a section in memory and have your external/alternate/other process/thread/etc. pick it up on the other end? I don’t see how the extra info you want to send is important here; you’d just be extending the existing debug support. Or I’m missing something, in which case I’m not sure which additional questions to ask :)

-eric

(background) The CoreCLR expects a JIT to produce a mapping from MSIL bytecode offsets to code offsets, annotated with a few extra bits denoting whether it’s prolog/epilog, whether it’s a call, or, in some cases, whether there are operands remaining on the MSIL virtual stack. Our initial prototype has the MSIL offset stashed in the line number field. We could stash the extra bits in the column info, but that’s starting to feel too much like a hack. We’re looking for a way to either 1) extend the debug metadata to hold our info and get it dumped into the in-memory object (a new section would be fine if it’s not too complicated), or 2) find a place to extract the data we need where we have both the encoded offset and access to the instructions. We’re looking for some advice.
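To make the shape of that mapping concrete, here is a minimal C++ sketch of one mapping record and of writing a list of records into a flat byte buffer that stands in for an in-memory section. Everything named here (`ILMapEntry`, `MapFlags`, `encode`, `decode`, the flag values, and the 12-byte layout) is an assumption made for illustration; it is not CoreCLR's or LLVM's actual format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical flag bits; the real CoreCLR encoding may differ.
enum MapFlags : uint32_t {
  MF_None          = 0,
  MF_Prolog        = 1u << 0,
  MF_Epilog        = 1u << 1,
  MF_CallInsn      = 1u << 2,
  MF_StackNotEmpty = 1u << 3,  // operands remain on the MSIL virtual stack
};

// One row of the MSIL-offset to code-offset mapping (hypothetical layout).
struct ILMapEntry {
  uint32_t ILOffset;      // offset into the MSIL bytecode
  uint32_t NativeOffset;  // offset into the emitted machine code
  uint32_t Flags;         // bitwise OR of MapFlags
};

constexpr std::size_t kEntryBytes = 3 * sizeof(uint32_t);

// Append entries to a byte buffer in host byte order.
void encode(const std::vector<ILMapEntry>& Entries,
            std::vector<uint8_t>& Section) {
  for (const ILMapEntry& E : Entries) {
    const uint32_t Fields[3] = {E.ILOffset, E.NativeOffset, E.Flags};
    const uint8_t* Bytes = reinterpret_cast<const uint8_t*>(Fields);
    Section.insert(Section.end(), Bytes, Bytes + sizeof(Fields));
  }
}

// Read the entries back on the consuming side.
std::vector<ILMapEntry> decode(const std::vector<uint8_t>& Section) {
  assert(Section.size() % kEntryBytes == 0 && "truncated section");
  std::vector<ILMapEntry> Entries;
  for (std::size_t Pos = 0; Pos + kEntryBytes <= Section.size();
       Pos += kEntryBytes) {
    uint32_t Fields[3];
    std::memcpy(Fields, Section.data() + Pos, sizeof(Fields));
    Entries.push_back({Fields[0], Fields[1], Fields[2]});
  }
  return Entries;
}
```

Keeping the flags in their own field, rather than overloading the line and column fields, is the point of the sketch: the consumer does not need to know how the debug location was encoded.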

-R

Hi Russell,

Instead of hiding bits inside the debug metadata, why not just attach additional metadata to each instruction and look for that during emission? You’ll probably want to take a look at how things like the vectorizer are taking advantage of metadata if you want to encode things. Then during your front end compilation for MSIL->LLVM IR you can just attach random metadata to some instructions. This does have the drawback that, theoretically at least, you can strip metadata from LLVM IR and get a working binary. If that’s not the case for CoreCLR you might want to look into a way to overload some of the instructions or … something. Or just require people not delete your metadata I guess.

Does this help?

-eric

Thanks Eric, the pointers are appreciated.

I was a little put off by the documentation for metadata that said it could be dropped by LLVM at any time, as well as the extra assertion that the debug metadata was “special”. Is there a reasonable expectation that added metadata will make it to encode? (or AsmPrinter? My LLVM jargon isn’t down yet) Is the debug metadata handled specially such that it has priority over other metadata? Also, if we go the separate metadata route, we’d need to extend all the debug helpers in the AsmPrinter to extract that data to a special section. Is that what you’re suggesting? In terms of correctness, not returning the debug info to CoreCLR only impacts debugging. The executable will still run.

Thanks,

-R

Russell Hadley wrote:

I was a little put off by the documentation for metadata that said it
could be dropped by LLVM at any time, as well as the extra assertion
that the debug metadata was “special”. Is there a reasonable expectation
that added metadata will make it to encode? (or AsmPrinter? My LLVM
jargon isn’t down yet)

The idea is that metadata is not guaranteed to be preserved or updated by an llvm optimization (or other kind of transformation). The bitcode reader/writer and .ll parser and asmprinter will of course preserve them.

Is the debug metadata handled specially such that it has priority over other metadata?

At the risk of being out of date (this area has changed more recently than when I last fully understood it), most metadata is encoded sparsely where the LLVMContext owns a single map from Value* to Metadata. Debug info is special in that the Instruction* has a direct Metadata pointer for debug info. This is an efficiency consideration, and has an LLVM API difference, but has no semantic effect.
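The difference between the two storage schemes can be sketched with a toy model in C++. The types below (`ToyInstruction`, `ToyContext`) are hypothetical stand-ins and do not reproduce LLVM's real classes or API; they only show a direct per-instruction slot next to a sparse, context-owned side table.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in, not LLVM's Instruction.
struct ToyInstruction {
  std::string Opcode;
  int DebugLine;  // direct slot: always present, cheap to read (0 = none)
};

// Toy stand-in, not LLVM's LLVMContext.
class ToyContext {
  // Sparse side table: only instructions that carry extra metadata
  // get an entry, keyed by the instruction's address.
  std::unordered_map<const ToyInstruction*,
                     std::vector<std::pair<std::string, int>>> Extra;

public:
  void setMetadata(const ToyInstruction& I, const std::string& Kind,
                   int Value) {
    assert(!Kind.empty() && "metadata kind must be named");
    auto& Slots = Extra[&I];
    for (auto& S : Slots)
      if (S.first == Kind) { S.second = Value; return; }
    Slots.emplace_back(Kind, Value);
  }

  // Returns true and fills Value if I has metadata of this kind.
  bool getMetadata(const ToyInstruction& I, const std::string& Kind,
                   int& Value) const {
    auto It = Extra.find(&I);
    if (It == Extra.end()) return false;
    for (const auto& S : It->second)
      if (S.first == Kind) { Value = S.second; return true; }
    return false;
  }

  // A transformation that does not know about the metadata may drop it.
  void dropAll(const ToyInstruction& I) { Extra.erase(&I); }
};
```

In this model the direct slot and the side table differ only in lookup cost and in which API reaches them, which mirrors the point that the special handling is an efficiency choice rather than a semantic one.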

Nick

This sounds like it's not really debug info so much as a description of the
stack frame that is required for correctness, like CFI (call frame info
that describes prologues and epilogues) and EH action tables. You probably
want to subclass AsmPrinterHandler and hook that into the pipeline along
with EH and debug info generation. Today this requires upstream
modification, but the actual pass code can live wherever you want. Take a
look at how Win64Exception.cpp and others are emitting things like the
ip2state table for __CxxFrameHandler3.
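As a rough picture of what such a handler does, here is a self-contained toy in C++. The interface (`ToyHandler`, its callback names, `emitFunction`) is a hypothetical stand-in written for this sketch and is not LLVM's actual AsmPrinterHandler API; the real callbacks and their arguments should be taken from the LLVM sources.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a machine instruction; not LLVM's MachineInstr.
struct ToyInstr {
  uint32_t ILOffset;     // MSIL offset carried over from the front end
  uint32_t SizeInBytes;  // encoded size of the instruction
  bool IsCall;
};

// Hypothetical callback interface invoked by the toy printer below.
struct ToyHandler {
  virtual ~ToyHandler() = default;
  virtual void beginFunction() = 0;
  virtual void beginInstruction(const ToyInstr& I, uint32_t CodeOffset) = 0;
  virtual void endFunction() = 0;
};

struct MapRow {
  uint32_t ILOffset;
  uint32_t CodeOffset;
  bool IsCall;
};

// Collects one row whenever the MSIL offset changes or a call is seen.
class ILMapCollector : public ToyHandler {
  std::vector<MapRow> Current;
  bool InFunction = false;

public:
  std::vector<std::vector<MapRow>> Finished;  // one table per function

  void beginFunction() override {
    assert(!InFunction && "nested functions are not expected");
    InFunction = true;
    Current.clear();
  }
  void beginInstruction(const ToyInstr& I, uint32_t CodeOffset) override {
    assert(InFunction && "instruction outside of a function");
    if (Current.empty() || Current.back().ILOffset != I.ILOffset || I.IsCall)
      Current.push_back({I.ILOffset, CodeOffset, I.IsCall});
  }
  void endFunction() override {
    InFunction = false;
    Finished.push_back(Current);
  }
};

// Toy printer: walks the body, tracks the code offset, notifies handlers.
void emitFunction(const std::vector<ToyInstr>& Body,
                  std::vector<ToyHandler*>& Handlers) {
  for (ToyHandler* H : Handlers) H->beginFunction();
  uint32_t Offset = 0;
  for (const ToyInstr& I : Body) {
    for (ToyHandler* H : Handlers) H->beginInstruction(I, Offset);
    Offset += I.SizeInBytes;
  }
  for (ToyHandler* H : Handlers) H->endFunction();
}
```

The collector sees each instruction together with its offset in the emitted code, which is the combination the mapping needs; where the finished table is written is left out of the sketch.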

Long term, if you want to 100% guarantee that the MSIL offset is preserved
through LLVM optimizations, I think we need some other solution. Philip
Reames was describing a similar problem, and I was thinking that we should
have a way to tack semantically important data onto a function call like
this. The best solution I could come up with using existing tools was to
use an invoke that unwinds to an artificial landing pad that ends in
unreachable and contains the preserved data in its clause operands. LLVM
optimizers will only merge such calls if the landingpad destinations are
the same, and they can't merge landingpads with different clauses.

Alternatively, it occurs to me that call sites support attributes, which
are different from metadata in that they are semantically important.
Optimizations cannot remove them. Maybe what we need is just an attribute
on the call site?

Hope that helps. :)

FYI, if these are semantically important (and not just debug info), using metadata is a really bad idea. We’ve got a similar problem with information required to support deoptimization and have local changes which mostly solve it. I hope to eventually get that upstreamed, but we’re not particularly happy with what we’ve got at the moment and are in the process of a rewrite. If you’re interested, I can try to do that rewrite upstream. If I do, it’ll be with the caveat that the code upstreamed will be extremely experimental and likely to change radically over time.

Philip

Agreed. I don’t know that I got an answer to “is this required for correctness” or not. I thought I did, but not positive now.

-eric