Help adding entries to .symtab

Hi everyone,

I am fairly new to LLVM and am working on a new backend.
I am trying to attach information to a specific instruction using the .symtab section of the ELF file.
I’ve been searching through the LLVM source code trying to find a way to do this.
Can anyone give me some directions or point me to some documents on the matter?

Thanks, Liad.

The .symtab has symbols; you could define a label for the instruction, and store something in the symbol table entry for the label. This kind of thing would be done in the MC layer.

This would be one way to provide annotations that aren’t needed for execution. If you need to change what the instruction does, for example to have it refer to memory somewhere, then you need to modify the instruction itself.
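For reference, here is a rough sketch of what a single symbol table entry holds, assuming a 32-bit ELF target. The field names follow Elf32_Sym from the ELF specification; the helper makeLocalLabel and the constant names are hypothetical and only illustrate how a local label at some section offset could be described. It is not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Layout of one 32-bit ELF symbol table entry (Elf32_Sym in the ELF spec).
struct Elf32Sym {
    uint32_t st_name;   // offset of the symbol's name in the string table
    uint32_t st_value;  // for a label in an object file: offset within its section
    uint32_t st_size;   // size of the object, 0 if unknown
    uint8_t  st_info;   // binding (high 4 bits) and type (low 4 bits)
    uint8_t  st_other;  // visibility
    uint16_t st_shndx;  // index of the section the symbol is defined in
};
static_assert(sizeof(Elf32Sym) == 16, "Elf32_Sym is 16 bytes");

// Values from the ELF spec; the constant names are local to this sketch.
constexpr uint8_t kBindLocal = 0;   // STB_LOCAL
constexpr uint8_t kTypeNone  = 0;   // STT_NOTYPE

// Same packing as the ELF32_ST_INFO macro.
constexpr uint8_t makeInfo(uint8_t binding, uint8_t type) {
    return static_cast<uint8_t>((binding << 4) | (type & 0xF));
}

// Hypothetical helper: describe a local label placed at sectionOffset.
Elf32Sym makeLocalLabel(uint32_t nameOffset, uint32_t sectionOffset,
                        uint16_t sectionIndex) {
    return Elf32Sym{nameOffset, sectionOffset, 0,
                    makeInfo(kBindLocal, kTypeNone), 0, sectionIndex};
}
```

In practice the MC layer's ELF object writer fills these fields in for you once a symbol has been emitted; the sketch only shows what ends up in the entry.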

–paulr

Hey Paul,
First of all, thank you for taking the time to answer me.
If I understand you correctly, I need to modify the instruction itself so that one of its operands is a symbol, and then handle that symbol in the MC layer and add an entry to the symtab for that label?
What kind of symbol should I use for this: an external symbol or an MCSymbol?
I was trying to find where in the MC layer code entries get added to the symtab; I’d really appreciate some directions in that area.

Let me give an example to make my questions a bit clearer:
BB#0
…some opcodes…
mov r1, BB#1

BB#1
…some opcodes…

I’d like to put a placeholder in place of BB#1 in the mov opcode and create a symbol named “BB#1” that points to that opcode, so that at link time I can replace the placeholder with the actual address of BB#1.

Again, thank you for your time and answer,
Liad.

Hi Liad,

I’m not an expert in MC, but what you describe doesn’t sound any different from how you would handle a branch instruction. Create an MCSymbol that represents the address of the target instruction; use that symbol as an operand in the referencing instruction; emit the symbol as a label just prior to emitting the target instruction. The second and third steps can occur in either order, depending on which instruction is emitted first. I’d expect the reference to be fixed up as part of the normal assembly and linking process, nothing special there.

It’s a little more complicated if the two instructions are in different compilation units, but if they are in the same compilation then it should be pretty straightforward.
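The three steps described above (create a symbol, use it as an operand, emit it as a label before the target) can be sketched with a toy assembler. This is only a model: ToyAssembler, Fixup and every method name below are made up for illustration and are not LLVM classes. It shows that the reference and the label can be emitted in either order, and that a reference whose symbol is never defined in this unit is left over for a later stage, which would correspond to the linker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model only: none of these types are LLVM classes.
struct Fixup {
    std::size_t offset;   // index of the placeholder word in the code
    std::string symbol;   // symbol whose address belongs there
};

struct ToyAssembler {
    std::vector<uint32_t> code;               // one word per "instruction"
    std::map<std::string, uint32_t> symbols;  // label name -> word index
    std::vector<Fixup> fixups;                // references still to patch

    // Define a label at the current position.
    void emitLabel(const std::string &name) {
        symbols[name] = static_cast<uint32_t>(code.size());
    }

    void emitWord(uint32_t word) { code.push_back(word); }

    // Emit a word that should hold a symbol's address: write a
    // placeholder (0) and remember where it is.
    void emitSymbolRef(const std::string &name) {
        fixups.push_back({code.size(), name});
        code.push_back(0);
    }

    // Patch every reference whose symbol is defined here and return
    // the ones that are not (candidates for relocations).
    std::vector<Fixup> resolve() {
        std::vector<Fixup> unresolved;
        for (const Fixup &f : fixups) {
            auto it = symbols.find(f.symbol);
            if (it != symbols.end())
                code[f.offset] = it->second;
            else
                unresolved.push_back(f);
        }
        return unresolved;
    }
};
```

For example, emitting a reference to "BB1", then the label "BB1", then calling resolve() patches the placeholder with the label's position, while a reference to a symbol that is never emitted comes back in the unresolved list.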

–paulr

Hey all, and Paul,
First of all, thanks again to Paul for the help. I’m replying to this message with a little more detail about the solution I used, as a future reference for any LLVM beginner who needs it.

Once code selection is complete, the AsmPrinter takes control.
It first lowers each MachineInstr into an MCInst and then proceeds to generate whatever was asked of it (assembly or object file).

While the AsmPrinter is working, you can call OutStreamer->EmitLabel, which accepts an MCSymbol and creates a symbol at the current location (for example, if you call it from the EmitInstruction function, the symbol is created at the location of the instruction currently being emitted).

I also needed to add a relocation to a MOV_RI instruction. While working on this I learned about ComplexPattern, which let me create an instruction that accepts either an immediate or an MCSymbol as its second operand. In the MCCodeEmitter::getMachineOpValue hook you can then turn the MCSymbol into a fix-up (which, in my case, turns into a link-time relocation over the next few steps). Some of you might notice that this case is very similar to the x86 MOV32RI instruction.
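To illustrate the idea behind that hook, here is a simplified model of the decision it makes for an operand that can be either an immediate or a symbol. It is not the real LLVM signature: ToyOperand, ToyFixup and encodeOperand are hypothetical names, and the real hook works with LLVM's own operand and fix-up types. The point is only that an immediate is encoded directly, while a symbol gets a placeholder value plus a recorded fix-up.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the operand and fix-up types.
struct ToyOperand {
    bool isSymbol;       // true: use 'symbol'; false: use 'imm'
    int64_t imm;
    std::string symbol;
};

struct ToyFixup {
    uint32_t offset;     // where in the instruction stream to patch
    std::string symbol;  // what the patched value should refer to
};

// Return the bits to place in the operand field. For a symbol the
// value is not known yet, so record a fix-up and encode 0 for now.
uint32_t encodeOperand(const ToyOperand &op, uint32_t offset,
                       std::vector<ToyFixup> &fixups) {
    if (!op.isSymbol)
        return static_cast<uint32_t>(op.imm);
    fixups.push_back({offset, op.symbol});
    return 0;
}
```

A fix-up recorded this way can be resolved by the assembler if the symbol's address is already known, and otherwise ends up as a relocation for the linker, which is the path I relied on.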

I hope this information helps people with the same case as mine in the future, Liad.