Extracting constant values from function calls

Hello all,

I have some LLVM IR containing function calls whose arguments are macros (which can expand to either strings or integers), integers, and strings. I want to extract all of these constant values. In most cases the arguments passed in these calls are not dynamically allocated with malloc, so I believe they should be easy to extract.

My idea was to traverse the instructions, find all the instructions of type CallInst, and extract their arguments. But I am finding this difficult because I can't find a usable getOperand on either Instruction or Function. I see getOperand mentioned in connection with llvm::Value, but I can't quite understand how to use it in this context. Most of the online resources I found use Function.getOperand or similar functions that were available in older LLVM versions. I am currently using LLVM 17.

Any suggestions are appreciated.

For CallInst, the member function you’re looking for is getArgOperand (which is inherited from CallBase).

Alternatively, you can iterate through all the arguments via the args() member function.

Hey,

Thanks for the reply. This is exactly what is suggested at https://groups.google.com/g/llvm-dev/c/LfBGwpvUXwg, and it is what I tried as well. getOperand gives you a Value pointer, which you then need to cast to the appropriate type. But when I try to access the string by casting as described there, I get a segmentation fault. I double-checked the string's name using getOperand(...)->getName(), and it is clearly present in the IR file I am passing as input. Which casts should I be using? It segfaults on the very first cast, to ConstantExpr.

Thank you.

It’s difficult to tell without seeing some sample IR. Are you using a pre-built version of LLVM, or one you built yourself? A segfault during an LLVM cast usually means the value is not actually of the type you are casting to. In a build with assertions enabled, cast<> stops with an assertion message at that point; in a release build without assertions the check is compiled out, so you only see the crash. To see the assertion message, build a debug version of LLVM or build LLVM with the CMake option -DLLVM_ENABLE_ASSERTIONS=TRUE.
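As a middle ground between a plain Release build and a full Debug build, assertions can be turned on in an optimized build (LLVM_ENABLE_ASSERTIONS defaults to ON only for Debug builds). A sketch, assuming a checkout of the llvm-project monorepo in the current directory and Ninja installed; the build directory name is arbitrary.

```shell
# Configure an optimized LLVM build that keeps assertions enabled.
cmake -S llvm -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON
# Build it (this takes a while).
cmake --build build
```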

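Also, are you using a version of LLVM newer than 14? If so, opaque pointers can change what sort of casting is needed; for example, a string literal is often passed directly as its global (ptr @.str) rather than through a getelementptr constant expression.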

Yes, I am using LLVM 17, and I have resolved the issue. Since I am extracting a global string rather than an expression, I removed the cast to ConstantExpr and it worked.

I am using a Release build because it compiles faster, but is a Debug build better for cases like this?

Also, I have one more follow-up question. I want to add a comment (a placeholder for a string constant) to the basic block I am currently traversing, and most online forums suggest using metadata. So I used metadata and setMetadata, but when I visualize the CFG by dumping it to a dot file, I see a metadata reference number (as expected) instead of the string constant. Can this be achieved at all? I know there is something called named metadata, but most of the examples I have seen insert it at the module level, whereas I want to attach string metadata at the instruction level.

Glad you got the casting issue fixed!

I would also suggest metadata for that, but as you’ve seen, it doesn’t display very well in graph output. Named metadata refers to llvm::NamedMDNode, which only applies at the module level, so you can’t attach it to instructions. That said, when you attach “unnamed” metadata to instructions, you still give it a “name” (the metadata kind passed to setMetadata), so you can set that name to just about anything you want. (It is possible to go overboard with this and make the IR unreadable, though, so that’s something to keep in mind.)

You could override the CFG graph traits by creating a separate full specialization of llvm::DOTGraphTraits<llvm::DOTFuncInfo*>, in which you would change the code in getSimpleNodeLabel to write each instruction in the block to a string line by line, inspecting the metadata and appending the relevant entries to the end of the instruction as a comment string (as an example). You’d have to implement your own CFG printer pass written to use your DOTGraphTraits specialization to get it to work, though; the default one wouldn’t use yours. (If anyone else reading this knows this to not be true, please let me know! This would be useful!) This is quite a bit of work, but you’d end up with a CFG printer you could customize however you wish later.

Alternatively, you could:

  • Insert inline assembly that consists solely of an assembly comment using llvm::InlineAsm::get(). You’d have to mark it as having side effects to stop other optimizations from moving it around, and adding inline assembly can affect how optimizations operate in general, so that’s something to consider before doing this.
  • Create a bitcast instruction that casts a constant integer 0 value from one integer type to the exact same integer type (basically a no-op) and give it your comment as its name. As with the inline assembly, it can look a little strange, and other optimizations may remove it. There aren’t any restrictions on naming virtual registers (as far as I’ve seen), so you could name it with a complete sentence if you wanted.

You could override the CFG graph traits by creating a separate full specialization of llvm::DOTGraphTraits<llvm::DOTFuncInfo*>

Yes, this is exactly what I was looking into when I couldn’t achieve the same thing with metadata.

You’d have to implement your own CFG printer pass written to use your DOTGraphTraits specialization to get it to work, though; the default one wouldn’t use yours. (If anyone else reading this knows this to not be true, please let me know! This would be useful!)

I am already writing my own pass, so, as you said above, one only needs to specialize the graph traits and define one's own label information. There is no need for an extra pass if you are already writing your own; you can embed the printing logic there.

I am using a Release build because it compiles faster, but is a Debug build better when I need a lot of trial and error to figure things out?

For example, does it provide more information than the stack trace I already get when it segfaults? Any pointers on this?

Edit: I got curious and tried the Debug build. It is much stricter: my pass, which runs successfully in Release mode, throws 20+ errors in Debug mode (for example, that I can’t pass a BranchInst in place of Inst, and similar things).

Thanks for your time.