Static taint analysis in LLVM

I want to know whether LLVM supports static taint analysis now, and how to implement static taint analysis as an LLVM pass or in some other way.

Can anyone help me? Thank you very much!


It appears that you’ve not done the requisite reading that’s highlighted multiple times at the very beginning of the documentation. Compilers are extremely sophisticated and hard; the expected level of self-learning here is high, so if you don’t demonstrate that you’ve done your homework, it will probably be hard to solicit support.

In any case, not that I know of, and I’ve spent the past while reading about and learning LLVM. You can see all of the publicly available passes in the documentation as well. Since I’ve already told you that it’s not there, I might as well help you out: LLVM works as a library, so when you want your pass to be executed, you register it with the overall framework.

From a high level, you’ll need two things. The first is a runtime/library that maintains the shadow memory. My view is that you can build the shadow memory functionality as a shared object that is loaded and initialized by a companion preamble to main, and produce compiled executables that implicitly use it (or edit the compilation behavior of your targets, which is more tedious). The second is a pass that weaves in the calls, or inlines the work, needed to maintain the shadow memory. I recommend writing a pass that works at basic block granularity, because at that level you can group the memory operations in a block and combine their shadow memory maintenance work, whether that is done through callbacks or through offset computations, which makes the final code more efficient.
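
To make the "combine the shadow memory work per basic block" idea a bit more concrete, here is a minimal standalone sketch. It is not LLVM API code: `MemWrite` and `coalesceWrites` are hypothetical names, and it assumes the pass has already worked out that the writes in a block are constant offsets from one base pointer and that they all receive the same taint label.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One store seen in a basic block, as a byte range [offset, offset + size)
// relative to a common base pointer (an assumption of this sketch).
struct MemWrite {
    std::uint64_t offset;
    std::uint64_t size;
};

// Merge overlapping or adjacent writes, so the instrumented block needs one
// shadow-memory update per merged range instead of one per store.
std::vector<MemWrite> coalesceWrites(std::vector<MemWrite> writes) {
    std::sort(writes.begin(), writes.end(),
              [](const MemWrite &a, const MemWrite &b) {
                  return a.offset < b.offset;
              });
    std::vector<MemWrite> merged;
    for (const MemWrite &w : writes) {
        if (w.size == 0) continue;                 // nothing to shadow
        assert(w.offset + w.size > w.offset);      // range must not wrap around
        if (!merged.empty() &&
            w.offset <= merged.back().offset + merged.back().size) {
            // Touches or overlaps the previous range: extend it.
            std::uint64_t end = std::max(merged.back().offset + merged.back().size,
                                         w.offset + w.size);
            merged.back().size = end - merged.back().offset;
        } else {
            merged.push_back(w);
        }
    }
    return merged;
}
```

A pass could then emit one call into the shadow runtime per merged range at the end of the block. Whether that is actually a win depends on how expensive each runtime call is.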

This book is good for getting started: Getting Started with LLVM Core Libraries. It has lots of examples, but to be honest, you don’t need to pay for anything until you’ve read what’s publicly available, and LLVM even comes with examples.

Let me know your thoughts and we can pick up when you’ve seen the passes and learned about how to extend the correct C++ class.

Thank you very much for your nice reply.
I have read some parts of the LLVM documentation, but not all of it. However, I don’t think I have time to read more, because I must complete my work in about 40 days.

I want to write a simple checker, using LLVM, that checks an OS (written in C) for buffer overflow (and possibly other) vulnerabilities. I want to write it as an LLVM pass, and I think a static taint analysis technique can solve it.

Because my time is limited, I need a static taint analysis example to imitate and improve. I hope the example uses LLVM, and it must be well commented, because coding is difficult for me.

I have found an example:
but it has few comments (I have sent an email to the author, but I haven’t received a reply). I have implemented the “sourcesinkanalysis” part as an LLVM pass myself, but the other parts are difficult for me without comments.
So if you have a better example, or some better suggestions for my work, please tell me. Thank you very much!

Best wishes,

Well, if you’re working with compiler infrastructure and you’re not familiar with how to build a custom pass, just know that compilers are extremely sophisticated and very difficult. I’m not trying to discourage you, though. Do you think it might be more suitable to limit your thesis to some subtopic, like just making the shadow memory runtime/library? That would certainly be a worthwhile endeavor. Lots of different utilities do taint analysis, so a single, high-quality, reusable component would be very attractive. You could reuse it with PANDA, Intel Pin, a compiler-based instrumentation pass, or DynamoRIO. It could be flexible and offer different modes of operation, such as asynchronous or blocking; different shadow memory representations, such as offset-range encodings or just a simple bitmap; and different identity schemes. That would be valuable and realizable within your constraints. There are existing taint analysis mechanisms that aren’t static, but you might be able to learn from those.
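
As a rough idea of what the "simple bitmap" representation could look like, here is a minimal sketch: one taint bit per byte of application memory, stored in sparse 4 KiB pages. `ShadowBitmap`, `setTaint`, and `isTainted` are hypothetical names of mine, not part of any existing tool, and the sketch ignores threading and the asynchronous mode entirely.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Sparse shadow memory: one taint bit per application byte.
// Pages are allocated lazily the first time a byte in them is tainted.
class ShadowBitmap {
public:
    static constexpr std::size_t kPageSize = 4096;

    // Mark or clear taint for the byte range [addr, addr + len).
    void setTaint(std::uint64_t addr, std::uint64_t len, bool tainted) {
        assert(len <= UINT64_MAX - addr);  // range must not wrap around
        for (std::uint64_t i = 0; i < len; ++i) {
            std::uint64_t a = addr + i;
            std::uint64_t page = a / kPageSize;
            std::size_t bit = static_cast<std::size_t>(a % kPageSize);
            if (tainted) {
                pages_[page].set(bit);     // creates the page if missing
            } else {
                auto it = pages_.find(page);
                if (it != pages_.end()) it->second.reset(bit);
            }
        }
    }

    // True if any byte in [addr, addr + len) is tainted.
    bool isTainted(std::uint64_t addr, std::uint64_t len) const {
        assert(len <= UINT64_MAX - addr);
        for (std::uint64_t i = 0; i < len; ++i) {
            std::uint64_t a = addr + i;
            auto it = pages_.find(a / kPageSize);
            if (it != pages_.end() &&
                it->second.test(static_cast<std::size_t>(a % kPageSize)))
                return true;
        }
        return false;
    }

private:
    std::unordered_map<std::uint64_t, std::bitset<kPageSize>> pages_;
};
```

A bitmap like this can only answer "tainted or not". Supporting identity schemes (which source a byte came from) would need a wider per-byte label or the offset-range encoding instead.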

The actual mechanism for determining taint propagation, though, is really hard to nail down. It’s usually some tradeoff of performance vs. precision, but in general, determining on the fly whether a clobber has occurred is hard, since it can depend on a function of the value local to the operation you’re trying to reason about. There was some research on dynamic flow analysis that’s pertinent to this, but if you’re struggling to meet a deadline while also picking up the compiler infrastructure, this would definitely overburden you.
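
For the static side you asked about, the cheapest end of that performance vs. precision tradeoff is a flow-insensitive propagation: any value computed from a tainted value is tainted, repeated until nothing changes. Below is a minimal sketch over a toy three-address representation. `Instr` and `propagateTaint` are hypothetical names, this is not LLVM IR, and it deliberately over-approximates (it never clears taint and ignores memory and control flow).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction: `dst` is computed from the values in `ops`.
struct Instr {
    std::string dst;               // value defined (empty if none)
    std::vector<std::string> ops;  // values used
};

// Flow-insensitive taint propagation to a fixed point.
// `tainted` starts as the set of source values and grows until stable.
std::set<std::string> propagateTaint(const std::vector<Instr> &prog,
                                     std::set<std::string> tainted) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Instr &i : prog) {
            if (i.dst.empty() || tainted.count(i.dst)) continue;
            for (const std::string &op : i.ops) {
                if (tainted.count(op)) {
                    tainted.insert(i.dst);  // any tainted operand taints dst
                    changed = true;
                    break;
                }
            }
        }
    }
    return tainted;
}
```

A checker would then flag every sink (for example, an array index or a length argument) whose operand ends up in the returned set. Expect false positives from a scheme this coarse.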