A query language for LLVM IR (XPath)

Hi, sometimes when dealing with LLVM IR, getting to a desired point of
the code is a bit cumbersome, in particular if you're instrumenting
existing code: it takes a lot of nested loops and if checks.

Maybe all of this could be avoided by employing a query language. Since
an LLVM module can be seen as a sort of tree with attributes, I think
that reusing an existing query language for XML would be appropriate.

In particular I chose XPath [1] since it's more expressive than, say,
CSS selectors (e.g., you can move from the current element to the
parent).

Therefore, in a spare night, I took pugixml [2], a lightweight XML parser
with XPath support, stripped away everything that was XML-specific and
adapted it so that it can query an arbitrary tree, as long as a class
exposing certain traits is supplied.
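
To give an idea of what "a class exposing certain traits" could mean, here is a minimal sketch. It is not the attached code and not pugixml's API: `ToyNode`, `ToyTraits`, `selectPath` and `makeToyModule` are hypothetical names, and the evaluator only handles absolute paths made of child steps and `*`, with no predicates or other axes.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy tree standing in for module/function/block/instruction.
struct ToyNode {
  std::string Name;
  std::vector<ToyNode> Children;
};

// Hypothetical traits: the only two things the evaluator asks of a node.
struct ToyTraits {
  using NodeRef = const ToyNode *;
  static const std::string &name(NodeRef N) { return N->Name; }
  static std::vector<NodeRef> children(NodeRef N) {
    std::vector<NodeRef> Result;
    for (const ToyNode &C : N->Children)
      Result.push_back(&C);
    return Result;
  }
};

// Evaluates an absolute path made only of child steps, e.g. "/main/*/store".
// "*" matches any name. The tree is reached exclusively through Traits.
template <typename Traits>
std::vector<typename Traits::NodeRef>
selectPath(typename Traits::NodeRef Root, const std::string &Path) {
  assert(!Path.empty() && Path[0] == '/' && "only absolute paths are handled");
  std::vector<typename Traits::NodeRef> Current{Root};
  std::istringstream Steps(Path.substr(1));
  std::string Step;
  while (std::getline(Steps, Step, '/')) {
    std::vector<typename Traits::NodeRef> Next;
    for (auto N : Current)
      for (auto Child : Traits::children(N))
        if (Step == "*" || Traits::name(Child) == Step)
          Next.push_back(Child);
    Current = std::move(Next);
  }
  return Current;
}

// Builds a tiny tree: one function "main" with two blocks.
ToyNode makeToyModule() {
  ToyNode BB1{"bb1", {{"alloca", {}}, {"alloca", {}}, {"store", {}}}};
  ToyNode BB2{"bb2", {{"load", {}}, {"store", {}}}};
  ToyNode Main{"main", {BB1, BB2}};
  return ToyNode{"main.ll", {Main}};
}
```

With this, `selectPath<ToyTraits>(&Module, "/main/*/store")` walks the toy tree one step at a time; a traits class for a different tree would only need to provide `NodeRef`, `name` and `children`.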

Attached you can find the class to query an LLVM module and an example
LLVM module (using LLVM 3.8, but newer versions should do too).

The current implementation pretends that a module looks like the
following XML tree (more or less):

    <main.ll>
      <main>
        <basicblock1>
          <alloca />
          <alloca />
          ...
        </basicblock1>
        ...
      </main>
    </main.ll>

Additional information could be encoded in attributes.
Please note that the queries are done on the LLVM IR directly, no XML
tree is materialized.

In the following you can find some examples:

    $ # Find all the basic blocks containing at least one alloca
    $ llvm-xpath '/main/*[count(alloca) > 0]' main.ll

      %1 = alloca i32, align 4
      %2 = alloca i32, align 4
      %i = alloca i32, align 4
      store i32 0, i32* %1, align 4
      store i32 %argc, i32* %2, align 4
      %3 = load i32, i32* %2, align 4
      store i32 %3, i32* %i, align 4
      br label %4

    $ # Find all store instructions
    $ llvm-xpath '/*/*/store' main.ll
      store i32 0, i32* %1, align 4
      store i32 %argc, i32* %2, align 4
      store i32 %3, i32* %i, align 4
      store i32 %6, i32* %i, align 4
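
For comparison, the first query corresponds roughly to the hand-written loops below. This is only a sketch over stand-in types: `ToyInst`, `ToyBlock`, `ToyFunction`, `ToyModule` and `blocksWithAlloca` are hypothetical and merely mimic the module/function/basic block/instruction nesting, they are not the LLVM classes.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for instruction, basic block, function and module.
struct ToyInst {
  std::string Opcode;
};
struct ToyBlock {
  std::string Name;
  std::vector<ToyInst> Insts;
};
struct ToyFunction {
  std::string Name;
  std::vector<ToyBlock> Blocks;
};
struct ToyModule {
  std::vector<ToyFunction> Functions;
};

// Hand-written equivalent of the query /main/*[count(alloca) > 0].
std::vector<const ToyBlock *> blocksWithAlloca(const ToyModule &M) {
  std::vector<const ToyBlock *> Result;
  for (const ToyFunction &F : M.Functions) {
    if (F.Name != "main") // step: /main
      continue;
    for (const ToyBlock &BB : F.Blocks) { // step: /*
      unsigned Allocas = 0;
      for (const ToyInst &I : BB.Insts) // predicate: count(alloca)
        if (I.Opcode == "alloca")
          ++Allocas;
      if (Allocas > 0) // predicate: ... > 0
        Result.push_back(&BB);
    }
  }
  return Result;
}
```

Each path step turns into one loop level and each predicate into an if check, which is the kind of boilerplate a one-line query would replace.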

Obviously this doesn't have to be exclusively a command line tool; we
could have something like:

    for (auto *Store : TheModule.xpath<StoreInst>("/*/*/store"))
      /* ... */

I'm not releasing the full code yet since it's very much work in
progress, but if anyone is interested in such a thing, just ping me.
The applications could range from using it in existing code to just
providing it for fast prototyping, e.g., in llvmcpy [3].

Obviously there are some open questions, such as how to deal with
operands, which could lead to an infinite tree, or how to organize
attributes. But it should be doable.
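
To illustrate the operand problem: if operands were exposed as children, a loop counter (a phi using an add that uses the phi) would make the tree infinitely deep. One possible way around it, shown purely as an assumption and not as what the prototype does, is to treat operands as a graph and visit each value at most once. `ToyValue` and `reachableOperands` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an SSA value: a name plus the indices of its
// operands in the surrounding vector.
struct ToyValue {
  std::string Name;
  std::vector<std::size_t> Operands;
};

// Collects the names of all values reachable from Start through operand
// edges, each exactly once. Start itself is not reported. The Visited set
// is what keeps a phi/add cycle from being expanded forever.
std::vector<std::string>
reachableOperands(const std::vector<ToyValue> &Values, std::size_t Start) {
  assert(Start < Values.size() && "Start must index an existing value");
  std::vector<std::string> Order;
  std::set<std::size_t> Visited{Start};
  std::vector<std::size_t> Worklist{Start};
  while (!Worklist.empty()) {
    std::size_t Current = Worklist.back();
    Worklist.pop_back();
    for (std::size_t Op : Values[Current].Operands) {
      if (Visited.insert(Op).second) { // true only on first visit
        Order.push_back(Values[Op].Name);
        Worklist.push_back(Op);
      }
    }
  }
  return Order;
}
```

A depth limit on the operand axis would be another option; either way the query engine would have to know that this axis is not a plain tree edge.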

main.ll (1.24 KB)

llvm-node.cpp (5.83 KB)

This is so cool! I once had a similar idea but the way I was thinking about it ended up more complex than I had time to implement (I sketched it here: http://lists.llvm.org/pipermail/llvm-dev/2013-November/067720.html).

Good idea using xpath to simplify the implementation and reuse existing languages/libraries as a starting point!

As much as I’m not a fan of most XML things, this application of XPath is inspired.

This would be a great testing/query tool for tests.

It would also be a great way to prototype passes.

Looking forward to seeing something like this in llvm/tools/ !

Cheers

It would be really interesting to see this as an extension of FileCheck. Having the ability to write these sorts of predicates in CHECK: lines would be pretty cool, and could make existing regex checks a lot more principled.

-Chris

At some point I tried doing something like this for debug information verification (via XML, the effort did not go very far). I would be very interested in using and contributing to this code.

-Petr