want to intercept array dereferences

Normally, for an int n unknown at compile time, "a[n]" and "*(a+n)" result
in an add followed by a dereference. Instead, I want them to compile to
a system call that takes two arguments, a and n. Where should I
intercept this in LLVM?

Gry

Far too late. That would need to be in Clang.

If I understand correctly, LLVM is a *typed* assembly language. Could
I just look for a pointer type plus an integer type followed by a
dereference? That would catch both a[n] and *(a+n).

Gry

You might be able to match the GEP + load pair and replace it with a call.
But it would depend entirely on how the LLVM instructions were generated and which optimisations have been run.
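
To illustrate why the shape of the IR depends on how the source was written, here is a minimal C sketch (the function names are made up for this example). Both functions read the same elements, but only the first spells the access as base plus index at the point of the load. The remarks about the generated IR describe what an unoptimised build typically produces and are an assumption, not a guarantee.

```c
#include <stddef.h>

/* Indexed form: without optimisation this typically becomes a
   getelementptr on `a` followed directly by a load, which is the
   pair a pass could try to match. */
int sum_index(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer-walk form: the same elements are read, but the address is
   carried in `p` from one iteration to the next. The address
   arithmetic happens at `p++`, not next to the load of `*p`. */
int sum_walk(const int *a, size_t n) {
    int s = 0;
    for (const int *p = a, *end = a + n; p != end; p++)
        s += *p;
    return s;
}
```

An optimiser may also rewrite the first form into something closer to the second, so a matcher that runs after optimisation cannot rely on seeing the pair.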

Ok, sounds like you are saying it would be a flaky solution.

Gry

Normally, for an int n unknown at compile time, "a[n]" and "*(a+n)" result
in an add followed by a dereference. Instead, I want them to compile to
a system call that takes two arguments, a and n. Where should I
intercept this?

The LLVM guys suggest that intercepting a pointer type plus an int
type followed by a deref would not be a stable solution. They suggest
that I do this at the clang layer. Where would be the stable place to
do it? I'm new to clang development, so some context would be
helpful.

Gry

I think that your last sentence applies to everyone else too: can you explain what you are trying to achieve? In C, it is perfectly valid to write a[n], n[a], or *(a+n). Do you want to distinguish these three or catch all of them? What problem are you actually trying to solve?
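
For reference, a minimal C sketch of the three spellings (the function names are invented for illustration). Each function reads the same element; only the syntax differs.

```c
/* a[n]: the usual subscript form. */
int read_subscript(const int *a, int n) { return a[n]; }

/* n[a]: valid because subscripting is defined as *((n) + (a)). */
int read_reversed(const int *a, int n) { return n[a]; }

/* *(a + n): explicit pointer arithmetic followed by a dereference. */
int read_pointer(const int *a, int n) { return *(a + n); }
```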

David

The LLVM guys suggest that intercepting a pointer type plus an int
type followed by a deref would not be a stable solution. They suggest
that I do this at the clang layer. Where would be the stable place to
do it? I'm new to clang development, so some context would be
helpful.

I think that your last sentence applies to everyone else too:

We are all newbies! In the future, technology will be moving so fast
that no one knows anything.

can you explain what you are trying to achieve? In C, it is perfectly valid to write a[n], n[a], or *(a+n). Do you want to distinguish these three or catch all of them? What problem are you actually trying to solve?

Well, as I said initially, I want to intercept base pointer + offset
data accesses, no matter which of those forms they take. So if the
front end converted a[n] to *(a+n) internally and then I intercepted
it, that would be fine.
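
To make the goal concrete, here is a rough source-level sketch of the intended effect. The name `intercepted_load` and the counter `hook_calls` are hypothetical stand-ins for the system call; nothing like them exists in Clang or LLVM.

```c
/* Counts how many accesses went through the hook. */
unsigned long hook_calls = 0;

/* Hypothetical hook standing in for the system call: it receives the
   base pointer and the offset separately instead of their sum. */
int intercepted_load(const int *base, long offset) {
    hook_calls++;            /* a real hook could log or check here */
    return base[offset];
}

/* What the programmer writes. */
int plain(const int *a, int n) { return a[n] + *(a + n); }

/* What the compiler would effectively produce after the rewrite. */
int rewritten(const int *a, int n) {
    return intercepted_load(a, n) + intercepted_load(a, n);
}
```

Both `plain` and `rewritten` return the same value; the difference is that every access in `rewritten` passes through the hook with a and n still separate.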

Again, I asked the LLVM list whether, since LLVM is a typed assembly
language, I could just look for pointer plus offset followed by a
dereference. They seemed to suggest that looking for that idiom would
not produce a stable result, depending on what the optimizer did, and
that I should look in the front-end for how to intercept a[n].
Still, it seems to me that it should work if I did it before any
optimization passes run.

So now I'm thinking that perhaps in the front-end there might be a
stage where a[n] has been lowered to *(a+n), and then I could just look
for an abstract syntax subtree of the form *(a+n) and replace that
with a different subtree. Suggestions for a stable way to do
that would be great.

Gry

Again, I asked the LLVM list whether, since LLVM is a typed assembly
language, I could just look for pointer plus offset followed by a
dereference. They seemed to suggest that looking for that idiom would

JFTR, we're one big community, and it's not as segregated into "the clang devs" vs "the llvm devs" as you might think.

It's encouraged to cc both lists (as appropriate) when having these sorts of discussions that span the interface between the two projects. This helps give context to statements like "well I asked the other list", without having to dig for that other message. At the very least you should provide a link to the other discussion.

Cheers,

Jon

Again, I asked the LLVM list whether, since LLVM is a typed assembly
language, I could just look for pointer plus offset followed by a
dereference. They seemed to suggest that looking for that idiom would

JFTR, we're one big community, and it's not as segregated into "the clang
devs" vs "the llvm devs" as you might think.

On the LLVM list I was told "That would need to be in Clang", so I am
writing to the front-end list.

It's encouraged to cc both lists (as appropriate) when having these sorts of
discussions that span the interface between the two projects. This helps
give context to statements like "well I asked the other list", without
having to dig for that other message. At the very least you should provide a
link to the other discussion.

Ok, in the future I will write to both lists, but I initially thought it
was a purely backend question. The other discussion starts here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084280.html

So now that I am writing to both lists: where can I intercept a
Clang/LLVM compile so as to catch expressions that locally look like
a[n] or *(a+n)?

My guess is sometime after a[n] is lowered to *(a+n) (if you do that)
and sometime before optimization passes start chewing on it.

Gry

You should probably take a look at sema/TreeTransform.

On Apr. 9 at 8:06 PM, “Gry Gunvor” <gry.gunvor@gmail.com> wrote:

So now that I am writing to both lists: where can I intercept a
Clang/LLVM compile so as to catch expressions that locally look like
a[n] or *(a+n)?

CodeGen for the former happens in CodeGenFunction::EmitArraySubscriptExpr(). For the latter, you'll need to look for UO_Deref in EmitUnaryOpLValue(), and work backwards up the AST to find the BinOp for the +. This might be a little tricky as not every dereference will have that addition binop as its child node, and not every one that you find there is actually a case of array indexing.
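
A small C sketch of the cases described above may help (function names are invented for illustration; the comments name the AST node kinds Clang uses for these expressions, and how a given build handles them is an assumption to verify with an AST dump).

```c
/* ArraySubscriptExpr: the case attributed above to
   EmitArraySubscriptExpr(). */
int via_subscript(int *a, int n) { return a[n]; }

/* UnaryOperator (deref) whose operand, inside a ParenExpr, is a
   BinaryOperator for the '+'. */
int via_deref_add(int *a, int n) { return *(a + n); }

/* A dereference with no '+' below it: the operand is just a
   reference to the variable p. */
int via_plain_deref(int *p) { return *p; }

/* The addition lives in an earlier statement, so the dereference
   only sees `q` and the '+' is not its child node. */
int via_split(int *a, int n) {
    int *q = a + n;
    return *q;
}

/* A '+' under a dereference that is not array indexing in any
   meaningful sense: x is a single scalar. */
int via_scalar(int x) { return *(&x + 0); }
```

Only the first two functions match the simple "subscript" or "deref of an add" shapes; the third and fourth would be missed by a purely local match, and the fifth would be caught even though no array is involved.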

My guess is sometime after a[n] is lowered to *(a+n) (if you do that)
and sometime before optimization passes start chewing on it.

As Bruce told you in the other thread, it's probably too late to look for it in LLVM. The best place really is to do it in Clang, though it might depend on why you want to do this particular transformation. If you're trying to do bounds checking, the sanitizers already do that.

Jon

The best place really is to do it in Clang, though it might depend
on why you want to do this particular transformation. If you're trying to do
bounds checking, the sanitizers already do that.

Ah, well, bounds checking passes do need to know that information.
Perhaps it would work to just hack on one of the sanitizers and get it
to generate the code I want instead of a software bounds check?

Gry

I'm still not sure what it is that you actually want to do, so I can't tell you whether or not this plan will work to get you there.

Jon