methods to access "base" and "index" in ArraySubscriptExpr

Recently I added a comment to the class definition for ArraySubscriptExpr (in Sema/Expr.h) noting that the true "base" of the array may actually be returned by "getIdx()" if the array happened to be written in the source (for example) as "4[A]" instead of "A[4]." I think this is confusing, and may cause glitches down the line for semantic analysis tools built on the frontend.

I propose renaming the current getBase and getIdx methods to "getLHS" and "getRHS" respectively, as the current interpretation of these functions has more do with source code syntax (their position in the source code) rather than semantics. I then propose adding new getBase and getIdx functions that look something like this:

Expr *getBase() { return (Base->getType()->isPointerType()) ? Base : Idx; }
Expr *getIdx() { return (!Idx->getType()->IsPointerType()) ? Idx : Base; }

Naturally, along with these changes I suggest changing the member variables "Base" and "Idx" to "LHS" and "RHS" respectively.

Comments? One disadvantage that I see of this change is that using getBase() and getIdx() in tandem may require some extra unnecessary checking (as the same check is performed in both functions). For such cases, we could consider adding an extra method that returns both the Base and the Idx expressions in a std::pair (would there be an efficiency issue here?).

Recently I added a comment to the class definition for
ArraySubscriptExpr (in Sema/Expr.h) noting that the true “base” of the
array may actually be returned by “getIdx()” if the array happened to
be written in the source (for example) as “4[A]” instead of “A[4].” I
think this is confusing, and may cause glitches down the line for
semantic analysis tools built on the frontend.

I agree. I had similar thoughts, and was going to discuss this with you.

I propose renaming the current getBase and getIdx methods to “getLHS”
and “getRHS” respectively, as the current interpretation of these
functions has more do with source code syntax (their position in the
source code) rather than semantics. I then propose adding new getBase
and getIdx functions that look something like this:

Expr *getBase() { return (Base->getType()->isPointerType()) ? Base :
Idx; }
Expr *getIdx() { return (!Idx->getType()->IsPointerType()) ? Idx :
Base; }

Naturally, along with these changes I suggest changing the member
variables “Base” and “Idx” to “LHS” and “RHS” respectively.

Comments? One disadvantage that I see of this change is that using
getBase() and getIdx() in tandem may require some extra unnecessary
checking (as the same check is performed in both functions). For such
cases, we could consider adding an extra method that returns both the
Base and the Idx expressions in a std::pair (would there be an
efficiency issue here?).

I have no problem with the extra checking. I do have another suggestion:

Why not simply change the following…

return new ArraySubscriptExpr(LHSExp, RHSExp, ResultType, RLoc);

to…

return new ArraySubscriptExpr(BaseExpr, IndexExpr, ResultType, RLoc);

? And keep everything else the same…

The only disadvantage with this is the crazy case of idx[base] will always be output as base[idx] (if we are doing a pretty printer or source analysis tool). From my perspective, this loss is extremely minor (and can be fixed one day if necessary). In other words, I think it makes sense for the AST to normalize this case.

Let me know what you think,

snaroff

Steve and I just discussed this in person.

We decided that we will have BaseExpr and IndexExpr refer to the true “base” and “index” respectively (just as in Steve’s post), but then add getLHS() and getRHS() methods to ArraySubscriptExpr that return the lexically “left” and “right” expressions based on comparing their SourceLocations.

This will allow the common case that does semantic analysis on the AST to not have any extra logic when doing getBase() and getIdx(). For tools requiring manipulating expressions at the source level, we have getLHS() and getRHS(). This will be the slightly slower than getBase()/getIdx() (but likely unnoticeable in performance for such applications).

So we ended up talking about this again. As stated before, the solution we will take is to have 4 accessors:

getBase, getIdx, getLHS, and getRHS

Because we cannot easily compare SourceLocations, it is easier for us to remember which expression was the LHS and which expression was the RHS, and then normalize in the accessors getBase and getIdx rather than doing the reverse normalization.

I'll make a patch today that changes ArraySubscriptExpr to have these accessors, and to change the current semantics of getBase and getIdx to do the normalization of the expression. I'll also patch the source code pretty printer, Codegen, etc. (the current clients of this interface). All future clients will follow the new semantics (a big fat comment will be included in ArraySubscriptExpr).

Thanks Ted!

-Chris