Redundant load in LLVM's codegen compared to GCC when accessing an escaped pointer?

Hi,

Please look at this C code:

typedef struct PB {
  void* data; /* required.*/
  int f1_;
  float f2_;
} PB;

PB** bar(PB** t);

void qux(PB* c) {
  bar(&c); /* c is escaped because of bar */
  c->f1_ = 0;
  c->f2_ = 0.f;
}

// gcc-5.2.1 with -fno-strict-aliasing -O2 on x86
call bar
movq 8(%rsp), %rax
movl $0, 8(%rax)
movl $0x00000000, 12(%rax)

// llvm 3.9.0 with -fno-strict-aliasing -O2 on x86
callq bar
movq (%rsp), %rax
movl $0, 8(%rax)
movq (%rsp), %rax
movl $0, 12(%rax)

You can see that LLVM loads “c” twice, but GCC loads it only once.

Of course, the bar function could do something very dangerous, e.g.

PB** bar(PB** t) {
  *t = (PB*) t; /* make c point at its own storage */
  return t;
}

But GCC doesn’t care about bar’s definition.

Is LLVM too conservative, or is GCC too aggressive in this pattern?

Thanks for your help.

CY

In my opinion, in the face of -fno-strict-aliasing, GCC is being too aggressive. It would be interesting to hear what they think.

-Chris

We discussed this issue briefly on the #gcc IRC channel.
Richard Biener pointed out that bar cannot make c point to &c - 8,
because computing that pointer would be invalid. So c->f1_ cannot
clobber c itself.

Why would computing that pointer be invalid?

(I could imagine it being invalid if there were no object behind c to point to, but that’s a dynamic property of the program that the compiler, given this code, can’t prove /isn’t true/: the programmer might have constructed the caller so that there always is an object behind ‘c’ to point to.)

  1. Same question as David: why is &c - 8 invalid? Is it related to the following statement in the C99 standard?

6.5.3.2 (footnote to paragraph 4):
“Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.”

  2. We are now trying to keep the first load and remove the others, because our test pattern cannot drop “-fno-strict-aliasing” and the additional loads hurt performance. We made some changes in SROA::runOnAlloca to do something like this:

void qux(PB* _c) {
  PB* c;      /* inserted into the original code */
  bar(&_c);

  c = _c;     /* inserted into the original code */
  c->f1_ = 0;
  c->f2_ = 0.f;
}

If you have any opinions, please let us know.

Thanks!

CY

I think the argument goes that this is a 16-byte object (on x86-64), so if you could put something of type PB at c-8, you’d illegally overlap with the object at c.

Thus, there can’t be an object of type PB at c-8.

(i.e., any valid object must be sizeof(PB) away in either direction, which means it’s not possible for c->f1_ to clobber c no matter what bar does)
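To make the layout argument concrete, here is a small sketch, assuming a typical ABI such as x86-64 where offsetof(PB, f1_) is 8 and sizeof(PB) is 16. The helpers `required_target` and `hypothetical_pb_overlaps_c` are made up for illustration: using only integer arithmetic on addresses (so no out-of-bounds pointer is formed), they compute where c would have to point for the store to c->f1_ to land on c itself, and check that a PB placed there would share bytes with c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct PB {
  void *data;
  int f1_;
  float f2_;
} PB;

/* f1_ lies strictly inside PB, so a PB whose f1_ coincides with c
   must start before c and cover c's first bytes. */
static_assert(offsetof(PB, f1_) < sizeof(PB), "f1_ lies inside PB");

/* Hypothetical helper: the address (as an integer) that c would have
   to hold for `c->f1_ = 0` to write onto the storage of c itself. */
static uintptr_t required_target(PB **pc) {
  return (uintptr_t)pc - offsetof(PB, f1_);
}

/* Hypothetical helper: would a PB placed at required_target(pc) share
   bytes with the pointer object *pc? The PB spans
   [target, target + sizeof(PB)); c spans [pc, pc + sizeof(PB *)). */
static bool hypothetical_pb_overlaps_c(PB **pc) {
  uintptr_t target = required_target(pc);
  uintptr_t c_begin = (uintptr_t)pc;
  uintptr_t c_end = c_begin + sizeof(PB *);
  return target < c_end && c_begin < target + sizeof(PB);
}
```

On x86-64 the offset of f1_ is 8, matching the `movl $0, 8(%rax)` store in both listings, so the required target is exactly &c - 8, and a PB there would overlap the distinct object c.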

> 1. Same question as David: why is &c - 8 invalid? Is it related to the
> following statement in the C99 standard?
>
> 6.5.3.2 (footnote to paragraph 4):
> "Among the invalid values for dereferencing a pointer by the unary *
> operator are a null pointer, an address inappropriately aligned for the
> type of object pointed to, and the address of an object after the end of
> its lifetime."
>
> 2. We are now trying to keep the first load and remove the others, because
> our test pattern cannot drop "-fno-strict-aliasing" and the additional
> loads hurt performance. We made some changes in SROA::runOnAlloca to do
> something like this:
>
> void qux(PB* _c) {
>   PB* c;      /* inserted into the original code */
>   bar(&_c);
>
>   c = _c;     /* inserted into the original code */
>   c->f1_ = 0;
>   c->f2_ = 0.f;
> }

Not sure I quite understand the question. If we believe GCC's
interpretation to be incorrect and Clang's to be correct, there's nothing our
optimizers can do to correct the code and free us up to do the optimization
GCC does here.

If you're talking about modifying your code to allow Clang to optimize it
better: yes, it seems like copying the pointer:

void qux(PB* _c) {
  bar(&_c);
  PB *c = _c;   /* local copy whose address never escapes */
  c->f1_ = 0;
  c->f2_ = 0.f;
}

should allow the right optimization, because there's no way that 'bar' could
give _c an address that would alias 'c' in any way.

I'm not sure what you're referring to when you mention making changes to
SROA.

- Dave

> I *think the argument* goes that this is a 16-byte object (on x86-64), so if
> you *could* put something of type PB at c-8, you'd illegally overlap with the
> object at c.
>
> Thus, there can't be an object of type PB at c-8.
>
> (i.e., any valid object must be sizeof(PB) away in either direction, which
> means it's not possible for c->f1_ to clobber c no matter what bar does)

Ah, I'm not sure just how loose -fno-strict-aliasing is; I figured it would
allow overlapping objects, etc. If it allows treating memory as both an int
and a float, I wouldn't have guessed it would disallow accessing part of it as
one type and part as another, so long as alignment was preserved. That seemed
to me like the point, but I really don't know much about it.

We are making experimental changes to SROA.

CY

I suspect you should just go ask #1 on the GCC mailing list and see what the answer is.
We are basically trying to figure out their reasoning, but we should instead just go ask what it is :)

Agreed, and I did :)

Please refer to this mailing-list thread: https://gcc.gnu.org/ml/gcc/2016-03/msg00179.html

Reply from Michael:

&x points to the start of object x, and &x - something (something != 0)
points outside object x. 'c' was a complete object, so &c-8 points
outside any object, hence the formation of that pointer is already
invalid (as is its dereference).

https://gcc.gnu.org/ml/gcc/2016-03/msg00185.html
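Michael's argument rests on the pointer-arithmetic rule in C99 6.5.6 (paragraph 7 treats a non-array object as an array of length one; paragraph 8 makes the result undefined unless it points into that object or one past its end). The sketch below models that rule for byte offsets into an object of a given size; the function names are hypothetical, and this is a model of the standard's rule, not an API that any compiler exposes. Note the asymmetry it encodes: one past the end may be formed (though not dereferenced), while any negative offset such as -8 may not be formed at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of C99 6.5.6p7-8 for (char *) arithmetic on an
   object of obj_size bytes: the result may point into the object or
   one past its end, so valid offsets are 0..obj_size inclusive. */
static bool pointer_offset_is_valid(size_t obj_size, ptrdiff_t offset) {
  return offset >= 0 && (size_t)offset <= obj_size;
}

/* Hypothetical model: only offsets strictly inside the object may be
   dereferenced; the one-past-the-end position may be formed, not read. */
static bool pointer_offset_is_dereferenceable(size_t obj_size, ptrdiff_t offset) {
  return offset >= 0 && (size_t)offset < obj_size;
}
```

Applied to the pointer object c (sizeof(PB *) bytes), offset -8 is rejected outright, which is the sense in which "the formation of that pointer is already invalid".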

The rationale given does not seem to square (IMHO) with the ubiquitous practice of having 0- or 1-length array at the end of a struct and then allocating additional elements for it using malloc, or the so-called “struct hack”:

http://c-faq.com/struct/structhack.html

For example:

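#include <stdlib.h>

enum inst_type { INST_NOP, INST_ADD };  /* hypothetical enumerators */
struct operand { int reg; };            /* hypothetical operand layout */

typedef struct {
  enum inst_type type;
  unsigned num_ops;
  struct operand ops[1];  /* really num_ops elements */
} inst;

// allocate an instruction with the specified number of operands (>= 1)
inst *allocate_inst(unsigned num_operands) {
  char *mem = malloc(sizeof(inst) + sizeof(struct operand) * (num_operands - 1));
  return (inst *) mem;
}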
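For comparison, C99 added flexible array members as the standard-sanctioned form of the struct hack above. A minimal sketch, assuming hypothetical definitions for `enum inst_type` and `struct operand` (the post leaves them unspecified) and a hypothetical `allocate_inst_fam` helper: because `ops` has no declared bound, accesses up to `ops[num_ops - 1]` stay inside the allocated object rather than running past a one-element array.

```c
#include <assert.h>
#include <stdlib.h>

enum inst_type { INST_NOP, INST_ADD };  /* hypothetical enumerators */
struct operand { int reg; };            /* hypothetical operand layout */

/* C99 flexible array member: ops takes whatever space is allocated
   after the fixed part of the struct. */
typedef struct {
  enum inst_type type;
  unsigned num_ops;
  struct operand ops[];
} inst_fam;

/* Hypothetical helper: allocate an instruction with num_operands
   zero-initialized operands; returns NULL if malloc fails. */
static inst_fam *allocate_inst_fam(enum inst_type type, unsigned num_operands) {
  inst_fam *in = malloc(sizeof(inst_fam) + sizeof(struct operand) * num_operands);
  if (in == NULL)
    return NULL;
  in->type = type;
  in->num_ops = num_operands;
  for (unsigned i = 0; i < num_operands; i++)
    in->ops[i].reg = 0;
  return in;
}
```

Unlike the `ops[1]` version, this one needs no `num_operands - 1` adjustment, and zero operands is a valid request.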

Or maybe the reasoning is that computing a pointer off the beginning of something (e.g. &c - X) is somehow worse than computing a pointer off the end of something (e.g. &c + X)?

Than

GCC doesn't break this, AFAIK.

Or at least, the last time I broke it, I had to make it not break it :)