Redundant load in llvm's codegen compared to gcc when accessing escaped pointer?

Chuang-Yu_Cheng · March 15, 2016, 2:58pm

Hi,

Please look at this C code:

typedef struct PB {
  void* data; /* required. */
  int f1_;
  float f2_;
} PB;

PB** bar(PB** t);

void qux(PB* c) {
  bar(&c); /* c is escaped because of bar */
  c->f1_ = 0;
  c->f2_ = 0.f;
}

// gcc-5.2.1 with -fno-strict-aliasing -O2 on x86
call bar
movq 8(%rsp), %rax
movl $0, 8(%rax)
movl $0x00000000, 12(%rax)

// llvm 3.9.0 with -fno-strict-aliasing -O2 on x86
callq bar
movq (%rsp), %rax
movl $0, 8(%rax)
movq (%rsp), %rax
movl $0, 12(%rax)

You can see that llvm loads “c” twice, but gcc loads “c” only once.

Of course, in bar function, you may do something very dangerous, e.g.

PB** bar(PB** t) {
  *t = (PB*) t;
  return t;
}

But gcc doesn’t care about bar’s definition.

Is llvm too conservative, or gcc too aggressive in this pattern?

Thanks for your help.

CY

Chris_Lattner · March 17, 2016, 11:35pm

In my opinion, in the face of -fno-strict-aliasing, GCC is being too aggressive. It would be interesting to hear what they think.

-Chris

Markus_Trippelsdorf · March 18, 2016, 8:28am

We discussed this issue briefly on the #gcc IRC channel.
Richard Biener pointed out that bar cannot make c point to &c - 8,
because computing that pointer would be invalid. So c->f1_ cannot
clobber c itself.
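
To make the arithmetic concrete, here is a small sketch (not from the thread). It assumes the x86-64 layout implied by the assembly above, where f1_ sits at offset 8 of PB, and a flat address space in which pointers convert to uintptr_t in the obvious way. The helper names address_needed_to_clobber and store_would_hit_slot are hypothetical. Integer arithmetic is used on purpose, so that no pointer outside an object is ever formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct PB {
  void* data;
  int f1_;
  float f2_;
} PB;

/* Hypothetical helper: the address (as an integer) that c would have to hold
   for the store to c->f1_ to land on the variable c itself, i.e. the
   "&c - 8" of the discussion when offsetof(PB, f1_) is 8. */
uintptr_t address_needed_to_clobber(PB** slot) {
  assert(slot != NULL);
  return (uintptr_t)slot - offsetof(PB, f1_);
}

/* Hypothetical helper: returns 1 if a store to ((PB*)p)->f1_ would start
   exactly at the address of the pointer variable *slot, 0 otherwise. */
int store_would_hit_slot(uintptr_t p, PB** slot) {
  assert(slot != NULL);
  return p + offsetof(PB, f1_) == (uintptr_t)slot;
}
```

For a pointer variable c, store_would_hit_slot(address_needed_to_clobber(&c), &c) is 1 by construction, while any pointer to a real PB object distinct from c gives 0. The GCC argument is that bar cannot validly produce the first kind of value.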

dblaikie · March 18, 2016, 3:24pm

Why would computing that pointer be invalid?

(I could imagine, if there was no object behind c to point to it would be invalid - but that’s a dynamic property of the program that the compiler, given this code, can’t prove /isn’t true/ (the programmer might’ve constructed the caller such that it does always have an object behind ‘c’ to point to))

Chuang-Yu_Cheng · March 18, 2016, 3:33pm

Same question as David: why is &c - 8 invalid? Is it related to the statement below from the C99 standard?

6.5.3.2 (footnote):
“Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an
address inappropriately aligned for the type of object pointed to, and the address of an object after the
end of its lifetime.”

We are now trying to preserve the first load and remove the other loads, because our test pattern cannot drop “-fno-strict-aliasing”, and the additional loads hurt performance. We made some changes in SROA::runOnAlloca, trying to do something like this:

void qux(PB* _c) {
  PB* c; <= insert this for original code
  bar(&_c);

  c = _c; <= insert this for original code
  c->f1_ = 0;
  c->f2_ = 0.f;
}

If you have any opinions, please let us know.

Thanks!

CY

Daniel_Berlin1 · March 18, 2016, 3:46pm

I think the argument goes that this is a 20 or 24 byte object, so if you could put something of type PB at c-8, you’d illegally overlap with the object at c.

Thus, there can’t be an object of type PB at c-8.

(IE any valid object must be sizeof(PB) away in either direction, which means it’s not possible for c->f1_ to clobber c no matter what bar does)

dblaikie · March 18, 2016, 3:47pm

1. Same question as David, why &c - 8 is invalid? Is it related to below
statements In C99 standard?

6.5.3.2 (footnote):
"Among the invalid values for dereferencing a pointer by the unary *
operator are a null pointer, an
address inappropriately aligned for the type of object pointed to, and the
address of an object after the
end of its lifetime."

2. We are trying to preserve 1st load and remove other loads now, because
our test pattern can not get rid of "-fno-strict-aliasing", and additional
loads hurt performance. We did some change in SROA::runOnAlloca, we try to
do something like this:

void qux(PB* _c) {
  PB* c; <= insert this for original code
  bar(&_c);

  c = _c; <= insert this for original code
  c->f1_ = 0;
  c->f2_ = 0.f;
}

Not sure I quite understand the question - if we believe GCC's
interpretation to be incorrect/Clang's to be correct, there's nothing our
optimizers can do to correct the code & free us up to do the optimization
GCC does here.

If you're talking about modifying your code to allow Clang to optimize it
better - yes, it seems like if you copy the pointer:

void qux(PB* _c) {
  bar(&_c);
  PB *c = _c;
  c->f1_ = 0;
  c->f2_ = 0.f;
}

Should do the right optimization, because there's no way that 'bar' could
give _c an address that would alias 'c' in any way.
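
A self-contained sketch of this copy-the-pointer pattern, for anyone who wants to compile it and inspect the output. The body of bar and the global named replacement are hypothetical stand-ins (the real bar in this thread is opaque); they only exist so the example links and can be checked.

```c
#include <assert.h>
#include <stddef.h>

typedef struct PB {
  void* data;
  int f1_;
  float f2_;
} PB;

/* Hypothetical target object, used only so that bar has something to
   retarget the caller's pointer to. */
PB replacement;

/* Hypothetical stand-in for the opaque bar: it makes the caller's pointer
   refer to a different object and returns its argument. */
PB** bar(PB** t) {
  assert(t != NULL);
  *t = &replacement;
  return t;
}

void qux(PB* _c) {
  bar(&_c);    /* only _c has its address taken, so only _c escapes */
  PB* c = _c;  /* one load after the call */
  c->f1_ = 0;  /* c's address is never taken, so these stores */
  c->f2_ = 0.f; /* cannot modify c and it can stay in a register */
}
```

Whether a given compiler version then emits a single load is something to confirm by looking at the generated assembly; the sketch only shows why the second load is no longer needed for correctness.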

I'm not sure what you're referring to when you mention making changes to
SROA.

- Dave

dblaikie · March 18, 2016, 3:50pm

I think the argument goes that this is a 20 or 24 byte object, so if you
could put something of type PB at c-8, you'd illegally overlap with the
object at c.

Thus, there can't be an object of type PB at c-8.

(IE any valid object must be sizeof(PB) away in either direction, which
means it's not possible for c->f1_ to clobber c no matter what bar does)

Ah, I'm not sure just how loose no-strict-aliasing is, I figured that would
allow overlapping objects, etc? (if it allows treating memory as both an
int and a float, etc, I wouldn't've guessed it would disallow accessing
part of it as one, part as another, etc - so long as alignment was
preserved) seemed to me like that was the point, but, yeah, I really don't
know much about it.

Chuang-Yu_Cheng · March 18, 2016, 4:29pm

We are making experimental changes to SROA.

CY

Daniel_Berlin1 · March 18, 2016, 5:25pm

I suspect you should just go ask #1 on the gcc mailing list and see what the answer is.
We are basically trying to figure out their reasoning, but we should instead just go ask what it is.

Chuang-Yu_Cheng · March 19, 2016, 2:37am

Agree, and I did : )

Please refer to this mailing list: https://gcc.gnu.org/ml/gcc/2016-03/msg00179.html

Chuang-Yu_Cheng · March 22, 2016, 3:41pm

Reply from Michael:

&x points to the start of object x, and &x - something (something != 0)
points outside object x. 'c' was a complete object, so &c-8 points
outside any object, hence the formation of that pointer is already
invalid (as is its dereference).

https://gcc.gnu.org/ml/gcc/2016-03/msg00185.html
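
As an illustration of the rule being cited (a sketch, not code from the GCC list): for pointer arithmetic, standard C treats a single object like an array of length one, so the pointer one past the object may be formed but not dereferenced, while a pointer before the object may not even be formed. The function name slots_to_one_past_end is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct PB {
  void* data;
  int f1_;
  float f2_;
} PB;

/* Hypothetical helper: distance, in PB* slots, from a pointer variable to
   its one-past-the-end pointer. */
ptrdiff_t slots_to_one_past_end(PB** slot) {
  assert(slot != NULL);
  PB** end = slot + 1; /* valid to form, but must not be dereferenced */
  /* PB** before = slot - 1;   undefined behavior: points before the object */
  return end - slot;
}
```

The commented-out line is the analogue of "&c - something" in the reply above; it is left as a comment because executing it would itself be undefined.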

Than_McIntosh · March 23, 2016, 6:20pm

The rationale given does not seem to square (IMHO) with the ubiquitous practice of having 0- or 1-length array at the end of a struct and then allocating additional elements for it using malloc, or the so-called “struct hack”:

http://c-faq.com/struct/structhack.html

For example:

typedef struct {
  enum inst_type type;
  unsigned num_ops;
  struct operand ops[1];
} inst;

// allocate an instruction with specified number of operands
inst *allocate_inst(unsigned num_operands) {
  char *mem = malloc(sizeof(inst) + sizeof(struct operand) * (num_operands-1));
  return (inst *) mem;
}

Or maybe the reasoning is that computing a pointer off the beginning of something (e.g. &c - X) is somehow worse than computing a pointer off the end of something (e.g. &c + X)?
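
For comparison, here is a sketch of the same idea written with a C99 flexible array member, which is the form of the struct hack that the standard explicitly supports. The enum values, the contents of struct operand, and the accessor inst_operand are hypothetical placeholders, since the example above does not define them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical placeholder types. */
enum inst_type { INST_NOP, INST_ADD };
struct operand { int reg; };

typedef struct {
  enum inst_type type;
  unsigned num_ops;
  struct operand ops[]; /* C99 flexible array member */
} inst;

/* Allocate an instruction with room for num_operands operands.
   Returns NULL on allocation failure; release with free(). */
inst* allocate_inst(unsigned num_operands) {
  inst* i = malloc(sizeof(inst) + sizeof(struct operand) * num_operands);
  if (i == NULL) return NULL;
  i->type = INST_NOP;
  i->num_ops = num_operands;
  return i;
}

/* Hypothetical bounds-checked accessor for the trailing operands. */
struct operand* inst_operand(inst* i, unsigned idx) {
  assert(i != NULL && idx < i->num_ops);
  return &i->ops[idx];
}
```

Here the trailing operands are part of the single allocated object, so indexing into them stays inside that object rather than walking past a declared one-element array.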

Than

Daniel_Berlin1 · March 23, 2016, 6:25pm

GCC doesn't break this, AFAIK.

Or at least, the last time i broke it, i had to make it not break it
