ASAN reporting heap overrun when doing a partial store to extended vector

Does the following code have undefined behavior?

$ cat test.c

typedef attribute((ext_vector_type(32))) unsigned short vec32;
typedef attribute((ext_vector_type(16))) unsigned short vec16;

void writeVec(vec32 *data) {
vec16 value = 0xffff;
data->lo = value;
}

void foo1() {
vec32 *p = (vec32 *)malloc(sizeof(unsigned short) * 16);
writeVec(p);
}

The code above causes ASAN to report a heap overrun because clang creates a vector with 32 elements using vector shuffling and writes it back via the pointer passed to writeVec.

In case it wasn’t clear, malloc is allocating memory that is large enough to contain only the first 16 elements of a vec32. writeVec loads ‘data’ as a vec32, writes ‘value’ to the first 16 elements of the vector, and stores the vec32 vector via pointer ‘data’.

It’s not clear to me whether this is an error in the source code or IRGen.

Unfortunately, the OpenCL standard doesn’t really state anything explicitly here.

clang does in fact lower the construct in question to a load+shuffle+store, so it’s likely to fail in practice.

-Eli

It’s possible to create a similar example using structs:

$ cat test.c

struct S0 {
  int a, b;
};

typedef struct S0 S0;

void foo1(void) {
  // allocate only 4 bytes.
  void *p = malloc(sizeof(int));
  S0 *p2 = (S0*)p;
  p2->a = 1;
}

If the code above is correct, I suppose the ext_vector code is correct too.

Your example is arguably UB, but really, the interesting question is whether we’re allowed to introduce a data race on the unmodified elements. For the struct case, we aren’t. clang intentionally diverges from that for vector components.

One reason clang assumes it’s allowed to write to all the vector elements is that the lowering is a bit dubious otherwise. For example, we don’t want to be forced to lower “vec->even = 1” on a char16 vector to 8 store instructions.

-Eli

Your example is arguably UB, but really, the interesting question is whether we’re allowed to introduce a data race on the unmodified elements. For the struct case, we aren’t. clang intentionally diverges from that for vector components.

One reason clang assumes it’s allowed to write to all the vector elements is that the lowering is a bit dubious otherwise. For example, we don’t want to be forced to lower “vec->even = 1” on a char16 vector to 8 store instructions.

I guess you can still write to ‘vec->lo' or ‘vec->hi' without touching the unmodified elements using a single store? Although that complicates the rules since we are guaranteeing to the users ‘vec->lo/hi’ won’t touch the unmodified elements while ‘vec->even’ or ‘vec->xz’ can.