Copy to/from volatile structure

We have a bunch of embedded C source manipulating structures of bitfields. For example

typedef struct {
  int a : 4;
  int b : 8;
  int c : 6;
} Obj;

The objects themselves are volatile, being accessed across concurrent threads:

void Frob (Obj volatile *f, int a, int b) {
  f->a = a, f->b = b;
}

This behaves as poorly as one might expect volatile bitfields to behave. A read-insert-write for F::a and then a read-insert-write for F::b.

One way to alleviate this is to copy to a temporary. Update the temp and then copy back. Fortunately the Frob functions appear to affect (nearly?) all the storage units of Obj, so we don’t end up with (many?) unnecessary reads/writes of unaffected elements of Obj.

void Frob (Obj volatile *obj, int a, int b) {
   Obj tmp = *obj;
   tmp.a = a, tmp.b = b;
   *obj = tmp;
}

this is all fine and dandy, except llvm represents the structure copies using llvm.memcpy with isvolatile argument as true.

declare void @llvm.memcpy.p0.p0.i32(ptr <dest>, ptr <src>,
                                    i32 <len>, i1 <isvolatile>)

(LLVM Language Reference Manual — LLVM 18.0.0git documentation)
That memcpy is later lowered to actual loads and stores, all marked volatile as we’ve lost information about whether only the src or the dst is volatile.

That’s bad because we end up with a temporary taking up a stack slot and have explicit reads & writes to it – both from the memcpy and for the intervening bitfield manipulation code.

If we directly express the copy in the source using (alias-safe) casts, we get much better code generated:

  1. a volatile read of *obj to a register
  2. a bunch of bit manipulation in registers
  3. a volatile write to *obj at the end

This is hard to get right though. One has to notice all the copies at the source level, understand the aliasing rules and insert the appropriate casts and copies. (Remember, this is C not C++. No templates, no overloading.)

It would be better to augment llvm.memcpy intrinsics to express the independent volatility of src and dst.

A previous discussion, [RFC] volatile mem* builtins, discusses adding clang builtins to express volatile memcpy builtins, and touches on some of the issues raised here (like the under-specification of what a ‘volatile operation’ is).

AFACIT there are essentially few ways of making such an augmentation:

  1. a newly named intrinsic.

  2. add a pair of parameters to express the separate src & dst volatilities, leaving the isvolatile parm as the inclusive or.

  3. change the isvolatile parameter from i1 to (say) i2 and express 2 bits of volatility. This would have the property that a non-zero isvolatile would express the same meaning as currently (but isvolatile wouldn’t really be a suitable name any more.

Any thoughts on this or which approach is likely to be more successful? (success == get upstream)

(The full set of intrinsics this applies to are {,inline.}mem{cpy,move} I think.)

Volatile is supposed to mean "everywhere the pointer is dereferenced in right-hand-context the location is loaded again, and every time it is dereferenced in left-hand-context a store is performed.

But I agree with the notion that the volatility of to and the volatility of the from should be independent.

void Frob (Obj volatile *f, int a, int b) {
f->a = a, f->b = b;
}
should translate into::

LD Rt,[Rf+a]
INS Rf,Ra,<offset(a),size(a)>
ST Rt,[Rf+a]
/// comment for later–consider a context switch occurring right here
LD Rt,[Rf+b]
INS Rf,Rb,<offset(b),size(b)>
ST Rt,[Rf+b]

If an interested 3rd party also has access to where f is pointing and observes the object
where the comment above was inserted, he should see that field a has been updated but
not field b.

You can change the code to use it the way volatile is supposed to be used::

void Frob (Obj volatile *f, int a, int b) {
Obj g = f, m = {0,0,~0};
g->a = a, g->b = b; g->c = 0;
f = g | (m & f);
}

Here, an interested 3rd party will not see intermediate state only initial and final.

Use volatile sparingly.

Yes, that’s exactly my point – that doesn’t work because to copy of *f into g (I presume you meant to dereference the pointer), because llvm’s representation causes that store of ‘g’ to be marked volatile. I’m well aware of the semantics of volatile.

a) yes I did mean to use *f instead of f sorry…

b) if Obj g = f; causes g to become volatile then something is fishy wrt volatile; g is not volatile as typed.
perhaps::

void Frob (Obj volatile *f, int a, int b) {
Obj g = {a,b,f->c};
*f = g;
}

This should get 1 read of *f and 1 write to *f. But g should not get the volatle attribute(type) because it is not typed to be volatile. Perhaps even as simple as::

void Frob (Obj volatile *f, int a, int b) {
*f = {a, b, f->c};
}

I suspect the lowest-friction way forward is to do #1, and deprecate the old spellings. This minimizes the chance that downstream frontends could mis-compile due to a change in that parameter’s semantics, and gives a nice “hey, you should think about this” intermission to the upgrade path.

+1 for overall support

Note, that there was already patch that implemented separate dest/src volatile flags but it sadly got abandoned: :gear: D386 Specify the access behaviour of the memcpy, memmove and memset intrinsics (llvm.org)

If we’re going to make a change here (which has high churn in terms of test impact), I think we should consider splitting the volatile and non-volatile memory intrinsics entirely. The problem with the status quo is that the attributes we declare on the intrinsics (things like argmemonly) are invalid if the volatile flag is set. This means that code treating these intrinsics generically does not respect the volatile qualifier. And code that handles memory intrinsics in particular pretty much always starts with an isVolatile() bailout, so we don’t get much out of sharing here.

I did consider that, but it’s a bigger change so went the simpler route – I’m fine going that route with sufficient buy-in.

I didn’t know that about the attributes.

If we go this route, some of these intrinsics don’t really seem good to have volatile variants (i.e. memmove --but of course maybe a backward compatibility issue?)

… though of course, with better knowledge of src vs dst volatility, existing optimizations can do more things (like memcpy sequence reductions)