Copy to/from volatile structure

We have a bunch of embedded C source manipulating structures of bitfields. For example

typedef struct {
  int a : 4;
  int b : 8;
  int c : 6;
} Obj;

The objects themselves are volatile, being accessed across concurrent threads:

void Frob (Obj volatile *f, int a, int b) {
  f->a = a, f->b = b;
}

This behaves as poorly as one might expect volatile bitfields to behave. A read-insert-write for F::a and then a read-insert-write for F::b.

One way to alleviate this is to copy to a temporary. Update the temp and then copy back. Fortunately the Frob functions appear to affect (nearly?) all the storage units of Obj, so we don’t end up with (many?) unnecessary reads/writes of unaffected elements of Obj.

void Frob (Obj volatile *obj, int a, int b) {
   Obj tmp = *obj;
   tmp.a = a, tmp.b = b;
   *obj = tmp;
}

this is all fine and dandy, except llvm represents the structure copies using llvm.memcpy with isvolatile argument as true.

declare void @llvm.memcpy.p0.p0.i32(ptr <dest>, ptr <src>,
                                    i32 <len>, i1 <isvolatile>)

(LLVM Language Reference Manual — LLVM 18.0.0git documentation)
That memcpy is later lowered to actual loads and stores, all marked volatile as we’ve lost information about whether only the src or the dst is volatile.

That’s bad because we end up with a temporary taking up a stack slot and have explicit reads & writes to it – both from the memcpy and for the intervening bitfield manipulation code.

If we directly express the copy in the source using (alias-safe) casts, we get much better code generated:

  1. a volatile read of *obj to a register
  2. a bunch of bit manipulation in registers
  3. a volatile write to *obj at the end

This is hard to get right though. One has to notice all the copies at the source level, understand the aliasing rules and insert the appropriate casts and copies. (Remember, this is C not C++. No templates, no overloading.)

It would be better to augment llvm.memcpy intrinsics to express the independent volatility of src and dst.

A previous discussion, [RFC] volatile mem* builtins, discusses adding clang builtins to express volatile memcpy builtins, and touches on some of the issues raised here (like the under-specification of what a ‘volatile operation’ is).

AFACIT there are essentially few ways of making such an augmentation:

  1. a newly named intrinsic.

  2. add a pair of parameters to express the separate src & dst volatilities, leaving the isvolatile parm as the inclusive or.

  3. change the isvolatile parameter from i1 to (say) i2 and express 2 bits of volatility. This would have the property that a non-zero isvolatile would express the same meaning as currently (but isvolatile wouldn’t really be a suitable name any more.

Any thoughts on this or which approach is likely to be more successful? (success == get upstream)

(The full set of intrinsics this applies to are {,inline.}mem{cpy,move} I think.)

Volatile is supposed to mean "everywhere the pointer is dereferenced in right-hand-context the location is loaded again, and every time it is dereferenced in left-hand-context a store is performed.

But I agree with the notion that the volatility of to and the volatility of the from should be independent.

void Frob (Obj volatile *f, int a, int b) {
f->a = a, f->b = b;
}
should translate into::

LD Rt,[Rf+a]
INS Rf,Ra,<offset(a),size(a)>
ST Rt,[Rf+a]
/// comment for later–consider a context switch occurring right here
LD Rt,[Rf+b]
INS Rf,Rb,<offset(b),size(b)>
ST Rt,[Rf+b]

If an interested 3rd party also has access to where f is pointing and observes the object
where the comment above was inserted, he should see that field a has been updated but
not field b.

You can change the code to use it the way volatile is supposed to be used::

void Frob (Obj volatile *f, int a, int b) {
Obj g = f, m = {0,0,~0};
g->a = a, g->b = b; g->c = 0;
f = g | (m & f);
}

Here, an interested 3rd party will not see intermediate state only initial and final.

Use volatile sparingly.

Yes, that’s exactly my point – that doesn’t work because to copy of *f into g (I presume you meant to dereference the pointer), because llvm’s representation causes that store of ‘g’ to be marked volatile. I’m well aware of the semantics of volatile.

a) yes I did mean to use *f instead of f sorry…

b) if Obj g = f; causes g to become volatile then something is fishy wrt volatile; g is not volatile as typed.
perhaps::

void Frob (Obj volatile *f, int a, int b) {
Obj g = {a,b,f->c};
*f = g;
}

This should get 1 read of *f and 1 write to *f. But g should not get the volatle attribute(type) because it is not typed to be volatile. Perhaps even as simple as::

void Frob (Obj volatile *f, int a, int b) {
*f = {a, b, f->c};
}

I suspect the lowest-friction way forward is to do #1, and deprecate the old spellings. This minimizes the chance that downstream frontends could mis-compile due to a change in that parameter’s semantics, and gives a nice “hey, you should think about this” intermission to the upgrade path.

+1 for overall support

Note, that there was already patch that implemented separate dest/src volatile flags but it sadly got abandoned: :gear: D386 Specify the access behaviour of the memcpy, memmove and memset intrinsics (llvm.org)

If we’re going to make a change here (which has high churn in terms of test impact), I think we should consider splitting the volatile and non-volatile memory intrinsics entirely. The problem with the status quo is that the attributes we declare on the intrinsics (things like argmemonly) are invalid if the volatile flag is set. This means that code treating these intrinsics generically does not respect the volatile qualifier. And code that handles memory intrinsics in particular pretty much always starts with an isVolatile() bailout, so we don’t get much out of sharing here.

I did consider that, but it’s a bigger change so went the simpler route – I’m fine going that route with sufficient buy-in.

I didn’t know that about the attributes.

If we go this route, some of these intrinsics don’t really seem good to have volatile variants (i.e. memmove --but of course maybe a backward compatibility issue?)

… though of course, with better knowledge of src vs dst volatility, existing optimizations can do more things (like memcpy sequence reductions)

Instead of splitting the intrinsic into volatile and non-volatile, an alternative is to mark a call to memcpy as argmemonly only if it’s non-volatile (instead of marking the memcpy declaration as argmemonly which means all calls to memcpy are argmemonly).

I’m unsure how much work it would be to split the intrinsic, but it seems like it’d be more work than the above without really being much cleaner.

Interesting idea.

Can you mark the call site that way? I thought those attributes applied to the callee’s declaration. (So this suggestion implies splitting the intrinsics)

But, I think omitting argmemonly from volatile calls is too heavy a hammer – it’d prevent any reordering of memory accesses (and suitable intrinsics) across a volatile call. I did think ‘inaccessiblememorargmemonly’ would work, but that’d permit reordering of a volatile memcpy and a (non aliasing) volatile load/store insn. (that’d be permitted if there was no sequence point between them, but I don’t think we model sequence points suitably in the IR.)

Looking at the IR for a fns that do nonvolatile vs volatile loads I see:

; Function Attrs: mustprogress nofree norecurse nounwind willreturn memory(argmem: readwrite, inaccessiblemem: readwrite) uwtable
define dso_local i32 @ReadIV(ptr noundef %f) local_unnamed_addr #2 {
entry:
  %0 = load volatile i32, ptr %f, align 4, !tbaa !6
  ret i32 %0
}

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read) uwtable
define dso_local i32 @ReadI(ptr nocapture noundef readonly %f) local_unnamed_addr #1 {
entry:
  %0 = load i32, ptr %f, align 4, !tbaa !6
  ret i32 %0
}

The volatile reader, ReadIV has been annotated with memory(argmem: readwrite, inaccessiblemem: readwrite) as opposed to the non-volatile ReadI gets memory(argmem: read). The inaccessiblemem attribute supports my earlier speculation that that was the way to go to mark volatile memcpy, and that we need separate builtins because these are callee attributes.

I’m not sure why the volatile reader gets argmem: readwrite, that seems excessive.