We have a bunch of embedded C source manipulating structures of bitfields. For example:
typedef struct {
    int a : 4;
    int b : 8;
    int c : 6;
} Obj;
The objects themselves are volatile, being accessed across concurrent threads:
void Frob (Obj volatile *f, int a, int b) {
    f->a = a, f->b = b;
}
This behaves as poorly as one might expect volatile bitfields to behave: a read-insert-write for Obj::a and then a separate read-insert-write for Obj::b.
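To make the cost concrete, here is a minimal hand-written sketch of roughly what the two bitfield assignments amount to. It assumes a 32-bit storage unit with a in bits 0-3 and b in bits 4-11; the real layout and access width are implementation-defined, and FrobExpanded and the mask names are made up for illustration.

```c
#include <stdint.h>

/* Assumed layout (implementation-defined in reality): a = bits 0-3, b = bits 4-11. */
#define A_SHIFT 0u
#define A_MASK  (0xFu << A_SHIFT)
#define B_SHIFT 4u
#define B_MASK  (0xFFu << B_SHIFT)

/* Hypothetical expansion of `f->a = a, f->b = b;` on a volatile object. */
void FrobExpanded(uint32_t volatile *word, int a, int b) {
    uint32_t tmp;

    tmp = *word;                                                  /* volatile load 1  */
    tmp = (tmp & ~A_MASK) | (((uint32_t)a << A_SHIFT) & A_MASK);  /* insert a         */
    *word = tmp;                                                  /* volatile store 1 */

    tmp = *word;                                                  /* volatile load 2  */
    tmp = (tmp & ~B_MASK) | (((uint32_t)b << B_SHIFT) & B_MASK);  /* insert b         */
    *word = tmp;                                                  /* volatile store 2 */
}
```

Because every access is volatile, none of the four memory operations can be merged or dropped, and another thread can observe the intermediate state where only a has been updated.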
One way to alleviate this is to copy to a temporary, update the temporary, and then copy back. Fortunately the Frob functions appear to affect (nearly?) all the storage units of Obj, so we don’t end up with (many?) unnecessary reads/writes of unaffected elements of Obj.
void Frob (Obj volatile *obj, int a, int b) {
    Obj tmp = *obj;
    tmp.a = a, tmp.b = b;
    *obj = tmp;
}
This is all fine and dandy, except LLVM represents the structure copies using llvm.memcpy with the isvolatile argument set to true.
declare void @llvm.memcpy.p0.p0.i32(ptr <dest>, ptr <src>,
                                    i32 <len>, i1 <isvolatile>)
(LLVM Language Reference Manual — LLVM 18.0.0git documentation)
That memcpy is later lowered to actual loads and stores, all marked volatile as we’ve lost information about whether only the src or the dst is volatile.
That’s bad because we end up with a temporary taking up a stack slot and have explicit reads & writes to it – both from the memcpy and for the intervening bitfield manipulation code.
If we directly express the copy in the source using (alias-safe) casts, we get much better code generated:
- a volatile read of *obj to a register
- a bunch of bit manipulation in registers
- a volatile write to *obj at the end
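One way the direct form might look is sketched below. It assumes the shared object can be declared as a union of the bitfield struct and a uint32_t (C permits reading a union member other than the one last written), and that Obj occupies exactly one 32-bit word. ObjWord and FrobDirect are hypothetical names, not taken from our code base.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

typedef struct {
    int a : 4;
    int b : 8;
    int c : 6;
} Obj;

/* Hypothetical wrapper: overlays the bitfields on a single 32-bit word. */
typedef union {
    Obj      fields;
    uint32_t word;
} ObjWord;

static_assert(sizeof(Obj) == sizeof(uint32_t),
              "sketch assumes Obj occupies one 32-bit word");

void FrobDirect(ObjWord volatile *obj, int a, int b) {
    ObjWord tmp;               /* ordinary, non-volatile local          */
    tmp.word = obj->word;      /* the only volatile load                */
    tmp.fields.a = a;          /* bitfield inserts on the local copy    */
    tmp.fields.b = b;
    obj->word = tmp.word;      /* the only volatile store               */
}
```

Since tmp is an ordinary local, the compiler is free to keep it in a register, so the only memory traffic that has to remain is the one volatile load and the one volatile store.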
This is hard to get right though. One has to notice all the copies at the source level, understand the aliasing rules and insert the appropriate casts and copies. (Remember, this is C not C++. No templates, no overloading.)
It would be better to augment the llvm.memcpy intrinsics to express the independent volatility of src and dst.
A previous discussion, [RFC] volatile mem* builtins, discusses adding clang builtins to express volatile memcpy builtins, and touches on some of the issues raised here (like the under-specification of what a ‘volatile operation’ is).
AFAICT there are essentially a few ways of making such an augmentation:
- a newly named intrinsic.
- add a pair of parameters to express the separate src & dst volatilities, leaving the isvolatile parm as the inclusive or.
- change the isvolatile parameter from i1 to (say) i2 and express 2 bits of volatility. This would have the property that a non-zero isvolatile would express the same meaning as currently (but isvolatile wouldn’t really be a suitable name any more).
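As a rough illustration of the two-bit option, the encoding could be modelled in C as a pair of flags. This is only a model of the idea: the names and the choice of which bit stands for src or dst are assumptions, not anything LLVM defines.

```c
#include <stdbool.h>

/* Hypothetical 2-bit encoding of copy volatility (bit assignment is assumed). */
enum CopyVolatility {
    COPY_VOL_NONE = 0,
    COPY_VOL_SRC  = 1 << 0,
    COPY_VOL_DST  = 1 << 1,
    COPY_VOL_BOTH = COPY_VOL_SRC | COPY_VOL_DST
};

/* Any non-zero value still means that some side is volatile,
   which is what the existing i1 flag conveys. */
static bool is_volatile(unsigned v)  { return v != 0; }
static bool src_volatile(unsigned v) { return (v & COPY_VOL_SRC) != 0; }
static bool dst_volatile(unsigned v) { return (v & COPY_VOL_DST) != 0; }
```

A pass that only understands the current meaning would keep testing for non-zero, while a lowering that knows about the split could query each side separately.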
Any thoughts on this or which approach is likely to be more successful? (success == get upstream)
(The full set of intrinsics this applies to is {,inline.}mem{cpy,move}, I think.)