Some quibbles: I think the SC-DRF discussion might be a bit of a red herring (this implementation strategy won’t make relaxed accesses provide that guarantee anyway, so code that uses them will still be difficult to analyze). Likewise, for better or worse, the C++ memory model isn’t RC11 (at least not yet): we effectively decided not to require sb ∪ rf (sequenced-before union reads-from) to be acyclic. So I’m not sure it’s right to call transformations that break those rules incorrect.
That said, I’m interested in this as an experiment. I’ve come around to the view that out-of-thin-air (OOTA) results can’t be ruled out without either some runtime cost or heavyweight syntactic constructs, so we might as well get some data on the runtime side of things.
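For readers less familiar with the terminology, here is a minimal sketch of the classic OOTA shape. The helper name `copy_litmus` and the locations `x` and `y` are illustrative assumptions, not part of the proposal under discussion. Each thread does a relaxed load followed by a relaxed store of the value it just read.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: runs the "copy" litmus test once and returns (r1, r2).
std::pair<int, int> copy_litmus() {
    // Both shared locations start at 0.
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    // Thread 1: load x, then store the value it read into y.
    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(r1, std::memory_order_relaxed);
    });

    // Thread 2: load y, then store the value it read into x.
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        x.store(r2, std::memory_order_relaxed);
    });

    t1.join();
    t2.join();

    // Each thread only ever stores a value it loaded, so the registers agree.
    assert(r1 == r2);
    return {r1, r2};
}
```

In practice this returns `{0, 0}`. The problematic execution is one where both loads return some value that nobody computed, say 42. For that to happen, thread 1's load of `x` must read from thread 2's store, which is sequenced after thread 2's load of `y`. That load in turn must read from thread 1's store, which is sequenced after thread 1's load of `x`. This is exactly a cycle in sb ∪ rf. RC11 forbids such cycles outright, while the current C++ wording only says implementations "should" avoid out-of-thin-air values.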
Two other thoughts:
- Historically, hardware vendors have opposed requiring load-to-store ordering. I see you’ve talked to Arm people, but I wonder what the Power people think, and GPU vendors as well.
- I think the main risk here is a language fork. In some ways this is analogous to default zero-initialization.