AtomicExpandPass and branch weighting

I’m working on a change to the layout algorithm, and I noted that test/CodeGen/ARM/cmpxchg-weak.ll was affected.

Normally, that would be fine, but I noted that the layout changed the fallthrough from the success case to the failure case. I was surprised to see that the success case isn’t annotated with a branch weight by AtomicExpandPass.cpp.

Would it make sense to annotate the success case as more likely when we expand the intrinsic to help guarantee that the success case remains the fallthrough? Even a 2:1 or 3:2 weighting would correct the layout issue I noted.
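To make the arithmetic behind the suggested weights concrete, here is a minimal sketch of how a pair of branch weights maps to an edge probability. It is a toy model, not LLVM's BranchProbability or block placement code: the names BranchWeights, successProbability, and prefersSuccessFallthrough are hypothetical, and the "fall through to the likelier successor" rule is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: hypothetical types, not LLVM's API.
struct BranchWeights {
    std::uint32_t success;  // weight on the edge to the success block
    std::uint32_t failure;  // weight on the edge to the failure block
};

// Probability of taking the success edge: weight / (sum of both weights).
double successProbability(BranchWeights w) {
    const std::uint64_t total = std::uint64_t{w.success} + w.failure;
    assert(total > 0 && "at least one weight must be non-zero");
    return static_cast<double>(w.success) / static_cast<double>(total);
}

// Assumed layout rule for illustration: the success block is preferred as
// the fallthrough only when its edge is strictly likelier than the failure
// edge. With equal weights this model expresses no preference.
bool prefersSuccessFallthrough(BranchWeights w) {
    return w.success > w.failure;
}
```

Under this model, 2:1 gives the success edge a probability of 2/3 and 3:2 gives it 3/5, so either weighting is enough to express a preference for the success block.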

Thanks,
Kyle.

From: "Kyle Butt via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Tim Northover" <t.p.northover@gmail.com>
Cc: "LLVM Developers" <llvm-dev@lists.llvm.org>
Sent: Monday, December 12, 2016 11:30:32 AM
Subject: [llvm-dev] AtomicExpandPass and branch weighting

I'm working on a change to the layout algorithm, and I noted that
test/CodeGen/ARM/cmpxchg-weak.ll was affected.

Normally, that would be fine, but I noted that the layout changed the
fallthrough from the success case to the failure case. I was
surprised to see that the success case isn't annotated with a branch
weight by AtomicExpandPass.cpp.

Would it make sense to annotate the success case as more likely when
we expand the intrinsic to help guarantee that the success case
remains the fallthrough?

Certainly makes sense to me.

-Hal

Seems reasonable.

I'd note additionally that on some architectures the success block *must* be the fallthrough case (that is to say, you must not have any taken branches between the load-linked and store-conditional) in order to have an architectural guarantee that two such loops on different CPUs won't livelock against each other.
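For context, the kind of source-level loop that produces a weak cmpxchg looks roughly like the sketch below. The function name fetchAddWithWeakCas is hypothetical, and the comment about lowering to a load-linked/store-conditional pair describes what one would typically expect on an LL/SC target such as ARM, not a guarantee about any particular compiler version.

```cpp
#include <atomic>

// Atomically adds `delta` to `counter` with a weak compare-exchange retry
// loop and returns the value observed just before the successful update.
int fetchAddWithWeakCas(std::atomic<int>& counter, int delta) {
    int expected = counter.load(std::memory_order_relaxed);
    // compare_exchange_weak may fail spuriously. On failure it stores the
    // currently observed value into `expected`, so the loop simply retries.
    // On an LL/SC target the exchange is typically expanded into a
    // load-linked, a compare, and a store-conditional; the success path is
    // the one that leaves the loop.
    while (!counter.compare_exchange_weak(expected, expected + delta,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        // Failure path: nothing to do here, `expected` was refreshed.
    }
    return expected;
}
```

The common case is that the exchange succeeds on the first attempt, which is why the success path is the natural candidate for the fallthrough.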

Do we have a way to *require* that 2 blocks be laid out consecutively? I
don't think that we do. A hint is better than nothing, but not a guarantee.

Not as far as I know. (Of course, we also ought to require that there
are no extraneous loads/stores between the ll/sc. We cannot enforce that
either, especially across basic blocks, even though we need to.)

I mostly mentioned that as a B.T.W. :)