PR 5723

I just filed PR 5723. This is a rather serious bug for us: it causes
all sorts of problems when creating dynamically linked C++ programs,
because the C++ runtime contains many leaf-like routines that use
thread-local storage.
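For illustration (this example is mine, not from the PR; the names are hypothetical), here is a minimal C sketch of the pattern at issue: a function whose source contains no calls, yet which may still make a call at the machine level because of TLS.

```c
/* When this file is built with -fPIC into a shared library, ELF targets
   typically use the general-dynamic TLS model, and each access to
   `counter` compiles to a call to __tls_get_addr; built into a plain
   executable, the compiler may instead use a direct %fs-relative access. */
static _Thread_local int counter = 0;

/* Looks like a leaf: no visible calls, so codegen may treat this as a
   leaf function and, on x86-64, let it use the red zone -- even though
   the TLS access can introduce a hidden call. */
int bump(void)
{
    return ++counter;
}
```

Compiling this with `-fPIC -S` and inspecting the assembly shows whether the hidden `__tls_get_addr` call is present for a given target and TLS model.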

I can imagine a number of hackish workarounds, but I think the right
way to go is to mark routines that contain thread-local storage
accesses as non-leaf. I guess that would have to happen in the
PrologueEpilogueInserter.

Is there an easy way to tell if a MachineFunction uses TLS without doing
a full scan of the body? Perhaps SelectionDAG will have to mark the
function somehow.

                              -Dave

How does a MachineInstr acquire a CallFrameSetupOpcode? I can't find
anywhere in the sources or generated files where that opcode is set.
I believe we need to generate that instruction (along with
CallFrameDestroyOpcode) if it does not exist in the presence of TLS,
at least on X86, where it's implemented via a function call.

                                -Dave

For X86 CALLSEQ_START gets selected to ADJCALLSTACKDOWN or
ADJCALLSTACKDOWN64 in this case. So is CALLSEQ_START expected
to appear only once (at the top of the function)? The comments
are rather confusing. It seems like CALLSEQ_START is supposed
to appear before every call, but surely there's only one stack
adjustment in the final code.

How does this all work? How do I convince codegen to adjust the
stack even though it thinks the routine is a leaf? Where do I need to
intervene to either have it generate the stack adjustment or keep it
from deleting one?

                           -Dave

Hello, David

> For X86 CALLSEQ_START gets selected to ADJCALLSTACKDOWN or
> ADJCALLSTACKDOWN64 in this case. So is CALLSEQ_START expected
> to appear only once (at the top of the function)? The comments
> are rather confusing. It seems like CALLSEQ_START is supposed
> to appear before every call, but surely there's only one stack
> adjustment in the final code.

Right. Every call is surrounded by callseq_{start,end} nodes. The
difference here is that the call to the special symbol is not a real
call: it isn't generated by lowering an ISD::CALL node, so no callseq
nodes are ever generated for it.

You might want to look at the TLS lowering code in
X86ISelLowering.cpp to familiarize yourself with this.

With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

I looked at that, but it's not very helpful. It just creates some generic
instructions that get emitted as a code sequence; it doesn't say anything
about the stack frame.

Where in the backend is the operation to create and destroy the stack frame
either inserted or deleted? That's where I need to make a change.

                                    -Dave

Ah, ok. I think I know what I have to do. I'll put callseq_start/end nodes
around the TLS addressing.

                                    -Dave

Hello, David

> Ah, ok. I think I know what I have to do. I'll put callseq_start/end
> nodes around the TLS addressing.

Don't do that (tm)

I'm testing the fix.

As a workaround for now, just mark all TLS-using functions as
"noredzone".
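As a sketch of the suggested workaround (the IR below is my own illustration with hypothetical names, written in the typed-pointer LLVM IR syntax of that era, not code from the fix), a TLS-using function would carry the noredzone function attribute, which keeps codegen from letting it use the x86-64 red zone:

```llvm
; Hypothetical example: a thread-local variable plus a "leaf-looking"
; accessor marked noredzone, so its locals are never placed in the
; 128 bytes below %rsp that the hidden __tls_get_addr call would clobber.
@counter = thread_local global i32 0

define i32 @bump() noredzone {
entry:
  %v = load i32* @counter        ; in the general-dynamic model this
  %inc = add i32 %v, 1           ; access lowers to a __tls_get_addr call
  store i32 %inc, i32* @counter
  ret i32 %inc
}
```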

>> Ah, ok. I think I know what I have to do. I'll put callseq_start/end
>> nodes around the TLS addressing.
>
> Don't do that (tm)

Why not?

> I'm testing the fix.

Ok, I'll try the "mark noredzone" thing and see if that works.

                                -Dave

Hello, David

>> Don't do that (tm)
>
> Why not?

They will be eliminated without any visible effect.

>> I'm testing the fix.
>
> Ok, I'll try the "mark noredzone" thing and see if that works.

It does; I verified this with the testcase you attached to the PR. The
problem is not stack allocation but the red zone: the implicit call to
__tls_get_addr pushes a return address onto the stack, clobbering the
128 bytes below %rsp that a leaf function is otherwise allowed to use.
As I already said, I'm testing the fix; since it touches some
target-independent code as well, I need to be sure I haven't broken
anything on other targets.

FYI, it worked for our implementation in LLVM 2.5. I haven't tried porting it
to TOT.

                             -Dave

>>> Don't do that (tm)
>>
>> Why not?
>
> They will be eliminated without any visible effect.

Hmm... As I said, I added them in our 2.5 implementation and they didn't
get eliminated. The stack gets properly adjusted and everyone's happy.

Our 2.5 implementation creates the call to __tls_get_addr manually in the
X86ISelLowering code. In TOT, special SDNodes are generated which get
matched by TableGen patterns and emit a fixed sequence of instructions
that includes the call. I'm guessing that's the critical difference.

Is there a reason that route was taken in TOT? Why have an SDNode represent
a bunch of instructions rather than just "properly" creating them?

>>> I'm testing the fix.
>>
>> Ok, I'll try the "mark noredzone" thing and see if that works.
>
> It does; I verified this with the testcase you attached to the PR. The
> problem is not stack allocation but the red zone.

Well, we shouldn't be using the red zone, because any function that uses
TLS is not a leaf (at least on X86). Simply marking the function "noredzone"
seems like a bit of a hack to me. How do we guarantee that some other piece
of code that thinks the function is a leaf won't do something else to screw
things up?

                             -Dave

> Is there a reason that route was taken in TOT? Why have an SDNode represent
> a bunch of instructions rather than just "properly" creating them?

Yes. We need to be sure that the "special" code is glued properly to the
call (mostly because of the post-RA scheduler). And technically this
"bunch of instructions" is a whole, not something plus a call.

> Well, we shouldn't be using the red zone, because any function that uses
> TLS is not a leaf (at least on X86). Simply marking the function "noredzone"
> seems like a bit of a hack to me. How do we guarantee that some other piece
> of code that thinks the function is a leaf won't do something else to screw
> things up?

As I said, this is a temporary workaround until I commit the fix
(just in case you're in a hurry).

> Is there a reason that route was taken in TOT? Why have an SDNode
> represent a bunch of instructions rather than just "properly" creating
> them?

> Yes. We need to be sure that the "special" code is glued properly to the
> call (mostly because of the post-RA scheduler).

Isn't that what the chain/flag is for?

> And technically this "bunch of instructions" is a whole, not something
> plus a call.

Conceptually, yes, but it ends up being multiple instructions. Doing
things like this can REALLY screw up certain kinds of optimizations. I
can't tell you how much pain I went through in school fixing a code
generator so I could properly do software instruction prefetching. It's
just generally a bad idea to use pseudo-ops like this, because the IR
doesn't represent what will actually be executed.

> Well, we shouldn't be using the red zone, because any function that uses
> TLS is not a leaf (at least on X86). Simply marking the function
> "noredzone" seems like a bit of a hack to me. How do we guarantee that
> some other piece of code that thinks the function is a leaf won't do
> something else to screw things up?

As I said, this is a temporary workaround until I commit the fix
(just in case you're in a hurry).

Ah, ok.

                               -Dave