TSAN hack on AArch64 for Android

Folks,

The review of patch http://reviews.llvm.org/D11532 is extremely slow
due to the number of hacks, left-overs and general undesired changes
and style that the submission has. That happens, and it's ok when the
overall direction the patch is going was agreed, and is acceptable as
generally good.

But this is not the case.

To wake up the elephant in the room, do we really think that adding
such a hack to an LLVM project for the sake of saving Android the
burden of carrying such hack is *acceptable*?

The only "reason" given to add such a hack was that, and I quote:
* "what are we to do for NOW when the "proper" fix is maybe months
off, or possibly longer?"
* "This patch will allow people to experiment with TSAN on their
android devices"
* "don't let the perfect be the enemy of "limping along for a bit"

So, in order to let *some* people *experiment* with TSAN on *Android*,
we're going to consciously make TSAN *limp along* for the foreseeable
future? Is that a wise price to pay for such a far fetched goal?

The "proper" solution seems to be to fix TLS support, which:
* "Ideally, this should be supported with the __thread keyword, but
that is not yet supported, and it is a much bigger chunk of work to
get it integrated into the default android cross compiler."

So, to avoid a larger amount of work on another compiler, we're going
to cripple LLVM?

I cannot see why this would *ever* be a wise decision... Please,
someone enlighten me.

cheers,
--renato

+dvyukov

+dvyukov

Folks,

The review of patch http://reviews.llvm.org/D11532 is extremely slow
due to the number of hacks, left-overs and general undesired changes
and style that the submission has. That happens, and it's ok when the
overall direction the patch is going was agreed, and is acceptable as
generally good.

But this is not the case.

To wake up the elephant in the room, do we really think that adding
such a hack to an LLVM project for the sake of saving Android the
burden of carrying such hack is *acceptable*?

Hi Renato,

What exactly hack/hacks do you mean?

Hi Dmitry,

The sanitizer code seems to be more accepting to adding complicated
pre-processor logic, as well as a number of alternative functions to
work around various inefficiencies in other systems, but in other LLVM
projects, such as Clang and the back-ends, we tend to be less
accepting of work-arounds being implemented just for the same of a
test.

This case adds an immense complexity (see the size of the patch), and
it has an immense number of odd assumptions (see the amount of
negative feedback on the patch), that may never have been tested in
other architectures (see some comments about generalising the wrong
stuff). All of that, to be able to "experiment" with TSAN on Android
in AArch64.

See, if TSAN was completely broken in AArch64, or if other people
weren't trying to push it upstream, I'd understand someone throwing a
bunch of hacked files that "works on their boxes", if that's disabled
by default and is only supposed to work when invoked. But none of that
is true.

TSAN works to an extent in AArch64, and we have been pushing some
patches to prove that. With ASAN live on the buildbots, and the
builtins working, we'll now focus on TSAN to get it being tested in
the buildbots, too. That's Linux AArch64, of course, as we don't test
Android at all.

Now, the oddities with Android is that bionic and the rest of the
run-time is crippled in ways that no other system I know gets close.
Layers upon layers of libraries, aliases, asm hackery, just to make it
build. Adding Clang to the list of compilers did not make it better.

This comment baffles me: "Ideally, this should be supported with the
__thread keyword, but that is not yet supported, and it is a much
bigger chunk of work to get it integrated into the default android
cross compiler."

AFAIK, __thread is supported by GCC and Clang, and this seems to be a
problem in Android's linker. So, to work around a problem in Android's
linker, we're adding a huge pile of code to LLVM for a temporary fix
because of a possible experimentation in Android?

The amount of work being done, and still required, to get this patch
upstream in any decent shape or form will take many engineer-months,
probably the same as it would to actually fix the linker in the first
place. Especially if you add up the work that will be required to
*remove* all that after the linker is fixed.

But I'm being over-optimistic here. I'm old enough to know that,
whenever someone introduces a "work-around that works", the
probability of anyone fixing the original problem is zero. And the
Android code that I had the displeasure to see, so far, has shown me
that this is ultimately true on their side. And that's why I'm
baffled, because we're moving that mentality, that work arounds are
ok, no matter how ugly they are, into LLVM, and everyone seems to be
ok with it.

If everyone is ok with it, I'll just shut up. But I'd like to make
sure that everyone is ok to do so.

My tuppence.

--renato

Hi Renato,

So the problem is only with the tls emulation, right?

I don't know what it worth to add __thread support to android, and
whether somebody is going to address it soon.
But I think we can make the tls emulation _much_ cleaner. First, we
need to put all tsan per-thread data into ThreadState object, accesses
to ThreadState are already wrapped into cur_thread() function. So now
we have a single function to modify, no macros spread across the
codebase, no new files, etc. Then, I've looked at pthread_getspecific
code in bionic and it is actually quite fast and does not call any
other system function (except get_thread() which is basically an
access to real tls). So we can use it, and then the whole support
becomes merely:

INLINE ThreadState *cur_thread() {
#ifdef REAL_TLS
  return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
#else
  return reinterpret_cast<ThreadState *>(pthread_getspecific(thread_key));
#endif
}

Which looks quite acceptable to me.

Also we could try to ask android to dedicate a real tls slot for
sanitizers (I don't know what are the chances, but ART obviously get a
one). Then we can change the support to:

INLINE ThreadState *cur_thread() {
#ifdef REAL_TLS
  return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
#else
  return reinterpret_cast<ThreadState *>(__get_tls()[ANDROID_TSAN_TLS_SLOT]);
#endif
}

Which is roughly the same code-wise, but faster and safer.

Am I missing something?

Also we could try to ask android to dedicate a real tls slot for
sanitizers (I don’t know what are the chances, but ART obviously get a
one).

Done: https://android-review.googlesource.com/#/c/167420/

So the problem is only with the tls emulation, right?

No. It's with the whole patch, including its contents, how it was
submitted, what is its purpose and how nobody cared about any of this.

Adding TLS emulation is, in its own, a worthy feature. It allows for
platforms that can't do full TLS to use TSAN, which is a great tool.

But when you want to add a feature to a tool that runs on multiple
architectures, you plan the feature, you design it, you send an RFC
email, you discuss in the list.

Then you go about the implementation, submit some preparation patches,
clean up code, and have some more discussions. For a good example, see
Daniel Sanders' "The trouble with triples" and the associated patches.
It takes a lot of time to conjure a new feature in a complex piece of
software, but more than that, needs consensus and people designing it
beforehand.

But when someone dumps a patch that is nowhere near production
quality, or would even compile in most platforms, people should be
backing up and saying: "Wait a minute. What are you trying to do, and
how did you get here?". We had a similar issue a few months back with
the new Stride Vectorizer. Fantastic feature, one that I had worked
before, but it took us a whole month of "code review" to figure out
that the whole patch was just wrong, and had to be completely
re-written, from where we had stopped two years back.

I'm doing the same now. Let's make sure to agree on what's being
proposed, THEN we'll agree if that's worthy or not, THEN we'll agree
what's the best implementation, THEN we'll review code. We can't start
from the last iteration, as that'll invariably lead to poor code going
in.

Furthermore, you don't get TLS+TSAN+AArch64+Android support in a
single patch! The likelihood that we'll end up reverting it a hundred
times is really high, and that will completely unbalance our buildbot
infrastructure and break everyone else's workflow. Slow is faster than
ludicrous. One step at a time, please.

First, let's get TSAN working on AArch64. Adhemerval is working on
that. Then, let's see what's missing for Android. Ok, TLS emulation is
required, at least for the time being. What other architecture would
benefit from TLS emulation? None? Ok, then we set out to change the
code, so that TLS emulation can go in. Meanwhile, a group of people
discuss (in public) the design of such framework, and in time, a
number of patches will start to go in, one by one, until the feature
is on. Only then we'll get Android using it.

The time will also give buildbots enough iterations to pick any bug
that we don't find on code review. Bugs that would be obscured by a
giant patch with many other more obvious bugs exploding our CI
infrastructure and getting reverted by angry bot maintainers.

I don't know what it worth to add __thread support to android, and
whether somebody is going to address it soon.

Wouldn't it be better to know that before submitting a large change to
another code base? If there is evidence that this is never going to
happen, or that, for some very strong engineering reasons, it's not
possible to do it, then the reasons to do TLS emulation are thoroughly
justified. Right now, it's just a hunch.

But I think we can make the tls emulation _much_ cleaner. First, we
need to put all tsan per-thread data into ThreadState object, accesses
to ThreadState are already wrapped into cur_thread() function. So now
we have a single function to modify, no macros spread across the
codebase, no new files, etc.

I'm no expert in that field, but this sounds like a good plan. Doesn't
seem to be part of this patch, though. Nor it should, as it actually
should have been a preparatory patch, submitted after all interested
parties agreed that TLS emulation is the way to go.

INLINE ThreadState *cur_thread() {
#ifdef REAL_TLS
  return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
#else
  return reinterpret_cast<ThreadState *>(__get_tls()[ANDROID_TSAN_TLS_SLOT]);
#endif
}

Which is roughly the same code-wise, but faster and safer.

Dan "Eastwood" Albert has pulled the trigger. :slight_smile:

This looks reasonable to me. Does this still mean you need the
emulation, or is Android going to give you a "reasonably looking
ThreadState"?

cheers,
--renato

http://reviews.llvm.org/D12001 and related changes?

Joerg

It hasn’t been obvious to me whether people mean real ELF TLS when they talk about __thread support, or just that we need the keyword to be supported somehow. The patch set that Joerg mentioned will do the TLS emulation via pthread_getspecific for us in the compiler/rtlib. If that’s sufficient then we don’t actually need any of that in TSAN, and neither will any other platform that lacks ELF TLS.

This seems like a simple and elegant solution to me. And it also
dismiss the notion that a proper implementation is always slower than
a hack, or that cross-project discussions are always slower than
changing a single project. Thanks Dan/Joerg, that has been extremely
enlightening.

Dmitry / Jason, have you guys tried that?

It may be a bit rough on the edges for now, but if you try to use
that, with the TSAN settings from Adhemerval's patch, you may get what
you want for free. If anything, you'll probably need a tweak or two,
specific to Android, but that's expected.

cheers,
--renato

Wait, this change is not submitted yet, right? Or you mean mailing of
this change in bad shape?

I consider this change as work-in-progress where author is looking for
feedback on his ongoing progress. I guess the change description
should have been spelled this very explicitly.
This change definitely needs more work to be submitted, splitting into
smaller patches with a plan for submission order, refactoring,
cleanup, etc. That is no question.
As for the design, there is not much to design upfront, as you
discover most of the issue only when you hit them (I would not foresee
them). Also, most of the changes in the patch are just spot fixes
(e.g. if a struct definition is different on the new platform, you
just need to provide a correct definition), there is nothing to design
there.

Regarding the tls, whether we call it "emulation" or not, I believe we
can do it a very neat and laconic form.

Refactoring of tsan per-thread structures in preparation for the
"emulation" should go in independently. Agree.
Aarch64 support is already submitted to tsan as far as I see.

Wait, this change is not submitted yet, right? Or you mean mailing of
this change in bad shape?

Right.

Jason has submitted high quality patches before, so this is in no way
a reprimand to his submission. I am specifically worried about the
lack of higher level questions from the reviewers / code owners about
the general idea of the patch, not with the submission itself.

I consider this change as work-in-progress where author is looking for
feedback on his ongoing progress. I guess the change description
should have been spelled this very explicitly.

For that kind of work, we normally follow the guidelines:

1. Send an email to the list explaining the problem / design, have
some high level discussions
2. Send an "[RFC]" patch, explicitly identified as "not-for-commit"
quality, have some specific discussions
3. Send *another* patch, for commit, with loads of tests, comments,
FIXMEs, have final review

People normally ignore most spurious errors from RFC patches, but if
it's not marked as such, you'll get a lot of heat.

Also, it is good practice to have a few steps *before* sending a big
final patch:

1. Compile your code in different ways (debug/release/asserts) on at
least x86_64/ARM/AArch64 and run check-all with at least
LLVM+Clang+"your-component".
2. If you're targeting a different platform, make sure you run the
tests on them, too. (native/cross, doesn't matter much)
3. Make sure you add enough tests, or describe extensively what tests
you have made to be sure the patch is good
4. Review your diff cautiously, to make sure no left-overs remain:
  * no commented out code
  * no spurious whitespace changes
  * no wrong comments, or fixed FIXME's left

This change definitely needs more work to be submitted, splitting into
smaller patches with a plan for submission order, refactoring,
cleanup, etc. That is no question.

More importantly, this patch may be moot, because Joerg's work already
does that, and Dan's patch to Android makes the remaining step a
5-line patch.

*That* is the question. How comfortable are we to add work-arounds
before even questioning if the proper solution is possible /
available.

The submission is explicit: "This is a work-around, a better
implementation would be X". It is our job to evaluate the real cost of
X and see if it's worth putting a hack.

It seems the cost is low, or paid already, so the patch should have
been refused before any real review was made.

As for the design, there is not much to design upfront, as you
discover most of the issue only when you hit them (I would not foresee
them).

TLS emulation vs proper implementation? Emulation in TSAN or RT or
some other lib?

TSAN is hardly the place for run-time library calls, RT is a much
better place. Work arounds in LLVM is hardly a good replacement for a
proper implementation in the Android linker.

All those are design decisions. In the end, RT has already a TLS
emulation coming about and Android just got a TLS slot for TSAN. Good
design decisions stemmed from design discussions in the list.

Refactoring of tsan per-thread structures in preparation for the
"emulation" should go in independently. Agree.
Aarch64 support is already submitted to tsan as far as I see.

Not yet. This is still work in progress.

Adhemerval has implemented the functionality and we have tested on our
hardware. But there are issues with other hardware we don't have
access to, so progress has slowed down considerably. We're working on
some AArch64 buildbots to speed that up a lot, but as usual,
unforeseen complications hunt our sleep. :slight_smile:

Despite the experimental nature of our TSAN and Joerg's TLS emulation
implementations, I think it's enough to give Jason a working base to
start from scratch and use that to make TSAN work on Android (with
Dan's change).

cheers,
--renato

Sounds good to me.

Hi Renato and gang.

I guess I need to back up a few steps. I missed this thread before I took off last week.

I had been discussing the TSAN on android/aarch64 issues with the folks at thread-sanitizer@googlegroups.com for several months now. I had not explicitly been including llvm-dev (and you), and that was a mistake on my part. Palm-on-face? :frowning:

Secondly, the "early release" of the gigantic patch for tsan+android+aarch64 was due to not being familiar with how phabricator does things by default. Sorry for the noise. I expected it to add an audience only after I did an expliclit "make-public" step. I was wrong on that, but I should not have been. Bad tool! :slight_smile:

Now for what remains:

As far as I can tell, there are several big hurdles to this work landing on compiler-rt

Patch Size!
The patch in question is a bit smaller now thanks to some dovetailing with http://reviews.llvm.org/D11484 being merged.
I have already split up the patch into three chunks. I can do more as needed.

TLS!
I am willing to redo the TLS emulation part.
So there are several possible strategies for doing this -- this was explicltly mentioned in the commit message, but it seems to have gotten lost in phabricator somehow.

1. use pthread_get/setspecific() (more on this below!!)
2. use some other workaround that does not use pthread API
3. use the new TLS slot from Dan's recent push to android. https://android-review.googlesource.com/#/c/167420/
4. use the emutls work http://reviews.llvm.org/D12001

The first approach fails because pthread_key maintenance happens at thread/process destruction time, which means that
a NEW key was being generated for the thread undergoing death, resulting in a new ThreadState being allocated for the dying thread.
The second approach was what I selected (and what is on display at phabricator). I works, but isn't necessarily very pretty.
The third and fourth approaches are recent, and I have not yet had a chance to take a look.
At first blush, emutls seems the most problematic, as it uses intercepted calls like malloc().

TESTS!
Currently, about 2/3 tests for tsan fail/flake on android+aarch64.
They aren't xfailed, because some of them are flaky, and thus unexpectedly pass at times.
So I decided to mark them as unsupported on android+aarch64, and thus is "green".
These tests will need to triaged individually.

*If anyone has a simple hack to mark a set of tests as unsupported that does not touch individual test files, please let me know!*
*Whether I move them to a directory or prepend 2 lines is roughly the same size patch!*

ANDROID-BIONIC!

The required patches on bionic adds private symbols to select APIs in bionic so that TSAN can not intercept bionic-to-bionic calls. https://code.google.com/p/android/issues/detail?id=179564

Now, it turns out that some of the early workarounds specific to android (such as early addition of REAL versions of memset() and friends even before dlsym() is called()) becomes unnecessary with the bionic patches. This also implies that I should be able to go back to using the pthread API for TLS, because TSAN will no longer be intercepting calls at thread/process destruction time
I have not yet had a chance to test this out. Assuming this works, what is your opinion on the "proper" TLS emulation in TSAN?

a) always use pthread api?
b) leave the use of __thread keyword there and use some variant of arch+os conditionals to select the appropriate technique?

If it's (b), what is the appropriate technique?
i.e. use pthread API selectively,
or use some other arch+OS specific workaround?

Thanks
-Jason

p.s.
I handily dismiss the "nuclear" option, which is a complete reimplementation of tsan to avoid weak-symbol aliasing, and to do manual interception/patching of dynamically linked routines. Yeah, ignore this one, please.

I had been discussing the TSAN on android/aarch64 issues with the folks at thread-sanitizer@googlegroups.com for several months now. I had not explicitly been including llvm-dev (and you), and that was a mistake on my part. Palm-on-face? :frowning:

Yup, that would have saved a lot of emails... :slight_smile:

Patch Size!
The patch in question is a bit smaller now thanks to some dovetailing with http://reviews.llvm.org/D11484 being merged.
I have already split up the patch into three chunks. I can do more as needed.

Before I have any opinion on the current patch, we need to know what's
going to change.

1. use pthread_get/setspecific() (more on this below!!)

If the re-creation of the thread state disappears after the bionic
changes, I'd seriously investigate this one, as it seems the most
logic.

2. use some other workaround that does not use pthread API

I'm trying to avoid this at all costs. :slight_smile:

3. use the new TLS slot from Dan's recent push to android. https://android-review.googlesource.com/#/c/167420/

I asked somewhere else, but, will this give you a reasonably looking
thread state without resorting to pthread_get/specific calls? Will
this be a unique and non-redundant state for each thread?

4. use the emutls work http://reviews.llvm.org/D12001

I only suggested this because Joerg/Dan mentioned it. I haven't looked
at that patch in detail. From what I'm seeing, options 1 and 3 above
may very well suit us.

TESTS!
Currently, about 2/3 tests for tsan fail/flake on android+aarch64.
They aren't xfailed, because some of them are flaky, and thus unexpectedly pass at times.
So I decided to mark them as unsupported on android+aarch64, and thus is "green".

So, if they currently pass on the production buildbots, you can't mark
them as unstable because of your patch, as that will mean your patch
is making them unstable, and we can't just make dozens of tests
unstable for any amount of features.

If your patch introduces the instability, you'll have to fix them
before commit. That's why I suggested splitting it into many small
chunks, none of which will leave the tree in an unstable state,
because you'll have time to look at each instability individually, and
fix them before commit. The bots will also be happier, and every one
wins.

I have no idea how to split your patch so that this happens, but since
you know what the instability is, we could work together to test and
validate *before* the patches go live. I have a fast AArch64 machine
just for that purpose, so we can do this as fast as possible.

I have not yet had a chance to test this out. Assuming this works, what is your opinion on the "proper" TLS emulation in TSAN?

a) always use pthread api?
b) leave the use of __thread keyword there and use some variant of arch+os conditionals to select the appropriate technique?

I'm not familiar with pthread enough to give you a good answer. I'm
hoping Joerg/Dan could chime in. Having said that, I'd rather use
something that is arch/os agnostic, even if it slows it down a few
percent. If the slow down is too great, then we can discuss case by
case.

cheers,
-renato

I had been discussing the TSAN on android/aarch64 issues with the folks at thread-sanitizer@googlegroups.com for several months now. I had not explicitly been including llvm-dev (and you), and that was a mistake on my part. Palm-on-face? :frowning:

Yup, that would have saved a lot of emails... :slight_smile:

Patch Size!
The patch in question is a bit smaller now thanks to some dovetailing with http://reviews.llvm.org/D11484 being merged.
I have already split up the patch into three chunks. I can do more as needed.

Before I have any opinion on the current patch, we need to know what's
going to change.

1. use pthread_get/setspecific() (more on this below!!)

If the re-creation of the thread state disappears after the bionic
changes, I'd seriously investigate this one, as it seems the most
logic.

2. use some other workaround that does not use pthread API

I'm trying to avoid this at all costs. :slight_smile:

3. use the new TLS slot from Dan's recent push to android. https://android-review.googlesource.com/#/c/167420/

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

This looks like the best option to me.
(provided that we unify tsan thread-local state in ThreadState, so
that we change only cur_thread function)

From: Renato Golin [mailto:renato.golin@linaro.org]
> TESTS!

> > Currently, about 2/3 tests for tsan fail/flake on android+aarch64.
> > They aren't xfailed, because some of them are flaky, and thus
> unexpectedly pass at times.
> > So I decided to mark them as unsupported on android+aarch64, and thus
> is "green".
>
> So, if they currently pass on the production buildbots, you can't mark them
> as unstable because of your patch, as that will mean your patch is making
> them unstable, and we can't just make dozens of tests unstable for any
> amount of features.

[ jasonk --> ] The tests, AFAIK, currently "pass" for a somewhat squiggly definition of "pass".
I get occasional fails on a local x86_64 build, w/o my patches.

> If your patch introduces the instability, you'll have to fix them before

[ jasonk --> ] The instabilities were there to begin with, AFAIK. Some of the tests are run on a "deflake" regimen where they are repeated 10x, looking for a specific output.

> commit. That's why I suggested splitting it into many small chunks, none of
> which will leave the tree in an unstable state, because you'll have time to
> look at each instability individually, and fix them before commit. The bots
> will also be happier, and every one wins.

[ jasonk --> ] Sorry, but I don't understand what you want here. The tests I touched are marked unsupported (i.e. won't be run at all) ONLY for aarch64-linux-androideabi. Is there something else you want? It should not impact any other configuration of TSAN/compiler-rt as-is. I also guard TSAN with a CMAKE variable so that it only gets built for aarch64 iff COMPILER_RT_FORCE_TSAN_AARCH64=1

> cheers,
> -renato

Thanks!
-Jason

IMO having to disable 2/3 of the tests means the patch isn’t ready yet.

IMO having to disable 2/3 of the tests means the patch isn’t ready yet.

[ jasonk → ] OK, that’s a fair view. However, please keep in mind the follwing:

(*) The tests weren’t all stable to begin with.

(*) I am perfectly willing to triage the failures, but please keep in mind that any “threshold” that we cross from NOT testing TSAN on aarch64+android to actually testing on it is going to be a large hurdle.

(*) BIONIC is NOT glibc!

(*) Its pretty darn near impossible to have the perfect patch, which is [a] small [b] doesn’t break anything [c] adds major new features.

(*) What I have currently is: [a] not very big [b] doesn’t break anything VISIBLY [c] adds major new features

(*) The new feature (TSAN on aarch64-linux-androideabi) is currently turned OFF by default

(*) There are plenty of cases in prior history in llvm where a set of tests were turned off temporarily, especially when a major new platform is being introduced. I don’t think its unreasonable to have that as a starting point.

IMO having to disable 2/3 of the tests means the patch isn’t ready yet.

[ jasonk → ] OK, that’s a fair view. However, please keep in mind the follwing:

(*) The tests weren’t all stable to begin with.

(*) I am perfectly willing to triage the failures, but please keep in mind that any “threshold” that we cross from NOT testing TSAN on aarch64+android to actually testing on it is going to be a large hurdle.

(*) BIONIC is NOT glibc!

(*) Its pretty darn near impossible to have the perfect patch, which is [a] small [b] doesn’t break anything [c] adds major new features.

(*) What I have currently is: [a] not very big [b] doesn’t break anything VISIBLY [c] adds major new features

(*) The new feature (TSAN on aarch64-linux-androideabi) is currently turned OFF by default

(*) There are plenty of cases in prior history in llvm where a set of tests were turned off temporarily, especially when a major new platform is being introduced. I don’t think its unreasonable to have that as a starting point.

[ jasonk → ] I don’t know if this is getting through, but the tests I marked UNSUPPORTED, are ONLY unsupported on aarch64-linux-androideabi. They are run, as is, on other configurations of compiler-rt.