clang attributes to disable asan/tsan/msan

Hi,

Clang has two attributes to disable bug detection tools in a given function:

attribute((no_thread_safety_analysis)) disables clang’s static thread-safety analysis. (http://clang.llvm.org/docs/LanguageExtensions.html#thread-safety-annotation-checking)

attribute((no_address_safety_analysis)) disables AddressSanitizer (dynamic analysis)

http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-dynamic-analysis

Now we need two more attributes to disable

ThreadSanitizer (http://clang.llvm.org/docs/ThreadSanitizer.html)
and MemorySanitizer (http://clang.llvm.org/docs/MemorySanitizer.html)

For MemorySanitizer I propose attribute((no_uninitialized_checks))
Objections? Better naming suggestion?
Maybe attribute((no_memory_sanitizer))?
(We deliberately named no-asan attribute “no_address_safety_analysis” w/o mentioning asan
in the name to make this attribute usable for other tools, e.g. SAFECode. So,
we may not want to tie the no-msan attribute to msan)

For ThreadSanitizer the question is a bit trickier.
We can reuse attribute((no_thread_safety_analysis)) which makes sense
because it already means a similar thing.

OTOH, most of the current uses of no_thread_safety_analysis I know about
are there because of the limitations of clang’s static analysis.
And we don’t need to disable ThreadSanitizer for the majority of those cases,
which means we may better use another attribute to disable ThreadSanitizer checking.
How to name it?
attribute((no_thread_sanitizer))?

attribute((no_dynamic_thread_safety_analysis))?

attribute((thread_sanitizer_ignore_plain_memory_accesses))?

The patch discussion: http://llvm-reviews.chandlerc.com/D390

Thanks,

–kcc

Hi,

Clang has two attributes to disable bug detection tools in a given function:

attribute((no_thread_safety_analysis)) disables clang’s static thread-safety analysis. (http://clang.llvm.org/docs/LanguageExtensions.html#thread-safety-annotation-checking)

attribute((no_address_safety_analysis)) disables AddressSanitizer (dynamic analysis)

http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-dynamic-analysis

Now we need two more attributes to disable

ThreadSanitizer (http://clang.llvm.org/docs/ThreadSanitizer.html)
and MemorySanitizer (http://clang.llvm.org/docs/MemorySanitizer.html)

For MemorySanitizer I propose attribute((no_uninitialized_checks))
Objections? Better naming suggestion?
Maybe attribute((no_memory_sanitizer))?
(We deliberately named no-asan attribute “no_address_safety_analysis” w/o mentioning asan
in the name to make this attribute usable for other tools, e.g. SAFECode. So,
we may not want to tie the no-msan attribute to msan)

For ThreadSanitizer the question is a bit trickier.
We can reuse attribute((no_thread_safety_analysis)) which makes sense
because it already means a similar thing.

OTOH, most of the current uses of no_thread_safety_analysis I know about
are there because of the limitations of clang’s static analysis.
And we don’t need to disable ThreadSanitizer for the majority of those cases,
which means we may better use another attribute to disable ThreadSanitizer checking.
How to name it?
attribute((no_thread_sanitizer))?

attribute((no_dynamic_thread_safety_analysis))?

attribute((thread_sanitizer_ignore_plain_memory_accesses))

Just a though, but instead of creating new attribute for each analyze, wouldn’t it be possible to define a single parameterized attribute which would take parameters similar to the one passed to the --sanitize option ?

attribute((safety_analysis(“memory”, “thread”, …)))

– Jean-Daniel

I like Jean-Daniel's suggestion. One possible improvement: Make two such attributes. One could be called static_safety_analysis and the other dynamic_safety_analysis. The result might look like this:

__attribute__((static_safety_analysis("memory",…), dynamic_safety_analysis("thread", …)))

Even better if the arguments included both foo and nofoo versions (e.g., "memory" or "no_memory").

Someone else should suggest more felicitous spellings for both the attributes and the arguments.

Dean

[SNIP]

It would be cleaner to use plain identifiers instead of string literals.

Dmitri

It seems to me like it is going to be simpler and more transparent to
have the attribute explicitly mention the sanitizer, e.g.`
__attribute__((no_sanitize("memory")))`; then the user knows exactly
what they are getting (since the name corresponds with the command
line option). If other tools want to use those attributes it's not
hard to look for them.

It also isn't entirely clear to me that the attribute would have
exactly the same semantics for the sanitizers and some other tool.
AFAIK the term "address safety" has no independent meaning and
basically means "the things that asan checks", so the term "address"
in `__attribute__((no_address_safety_analysis))` is already asan
specific in that regard, and it would be clearer to just say
`no_sanitize("memory")`.

If we really want the attributes to be tool-agnostic, then they should
describe what the function does that is naughty, e.g.
`__attribute__((reads_unintialized_memory_on_purpose))`, and let the
tool interpret that information and behave appropriately.

-- Sean Silva

This summarizes my feelings exactly.

I think that even if we grow a set of attributes that describe the semantic
oddity of a function (such as reading uninitialized memory, etc), we would
still want an escape hatch to just turn off the sanitizer. And when we do
that, we really do want to use the exact same terminology that we use in
the flags.

I don't think it matters whether its one attribute or N attributes as long
as we get some naming consistency. I would propose (for simplicity of
implementation mostly):

__attribute__((no_sanitize_address))
__attribute__((no_sanitize_memory))
__attribute__((no_sanitize_thread))
__attribute__((no_sanitize_undefined))
...

This pattern should be easy to remember and understand, and removes a lot
of ambiguity of which attribute goes with which sanitizer. It also makes it
very clear that these are attributes pertaining to the dynamic analysis
toolset, not to any static analysis toolset.

Of course, I think we should support the existing attributes for backwards
compatibility, at least for several releases.

> Hi,
>
> Clang has two attributes to disable bug detection tools in a given
function:
>
> __attribute__((no_thread_safety_analysis)) disables clang's *static*
> thread-safety analysis.
> (
http://clang.llvm.org/docs/LanguageExtensions.html#thread-safety-annotation-checking
)
>
> __attribute__((no_address_safety_analysis)) disables AddressSanitizer
> (*dynamic* analysis)
>
http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-dynamic-analysis
>
> Now we need two more attributes to disable
> ThreadSanitizer (http://clang.llvm.org/docs/ThreadSanitizer.html)
> and MemorySanitizer (http://clang.llvm.org/docs/MemorySanitizer.html)
>
> For MemorySanitizer I propose __attribute__((no_uninitialized_checks))
> Objections? Better naming suggestion?
> Maybe __attribute__((no_memory_sanitizer))?
> (We deliberately named no-asan attribute "no_address_safety_analysis"
w/o
> mentioning asan
> in the name to make this attribute usable for other tools, e.g.
SAFECode.
> So,
> we may not want to tie the no-msan attribute to msan)

It seems to me like it is going to be simpler and more transparent to
have the attribute explicitly mention the sanitizer, e.g.`
__attribute__((no_sanitize("memory")))`; then the user knows exactly
what they are getting (since the name corresponds with the command
line option). If other tools want to use those attributes it's not
hard to look for them.

It also isn't entirely clear to me that the attribute would have
exactly the same semantics for the sanitizers and some other tool.
AFAIK the term "address safety" has no independent meaning and
basically means "the things that asan checks", so the term "address"
in `__attribute__((no_address_safety_analysis))` is already asan
specific in that regard, and it would be clearer to just say
`no_sanitize("memory")`.

If we really want the attributes to be tool-agnostic, then they should
describe what the function does that is naughty, e.g.
`__attribute__((reads_unintialized_memory_on_purpose))`, and let the
tool interpret that information and behave appropriately.

This summarizes my feelings exactly.

I think that even if we grow a set of attributes that describe the
semantic oddity of a function (such as reading uninitialized memory, etc),
we would still want an escape hatch to just turn off the sanitizer. And
when we do that, we really do want to use the exact same terminology that
we use in the flags.

I don't think it matters whether its one attribute or N attributes as long
as we get some naming consistency. I would propose (for simplicity of
implementation mostly):

__attribute__((no_sanitize_address))
__attribute__((no_sanitize_memory))
__attribute__((no_sanitize_thread))
__attribute__((no_sanitize_undefined))

I like the simplicity (also because we will have to implement these
attributes in gcc too).

How about this?
__attribute__((no_sanitize_address)) is a synonym for
__attribute__((no_address_safety_analysis)), i.e. disables AddressSanitizer
checking.
(or maybe we should just leave no_address_safety_analysis?)

__attribute__((no_sanitize_memory)) disables MemorySanitizer checking, but
still keeps the instrumentation required to avoid false positives.

__attribute__((no_sanitize_thread)) disables ThreadSanitizer checking for
plain (non-atomic) loads/stores, but still keeps the instrumentation
required to avoid false positives.

--kcc

> Hi,
>
> Clang has two attributes to disable bug detection tools in a given
function:
>
> __attribute__((no_thread_safety_analysis)) disables clang's *static*
> thread-safety analysis.
> (
http://clang.llvm.org/docs/LanguageExtensions.html#thread-safety-annotation-checking
)
>
> __attribute__((no_address_safety_analysis)) disables AddressSanitizer
> (*dynamic* analysis)
>
http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-dynamic-analysis
>
> Now we need two more attributes to disable
> ThreadSanitizer (http://clang.llvm.org/docs/ThreadSanitizer.html)
> and MemorySanitizer (http://clang.llvm.org/docs/MemorySanitizer.html)
>
> For MemorySanitizer I propose __attribute__((no_uninitialized_checks))
> Objections? Better naming suggestion?
> Maybe __attribute__((no_memory_sanitizer))?
> (We deliberately named no-asan attribute "no_address_safety_analysis"
w/o
> mentioning asan
> in the name to make this attribute usable for other tools, e.g.
SAFECode.
> So,
> we may not want to tie the no-msan attribute to msan)

It seems to me like it is going to be simpler and more transparent to
have the attribute explicitly mention the sanitizer, e.g.`
__attribute__((no_sanitize("memory")))`; then the user knows exactly
what they are getting (since the name corresponds with the command
line option). If other tools want to use those attributes it's not
hard to look for them.

It also isn't entirely clear to me that the attribute would have
exactly the same semantics for the sanitizers and some other tool.
AFAIK the term "address safety" has no independent meaning and
basically means "the things that asan checks", so the term "address"
in `__attribute__((no_address_safety_analysis))` is already asan
specific in that regard, and it would be clearer to just say
`no_sanitize("memory")`.

If we really want the attributes to be tool-agnostic, then they should
describe what the function does that is naughty, e.g.
`__attribute__((reads_unintialized_memory_on_purpose))`, and let the
tool interpret that information and behave appropriately.

This summarizes my feelings exactly.

I think that even if we grow a set of attributes that describe the
semantic oddity of a function (such as reading uninitialized memory, etc),
we would still want an escape hatch to just turn off the sanitizer. And
when we do that, we really do want to use the exact same terminology that
we use in the flags.

I don't think it matters whether its one attribute or N attributes as
long as we get some naming consistency. I would propose (for simplicity of
implementation mostly):

__attribute__((no_sanitize_address))
__attribute__((no_sanitize_memory))
__attribute__((no_sanitize_thread))
__attribute__((no_sanitize_undefined))

I like the simplicity (also because we will have to implement these
attributes in gcc too).

How about this?
__attribute__((no_sanitize_address)) is a synonym for
__attribute__((no_address_safety_analysis)), i.e. disables AddressSanitizer
checking.
(or maybe we should just leave no_address_safety_analysis?)

__attribute__((no_sanitize_memory)) disables MemorySanitizer checking, but
still keeps the instrumentation required to avoid false positives.

__attribute__((no_sanitize_thread)) disables ThreadSanitizer checking for
plain (non-atomic) loads/stores, but still keeps the instrumentation
required to avoid false positives.

I like it. I would add all three so that we can update code to be
consistent.

Keep an eye out for a use case for an all-inclusive 'no_sanitize' that
turns everything off.

> Hi,
>
> Clang has two attributes to disable bug detection tools in a given
function:
>
> __attribute__((no_thread_safety_analysis)) disables clang's *static*
> thread-safety analysis.
> (
http://clang.llvm.org/docs/LanguageExtensions.html#thread-safety-annotation-checking
)
>
> __attribute__((no_address_safety_analysis)) disables AddressSanitizer
> (*dynamic* analysis)
>
http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-dynamic-analysis
>
> Now we need two more attributes to disable
> ThreadSanitizer (http://clang.llvm.org/docs/ThreadSanitizer.html)
> and MemorySanitizer (http://clang.llvm.org/docs/MemorySanitizer.html)
>
> For MemorySanitizer I propose __attribute__((no_uninitialized_checks))
> Objections? Better naming suggestion?
> Maybe __attribute__((no_memory_sanitizer))?
> (We deliberately named no-asan attribute "no_address_safety_analysis"
w/o
> mentioning asan
> in the name to make this attribute usable for other tools, e.g.
SAFECode.
> So,
> we may not want to tie the no-msan attribute to msan)

It seems to me like it is going to be simpler and more transparent to
have the attribute explicitly mention the sanitizer, e.g.`
__attribute__((no_sanitize("memory")))`; then the user knows exactly
what they are getting (since the name corresponds with the command
line option). If other tools want to use those attributes it's not
hard to look for them.

It also isn't entirely clear to me that the attribute would have
exactly the same semantics for the sanitizers and some other tool.
AFAIK the term "address safety" has no independent meaning and
basically means "the things that asan checks", so the term "address"
in `__attribute__((no_address_safety_analysis))` is already asan
specific in that regard, and it would be clearer to just say
`no_sanitize("memory")`.

If we really want the attributes to be tool-agnostic, then they should
describe what the function does that is naughty, e.g.
`__attribute__((reads_unintialized_memory_on_purpose))`, and let the
tool interpret that information and behave appropriately.

This summarizes my feelings exactly.

I think that even if we grow a set of attributes that describe the
semantic oddity of a function (such as reading uninitialized memory, etc),
we would still want an escape hatch to just turn off the sanitizer. And
when we do that, we really do want to use the exact same terminology that
we use in the flags.

I don't think it matters whether its one attribute or N attributes as
long as we get some naming consistency. I would propose (for simplicity of
implementation mostly):

__attribute__((no_sanitize_address))
__attribute__((no_sanitize_memory))
__attribute__((no_sanitize_thread))
__attribute__((no_sanitize_undefined))

I like the simplicity (also because we will have to implement these
attributes in gcc too).

How about this?
__attribute__((no_sanitize_address)) is a synonym for
__attribute__((no_address_safety_analysis)), i.e. disables AddressSanitizer
checking.
(or maybe we should just leave no_address_safety_analysis?)

__attribute__((no_sanitize_memory)) disables MemorySanitizer checking,
but still keeps the instrumentation required to avoid false positives.

__attribute__((no_sanitize_thread)) disables ThreadSanitizer checking for
plain (non-atomic) loads/stores, but still keeps the instrumentation
required to avoid false positives.

I like it. I would add all three so that we can update code to be
consistent.

Thanks, I'll send a patch later unless I hear objections to this plan.

Keep an eye out for a use case for an all-inclusive 'no_sanitize' that
turns everything off.

That's controversial.
On one hand, I've seen a couple of cases where we need both
no_sanitize_address and no_sanitize_memory (mostly, custom stack unwinders
that read random bytes on stack).
I can also imagine cases where no_sanitize_address and no_sanitize_thread
could be used together (e.g. deliberate out of bounds reads within aligned
8-byte word).
But we risk that no_sanitize will be abused to turn off all sanitizers when
only two should really be disabled.

I'd rather not add this for now.

--kcc

I don’t have a huge stake in this, but it kind of makes sense to me to have a single attribute that takes an argument, or list of arguments.

attribute((no_sanitize(address)))
attribute((no_sanitize(memory,thread)))

This way, if more sanitizers are added, we don’t need to add new attributes. And since these are disabling checks, they’re automatically backwards-compatible—that is, Clang could accept

attribute((no_sanitize(address,insanity)))

without warning that it doesn’t know what the insanity sanitizer is. (This is probably more important on GCC, which AFAIK still doesn’t have an equivalent of has_attribute.)

Just my two cents,
Jordan

I don’t have a huge stake in this, but it kind of makes sense to me to have a single attribute that takes an argument, or list of arguments.

attribute((no_sanitize(address)))
attribute((no_sanitize(memory,thread)))

This way, if more sanitizers are added, we don’t need to add new attributes. And since these are disabling checks, they’re automatically backwards-compatible—that is, Clang could accept

attribute((no_sanitize(address,insanity)))

without warning that it doesn’t know what the insanity sanitizer is. (This is probably more important on GCC, which AFAIK still doesn’t have an equivalent of has_attribute.)

Just my two cents,
Jordan

I am always wary about the “no-warning”. What happens if I write no_sanitize(adress) ? (In French, there is a single “d” in adresse so it’s a common mistake) I would rather the compiler warned me that discover after the whole compile-link-test cycle that I did not nail it.

If people want to disable the warning, let them.

– Matthieu

I would like for us to aim towards having some harmony between this and the sanitizer blacklist file. As a strawman, it would be nice if we could say that

attribute((no_sanitize(“foo”))) void f();

has the same effect as adding

foo:_Z1fv

to the blacklist file. (This suggests that the attribute should be allowed on globals for the global-init sanitizer, etc.)

> Hi,
>
> Clang has two attributes to disable bug detection tools in a given
function:
>
> __attribute__((no_thread_safety_analysis)) disables clang's *static*
> thread-safety analysis.
> (
http://clang.llvm.org/docs/LanguageExtensions.html#thread-safety-annotation-checking
)
>
> __attribute__((no_address_safety_analysis)) disables
AddressSanitizer
> (*dynamic* analysis)
>
http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-dynamic-analysis
>
> Now we need two more attributes to disable
> ThreadSanitizer (http://clang.llvm.org/docs/ThreadSanitizer.html)
> and MemorySanitizer (
http://clang.llvm.org/docs/MemorySanitizer.html)
>
> For MemorySanitizer I propose
__attribute__((no_uninitialized_checks))
> Objections? Better naming suggestion?
> Maybe __attribute__((no_memory_sanitizer))?
> (We deliberately named no-asan attribute
"no_address_safety_analysis" w/o
> mentioning asan
> in the name to make this attribute usable for other tools, e.g.
SAFECode.
> So,
> we may not want to tie the no-msan attribute to msan)

It seems to me like it is going to be simpler and more transparent to
have the attribute explicitly mention the sanitizer, e.g.`
__attribute__((no_sanitize("memory")))`; then the user knows exactly
what they are getting (since the name corresponds with the command
line option). If other tools want to use those attributes it's not
hard to look for them.

It also isn't entirely clear to me that the attribute would have
exactly the same semantics for the sanitizers and some other tool.
AFAIK the term "address safety" has no independent meaning and
basically means "the things that asan checks", so the term "address"
in `__attribute__((no_address_safety_analysis))` is already asan
specific in that regard, and it would be clearer to just say
`no_sanitize("memory")`.

If we really want the attributes to be tool-agnostic, then they should
describe what the function does that is naughty, e.g.
`__attribute__((reads_unintialized_memory_on_purpose))`, and let the
tool interpret that information and behave appropriately.

This summarizes my feelings exactly.

I think that even if we grow a set of attributes that describe the
semantic oddity of a function (such as reading uninitialized memory, etc),
we would still want an escape hatch to just turn off the sanitizer. And
when we do that, we really do want to use the exact same terminology that
we use in the flags.

I don't think it matters whether its one attribute or N attributes as
long as we get some naming consistency. I would propose (for simplicity of
implementation mostly):

__attribute__((no_sanitize_address))
__attribute__((no_sanitize_memory))
__attribute__((no_sanitize_thread))
__attribute__((no_sanitize_undefined))

I like the simplicity (also because we will have to implement these
attributes in gcc too).

How about this?
__attribute__((no_sanitize_address)) is a synonym for
__attribute__((no_address_safety_analysis)), i.e. disables AddressSanitizer
checking.
(or maybe we should just leave no_address_safety_analysis?)

__attribute__((no_sanitize_memory)) disables MemorySanitizer checking,
but still keeps the instrumentation required to avoid false positives.

__attribute__((no_sanitize_thread)) disables ThreadSanitizer checking
for plain (non-atomic) loads/stores, but still keeps the instrumentation
required to avoid false positives.

I like it. I would add all three so that we can update code to be
consistent.

Thanks, I'll send a patch later unless I hear objections to this plan.

I would like for us to aim towards having some harmony between this and
the sanitizer blacklist file. As a strawman, it would be nice if we could
say that

__attribute__((no_sanitize("foo"))) void f();

has the same effect as adding

foo:_Z1fv

"foo" here is one of "fun", "src", "global" or "global-init".
So, what would " __attribute__((no_sanitize("glob"))) void f(); " mean?
Or even " __attribute__((no_sanitize("fun"))) void f(); " ?

--kcc

I see, the blacklist file doesn’t have the ability to control individual sanitizers other than the global-init sanitization. Never mind then. We can discuss what would work in the blacklist file if we come to add finer-grained controls to it.

Do we also want to rename the LLVM attributes?

[old] address_safety => sanitize_address
[recent] thread_safety => sanitize_thread
[recent] uninitialized_checks => sanitize_memory

–kcc

Do we also want to rename the LLVM attributes?

As long as you auto-upgrade the one which got released (address_safety),
that seems fine to me.

Err, maybe just rename the last two (not released)?
no_address_safety was supposed to be used for other tools too, anyway.

Err, maybe just rename the last two (not released)?
no_address_safety was supposed to be used for other tools too, anyway.

I don't feel particularly strongly about this, but personally I would
rename them all rather than mess with trying to support some other tool
that isn't part of the primary llvm tree with this attribute.

Especially with the advent of textual attributes, it becomes much easier
for these types of projects to set up their own attribute that can have
whatever semantics or support story they want.

But really, whatever you're comfortable with at this stage. We can always
change our minds later about attributes in the IR as long as we provide
continued support for old bitcode files.

Is an auto-upgrade necessary? We haven't released a clang/llvm with those attributes yet....

-bw

I think 3.2 had the ASan one.