[RFC] Parsing runtime flags in sanitizers (ASan/LSan/UBSan)

Hi all,

TL;DR

  1. We should change the way we parse common runtime flags in sanitizers.
  2. We should make ASan aware of the tools it can be combined with (LSan and UBSan).
  3. We may have to restrict the tools UBSan can be combined with (currently to ASan) (see [1])

Currently we have two kinds of sanitizer runtime flags: tool-specific flags and “common flags”, defined in sanitizer_common library and shared across all the sanitizers. Many of these common flags are used early during tool initialization (for example, we use “suppressions” flag to create SuppressionContext singleton object). That’s the reason why we parse both tool-specific flags and common flags as early as possible at program startup, before running the rest of initialization code. It all works fine until we have a single sanitizer - e.g. for TSan or MSan.

The situation gets crazy when we combine multiple sanitizers in a single process, for instance use ASan+LSan+UBSan (the default use case in some setups). Each tool has its own defaults for common flag values, and each of ASAN_OPTIONS, LSAN_OPTIONS and UBSAN_OPTIONS can define both tool-specific and common flags. These environment variables are parsed at different time, sometimes in undefined order. We can easily end up in situation where ASan initializes some parts of sanitizer_common assuming certain values of common runtime flags, but then these flags are overwritten by LSAN_OPTIONS. All this is very complicated and fragile.

I propose to implement the following:

ASan flag parsing:

  1. Detect all sanitizers ASan is combined with (We know that we have LSan if “CAN_SANITIZE_LEAKS” is true. We can detect if we have UBSan in the process somehow.)
  2. Setup defaults for ASan-specific flags and override them from ASAN_OPTIONS.
  3. Setup defaults for common flags and override them from all of ASAN_OPTIONS, LSAN_OPTIONS and UBSAN_OPTIONS (if corresponding sanitizers are enabled for the process).
  4. Proceed with initialization, and call LSan/UBSan initializers when appropriate.

LSan/UBSan flag parsing:

  1. Learn if LSan/UBSan are combined with ASan (the “main” tool) or run as standalone tool. This is already done for LSan and can be done for UBSan by slightly modifying the way we structure runtimes.
  2. Setup defaults for tool-specific flags and override them from LSAN/UBSAN_OPTIONS.
  3. If the tools run in standalone mode, setup the defaults for common flags and override them from LSAN/UBSAN_OPTIONS. If the tools are combined with “main” tool, do nothing.
  4. Proceed with initialization.

[1] Conjecture:

  1. We will have to add the hook to initialize UBSan that we would invoke from ASan (much like we do for LSan). It means that we would have to restrict the set of sanitizers that can be combined with UBSan (allowing it for another sanitizers would be very easy, but manual process). The alternative would be to make UBSan setup no-op except for parsing the flags, so that UBSan initializer could be safely called before main tool initializer.

Bonus:

  1. We will be able to significantly improve runtime flag parsing diagnostic. For instance, we would be able to report conflicting flag definitions (e.g. when one provides ASAN_OPTIONS=symbolize=0 LSAN_OPTIONS=symbolize=1).

Comments/objections?

Hope you're assuming there's always a single copy of common_flags in
the process.
This isn't the case for e.g. ASan+UBSan on Mac, but that's a broken setup.

What if we let the tools protect specific flags (by adding a bool to
each flag) once they set their values (when setting the default
values, parsing xSAN_OPTIONS, etc)
Generally, I think we need to protect each flag when anyone's trying
to read it (so we'll need getters and setters for the flags), unless
the code opts out of protecting a particular flag (e.g. if we want to
allow it to be changed dynamically).
This way we can prevent the flags from being overridden with conflicting values.

Irrespective of whether we protect the flags or not, I think we need
to distinguish between the flags that can be set once and flags that
may change their values (e.g. on Android).

Hope you're assuming there's always a single copy of common_flags in
the process.
This isn't the case for e.g. ASan+UBSan on Mac, but that's a broken setup.

What if we let the tools protect specific flags (by adding a bool to
each flag) once they set their values (when setting the default
values, parsing xSAN_OPTIONS, etc)
Generally, I think we need to protect each flag when anyone's trying
to read it (so we'll need getters and setters for the flags), unless
the code opts out of protecting a particular flag (e.g. if we want to
allow it to be changed dynamically).
This way we can prevent the flags from being overridden with conflicting
values.

Irrespective of whether we protect the flags or not, I think we need
to distinguish between the flags that can be set once and flags that
may change their values (e.g. on Android).

Ideally, things that may change their values should not be flags at all.

Hope you're assuming there's always a single copy of common_flags in
the process.

Yes.

This isn't the case for e.g. ASan+UBSan on Mac, but that's a broken setup.

Yes. I assume, we want to have dynamic UBSan runtime on Mac, don't we?

What if we let the tools protect specific flags (by adding a bool to
each flag) once they set their values (when setting the default
values, parsing xSAN_OPTIONS, etc)
Generally, I think we need to protect each flag when anyone's trying
to read it (so we'll need getters and setters for the flags), unless
the code opts out of protecting a particular flag (e.g. if we want to
allow it to be changed dynamically).
This way we can prevent the flags from being overridden with conflicting
values.

What is the purpose of making the flag values immutable? If you want to
protect against the case
where we accidentally reset the flag value, you can just make
common_flags() accessor return a const
reference. Then you'd only be able to modify the flag values explicitly via
ParseCommonFlagsFromString()
and similar functions. This would certainly be a positive change.

IMO it would be *very* hard to distinguish between cases when we
override flags on purpose (e.g. override defaults from env var) and when we
override them accidentally during
multiple tool initializatier calls (the case which I describe).

Irrespective of whether we protect the flags or not, I think we need
to distinguish between the flags that can be set once and flags that
may change their values (e.g. on Android).

Anyway, this suggestion is kind of orthogonal to mine. I was asking if
you're ok with having "main" tool (ASan)
steal responsibility of initializing runtime flags from "plug-in" tools
(LSan, UBSan).

Hope you're assuming there's always a single copy of common_flags in
the process.
This isn't the case for e.g. ASan+UBSan on Mac, but that's a broken setup.

What if we let the tools protect specific flags (by adding a bool to
each flag) once they set their values (when setting the default
values, parsing xSAN_OPTIONS, etc)
Generally, I think we need to protect each flag when anyone's trying
to read it (so we'll need getters and setters for the flags), unless
the code opts out of protecting a particular flag (e.g. if we want to
allow it to be changed dynamically).
This way we can prevent the flags from being overridden with conflicting
values.

Irrespective of whether we protect the flags or not, I think we need
to distinguish between the flags that can be set once and flags that
may change their values (e.g. on Android).

Ideally, things that may change their values should not be flags at all.

I think this is a good idea. We probably should enforce that flags
(especially common flags)
are only allowed to be modified when they are parsed early during tool
initialization by these methods:
1) SetCommonFlagsDefaults()
2) <some function/marco that would allow the tool to override some default
values).
3) ParseCommonFlagsFromString()

After that the flags can only be accessed via common_flags(), that would
return const pointer.
This would require some changes to runtime code, which seems like the Right
Thing. E.g.
"allocator_may_return_null" flag should really be a member of Allocator
object - we already
modify this flag in the unit tests and during ASan activation (which can
introduce a data race).

If you agree that we should follow this path now, I will fix that. Also, do
you have more
comments regarding common flags initialization in ASan/LSan/UBSan bundle I
propose?

Hope you're assuming there's always a single copy of common_flags in
the process.
This isn't the case for e.g. ASan+UBSan on Mac, but that's a broken
setup.

What if we let the tools protect specific flags (by adding a bool to
each flag) once they set their values (when setting the default
values, parsing xSAN_OPTIONS, etc)
Generally, I think we need to protect each flag when anyone's trying
to read it (so we'll need getters and setters for the flags), unless
the code opts out of protecting a particular flag (e.g. if we want to
allow it to be changed dynamically).
This way we can prevent the flags from being overridden with conflicting
values.

Irrespective of whether we protect the flags or not, I think we need
to distinguish between the flags that can be set once and flags that
may change their values (e.g. on Android).

Ideally, things that may change their values should not be flags at all.

I think this is a good idea. We probably should enforce that flags
(especially common flags)
are only allowed to be modified when they are parsed early during tool
initialization by these methods:
1) SetCommonFlagsDefaults()
2) <some function/marco that would allow the tool to override some default
values).
3) ParseCommonFlagsFromString()

After that the flags can only be accessed via common_flags(), that would
return const pointer.
This would require some changes to runtime code, which seems like the
Right Thing. E.g.
"allocator_may_return_null" flag should really be a member of Allocator
object - we already
modify this flag in the unit tests and during ASan activation (which can
introduce a data race).

If you agree that we should follow this path now, I will fix that.

Good!

Also, do you have more
comments regarding common flags initialization in ASan/LSan/UBSan bundle I
propose?

Not much. But I'd love to see the code, preferably in small incremental
chunks