RFC: Attribute to suppress coverage mapping for functions

Hi,

I'd like to add a new attribute which can be used to suppress code coverage
reporting on the function level. This is meant to work with clang's
frontend-based coverage implementation [1]. Here are some potential users of
the new attribute:

  * llvm_unreachable_internal.

  * The various dump() functions for llvm Values, AST nodes, etc.

  * Methods in a base class which just call llvm_unreachable.

These functions are usually not covered. It's not practical to write "death
tests" for all of them, and it's also unhelpful to report missing coverage for
them.

I'd like to be able to write this in C:

  void foo() __attribute__((nocoverage)) { llvm_unreachable("boom!"); }

And this in C++11:

  void foo() [[clang::nocoverage]] { llvm_unreachable("boom!"); }

Here are some alternatives and why I think they're not as good:

  * Define a preprocessor macro when -fcoverage-mapping is enabled.

    Conditionally compiling code based on whether code coverage is enabled
    sounds scary. We shouldn't make it easy (or possible?) to change the
    meaning of a program by enabling coverage.

  * Pass a function blacklist to llvm-cov.

    The blacklist would have to live separately from the source code, and may
    get out of sync. We also would go through the trouble of emitting coverage
    mappings for functions even though they aren't needed.

  * Add a pair of pragmas to arbitrarily stop/resume coverage mapping.

    We'd need some extra diagnostics to catch abuse of the pragmas. It also
    requires more typing in the common case (disabling coverage at the function
    level).

  * Look at the function CFG. If all paths through the function can be shown to
    reach __builtin_unreachable(), don't create coverage mappings for the
    function.

    I'm worried this might be complicated and inflexible. It wouldn't let us
    mark dump() functions as uncovered.

Wdyt?

vedant

[1] http://clang.llvm.org/docs/SourceBasedCodeCoverage.html

Hi,

I'd like to add a new attribute which can be used to suppress code coverage
reporting on the function level. This is meant to work with clang's
frontend-based coverage implementation [1]. Here are some potential users of
the new attribute:

* llvm_unreachable_internal.

Shouldn’t LLVM optimize out the instrumentation for it anyway?

* The various dump() functions for llvm Values, AST nodes, etc.

* Methods in a base class which just call llvm_unreachable.

Same.

These functions are usually not covered. It's not practical to write "death
tests" for all of them, and it's also unhelpful to report missing coverage for
them.

I'd like to be able to write this in C:

void foo() __attribute__((nocoverage)) { llvm_unreachable("boom!"); }

And this in C++11:

void foo() [[clang::nocoverage]] { llvm_unreachable("boom!"); }

Here are some alternatives and why I think they're not as good:

* Define a preprocessor macro when -fcoverage-mapping is enabled.

   Conditionally compiling code based on whether code coverage is enabled
   sounds scary. We shouldn't make it easy (or possible?) to change the
   meaning of a program by enabling coverage.

* Pass a function blacklist to llvm-cov.

   The blacklist would have to live separately from the source code, and may
   get out of sync. We also would go through the trouble of emitting coverage
   mappings for functions even though they aren't needed.

* Add a pair of pragmas to arbitrarily stop/resume coverage mapping.

   We'd need some extra diagnostics to catch abuse of the pragmas. It also
   requires more typing in the common case (disabling coverage at the function
   level).

* Look at the function CFG. If all paths through the function can be shown to
   reach __builtin_unreachable(), don't create coverage mappings for the
   function.

Oh I see: even if LLVM optimizes out the counters, you still have ranges/location that will be marked as uncovered in the report, right?

   I'm worried this might be complicated and inflexible. It wouldn't let us
   mark dump() functions as uncovered.

Wdyt?

I agree it is useful to be able to exclude part of a file to make sure the percentage of coverage is more accurate!

Thanks!

Hi,

I'd like to add a new attribute which can be used to suppress code coverage
reporting on the function level. This is meant to work with clang's
frontend-based coverage implementation [1]. Here are some potential users of
the new attribute:

* llvm_unreachable_internal.

Shouldn’t LLVM optimize out the instrumentation for it anyway?

* The various dump() functions for llvm Values, AST nodes, etc.

* Methods in a base class which just call llvm_unreachable.

Same.

These functions are usually not covered. It's not practical to write "death
tests" for all of them, and it's also unhelpful to report missing coverage for
them.

I'd like to be able to write this in C:

void foo() __attribute__((nocoverage)) { llvm_unreachable("boom!"); }

And this in C++11:

void foo() [[clang::nocoverage]] { llvm_unreachable("boom!"); }

Here are some alternatives and why I think they're not as good:

* Define a preprocessor macro when -fcoverage-mapping is enabled.

  Conditionally compiling code based on whether code coverage is enabled
  sounds scary. We shouldn't make it easy (or possible?) to change the
  meaning of a program by enabling coverage.

* Pass a function blacklist to llvm-cov.

  The blacklist would have to live separately from the source code, and may
  get out of sync. We also would go through the trouble of emitting coverage
  mappings for functions even though they aren't needed.

* Add a pair of pragmas to arbitrarily stop/resume coverage mapping.

  We'd need some extra diagnostics to catch abuse of the pragmas. It also
  requires more typing in the common case (disabling coverage at the function
  level).

* Look at the function CFG. If all paths through the function can be shown to
  reach __builtin_unreachable(), don't create coverage mappings for the
  function.

Oh I see: even if LLVM optimizes out the counters, you still have ranges/location that will be marked as uncovered in the report, right?

Yes.

  I'm worried this might be complicated and inflexible. It wouldn't let us
  mark dump() functions as uncovered.

Wdyt?

I agree it is useful to be able to exclude part of a file to make sure the percentage of coverage is more accurate!

Right, this is the main motivation.

thanks,
vedant

Hi,

I'd like to add a new attribute which can be used to suppress code coverage
reporting on the function level. This is meant to work with clang's
frontend-based coverage implementation [1]. Here are some potential users
of
the new attribute:

  * llvm_unreachable_internal.

  * The various dump() functions for llvm Values, AST nodes, etc.

  * Methods in a base class which just call llvm_unreachable.

These functions are usually not covered. It's not practical to write "death
tests" for all of them, and it's also unhelpful to report missing coverage
for
them.

I'd like to be able to write this in C:

  void foo() __attribute__((nocoverage)) { llvm_unreachable("boom!"); }

And this in C++11:

  void foo() [[clang::nocoverage]] { llvm_unreachable("boom!"); }

Here are some alternatives and why I think they're not as good:

  * Define a preprocessor macro when -fcoverage-mapping is enabled.

    Conditionally compiling code based on whether code coverage is enabled
    sounds scary. We shouldn't make it easy (or possible?) to change the
    meaning of a program by enabling coverage.

  * Pass a function blacklist to llvm-cov.

    The blacklist would have to live separately from the source code, and
may
    get out of sync. We also would go through the trouble of emitting
coverage
    mappings for functions even though they aren't needed.

  * Add a pair of pragmas to arbitrarily stop/resume coverage mapping.

    We'd need some extra diagnostics to catch abuse of the pragmas. It also
    requires more typing in the common case (disabling coverage at the
function
    level).

  * Look at the function CFG. If all paths through the function can be
shown to
    reach __builtin_unreachable(), don't create coverage mappings for the
    function.

    I'm worried this might be complicated and inflexible. It wouldn't let
us
    mark dump() functions as uncovered.

Wdyt?

This definitely sounds like a good idea. I imagine that it will be fairly
easy to disable coverage
for regions which contains just one statement that references a declaration
with
a “nocoverage” attribute. Do you think it will be difficult to handle cases
where
regions have normal statements that are covered mixed with statements which
reference
declarations with a “nocoverage” attribute? Moreover, how will you handle
the case of a
sub-expression referencing a “nocoverage” declaration?

By the way, do you think that we should have something similar for macros?
We obviously
wouldn’t be able to use the attribute, but it seems like it would be useful
to suppress coverage
for certain macros.

Alex

I imagine that it will be fairly easy to disable coverage
for regions which contains just one statement that references a declaration with
a “nocoverage” attribute.

The way I'd imagine this working is with a new VisitCallExpr. If we grab the
callee decl and figure out that it's marked nocoverage, we could (1) terminate
the current region before the location of the call, (2) skip past the CallExpr,
and (3) and push a new region with a counter that references the old region.

I can see how this would be useful. If "foo" is marked "nocoverage", and "bar"
just calls "foo", then maybe the body of "bar" doesn't really need coverage
reporting. However, I'm not convinced this makes the reports much more useful,
and we'd have to spit out extra CounterMappingRegions to achieve this.

Do you think it will be difficult to handle cases where
regions have normal statements that are covered mixed with statements which reference
declarations with a “nocoverage” attribute? Moreover, how will you handle the case of a
sub-expression referencing a “nocoverage” declaration?

I think the scheme I hand-waved through above could work. I'm not sure whether
or not we'd also need to add the source range corresponding to the call marked
"nocoverage" to the skipped regions list.

The question is whether we really want to go through the trouble. If "foo" is
marked nocoverage, and we e.g use its return value in some expresssion, do we
really want to suppress coverage information for that code region? That sounds
like it could get sketchy.

By the way, do you think that we should have something similar for macros? We obviously
wouldn’t be able to use the attribute, but it seems like it would be useful to suppress coverage
for certain macros.

I haven't seen any requests for this. I hadn't really thought about macros.

If that's something we want, it could be a good argument for implementing
coverage suppression with pragmas. It'd be interesting to here feedback about
this.

vedant

Vedant Kumar via cfe-dev <cfe-dev@lists.llvm.org> writes:

Hi,

I'd like to add a new attribute which can be used to suppress code coverage
reporting on the function level. This is meant to work with clang's
frontend-based coverage implementation [1]. Here are some potential users of
the new attribute:

  * llvm_unreachable_internal.

Sure, but see below.

  * The various dump() functions for llvm Values, AST nodes, etc.

For things like dump() you can probably just key on used/unused
attributes. We statically know at compile time it isn't actually used,
so we also know that coverage isn't interesting.

  * Methods in a base class which just call llvm_unreachable.

These are usually indicative of bad design no? Why aren't they deleted
or abstract instead? I'm not sure how valuable suppressing them from
coverage is.

These functions are usually not covered. It's not practical to write "death
tests" for all of them, and it's also unhelpful to report missing coverage for
them.

I'd like to be able to write this in C:

  void foo() __attribute__((nocoverage)) { llvm_unreachable("boom!"); }

And this in C++11:

  void foo() [[clang::nocoverage]] { llvm_unreachable("boom!"); }

I don't think this really goes far enough to be worthwhile. The number
of functions that are intentionally completely uncovered is minuscule
(or have things like __attribute__((__used__)) that we should be able to
key off of already).

I guess what you're really trying to do longer term is gain the ability
to recognize blocks in the caller that aren't supposed to be reached
(like a __builtin_unreachable after a fully covered switch) and suppress
coverage for them. Admittedly these annotations would make that easier,
but then we need to propagate them up through the call graph anyway like
your suggestion below, so recognizing __builtin_unreachable and abort is
probably enough.

Here are some alternatives and why I think they're not as good:

  * Define a preprocessor macro when -fcoverage-mapping is enabled.

    Conditionally compiling code based on whether code coverage is enabled
    sounds scary. We shouldn't make it easy (or possible?) to change the
    meaning of a program by enabling coverage.

Agreed, I don't like this either.

  * Pass a function blacklist to llvm-cov.

    The blacklist would have to live separately from the source code, and may
    get out of sync. We also would go through the trouble of emitting coverage
    mappings for functions even though they aren't needed.

  * Add a pair of pragmas to arbitrarily stop/resume coverage mapping.

    We'd need some extra diagnostics to catch abuse of the pragmas. It also
    requires more typing in the common case (disabling coverage at the function
    level).

This seems quite a bit more awkward, so I think it wouldn't be useable
enough to be worth it, but it is strictly more flexible if you ended up
wanting to annotate unreachable blocks directly.

  * Look at the function CFG. If all paths through the function can be shown to
    reach __builtin_unreachable(), don't create coverage mappings for the
    function.

    I'm worried this might be complicated and inflexible. It wouldn't let us
    mark dump() functions as uncovered.

Like I said above, we basically need something like this anyway if we
want to do anything more interesting with the attribute, no?

Vedant Kumar via cfe-dev <cfe-dev@lists.llvm.org> writes:

Hi,

I'd like to add a new attribute which can be used to suppress code coverage
reporting on the function level. This is meant to work with clang's
frontend-based coverage implementation [1]. Here are some potential users of
the new attribute:

* llvm_unreachable_internal.

Sure, but see below.

* The various dump() functions for llvm Values, AST nodes, etc.

For things like dump() you can probably just key on used/unused
attributes. We statically know at compile time it isn't actually used,
so we also know that coverage isn't interesting.

I think not having coverage for functions marked "used" is a little iffy since
users are free to apply that attribute to interesting functions. But, you raise
a great point about the unused attribute -- we should do something with it.

* Methods in a base class which just call llvm_unreachable.

These are usually indicative of bad design no? Why aren't they deleted
or abstract instead? I'm not sure how valuable suppressing them from
coverage is.

Sorry, that was a thinko. I meant to write 'derived'. Example:

  struct Base {
    virtual int foo() { return 0; }
  };
  
  struct Derived : public Base {
    virtual int foo() = delete; //< Illegal.
    virtual int foo() { unreachable(); } //< OK, but uncovered.
  };

The derived version of foo() should just be marked unused.

These functions are usually not covered. It's not practical to write "death
tests" for all of them, and it's also unhelpful to report missing coverage for
them.

I'd like to be able to write this in C:

void foo() __attribute__((nocoverage)) { llvm_unreachable("boom!"); }

And this in C++11:

void foo() [[clang::nocoverage]] { llvm_unreachable("boom!"); }

I don't think this really goes far enough to be worthwhile. The number
of functions that are intentionally completely uncovered is minuscule
(or have things like __attribute__((__used__)) that we should be able to
key off of already).

We could bump up the function coverage of a lot of the files in llvm by
suppressing coverage reporting at the function level. Though, admittedly, not
by much, and a lot of the candidate functions *are* already marked 'used'.

I guess what you're really trying to do longer term is gain the ability
to recognize blocks in the caller that aren't supposed to be reached
(like a __builtin_unreachable after a fully covered switch) and suppress
coverage for them. Admittedly these annotations would make that easier,
but then we need to propagate them up through the call graph anyway like
your suggestion below, so recognizing __builtin_unreachable and abort is
probably enough.

Yeah, anything marked noreturn. We don't need a new attribute to do this.

Here are some alternatives and why I think they're not as good:

* Define a preprocessor macro when -fcoverage-mapping is enabled.

   Conditionally compiling code based on whether code coverage is enabled
   sounds scary. We shouldn't make it easy (or possible?) to change the
   meaning of a program by enabling coverage.

Agreed, I don't like this either.

* Pass a function blacklist to llvm-cov.

   The blacklist would have to live separately from the source code, and may
   get out of sync. We also would go through the trouble of emitting coverage
   mappings for functions even though they aren't needed.

* Add a pair of pragmas to arbitrarily stop/resume coverage mapping.

   We'd need some extra diagnostics to catch abuse of the pragmas. It also
   requires more typing in the common case (disabling coverage at the function
   level).

This seems quite a bit more awkward, so I think it wouldn't be useable
enough to be worth it, but it is strictly more flexible if you ended up
wanting to annotate unreachable blocks directly.

Agreed. I'm taking this to mean that there isn't a lot of interest in
suppressing coverage mapping generation for specific macros.

* Look at the function CFG. If all paths through the function can be shown to
   reach __builtin_unreachable(), don't create coverage mappings for the
   function.

   I'm worried this might be complicated and inflexible. It wouldn't let us
   mark dump() functions as uncovered.

Like I said above, we basically need something like this anyway if we
want to do anything more interesting with the attribute, no?

Hm, do we really need a CFG?

I was thinking we could add an 'Unreachable' bit to SourceMappingRegion. Then
when we're emitting source regions, treat unreachable regions like skipped
ones (i.e makeSkipped(), not makeRegion()):

  void VisitCallExpr(const CallExpr *E) {
    if (E is not marked 'noreturn')
      just visit its children and return, as we normally would

    terminateRegion(E); //< Push a new region.
    getRegion().setUnreachable(true);
  }

** hand-waving intensifies **

vedant

Vedant Kumar via cfe-dev <cfe-dev@lists.llvm.org> writes:

Hi,

I'd like to add a new attribute which can be used to suppress code coverage
reporting on the function level. This is meant to work with clang's
frontend-based coverage implementation [1]. Here are some potential users of
the new attribute:

* llvm_unreachable_internal.

Sure, but see below.

* The various dump() functions for llvm Values, AST nodes, etc.

For things like dump() you can probably just key on used/unused
attributes. We statically know at compile time it isn't actually used,
so we also know that coverage isn't interesting.

* Methods in a base class which just call llvm_unreachable.

These are usually indicative of bad design no? Why aren't they deleted
or abstract instead? I'm not sure how valuable suppressing them from
coverage is.

Isn’t happening when you have an "optional feature”, like if XX is enabled then this method has to be overridden/implemented, otherwise it is not expected to be called.

These functions are usually not covered. It's not practical to write "death
tests" for all of them, and it's also unhelpful to report missing coverage for
them.

I'd like to be able to write this in C:

void foo() __attribute__((nocoverage)) { llvm_unreachable("boom!"); }

And this in C++11:

void foo() [[clang::nocoverage]] { llvm_unreachable("boom!"); }

I don't think this really goes far enough to be worthwhile. The number
of functions that are intentionally completely uncovered is minuscule
(or have things like __attribute__((__used__)) that we should be able to
key off of already).

__attribute__((__used__)) is intended to be added on method called from inline assembly.
It is not clear why we should key them off?

I guess what you're really trying to do longer term is gain the ability
to recognize blocks in the caller that aren't supposed to be reached
(like a __builtin_unreachable after a fully covered switch) and suppress
coverage for them. Admittedly these annotations would make that easier,
but then we need to propagate them up through the call graph anyway like
your suggestion below, so recognizing __builtin_unreachable and abort is
probably enough.

Here are some alternatives and why I think they're not as good:

* Define a preprocessor macro when -fcoverage-mapping is enabled.

   Conditionally compiling code based on whether code coverage is enabled
   sounds scary. We shouldn't make it easy (or possible?) to change the
   meaning of a program by enabling coverage.

Agreed, I don't like this either.

* Pass a function blacklist to llvm-cov.

   The blacklist would have to live separately from the source code, and may
   get out of sync. We also would go through the trouble of emitting coverage
   mappings for functions even though they aren't needed.

* Add a pair of pragmas to arbitrarily stop/resume coverage mapping.

   We'd need some extra diagnostics to catch abuse of the pragmas. It also
   requires more typing in the common case (disabling coverage at the function
   level).

This seems quite a bit more awkward, so I think it wouldn't be useable
enough to be worth it, but it is strictly more flexible if you ended up
wanting to annotate unreachable blocks directly.

* Look at the function CFG. If all paths through the function can be shown to
   reach __builtin_unreachable(), don't create coverage mappings for the
   function.

   I'm worried this might be complicated and inflexible. It wouldn't let us
   mark dump() functions as uncovered.

Like I said above, we basically need something like this anyway if we
want to do anything more interesting with the attribute, no?

I agree, not creating coverage for unreachable blocks would be nice as well!
Not clear if a CFG is needed though, I expect that most of the time the frontend can only see a single unreachable block that should be removed from the coverage map.

To illustrate what I mean, see TargetLowering::LowerOperation for instance.

Mehdi Amini <mehdi.amini@apple.com> writes:

Vedant Kumar via cfe-dev <cfe-dev@lists.llvm.org> writes:

Hi,

I'd like to add a new attribute which can be used to suppress code coverage
reporting on the function level. This is meant to work with clang's
frontend-based coverage implementation [1]. Here are some potential users of
the new attribute:

* llvm_unreachable_internal.

Sure, but see below.

* The various dump() functions for llvm Values, AST nodes, etc.

For things like dump() you can probably just key on used/unused
attributes. We statically know at compile time it isn't actually used,
so we also know that coverage isn't interesting.

* Methods in a base class which just call llvm_unreachable.

These are usually indicative of bad design no? Why aren't they deleted
or abstract instead? I'm not sure how valuable suppressing them from
coverage is.

Isn’t happening when you have an "optional feature”, like if XX is
enabled then this method has to be overridden/implemented, otherwise
it is not expected to be called.

Fair enough, I don't really like that pattern but it does happen.

These functions are usually not covered. It's not practical to write "death
tests" for all of them, and it's also unhelpful to report missing
coverage for
them.

I'd like to be able to write this in C:

void foo() __attribute__((nocoverage)) { llvm_unreachable("boom!"); }

And this in C++11:

void foo() [[clang::nocoverage]] { llvm_unreachable("boom!"); }

I don't think this really goes far enough to be worthwhile. The number
of functions that are intentionally completely uncovered is minuscule
(or have things like __attribute__((__used__)) that we should be able to
key off of already).

__attribute__((__used__)) is intended to be added on method called
from inline assembly.
It is not clear why we should key them off?

The place where I see __attribute__((__used__)) come up pretty often is
in methods that aren't used but you want to emit in the final binary so
that people can call them in a debugger. An example of this is
LLVM_DUMP_METHOD. For that case, the programmer promises "hey this
really is used", but it isn't actually covered since its never called.

OTOH, coverage for things called by inline assembly is definitely
interesting, so it probably isn't safe to suppress coverage for all
__used__ functions.