Using source-based code coverage on baremetal

Hi all,

I think using code coverage on baremetal has come up once or twice on llvmdev, but I don't think anyone has actually written up how the workflow works, or what issues come up. This description is based on work done together with my colleague Weiming Zhao.

By "baremetal" here, I mean an embedded environment without an operating system. We specifically used a ARM target with semihosting, but similar steps should work elsewhere.

The workflow:

1. First, you need a copy of libclangrt_profile.a, stripped down for the baremetal target. (More on this below.)

2. Then, you need to change the source code to call into it; since a baremetal image doesn't exit like an operating system process, you need to insert code somewhere to write out the profile data yourself. We used __llvm_profile_get_size_for_buffer() and __llvm_profile_write_buf() for this (and semihosting APIs to transfer the resulting buffer to the host). (A minimal sketch of this is included after the workflow list.)

3. Then, you have to edit your linker script to include the necessary sections. __llvm_prf_names, __llvm_prf_data, and __llvm_prf_cnts need to be included in the final image. And __llvm_covmap needs to be in a non-allocatable section in the ELF output. And depending on how your linker behaves, you might need to explicitly mark all of these sections KEEP so parts don't get dropped. This is actually the trickiest part, in the sense that messing it up leads to obscure issues which are difficult to debug. At best, you get an error message like "No coverage data found" or "Malformed instrumentation profile data". Or, if you're using a build of LLVM more than a few months old, coverage data can be silently dropped. (An example linker-script fragment is included after the list.)

4. Then, you add "-fprofile-instr-generate -fcoverage-mapping -mllvm -enable-value-profiling=false" to your CFLAGS.

5. Then, you build and run the image in the semihosted environment. If everything works, you get a file with raw profile data, and use the normal workflow to convert it into a report.
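
To make step 2 concrete, here is a minimal sketch. The two runtime declarations come from compiler-rt's InstrProfiling.h (where the buffer writer is spelled __llvm_profile_write_buffer()), while semihost_write() is a placeholder for whatever host-transfer mechanism your environment provides:

    #include <stdint.h>

    /* Declarations from compiler-rt's profile runtime (InstrProfiling.h). */
    uint64_t __llvm_profile_get_size_for_buffer(void);
    int __llvm_profile_write_buffer(char *Buffer);

    /* Placeholder for a semihosting (or other) host-transfer routine. */
    extern void semihost_write(const char *Name, const char *Data,
                               uint64_t Size);

    /* Statically sized staging buffer; size it for your image. */
    static char ProfileBuffer[64 * 1024];

    void dump_profile(void) {
      uint64_t Size = __llvm_profile_get_size_for_buffer();
      if (Size > sizeof(ProfileBuffer))
        return; /* staging buffer too small; nothing is written */
      if (__llvm_profile_write_buffer(ProfileBuffer) == 0)
        semihost_write("default.profraw", ProfileBuffer, Size);
    }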

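For step 3, an illustrative GNU ld fragment (the RAM region and the placement are assumptions; adjust to your own script). The counters have to live in writable memory, while __llvm_covmap only needs to reach the ELF file, so one way to keep it out of the loaded image is a non-allocatable (INFO) output section:

    .data :
    {
      . = ALIGN(4);
      PROVIDE(__start___llvm_prf_cnts = .);
      KEEP(*(__llvm_prf_cnts))
      PROVIDE(__stop___llvm_prf_cnts = .);

      . = ALIGN(4);
      PROVIDE(__start___llvm_prf_data = .);
      KEEP(*(__llvm_prf_data))
      PROVIDE(__stop___llvm_prf_data = .);

      . = ALIGN(4);
      PROVIDE(__start___llvm_prf_names = .);
      KEEP(*(__llvm_prf_names))
      PROVIDE(__stop___llvm_prf_names = .);
    } > RAM

    /* Coverage mapping: keep it in the ELF, but out of the loaded image. */
    __llvm_covmap 0 (INFO) : { KEEP(*(__llvm_covmap)) }
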
Areas that required LLVM changes:

1. The copy of libclangrt_profile.a for the target. Given that we already were using builtins from compiler-rt, the primary changes required are enabling the profile library and excluding a bunch of files from the build (since baremetal doesn't have a filesystem, system calls, etc.). I'll look into posting patches when I have time, but it might take me a little while to figure out how to cleanly modify the build, and verify everything actually works on trunk. It looks like there's a CMake variable COMPILER_RT_BAREMETAL_BUILD which is supposed to be turned on for this sort of environment? (A rough compiler-rt configure invocation is sketched after this list.)

2. Changing the compiler and compiler-rt to use __start and __end symbols to find the sections, rather than .init code. This isn't strictly necessary, but our linker supports __start and __end, and this was easier than changing the baremetal image to handle a .init section. See needsRuntimeRegistrationOfSectionRange in lib/Transforms/Instrumentation/InstrProfiling.cpp; we currently only whitelist a few platforms. Not sure what would be appropriate here; maybe we could assume any *-none-* triple supports __start and __end symbols? Or maybe control it with a flag somehow? Or something else I'm not thinking of?
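
For item 1, and only as a rough, untested sketch (COMPILER_RT_BAREMETAL_BUILD aside, the options below are standard compiler-rt/CMake settings, but the exact set and the target triple will vary by environment), a standalone compiler-rt configure might look something like:

    cmake -G Ninja ../compiler-rt \
      -DCOMPILER_RT_BAREMETAL_BUILD=ON \
      -DCOMPILER_RT_BUILD_PROFILE=ON \
      -DCOMPILER_RT_BUILD_SANITIZERS=OFF \
      -DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON \
      -DCMAKE_SYSTEM_NAME=Generic \
      -DCMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY \
      -DCMAKE_C_COMPILER=clang \
      -DCMAKE_C_COMPILER_TARGET=armv7em-none-eabi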

Other problem areas:

1. We turned value profiling off because we were running into runtime issues; specifically, we had infinite recursion because we instrumented malloc. It isn't really important to have in this context (coverage reports currently don't use the data at all), but is there some way to improve this workflow to involve fewer magic command-line flags?

2. The error messages produced by llvm-profdata and llvm-cov tools could probably be improved, to describe the actual issue in more detail.

Next steps:

The next steps here depend on community interest, I guess... has anyone else tried something like this? Is anyone interested in my patches? Should we add a section to the coverage documentation?

-Eli

Hello Eli,

I think that this would be very useful for bare-metal systems,
particularly for those without trace capture or the software to
analyze it. I seem to remember an internal proof of concept experiment
to see if we could get profiling working in an embedded toolchain,
which we did after making similar, but toolchain-specific,
linker/library modifications to the ones you made.

I think the most valuable contribution would be example
documentation, such as the steps you outline above. Moreover, I think
that more documentation and recipes for building and testing
compiler-rt for bare-metal would be extremely valuable.

I'm not much of an expert on profiling or compiler-rt, but I'm willing
to help out where I can.

My understanding is that COMPILER_RT_BAREMETAL_BUILD is indeed used to
exclude objects from the library that would be incompatible with a
bare-metal target. It looks like it was introduced in
https://reviews.llvm.org/D33018.

I've not looked into how this works in compiler-rt/compiler right now
so apologies if this is somewhat speculative.
As I understand it, the section __start and __end symbols are a GNU ld
convention. I think one way of baking this in would be to separate out
functions that access the __start and __end symbols into
getSectionStart(), getSectionEnd() functions (probably weak
definitions) that are in their own .o file. If a bare-metal toolchain
has a different set of linker magic/convention it can implement its
own getSectionStart() and getSectionEnd(). On the assumption that most
bare-metal toolchains have their own target and driver a flag sounds
like a good idea.
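
As a sketch of that idea (the function names follow the suggestion above; the single hard-coded section is just for illustration, and the real runtime would need a pair per profiling section), the default definitions could lean on the GNU ld __start_/__stop_ symbols and be overridable by a toolchain-specific object:

    /* Symbols GNU ld defines for an input section named __llvm_prf_cnts
       once something references them. */
    extern char __start___llvm_prf_cnts[];
    extern char __stop___llvm_prf_cnts[];

    /* Weak defaults; a toolchain with different linker magic provides
       strong definitions in its own object file instead. */
    __attribute__((weak)) char *getSectionStart(void) {
      return __start___llvm_prf_cnts;
    }

    __attribute__((weak)) char *getSectionEnd(void) {
      return __stop___llvm_prf_cnts;
    }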

Peter

"Friedman, Eli via llvm-dev" <llvm-dev@lists.llvm.org> writes:

Hi all,

I think using code coverage on baremetal has come up once or twice on
llvmdev, but I don't think anyone has actually written up how the
workflow works, or what issues come up. This description is based on
work done together with my colleague Weiming Zhao.

By "baremetal" here, I mean an embedded environment without an
operating system. We specifically used an ARM target with semihosting,
but similar steps should work elsewhere.

The workflow:

1. First, you need a copy of libclangrt_profile.a, stripped down for
the baremetal target. (More on this below.)

2. Then, you need to change the source code to call into it; since a
baremetal image doesn't exit like an operating system process, you
need to insert code somewhere to write out the profile data yourself.
We used __llvm_profile_get_size_for_buffer() and
__llvm_profile_write_buf() for this (and semihosting APIs to transfer
the resulting buffer to the host).

3. Then, you have to edit your linker script to include the necessary
sections. __llvm_prf_names, __llvm_prf_data, and __llvm_prf_cnts need
to be included in the final image. And __llvm_covmap needs to be in a
non-allocatable section in the ELF output. And depending on how your
linker behaves, you might need to explicitly mark all of these
sections KEEP so parts don't get dropped. This is actually the
trickiest part, in the sense that messing it up leads to obscure issues
which are difficult to debug. At best, you get an error message like
"No coverage data found" or "Malformed instrumentation profile data".
Or, if you're using a build of LLVM more than a few months old,
coverage data can be silently dropped.

4. Then, you add "-fprofile-instr-generate -fcoverage-mapping -mllvm
-enable-value-profiling=false" to your CFLAGS.

5. Then, you build and run the image in the semihosted environment. If
everything works, you get a file with raw profile data, and use the
normal workflow to convert it into a report.

Areas that required LLVM changes:

1. The copy of libclangrt_profile.a for the target. Given that we
already were using builtins from compiler-rt, the primary changes
required are enabling the profile library and excluding a bunch of
files from the build (since baremetal doesn't have a filesystem,
system calls, etc.). I'll look into posting patches when I have time,
but it might take me a little while to figure out how to
cleanly modify the build, and verify everything actually works on
trunk. It looks like there's a CMake variable
COMPILER_RT_BAREMETAL_BUILD which is supposed to be turned on for this
sort of environment?

I didn't know about COMPILER_RT_BAREMETAL_BUILD, but wiring it up to set
up the profiling libraries in the appropriate config sounds reasonable.
As I'm sure you noticed, the library is designed to be used this way
(with parts like filesystem access in their own TUs), but there isn't
really a build target to generate a ready-made library for it. I'm a
little worried that it would be hard to set that up to be flexible
enough for the different environments that want this, but we can worry
about that if it comes up.

Most of the people I've seen do this just build normally and link in the
objects they need.

2. Changing the compiler and compiler-rt to use __start and __end
symbols to find the sections, rather than .init code. This isn't
strictly necessary, but our linker supports __start and __end, and
this was easier than changing the baremetal image to handle a .init
section. See needsRuntimeRegistrationOfSectionRange in
lib/Transforms/Instrumentation/InstrProfiling.cpp; we currently only
whitelist a few platforms. Not sure what would be appropriate here;
maybe we could assume any *-none-* triple supports __start and __end
symbols? Or maybe control it with a flag somehow? Or something else
I'm not thinking of?

The symbols used to avoid runtime registration are largely linker
specific (see the darwin versions for an example of differing ones), so
for unusual toolchains it'll definitely be tricky to set up good
defaults. It's probably safest to set up a flag to say which setup to
use. As above, I think most current users are just setting up their
build to include the ones they want manually.

Other problem areas:

1. We turned value profiling off because we were running into runtime
issues; specifically, we had infinite recursion because we
instrumented malloc. It isn't really important to have in this
context (coverage reports currently don't use the data at all), but is
there some way to improve this workflow to involve fewer magic
command-line flags?

The value profiling was never set up to avoid libc sufficiently to be
used in this kind of context. I guess if we start building a baremetal
profiling library target we'd just have it disabled there.

2. The error messages produced by llvm-profdata and llvm-cov tools
could probably be improved, to describe the actual issue in more
detail.

This is definitely true, but I don't think anyone's been spending time
on it lately.

Next steps:

The next steps here depend on community interest, I guess... has
anyone else tried something like this? Is anyone interested in my
patches? Should we add a section to the coverage documentation?

I know of several groups who've done the same, and I've given vague
advice as to how to do it on many occasions. Adding some documentation
on this would be helpful to a lot of people :)

Hello Eli,

I think that this would be very useful for bare-metal systems,
particularly for those without trace capture or the software to
analyze it. I seem to remember an internal proof of concept experiment
to see if we could get profiling working in an embedded toolchain,
which we did after making similar, but toolchain-specific,
linker/library modifications to the ones you made.

I think the most valuable contribution would be example
documentation, such as the steps you outline above.

+1

Moreover I think
that more documentation and recipes for building and testing
compiler-rt for bare-metal would be extremely valuable.

I'm not much of an expert on profiling or compiler-rt, but I'm willing
to help out where I can.

My understanding is that COMPILER_RT_BAREMETAL_BUILD is indeed used to
exclude objects from the library that would be incompatible with a
bare-metal target. It looks like it was introduced in
https://reviews.llvm.org/D33018.

I've not looked into how this works in compiler-rt/compiler right now
so apologies if this is somewhat speculative.
As I understand it, the section __start and __end symbols are a GNU ld
convention. I think one way of baking this in would be to separate out
functions that access the __start and __end symbols into
getSectionStart(), getSectionEnd() functions (probably weak
definitions) that are in their own .o file. If a bare-metal toolchain
has a different set of linker magic/convention it can implement its
own getSectionStart() and getSectionEnd(). On the assumption that most
bare-metal toolchains have their own target and driver a flag sounds
like a good idea.

Something like this exists in the form of the __llvm_profile_begin_data, __llvm_profile_end_data, etc. helpers in the runtime. It's not quite enough, because the host compiler also needs to know not to emit a call to __llvm_profile_register_function.

Peter

Hi all,

I think using code coverage on baremetal has come up once or twice on
llvmdev, but I don't think anyone has actually written up how the workflow
works, or what issues come up. This description is based on work done
together with my colleague Weiming Zhao.

By "baremetal" here, I mean an embedded environment without an operating
system. We specifically used an ARM target with semihosting, but similar
steps should work elsewhere.

The workflow:

1. First, you need a copy of libclangrt_profile.a, stripped down for the
baremetal target. (More on this below.)

2. Then, you need to change the source code to call into it; since a
baremetal image doesn't exit like an operating system process, you need to
insert code somewhere to write out the profile data yourself. We used
__llvm_profile_get_size_for_buffer() and __llvm_profile_write_buf() for this
(and semihosting APIs to transfer the resulting buffer to the host).

3. Then, you have to edit your linker script to include the necessary
sections. __llvm_prf_names, __llvm_prf_data, and __llvm_prf_cnts need to be
included in the final image. And __llvm_covmap needs to be in a
non-allocatable section in the ELF output. And depending on how your linker
behaves, you might need to explicitly mark all of these sections KEEP so
parts don't get dropped. This is actually the trickiest part, in the sense
that messing it up leads to obscure issues which are difficult to debug. At
best, you get an error message like "No coverage data found" or "Malformed
instrumentation profile data". Or, if you're using a build of LLVM more
than a few months old, coverage data can be silently dropped.

4. Then, you add "-fprofile-instr-generate -fcoverage-mapping -mllvm
-enable-value-profiling=false" to your CFLAGS.

5. Then, you build and run the image in the semihosted environment. If
everything works, you get a file with raw profile data, and use the normal
workflow to convert it into a report.

Areas that required LLVM changes:

1. The copy of libclangrt_profile.a for the target. Given that we already
were using builtins from compiler-rt, the primary changes required are
enabling the profile library and excluding a bunch of files from the build
(since baremetal doesn't have a filesystem, system calls, etc.). I'll look
into posting patches when I have time, but it might take me a little while
to figure out how to cleanly modify the build, and verify everything
actually works on trunk. It looks like there's a CMake variable
COMPILER_RT_BAREMETAL_BUILD which is supposed to be turned on for this sort
of environment?

I'm in favor of using that flag to configure a bare metal runtime. We should be able to adapt some existing tests to work with a stripped-down runtime. Some tests look like they would work without modifications (perhaps test/profile/instrprof-merge.c?). These tests could be moved to a baremetal subdirectory and enabled when COMPILER_RT_BAREMETAL_BUILD is on.

2. Changing the compiler and compiler-rt to use __start and __end symbols to
find the sections, rather than .init code. This isn't strictly necessary,
but our linker supports __start and __end, and this was easier than changing
the baremetal image to handle a .init section. See
needsRuntimeRegistrationOfSectionRange in
lib/Transforms/Instrumentation/InstrProfiling.cpp; we currently only
whitelist a few platforms.

Not sure what would be appropriate here; maybe
we could assume any *-none-* triple supports __start and __end symbols?

I think this is a good option. It might get us in trouble when compiling for some bare metal target which doesn't support start/end symbols. OTOH, it wouldn't be fair to assume .init handling is present either. This seems like a practical way forward.

Or
maybe control it with a flag somehow? Or something else I'm not thinking of?

Adding an -mllvm flag or a driver flag is my least-preferred solution. I don't have a hard reason for this, I just think that InstrProfiling has a lot of knobs already. This solution would make more sense to me if the situation I described above arises.

Other problem areas:

1. We turned value profiling off because we were running into runtime
issues; specifically, we had infinite recursion because we instrumented
malloc. It isn't really important to have in this context (coverage reports
currently don't use the data at all), but is there some way to improve this
workflow to involve fewer magic command-line flags?

The cleanest solution here is probably to disable value-profiling when -fcoverage-mapping is enabled.

Another option is to disable value profiling with -fprofile-instr-generate and to add a separate driver flag to enable it.

Both options drop backwards compatibility but the first option seems "less bad": I don't know of any users who use profiles gathered for coverage to do PGO. In fact the testing goals of coverage/PGO runs are usually contradictory (you want complete profiles to maximize test coverage, but representative profiles to maximize performance).

2. The error messages produced by llvm-profdata and llvm-cov tools could
probably be improved, to describe the actual issue in more detail.

I admit that getting "error: malformed data" is a bad user experience, but I also don't think it's a typical one. The error-reporting logic in InstrProfiling was primarily meant to make debugging easier, and by that measure it's a success. You can set a breakpoint on the ErrorInfo constructor and get to the bottom of issues relatively quickly.

I'm open to incrementally improving the error messages, but I suspect that it'll take a lot of work. I also suspect that the majority of users won't benefit from this work. At the time of writing there are 46 distinct sites in lib/Profile which return 'malformed': coming up with descriptive or actionable messages for each site might take a long time.

Next steps:

The next steps here depend on community interest, I guess... has anyone else
tried something like this? Is anyone interested in my patches? Should we
add a section to the coverage documentation?

Yes to all of this :). I think that this would be an especially useful addition to the coverage docs.

thanks,
vedant

Areas that required LLVM changes:

1. The copy of libclangrt_profile.a for the target. Given that we already were using builtins from compiler-rt, the primary changes required are enabling the profile library and excluding a bunch of files from the build (since baremetal doesn't have a filesystem, system calls, etc.). I'll look into posting patches when I have time, but it might take me a little while to figure out how to cleanly modify the build, and verify everything actually works on trunk. It looks like there's a CMake variable COMPILER_RT_BAREMETAL_BUILD which is supposed to be turned on for this sort of environment?

Yes, that's exactly what that variable is for.

See also: clang/cmake/caches/BaremetalARM.cmake. I haven't taught this how to do the rest of the runtime bits (unwinder/libcxxabi/libcxx), but plan to at some point.

2. Changing the compiler and compiler-rt to use __start and __end symbols to find the sections, rather than .init code. This isn't strictly necessary, but our linker supports __start and __end, and this was easier than changing the baremetal image to handle a .init section. See needsRuntimeRegistrationOfSectionRange in lib/Transforms/Instrumentation/InstrProfiling.cpp; we currently only whitelist a few platforms. Not sure what would be appropriate here; maybe we could assume any *-none-* triple supports __start and __end symbols? Or maybe control it with a flag somehow? Or something else I'm not thinking of?

A flag for this sounds great.

Jon

I think that this proposal would be very useful, and I will describe our experiences of trying to do this for our embedded bare-metal target.

Recently we implemented support for just the '-fprofile-instr-generate' option and the 'compiler-rt/lib/profile' sources, and added the following to our LD scripts:

      /* Append the LLVM profiling sections */
      . = ALIGN(4);
      PROVIDE(__start___llvm_prf_cnts = .);
      *(__llvm_prf_cnts)
      PROVIDE(__stop___llvm_prf_cnts = .);

      . = ALIGN(4);
      PROVIDE(__start___llvm_prf_data = .);
      *(__llvm_prf_data)
      PROVIDE(__stop___llvm_prf_data = .);

      . = ALIGN(4);
      PROVIDE(__start___llvm_prf_names = .);
      *(__llvm_prf_names)
      PROVIDE(__stop___llvm_prf_names = .);

      . = ALIGN(4);
      PROVIDE(__start___llvm_prf_vnds = .);
      *(__llvm_prf_vnds)
      PROVIDE(__stop___llvm_prf_vnds = .);

This removed the need for the '.ctors' model for registering functions (which also reduces the run-time cost) and enabled our target to use the model described in 'InstrProfilingPlatformLinux.cpp' instead of 'InstrProfilingPlatformOther.cpp', after adding our triple to 'lib/Transforms/Instrumentation/InstrProfiling.cpp'.

We use Newlib for our LibC, so we have a reasonably complete ISO C library, but we do not have a file-system, so the FILE-based I/O cannot work. And as there is no environment, dependence on environment variables is also meaningless. I won't even bother discussing memory-mapped files ;-)

We also have to ensure that the basic instrumentation initialisation process normally handled by 'RegisterRuntime Registration' is performed before the program is allowed to execute, and that the data is subsequently dumped (taken off-chip) after execution. This is done with a bit of smoke and mirrors, as many programs in the embedded environment do not have support for running the '.ctors' functions before, and the 'atexit' functions after, execution (especially C programs).

But the Compiler-RT profile library also integrates the automatic merging and collation of data from multiple runs within the library implementation itself, and this is a really significant problem for bare-metal systems with no OS and no file-system. It does this using "patterns" in the file name (derived from the environment), with the data collation performed by the system being profiled.

I think that to better facilitate bare-metal systems, this process of collating the results of multiple runs would be best provided by a separate stand-alone utility on the host system that would perform this logic offline rather than having it integrated online as it is currently defined.

Our implementation can now gather data for a single run ('default.profraw') but does not (yet) have the capability of collating the results from more than one profiling run.

We have not yet started supporting the other instrumentation such as coverage and ubsan, but hope to do so now that we have figured out how to do basic profiling for PGO.

  MartinO

We also have to ensure that the basic instrumentation initialisation process normally handled by 'RegisterRuntime Registration' is performed before the program is allowed to execute, and that the data is subsequently dumped (taken off-chip) after execution. This is done with a bit of smoke and mirrors, as many programs in the embedded environment do not have support for running the '.ctors' functions before, and the 'atexit' functions after, execution (especially C programs).

I don't think RegisterRuntime provides any relevant functionality unless you're writing to a file (I didn't even realize that existed before now).

We've been modifying the source code to write out profile data because our images never actually "exit". Maybe there's something more clever we can do in some cases.

But the Compiler-RT profile library also integrates the automatic merging and collation of data from multiple runs within the library implementation itself, and this is a really significant problem for bare-metal systems with no OS and no file-system. It does this using "patterns" in the file name (derived from the environment), with the data collation performed by the system being profiled.

I think that to better facilitate bare-metal systems, this process of collating the results of multiple runs would be best provided by a separate stand-alone utility on the host system that would perform this logic offline rather than having it integrated online as it is currently defined.

The logic for merging raw profiles already exists in llvm-profdata (see https://llvm.org/docs/CommandGuide/llvm-profdata.html).
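
For example, assuming two raw profiles pulled off the target as run1.profraw and run2.profraw, and an instrumented image image.elf:

    llvm-profdata merge run1.profraw run2.profraw -o merged.profdata

    # For coverage reports against the instrumented ELF image:
    llvm-cov report ./image.elf -instr-profile=merged.profdata

    # Or feed the merged profile back into the build for PGO:
    clang --target=<triple> -fprofile-instr-use=merged.profdata ...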

-Eli

Hi Eli,

What we have done is inserted calls to:

   lprofSetupValueProfiler()
   __llvm_profile_initialize_file()

before execution, and:

   __llvm_profile_write_file()

after execution to ensure that the functionality of the 'RegisterRuntime' and 'atexit' are preserved. I must admit I did not drill deeper to see if this was necessary at all; we are still at the prototype stage of this work, although at an advanced stage (it does actually work ;-) ). I will have to experiment to see if these calls are necessary at all, and if not, simplify our solution. We do, however, "fake" some file functionality to get the profiling working, but a non-file-based approach would be way better.

Rather than inserting code to call these, we are using a combination of 'objcopy' to rename the entry point of the program, and a shim with the original name which calls the profile initialisation functions before calling the user's original, renamed entry point. Similarly, another shim handles termination. This is what I meant by "smoke and mirrors", but it is effective and does not require special alteration to the program source.
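
As a rough illustration of that shim approach (the symbol and helper names here are made up; the real entry point and the exact initialisation/dump hooks will differ):

    /* Step 1: rename the user's entry point in the object file, e.g.
         objcopy --redefine-sym main=__original_main app.o            */

    extern int __original_main(void);
    extern void profile_init(void);  /* wraps the runtime init hooks  */
    extern void profile_dump(void);  /* offloads the raw profile data */

    /* Step 2: the shim takes over the original entry-point name. */
    int main(void) {
      profile_init();
      int ret = __original_main();
      profile_dump();
      return ret;
    }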

Needless to say, a proper bona-fide mechanism for doing this in the embedded space would be way preferred.

The essential elements (as I see it) for the various compiler-rt instrumentation support are:

1. A simple hook to ensure that the data structures
    involved in the instrumentation are properly
    initialised
2. Another hook to allow the system to offload the
    instrumentation data
3. Offline tools/utilities on the host system to
    aggregate and collate the data produced by
    multiple runs - this could possibly be achieved
    by extending the functionality of 'llvm-profdata'.
    I didn't realise that 'llvm-profdata' already had
    support for merging multiple data sets, so I will
    have to learn how to do that - thanks for the tip

We still haven't experimented with incrementally offloading the instrumentation data during execution, and that would be a very neat capability.

Thanks,

    MartinO