When merging is disabled, a possible solution is to unmap the mapped file region, copy the old profile file to the new file, and then mmap the CounterBegin to the new file region. Does this sound correct?
If the instrumented program is multithreaded, it might be possible for a thread to racily update a counter in the un-mmap’d region.
I thought that the instrumented program can only update counters in the mmap’d regions, which are set up during initialization. How does it access un-mmap’d regions?
If the __llvm_prf_cnts section is mmap’d to a file (either directly, or from a standalone buffer as is done for Fuchsia), a possible sequence might be:
Thread 1: __llvm_profile_set_file_object(…)
Thread 1: munmap(countersRegion)
Thread 2: ++countersRegion[idx] <= Write to __llvm_prf_cnts after munmap(), not clear what this memory region contains.
Thread 1: Copy old profile…
Thread 1: countersRegion = mmap(new_fd, …)
What I meant to express earlier is that I’m not certain what the contents of __llvm_prf_cnts are after the munmap(). If it turns out that the munmap() doesn’t alter the in-memory counters [*], there should be no race, and the subsequent mmap(new_fd) should write up-to-date counters to the new profile. It could be worth writing a proof of concept (similar to darwin-proof-of-concept.c) to determine the behavior of mmap(__llvm_prf_cnts, fd1) → munmap() → mmap(__llvm_prf_cnts, fd2).
[*] The BSD manual for munmap(2) suggests that this is what happens:
[File mapping] If the mapping maps data from a file (MAP_SHARED), then the memory will eventually be written back to disk if it's dirty. This will happen automatically at some point in the future (implementation dependent). Note: to force the memory to be written back to the disk, use msync(2).
If there are still other references to the memory when the munmap is done, then nothing is done to the memory itself and it may be swapped out if need be.
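A proof of concept for the mmap(fd1) → munmap() → mmap(fd2) sequence could look roughly like the sketch below. It is only a single-threaded approximation under several assumptions: an anonymous page stands in for the page-aligned __llvm_prf_cnts section, the "profile" is just a raw page of 64-bit counters, and the function name, helper name, and /tmp paths are made up for illustration. It checks that counter values written through the first mapping are still visible after copying the old file and remapping onto the new one.

```c
#define _DEFAULT_SOURCE /* exposes mkstemp, pread, MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map fd over addr so counter stores land in the file. */
static int map_counters(void *addr, size_t len, int fd) {
  void *p = mmap(addr, len, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
                 fd, 0);
  return p == addr ? 0 : -1;
}

/* Returns 0 if counters survive mmap(fd1) -> munmap() -> copy -> mmap(fd2). */
int remap_counters_poc(void) {
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  size_t len = (size_t)page;

  char old_path[] = "/tmp/poc_old_XXXXXX"; /* illustrative paths */
  char new_path[] = "/tmp/poc_new_XXXXXX";
  int fd1 = mkstemp(old_path);
  int fd2 = mkstemp(new_path);
  char *copy = malloc(len); /* allocated up front, before any unmapping */
  /* Stand-in for the page-aligned __llvm_prf_cnts section. */
  uint64_t *counters = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (fd1 < 0 || fd2 < 0 || !copy || counters == MAP_FAILED) return 1;
  if (ftruncate(fd1, (off_t)len) != 0 || ftruncate(fd2, (off_t)len) != 0)
    return 2;

  /* Step 1: continuous mode on the old profile. */
  if (map_counters(counters, len, fd1) != 0) return 3;
  counters[0] = 41;
  counters[0]++; /* now 42, stored through the shared mapping */
  counters[1] = 7;

  /* Step 2: flush and unmap. POSIX specifies that references to an unmapped
     range raise SIGSEGV, so nothing touches counters until the remap. */
  if (msync(counters, len, MS_SYNC) != 0) return 4;
  if (munmap(counters, len) != 0) return 5;

  /* Step 3: copy the old profile into the new file. */
  if (pread(fd1, copy, len, 0) != (ssize_t)len) return 6;
  if (pwrite(fd2, copy, len, 0) != (ssize_t)len) return 7;

  /* Step 4: map the new file at the same address. */
  if (map_counters(counters, len, fd2) != 0) return 8;
  int ok = counters[0] == 42 && counters[1] == 7;

  /* Step 5: a later update should reach the new file. */
  counters[0]++;
  uint64_t on_disk = 0;
  if (msync(counters, len, MS_SYNC) != 0) return 9;
  if (pread(fd2, &on_disk, sizeof on_disk, 0) != (ssize_t)sizeof on_disk)
    return 10;
  ok = ok && on_disk == 43;

  munmap(counters, len);
  close(fd1);
  close(fd2);
  unlink(old_path);
  unlink(new_path);
  free(copy);
  return ok ? 0 : 11;
}
```

Calling remap_counters_poc() from a main() and checking for a zero return is enough to run it. Note that it says nothing about the multithreaded case: a second thread incrementing a counter between steps 2 and 4 is exactly the window in question.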
Do you think it’s viable to sidestep the issue by restricting how/when __llvm_profile_set_file_object can be called, e.g. either by documenting or requiring that no other threads are active at the time of the call?
This looks viable.
When merging is enabled, multiple profile files will be created. I’m not sure how to set the file object in this case as only one file descriptor is passed to __llvm_profile_set_file_object.
Can you share how __llvm_profile_set_file_object(…, EnableMerge = true) works in non-continuous mode? It’s not something I’m familiar with.
By looking at existing tests, __llvm_profile_set_file_object(…, EnableMerge = true) allows instrumented programs to accumulate profile files from different runs. My guess is that when running two instrumented programs, this function allows them to write to the same profile file in appending mode, but I’m not sure how it handles the case when multiple profile files are created during initialization for non-continuous mode.
For the merging + continuous mode case, it sounds like it’d be necessary to lock a profile, update the in-memory __llvm_prf_cnts section with its contents (the merging step), and then set up a fresh mmap().
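To make the lock / merge / remap order concrete, here is a minimal sketch. It makes several assumptions that do not match the real runtime: the profile is treated as a bare array of 64-bit counters rather than the actual profile format, flock() stands in for whatever locking the runtime uses, and merge_and_remap and merge_demo are hypothetical names. Because a MAP_FIXED mapping replaces what is in memory with the file contents, the sketch folds the in-memory counters into the file first and maps afterwards.

```c
#define _DEFAULT_SOURCE /* exposes flock, mkstemp, pread, MAP_ANONYMOUS */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: merge in-memory counters into fd, then map fd over them.
   counters must be page-aligned and len a multiple of the page size. */
int merge_and_remap(uint64_t *counters, size_t len, int fd) {
  int rc = -1;
  uint64_t *disk = malloc(len);
  if (!disk) return -1;
  /* Lock the profile so another process cannot merge at the same time. */
  if (flock(fd, LOCK_EX) != 0) {
    free(disk);
    return -1;
  }
  if (pread(fd, disk, len, 0) == (ssize_t)len) {
    /* Merging step: on-disk totals plus this process's counts. */
    for (size_t i = 0; i < len / sizeof(uint64_t); ++i)
      disk[i] += counters[i];
    /* Write the sums back, then set up the fresh mapping. */
    if (pwrite(fd, disk, len, 0) == (ssize_t)len &&
        mmap(counters, len, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
             fd, 0) == (void *)counters)
      rc = 0;
  }
  flock(fd, LOCK_UN);
  free(disk);
  return rc;
}

/* Small driver: the file holds 7 from an earlier run, memory holds 5. */
int merge_demo(void) {
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  size_t len = (size_t)page;
  char path[] = "/tmp/poc_merge_XXXXXX"; /* illustrative path */
  int fd = mkstemp(path);
  /* Stand-in for the page-aligned __llvm_prf_cnts section. */
  uint64_t *counters = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (fd < 0 || counters == MAP_FAILED) return 1;
  if (ftruncate(fd, (off_t)len) != 0) return 2;
  uint64_t previous = 7;
  if (pwrite(fd, &previous, sizeof previous, 0) != (ssize_t)sizeof previous)
    return 3;
  counters[0] = 5;
  if (merge_and_remap(counters, len, fd) != 0) return 4;
  int ok = counters[0] == 12; /* 7 from the file + 5 from memory */
  munmap(counters, len);
  close(fd);
  unlink(path);
  return ok ? 0 : 5;
}
```

Any counter update that lands between reading counters[i] and the mmap() call would be lost, so this only holds together under the restriction discussed above, i.e. no other threads are active during the call.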