[pre-RFC] Data races in concurrent ThinLTO processes

Hello,

I am sending the following proposal to discuss issues and solutions regarding data races in concurrent ThinLTO processes.

This caught my attention when we encountered a race condition in ThinLTO with caching.

While looking into how ThinLTO deals with the problem of cache file reads/writes/deletes, I spotted a lot of problems: some of them are related to data races, others to file/resource sharing between different processes. I wanted to point out these problems and start a discussion about potential solutions. I would like to get your feedback.

Problem #1: In certain common situations, ‘rename’ can fail, causing a data race later on.

Look at the following code in the function ‘write’ in ThinLTOCodeGenerator.cpp

std::error_code EC =
    sys::fs::createTemporaryFile("Thin", "tmp.o", TempFD, TempFilename);
if (EC) {
  errs() << "Error: " << EC.message() << "\n";
  report_fatal_error("ThinLTO: Can't get a temporary file");
}
{
  raw_fd_ostream OS(TempFD, /* ShouldClose */ true);
  OS << OutputBuffer.getBuffer();
}
// Rename to final destination (hopefully race condition won't matter here)
EC = sys::fs::rename(TempFilename, EntryPath);

Compared to a race-prone direct write of a buffer to a file, this code avoids a data race by first writing the buffer to a temp file and then renaming that temp file to become the final file in the cache. After r315079, when ‘rename’ became more POSIX-compliant, this scheme guarantees atomicity of writing a file to the cache (except on Wine, where, in some cases, the non-atomic ::MoveFileExW is used).

However, ‘rename’ might fail and return an error code in several different situations. If this happens, we fall back to a direct write to the file, which is neither atomic nor free of race conditions (causing problem #3).

An example where ‘rename’ can fail on both Windows and Unix-like systems, causing us to fall back to using non-atomic write, is problem #2.

Solution:

All solutions for problem #3 (see below) will take care of problem #1.

Problem #2 (Use of POSIX-compliant ‘rename’ function is not always suitable for ThinLTO)

We just encountered this issue at Sony. For ThinLTO, we use the function ‘rename’, which after r315079 doesn’t support renaming across different logical volumes on Windows.

Let’s say a user’s %TEMP% directory is on volume C:, but the caching directory they passed is located on volume D:; then ‘rename’ fails. Since we don’t support renaming across different volumes after r315079, we fall back to writing to the cache file directly (which is not atomic) and hit a data race.

We anticipate that this will be a common situation for our users, many of whom are Windows “power users” and have multiple disks for different purposes.

I think this problem doesn’t only affect Windows. The same issue might happen on Linux/macOS if the user’s $TEMP/$TMP directory is on a different file system than the user-specified caching directory.

Solution #1 (not so good):

The patch below solves this problem on Windows, however, we will lose the advantage that ‘rename’ gained in r315079 (i.e., its atomicity), which is not good.

This patch assumes that the function ‘rename’ in Windows/Path.inc is expected to work when the source and destination are located on different logical volumes or different file systems. Note that ‘rename’ is not ThinLTO-specific and might have other consumers who want this behavior (which was the behavior before r315079). However, it seems that this assumption changed later to make ‘rename’ more POSIX-compliant. In that case, we should add a comment to the function ‘rename’ so that its users know that renaming across different volumes is not supported by design. Or we could have two different functions: one POSIX-compliant, not allowing renames across different volumes, and another non-POSIX-compliant one.

Before r315079, the function ‘rename’, (in the case when the destination file isn’t opened), boiled down to calling the function ‘MoveFileExW’ with the MOVEFILE_COPY_ALLOWED flag set, allowing renaming files across the volumes.

The current implementation of the function ‘rename’ (in ‘rename_internal’), essentially calls ‘SetFileInformationByHandle’. This function is fast and atomic, so in a sense, it’s an improvement over the previously used ‘MoveFileExW’. However, ‘SetFileInformationByHandle’ doesn’t work when the source and destination are located on the different volumes.

My fix just falls back to calling ‘MoveFileExW’ when ‘SetFileInformationByHandle’ returns an error ‘ERROR_NOT_SAME_DEVICE’.

+ // Wine doesn't support SetFileInformationByHandle in rename_internal.
+ // Rename_internal doesn't work across different disks. In both of these
+ // cases fall back to using MoveFileEx.
     if (EC ==
- std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category())) {
- // Wine doesn't support SetFileInformationByHandle in rename_internal.
- // Fall back to MoveFileEx.
+ std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category()) ||
+ EC == std::error_code(ERROR_NOT_SAME_DEVICE, std::system_category())) {
       SmallVector<wchar_t, MAX_PATH> WideFrom;
       if (std::error_code EC2 = realPathFromHandle(FromHandle, WideFrom))
         return EC2;
       if (::MoveFileExW(WideFrom.begin(), WideTo.begin(),
- MOVEFILE_REPLACE_EXISTING))
+ MOVEFILE_COPY_ALLOWED | MOVEFILE_REPLACE_EXISTING))
         return std::error_code();
       return mapWindowsError(GetLastError());
     }

Note that when the source and destination are located on different logical volumes, we will effectively perform a copy followed by a delete, which is not atomic. Since copying to a different volume might be quite time-consuming, we also increase the probability that some other process starts to rename to the same destination file.

So, this fix for ‘rename’ is not good for ThinLTO (we should do something else in this case). ‘rename’ is a generic function and has many different users, but unless renaming across the volumes is needed by someone else, other than ThinLTO, this patch shouldn’t be accepted.

Solution #2 (better)

Here I only try to fix a ThinLTO issue, not a generic ‘rename’ issue as in solution #1:

For ThinLTO+caching we don’t have to create temp files in %TEMP% or %TMP% directories (or $TMP/$TEMP). We can create them in the same location where the cached files are stored or in a ‘temp’ subfolder within this folder. With this approach:

  • If the patch described in Solution #1 gets accepted (for the sake of other consumers), then from ThinLTO’s perspective ‘rename’ will stay atomic.

  • If the patch doesn’t get accepted, then rename won’t fail, since we only rename the files within the same directory.

Solution #3

All the solutions for problem #3 (see below) will take care of problem #2.

Problem #3 (data race)

Look at the following code in ThinLTOCodeGenerator.cpp. This code gets executed if the ‘rename’ function failed (e.g., because of problem #1 or problem #2 described above).

EC = sys::fs::rename(TempFilename, EntryPath);
if (EC) {
  sys::fs::remove(TempFilename);
  raw_fd_ostream OS(EntryPath, EC, sys::fs::F_None);
  if (EC)
    report_fatal_error(Twine("Failed to open ") + EntryPath +
                       " to save cached entry\n");
  OS << OutputBuffer.getBuffer(); // Two ThinLTO processes can write to the same
                                  // file here, causing a data race.
}

Here, we have a data race problem. We actually stumbled across it while testing one of the games with ThinLTO+caching.

We need to find a generic solution to solve the data race problem, while ensuring atomicity of writing to cache files. I’m not a ThinLTO expert and I might not be aware of some ThinLTO constraints. Let me know which problems you see with the solutions below. Hopefully, this will trigger further discussion.

Solution #0 (it won’t require many modifications):

(a) If ‘rename’ fails, do not write into the cache directly (we don’t want to have non-atomic writes!).

(b) If ‘rename’ fails, try to read from the cache.

auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();

(c) If reading from the cache fails simply use the file that you just compiled directly (bypassing the cache entirely).

This solution might cause cache misses (causing recompilations), but at least it should prevent data races when two ThinLTO processes write to the same file (which can produce an invalid cache!). Correctness is more important than optimization. It’s worth noting another shortcoming of this solution: unlike the generic solutions #1, #2 and #3 described below, this particular solution won’t solve problem #4.

BTW, we should do something with WINE support in ‘rename’ (which is non-atomic). I didn’t want to list it as a separate problem here, but it is a problem. I could file a separate bug, if necessary.

Solution #1 (naïve generic solution):

(a) If the process wants to write to the cache file, it opens it with ‘exclusive read/write’ access.

(b) If a process wants to write to a cache file that is ‘exclusively’ opened by some other process, we could assume that the cache file will be successfully created by the first process and simply return from the ‘write’ function. Different processes writing to a cache file of the same name are writing the same content, right?

(c) If the process needs to read from the cache file, it might wait a bit while the file is open for ‘exclusive write’. If within a certain period the cache file doesn’t get released by the writer, or it gets removed by the pruner – oh well, we have a cache miss. After all, the cache is just an acceleration. Correctness is more important.

(d) If we are concerned about data integrity of a cache file (e.g., a writer started to write to a file and something bad happened in the middle), we could do additional checks (I suspect that this might be already implemented in ThinLTO). A writer could add a unique hash at the end of the cache file or a CRC32 for each section of the file, which could be an indication that the write to this file was successful. A reader checks that this unique hash or a CRC32 checksum matches its expectations.

I’m sure that there are good reasons why this “naive” solution wasn’t implemented in the first place. If this solution is picked by LLVM community, I could write a working prototype for it that we could apply to ThinLTO.

Solution #2 (better generic solution):

It looks like cache file sharing between several ThinLTO processes is a classic readers-writers problem.

https://en.wikipedia.org/wiki/Readers%E2%80%93writers_problem

We could use read-write locks for global inter-process file sharing.

https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock

ThinLTO’s writer (creator/deleter) must acquire a write lock before writing to a file and release a write lock when it’s done writing.

ThinLTO’s user (reader) must acquire a read lock before reading a file and release a read lock when it’s done reading. On a Unix-like system, read-write locks are part of POSIX threads (pthreads). There is an implementation of Slim read-write locks on Windows, but unfortunately it is in-process only (i.e., the locks can’t be shared among different processes), so it won’t work for us.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa904937(v=vs.85).aspx

However, on Windows, we could implement read-write locks ourselves, using a combination of a named mutex and a named semaphore (any unique global name could be used for creating a named mutex/semaphore, the simplest one will be the cache file name itself).

Here is an example of an implementation:

http://www.zachburlingame.com/2011/06/simple-reader-writer-lock-in-c-on-win32/

If this solution is picked, I could write a working prototype for it that we could apply to ThinLTO.

Solution #3 (hybrid generic solution)

Basically, use solution #2 as a base and integrate some optimizations and completeness checking features from solution #1 (1.b and 1.d respectively).

Problem #4 (cache misses could have been avoided via synchronization).

With our current approach we might often race for the cache file resources and as a result have cache misses.

Of course, it’s not as dangerous as the data race problem #3, but it definitely lowers the efficiency of caching.

(a) The reader checks that a file exists, but then the pruner deletes it before the reader had a chance to read it. Cache miss.

(b) The reader starts reading a file (block by block) and at this time the pruner removes the file. The read might fail. This is OS-specific, but likely a cache miss.

(c) If the writer renames a temp file into a file that is currently being read by a reader, I don’t expect anything good out of it (the behavior is OS-specific). In the best case scenario the read will fail; in the worst, the reader will read the wrong content. So it’s a cache miss or a correctness issue.

(d) This is not a very likely event, but what if two processes are trying to rename to the same file at the same time?

Solution:

Solutions #1, #2 and #3 for problem #3 will take care of problem #4.

(Potential) problem #5:

The implementation of the function ‘rename’ on Windows used by ThinLTO heavily depends on the function CreateFileW or, to be more exact, on its flag FILE_SHARE_DELETE being supported and working correctly on all the file systems that can be mounted on Windows. Is the FILE_SHARE_DELETE flag supported on all non-native Windows file systems that we care about? Is it supported on WINE? On FAT32?

If the flag FILE_SHARE_DELETE is not supported, the ‘rename’ fails, and we have problem #3.

Solution:

All the solutions for problem #3 will take care of problem #5.

Please let me know what you think about the problems/solutions discussed here. Hopefully this pre-RFC shapes into an RFC soon.

Thank you!

Katya.

Hi Katya

Thanks for investigating this. Here is my thought inline.

Problem #1: In certain common situations, ‘rename’ can fail, causing a data race later on.
Solution:
All solutions for problem #3 (see below) will take care of problem #1.

I don't think there is a race here but it can probably be optimized. I don't see LTOCodeGenerator fall back to non-atomic write in the code. If the rename failed, it will just fatal_error and exit. Maybe a better solution is just leave the temp file and use it as output. The file will be recompiled if needed again (turns into a cache miss).

Problem #2 (Use of POSIX-compliant ‘rename’ function is not always suitable for ThinLTO)
Solution #2 (better)
Here I only try to fix a ThinLTO issue, not a generic ‘rename’ issue as in solution #1:

For ThinLTO+caching we don’t have to create temp files in %TEMP% or %TMP% directories (or $TMP/$TEMP). We can create them in the same location where the cached files are stored or in a ‘temp’ subfolder within this folder. With this approach:
- If the patch described in Solution #1 gets accepted (for the sake of other consumers), then from ThinLTO’s perspective ‘rename’ will stay atomic.
- If the patch doesn’t get accepted, then rename won’t fail, since we only rename the files within the same directory.

I am surprised that we are not doing #2 already. Rename across the volumes is not atomic thus it can cause issues.

I am not really concerned with all the problems, except Problem #2. From my understanding, LTO cache is for incremental build only. For every file in the cache, there should be one single writer, otherwise, it is a hash collision and will result in miscompile (or link error most likely). The cached file will only be used when you rebuild the binary. Lots of data races are in situations that are not supported.

For its current design, you cannot shared the LTO cache between two linking. For example, if you are building llc and libLTO from LLVM and they shared many same static archives with common LTO object files, you still need to use two different LTO caches.

Steven

Hi Katya,

Please take a look at how caching is implemented in the new LTO API: http://llvm-cs.pcc.me.uk/lib/LTO/Caching.cpp

I believe that these problems are solved there.

I would be happy to consider a patch porting this implementation to the legacy API.

Peter

Hi Steven,

Look at my replies inline (below your comments).

Katya.

*From:* stevenwu@apple.com <stevenwu@apple.com>
*Sent:* Thursday, March 22, 2018 4:46 PM
*To:* Romanova, Katya <katya.romanova@sony.com>
*Cc:* Teresa Johnson <tejohnson@google.com>; Mehdi AMINI <
joker.eph@gmail.com>; Rafael Avila de Espindola <
rafael.espindola@gmail.com>; Peter Collingbourne <peter@pcc.me.uk>; Hans
Wennborg <hans@chromium.org>; Reid Kleckner <rnk@google.com>;
bd1976llvm@gmail.com; llvm-dev@lists.llvm.org
*Subject:* Re: [pre-RFC] Data races in concurrent ThinLTO processes

Hi Katya

Thanks for investigating this. Here is my thought inline.

I don't think there is a race here but it can probably be optimized. I don't see LTOCodeGenerator fall back to non-atomic write in the code. If the rename failed, it will just fatal_error and exit. Maybe a better solution is just leave the temp file and use it as output. The file will be recompiled if needed again (turns into a cache miss).

I think there is a non-atomic write (and a race) here; see below. If the
rename fails, we remove the temp file and write to the cache file directly
(OS << OutputBuffer.getBuffer()).

    // Rename to final destination (hopefully race condition won't matter here)
    EC = sys::fs::rename(TempFilename, EntryPath);
    if (EC) {
      sys::fs::remove(TempFilename);
      raw_fd_ostream OS(EntryPath, EC, sys::fs::F_None);
      if (EC)
        report_fatal_error(Twine("Failed to open ") + EntryPath +
                           " to save cached entry\n");
      OS << OutputBuffer.getBuffer();
    }
  }

We can handle this in one of the following ways:

Option 1: If the rename fails, we could simply use OutputBuffer (which has
the same content as our temp file), bypassing the cache altogether.

Option 2: We could try reading from the existing cache first, and then, if
that read fails, use OutputBuffer (this is closer to the existing code). I
don’t understand why we have to read from the existing cache first (even
though we just created that cache entry). I suspect this was done for a
reason. There is a comment below with an explanation, but it doesn’t make
things clearer to me. If the reason/explanation makes sense, then we should
do Option 2. This is exactly what I previously proposed in Solution #0 for
Problem #3.

        // Commit to the cache (if enabled)
        CacheEntry.write(*OutputBuffer);

        if (SavedObjectsDirectoryPath.empty()) {
          // We need to generate a memory buffer for the linker.
          if (!CacheEntryPath.empty()) {
            // Cache is enabled, reload from the cache
            // We do this to lower memory pressure: the buffer is on the heap
            // and releasing it frees memory that can be used for the next input
            // file. The final binary link will read from the VFS cache
            // (hopefully!) or from disk if the memory pressure wasn't too high.
            auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();
            if (auto EC = ReloadedBufferOrErr.getError()) {
              // On error, keeping the preexisting buffer and printing a
              // diagnostic is more friendly than just crashing.
              errs() << "error: can't reload cached file '" << CacheEntryPath
                     << "': " << EC.message() << "\n";
            } else {
              OutputBuffer = std::move(*ReloadedBufferOrErr);
            }
          }

I don’t understand why we are reading from the cache file (via
tryLoadingBuffer) when we already have the same content in OutputBuffer.
The comment says this is done to reduce memory pressure, but it’s not clear
to me why it helps, since we still have to keep the content of the cache
file in a buffer, pointed to by ReloadedBufferOrErr.

*Problem #2 (Use of POSIX-compliant ‘rename’ function is not always
suitable for ThinLTO)*

We just encountered this issue at Sony. For ThinLTO, we use the function
‘rename’, which after r315079 no longer supports renaming across different
logical volumes on Windows.

Say a user’s %TEMP% directory is on volume C:\, but they passed a caching
directory located on volume D:\. Then ‘rename’ fails. Since renaming across
different volumes isn’t supported after r315079, we fall back to writing to
the cache file directly (which is not atomic) and hit a data race.

We anticipate that this will be a common situation for our users, many of
whom are Windows “power users” and have multiple disks for different
purposes.

I think this problem doesn’t affect only Windows. The same issue can happen
on Linux/macOS if the user's $TEMP/$TMP directory is located on a different
file system than the user-specified caching directory.

*Solution #1 (not so good):*

The patch below solves this problem on Windows, however, we will lose the
advantage that ‘rename’ gained in r315079 (i.e., its atomicity), which is
not good.

This patch assumes that the function 'rename' in Windows/Path.inc is
expected to work when the source and destination are located on different
logical volumes or different file systems. Note that the function ‘rename’
is not ThinLTO-specific and might have other consumers who want this
behavior (which was the behavior before r315079). However, it seems that
this assumption was later changed to make ‘rename’ more POSIX-compliant. In
that case, we should add a comment to the function 'rename' so that its
users know that renaming across different volumes is unsupported by design.
Or we could have two different functions: one POSIX-compliant, which
doesn't allow renames across different volumes, and one
non-POSIX-compliant, which does.

Before r315079, the function 'rename', (in the case when the destination
file isn't opened), boiled down to calling the function 'MoveFileExW' with
the MOVEFILE_COPY_ALLOWED flag set, allowing renaming files across the
volumes.

The current implementation of ‘rename’ (in 'rename_internal') essentially
calls 'SetFileInformationByHandle'. This function is fast and atomic, so in
a sense it's an improvement over the previously used 'MoveFileExW'.
However, 'SetFileInformationByHandle' doesn't work when the source and
destination are located on different volumes.

My fix just falls back to calling 'MoveFileExW' when
'SetFileInformationByHandle' returns an error 'ERROR_NOT_SAME_DEVICE'.

    +    // Wine doesn't support SetFileInformationByHandle in rename_internal.
    +    // rename_internal doesn't work across different disks. In both of these
    +    // cases fall back to using MoveFileEx.
         if (EC ==
    -        std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category())) {
    -      // Wine doesn't support SetFileInformationByHandle in rename_internal.
    -      // Fall back to MoveFileEx.
    +        std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category()) ||
    +        EC == std::error_code(ERROR_NOT_SAME_DEVICE, std::system_category())) {
           SmallVector<wchar_t, MAX_PATH> WideFrom;
           if (std::error_code EC2 = realPathFromHandle(FromHandle, WideFrom))
             return EC2;
           if (::MoveFileExW(WideFrom.begin(), WideTo.begin(),
    -                        MOVEFILE_REPLACE_EXISTING))
    +                        MOVEFILE_COPY_ALLOWED | MOVEFILE_REPLACE_EXISTING))
             return std::error_code();
           return mapWindowsError(GetLastError());
         }

Note that when the source and destination are located on different logical
volumes, we effectively perform a copy followed by a delete, which is not
atomic. Since copying to a different volume might be quite time-consuming,
we also increase the probability that some other process starts renaming to
the same destination file in the meantime.

So, this fix for ‘rename’ is not good for ThinLTO (we should do something
else there). ‘rename’ is a generic function with many different users, but
unless renaming across volumes is needed by someone other than ThinLTO,
this patch shouldn’t be accepted.

*Solution #2 (better)*

Here I only try to fix a ThinLTO issue, not a generic ‘rename’ issue as in
solution #1:

For ThinLTO+caching we don’t have to create temp files in the %TEMP% or
%TMP% directories (or $TMP/$TEMP). We can create them in the directory
where the cached files are stored, or in a ‘temp’ subdirectory of it. With
this approach:

- If the patch described in Solution #1 gets accepted (for the sake of
other consumers), then from ThinLTO’s perspective ‘rename’ will stay atomic.

- If the patch doesn’t get accepted, then rename won’t fail, since we
only rename the files within the same directory.
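One way to see why same-directory placement matters: on POSIX, rename(2) fails with EXDEV when the two paths live on different file systems, and a temp file created next to the cache entry trivially avoids that. The check below is a POSIX-only illustrative sketch (the function name is made up):

```cpp
#include <sys/stat.h>
#include <string>

// POSIX-only sketch: rename(2) fails with EXDEV when source and destination
// are on different file systems. A temp file created in the cache directory
// itself always passes this check; a temp file in $TMP may not.
bool onSameFileSystem(const std::string &A, const std::string &B) {
  struct stat SA, SB;
  if (stat(A.c_str(), &SA) != 0 || stat(B.c_str(), &SB) != 0)
    return false;         // treat unstattable paths as "not safe to rename"
  return SA.st_dev == SB.st_dev; // same device ID => same file system
}
```

A caller could use this to decide whether an atomic rename is even possible before creating the temp file, though placing the temp file next to the destination makes the check unnecessary.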

*Solution #3*

All the solutions for problem #3 (see below) will take care of problem #2.

I am surprised that we are not doing #2 already. Rename across volumes is
not atomic, so it can cause issues.

Yes, we are calling createTemporaryFile, which creates a file in the
location pointed to by $TEMP, $TMP, etc.

        sys::fs::createTemporaryFile("Thin", "tmp.o", TempFD,
TempFilename);

So it looks like this problem applies to Unix-like systems too (assuming
that your $TEMP and cache directory are on different file systems).

*Problem #3 (data race)*

Look at the following code in ThinLTOCodeGenerator.cpp. This code gets
executed if the ‘rename’ function fails (e.g., because of problem #1 or
problem #2 described above).

    EC = sys::fs::rename(TempFilename, EntryPath);
    if (EC) {
      sys::fs::remove(TempFilename);
      raw_fd_ostream OS(EntryPath, EC, sys::fs::F_None);
      if (EC)
        report_fatal_error(Twine("Failed to open ") + EntryPath +
                           " to save cached entry\n");
      OS << OutputBuffer.getBuffer(); // Two ThinLTO processes can write to the
                                      // same file here, causing a data race.

Here, we have a data race problem. We actually stumbled across it while
testing one of the games with ThinLTO+caching.

We need to find a generic solution to the data race problem while ensuring
atomicity of writes to cache files. I’m not a ThinLTO expert, and I might
not be aware of some ThinLTO constraints. Let me know which problems you
see with the solutions below. Hopefully it will trigger further discussion.

*Solution #0 (it won’t require many modifications):*

(a) If ‘rename’ fails, do not write into the cache directly (we don’t
want to have non-atomic writes!).

(b) If ‘rename’ fails, try to read from the cache:

    auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();

(c) If reading from the cache fails, simply use the file that you just
compiled directly (bypassing the cache entirely).

This solution might cause cache misses (and hence recompilations), but at
least it should prevent data races when two ThinLTO processes write to the
same file (which can produce an invalid cache!). Correctness is more
important than optimization. Another shortcoming is worth noting: unlike
the generic solutions #1, #2, and #3 described below, this solution won’t
solve problem #4.

BTW, we should do something about Wine support in ‘rename’ (which is
non-atomic there). I didn’t want to list it as a separate problem here, but
it is a problem. I can file a separate bug if necessary.

*Solution #1 (naïve generic solution):*

(a) If a process wants to write to the cache file, it opens it with
'exclusive read/write' access.

(b) If a process wants to write to a cache file that is ‘exclusively’
opened by another process, we can assume that the cache file will be
successfully created by the first process and simply return from the
‘write’ function. Different processes writing to a cache file of the same
name are writing the same content, right?

(c) If a process needs to read from the cache file, it might wait a bit
while the file is open for 'exclusive write'. If within a certain period
the cache file doesn’t get released by the writer, or it gets removed by
the pruner – oh well, we have a cache miss. After all, the cache is just an
acceleration; correctness is more important.

(d) If we are concerned about data integrity of a cache file (e.g., a
writer started to write to a file and something bad happened in the
middle), we could do additional checks (I suspect that this might be
already implemented in ThinLTO). A writer could add a unique hash at the
end of the cache file or a CRC32 for each section of the file, which could
be an indication that the write to this file was successful. A reader
checks that this unique hash or a CRC32 checksum matches its expectations.
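Points (a) and (b) above could be approximated with an exclusively-created sidecar file. This is a POSIX-only sketch; the '.lock' suffix and both function names are made up for illustration:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <string>

// Sketch of points (a)/(b): use an O_EXCL "lock" file next to the cache
// entry as the exclusive-writer claim. The first process to create it
// becomes the writer; everyone else assumes the entry is already being
// produced and returns early.
bool tryBecomeWriter(const std::string &EntryPath) {
  std::string LockPath = EntryPath + ".lock";
  int FD = open(LockPath.c_str(), O_CREAT | O_EXCL | O_WRONLY, 0644);
  if (FD < 0)
    return false; // somebody else is already writing this entry
  close(FD);
  return true;
}

void releaseWriter(const std::string &EntryPath) {
  unlink((EntryPath + ".lock").c_str());
}
```

A real implementation would also need to handle stale lock files left behind by crashed writers (e.g., with a timeout), which this sketch ignores.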

I'm sure that there are good reasons why this "naive" solution wasn't
implemented in the first place. If this solution is picked by LLVM
community, I could write a working prototype for it that we could apply to
ThinLTO.
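Point (d) above, the integrity trailer, could be sketched as follows. FNV-1a is used here purely for brevity; an actual implementation would more likely use CRC32 or a stronger hash, and all names are illustrative:

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>

// 64-bit FNV-1a, standing in for CRC32 in this sketch.
uint64_t fnv1a(const std::string &Data) {
  uint64_t H = 1469598103934665603ull;
  for (unsigned char C : Data) {
    H ^= C;
    H *= 1099511628211ull;
  }
  return H;
}

// Write the payload followed by an 8-byte checksum trailer.
void writeWithTrailer(const std::string &Path, const std::string &Data) {
  std::ofstream OS(Path, std::ios::binary);
  uint64_t H = fnv1a(Data);
  OS.write(Data.data(), Data.size());
  OS.write(reinterpret_cast<const char *>(&H), sizeof(H));
}

// Returns true only if the trailer matches, i.e. the write completed.
bool readAndVerify(const std::string &Path, std::string &DataOut) {
  std::ifstream In(Path, std::ios::binary);
  std::string All((std::istreambuf_iterator<char>(In)),
                  std::istreambuf_iterator<char>());
  if (All.size() < sizeof(uint64_t))
    return false; // too short to even hold the trailer
  DataOut = All.substr(0, All.size() - sizeof(uint64_t));
  uint64_t Stored;
  std::memcpy(&Stored, All.data() + DataOut.size(), sizeof(Stored));
  return Stored == fnv1a(DataOut);
}
```

A reader that gets `false` here would treat the entry as a cache miss rather than trusting a possibly half-written file.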

*Solution #2 (better generic solution):*

It looks like cache file sharing between several ThinLTO processes is a
classic readers-writers problem.

https://en.wikipedia.org/wiki/Readers–writers_problem

We could use read-write locks for global inter-process file sharing.

https://en.wikipedia.org/wiki/Readers–writer_lock

ThinLTO's writer (creator/deleter) must acquire a write lock before writing
to a file and release it when it's done writing.

ThinLTO’s reader must acquire a read lock before reading a file and release
it when it's done reading. On Unix-like systems, read-write locks are part
of POSIX threads (pthreads). Windows has an implementation of slim
reader/writer locks, but unfortunately they are in-process only (the locks
can’t be shared between processes), so they won’t work for us.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa904937(v=vs.85).aspx

However, on Windows we could implement read-write locks ourselves, using a
combination of a named mutex and a named semaphore (any unique global name
could be used for creating the named mutex/semaphore; the simplest is the
cache file name itself).

Here is an example of an implementation:

http://www.zachburlingame.com/2011/06/simple-reader-writer-lock-in-c-on-win32/

If this solution is picked, I could write a working prototype for it that
we could apply to ThinLTO.
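On POSIX, the inter-process reader/writer discipline can be sketched with fcntl record locks, where F_RDLCK is shared and F_WRLCK is exclusive. This is an illustrative POSIX-only sketch under those assumptions, not the proposed Windows named-mutex/semaphore implementation:

```cpp
#include <fcntl.h>
#include <unistd.h>

// Acquire a whole-file advisory lock on an open descriptor. Shared == reader,
// exclusive == writer. fcntl locks are visible across processes, which is
// what cache sharing needs; they are advisory, so every participant (writer,
// reader, pruner) must go through them.
bool lockFile(int FD, bool Exclusive) {
  struct flock L = {};
  L.l_type = Exclusive ? F_WRLCK : F_RDLCK;
  L.l_whence = SEEK_SET; // l_start = 0, l_len = 0 => lock the whole file
  L.l_start = 0;
  L.l_len = 0;
  return fcntl(FD, F_SETLKW, &L) == 0; // blocks until the lock is granted
}

bool unlockFile(int FD) {
  struct flock L = {};
  L.l_type = F_UNLCK;
  L.l_whence = SEEK_SET;
  return fcntl(FD, F_SETLK, &L) == 0;
}
```

One known caveat of classic fcntl locks is that they are per-process (closing any descriptor on the file drops them), so a real design would need to account for that.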

*Solution #3 (hybrid generic solution)*

Basically, use solution #2 as a base and integrate some optimizations and
completeness-checking features from solution #1 (1.b and 1.d,
respectively).

*Problem #4 (cache misses that could be avoided via synchronization)*

With our current approach we might often race for cache file resources and,
as a result, have cache misses.

Of course, this is not as dangerous as the data race in problem #3, but it
definitely lowers the efficiency of caching.

(a) The reader checks that a file exists, but then the pruner deletes it
before the reader had a chance to read it. Cache miss.

(b) The reader starts reading a file (block by block) and at this time the
pruner removes the file. The read might fail. This is OS-specific, but
likely a cache miss.

(c) If the writer renames a temp file onto a file that is currently being
read by a reader, I don't expect anything good to come of it (the behavior
is OS-specific). In the best case the read fails; in the worst case the
reader reads the wrong content. So it's a cache miss or a correctness
issue.

(d) This is not a very likely event, but what if two processes try to
rename to the same file at the same time?

*Solution:*

Solutions #1, #2 and #3 for problem #3 will take care of problem #4.

*(Potential) problem #5:*

The implementation of the function ‘rename’ on Windows used by ThinLTO
heavily depends on the function CreateFileW or, to be more exact, on its
flag FILE_SHARE_DELETE being supported and working correctly on *all* the
file systems that can be mounted on Windows. Is the FILE_SHARE_DELETE flag
supported on all non-native Windows file systems that we care about? Is it
supported on WINE? On FAT32?

If the flag FILE_SHARE_DELETE is not supported, the ‘rename’ fails, and we
have problem #3.

*Solution:*

All the solutions for problem #3 will take care of problem #5.

Please let me know what you think about the problems/solutions discussed
here. Hopefully this pre-RFC soon shapes into an RFC.

I am not really concerned with any of these problems except Problem #2.
From my understanding, the LTO cache is for incremental builds only.

Yes.

For every file in the cache there should be a single writer; otherwise it
is a hash collision and will result in a miscompile (or, most likely, a
link error). The cached file will only be used when you rebuild the binary.
Lots of these data races are in situations that are not supported.

Are you aware of any other data races when we share the LTO cache between
two links, besides the one we identified above? Please let me know. I’d
rather know all the potential problems/limitations in advance.

With its current design, you cannot share the LTO cache between two links.
For example, if you are building llc and libLTO from LLVM and they share
many of the same static archives with common LTO object files, you still
need to use two different LTO caches.

But what if we fix this particular problem? We could either do it the way
Peter implemented it for the new LTO API, or the way we discussed above. I
assume that after fixing this, we will be able to share the same cache
between two links.

The question is how exactly we should fix it for the C LTO API. I’m aware
of only two major C LTO API stakeholders (Apple and Sony), so most likely
nobody else cares much about the legacy C API. Peter generously proposed to
port the non-racy algorithm from the new LTO API to the legacy C API.

What I proposed was that I would review such a patch, not that I would
write one. Apologies for the miscommunication.

Peter

I looked into the new LTO API today, and it looks like the data race is
fixed.

Hi Steven,
Look at my replies inline (below your comments).
Katya.

From: stevenwu@apple.com
Sent: Thursday, March 22, 2018 4:46 PM
To: Romanova, Katya <katya.romanova@sony.com>
Subject: Re: [pre-RFC] Data races in concurrent ThinLTO processes

Hi Katya

Thanks for investigating this. Here is my thought inline.

Option 1 is much better for the current model. To be safe, we should avoid
a non-atomic write (i.e., directly writing the cache file). While we
perform such a write, the cache is in an invalid state, and we have nothing
to prevent reading from it.


I don't think there is a file system that supports atomic rename between volumes. This is something we can fix easily.

Problem #3 (data race)
Look at the following code in ThinLTOCodeGenerator.cpp. This code gets executed if the ‘rename’ function failed (e.g., because of the problem #1 or problem #2 described above).

EC = sys::fs::rename(TempFilename, EntryPath);
if (EC) {
  sys::fs::remove(TempFilename);
  raw_fd_ostream OS(EntryPath, EC, sys::fs::F_None);
  if (EC)
    report_fatal_error(Twine("Failed to open ") + EntryPath +
                       " to save cached entry\n");
  OS << OutputBuffer.getBuffer(); // Two ThinLTO processes can write to the same file here
                                             // causing data race.

Here, we have a data race problem. We actually stumbled across it while testing one of the games with ThinLTO+caching.

We need to find a generic solution to solve data race problem, while ensuring ‘atomicity’ of writing to cache files. I’m not a ThinLTO expert and I might not be aware of some ThinLTO constraints. Let me know which problems you see with the two solutions below. Hopefully, it will trigger a further discussion.

Solution #0 (it won’t require much modifications):
(a) If ‘rename’ fails, do not write into the cache directly (we don’t want to have non-atomic writes!).
(b) If ‘rename’ fails, try to read from the cache.
auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();
(c) If reading from the cache fails simply use the file that you just compiled directly (bypassing the cache entirely).

This solution might cause cache misses (causing recompilations), but at least it should prevent data races when two ThinLTO processes write to the same file (which can produce invalid cache!). Correctness is more important than optimization. It’s worth noticing another shortage of this solution: unlike generic solution #1, #2 and #3 described below this particular solution won’t solve the problem #4.

BTW, we should do something with WINE support in ‘rename’ (which is non-atomic). I didn’t want to list it as a separate problem here, but it is a problem. I could file a separate bug, if necessary.

Solution #1 (naïve generic solution):
(a) If the process wants to write to the cache file, it opens it with 'exclusive read/write' access.
(b) If a process wants to write to the cache file that is ‘exclusively’ opened by some other process, we could assume that the cache file will be successfully created by the first process and simply return from ‘write’ function. Different processes writing to the cache file of the same name, are writing the same content, right?
(c) If the process needs to read from the cache file, it might wait a bit while the file is open for 'exclusive write'. If within a certain period the cache file doesn’t get released by a writer or gets removed by a pruner – oh, well, we have a hit miss. After all, using cache is just an acceleration. Correctness is more important.
(d) If we are concerned about data integrity of a cache file (e.g., a writer started to write to a file and something bad happened in the middle), we could do additional checks (I suspect that this might be already implemented in ThinLTO). A writer could add a unique hash at the end of the cache file or a CRC32 for each section of the file, which could be an indication that the write to this file was successful. A reader checks that this unique hash or a CRC32 checksum matches its expectations.

I'm sure that there are good reasons why this "naive" solution wasn't implemented in the first place. If this solution is picked by LLVM community, I could write a working prototype for it that we could apply to ThinLTO.

Solution #2 (better generic solution):
It looks like cache file sharing between several ThinLTO processes is a classic readers-writers problem.
https://en.wikipedia.org/wiki/Readers–writers_problem

We could use read-write locks for global inter-process file sharing.
https://en.wikipedia.org/wiki/Readers–writer_lock

ThinLTO's writer (creator/deleter) must acquire a write lock before writing to a file and release a write lock when it's done writing.
ThinLTO’s user (reader) must acquire a read lock before reading a file and release it when it's done reading. On Unix-like systems, read-write locks are part of POSIX threads (pthreads). There is an implementation of Slim read-write locks on Windows, but unfortunately it is in-process only (i.e., the locks can’t be shared among different processes), so it won’t work for us.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa904937(v=vs.85).aspx

However, on Windows, we could implement read-write locks ourselves, using a combination of a named mutex and a named semaphore (any unique global name could be used for creating a named mutex/semaphore, the simplest one will be the cache file name itself).

Here is an example of an implementation:
http://www.zachburlingame.com/2011/06/simple-reader-writer-lock-in-c-on-win32/

If this solution is picked, I could write a working prototype for it that we could apply to ThinLTO.

Solution #3 (hybrid generic solution)
Basically, use solution #2 as a base and integrate some optimizations and completeness checking features from solution #1 (1.b and 1.d respectively).

Problem #4 (cache misses could have been avoided via synchronization).

With our current approach we might often race for the cache file resources and as a result have cache misses.
Of course, it's not as dangerous as the data race problem #3, but it definitely lowers the efficiency of caching.

(a) The reader checks that a file exists, but then the pruner deletes it before the reader had a chance to read it. Cache miss.
(b) The reader starts reading a file (block by block) and at this time the pruner removes the file. The read might fail. This is OS-specific, but likely a cache miss.
(c) If the writer renames a temp file into a file that is currently being read by a reader, I don't expect anything good out of it (the behavior is OS-specific). In the best-case scenario, the read will fail; in the worst one, the reader will read the wrong content. So, it's a cache miss or a correctness issue.
(d) This is not a very likely event, but what if two processes are trying to rename to the same file at the same time?

Solution:
Solutions #1, #2 and #3 for problem #3 will take care of problem #4.

(Potential) problem #5:
The implementation of the function ‘rename’ on Windows used by ThinLTO heavily depends on the function CreateFileW or, to be more exact, on its flag FILE_SHARE_DELETE being supported and working correctly on *all* the file systems that can be mounted on Windows. Is the FILE_SHARE_DELETE flag supported on all non-native Windows file systems that we care about? Is it supported on WINE? On FAT32?

If the flag FILE_SHARE_DELETE is not supported, the ‘rename’ fails, and we have problem #3.

Solution:
All the solutions for problem #3 will take care of problem #5.

Please let me know what you think about the problems/solutions discussed here. Hopefully this pre-RFC shapes into an RFC soon.

I am not really concerned with all the problems, except Problem #2. From my understanding, the LTO cache is for incremental builds only.
Yes.

For every file in the cache, there should be a single writer; otherwise, it is a hash collision and will result in a miscompile (or, most likely, a link error). The cached file will only be used when you rebuild the binary. Lots of the data races are in situations that are not supported.
Are you aware of any other data races when we share the LTO cache between two links, besides the one that we identified above? Please let me know. I’d rather know all the potential problems/limitations in advance.

The most common issue you are going to hit if you try to share the LTO cache is that the file can be removed/updated while the other process is halfway through reading it. I can consistently reproduce the issue if you try to build all LLVM/Clang tools with a shared LTO cache.

With its current design, you cannot share the LTO cache between two links. For example, if you are building llc and libLTO from LLVM and they share many of the same static archives with common LTO object files, you still need to use two different LTO caches.
But what if we fix this particular problem? We could either do it the way Peter implemented it for the new LTO API or the way we discussed above. I assume that after fixing this, we will be able to share the same cache between two links.

The question is, how exactly should we fix it for the C LTO API? I’m aware of only two major C LTO API stakeholders (Apple and Sony), so most likely nobody else cares much about the legacy C API. Peter generously proposed to port the non-racy algorithm from the new LTO API to the legacy C API. I looked into the new LTO API today and it looks like the data race is fixed there, though the code is heavier C++ and will be harder to read. Alternatively, I could write a simpler patch for the legacy C LTO API, doing what we discussed above (i.e., creating temp files in the cache directory and, if renaming the temp into a cache file fails, bypassing reading from the cache file).

What do you think?

I haven't read the C++ API implementation but I am all for improving the C LTO API to allow sharing. If nothing else, we can get faster ThinLTO build times for LLVM.

Steven

Hi Peter,

Thank you for the clarification. :)

I’m sure you have a very good understanding of how much effort it will take to write a patch for legacy C LTO to implement caching the same way it’s done in the new C++ LTO API. How easy/difficult do you think it will be (very roughly, in LOC)? Do you anticipate that a lot of the existing legacy C LTO infrastructure will have to be rewritten? Could this also be done in the 6.0 branch with a reasonable amount of effort? We live downstream, so for us the work needs to be done in the 6.0 branch. I want to make sure that all the necessary patches for the new C++ LTO API (related to caching) are already in the 6.0 branch, or at least that it will take a reasonable amount of effort to cherry-pick them from ToT.

I could implement a pretty straightforward patch in the existing legacy C LTO code, which will do the same (at least with respect to caching/fixing the data race problems) as the new C++ LTO API. Namely: (a) create the temp file in the same directory as the cache files; (b) if renaming fails, do not do a direct write to the cache file, just exit; (c) if renaming fails, use the buffer that we planned to write to the cache file, and bypass reading from the cache altogether. Will this work? Does the new C++ LTO API have other advantages, or does it fix other issues, with respect to caching/avoiding data races?

The reason why I’m asking all these questions is that I need to decide whether I should write my own straightforward patch or change the legacy C LTO API to do what the new C++ LTO API is doing (which I suspect will be much more work). I appreciate your and Steven’s feedback/comments/advice that could help me make this decision. Thank you!

Katya.

Why do we need to use two different caches?
As far as I remember, a ThinLTO build of LLVM does use a single cache, and shares and reuses objects across binaries when possible.

Best,

Hi Katya

Thanks for investigating this. Here is my thought inline.

Hello,

I am sending the following proposal to discuss issues and solutions regarding data races in concurrent ThinLTO processes.

This caught my attention when we encountered a race condition in ThinLTO with caching.
While looking into how ThinLTO deals with the problem of cache files reads/writes/deletes I spotted a lot of problems: some of them are related to data races, others - to file/resource sharing between different processes. I wanted to point out these problems and start the discussion about potential solutions. I would like to get your feedback.

Problem #1: In certain common situations, ‘rename’ can fail, causing a data race later on.
Look at the following code in the function ‘write’ in ThinLTOCodeGenerator.cpp

std::error_code EC =
  sys::fs::createTemporaryFile("Thin", "tmp.o", TempFD, TempFilename);
if (EC) {
  errs() << "Error: " << EC.message() << "\n";
  report_fatal_error("ThinLTO: Can't get a temporary file");
}
{
  raw_fd_ostream OS(TempFD, /* ShouldClose */ true);
  OS << OutputBuffer.getBuffer();
}
// Rename to final destination (hopefully race condition won't matter here)
EC = sys::fs::rename(TempFilename, EntryPath);

Compared to a race-prone direct write of a buffer to a file, this code avoids a data race by first writing a buffer to a temp file and then renaming that temp file to become the final file in the cache. After r315079, when ‘rename’ became more POSIX-compliant, this scheme guarantees atomicity of writing a file to the cache,
(except on Wine, where (in some cases) non-atomic ::MoveFileExW is used).

However, ‘rename’ might fail and return an error code in several different situations. If this happens, we will fall back to a direct write to the file, which is neither atomic, nor avoids race conditions (causing problem #3).
An example where ‘rename’ can fail on both Windows and Unix-like systems, causing us to fall back to using non-atomic write, is problem #2.

Solution:
All solutions for problem #3 (see below) will take care of problem #1.

I don’t think there is a race here, but it can probably be optimized. I don’t see LTOCodeGenerator fall back to a non-atomic write in the code. If the rename fails, it will just report_fatal_error and exit. Maybe a better solution is to just leave the temp file and use it as the output. The file will be recompiled if needed again (turns into a cache miss).

Problem #2 (Use of POSIX-compliant ‘rename’ function is not always suitable for ThinLTO)
We just encountered this issue in Sony. For ThinLTO, we use the function ‘rename’, which after r315079 doesn’t support renaming across the different logical volumes on Windows.
Let’s say a user has a %TEMP% directory on volume C:, but he/she passed the caching directory name located on volume D:, then ‘rename’ fails. Since we don’t support renaming across different volumes after r315079, we then fall back to writing to the cache file directly (which is not atomic) and hit a data race.
We anticipate that this will be a common situation for our users, many of whom are Windows “power users” and have multiple disks for different purposes.

I think this problem doesn’t only affect Windows. The same issue might happen on Linux/MacOS, if the user’s $TEMP/$TMP directory is located in a different file system than the user-specified caching directory.

Solution #1 (not so good):
The patch below solves this problem on Windows; however, we will lose the advantage that ‘rename’ gained in r315079 (i.e., its atomicity), which is not good.

This patch assumes that the function 'rename' in Windows/Path.inc is expected to work when the source and destination are located on different logical volumes or different file systems. Note that the function ‘rename’ is not ThinLTO-specific and might have some other consumers who want this behavior (which was the behavior before r315079). However, it seems that this assumption was changed later to make ‘rename’ more POSIX-compliant. In this case, we should add a comment to the function 'rename' so that its users know that renaming across different volumes is not supported by design. Or we could have two different functions: one POSIX-compliant, not allowing renames across different volumes, and another non-POSIX-compliant one.

Before r315079, the function ‘rename’, (in the case when the destination file isn’t opened), boiled down to calling the function ‘MoveFileExW’ with the MOVEFILE_COPY_ALLOWED flag set, allowing renaming files across the volumes.
The current implementation of the function ‘rename’ (in ‘rename_internal’), essentially calls ‘SetFileInformationByHandle’. This function is fast and atomic, so in a sense, it’s an improvement over the previously used ‘MoveFileExW’. However, ‘SetFileInformationByHandle’ doesn’t work when the source and destination are located on the different volumes.

My fix just falls back to calling ‘MoveFileExW’ when ‘SetFileInformationByHandle’ returns an error ‘ERROR_NOT_SAME_DEVICE’.

+ // Wine doesn't support SetFileInformationByHandle in rename_internal.
+ // rename_internal doesn't work across different disks. In both of these
+ // cases fall back to using MoveFileEx.
  if (EC ==
-     std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category())) {
-   // Wine doesn't support SetFileInformationByHandle in rename_internal.
-   // Fall back to MoveFileEx.
+     std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category()) ||
+     EC == std::error_code(ERROR_NOT_SAME_DEVICE, std::system_category())) {
    SmallVector<wchar_t, MAX_PATH> WideFrom;
    if (std::error_code EC2 = realPathFromHandle(FromHandle, WideFrom))
      return EC2;
    if (::MoveFileExW(WideFrom.begin(), WideTo.begin(),
-                     MOVEFILE_REPLACE_EXISTING))
+                     MOVEFILE_COPY_ALLOWED | MOVEFILE_REPLACE_EXISTING))
      return std::error_code();
    return mapWindowsError(GetLastError());
  }

Note that when the source and destination are located on different logical volumes, we will effectively perform a copy, followed by a delete, which are not atomic operations. Since copying to a different volume might be quite time consuming, we also increase the probability that some other process starts to rename to the same destination file.

So, this fix for ‘rename’ is not good for ThinLTO (we should do something else in this case). ‘rename’ is a generic function and has many different users, but unless renaming across the volumes is needed by someone else, other than ThinLTO, this patch shouldn’t be accepted.

Solution #2 (better)
Here I only try to fix a ThinLTO issue, not a generic ‘rename’ issue as in solution #1:

For ThinLTO+caching we don’t have to create temp files in %TEMP% or %TMP% directories (or $TMP/$TEMP). We can create them in the same location where the cached files are stored or in a ‘temp’ subfolder within this folder. With this approach:

  • If the patch described in Solution #1 gets accepted (for the sake of other consumers), then from ThinLTO’s perspective ‘rename’ will stay atomic.
  • If the patch doesn’t get accepted, then rename won’t fail, since we only rename the files within the same directory.

Solution #3
All the solutions for problem #3 (see below) will take care of problem #2.

I am surprised that we are not doing #2 already. Rename across volumes is not atomic, so it can cause issues.

Problem #3 (data race)
Look at the following code in ThinLTOCodeGenerator.cpp. This code gets executed if the ‘rename’ function failed (e.g., because of the problem #1 or problem #2 described above).

EC = sys::fs::rename(TempFilename, EntryPath);
if (EC) {
  sys::fs::remove(TempFilename);
  raw_fd_ostream OS(EntryPath, EC, sys::fs::F_None);
  if (EC)
    report_fatal_error(Twine("Failed to open ") + EntryPath +
                       " to save cached entry\n");
  OS << OutputBuffer.getBuffer(); // Two ThinLTO processes can write to the
                                  // same file here, causing a data race.

Here, we have a data race problem. We actually stumbled across it while testing one of the games with ThinLTO+caching.

We need to find a generic solution to the data race problem, while ensuring ‘atomicity’ of writing to cache files. I’m not a ThinLTO expert and I might not be aware of some ThinLTO constraints. Let me know which problems you see with the solutions below. Hopefully, it will trigger a further discussion.

Solution #0 (it won’t require many modifications):
(a) If ‘rename’ fails, do not write into the cache directly (we don’t want to have non-atomic writes!).
(b) If ‘rename’ fails, try to read from the cache.
auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();
(c) If reading from the cache fails, simply use the file that you just compiled directly (bypassing the cache entirely).

This solution might cause cache misses (causing recompilations), but at least it should prevent data races when two ThinLTO processes write to the same file (which can produce an invalid cache!). Correctness is more important than optimization. It’s worth noting another shortcoming of this solution: unlike the generic solutions #1, #2 and #3 described below, this particular solution won’t solve problem #4.

BTW, we should do something about Wine support in ‘rename’ (which is non-atomic there). I didn’t want to list it as a separate problem here, but it is a problem. I could file a separate bug, if necessary.

Solution #1 (naïve generic solution):
(a) If the process wants to write to the cache file, it opens it with ‘exclusive read/write’ access.
(b) If a process wants to write to a cache file that is ‘exclusively’ opened by some other process, we could assume that the cache file will be successfully created by the first process and simply return from the ‘write’ function. Different processes writing to a cache file of the same name are writing the same content, right?
(c) If the process needs to read from the cache file, it might wait a bit while the file is open for 'exclusive write'. If within a certain period the cache file doesn’t get released by the writer or gets removed by the pruner – oh, well, we have a cache miss. After all, the cache is just an acceleration. Correctness is more important.
(d) If we are concerned about data integrity of a cache file (e.g., a writer started to write to a file and something bad happened in the middle), we could do additional checks (I suspect that this might be already implemented in ThinLTO). A writer could add a unique hash at the end of the cache file or a CRC32 for each section of the file, which could be an indication that the write to this file was successful. A reader checks that this unique hash or a CRC32 checksum matches its expectations.

I’m sure that there are good reasons why this “naive” solution wasn’t implemented in the first place. If this solution is picked by LLVM community, I could write a working prototype for it that we could apply to ThinLTO.

Solution #2 (better generic solution):
It looks like cache file sharing between several ThinLTO processes is a classic readers-writers problem.
https://en.wikipedia.org/wiki/Readers%E2%80%93writers_problem

We could use read-write locks for global inter-process file sharing.
https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock

ThinLTO’s writer (creator/deleter) must acquire a write lock before writing to a file and release a write lock when it’s done writing.
ThinLTO’s user (reader) must acquire a read lock before reading a file and release it when it's done reading. On Unix-like systems, read-write locks are part of POSIX threads (pthreads). There is an implementation of Slim read-write locks on Windows, but unfortunately it is in-process only (i.e., the locks can’t be shared among different processes), so it won’t work for us.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa904937(v=vs.85).aspx

However, on Windows, we could implement read-write locks ourselves, using a combination of a named mutex and a named semaphore (any unique global name could be used for creating a named mutex/semaphore, the simplest one will be the cache file name itself).

Here is an example of an implementation:
http://www.zachburlingame.com/2011/06/simple-reader-writer-lock-in-c-on-win32/

If this solution is picked, I could write a working prototype for it that we could apply to ThinLTO.

Solution #3 (hybrid generic solution)
Basically, use solution #2 as a base and integrate some optimizations and completeness checking features from solution #1 (1.b and 1.d respectively).

Problem #4 (cache misses could have been avoided via synchronization).

With our current approach we might often race for the cache file resources and as a result have cache misses.
Of course, it’s not as dangerous as the data race problem #3, but it definitely lowers the efficiency of caching.

(a) The reader checks that a file exists, but then the pruner deletes it before the reader had a chance to read it. Cache miss.
(b) The reader starts reading a file (block by block) and at this time the pruner removes the file. The read might fail. This is OS-specific, but likely a cache miss.
(c) If the writer renames a temp file into a file that is currently being read by a reader, I don't expect anything good out of it (the behavior is OS-specific). In the best-case scenario, the read will fail; in the worst one, the reader will read the wrong content. So, it's a cache miss or a correctness issue.
(d) This is not a very likely event, but what if two processes are trying to rename to the same file at the same time?

Solution:
Solutions #1, #2 and #3 for problem #3 will take care of problem #4.

(Potential) problem #5:
The implementation of the function ‘rename’ on Windows used by ThinLTO heavily depends on the function CreateFileW or, to be more exact, on its flag FILE_SHARE_DELETE being supported and working correctly on all the file systems that can be mounted on Windows. Is the FILE_SHARE_DELETE flag supported on all non-native Windows file systems that we care about? Is it supported on WINE? On FAT32?

If the flag FILE_SHARE_DELETE is not supported, the ‘rename’ fails, and we have problem #3.

Solution:
All the solutions for problem #3 will take care of problem #5.

Please let me know what you think about the problems/solutions discussed here. Hopefully this pre-RFC shapes into an RFC soon.

I am not really concerned with all the problems, except Problem #2. From my understanding, the LTO cache is for incremental builds only. For every file in the cache, there should be a single writer; otherwise, it is a hash collision and will result in a miscompile (or, most likely, a link error). The cached file will only be used when you rebuild the binary. Lots of the data races are in situations that are not supported.

With its current design, you cannot share the LTO cache between two links. For example, if you are building llc and libLTO from LLVM and they share many of the same static archives with common LTO object files, you still need to use two different LTO caches.

Why do we need to use two different caches?
In my memory a ThinLTO build of LLVM does use a single cache and share and reuse objects across binaries if possible.

I think you are correct. We can actually share the LTO cache. I thought I had once investigated an issue about sharing lto_cache, but when I dug it out, I realized it was about sharing object_path_lto. My mistake.

On UNIX, the cached output is valid once it is opened. Pruning the cache or renaming the file will not invalidate the content. Then I guess Katya’s summary is accurate. We do have Problems #1, #2, and #3. Problem #4 will only cause cache misses, not real failures.

I think the easiest fix (without complicated locks across processes) is to do the following two things:

  1. When the rename fails, just bypass the cache.
  2. The temp file should be in the same directory, or in a subdirectory, of the LTO cache.

Does that sound correct? Do you have better proposals?

Steven


From: stevenwu@apple.com
Sent: Monday, March 26, 2018 11:58 PM
To: Mehdi AMINI <joker.eph@gmail.com>
Cc: Romanova, Katya <katya.romanova@sony.com>; Teresa Johnson <tejohnson@google.com>; Rafael Espíndola <rafael.espindola@gmail.com>; Peter Collingbourne <peter@pcc.me.uk>; Hans Wennborg <hans@chromium.org>; Reid Kleckner <rnk@google.com>; bd1976llvm@gmail.com; llvm-dev <llvm-dev@lists.llvm.org>
Subject: Re: [pre-RFC] Data races in concurrent ThinLTO processes

Hi Katya

Thanks for investigating this. Here is my thought inline.

Hello,

I am sending the following proposal to discuss issues and solutions regarding data races in concurrent ThinLTO processes.

This caught my attention when we encountered a race condition in ThinLTO with caching.
While looking into how ThinLTO deals with the problem of cache files reads/writes/deletes I spotted a lot of problems: some of them are related to data races, others - to file/resource sharing between different processes. I wanted to point out these problems and start the discussion about potential solutions. I would like to get your feedback.

Problem #1: In certain common situations, ‘rename’ can fail, causing a data race later on.
Look at the following code in the function ‘write’ in ThinLTOCodeGenerator.cpp

std::error_code EC =
  sys::fs::createTemporaryFile("Thin", "tmp.o", TempFD, TempFilename);
if (EC) {
  errs() << "Error: " << EC.message() << "\n";
  report_fatal_error("ThinLTO: Can't get a temporary file");
}
{
  raw_fd_ostream OS(TempFD, /* ShouldClose */ true);
  OS << OutputBuffer.getBuffer();
}
// Rename to final destination (hopefully race condition won't matter here)
EC = sys::fs::rename(TempFilename, EntryPath);

Compared to a race-prone direct write of a buffer to a file, this code avoids a data race by first writing a buffer to a temp file and then renaming that temp file to become the final file in the cache. After r315079, when ‘rename’ became more POSIX-compliant, this scheme guarantees atomicity of writing a file to the cache,
(except on Wine, where (in some cases) non-atomic ::MoveFileExW is used).

However, ‘rename’ might fail and return an error code in several different situations. If this happens, we will fall back to a direct write to the file, which is neither atomic, nor avoids race conditions (causing problem #3).
An example where 'rename' can fail on both Windows and Unix-like systems, causing us to fall back to using non-atomic write, is problem #2.

Solution:
All solutions for problem #3 (see below) will take care of problem #1.

I don't think there is a race here but it can probably be optimized. I don't see LTOCodeGenerator fall back to non-atomic write in the code. If the rename failed, it will just fatal_error and exit. Maybe a better solution is just leave the temp file and use it as output. The file will be recompiled if needed again (turns into a cache miss).

Problem #2 (Use of POSIX-compliant ‘rename’ function is not always suitable for ThinLTO)
We just encountered this issue in Sony. For ThinLTO, we use the function ‘rename’, which after r315079 doesn’t support renaming across the different logical volumes on Windows.
Let’s say a user has a %TEMP% directory on volume C:\, but he/she passed the caching directory name located on volume D:\, then ‘rename’ fails. Since we don’t support renaming across different volumes after r315079, we then fall back to writing to the cache file directly (which is not atomic) and hit a data race.
We anticipate that this will be a common situation for our users, many of whom are Windows “power users” and have multiple disks for different purposes.

I think this problem doesn’t only affect Windows. The same issue might happen on Linux/MacOS, if the user's $TEMP/$TMP directory is located in a different file system than the user-specified caching directory.

Solution #1 (not so good):
The patch below solves this problem on Windows, however, we will lose the advantage that ‘rename’ gained in r315079 (i.e., its atomicity), which is not good.

This patch assumes that the function 'rename' in Windows/Path.inc is expected to work when the source and destination are located of different logical volumes or different file systems. Note that function ‘rename’ is not ThinLTO-specific and might have some other consumers who want this behavior (which was the behavior before r315079). However, it seems that this assumption has changed later to make ‘rename’ more POSIX-compliant. In this case, we should add a comment to the function 'rename' so that its users knew that renaming across different volumes is not supported by design. Or we could have two different functions, one POSIX-compliant, not allowing renames across different volumes, another non-POSIX-compliant.

Before r315079, the function 'rename', (in the case when the destination file isn't opened), boiled down to calling the function 'MoveFileExW' with the MOVEFILE_COPY_ALLOWED flag set, allowing renaming files across the volumes.
The current implementation of the function ‘rename’ (in 'rename_internal'), essentially calls 'SetFileInformationByHandle'. This function is fast and atomic, so in a sense, it's an improvement over the previously used 'MoveFileExW'. However, 'SetFileInformationByHandle' doesn't work when the source and destination are located on the different volumes.

My fix just falls back to calling 'MoveFileExW' when 'SetFileInformationByHandle' returns an error 'ERROR_NOT_SAME_DEVICE'.

+ // Wine doesn't support SetFileInformationByHandle in rename_internal.
+ // Rename_internal doesn't work accross different disks. In both of these
+ // cases fall back to using MoveFileEx.
     if (EC ==
- std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category())) {
- // Wine doesn't support SetFileInformationByHandle in rename_internal.
- // Fall back to MoveFileEx.
+ std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category()) ||
+ EC == std::error_code(ERROR_NOT_SAME_DEVICE, std::system_category())) {
       SmallVector<wchar_t, MAX_PATH> WideFrom;
       if (std::error_code EC2 = realPathFromHandle(FromHandle, WideFrom))
         return EC2;
       if (::MoveFileExW(WideFrom.begin(), WideTo.begin(),
- MOVEFILE_REPLACE_EXISTING))
+ MOVEFILE_COPY_ALLOWED | MOVEFILE_REPLACE_EXISTING))
         return std::error_code();
       return mapWindowsError(GetLastError());
     }

Note, that when the source and destination are located on the different logical volumes, we will effectively perform a copy, followed by a delete, which are not atomic operations. Since copying to a different volume might be quite time consuming, we also increase the probability that some other process starts to rename to the same destination file.

So, this fix for ‘rename’ is not good for ThinLTO (we should do something else in this case). ‘rename’ is a generic function and has many different users, but unless renaming across the volumes is needed by someone else, other than ThinLTO, this patch shouldn’t be accepted.

Solution #2 (better)
Here I only try to fix a ThinLTO issue, not a generic ‘rename’ issue as in solution #1:

For ThinLTO+caching we don’t have to create temp files in %TEMP% or %TMP% directories (or $TMP/$TEMP). We can create them in the same location where the cached files are stored or in a ‘temp’ subfolder within this folder. With this approach:
- If the patch described in Solution #1 gets accepted (for the sake of other consumers), then from ThinLTO’s perspective ‘rename’ will stay atomic.
- If the patch doesn’t get accepted, then rename won’t fail, since we only rename the files within the same directory.

Solution #3
All the solutions for problem #3 (see below) will take care of problem #2.

I am surprised that we are not doing #2 already. Rename across the volumes is not atomic thus it can cause issues.

Problem #3 (data race)
Look at the following code in ThinLTOCodeGenerator.cpp. This code gets executed if the ‘rename’ function failed (e.g., because of the problem #1 or problem #2 described above).

EC = sys::fs::rename(TempFilename, EntryPath);
if (EC) {
  sys::fs::remove(TempFilename);
  raw_fd_ostream OS(EntryPath, EC, sys::fs::F_None);
  if (EC)
    report_fatal_error(Twine("Failed to open ") + EntryPath +
                       " to save cached entry\n");
  OS << OutputBuffer.getBuffer(); // Two ThinLTO processes can write to the same file here
                                             // causing data race.

Here, we have a data race problem. We actually stumbled across it while testing one of the games with ThinLTO+caching.

We need to find a generic solution to the data race problem, while ensuring ‘atomicity’ of writes to cache files. I’m not a ThinLTO expert and I might not be aware of some ThinLTO constraints. Let me know which problems you see with the solutions below. Hopefully, it will trigger a further discussion.

Solution #0 (it won’t require many modifications):
(a) If ‘rename’ fails, do not write into the cache directly (we don’t want to have non-atomic writes!).
(b) If ‘rename’ fails, try to read from the cache:
auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();
(c) If reading from the cache fails, simply use the file that you just compiled directly (bypassing the cache entirely).
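Sketched in code, the control flow of Solution #0 could look like the following. The types and names here are hypothetical: ThinLTO’s real cache entry returns a buffer-or-error type, which is stubbed with std::optional for brevity:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for ThinLTO's buffer and cache entry types.
using Buffer = std::string;

struct CacheEntry {
  // In the real code this would read EntryPath from disk; stubbed here.
  std::optional<Buffer> tryLoadingBuffer() const { return Cached; }
  std::optional<Buffer> Cached;
};

// Sketch of Solution #0: never fall back to a direct (non-atomic) write.
Buffer commitOrBypass(bool RenameFailed, const CacheEntry &Entry,
                      Buffer JustCompiled) {
  if (!RenameFailed)
    return JustCompiled;  // rename succeeded: the cache entry is valid
  if (auto Reloaded = Entry.tryLoadingBuffer())
    return *Reloaded;     // (b) another process won the race; reuse its entry
  return JustCompiled;    // (c) bypass the cache entirely
}
```

The key invariant is that the cache file on disk is only ever produced by a successful atomic rename, never by a direct write.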

This solution might cause cache misses (causing recompilations), but at least it should prevent data races where two ThinLTO processes write to the same file (which can produce an invalid cache!). Correctness is more important than optimization. It’s worth noting another shortcoming of this solution: unlike the generic solutions #1, #2 and #3 described below, this particular solution won’t solve problem #4.

BTW, we should do something about Wine support in ‘rename’ (which is non-atomic). I didn’t want to list it as a separate problem here, but it is one. I could file a separate bug, if necessary.

Solution #1 (naïve generic solution):
(a) If the process wants to write to the cache file, it opens it with 'exclusive read/write' access.
(b) If a process wants to write to a cache file that is ‘exclusively’ opened by some other process, we could assume that the cache file will be successfully created by the first process and simply return from the ‘write’ function. Different processes writing to a cache file of the same name are writing the same content, right?
(c) If the process needs to read from the cache file, it might wait a bit while the file is open for 'exclusive write'. If within a certain period the cache file doesn’t get released by a writer, or it gets removed by a pruner – oh, well, we have a cache miss. After all, using the cache is just an acceleration. Correctness is more important.
(d) If we are concerned about the data integrity of a cache file (e.g., a writer started to write to a file and something bad happened in the middle), we could do additional checks (I suspect that this might already be implemented in ThinLTO). A writer could add a unique hash at the end of the cache file, or a CRC32 for each section of the file, as an indication that the write to this file was successful. A reader then checks that this unique hash or CRC32 checksum matches its expectations.
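Point (d) can be as simple as a checksum trailer. The sketch below uses FNV-1a as an illustrative stand-in for CRC32, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Simple FNV-1a checksum; a real implementation might use CRC32 or a
// stronger content hash. This is only an illustrative stand-in.
static uint32_t fnv1a(const std::string &Data) {
  uint32_t H = 2166136261u;
  for (unsigned char C : Data) {
    H ^= C;
    H *= 16777619u;
  }
  return H;
}

// Writer side: append a 4-byte trailer so a reader can detect a torn write.
std::string sealEntry(const std::string &Payload) {
  uint32_t H = fnv1a(Payload);
  std::string Out = Payload;
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>((H >> (8 * I)) & 0xFF));
  return Out;
}

// Reader side: return the payload only if the trailer matches (point 1.d).
std::optional<std::string> openEntry(const std::string &Raw) {
  if (Raw.size() < 4)
    return std::nullopt;
  std::string Payload = Raw.substr(0, Raw.size() - 4);
  uint32_t H = 0;
  for (int I = 0; I < 4; ++I)
    H |= static_cast<uint32_t>(
             static_cast<unsigned char>(Raw[Raw.size() - 4 + I]))
         << (8 * I);
  return H == fnv1a(Payload) ? std::optional<std::string>(Payload)
                             : std::nullopt;
}
```

A reader that sees a checksum mismatch simply treats the entry as a cache miss and recompiles.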

I'm sure there are good reasons why this "naive" solution wasn't implemented in the first place. If this solution is picked by the LLVM community, I could write a working prototype for it that we could apply to ThinLTO.

Solution #2 (better generic solution):
It looks like cache file sharing between several ThinLTO processes is a classic readers-writers problem.
https://en.wikipedia.org/wiki/Readers–writers_problem

We could use read-write locks for global inter-process file sharing.
https://en.wikipedia.org/wiki/Readers–writer_lock

ThinLTO's writer (creator/deleter) must acquire a write lock before writing to a file and release a write lock when it's done writing.
ThinLTO’s user (reader) must acquire a read lock before reading a file and release the read lock when it's done reading.

On a Unix-like system, read-write locks are part of POSIX threads (pthreads). There is an implementation of slim reader/writer locks on Windows, but unfortunately they are in-process only (i.e., the locks can’t be shared among different processes), so they won’t work for us.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa904937(v=vs.85).aspx

However, on Windows, we could implement read-write locks ourselves, using a combination of a named mutex and a named semaphore (any unique global name could be used for creating a named mutex/semaphore, the simplest one will be the cache file name itself).

Here is an example of an implementation:
http://www.zachburlingame.com/2011/06/simple-reader-writer-lock-in-c-on-win32/

If this solution is picked, I could write a working prototype for it that we could apply to ThinLTO.
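For the POSIX side, a process-shared read-write lock can be built by placing a pthread_rwlock_t in a file-backed shared mapping, one lock file per cache entry. The sketch below is a simplification under stated assumptions: in particular, every opener calls pthread_rwlock_init here, whereas a real implementation must arrange exactly-once initialization (e.g. via O_EXCL creation of the lock file):

```cpp
#include <cassert>
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

// POSIX sketch: a pthread_rwlock_t in a file-backed shared mapping, so
// multiple ThinLTO processes can synchronize on one cache file.
// The lock-file naming convention is hypothetical.
pthread_rwlock_t *mapSharedRwlock(const char *LockPath) {
  int FD = ::open(LockPath, O_CREAT | O_RDWR, 0644);
  if (FD < 0)
    return nullptr;
  if (::ftruncate(FD, sizeof(pthread_rwlock_t)) != 0) {
    ::close(FD);
    return nullptr;
  }
  void *Mem = ::mmap(nullptr, sizeof(pthread_rwlock_t),
                     PROT_READ | PROT_WRITE, MAP_SHARED, FD, 0);
  ::close(FD); // the mapping keeps the shared memory alive
  if (Mem == MAP_FAILED)
    return nullptr;
  auto *Lock = static_cast<pthread_rwlock_t *>(Mem);
  pthread_rwlockattr_t Attr;
  pthread_rwlockattr_init(&Attr);
  pthread_rwlockattr_setpshared(&Attr, PTHREAD_PROCESS_SHARED);
  // CAVEAT: real code must initialize the lock exactly once across all
  // processes; re-initializing a lock that is in use is undefined behavior.
  pthread_rwlock_init(Lock, &Attr);
  pthread_rwlockattr_destroy(&Attr);
  return Lock;
}
```

On Windows, the equivalent would be built from a named mutex plus a named semaphore as described above, since SRW locks cannot cross process boundaries.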

Solution #3 (hybrid generic solution)
Basically, use solution #2 as a base and integrate the optimization and integrity-checking features from solution #1 (1.b and 1.d, respectively).

Problem #4 (cache misses could have been avoided via synchronization).

With our current approach we might often race for cache file resources and, as a result, have cache misses.
Of course, it's not as dangerous as the data race problem #3, but it definitely lowers the efficiency of caching.

(a) The reader checks that a file exists, but the pruner deletes it before the reader has a chance to read it. Cache miss.
(b) The reader starts reading a file (block by block) and, at that moment, the pruner removes the file. The read might fail. This is OS-specific, but it likely means a cache miss.
(c) If the writer renames a temp file onto a file that is currently being read by a reader, I don't expect anything good to come of it (the behavior is OS-specific). In the best-case scenario the read will fail; in the worst, the reader will read the wrong content. So it's either a cache miss or a correctness issue.
(d) This is not a very likely event, but what if two processes try to rename to the same file at the same time?

Solution:
Solutions #1, #2 and #3 for problem #3 will take care of problem #4.

(Potential) problem #5:
The Windows implementation of the ‘rename’ function used by ThinLTO heavily depends on the function CreateFileW or, to be more exact, on its flag FILE_SHARE_DELETE being supported and working correctly on *all* the file systems that can be mounted on Windows. Is the FILE_SHARE_DELETE flag supported on all non-native Windows file systems that we care about? Is it supported on Wine? On FAT32?

If the FILE_SHARE_DELETE flag is not supported, ‘rename’ fails, and we have problem #3.

Solution:
All the solutions for problem #3 will take care of problem #5.

Please let me know what you think about the problems/solutions discussed here. Hopefully this pre-RFC shapes into an RFC soon.

I am not really concerned with any of the problems except problem #2. From my understanding, the LTO cache is for incremental builds only. For every file in the cache, there should be a single writer; otherwise, it is a hash collision and will result in a miscompile (or, most likely, a link error). A cached file will only be used when you rebuild the binary. Many of the data races occur in situations that are not supported.

With its current design, you cannot share the LTO cache between two link jobs. For example, if you are building llc and libLTO from LLVM and they share many of the same static archives with common LTO object files, you still need to use two different LTO caches.

Why do we need to use two different caches?
From what I remember, a ThinLTO build of LLVM does use a single cache, sharing and reusing objects across binaries where possible.

I think you are correct. We can actually share the LTO cache. I thought I had once investigated an issue about sharing lto_cache, but when I dug it out, I realized it was about sharing object_path_lto. My mistake.

On UNIX, the cached output stays valid once it is opened. Pruning the cache or renaming the file will not invalidate the content. Then I guess Katya's summary is accurate. We do have problems #1, #2, and #3. Problem #4 will only cause cache misses, not real failures.

I think the easiest fix (without complicated locks across processes) is to make the following two changes:
1. when rename fails, just bypass the cache;
2. the temp file should be created in the cache directory itself or in a subdirectory of it.

Does that sound correct? Do you have better proposals?

I’m sorry, Steven. I saw your latest reply only after I sent out a response to your previous email.
Basically, could you double-check with the macOS VFS developers that if one process is in the middle of reading a file while another deletes this file (or renames *onto* this file), the reader can still successfully finish reading the original file's content? If my interpretation of the documentation is correct, this should work on Windows for all file systems (except the ones that do not support the FILE_SHARE_DELETE flag).

If the macOS VFS developers give you a positive response, we could implement the straightforward solution that we have discussed in this email thread all along (Solution #0) and that you just summarized above. Personally, I prefer that. Alternatively, we could leverage the new C++ LTO API functionality. I suspect it's significantly more work, and the new code is much harder to read. Let’s wait for additional comments from Peter about the additional advantages of this solution.

If you get a negative response from the VFS folks, we will reconsider heavyweight, full-fledged solutions (e.g., the previously discussed solutions #1 and #2).

Sorry, I completely misunderstood the situation in the beginning. I don't see any problem with modifying the cache when one process already has the file open. It is actually part of the POSIX standard that the file should only be removed after all references to it are dropped. So as long as our implementation is POSIX-compliant, this is not an issue.

Steven

Hi Peter,

Thank you for the clarification. :)

I’m sure you have a very good understanding of how much effort it will
take to write a patch for the legacy C LTO API to implement caching the same
way it’s done in the new C++ LTO API. How easy/difficult do you think it will
be (very roughly, in LOC)? Do you anticipate that a lot of the existing
legacy C LTO infrastructure will have to be rewritten?

The legacy API, as far as I can tell, returns object files to the linker in
memory buffers, which is pretty much how the new API returns them, so you
probably won't have problems there.

Could this also be done in the 6.0 branch with a reasonable amount of effort?
We live downstream, so for us the work needs to be done in the 6.0 branch. I
want to make sure that all the necessary patches for the new C++ LTO API
(related to caching) are already there in the 6.0 branch, or that it will at
least take a reasonable amount of effort to cherry-pick them from ToT.

I think they should be there (the last substantial change to the cache
implementation happened last year and the 6.0 branch happened in January).

I could implement a pretty straightforward patch in the existing legacy C
LTO code, which will do the same (at least with respect to caching and fixing
data race problems) as the new C++ LTO API. Namely: (a) create the temp file
in the same directory as the cache directory; (b) if renaming fails, do not
write directly to the cache file, just exit; (c) if renaming fails, use
the buffer that we planned to write to the cache file and bypass reading
from the cache altogether. Will this work? Does the new C++ LTO API have
other advantages, or does it fix other issues with respect to caching and
avoiding data races?

You probably want to reimplement what Caching.cpp is doing in the legacy
API. I wouldn't try to reuse the lto::localCache function from the legacy
API or anything like that.

Peter

The reason why I’m asking all these questions is because I need to decide

Hi Steven and Peter,

I think we resolved all the misunderstandings/concerns that we had with the proposal and decided that we don’t have to implement heavyweight synchronization solutions (such as read-write locks, etc.). The lightweight solution is expected to work on macOS and Windows (however, there might be issues with Windows supporting non-NTFS file systems).

There are two options for the lightweight solution:

(a) an easy fix:

a. place the temp file in the same location as the caching directory;

b. if rename fails, do not write to the cache file directly;

c. if rename fails, bypass using/reading the cache altogether.

(b) Retrofitting existing functionality from the new C++ LTO API into the legacy C API.

I prefer to implement the first option unless there are other advantages to the second one. Steven mentioned this morning that he prefers it too (a very straightforward and easy/fast fix to implement). I’m not aware of any stakeholders in the legacy C API other than Sony and Apple.

Peter, are you OK with our decision?

If everyone agrees, I will implement an “easy” fix. Please let me know.

Katya.

Sounds good. As I mentioned, that’s pretty much what the new API is doing already.

Peter