See Parallel input file parsing for some parallelism work on lld.
Parallelism can easily make diagnostic order (warnings and errors) non-deterministic[1], and @dblaikie raised a question/concern on rGe45a5696bb2a. I am therefore creating this post to raise awareness and discuss.
Note: linker output needs to be predictable. There is no excuse. (When the user specifies an invalid linker script that makes output sections overlap, I think not detecting the case is acceptable, at least temporarily.)
[1]:
- The user will see every line exactly once, but the order is not guaranteed. The next invocation with the same input may get a different permutation.
- Due to --error-limit and our default value of 20, lld quits early after reaching 20 errors. This means that in the presence of many errors, the next invocation may give a different set of errors (a sketch of why follows this list).
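
Here is that sketch: a minimal, simplified stand-in (not lld's actual ErrorHandler; the names and the cut-off behavior are assumptions for illustration) of an error limit enforced with an atomic counter. Whichever 20 calls win the race get printed, so two otherwise identical invocations can keep different subsets of the messages.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <string>

// Simplified stand-in for a linker's error reporting.
static std::atomic<uint64_t> errorCount{0};
static constexpr uint64_t errorLimit = 20; // mimics the --error-limit default
static std::mutex outMutex;

void error(const std::string &msg) {
  // fetch_add hands out slots in whatever order threads happen to arrive.
  uint64_t slot = errorCount.fetch_add(1);
  if (slot >= errorLimit)
    return; // dropped: a different run may drop a different message
  std::lock_guard<std::mutex> lock(outMutex);
  std::cerr << "error: " << msg << '\n';
}
```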
Some non-deterministic diagnostics have existed for a long time.
--gdb-index: parsing DWARF may lead to diagnostics. We don’t do anything to ensure the diagnostics are emitted in order.
Some non-deterministic diagnostics are new. (There is technically a trend of introducing more, but right now I do not find anything else that is easily parallelizable. The next candidate, if possible, is diagnostics from relocations.)
The ELF port now initializes local symbols and sections in parallel. Invalid input, such as an out-of-range section index, a non-local symbol found at an index below .symtab’s sh_info, or an invalid symbol name offset, will lead to (usually) errors or (sometimes) warnings.
We simply call warn/error/fatal. If multiple input files (very rare in practice) are invalid, the diagnostics do not have a guaranteed order.
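
As an illustration only (a self-contained toy using std::thread, not lld's actual parallel loop or its warn/error/fatal), the following shows how per-file validation running in parallel reports each bad file exactly once while the relative order varies from run to run:

```cpp
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

struct InputFile {
  std::string name;
  bool hasBadSymtab; // stand-in for "invalid symbol table" and similar defects
};

void initializeSymbols(const InputFile &f) {
  if (f.hasBadSymtab)
    // In the real linker this would be a call to error()/warn(); here we print.
    std::fprintf(stderr, "error: %s: invalid symbol table\n", f.name.c_str());
}

int main() {
  std::vector<InputFile> files = {{"a.o", true}, {"b.o", true}, {"c.o", true}};
  std::vector<std::thread> workers;
  for (const InputFile &f : files)
    workers.emplace_back([f] { initializeSymbols(f); }); // one thread per file
  for (std::thread &t : workers)
    t.join();
  // Every error line appears exactly once, but the relative order depends on
  // thread scheduling and may change between runs.
}
```

Build with a C++11 (or later) compiler and -pthread; successive runs may print the three error lines in different orders.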
@dblaikie wrote:

> I think it’s a fairly important issue & worth broader consensus (discourse thread, likely) before this moves further forward, potentially reverting or disabling the nondeterministic parallelism (my understanding is that lld had parallelism previous to this work that didn’t cause nondeterministic output - that doesn’t need to be disabled) until this is addressed.
I strongly object to this.
Linker diagnostics that may be emitted from parallel code are rare. Linker diagnostics are different from compiler diagnostics in that they are generally not ignorable. Some common linker diagnostics are inherently non-parallel (e.g. a non-existent entry point, incompatible options).
The diagnostics that are worth emitting from parallel code tend to be fatal and suggest broken input. I don’t find any diagnostic that is both ignorable and emitted in parallel.
For invalid input, we sometimes choose warn just so that the link can give the user more information, or because there is a widespread misuse that we want to work around.
It is rare to have multiple invalid inputs. Say an input file has a fatal error such as an invalid symbol table or broken relocations. Typically it is one input file/archive that has the problem, and when this happens, giving information for that file suffices.
There is no requirement that such an error needs to interleave with other types of errors in a deterministic way.
Sometimes a large number of inputs have the same type of error (e.g. an invalid symbol table). When this happens, it suggests that the tool producing these input files has a problem.
The user is typically not interested in knowing every affected input file; one example suffices. This point is strengthened by the fact that we decided to default to --error-limit=20.
Fixing one input will likely fix all the other inputs for the next invocation of the linker.
The parallelism patches have improved ld.lld performance greatly. It’s not a fair trade to revert them just to make these low-value (as explained previously), uncommon diagnostics deterministic.
Last, mold does not do anything to make diagnostics deterministic.
It is possible to make diagnostics deterministic. We would need to invent a new set of functions alongside warn/error/fatal which record messages in a vector, assign each an ID (sometimes not easily derivable), and sort the messages when we decide to emit them. This is implementable but may come with a large overhead. The open question is whether we should implement it.
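
For completeness, here is a rough sketch of that buffering approach, under the assumption that a stable (file index, per-file message index) pair can serve as the ID; the class and member names are made up for illustration:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical buffered diagnostic: (fileIndex, msgIndex) is the assumed ID.
struct BufferedDiag {
  uint32_t fileIndex; // position of the input file on the command line
  uint32_t msgIndex;  // order of the message within that file
  std::string text;
};

class DiagBuffer {
  std::mutex mu;
  std::vector<BufferedDiag> diags;

public:
  // Called from parallel code instead of printing immediately.
  void report(uint32_t fileIndex, uint32_t msgIndex, std::string text) {
    std::lock_guard<std::mutex> lock(mu);
    diags.push_back({fileIndex, msgIndex, std::move(text)});
  }

  // Called once after the parallel phase; the output order no longer depends
  // on thread scheduling.
  void flush() {
    std::sort(diags.begin(), diags.end(),
              [](const BufferedDiag &a, const BufferedDiag &b) {
                return std::tie(a.fileIndex, a.msgIndex) <
                       std::tie(b.fileIndex, b.msgIndex);
              });
    for (const BufferedDiag &d : diags)
      std::fprintf(stderr, "error: %s\n", d.text.c_str());
  }
};
```

A per-thread buffer merged at the end could reduce the mutex contention, but deriving a stable ID in every parallel code path is exactly the part that is "sometimes not easily derivable", and buffering delays messages that today are printed (or stop the link) immediately.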