RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

Hello folks (and sorry if I’ve forgotten to CC anyone with particular interest to this discussion…):

I’ve been thinking a lot about how best to build advanced runtime libraries like ASan, and scale them up. Note that this does not try to address any licensing issues. For now, I’ll consider those orthogonal / solvable w/o technical contortions. =]

My primary motivation: we really, really need runtime libraries to be able to use common, shared libraries.

This starts with libraries such as the C++ standard library – a runtime shouldn’t need to re-implement std::vector. It includes other primitive libraries that have had significant effort put into them in LLVM, such as the ADT and Support libraries. But, IMO, it becomes even more important as we start looking at libraries such as ELF readers, DWARF readers, symbolizers, etc. This code should be shared, and shared easily, with other LLVM projects.

However, clearly the runtime must at some point be linked against a program, and indeed programs which may be using the same set of libraries. It is crucially important that the runtime uses a separate implementation of the libraries from the ones used by the program itself: we will often compile the program’s libraries with instrumentation and other features which we explicitly wish to avoid in the runtime. Even simple name clashes can cause problems, leading to the current practice of putting all of these runtime libraries into a ‘__sanitizer’ or other specially spelled namespace.

A final unusual requirement is that at least some of the code for the runtime libraries must be statically linked to have reasonable efficiency. We also have several use cases where it would be very convenient to link all of the runtime statically, so I prefer a solution that preserves this option.

So how can we effectively share code? Here is my proposal, and a few alternate strategies.

I suggest that we build the runtime library as-if it were not a runtime library at all, and just a normal library. No strange namespaces, no restrictions on what other libraries it uses with one exception: they must all be statically linkable. We build this as a normal archive library, nothing special. One nice property is that testing the runtime library becomes the same as testing any other library.

Then, we have a special build step to produce a final archive, which is what is actually used as the runtime library. This step works much like the step that links an executable: we build the list of archive libraries depended on, but instead of linking an executable, we run a linker script over them. This script re-links each ‘.o’ file from the transitive closure of archives, prepending an ‘asan’ (or other runtime-library) prefix onto each symbol, effectively mangling every symbol. All of these processed ‘.o’ files go into a single, final archive that becomes the installed runtime library. The only functions not processed in this manner are a whitelist of “exported” functions from the runtime (C-library routines provided by the runtime, runtime entry points, etc.).
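As a rough illustration of the renaming rule in this step, here is a minimal C++ sketch that works on symbol names only; it does not read or write real object files. The prefix `__asan_rt_` and the functions `MungeSymbol` and `MungeSymbolTable` are hypothetical. Note that a mangled C++ name such as `_Znwm` (`operator new(unsigned long)` in the Itanium ABI) is no longer a valid mangled name once prefixed, which is the “bizarre name mangling” downside listed below.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: decide the final name of one symbol in the re-linked
// runtime. Whitelisted exports (runtime entry points and the C-library
// routines the runtime provides) keep their names; everything else gets the
// runtime prefix so it cannot clash with the same code in the program.
std::string MungeSymbol(const std::string& name, const std::string& prefix,
                        const std::set<std::string>& exports) {
  if (exports.count(name) != 0) return name;
  return prefix + name;
}

// Apply the rule to every symbol in one object file's symbol table.
std::vector<std::string> MungeSymbolTable(
    const std::vector<std::string>& symbols, const std::string& prefix,
    const std::set<std::string>& exports) {
  std::vector<std::string> out;
  out.reserve(symbols.size());
  for (const std::string& s : symbols)
    out.push_back(MungeSymbol(s, prefix, exports));
  return out;
}
```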

The result should be a runtime library that is essentially hermetic, and should have no clashes with binaries it links against. It would be free to use standard libraries, LLVM libraries, whatever it needs. That said, there are some clear disadvantages:

  • Bizarre name mangling, especially for C++
  • Potentially incompatible with C++ EH, libunwind, or other tools (I just don’t know, haven’t done enough research here)
  • Requires “relinking” the final runtime
  • Definitely implementable on Linux & ELF-based BSDs, I think do-able on Darwin, but I have no idea about Windows.
  • Other downsides? I’m probably missing some big problems here… ;]

However, if we can make this (possibly with tweaks/modifications) work, I think the upside is quite large – the runtime library stops having to be written in such a strange special sub-set of the language, etc.

Note that this proposal is orthogonal to the issue of minimizing the binary size and cost of the runtime library – that is clearly still an important concern, but that can be addressed both with or without using other libraries. LLVM has lots of libraries specifically engineered to be lightweight in situations like this.

Other alternatives that have been discussed:

  • Require isolating all shared code into a shared library (.so) that is loaded as needed. This helps some, but it doesn’t seem to fully solve the issues (where does the shared code go? The .so? What happens when it is loaded into a program that already has copies of the same code? What happens when one is instrumented and the other isn’t?). It also requires us to ship the ‘.so’ with the binary to get full functionality, which would be at least somewhat undesirable. It also requires the runtime library developers to carefully partition the code into what can go in the .a and what can go in the .so.

  • The current strategy of re-implementing everything needed from (essentially) the ground up inside the runtime library. I think that this has serious long-term maintenance problems… but who knows, maybe?

  • Other ideas?

+dvyukov

My primary motivation: we really, really need runtime libraries to be able to use common, shared libraries.

I am not sure you understand the problem as we do.

In short, asan/tsan/msan/etc can not use any function which is also called from the instrumented binary.
E.g. it can not use malloc() for internal allocations because malloc is intercepted/replaced. We use raw mmap.
It can not use functions like strlen, memset, etc, because those functions generate memory access events. We use our own implementations or sometimes steal them from libc using dlsym.
Ideally, asan/etc should not even use libc functions like read() – on linux we currently use raw system call for some of those.
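Here is a minimal sketch of the two techniques described above, assuming Linux or another POSIX system: a private string routine, and internal allocation taken directly from `mmap` instead of `malloc`. The names `internal_strlen`, `internal_mmap_alloc` and `internal_munmap_free` are hypothetical. For simplicity the sketch goes through the libc `mmap` wrapper, whereas the text notes that the runtimes sometimes issue the raw system call instead.

```cpp
#include <cstddef>
#include <sys/mman.h>

namespace sketch {  // hypothetical namespace for this illustration

// A private strlen: the runtime must not call the program's strlen, which may
// be intercepted. Runtimes are typically built with -fno-builtin so the
// compiler does not turn this loop back into a call to strlen.
std::size_t internal_strlen(const char* s) {
  std::size_t n = 0;
  while (s[n] != '\0') ++n;
  return n;
}

// Internal allocation straight from the kernel instead of malloc, which the
// runtime itself intercepts. Returns nullptr on failure.
void* internal_mmap_alloc(std::size_t size) {
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

void internal_munmap_free(void* p, std::size_t size) { munmap(p, size); }

}  // namespace sketch
```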

In Valgrind, they struggled with the same problem and made 2 or 3 attempts to reuse the system libc.
Every time it ended with a maintenance nightmare; so currently valgrind has its own private subset of libc.
In PIN, they have a private copy of system libc/libstdc++ and, afaict, it is constantly causing pain for the maintainers.
In the previous version of ThreadSanitizer we used a private copy of STLport in a separate namespace and a custom libc (small subset). This worked, but had problems too (Dmitry was very angry at STLport for code bloat, stack size increase and some direct libc calls).

Until recently this was not causing too much pain in asan/tsan, but our attempts to use the LLVM DWARF readers made it worse.
When tsan finds a race, we need to symbolize it online to be able to match against a suppression and decide whether we want to emit the warning. Today we do it in a separate addr2line process (ugly and slow).
But if we start calling the LLVM DWARF reader, we end up with all possible dependency problems (Dmitry and Alexey will know the exact ones) because the LLVM code calls malloc, memcpy, etc.
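To make the suppression step concrete: once a report’s stack has been symbolized into function names, deciding whether to emit it is a pattern match over those names. The sketch below assumes a simplified pattern language in which `*` matches any run of characters and the whole name must match; it is not TSan’s actual suppression syntax, and `GlobMatch` and `IsSuppressed` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified glob match: '*' matches any (possibly empty) run of characters,
// every other character must match exactly, and the whole text must match.
bool GlobMatch(const std::string& pattern, const std::string& text) {
  std::size_t p = 0, t = 0, star = std::string::npos, mark = 0;
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      star = p++;  // remember the star, first try matching it as empty
      mark = t;
    } else if (p < pattern.size() && pattern[p] == text[t]) {
      ++p;
      ++t;
    } else if (star != std::string::npos) {
      p = star + 1;  // backtrack: let the last star absorb one more char
      t = ++mark;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars
  return p == pattern.size();
}

// A report is suppressed if any symbolized frame matches any pattern.
bool IsSuppressed(const std::vector<std::string>& frames,
                  const std::vector<std::string>& patterns) {
  for (const std::string& f : frames)
    for (const std::string& pat : patterns)
      if (GlobMatch(pat, f)) return true;
  return false;
}
```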

Frankly, I don’t have any solution other than to change the code such that it does not call libc/libc++.
Some of that may be solved by a private copy of STLport + a bit of custom libc (but see above about STLport)

–kcc

+dvyukov

I am not sure you understand the problem as we do.

In short, asan/tsan/msan/etc can not use any function which is also called from the instrumented binary.

Well, I can’t be sure, but this description certainly agrees with my understanding – you need every part of the runtime to be completely separate from every part of the instrumented binary. I’m with you there.

In particular, I think the current strategy for libc & system calls makes perfect sense, and I’m not trying to suggest changing it.

I think the most similar situation is this one:

In the previous version of ThreadSanitizer we used a private copy of STLport in a separate namespace and a custom libc (small subset).

My proposal is very similar, except without the need to modify the C++ standard library in use. Instead, I’m suggesting post-processing the library to ensure that the standard C++ library code in the runtime is kept completely distinct from that in the instrumented binary – everything would in fact be mangled differently.

The goal would be to avoid the maintenance overhead of a custom C++ standard library, and instead use a normal one. My understanding is that both GCC’s libstdc++ and LLVM’s libc++ are significantly higher quality than STLport, and if we’re doing static linking, the code bloat should be greatly reduced. We could reduce it still further by doing LTO of the runtime library, which should be very straightforward given the rest of my proposal.

It would still require a very small subset of libc, likely not much more than you already have.

This worked, but had problems too (Dmitry was very angry at STLport for code bloat, stack size increase and some direct libc calls).

I would be interested to know if the above addresses most of the problems or not.

Until recently this was not causing too much pain in asan/tsan, but our attempts to use the LLVM DWARF readers made it worse.
When tsan finds a race, we need to symbolize it online to be able to match against a suppression and decide whether we want to emit the warning. Today we do it in a separate addr2line process (ugly and slow).
But if we start calling the LLVM DWARF reader, we end up with all possible dependency problems (Dmitry and Alexey will know the exact ones) because the LLVM code calls malloc, memcpy, etc.

Frankly, I don’t have any solution other than to change the code such that it does not call libc/libc++.
Some of that may be solved by a private copy of STLport + a bit of custom libc (but see above about STLport)

I think my proposal is essentially in between these two:

  • Avoid the need for a low quality STL by using a normal C++ standard library implementation, and avoid maintenance burden by doing a link-time mangling of the symbols.
  • Provide the minimal custom libc, and do the same to it
  • Link the LLVM libraries against these, and munge their symbols as well
  • LTO the whole thing if needed to get the code bloat down

I think this is actually easier than changing the LLVM libraries to not use the C++ standard libraries. I also think it is easier than re-implementing the LLVM libraries in question. But that doesn’t mean I think it is easy. ;] I think it is quite hard, but it is the best solution I can come up with.

+dvyukov

  • Avoid the need for a low quality STL by using a normal C++ standard library implementation, and avoid maintenance burden by doing a link-time mangling of the symbols.

re-linking might be too platform specific.
How about compiling the library into LLVM bitcode and adding namespaces/prefixes to that bitcode?

–kcc

+dvyukov

  • Avoid the need for a low quality STL by using a normal C++ standard library implementation, and avoid maintenance burden by doing a link-time mangling of the symbols.

re-linking might be too platform specific.
How about compiling the library into LLVM bitcode and adding namespaces/prefixes to that bitcode?

Re-linking is a bit platform specific…

It would definitely work on ELF platforms, and likely on Darwin, but Windows is tricky.

On Windows we would at least need a custom tool, but I suspect such a tool would be quite easy to write. We could even use the very LLVM libraries in question to write it! ;] Amusingly, I think with the LLVM libraries we could easily write a custom tool that just mangles the symbol names in a collection of object files, and have it work on most platforms!

Still, the bitcode idea is interesting. Doing this entirely in bitcode has some advantages as these types of runtimes are among the best uses for things like LTO: they’re small, performance sensitive, can enumerate the entry points easily, and are likely to have a particular need for dead code elimination.
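The dead-code-elimination benefit comes from the entry points being enumerable: everything not reachable from them can be dropped. Here is a minimal sketch of that reachability computation over a hypothetical call graph. The function names are illustrative only, and a real LTO pipeline works on IR rather than on a map of strings.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function -> functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns every function reachable from the runtime's entry points; anything
// else is dead and could be dropped when the whole runtime is optimized
// together.
std::set<std::string> Live(const CallGraph& calls,
                           const std::vector<std::string>& entry_points) {
  std::set<std::string> live;
  std::vector<std::string> work(entry_points.begin(), entry_points.end());
  while (!work.empty()) {
    std::string f = work.back();
    work.pop_back();
    if (!live.insert(f).second) continue;  // already visited
    auto it = calls.find(f);
    if (it == calls.end()) continue;  // leaf function
    for (const std::string& callee : it->second) work.push_back(callee);
  }
  return live;
}
```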

One nice thing is that I suspect we could do any of these three options, and get equivalent output for them. It may not matter what strategy is used long term, we can use the easiest to implement short term.

One reason to want to have some support for doing this w/o bitcode: we may not have the bitcode. Specifically, the goal would be to use the “normal” C++ standard library, provided it is available to link statically (libstdc++ and libc++ certainly are, I don’t know about MSVC). That would be much easier if we can actually use the existing archive file, and just “fix” the .o files inside it.

It seems likely to be the equivalent of an ‘ld -r’ run with a linker script to munge the symbol names, or potentially a custom tool written with the LLVM object file libraries.

Hi,

Yes, stlport was a pain to deploy and maintain + it calls normal operator new/delete (there is no way to put them into a separate namespace).

Note that in some codebases we build asan/tsan runtimes from source. What will the build process look like with that object file mangling? How easy is it to integrate into a custom build process?

Soon I will start integrating tsan into the Go language. For the Go language we need very simple object files. No global ctors, no thread-local storage, no weak symbols and other trickery. Basically what a portable C compiler could have produced.

Hi,

Yes, stlport was a pain to deploy and maintain + it calls normal operator new/delete (there is no way to put them into a separate namespace).

Ok, but putting the raw symbols into a “namespace” with the linker shouldn’t be subject to these limitations.

Note that in some codebases we build asan/tsan runtimes from source. What will the build process look like with that object file mangling? How easy is it to integrate into a custom build process?

Well, I don’t know yet. ;] It was an idea, I don’t have an implementation at this point. That said, I had only really imagined building the runtimes from source? Maybe I don’t understand what you mean by this?

The vague strategy I am imagining for the build process is this:

  1. compile runtime into a static library, just like any other static library

  2. collect all the ‘.o’ files in the static archive, and in any dependencies’ static archive libraries

  3. for each ‘foo.o’ build a ‘foo_munged.o’ using $tool; in the _munged version, every symbol not on the whitelist for export to the instrumented binary is renamed with the runtime prefix

  4. put all of the _munged ‘.o’ files into a single runtime archive

The $tool here could be “ld -r” with a linker script, or (likely necessary on windows) a very simple, dedicated tool built around the LLVM object libraries to copy each symbol, munging the name.
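Step 2 amounts to computing the transitive closure of the runtime archive’s dependencies. Here is a minimal C++ sketch, assuming the dependency list of each archive is already known; the `DepGraph` type and the `TransitiveArchives` function are hypothetical. Each archive is collected exactly once, in first-visit order, and its ‘.o’ members would then be the input to step 3.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependency table: archive -> archives it depends on.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first walk that records each reachable archive once.
void Visit(const DepGraph& g, const std::string& lib,
           std::set<std::string>& seen, std::vector<std::string>& order) {
  if (!seen.insert(lib).second) return;  // already collected
  order.push_back(lib);
  auto it = g.find(lib);
  if (it == g.end()) return;  // no further dependencies
  for (const std::string& dep : it->second) Visit(g, dep, seen, order);
}

// All archives whose members must be munged into the final runtime archive.
std::vector<std::string> TransitiveArchives(const DepGraph& g,
                                            const std::string& root) {
  std::set<std::string> seen;
  std::vector<std::string> order;
  Visit(g, root, seen, order);
  return order;
}
```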

Soon I will start integrating tsan into the Go language. For the Go language we need very simple object files.

Ok… I’m not sure whether this should really constrain the way we build the core runtime system here though. If you need some logic on the tsan side factored out into a separate library for use with Go, that would seem simpler than trying to make one sanitizer runtime library to support frontends, middle ends, and programming languages with totally separate models.

No global ctors, no thread-local storage, no weak symbols and other trickery. Basically what a portable C compiler could have produced.

These also don’t seem insurmountable, even in the existing use cases. But maybe I’m not considering the actual restrictions you are, or I’ve misunderstood. Here is how I’m breaking down the things you’ve mentioned:

  1. It seems reasonable to avoid global constructors, and do-able in C++ even when using the standard library and parts of LLVM. LLVM itself specifically works to avoid them.

  2. TLS doesn’t seem to be required by anything I’m suggesting… is there something that worries you about this?

  3. I don’t understand the requirement to have no weak symbols. Even a portable C compiler might produce weak symbols? Still, during the re-linking phase above, it should be possible to resolve any weak symbols?
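On the weak-symbol point: resolving them during the re-link roughly follows the usual static-linking rule, so the final objects need not carry weak definitions at all. Here is a simplified sketch of that rule, assuming each definition is tagged weak or strong. The `Definition` type and `Resolve` function are hypothetical, and real linkers have further subtleties (for example, around which archive members get extracted).

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

enum class Binding { Weak, Strong };

struct Definition {
  std::string symbol;
  std::string object;  // which '.o' provides it
  Binding binding;
};

// Simplified resolution: a strong definition beats weak ones, the first weak
// definition wins if there is no strong one, and two strong definitions are
// a duplicate-definition error (reported here as std::nullopt).
std::optional<std::map<std::string, std::string>> Resolve(
    const std::vector<Definition>& defs) {
  std::map<std::string, Definition> chosen;
  for (const Definition& d : defs) {
    auto it = chosen.find(d.symbol);
    if (it == chosen.end()) {
      chosen.emplace(d.symbol, d);
      continue;
    }
    if (d.binding == Binding::Strong) {
      if (it->second.binding == Binding::Strong) return std::nullopt;
      it->second = d;  // strong overrides an earlier weak definition
    }
    // A later weak definition never replaces an earlier choice.
  }
  std::map<std::string, std::string> result;
  for (const auto& [sym, def] : chosen) result[sym] = def.object;
  return result;
}
```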

Hi,

Yes, stlport was a pain to deploy and maintain + it calls normal operator new/delete (there is no way to put them into a separate namespace).

Ok, but putting the raw symbols into a “namespace” with the linker shouldn’t be subject to these limitations.

OK

Soon I will start integrating tsan into the Go language. For the Go language we need very simple object files.

Ok… I’m not sure whether this should really constrain the way we build the core runtime system here though. If you need some logic on the tsan side factored out into a separate library for use with Go, that would seem simpler than trying to make one sanitizer runtime library to support frontends, middle ends, and programming languages with totally separate models.

Yes, it will be a separate runtime library. But if tsan sources are deeply dependent on llvm sources, this may be significantly harder to do.

No global ctors, no thread-local storage, no weak symbols and other trickery. Basically what a portable C compiler could have produced.

These also don’t seem insurmountable, even in the existing use cases. But maybe I’m not considering the actual restrictions you are, or I’ve misunderstood. Here is how I’m breaking down the things you’ve mentioned:

  1. It seems reasonable to avoid global constructors, and do-able in C++ even when using the standard library and parts of LLVM. LLVM itself specifically works to avoid them.

Is that also the case for the C++ library that LLVM uses?

  2. TLS doesn’t seem to be required by anything I’m suggesting… is there something that worries you about this?

I suspect that the C/C++ library can use TLS.

  3. I don’t understand the requirement to have no weak symbols. Even a portable C compiler might produce weak symbols?

The linker does not understand them.

Still, during the re-linking phase above, it should be possible to resolve any weak symbols?

Well, most likely yes.

There may be additional limitations that I don’t know yet.

Hi,

Soon I will start integrating tsan into the Go language. For the Go language we need very simple object files.

Ok… I’m not sure whether this should really constrain the way we build the core runtime system here though. If you need some logic on the tsan side factored out into a separate library for use with Go, that would seem simpler than trying to make one sanitizer runtime library to support frontends, middle ends, and programming languages with totally separate models.

Yes, it will be a separate runtime library. But if tsan sources are deeply dependent on llvm sources, this may be significantly harder to do.

I think we should cross this bridge when we get there.

When we do, I suspect it will be reasonable, in a worst case situation, to abstract the business logic into an isolated shared component. My hope is that we won’t even need to…

No global ctors, no thread-local storage, no weak symbols and other trickery. Basically what a portable C compiler could have produced.

These also don’t seem insurmountable, even in the existing use cases. But maybe I’m not considering the actual restrictions you are, or I’ve misunderstood. Here is how I’m breaking down the things you’ve mentioned:

  1. It seems reasonable to avoid global constructors, and do-able in C++ even when using the standard library and parts of LLVM. LLVM itself specifically works to avoid them.

Is that also the case for the C++ library that LLVM uses?

LLVM is extremely resistant to growing external dependencies, specifically because it cannot control them. In particular, the parts that a runtime is likely to use are very unlikely to grow any problematic dependencies here. Essentially, it is reasonable to assert that we have control over all of LLVM’s dependencies and can arrange for them to be very conservative here.

  2. TLS doesn’t seem to be required by anything I’m suggesting… is there something that worries you about this?

I suspect that the C/C++ library can use TLS.

I would be very surprised if these parts of LLVM use them. If they did, I think it would be reasonable to make it optional and disable it in some circumstances.

  3. I don’t understand the requirement to have no weak symbols. Even a portable C compiler might produce weak symbols?

The linker does not understand them.

Still, during the re-linking phase above, it should be possible to resolve any weak symbols?

Well, most likely yes.

There may be additional limitations that I don’t know yet.

Sure, time will tell. That said, I don’t think future work to support Go should be the top priority in getting this system well integrated, and I don’t think there are any huge roadblocks related to Go that are already clear at this stage.

Can we alter the build system so that when building a run-time library it modifies all .cpp files like this:
namespace FOO {
// original contents of the .cpp file
}
This will give us essentially the same thing, but w/o system dependent object file hackery.
Maybe we can add a Clang flag to add such a namespace for us?
(This approach, as well as Chandler’s original approach will have to deal with malloc, memset, strlen, etc which still need to reside in the global namespace)
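Here is a small C++ sketch of what such wrapped output could look like, illustrating the caveat in parentheses: definitions inside `namespace FOO` get new qualified (and differently mangled) names, but the call to `strlen` and the allocation inside `std::string` still go to the global C library and the global `operator new`, which the language does not allow to be declared inside a namespace. `FOO`, `CountSpaces` and `Greeting` are placeholders, and a real wrapping step would also have to keep `#include` lines outside the namespace.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// What a hypothetical "wrap every .cpp in namespace FOO" step would produce.
namespace FOO {

std::size_t CountSpaces(const char* s) {
  std::size_t n = 0;
  // std::strlen still resolves to the global C library strlen: the wrapper
  // namespace does not rename calls into libc.
  for (std::size_t i = 0, len = std::strlen(s); i < len; ++i)
    if (s[i] == ' ') ++n;
  return n;
}

// std::string still allocates through the global operator new.
std::string Greeting() { return std::string("hello from FOO"); }

}  // namespace FOO
```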

–kcc

I think this is essentially what Dmitry was talking about w/ past STLport experience. It has lots of limitations:

  • You can’t use the normal system standard library
  • You have to build the standard library from source
  • You can’t wrap certain parts of it (operator new, delete, a few other things)
  • You can’t re-use any C libraries (zlib for example)

Can we alter the build system so that when building a run-time library it modifies all .cpp files like this:
namespace FOO {
// original contents of the .cpp file
}
This will give us essentially the same thing, but w/o system dependent object file hackery.
Maybe we can add a Clang flag to add such a namespace for us?

I think this is essentially what Dmitry was talking about w/ past STLport experience. It has lots of limitations:

Patching object files still sounds much scarier and harder to port.
I’d prefer to find a solution that involves only source files and maybe clang.
Pondering…

Perhaps you are solving a broader problem. But as for asan/tsan, we currently need only symbolizer, it’s separable from everything else, and can be made to not use STL.

Not just currently.
I really hope that we won’t need anything else.

If you want to share LLVM code for the object and dwarf reading, I do not believe this to be true at all.

I’ve already removed code for the object reading for exactly that reason, so now it’s just DWARF parsing :) There are some STL containers involved, but I think they can be replaced.

Agree here. I hope to modify/extend this code soon anyway.

Folks, this is not the path to sharing code. This is the path to forking code.

Let’s go back to the very premise: I think it is highly desirable to be capable of building runtimes such as ASan and TSan and share code rather than forking it.

I have reasons: I have seen the creation of at least three separate ELF and/or DWARF parsing libraries thus far. I have seen a long series of bugs found and fixed in them over the course of years, often the same bug, often with great expense in debugging to understand why. I don’t want us to keep paying this cost. I don’t think these pieces of code are likely to be alone in this.

Now, perhaps I am wrong, and it is not worth it. Thus far, I don’t hear any convincing arguments to that effect, but I’m very willing to believe I’m wrong as I don’t work on one of these runtimes, and so don’t have a direct appreciation for all of the costs involved.

But let’s be extremely clear on what you are suggesting: you are specifically doing away with the very idea of sharing code with the rest of the LLVM project, and instead deciding to fork and write custom code in the runtime for all functionality.