libFuzzer: issue with weak symbols on Mac

I’d like to discuss the following change: https://reviews.llvm.org/D37526

For context, there is a comment in compiler-rt/lib/fuzzer/FuzzerExtFunctionsWeak.cpp:

// Implementation for Linux. This relies on the linker’s support for weak
// symbols. We don’t use this approach on Apple platforms because it requires
// clients of LibFuzzer to pass -U _<symbol_name> to the linker to allow
// weak symbols to be undefined. That is a complication we don’t want to expose
// to clients right now.

That makes sense, but with the current implementation, you cannot use any of libFuzzer’s interface functions other than LLVMFuzzerTestOneInput on Mac. Below is a small example that shows LLVMFuzzerInitialize is not being called on Mac:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

extern "C" int LLVMFuzzerInitialize(int* argc, char*** argv) {
  printf("Hello from LLVMFuzzerInitialize, argc: %i\n", *argc);
  return *argc;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  printf("Hello from LLVMFuzzerTestOneInput, size: %zu\n", size);
  if (size) {
    return data[0];
  }
  return size;
}

Assuming that there are libFuzzer customers who don’t mind specifying “-U,_%function_name%” explicitly (e.g. https://chromium-review.googlesource.com/c/chromium/src/+/653846/1/testing/libfuzzer/BUILD.gn), we need a way to use FuzzerExtFunctionsWeak.cpp instead of FuzzerExtFunctionsDlsym.cpp on Mac.
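For readers who have not used the pattern, the weak-symbol approach that the quoted comment describes can be sketched roughly as follows. This is a minimal illustration, not libFuzzer's actual code: ExampleOptionalHook and CallHookIfPresent are hypothetical names. With a Linux-style linker, an undefined weak reference resolves to a null address; on macOS, per the comment above, the link is expected to need -U _ExampleOptionalHook.

```cpp
// Hypothetical optional hook (not a libFuzzer symbol). It is declared weak
// and deliberately left undefined, so a client may or may not provide it.
extern "C" __attribute__((weak)) int ExampleOptionalHook(int value);

// Library-side call site: invoke the hook only if the linker resolved it.
// Returns -1 when no definition was linked in.
int CallHookIfPresent(int value) {
  if (ExampleOptionalHook != nullptr) {
    return ExampleOptionalHook(value);
  }
  return -1;
}
```

When no translation unit defines ExampleOptionalHook, CallHookIfPresent falls through to the -1 branch; a client that defines the function gets it called instead.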

The CL I’ve uploaded feels a bit hacky to me, but I don’t see a less intrusive solution that stays compatible with the existing implementation and still lets clients explicitly opt in to weak symbols when needed.

Thanks!

Max Moroz via llvm-dev <llvm-dev@lists.llvm.org> writes:

I'd like to discuss the following change: https://reviews.llvm.org/D37526

For the context, there is a comment
in compiler-rt/lib/fuzzer/FuzzerExtFunctionsWeak.cpp:

// Implementation for Linux. This relies on the linker's support for weak
// symbols. We don't use this approach on Apple platforms because it requires
// clients of LibFuzzer to pass ``-U _<symbol_name>`` to the linker to allow
// weak symbols to be undefined. That is a complication we don't want to expose
// to clients right now.

That makes sense, but with current implementation, you cannot use
libFuzzer's interface functions other than LLVMFuzzerTestOneInput. Below is
a small example to verify that LLVMFuzzerInitialize is not being called on
Mac:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

extern "C" int LLVMFuzzerInitialize(int* argc, char*** argv) {
  printf("Hello from LLVMFuzzerInitialize, argc: %i\n", *argc);
  return *argc;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  printf("Hello from LLVMFuzzerTestOneInput, size: %zu\n", size);
  if (size) {
    return data[0];
  }
  return size;
}

I suspect you might be mistaken about the problem, and what's actually
happening is that the linker is dead stripping your hook functions. At
least, I've had plenty of success with fuzzers on macOS with
LLVMFuzzerInitialize and LLVMFuzzerCustomMutator.

Try adding __attribute__((__used__)) to LLVMFuzzerInitialize and see if
that fixes the problem for you:

  extern "C" __attribute__((__used__)) int LLVMFuzzerInitialize(...)
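Spelled out in full, the suggestion might look like the sketch below. It reuses the signatures from the example earlier in the thread; the only changes are the attribute and returning 0 from both functions, which is what the libFuzzer documentation asks of a fuzz target. Whether the attribute alone is enough for a given toolchain and set of linker flags is an assumption to verify.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// __used__ asks the compiler to emit and keep this function even though
// nothing in this translation unit references it.
extern "C" __attribute__((__used__)) int LLVMFuzzerInitialize(int *argc,
                                                              char ***argv) {
  (void)argv;  // unused in this sketch
  printf("Hello from LLVMFuzzerInitialize, argc: %i\n", *argc);
  return 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  (void)data;  // unused in this sketch
  printf("Hello from LLVMFuzzerTestOneInput, size: %zu\n", size);
  return 0;
}
```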

Assuming that there are libFuzzer customers who don't mind to specify
"-U,_%function_name%" explicitly (e.g.
https://chromium-review.googlesource.com/c/chromium/src/+/653846/1/testing/libfuzzer/BUILD.gn),
we need to have a way to use FuzzerExtFunctionsWeak.cpp instead
of FuzzerExtFunctionsDlsym.cpp on Mac.

All of this seems unnecessarily awkward - the correct way to use weak
symbols on macOS is just to provide a default implementation that does
nothing. The function call overhead isn't that much worse than the
branch overhead to avoid calling it.
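A minimal sketch of that alternative, assuming a library that owns the hook: ExampleInitializeHook and CallInitializeHook are hypothetical names, not libFuzzer symbols. The library ships a weak no-op definition, so the symbol is always defined and the call site needs no null check.

```cpp
// Library side: weak default that does nothing. A client that defines a
// regular (strong) function with the same name replaces it at link time.
extern "C" __attribute__((weak)) int ExampleInitializeHook(int *argc,
                                                           char ***argv) {
  (void)argc;
  (void)argv;
  return 0;
}

// Library side: the hook is called unconditionally.
int CallInitializeHook(int argc, char **argv) {
  return ExampleInitializeHook(&argc, &argv);
}
```

With no client override linked in, CallInitializeHook simply runs the no-op default and returns 0.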

Moreover, libFuzzer tests do run on Mac,
and they do use other interface functions.

Another way to avoid the problem is simply not to request dead stripping.

Thanks Justin and George. Your answers are helpful. I took a look at the tests (also uploaded a little change: https://reviews.llvm.org/D37721).

It looks like everything works fine on Mac when using -fsanitize=fuzzer. If I switch to linking against libFuzzer manually, -Wl,-dead_strip is indeed the culprit. However, removing -Wl,-dead_strip from the build flags would be a regression rather than an improvement.
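Here is a rough sketch of why dead stripping can hide a hook, assuming the Mac build resolves optional hooks by name at run time (as the file name FuzzerExtFunctionsDlsym.cpp suggests). The helper names are hypothetical. Nothing references the hook at link time, so a linker asked to strip dead code may drop it, and the lookup then comes back empty.

```cpp
#include <dlfcn.h>

using InitializeHook = int (*)(int *argc, char ***argv);

// Ask the dynamic loader for a symbol with this name anywhere in the running
// process. Returns nullptr if no such symbol is visible.
InitializeHook FindInitializeHook(const char *name) {
  void *symbol = dlsym(RTLD_DEFAULT, name);
  return reinterpret_cast<InitializeHook>(symbol);
}

// Calls the hook if the lookup succeeded; returns false if it was not found,
// in which case the hook is silently skipped.
bool CallInitializeHookIfPresent(const char *name, int *argc, char ***argv) {
  InitializeHook hook = FindInitializeHook(name);
  if (hook == nullptr) {
    return false;
  }
  hook(argc, argv);
  return true;
}
```

A failed lookup is indistinguishable from a client that never defined the hook, which would match the symptom in the original report.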

As for __attribute__((__used__)), that also works, but it doesn’t scale. With hundreds of fuzz targets, I would have to go through all of them and add the attribute, and then make sure that every new fuzz target has it as well.

Anyway, it feels like the best option now is to migrate to -fsanitize=fuzzer, since it works well and simplifies other things too. Thanks for the help!