moving libfuzzer to compiler-rt?

Hi All,

Currently libfuzzer depends on (often freshly built) clang, yet the dependency is not explicitly specified
in cmake.
That leads to various issues: for instance, it’s not possible to check out the LLVM repo and run the libfuzzer
tests: one would often need to compile a fresh clang first, and then create a separate build directory
where libfuzzer could be tested.
For the buildbot this problem is approached by grabbing a freshly built binary from another buildbot
and using that for testing.

Needless to say, that could be quite annoying.
Additionally, my recent changes start using libfuzzer from the Clang driver, and finding the actual archive file
requires hardcoding some directory paths, as one would need to go up the tree from the Clang binary
(in Swift, for example, the situation is even worse, as the path to Clang is a symlink, and getting the archive file
from the LLVM tree would require going quite a few levels up).

From my understanding, all these problems can be solved entirely by moving libfuzzer to compiler-rt, where (other) sanitizers already reside.

Any thoughts on the suggestion?

It would still be possible to compile just libfuzzer with no dependencies by simply making a partial checkout from SVN;
only the repo path would change.

George

Hi All,

Currently libfuzzer depends on (often freshly built) clang, yet the
dependency is not explicitly specified in cmake.

Correct.

That leads to various issues: for instance, it’s not possible to check out
LLVM repo and run libfuzzer tests: one would often need to compile fresh
clang first, and then create a separate build directory, where libfuzzer
could be tested.
For the buildbot this problem is approached by grabbing a freshly built
binary from another buildbot and using that for testing.

And correct again.

Needless to say, that could be quite annoying.
Additionally, my recent changes start using libfuzzer from Clang driver:
and finding the actual archive file requires some hardcoding of directory
paths, as one would need to go up the tree from the Clang binary
(in swift, for example, the situation is even worse, as the path to Clang
is a symlink, and getting an archive file from the LLVM tree would require
going quite a few levels up).

From my understanding, all these problems can be solved entirely
by moving libfuzzer to compiler-rt, where (other) sanitizers already
reside.

Yes, that might be a reasonable thing to do.
I am trying to remember the reasons why we've put libFuzzer into llvm and
not into compiler-rt in the first place, and failing to do so.

Any thoughts on the suggestion?

It would be still possible to compile just libfuzzer with no dependencies,
by simply making a partial checkout from SVN,
and only the repo path would change.

This would cause some annoyance for me and maybe some users, but nothing we
can't tolerate.
Still, what are our other options?

* wait for the mono repo to happen, then we'll be able to make libFuzzer
depend on clang. (Or no?)
* move libFuzzer to a separate repo, parallel to compiler-rt? (not a large
win, just listing as a choice)
* anything else?

Does anyone see good reasons why libFuzzer should remain in llvm repo (as
opposed to moving it to compiler-rt)?

--kcc

From my understanding, all these problems can be solved entirely
by moving libfuzzer to compiler-rt, where (other) sanitizers already
reside.

Yes, that might be a reasonable thing to do.
I am trying to remember the reasons why we've put libFuzzer into llvm and
not into compiler-rt in the first place, and failing to do so.

IIRC you thought libFuzzer would be more tightly coupled to LLVM
instrumentation, so the goal was to reap monorepo-like productivity
developments without waiting for it to happen?

Maybe it was a licensing concern, UIUC vs dual UIUC/MIT?

Any thoughts on the suggestion?

It would be still possible to compile just libfuzzer with no
dependencies, by simply making a partial checkout from SVN,
and only the repo path would change.

This would cause some annoyance for me and maybe some users, but nothing
we can't tolerate.
Still what are our other options?

* wait for the mono repo to happen, then we'll be able to make libFuzzer
depend on clang. (Or no?)
* move libFuzzer to a separate repo, parallel to compiler-rt? (not a large
win, just listing as a choice)
* anything else?

Does anyone see good reasons why libFuzzer should remain in llvm repo (as
opposed to moving it to compiler-rt)?

I'd like to move libFuzzer out of llvm/lib/, since it's definitely a
runtime library, and not compiler library infrastructure. compiler-rt seems
like the best place to put it for now, unless there are licensing concerns.

Mechanically, we can do this in one history-preserving commit by checking
out the entire SVN repo the way that the git-llvm script does. SVN users
and users of https://github.com/llvm-project/llvm-project will still see
the right history. The final monorepo will presumably also have accurate
history.

Reid Kleckner via llvm-dev <llvm-dev@lists.llvm.org> writes:

From my understanding, all these problems can be solved entirely
by moving libfuzzer to compiler-rt, where (other) sanitizers already
reside.

Yes, that might be a reasonable thing to do.
I am trying to remember the reasons why we've put libFuzzer into llvm and
not into compiler-rt in the first place, and failing to do so.

IIRC you thought libFuzzer would be more tightly coupled to LLVM
instrumentation, so the goal was to reap monorepo-like productivity
developments without waiting for it to happen?

Maybe it was a licensing concern, UIUC vs dual UIUC/MIT?

Any thoughts on the suggestion?

It would be still possible to compile just libfuzzer with no
dependencies, by simply making a partial checkout from SVN,
and only the repo path would change.

This would cause some annoyance for me and maybe some users, but nothing
we can't tolerate.
Still what are our other options?

* wait for the mono repo to happen, then we'll be able to make libFuzzer
depend on clang. (Or no?)
* move libFuzzer to a separate repo, parallel to compiler-rt? (not a large
win, just listing as a choice)
* anything else?

Does anyone see good reasons why libFuzzer should remain in llvm repo (as
opposed to moving it to compiler-rt)?

I'd like to move libFuzzer out of llvm/lib/, since it's definitely a
runtime library, and not compiler library infrastructure. compiler-rt seems
like the best place to put it for now, unless there are licensing concerns.

Maybe we should just move it under llvm/runtimes and set it up with a
fairly simple cmake config? I think this makes it pretty trivial to
build it with the just-built clang. It doesn't have terribly much to do
with compiler-rt IMO - this isn't something that enables a compiler
feature per se, it's more of its own thing.

Reid Kleckner via llvm-dev <llvm-dev@lists.llvm.org> writes:
>
>>
>>> From my understanding, all these problems can be solved entirely
>>> by moving libfuzzer to compiler-rt, where (other) sanitizers already
>>> reside.
>>>
>>
>> Yes, that might be a reasonable thing to do.
>> I am trying to remember the reasons why we've put libFuzzer into llvm and
>> not into compiler-rt in the first place, and failing to do so.
>>
>
> IIRC you thought libFuzzer would be more tightly coupled to LLVM
> instrumentation, so the goal was to reap monorepo-like productivity
> developments without waiting for it to happen?
>
> Maybe it was a licensing concern, UIUC vs dual UIUC/MIT?
>
>
>> Any thoughts on the suggestion?
>>>
>>> It would be still possible to compile just libfuzzer with no
>>> dependencies, by simply making a partial checkout from SVN,
>>> and only the repo path would change.
>>>
>>
>> This would cause some annoyance for me and maybe some users, but nothing
>> we can't tolerate.
>> Still what are our other options?
>>
>> * wait for the mono repo to happen, then we'll be able to make libFuzzer
>> depend on clang. (Or no?)
>> * move libFuzzer to a separate repo, parallel to compiler-rt? (not a large
>> win, just listing as a choice)
>> * anything else?
>>
>> Does anyone see good reasons why libFuzzer should remain in llvm repo (as
>> opposed to moving it to compiler-rt)?
>>
>
> I'd like to move libFuzzer out of llvm/lib/, since it's definitely a
> runtime library, and not compiler library infrastructure. compiler-rt
> seems like the best place to put it for now, unless there are licensing
> concerns.

Maybe we should just move it under llvm/runtimes and set it up with a
fairly simple cmake config?

can we?
If we can, then we probably can also leave the code where it is, and tweak
the cmake to use just-built-clang. No?

Kostya Serebryany <kcc@google.com> writes:

>
> can we?
> If we can, then we probably can also leave the code where it is, and
> tweak the cmake to use just-built-clang. No?

Absolutely - this shouldn't be all that hard. I'd thought about taking a
crack at it a while ago but I haven't had the time.

The only complicated part is deciding what to do when we're not building
clang - should we build libFuzzer with the host compiler in this case,
or not build it at all?

not build at all
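A minimal CMake sketch of the "not build at all" policy follows. It assumes the libFuzzer sources live in a hypothetical `libFuzzer` subdirectory. `if(TARGET clang)` is evaluated when this file is processed, so the check only means something in a directory that CMake configures after the one defining the `clang` target.

```cmake
# Sketch only: skip libFuzzer entirely when clang is not part of this build.
# "libFuzzer" is a hypothetical subdirectory name used for illustration.
# if(TARGET ...) is evaluated at configure time, so this must run after
# the directory that defines the clang target has been added.
if(TARGET clang)
  add_subdirectory(libFuzzer)
else()
  message(STATUS "clang is not being built; skipping libFuzzer")
endif()
```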

IANAL, but I have a few licensing-related concerns about some of the things being discussed on this thread.

(1) We can’t move libFuzzer to compiler-rt without re-licensing the code or making the compiler-rt repository contain code covered under different licenses. This is because compiler-rt is licensed differently from LLVM. IMO, this should make moving the code into compiler-rt untenable, at least until the project’s general licensing discussion reaches some form of resolution.

(2) libFuzzer is licensed today under LLVM’s UIUC license, which requires binary attribution. If we’re adding options to the clang driver, and potentially including libFuzzer in distributions of LLVM, this licensing requirement becomes problematic for our users. We should consider how best to approach this.

I do believe that Justin’s solution of structuring libFuzzer as a standalone project and moving it under llvm/runtimes does side-step some of the licensing complications, and it is a good technical solution to the problem at least in the interim.

I think waiting for the monorepo to happen is a bad idea, if only because that decision hasn’t actually been made yet, and waiting for something that may not actually happen isn’t a great solution.

-Chris

From my understanding, the entire file `compiler-rt/cmake/Modules/CompilerRTCompile.cmake` is dedicated to the trick
of using the just-built compiler for compiling the sources.
Copying it seems suboptimal.
I can move it to `llvm/cmake/modules`, but that would take us one step further from being able to build `compiler-rt` without the parent repository.
Should that matter?
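This hypothetical sketch shows the standalone-build concern. If a shared module lived under `llvm/cmake/modules`, compiler-rt could only reach it through the parent tree, and a standalone checkout would not have it. `SharedJustBuiltCompile` is an invented module name. `LLVM_MAIN_SRC_DIR` is assumed to point at the LLVM source tree when building inside it.

```cmake
# Hypothetical: "SharedJustBuiltCompile" is an invented module name.
# LLVM_MAIN_SRC_DIR is assumed to be set when building inside the LLVM tree.
set(shared_module_dir "${LLVM_MAIN_SRC_DIR}/cmake/modules")
if(LLVM_MAIN_SRC_DIR AND EXISTS "${shared_module_dir}/SharedJustBuiltCompile.cmake")
  list(APPEND CMAKE_MODULE_PATH "${shared_module_dir}")
  include(SharedJustBuiltCompile)
else()
  # A standalone compiler-rt checkout has no parent tree to pull this from.
  message(FATAL_ERROR "Shared module not found; standalone builds need their own copy")
endif()
```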

not build at all

Why not? libfuzzer itself does not require instrumentation (at least from my understanding);
the actual problem arises with tests.

From my understanding, the entire file `compiler-rt/cmake/Modules/CompilerRTCompile.cmake`
is dedicated to the trick of using the just-built compiler for compiling
the sources.

Copying it seems suboptimal.

Yes, copying large chunks of cmake is bad.

I can move it to `llvm/cmake/modules`, but that would be one step away
from the ability to build `compiler-rt` without the parent repository.
Should that matter?

This is where my cmake knowledge ends, perhaps more knowledgeable folks
could comment.

> not build at all

Why not? libfuzzer itself does not require instrumentation (at least from
my understanding),

libFuzzer usually does not need instrumentation, but if one wants to couple
it with msan -- then it has to be instrumented.
Maybe it's ok to build libFuzzer w/o its tests -- just need to remember
that the result is untested.

Does anyone see good reasons why libFuzzer should remain in llvm repo (as
opposed to moving it to compiler-rt)?

Does moving LibFuzzer to compiler-rt imply that it is compiled as part
of compiler-rt and shipped with it?

How does that fit with LibFuzzer's model of allowing the user to
provide their own `main()`? Would you just build two different
libraries, one with a `main()` and one without?

Thanks,
Dan.

> Does anyone see good reasons why libFuzzer should remain in llvm repo (as
> opposed to moving it to compiler-rt)?

Does moving LibFuzzer to compiler-rt imply that it is compiled as part
of compiler-rt and shipped with it?

How does that fit with LibFuzzer's model of allowing the user to
provide their own `main()`.

libFuzzer doesn't allow users to use their own main (not any more).
Although I am not sure how that's related to moving libFuzzer somewhere.

That’s neither here nor there, but at least in my experience on Darwin,
if I link something that already has its own main with libFuzzer,
the user’s main is executed, and libFuzzer's main is ignored.

Oops. That shows how long it's been since I looked at the source code.

It was related in that if LibFuzzer was shipped as part of compiler-rt
I presumed we would need to supply both libraries to end users.
Given that this feature was removed it is a non-issue.

From my understanding, the entire file `compiler-rt/cmake/Modules/CompilerRTCompile.cmake` is dedicated to using the trick

Actually no. The important bits of CMake for using the just-built compiler are in Clang's and LLVM's CMake, not compiler-rt's. The code in that file is for the tests, not the actual libraries.

This is why Bogner's suggestion of using llvm/runtimes makes sense: that is where all the relevant CMake goop in LLVM is exposed.

-Chris

Again, after offline conversation with Chris Bieneman:

  • moving to compiler-rt would be too complicated due to the change in licenses
  • it would make much more sense to move to the “tools” folder instead, for the following reasons:
      • conceptually, it’s a tool, not a library
      • all other projects in “lib” depend on LLVM and cannot build without LLVM; libFuzzer does not
      • practically speaking, CMake has no way of knowing whether Clang is being built when
        “lib” is compiled, yet it does know for projects in tools.

Using a freshly built clang for projects in “tools” is embarrassingly easy and only requires a couple of lines
of configuration change.
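This rough illustration shows why the change is easy for a directory under “tools”, which CMake configures after `tools/clang`. A custom command can call the just-built compiler through the `$<TARGET_FILE:clang>` generator expression and depend on the `clang` target. The source file name and the instrumentation flag are assumptions for illustration, not libFuzzer's actual test setup.

```cmake
# Sketch for a directory that CMake processes after tools/clang.
# SimpleTest.cpp and the -fsanitize-coverage flag are illustrative assumptions.
if(TARGET clang)
  set(test_src "${CMAKE_CURRENT_SOURCE_DIR}/SimpleTest.cpp")
  set(test_obj "${CMAKE_CURRENT_BINARY_DIR}/SimpleTest.o")
  add_custom_command(
    OUTPUT "${test_obj}"
    # $<TARGET_FILE:clang> expands to the path of the just-built clang.
    COMMAND $<TARGET_FILE:clang> -c -fsanitize-coverage=trace-pc-guard
            "${test_src}" -o "${test_obj}"
    # Naming the clang target makes clang build first and reruns this
    # command whenever clang is rebuilt.
    DEPENDS clang "${test_src}"
    COMMENT "Compiling SimpleTest.cpp with the just-built clang")
  add_custom_target(FuzzerTestObjects ALL DEPENDS "${test_obj}")
endif()
```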

Kostya, what about moving to “tools” then?

Again, after offline conversation with Chris Bieneman:

- move to compiler-rt would be too complicated due to change in licenses
- it would make much more sense to move to “tools” folder instead, for
the following reasons:
    * conceptually, it’s a tool, not a library
    * all other projects in “lib” depend on LLVM and can not build without
      LLVM, libFuzzer does not
    * practically speaking, CMake has no way of knowing whether Clang is
      being built when “lib” is compiled, yet it does know for projects in tools.

Using a freshly built clang for projects in “tools” is embarrassingly easy
and only requires a couple of lines of configuration change.

Kostya, what about moving to “tools” then?

Well, ok, this sounds cool.
But can we make one more step and try to preserve the code where it is, for
the sake of compatibility?
E.g. can we have the CMake in tools while still keeping the code in lib?
Or a link of some kind?

My worry is that there are already quite a few places that know where
libFuzzer code is, and I don't control all of them.

And, finally, I really don't get why we can do something in tools and can't
do the same in lib.
Or do we simply not want to do it, to keep things simple?

--kcc

+Chris.

My understanding was that it is technically impossible for things in “lib” to use the just-built clang, as they are built first, and there’s no way to make them wait for “clang”.
I’m not a CMake expert, and I might be wrong.

It is not impossible; it would just involve excessive hacks. Since it seems like this isn’t a short-term solution we’re talking about, I am very opposed to throwing hacks into the build system. I’d rather we actually fix the problem(s). More below.

Again, after offline conversation with Chris Bieneman:

  • move to compiler-rt would be too complicated due to change in licenses
  • it would make much more sense to move to “tools” folder instead, for the following reasons:
      • conceptually, it’s a tool, not a library
      • all other projects in “lib” depend on LLVM and can not build without LLVM, libFuzzer does not
      • practically speaking, CMake has no way of knowing whether Clang is being built when
        “lib” is compiled, yet it does know for projects in tools.

Using a freshly built clang for projects in “tools” is embarrassingly easy and only requires a couple of lines
of configuration change.

Kostya, what about moving to “tools” then?

Well, ok, this sounds cool.
But can we make one more step and try to preserve the code where it is, for the sake of compatibility?

Please no. This code doesn’t actually belong in lib, it has never fit the model of an LLVM library, we really need to pull it out of there.

E.g. can we have the CMake in tools while still keeping the code in lib?

Could we contrive a hack in the build system to do it? Yes, but I will fight violently against allowing that change into the build system because the right answer here is to move the code.

Or a link of some kind?

Links are incredibly fragile on Windows, and they trip up a lot of SCM tools. We have one in LLDB’s repo that causes me nothing but problems, so I am also strongly opposed to that.

My worry is that there are already quite a few places that know where libFuzzer code is,
and I don’t control all of them.

Downstream clients will have to update. That is kinda how these things work; I can’t imagine re-pointing an SCM checkout being a huge burden.

And, finally, I really don’t get why we can do something in tools and can’t do the same in lib.
Or we simply don’t want to do it to keep things simple?

Not all functionality in CMake is order-independent; specifically, the detection of targets is not. To support what you’re trying to do, you are going to change behavior based on the presence of the clang target, which means the clang target must be added before your CMake is processed.

To support this our build system has strict ordering requirements such that things in lib cannot depend on things in tools. If you need to depend on clang, you need to not be in lib.
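The ordering point shows up in a tiny, self-contained CMakeLists.txt. Here a bodiless custom target named `clang` stands in for the real one, an assumption made only to keep the example standalone. When configured, it reports that the target is not visible before it is declared and is visible afterwards.

```cmake
cmake_minimum_required(VERSION 3.5)
project(TargetOrderDemo LANGUAGES NONE)

# Checked before the target exists: this branch reports "NOT visible",
# just as a directory under lib/ would see no clang target.
if(TARGET clang)
  message(STATUS "before declaration: clang target visible")
else()
  message(STATUS "before declaration: clang target NOT visible")
endif()

# Stand-in for the real clang target defined under tools/clang.
add_custom_target(clang)

# Checked after the declaration: now the target is found.
if(TARGET clang)
  message(STATUS "after declaration: clang target visible")
endif()
```

Configuring it (for example with `cmake -S . -B build` on CMake 3.13 or newer) prints the "NOT visible" line first and the "visible" line second.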

Also, generally speaking Fuzzer is a library under lib that also has nested tests, which is not how the lib directory is supposed to be structured. It never should have been allowed to be structured like that. If you want the tests next to the library, it is a tool or a runtime, but not a lib.

As I see it, there are two options to move forward with, and it really depends on how you intend to use the just-built clang.

(1) If you want to use just-built clang to build libFuzzer and its tests, it should be a runtime.
(2) If you want to use just-built clang to only build libFuzzer’s tests, it should be a tool.

I think that since it is a runtime library, it should be a runtime, and I expect it would mostly work to just copy the Fuzzer directory into llvm/runtimes.
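This rough sketch shows what building libFuzzer "as a runtime" means. It uses CMake's stock `ExternalProject` module rather than LLVM's own runtimes machinery (its `llvm_ExternalProject_Add` helper), which it does not reproduce. The libFuzzer sources are configured as a separate CMake project, compiled by the just-built clang once the `clang` target is built. The source path is hypothetical. The sketch also assumes that `LLVM_RUNTIME_OUTPUT_INTDIR` names the LLVM build directory holding the built executables and that a `clang++` driver is present there.

```cmake
include(ExternalProject)

# Sketch only, not LLVM's actual runtimes setup. Assumes:
#   - libFuzzer sources in a "libFuzzer" subdirectory (hypothetical path);
#   - LLVM_RUNTIME_OUTPUT_INTDIR is the LLVM build directory holding the
#     just-built clang and clang++ executables.
if(TARGET clang)
  ExternalProject_Add(libFuzzer-runtime
    SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/libFuzzer"
    # Configure libFuzzer as its own CMake project, compiled by the
    # just-built clang rather than the host compiler.
    CMAKE_ARGS
      -DCMAKE_C_COMPILER=${LLVM_RUNTIME_OUTPUT_INTDIR}/clang
      -DCMAKE_CXX_COMPILER=${LLVM_RUNTIME_OUTPUT_INTDIR}/clang++
    # Do not start configuring until clang itself has been built.
    DEPENDS clang
    # This sketch stops after the build step; nothing is installed.
    INSTALL_COMMAND "")
endif()
```

The design choice here is isolation. Because libFuzzer gets its own configure step, its compiler is fixed to the just-built clang regardless of the host compiler used for the rest of LLVM.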

-Chris

Thanks for the explanations! (it was worth asking)

I do want to build libFuzzer itself (and its tests) using the just-built clang. So, llvm/runtimes then.
I’d name the directory llvm/runtimes/libFuzzer, if possible (the old path was lib/Fuzzer, which is how the tool got its name, actually).
George, would you like to send the change for review?

--kcc