RFC: LTO should use -disable-llvm-verifier

The verifier takes ~5% of link time when using LTO. I think we
should add a `-disable-llvm-verifier` option to the LTO plugins, and
change the clang driver to pass the option through in release builds.
In asserts builds, the clang driver would not pass the option.

This would match the way the driver passes -disable-llvm-verifier to
-cc1.
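For concreteness, here is a minimal sketch of the driver-side change (hypothetical code, not the actual clang driver; the exact option spelling each LTO plugin would accept is part of what would need to be decided):

```cpp
// Hypothetical sketch only, not actual clang driver code. The idea: in a
// no-asserts (release) build of the compiler, forward the proposed flag to
// the LTO plugin; in an asserts build, add nothing, so verification stays on.
// How the flag reaches the plugin (-mllvm, -plugin-opt, ...) is illustrative.
#include "llvm/Option/ArgList.h"

static void addLTOVerifierFlags(llvm::opt::ArgStringList &CmdArgs) {
#ifdef NDEBUG
  // Mirrors what the driver already does when it constructs the -cc1 job.
  CmdArgs.push_back("-mllvm");
  CmdArgs.push_back("-disable-llvm-verifier");
#endif
}
```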

Everyone on board?

Why is it so slow?

Joerg

> The verifier takes ~5% of link time when using LTO.

> Why is it so slow?

I would guess it's because of the sheer size of post-LTO binaries.

That would explain a large absolute number, but not a large relative
number.

Joerg

For cc1 we disable it for IR generated immediately by clang. We don’t disable it for IR or bitcode inputs provided by the user.

So I’d be a bit hesitant to disable it for LTO, because that is an unrestricted input at LTO time.

(But this is hardly my wheelhouse. So grain of salt and all)

Random thoughts:

  • Could we implement a lightweight verifier?
  • The verifier could run in parallel in some cases (see the sketch below).
  • Could we have a mechanism to trust bitcode emitted by the same version of the LLVM BitWriter, like a signature?
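To make the parallel-verification bullet concrete, a minimal sketch, assuming each input module was parsed into its own LLVMContext so the verifier touches no shared state (verifyModule() is the existing entry point in llvm/IR/Verifier.h):

```cpp
// Minimal sketch, not actual LTO code: verify every input module on its own
// thread. Only safe when each Module lives in a separate LLVMContext.
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include <future>
#include <memory>
#include <vector>

static bool anyInputBroken(std::vector<std::unique_ptr<llvm::Module>> &Mods) {
  std::vector<std::future<bool>> Results;
  Results.reserve(Mods.size());
  for (auto &M : Mods)
    Results.push_back(std::async(std::launch::async, [&M] {
      // verifyModule() returns true when the module is broken.
      return llvm::verifyModule(*M);
    }));
  bool Broken = false;
  for (auto &R : Results)
    Broken |= R.get();
  return Broken;
}
```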

I think we should be strict about user inputs, too.

On Sun, Aug 30, 2015 at 9:59, David Blaikie via llvm-dev <llvm-dev@lists.llvm.org> wrote:

Since bitcode is well-structured, the Verifier might be doing more work than necessary on it.
Could we introduce a “Verifier for BitReader” and a “Verifier for IRParser” as input verifiers?

> The verifier takes ~5% of link time when using LTO. [...] Everyone on board?

I would prefer to keep it on by default; in the cc1 case we know that the
input was immediately produced by clang (so we have a pretty high
confidence that it is correct), which is not the case in LTO.

The verifier should be pretty much the cheapest of all passes. If it is
taking 5%, then that means that we can be running a max of 20 passes, which
seems off (especially considering that some passes should be *enormously*
more expensive, like codegen). Is 5% a number you measured in a profiler or
an empirical difference when running with/without -disable-llvm-verifier?
Do you have a breakdown of where that 5% is coming from? Is the number
consistent across different programs (e.g. of different sizes)?

-- Sean Silva

> The verifier should be pretty much the cheapest of all passes. If it is
> taking 5%, then that means that we can be running a max of 20 passes,

Not quite sure I understand this conclusion - not all passes take similar
execution time (& especially not on larger inputs where they scale
differently, etc), right?

Having it off by default makes sense to me. We just need an easy way of enabling it from the clang driver.

> Having it off by default makes sense to me. We just need an easy way of
> enabling it from the clang driver.

Not sure I follow? Generally LTO inputs are going to be "user provided" (in
the sense that they're not produced immediately prior by the same process -
or you'd have just produced a single Module in the first place, I would
imagine) so changing the default still seems problematic in the sense of
allowing 'unbounded' input without verification...

> Generally LTO inputs are going to be “user provided” [...] so changing the default still seems problematic in the sense of allowing ‘unbounded’ input without verification…

The common case is for the bitcode to be generated by a paired clang. Even when the input is a module compiled to bitcode by an older compiler, the Module itself is created by the bitcode reader.

Cheers, Rafael

Sure, but it is not uncommon to LTO with old bitcode. We all know it's
pretty easy to crash LLVM with bad bitcode or bad IR. These interfaces are
not thoroughly tested.

I think verifying the result of the bitcode reader by default during LTO is
probably the right thing for the foreseeable future. It's the only thing
that has any hope of telling the user something useful when things go wrong.

I'd like it if we spent a little effort understanding why it's slow before
flipping it off. Maybe the verifier is running multiple times, instead of
just once after deserialization. We shouldn't need that in release builds.

LTO runs the verifier three times:

1. On each input module after it's parsed from bitcode.
2. At the beginning of the optimization pipeline (post lib/Linker).
3. At the end of the optimization pipeline.

If we're worried about user input, then I agree we should still run it
at (1). But I don't think we need to run it at (2) and (3). Maybe we
agree on this?

Someone asked elsewhere in the thread about numbers: I had a look at a
CPU profile (for linking verify-uselistorder with debug info). For
this (not-necessarily-representative) sample, (1) takes 1.6% of ld64
runtime, and (2) and (3) combined take 3.2% of ld64 runtime. Total of
4.8%.

> If we're worried about user input, then I agree we should still run it
> at (1). But I don't think we need to run it at (2) and (3). Maybe we
> agree on this?

I agree. The last two in debug mode perhaps?

-eric

> > If we're worried about user input, then I agree we should still run it
> > at (1). But I don't think we need to run it at (2) and (3). Maybe we
> > agree on this?
>
> I agree. The last two in debug mode perhaps?

I figure it'd still be nice to be able to control it explicitly, so have
the clang driver pass -disable-llvm-verifier in no-asserts, but *always*
verify (1) regardless of that option. IOW, that makes sense to me :).
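A minimal sketch of that split, with assumed helper names (simplified; not the actual LTOCodeGenerator code):

```cpp
// Sketch only: (1) is unconditional because the input is user-provided;
// (2) and (3) honor -disable-llvm-verifier, which the clang driver would
// pass in no-asserts builds.
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"

// (1) Run on each module right after the bitcode reader produces it.
static void verifyLTOInput(llvm::Module &M) {
  if (llvm::verifyModule(M, &llvm::errs()))
    llvm::report_fatal_error("Broken module in LTO input");
}

// (2)/(3) Run before and after the optimization pipeline only when the
// verifier has not been disabled.
static void maybeVerify(llvm::Module &M, bool DisableVerify) {
  if (!DisableVerify && llvm::verifyModule(M, &llvm::errs()))
    llvm::report_fatal_error("Broken module produced during LTO");
}
```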

> If we're worried about user input, then I agree we should still run it
> at (1). But I don't think we need to run it at (2) and (3). Maybe we
> agree on this?

I would agree on that.

> I figure it'd still be nice to be able to control it explicitly, so have
> the clang driver pass -disable-llvm-verifier in no-asserts, but *always*
> verify (1) regardless of that option. IOW, that makes sense to me :).

Sure, something like that could be reasonable.

(Essentially, anything that parallels clang's behavior when given bitcode/IR
as direct input is, I think, the right model. The two should be consistent,
and perhaps there's still discussion to be had about exactly what the
consistent behavior is, but something like "sanitize inputs, assume
intermediate results are valid (if they aren't, it's bugs), and verify
intermediate results in +Asserts builds", maybe... *shrug*)

> If we're worried about user input, then I agree we should still run it
> at (1). But I don't think we need to run it at (2) and (3). Maybe we
> agree on this?

That sounds fine.

> Someone asked elsewhere in the thread about numbers: I had a look at a
> CPU profile (for linking verify-uselistorder with debug info). For
> this (not-necessarily-representative) sample, (1) takes 1.6% of ld64
> runtime, and (2) and (3) combined take 3.2% of ld64 runtime. Total of
> 4.8%.

Okay, this makes more sense where the 5% is coming from. Thanks for digging
in!

-- Sean Silva

Yep. This is where I was going :).

-eric