Disable certain llvm optimizations at clang frontend

Hi,

[resend with proper cfe-dev tag]

This is to seek some advice on how to disable certain LLVM
optimizations when compiling with clang for the BPF backend. For example,
we want to disable two passes, InstCombine and SimplifyCFG. I
would like some advice on the best way to do this.

Some Background on Why We want to do this
=========================================

Any BPF code needs to go through the kernel verifier
    https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c
before it can be executed in the kernel. The verifier tries to ensure the safety
of the program: no illegal memory accesses, etc. In certain cases, clang-generated
optimized code makes verification very hard; see the pending patches below, which
take another approach focused on the BPF backend and some core optimizations:
   https://reviews.llvm.org/D72787
   https://reviews.llvm.org/D82112
   https://reviews.llvm.org/D81859
To work around the issue, people currently have to write inline asm or tweak
their programs in very user-unfriendly ways to suppress certain
transformations, in particular InstCombine and SimplifyCFG.

Alternative Method in clang Frontend
====================================

The above approaches try to disable/undo/influence specific
optimizations. An alternative is to disable InstCombine and
SimplifyCFG altogether during the compilation process. This is a little
heavyweight, but it may still be fine, as it will ease the pain for
development. For most users, a small performance loss in exchange for a
development gain will probably be acceptable.

How can InstCombine/SimplifyCFG be disabled from the clang frontend? It looks
like function attributes and instruction metadata are already used to influence
optimizations, so the following approach might work:
    the BPF backend adds the following two attributes to all functions:
    "disable-instcombine" = "true"
    "disable-simplifyCFG" = "true"
InstCombine and SimplifyCFG then check these attributes at the start of each
function; if the attribute is set to true, the pass skips that function.
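
As a minimal sketch (not existing LLVM code; the attribute names are just the
proposed ones above), the check inside a pass could look roughly like this:

    #include "llvm/ADT/StringRef.h"
    #include "llvm/IR/Function.h"

    using namespace llvm;

    static bool skipRequestedByTarget(const Function &F, StringRef AttrName) {
      /* "disable-instcombine" / "disable-simplifyCFG" are proposed names,
         not attributes LLVM defines today */
      return F.getFnAttribute(AttrName).getValueAsString() == "true";
    }

    /* Inside InstCombine's per-function run() one would then early-return:
         if (skipRequestedByTarget(F, "disable-instcombine"))
           return PreservedAnalyses::all();
    */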

Do you think the above method could work? Any alternative suggestions
are also appreciated.

This is certainly an interesting problem. A few thoughts:

* A generalized mechanism to disable parts of the pass pipeline, or maybe specify a pass pipeline, using function attributes, may be of broad utility. That having been said, it's not clearly the right answer to this problem.

* If we're going to disable things, I would much prefer attributes with a semantic meaning. It's not like those passes are the only passes that might perform transformations that would be problematic for your backend.

* How difficult would it be for your backend to try harder to coerce the IR into the form accepted by the BPF verifier? The current compilation problem seems unsound, unfortunately, and I'm wondering how we might move the overall system to be better grounded.

-Hal


This is certainly an interesting problem. A few thoughts:

  * A generalized mechanism to disable parts of the pass pipeline, or
maybe specify a pass pipeline, using function attributes, may be of
broad utility. That having been said, it's not clearly the right answer
to this problem.

  * If we're going to disable things, I would much prefer attributes
with a semantic meaning. It's not like those passes are the only passes
that might perform transformations that would be problematic for your
backend.

Indeed, specifying attributes that express an intention is better than disabling
individual passes. Based on the intention, we can tailor the related passes,
disabling either the whole pass or only specific transformations within that pass.

Just to be more specific about the transformations we want to disable:
  (1). Undo a transformation in InstCombine/InstCombineAndOrXor.cpp
        (https://reviews.llvm.org/D72787)
        This transformation creates a new value "var.off" for the comparison but
        keeps using the original variable "var" later on. The kernel verifier then
        does not have a refined range for "var" at its use site, and this may cause
        verification failure. (A rough sketch of this rewrite follows after this list.)

        To generalize, BPF prefers that a single variable be refined and then used
        for each verification. A new variable can be created and used for further
        value-range refinement, but all later uses of the old variable should then
        be replaced with the new variable.
   (2). Prevent speculative code motion
          (https://reviews.llvm.org/D82112, https://reviews.llvm.org/D81859)
         If code in the original program may not execute under certain conditions,
         we would like that code not to execute in the final byte code under the
         same conditions, regardless of whether it is safe to execute or not.

    I guess we could have two attributes here:
       - "refine-value-range-with-original-var" (false by default, true for BPF)
       - "no-speculative-code-motion" (false by default, true for BPF)

  * How difficult would it be for your backend to try harder to coerce
the IR into the form accepted by the BPF verifier? The current
compilation problem seems unsound, unfortunately, and I'm wondering how
we might move the overall system to be better grounded.

The above two kinds of optimizations are totally okay for other
architectures; it is probably just BPF that has problems, due to its extra
verification phase before execution.

For the case "refine-value-range-with-original-var",
https://reviews.llvm.org/D72787,
we have moderate success. Essentially, it is an pattern-match undo to
the previous instcombine
optimizations. But I am thinking that preventing the transformation
from happening might be
a better approach. https://reviews.llvm.org/D70372 is an unsuccessful
attempt. I think function
attribute might be a better approach here than undoing the transformation.

For the case "no-speculative-code-motion", the BPF backend is going to implement
getUserCost() (https://reviews.llvm.org/D82112) and needs core
simplifyCFG change
as well (https://reviews.llvm.org/D81859).

All the above BPF and core changes resolve the test case I am seeing.
But my fear is that if, in the future, we find some other optimization that
needs to be undone or disabled, BPF will need to do pattern matching again, or
make a special arrangement with that optimization pass as in
https://reviews.llvm.org/D81859.
Having a generic attribute that describes the intention, which optimization
passes then consult, seems better in the long term.

Thoughts?

(Reply inline)

(+1 to all that, FWIW - though I'm far less informed here than Eli is,
just thought I'd throw my 2c in)

+1 to all of this. We need to have well-defined semantics at the IR level.

A lot of these problems seem to come from the fact that the verifier is verifying each value at the point of calculation instead of at the point each value contributes to an operation with side effects. If the verifier cannot be improved in this regard, it seems like you need to represent these side effects at the IR level, and perhaps other aspects of the state of the verifier. For example, we might essentially add the inaccessiblememonly attribute to condition calculations and other relevant intrinsics. Is the verifier worried about all arithmetic somehow, or only pointer arithmetic (from GEPs)?

-Hal

The BPF verifier is briefly described in the comments here:
  https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c#L38

It is a path-sensitive verifier. It will go through all possible paths
to ensure all memory accesses are safe.
The verifier is able to carry refined info across function calls.
A condition may refine a value range, e.g.,
   unsigned a = foo(); /* "a" is an unsigned int, could be any value 0 <= a <= UINT_MAX */
   if (a < 128) {
     /* verifier concludes the range of "a" is now [0, 127] */
     ... *(p + a) ... /* test whether *(p + [0, 127]) is legal or not */
   }

But the verifier is not smart enough to do backtracking yet, so if you have code
   unsigned a = foo();
   unsigned b = a + 10;
   if (b < 128) {
       ... *(p + a) ... /* b will be in [10, 127], but a is still [0, UINT_MAX] */
   }
the verifier may reject the above code, since (p + a) may be out of bounds
of the legally permitted memory range.

Why doesn't the verifier implement backtracking, then? There are a few reasons:
   - It is actually not very common. As long as the program does not have
control flow like the above, which triggers that particular InstCombine
transformation, the compiler does a good job of avoiding a second copy of the
same variable. But when it does happen, it causes a lot of frustration for
developers, as their code is perfectly okay and they often do not know how to
please the verifier.
   - The verifier is already very complex. Adding generic backtracking would add
a lot of complexity to it. More state may need to be kept during the
verification process, which will require more kernel memory, and the verifier
will become slower too.

The same goes for speculative code motion. Yes, the current verifier may
reject a pointer arithmetic operation (say pkt_ptr += UINT_MAX). But in reality,
that pkt_ptr += UINT_MAX may just be a speculatively hoisted computation, and
later on pkt_ptr = original_legal_pkt_ptr + 10 will assign the proper value
before any actual memory access. In this case the verifier is wrong to reject
the program.

Not all speculative code motion causes problems. Some scalar speculative code
motion is totally fine, and if the speculatively moved computation stays within
the pointer's valid range, it is fine too.
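
To illustrate the concern (a simplified sketch, not code from a real BPF
program), the kind of hoisting at issue looks roughly like this:

    extern void consume(unsigned char *p);

    void before_hoist(unsigned char *pkt, unsigned long off, int slow_path) {
      if (!slow_path) {
        unsigned char *p = pkt + off;  /* pointer formed only on the guarded path */
        consume(p);
      }
    }

    void after_hoist(unsigned char *pkt, unsigned long off, int slow_path) {
      unsigned char *p = pkt + off;    /* speculatively formed; off may be huge */
      if (!slow_path)
        consume(p);                    /* harmless for most targets, but the verifier
                                          sees an out-of-range pointer being formed */
    }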

My original proposal to disable certain optimizations is a big
hammer, just to get things working so developers can move on. I totally
agree that using a function attribute to disable specific
optimizations may happen to work for some cases, but it may not work
for all cases, since other passes may perform similar transformations.


(Reply inline)

From: cfe-dev <cfe-dev-bounces@lists.llvm.org> On Behalf Of Y Song via cfe-
dev
Sent: Friday, June 19, 2020 11:40 AM
To: Hal Finkel <hfinkel@anl.gov>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>; Clang Dev <cfe-
dev@lists.llvm.org>; Alexei Starovoitov <ast@kernel.org>
Subject: [EXT] Re: [cfe-dev] Disable certain llvm optimizations at clang
frontend

Just to be more specific about what transformations we want to disable:
    (1). Undo a transformation in InstCombine/InstCombineAndOrXor.cpp
          (https://reviews.llvm.org/D72787)
          This transformation created a new variable "var.off" for comparison but
          using the original variable "var" later on. The kernel verifier does not
          have a better range of "var" at its use place and this may cause
          verification failure.

          To generalize, BPF prefers the single variable is refined and used later
          for each verification. New variable can be created and used for further
          value range refinement, but all later usage of old variable should be
          replaced with the new variable.
     (2). Prevent speculative code motion
            (https://reviews.llvm.org/D82112, https://reviews.llvm.org/D81859)
           If the code in the original program may not execute under
certain conditions,
           we could like the code not to execute in the final byte code
under the same
           condition, regardless of whether it is safe to execute or not.

      I guess we could have two attributes here:
         - "refine-value-range-with-original-var" (false by default, true for BPF)
         - "no-speculative-code-motion" (false by default, true for BPF)

Looking at this, I'm worried that there's a bigger disconnect between the way LLVM reasons about IR, vs. what the BPF verifier can actually handle. It isn't reasonable to blindly disable code motion; that's chasing after the symptoms of the problem, not solving the underlying issue.

If I understand correctly, the underlying issue is that BPF has a notion of a "CFG guard": you can have a condition and a branch, and instructions inside the region guarded by that branch have different rules from instructions outside that region, based on that condition. Both clang and LLVM optimizations are completely

bpf verifier is briefly described in the comments here:
   https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c#L38

It is a path sensitive verifier. It will go through all possible paths
to ensure all memory accesses are safe.
The verifier is able to carry refined info across function calls.
The condition may refine a value range, e.g.,
    unsigned a = foo(); /* " a" is an unsigned int, could be any value
0 <= "a" <= UINT_MAX */
    if (a < 128) {
      /* varifier concludes "a" range now is [0, 127]. */
      ... * (p + a) ... /* test whether *(p + [0, 127]) is legal or not.
   }

But verifier is not smart enough to do backtracking now, so if you have code
    unsigned a = foo();
    b = a + 10;
    if (b < 128) {
        ... * (p + a) ... /* b will be [10, 127], but a is [0, UINT_MAX]. */
    }
The verifier may reject the above code as (p + a) may be out of bound
of legal permitted memory range.

Why verifier did not then implement backtracking? There are a few reasons:
    - This is actually not very common. As long as the program does not
have control flow like the above triggering a particular instcombine
transformation, the compiler actually did a good job avoiding the
above pattern to avoid a second copy of the same variable. But if this
actually happens, it often cause a lot of
frustration for developers as their codes are perfectly okay and they
often do not know how to please
the verifier.
    - verifier is already very complex. Adding generic backtracking
will add a lot of complexity to verifier.
More states may need to be kept during verification process and this
will require more kernel memory. The verifier will become slower too.

The same for speculative code motion. Yes, the current verifier may
reject a pointer arithmetic (say pkt_ptr += UINT_MAX). But in really,
this pkt_ptr += UINT_MAX may just a speculative move and later on
pkt_ptr = original_legal_pkt_ptr + 10 will assign proper value before
actual memory access. In this case, the verifier will be wrong to
reject the program.

Not all speculative code motion will cause problems. Some scalar
speculative code motion totally fine. Or if the speculative code
motion still within ptr valid range, it will be fine too.

My original proposal to disable certain optimizations are a big
hammer, just to get it work so the developer can move on. I totally
agree that using func attribute to disable certain specific
optimizations may happen to work for some cases, but it may not work
for all cases since other optimizations may perform optimizations.

Thanks, this is helpful. One question: Is the information used by the verifier available at compile time? It seems like what you want here is for the backend to iterate over the IR, perform the verification, and in cases where it would fail, push the relevant parts of the use-def chain down into the blocks with their eventual users.

-Hal

...

>>
>>> (Reply inline)
>>>
>>>> From: cfe-dev <cfe-dev-bounces@lists.llvm.org> On Behalf Of Y Song via cfe-
>>>> dev
>>>> Sent: Friday, June 19, 2020 11:40 AM
>>>> To: Hal Finkel <hfinkel@anl.gov>
>>>> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>; Clang Dev <cfe-
>>>> dev@lists.llvm.org>; Alexei Starovoitov <ast@kernel.org>
>>>> Subject: [EXT] Re: [cfe-dev] Disable certain llvm optimizations at clang
>>>> frontend
>>>>
>>>> Just to be more specific about what transformations we want to disable:
>>>> (1). Undo a transformation in InstCombine/InstCombineAndOrXor.cpp
>>>> (https://reviews.llvm.org/D72787)
>>>> This transformation created a new variable "var.off" for comparison but
>>>> using the original variable "var" later on. The kernel verifier does not
>>>> have a better range of "var" at its use place and this may cause
>>>> verification failure.
>>>>
>>>> To generalize, BPF prefers the single variable is refined and used later
>>>> for each verification. New variable can be created and used for further
>>>> value range refinement, but all later usage of old variable should be
>>>> replaced with the new variable.
>>>> (2). Prevent speculative code motion
>>>> (https://reviews.llvm.org/D82112, https://reviews.llvm.org/D81859)
>>>> If the code in the original program may not execute under
>>>> certain conditions,
>>>> we could like the code not to execute in the final byte code
>>>> under the same
>>>> condition, regardless of whether it is safe to execute or not.
>>>>
>>>> I guess we could have two attributes here:
>>>> - "refine-value-range-with-original-var" (false by default, true for BPF)
>>>> - "no-speculative-code-motion" (false by default, true for BPF)
>>> Looking at this, I'm worried that there's a bigger disconnect between the way LLVM reasons about IR, vs. what the BPF verifier can actually handle. It isn't reasonable to blindly disable code motion; that's chasing after the symptoms of the problem, not solving the underlying issue.
>>>
>>> If I understand correctly, the underlying issue is that BPF has a notion of a "CFG guard": you can have a condition and a branch, and instructions inside the region guarded by that branch have different rules from instructions outside that region, based on that condition. Both clang and LLVM optimizations are completely
> bpf verifier is briefly described in the comments here:
> https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c#L38
>
> It is a path sensitive verifier. It will go through all possible paths
> to ensure all memory accesses are safe.
> The verifier is able to carry refined info across function calls.
> The condition may refine a value range, e.g.,
> unsigned a = foo(); /* " a" is an unsigned int, could be any value
> 0 <= "a" <= UINT_MAX */
> if (a < 128) {
> /* varifier concludes "a" range now is [0, 127]. */
> ... * (p + a) ... /* test whether *(p + [0, 127]) is legal or not.
> }
>
> But verifier is not smart enough to do backtracking now, so if you have code
> unsigned a = foo();
> b = a + 10;
> if (b < 128) {
> ... * (p + a) ... /* b will be [10, 127], but a is [0, UINT_MAX]. */
> }
> The verifier may reject the above code as (p + a) may be out of bound
> of legal permitted memory range.
>
> Why verifier did not then implement backtracking? There are a few reasons:
> - This is actually not very common. As long as the program does not
> have control flow like the above triggering a particular instcombine
> transformation, the compiler actually did a good job avoiding the
> above pattern to avoid a second copy of the same variable. But if this
> actually happens, it often cause a lot of
> frustration for developers as their codes are perfectly okay and they
> often do not know how to please
> the verifier.
> - verifier is already very complex. Adding generic backtracking
> will add a lot of complexity to verifier.
> More states may need to be kept during verification process and this
> will require more kernel memory. The verifier will become slower too.
>
> The same for speculative code motion. Yes, the current verifier may
> reject a pointer arithmetic (say pkt_ptr += UINT_MAX). But in really,
> this pkt_ptr += UINT_MAX may just a speculative move and later on
> pkt_ptr = original_legal_pkt_ptr + 10 will assign proper value before
> actual memory access. In this case, the verifier will be wrong to
> reject the program.
>
> Not all speculative code motion will cause problems. Some scalar
> speculative code motion totally fine. Or if the speculative code
> motion still within ptr valid range, it will be fine too.
>
> My original proposal to disable certain optimizations are a big
> hammer, just to get it work so the developer can move on. I totally
> agree that using func attribute to disable certain specific
> optimizations may happen to work for some cases, but it may not work
> for all cases since other optimizations may perform optimizations.

Thanks, this is helpful. One question: Is the information used by the
verifier available at compile time? It seems like what you want here is

We have identified some compiler optimizations in InstCombine and
SimplifyCFG that may generate verifier-unfriendly code. Yes, we could do
IR analysis to identify these patterns; it may not catch 100% of them, but it
should be close.

Completely modelling what the verifier does in LLVM is complex. The verifier
interacts with kernel subsystems through helper functions, and those helper
functions have semantics that the verifier makes use of; bringing knowledge of
kernel helpers into LLVM does not sound like a good idea. The verifier also
implements a path-sensitive verification framework; replicating that in LLVM is
probably not a good idea either, since the verifier changes constantly and it
would be very hard to keep the two in sync. So the best option in LLVM is
probably to identify the known IR patterns for which verifier-unfriendly code
may be generated, and prevent them from being generated (preferred) or undo
them if they have already been generated.

for the backend to iterate over the IR, perform the verification, and in
cases where it would fail, push the relevant parts of the use-def chain
down into the blocks with their eventual users.

This is a good idea. The issue is that InstCombine and SimplifyCFG run
before any backend IR passes, so the backend would have to undo some of
these optimizations. For speculative code motion, the backend would probably
find that hard, since it does not even know which code was generated
speculatively.

Can we prevent these optimizations from happening in the first place?

In addition to my previous three ideas:
   - a function attribute to state the intention,
   - clang IR generation with certain variable barriers,
   - new TLI functions used by the BPF backend plus additional BPF backend
     IR passes,
here are two more:
(1). Is it possible to implement a generic IR pass (running before InstCombine
and SimplifyCFG) which takes some hints from the backend and modifies the IR
to favor the BPF verifier?
(2). Is it possible for the clang frontend to provide a hook for the target to
supply a pass manager, or at least to have a say in the pass manager, so that
the target can disable certain passes?
I realize this is a big-hammer approach. But considering that our
main goal here is to improve the development experience, and performance
is not a goal, maybe it is still acceptable?

From: Y Song <ys114321@gmail.com>
Sent: Monday, June 22, 2020 10:29 PM
To: Hal Finkel <hfinkel@anl.gov>
Cc: Eli Friedman <efriedma@quicinc.com>; cfe-dev <cfe-dev@lists.llvm.org>;
Andrii Nakryiko <andrii.nakryiko@gmail.com>; Alexei Starovoitov
<ast@kernel.org>
Subject: [EXT] Re: [cfe-dev] Disable certain llvm optimizations at clang
frontend

> Thanks, this is helpful. One question: Is the information used by the
> verifier available at compile time? It seems like what you want here is

We have identified some compiler optimizations in instcombine and
simplifyCFG may generate verifier unfriendly code. Yes, we could do
IR analysis to identify these patterns, may not be 100% but should be
close.

Completely modelling what verifier did in llvm is complex. Verifier
interacts with kernel subsystem with helper functions and these
helper functions having some semantics utilized with the verifier.
Bringing kernel helpers to llvm does not sound a good idea.
Verifier implements a path sensitive verification framework. To replicate
in llvm probably not a good idea as there are constant changes to
verifier and it is very hard to keep them sync. So here, in llvm the
best probably
to identify known IR patterns where verifier unfriendly code may be
generated and prevent them from generating (preferred) or undo them
if already generated.

We don't necessarily need to know the exact rules; some sort of conservative approximation is fine. We do need some model at the LLVM IR level, so we can document the rules transforms need to follow.

It would be preferable to have rules that don't require modifying the semantics of existing instructions. That's not a hard requirement, but the further you go from the usual LLVM model, the bigger the maintenance burden.

Two more ideas here:
(1). Is it possible to implement an generic IR pass (before InstCombine
and SimplifyCFG) which takes some hints from backend to modify IR
to favor BPF verifier?

See addCoroutinePassesToExtensionPoints.
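
Roughly, that function registers extra passes at fixed extension points of the
legacy PassManagerBuilder. A minimal sketch of the same mechanism, with a
hypothetical BPF preparation pass standing in for the real work:

    #include "llvm/IR/LegacyPassManager.h"
    #include "llvm/Transforms/IPO/PassManagerBuilder.h"

    using namespace llvm;

    static void addBPFEarlyPasses(PassManagerBuilder &Builder) {
      Builder.addExtension(
          PassManagerBuilder::EP_EarlyAsPossible,
          [](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
            /* PM.add(createBPFPrepareForVerifierPass()); -- hypothetical pass */
          });
    }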

(2). is it possible that clang frontend provides a hook for target to
provide a pass manager or at least having a say
to the pass manager? This way, target may disable certain passes?
I realize this is a big hammer approach. But Considering here that our
main goal is to improve development experience and performance
is not a goal, maybe it is still acceptable?

I don't think it makes sense to provide a way to disable a pass by name. My primary concern here is maintainability. Nobody working on target-independent code is going to understand why the passes are disabled. If someone changes the pass responsible for a transform, it isn't reasonable to say that involves updating random backends.

-Eli

Thanks for the suggestion.
Another way of thinking about it is to treat the kernel-consumed assembly
as the output of a C dialect,
just as coroutines are a language feature.
The C language used to write BPF programs verified by the kernel
is a subset of C. We use special builtins for CO-RE; it's a C language
extension to some degree.
Similarly, not every C construct is supported. We still cannot do indirect
calls, and a switch statement with a jump table is not supported either.
If we can tell clang that C with '-target bpf' is actually a somewhat different
C, that would fit nicely into the LangOpts model.

> From: Y Song <ys114321@gmail.com>
> Sent: Monday, June 22, 2020 10:29 PM
> To: Hal Finkel <hfinkel@anl.gov>
> Cc: Eli Friedman <efriedma@quicinc.com>; cfe-dev <cfe-dev@lists.llvm.org>;
> Andrii Nakryiko <andrii.nakryiko@gmail.com>; Alexei Starovoitov
> <ast@kernel.org>
> Subject: [EXT] Re: [cfe-dev] Disable certain llvm optimizations at clang
> frontend
>
> > Thanks, this is helpful. One question: Is the information used by the
> > verifier available at compile time? It seems like what you want here is
>
> We have identified some compiler optimizations in instcombine and
> simplifyCFG may generate verifier unfriendly code. Yes, we could do
> IR analysis to identify these patterns, may not be 100% but should be
> close.
>
> Completely modelling what verifier did in llvm is complex. Verifier
> interacts with kernel subsystem with helper functions and these
> helper functions having some semantics utilized with the verifier.
> Bringing kernel helpers to llvm does not sound a good idea.
> Verifier implements a path sensitive verification framework. To replicate
> in llvm probably not a good idea as there are constant changes to
> verifier and it is very hard to keep them sync. So here, in llvm the
> best probably
> to identify known IR patterns where verifier unfriendly code may be
> generated and prevent them from generating (preferred) or undo them
> if already generated.

We don't necessarily need to know the exact rules; some sort of conservative approximation is fine. We do need some model at the LLVM IR level, so we can document the rules transforms need to follow.

It would be preferable to have rules that don't require modifying the semantics of existing instructions. That's not a hard requirement, but the further you go from the usual LLVM model, the bigger the maintenance burden.

> Two more ideas here:
> (1). Is it possible to implement an generic IR pass (before InstCombine
> and SimplifyCFG) which takes some hints from backend to modify IR
> to favor BPF verifier?

See addCoroutinePassesToExtensionPoints.

Thanks for the pointer! Will take a look.

> (2). is it possible that clang frontend provides a hook for target to
> provide a pass manager or at least having a say
> to the pass manager? This way, target may disable certain passes?
> I realize this is a big hammer approach. But Considering here that our
> main goal is to improve development experience and performance
> is not a goal, maybe it is still acceptable?

I don't think it makes sense to provide a way to disable a pass by name. My primary concern here is maintainability. Nobody working on target-independent code is going to understand why the passes are disabled. If someone changes the pass responsible for a transform, it isn't reasonable to say that involves updating random backends.

Okay, understood. My original thinking was that the maintenance burden would
lie with the BPF backend: if anything in target-independent
code breaks something, it is the BPF backend's responsibility to adjust its
pass manager. Putting any burden on generic optimizations is surely not
reasonable.

I think your reference above to addCoroutinePassesToExtensionPoints
might provide a way for the BPF backend to tweak the IR before the major
optimizations run, so that verifier-friendly code can be generated.

(Reply inline)

From: cfe-dev <cfe-dev-bounces@lists.llvm.org> On Behalf Of Y Song via cfe-
dev
Sent: Friday, June 19, 2020 11:40 AM
To: Hal Finkel <hfinkel@anl.gov>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>; Clang Dev <cfe-
dev@lists.llvm.org>; Alexei Starovoitov <ast@kernel.org>
Subject: [EXT] Re: [cfe-dev] Disable certain llvm optimizations at clang
frontend

Just to be more specific about what transformations we want to disable:
     (1). Undo a transformation in InstCombine/InstCombineAndOrXor.cpp
           (https://reviews.llvm.org/D72787)
           This transformation created a new variable "var.off" for comparison but
           using the original variable "var" later on. The kernel verifier does not
           have a better range of "var" at its use place and this may cause
           verification failure.

           To generalize, BPF prefers the single variable is refined and used later
           for each verification. New variable can be created and used for further
           value range refinement, but all later usage of old variable should be
           replaced with the new variable.
      (2). Prevent speculative code motion
             (https://reviews.llvm.org/D82112, https://reviews.llvm.org/D81859)
            If the code in the original program may not execute under
certain conditions,
            we could like the code not to execute in the final byte code
under the same
            condition, regardless of whether it is safe to execute or not.

       I guess we could have two attributes here:
          - "refine-value-range-with-original-var" (false by default, true for BPF)
          - "no-speculative-code-motion" (false by default, true for BPF)

Looking at this, I'm worried that there's a bigger disconnect between the way LLVM reasons about IR, vs. what the BPF verifier can actually handle. It isn't reasonable to blindly disable code motion; that's chasing after the symptoms of the problem, not solving the underlying issue.

If I understand correctly, the underlying issue is that BPF has a notion of a "CFG guard": you can have a condition and a branch, and instructions inside the region guarded by that branch have different rules from instructions outside that region, based on that condition. Both clang and LLVM optimizations are completely

bpf verifier is briefly described in the comments here:
    https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c#L38

It is a path sensitive verifier. It will go through all possible paths
to ensure all memory accesses are safe.
The verifier is able to carry refined info across function calls.
The condition may refine a value range, e.g.,
     unsigned a = foo(); /* " a" is an unsigned int, could be any value
0 <= "a" <= UINT_MAX */
     if (a < 128) {
       /* varifier concludes "a" range now is [0, 127]. */
       ... * (p + a) ... /* test whether *(p + [0, 127]) is legal or not.
    }

But verifier is not smart enough to do backtracking now, so if you have code
     unsigned a = foo();
     b = a + 10;
     if (b < 128) {
         ... * (p + a) ... /* b will be [10, 127], but a is [0, UINT_MAX]. */
     }
The verifier may reject the above code as (p + a) may be out of bound
of legal permitted memory range.

Why verifier did not then implement backtracking? There are a few reasons:
     - This is actually not very common. As long as the program does not
have control flow like the above triggering a particular instcombine
transformation, the compiler actually did a good job avoiding the
above pattern to avoid a second copy of the same variable. But if this
actually happens, it often cause a lot of
frustration for developers as their codes are perfectly okay and they
often do not know how to please
the verifier.
     - verifier is already very complex. Adding generic backtracking
will add a lot of complexity to verifier.
More states may need to be kept during verification process and this
will require more kernel memory. The verifier will become slower too.

The same for speculative code motion. Yes, the current verifier may
reject a pointer arithmetic (say pkt_ptr += UINT_MAX). But in really,
this pkt_ptr += UINT_MAX may just a speculative move and later on
pkt_ptr = original_legal_pkt_ptr + 10 will assign proper value before
actual memory access. In this case, the verifier will be wrong to
reject the program.

Not all speculative code motion will cause problems. Some scalar
speculative code motion totally fine. Or if the speculative code
motion still within ptr valid range, it will be fine too.

My original proposal to disable certain optimizations are a big
hammer, just to get it work so the developer can move on. I totally
agree that using func attribute to disable certain specific
optimizations may happen to work for some cases, but it may not work
for all cases since other optimizations may perform optimizations.

Thanks, this is helpful. One question: Is the information used by the
verifier available at compile time? It seems like what you want here is

We have identified some compiler optimizations in instcombine and
simplifyCFG may generate verifier unfriendly code. Yes, we could do
IR analysis to identify these patterns, may not be 100% but should be
close.

Completely modelling what verifier did in llvm is complex. Verifier
interacts with kernel subsystem with helper functions and these
helper functions having some semantics utilized with the verifier.
Bringing kernel helpers to llvm does not sound a good idea.
Verifier implements a path sensitive verification framework. To replicate
in llvm probably not a good idea as there are constant changes to
verifier and it is very hard to keep them sync. So here, in llvm the
best probably
to identify known IR patterns where verifier unfriendly code may be
generated and prevent them from generating (preferred) or undo them
if already generated.

Honestly, I don't think that you're thinking about this in the most-useful way. Obviously, we strive for code reuse and reduced duplication. However, imagine if this were not a VM with a software-programmed loader, but a hardware architecture. The verifier would be part of the hardware, and its semantics would be part of the hardware ISA. In this case, updates to the verifier would, without particular debate, necessitate changes to the compiler. When the hardware is updated, the compiler needs to be updated. Keeping the two in sync is just part of the normal compiler-development process. Will this be difficult? Maybe. But, as Eli mentioned, you can use conservative approximations where modeling the details doesn't matter too much. Regardless, at a high level, yes, you should duplicate the relevant parts of the verifier logic in the backend. You can then use that to move code that is correct only assuming speculation is side-effect free (consistent with LLVM's semantics) into the blocks with the eventual users (which is required by the verifier).

The idea that you're going to disable particular optimizations to fix this problem is not robust or sound. Any number of other transformations could introduce similar issues. Are you going to constantly audit everything? The input source-language code could have the same problem. You can, of course, say that the source-language inputs are not really C/C++ code, but some similar-looking language with different semantics, but frankly, you'll be fighting an uphill battle the whole way. I understand that you already do this to some extent (e.g., by disallowing unbounded loops), but these semantics are far more pervasive (it seems to me, anyway, as qualitatively different). As you can probably tell, I really don't recommend trying to go down this path.

Thanks again,

Hal

>>>>> (Reply inline)
>>>>>
>>>>>> From: cfe-dev <cfe-dev-bounces@lists.llvm.org> On Behalf Of Y Song via cfe-
>>>>>> dev
>>>>>> Sent: Friday, June 19, 2020 11:40 AM
>>>>>> To: Hal Finkel <hfinkel@anl.gov>
>>>>>> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>; Clang Dev <cfe-
>>>>>> dev@lists.llvm.org>; Alexei Starovoitov <ast@kernel.org>
>>>>>> Subject: [EXT] Re: [cfe-dev] Disable certain llvm optimizations at clang
>>>>>> frontend
>>>>>>
>>>>>> Just to be more specific about what transformations we want to disable:
>>>>>> (1). Undo a transformation in InstCombine/InstCombineAndOrXor.cpp
>>>>>> (https://reviews.llvm.org/D72787)
>>>>>> This transformation created a new variable "var.off" for comparison but
>>>>>> using the original variable "var" later on. The kernel verifier does not
>>>>>> have a better range of "var" at its use place and this may cause
>>>>>> verification failure.
>>>>>>
>>>>>> To generalize, BPF prefers the single variable is refined and used later
>>>>>> for each verification. New variable can be created and used for further
>>>>>> value range refinement, but all later usage of old variable should be
>>>>>> replaced with the new variable.
>>>>>> (2). Prevent speculative code motion
>>>>>> (https://reviews.llvm.org/D82112, https://reviews.llvm.org/D81859)
>>>>>> If the code in the original program may not execute under
>>>>>> certain conditions,
>>>>>> we could like the code not to execute in the final byte code
>>>>>> under the same
>>>>>> condition, regardless of whether it is safe to execute or not.
>>>>>>
>>>>>> I guess we could have two attributes here:
>>>>>> - "refine-value-range-with-original-var" (false by default, true for BPF)
>>>>>> - "no-speculative-code-motion" (false by default, true for BPF)
>>>>> Looking at this, I'm worried that there's a bigger disconnect between the way LLVM reasons about IR, vs. what the BPF verifier can actually handle. It isn't reasonable to blindly disable code motion; that's chasing after the symptoms of the problem, not solving the underlying issue.
>>>>>
>>>>> If I understand correctly, the underlying issue is that BPF has a notion of a "CFG guard": you can have a condition and a branch, and instructions inside the region guarded by that branch have different rules from instructions outside that region, based on that condition. Both clang and LLVM optimizations are completely
>>> bpf verifier is briefly described in the comments here:
>>> https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c#L38
>>>
>>> It is a path sensitive verifier. It will go through all possible paths
>>> to ensure all memory accesses are safe.
>>> The verifier is able to carry refined info across function calls.
>>> The condition may refine a value range, e.g.,
>>> unsigned a = foo(); /* " a" is an unsigned int, could be any value
>>> 0 <= "a" <= UINT_MAX */
>>> if (a < 128) {
>>> /* varifier concludes "a" range now is [0, 127]. */
>>> ... * (p + a) ... /* test whether *(p + [0, 127]) is legal or not.
>>> }
>>>
>>> But verifier is not smart enough to do backtracking now, so if you have code
>>> unsigned a = foo();
>>> b = a + 10;
>>> if (b < 128) {
>>> ... * (p + a) ... /* b will be [10, 127], but a is [0, UINT_MAX]. */
>>> }
>>> The verifier may reject the above code as (p + a) may be out of bound
>>> of legal permitted memory range.
>>>
>>> Why verifier did not then implement backtracking? There are a few reasons:
>>> - This is actually not very common. As long as the program does not
>>> have control flow like the above triggering a particular instcombine
>>> transformation, the compiler actually did a good job avoiding the
>>> above pattern to avoid a second copy of the same variable. But if this
>>> actually happens, it often cause a lot of
>>> frustration for developers as their codes are perfectly okay and they
>>> often do not know how to please
>>> the verifier.
>>> - verifier is already very complex. Adding generic backtracking
>>> will add a lot of complexity to verifier.
>>> More states may need to be kept during verification process and this
>>> will require more kernel memory. The verifier will become slower too.
>>>
>>> The same for speculative code motion. Yes, the current verifier may
>>> reject a pointer arithmetic (say pkt_ptr += UINT_MAX). But in really,
>>> this pkt_ptr += UINT_MAX may just a speculative move and later on
>>> pkt_ptr = original_legal_pkt_ptr + 10 will assign proper value before
>>> actual memory access. In this case, the verifier will be wrong to
>>> reject the program.
>>>
>>> Not all speculative code motion will cause problems. Some scalar
>>> speculative code motion totally fine. Or if the speculative code
>>> motion still within ptr valid range, it will be fine too.
>>>
>>> My original proposal to disable certain optimizations are a big
>>> hammer, just to get it work so the developer can move on. I totally
>>> agree that using func attribute to disable certain specific
>>> optimizations may happen to work for some cases, but it may not work
>>> for all cases since other optimizations may perform optimizations.
>>
>> Thanks, this is helpful. One question: Is the information used by the
>> verifier available at compile time? It seems like what you want here is
> We have identified some compiler optimizations in instcombine and
> simplifyCFG may generate verifier unfriendly code. Yes, we could do
> IR analysis to identify these patterns, may not be 100% but should be
> close.
>
> Completely modelling what verifier did in llvm is complex. Verifier
> interacts with kernel subsystem with helper functions and these
> helper functions having some semantics utilized with the verifier.
> Bringing kernel helpers to llvm does not sound a good idea.
> Verifier implements a path sensitive verification framework. To replicate
> in llvm probably not a good idea as there are constant changes to
> verifier and it is very hard to keep them sync. So here, in llvm the
> best probably
> to identify known IR patterns where verifier unfriendly code may be
> generated and prevent them from generating (preferred) or undo them
> if already generated.

Honestly, I don't think that you're thinking about this in the
most-useful way. Obviously, we strive for code reuse and reduced
duplication. However, imagine if this were, not a VM with a
software-programmed loader, but a hardware architecture. The verifier
would be part of the hardware, and it's semantics would be part of the
hardware ISA. In this case, updates to the verifier would, without
particular debate, necessitate changes to the compiler. When the
hardware is updated, the compiler needs to be updated. Keeping the two
in sync is just part of the normal compiler-development process. Will
this be difficult? Maybe. But, as Eli mentioned, you can use
conservative approximations where modeling the details doesn't matter
too much. Regardless, at a high level, yes, you should duplicate the
relevant parts of the verifier logic in the backend. You can then use
that to move code that is correct only assuming speculation is
side-effect free (consistent with LLVM's semantics) into the blocks with
the eventual users (which is required by the verifier).

The idea that you're doing to disable particular optimizations to fix
this problem is not robust or sound. Any number of other transformations
could introduce similar issues. Are you going to constantly audit
everything? The input source-language code could have the same problem.
You can, of course, say that the source-language inputs are not really
C/C++ code, but some similar-looking language with different semantics,
but frankly, you'll be fighting an uphill battle the whole way. I
understand that you already do this to some extent (e.g., by disallowing
unbounded loops), but these semantics are far more pervasive (it seems
to me, anyway, as qualitatively different). As you can probably tell, I
really don't recommend trying to go down this path.

I agree that disabling optimizations is not nice; it is just a big-hammer
approach, and it may stop working if an optimization changes. What I do not
want to do is implement the whole verifier logic in LLVM. But as you and Eli
mentioned, we can implement just a portion of the verifier logic, focusing
solely on what the verifier does not like.

One of my previous suggestions was to implement some verifier logic in the
clang code generation pass. I am not 100% sure how easy that is; hence I
proposed a target-influenced PassManager in clang so the BPF target can have
a say.

But later Eli pointed out that we could use the
addCoroutinePassesToExtensionPoints() approach to add an early IR pass that
encodes the verifier constraints into the IR, so downstream optimizations can
stay as they are. This is effectively very similar to my clang
target-influenced PassManager idea, and it is already implemented in LLVM.
I think this is a sound approach and I will experiment with it.

Thanks for all your involved discussion of this topic!

(Reply inline)

From: cfe-dev <cfe-dev-bounces@lists.llvm.org> On Behalf Of Y Song via cfe-
dev
Sent: Friday, June 19, 2020 11:40 AM
To: Hal Finkel <hfinkel@anl.gov>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>; Clang Dev <cfe-
dev@lists.llvm.org>; Alexei Starovoitov <ast@kernel.org>
Subject: [EXT] Re: [cfe-dev] Disable certain llvm optimizations at clang
frontend

Just to be more specific about what transformations we want to disable:
      (1). Undo a transformation in InstCombine/InstCombineAndOrXor.cpp
            (https://reviews.llvm.org/D72787)
            This transformation created a new variable "var.off" for comparison but
            using the original variable "var" later on. The kernel verifier does not
            have a better range of "var" at its use place and this may cause
            verification failure.

            To generalize, BPF prefers the single variable is refined and used later
            for each verification. New variable can be created and used for further
            value range refinement, but all later usage of old variable should be
            replaced with the new variable.
       (2). Prevent speculative code motion
              (https://reviews.llvm.org/D82112, https://reviews.llvm.org/D81859)
             If the code in the original program may not execute under
certain conditions,
             we could like the code not to execute in the final byte code
under the same
             condition, regardless of whether it is safe to execute or not.

        I guess we could have two attributes here:
           - "refine-value-range-with-original-var" (false by default, true for BPF)
           - "no-speculative-code-motion" (false by default, true for BPF)

Looking at this, I'm worried that there's a bigger disconnect between the way LLVM reasons about IR, vs. what the BPF verifier can actually handle. It isn't reasonable to blindly disable code motion; that's chasing after the symptoms of the problem, not solving the underlying issue.

If I understand correctly, the underlying issue is that BPF has a notion of a "CFG guard": you can have a condition and a branch, and instructions inside the region guarded by that branch have different rules from instructions outside that region, based on that condition. Both clang and LLVM optimizations are completely

bpf verifier is briefly described in the comments here:
     https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c#L38

It is a path sensitive verifier. It will go through all possible paths
to ensure all memory accesses are safe.
The verifier is able to carry refined info across function calls.
The condition may refine a value range, e.g.,
      unsigned a = foo(); /* " a" is an unsigned int, could be any value
0 <= "a" <= UINT_MAX */
      if (a < 128) {
        /* varifier concludes "a" range now is [0, 127]. */
        ... * (p + a) ... /* test whether *(p + [0, 127]) is legal or not.
     }

But verifier is not smart enough to do backtracking now, so if you have code
      unsigned a = foo();
      b = a + 10;
      if (b < 128) {
          ... * (p + a) ... /* b will be [10, 127], but a is [0, UINT_MAX]. */
      }
The verifier may reject the above code as (p + a) may be out of bound
of legal permitted memory range.

Why verifier did not then implement backtracking? There are a few reasons:
      - This is actually not very common. As long as the program does not
have control flow like the above triggering a particular instcombine
transformation, the compiler actually did a good job avoiding the
above pattern to avoid a second copy of the same variable. But if this
actually happens, it often cause a lot of
frustration for developers as their codes are perfectly okay and they
often do not know how to please
the verifier.
      - verifier is already very complex. Adding generic backtracking
will add a lot of complexity to verifier.
More states may need to be kept during verification process and this
will require more kernel memory. The verifier will become slower too.

The same for speculative code motion. Yes, the current verifier may
reject a pointer arithmetic (say pkt_ptr += UINT_MAX). But in really,
this pkt_ptr += UINT_MAX may just a speculative move and later on
pkt_ptr = original_legal_pkt_ptr + 10 will assign proper value before
actual memory access. In this case, the verifier will be wrong to
reject the program.

Not all speculative code motion will cause problems. Some scalar
speculative code motion totally fine. Or if the speculative code
motion still within ptr valid range, it will be fine too.

My original proposal to disable certain optimizations are a big
hammer, just to get it work so the developer can move on. I totally
agree that using func attribute to disable certain specific
optimizations may happen to work for some cases, but it may not work
for all cases since other optimizations may perform optimizations.

Thanks, this is helpful. One question: Is the information used by the
verifier available at compile time? It seems like what you want here is

We have identified some compiler optimizations in instcombine and
simplifyCFG may generate verifier unfriendly code. Yes, we could do
IR analysis to identify these patterns, may not be 100% but should be
close.

Completely modelling what verifier did in llvm is complex. Verifier
interacts with kernel subsystem with helper functions and these
helper functions having some semantics utilized with the verifier.
Bringing kernel helpers to llvm does not sound a good idea.
Verifier implements a path sensitive verification framework. To replicate
in llvm probably not a good idea as there are constant changes to
verifier and it is very hard to keep them sync. So here, in llvm the
best probably
to identify known IR patterns where verifier unfriendly code may be
generated and prevent them from generating (preferred) or undo them
if already generated.

Honestly, I don't think that you're thinking about this in the
most-useful way. Obviously, we strive for code reuse and reduced
duplication. However, imagine if this were, not a VM with a
software-programmed loader, but a hardware architecture. The verifier
would be part of the hardware, and it's semantics would be part of the
hardware ISA. In this case, updates to the verifier would, without
particular debate, necessitate changes to the compiler. When the
hardware is updated, the compiler needs to be updated. Keeping the two
in sync is just part of the normal compiler-development process. Will
this be difficult? Maybe. But, as Eli mentioned, you can use
conservative approximations where modeling the details doesn't matter
too much. Regardless, at a high level, yes, you should duplicate the
relevant parts of the verifier logic in the backend. You can then use
that to move code that is correct only assuming speculation is
side-effect free (consistent with LLVM's semantics) into the blocks with
the eventual users (which is required by the verifier).

The idea that you're doing to disable particular optimizations to fix
this problem is not robust or sound. Any number of other transformations
could introduce similar issues. Are you going to constantly audit
everything? The input source-language code could have the same problem.
You can, of course, say that the source-language inputs are not really
C/C++ code, but some similar-looking language with different semantics,
but frankly, you'll be fighting an uphill battle the whole way. I
understand that you already do this to some extent (e.g., by disallowing
unbounded loops), but these semantics are far more pervasive (it seems
to me, anyway, as qualitatively different). As you can probably tell, I
really don't recommend trying to go down this path.

I agree that disabling optimization is not nice. It is just a big
hammer approach
and it may not work if optimization is changed. What I do not want to do
is to implement the whole verifier logic in LLVM. But as you and Eli mentioned,
we can just implement a portion of verifier logic, focusing solely on what
verifier does not like.

One of my previous suggestions is to implement some verifier logic in clang
code generation pass. I am not 100% sure how easy it is. Hence I am
proposing a target influenced PassManager in clang so BPF target can have
a say.

But later Eli pointed out that we could use
addCoroutinePassesToExtensionPoints()
approach to have an early IR pass to encode verifier logic to IR so downstream
optimization can stay as is. This effectively very similar to my clang
target influenced
PassManager, already implemented in llvm.
I think this is a sound approach and I will experiment with it.

I don't understand what you're planning to try. How would the pass encode the verifier constraints in the IR?

Backends, as you're probably aware, commonly add IR level passes that run prior to instruction selection (BPFPassConfig::addIRPasses already does so).
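
For reference, the usual pattern looks roughly like this (sketch only;
the pass factory name is a placeholder, not the actual contents of the
BPF backend):

    #include "llvm/CodeGen/TargetPassConfig.h"
    #include "llvm/Pass.h"

    using namespace llvm;

    // Placeholder factory for a backend-specific IR pass.
    FunctionPass *createBPFFixupVerifierPatternsPass();

    namespace {
    // A target's PassConfig schedules IR-level passes to run before the
    // common IR passes and instruction selection.
    class SketchPassConfig : public TargetPassConfig {
    public:
      SketchPassConfig(LLVMTargetMachine &TM, PassManagerBase &PM)
          : TargetPassConfig(TM, PM) {}

      void addIRPasses() override {
        addPass(createBPFFixupVerifierPatternsPass()); // backend IR pass
        TargetPassConfig::addIRPasses();               // common IR passes
      }
    };
    } // namespace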

-Hal

>>
>>>>>>> (Reply inline)
>>>>>>>
>>>>>>>> Just to be more specific about what transformations we want to disable:
>>>>>>>> (1). Undo a transformation in InstCombine/InstCombineAndOrXor.cpp
>>>>>>>> (https://reviews.llvm.org/D72787)
>>>>>>>> This transformation created a new variable "var.off" for comparison but
>>>>>>>> using the original variable "var" later on. The kernel verifier does not
>>>>>>>> have a better range of "var" at its use place and this may cause
>>>>>>>> verification failure.
>>>>>>>>
>>>>>>>> To generalize, BPF prefers the single variable is refined and used later
>>>>>>>> for each verification. New variable can be created and used for further
>>>>>>>> value range refinement, but all later usage of old variable should be
>>>>>>>> replaced with the new variable.
>>>>>>>> (2). Prevent speculative code motion
>>>>>>>> (https://reviews.llvm.org/D82112, https://reviews.llvm.org/D81859)
>>>>>>>> If the code in the original program may not execute under
>>>>>>>> certain conditions,
>>>>>>>> we could like the code not to execute in the final byte code
>>>>>>>> under the same
>>>>>>>> condition, regardless of whether it is safe to execute or not.
>>>>>>>>
>>>>>>>> I guess we could have two attributes here:
>>>>>>>> - "refine-value-range-with-original-var" (false by default, true for BPF)
>>>>>>>> - "no-speculative-code-motion" (false by default, true for BPF)
>>>>>>> Looking at this, I'm worried that there's a bigger disconnect between the way LLVM reasons about IR, vs. what the BPF verifier can actually handle. It isn't reasonable to blindly disable code motion; that's chasing after the symptoms of the problem, not solving the underlying issue.
>>>>>>>
>>>>>>> If I understand correctly, the underlying issue is that BPF has a notion of a "CFG guard": you can have a condition and a branch, and instructions inside the region guarded by that branch have different rules from instructions outside that region, based on that condition. Both clang and LLVM optimizations are completely
>>>>> bpf verifier is briefly described in the comments here:
>>>>> https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c#L38
>>>>>
>>>>> It is a path sensitive verifier. It will go through all possible paths
>>>>> to ensure all memory accesses are safe.
>>>>> The verifier is able to carry refined info across function calls.
>>>>> The condition may refine a value range, e.g.,
>>>>> unsigned a = foo(); /* " a" is an unsigned int, could be any value
>>>>> 0 <= "a" <= UINT_MAX */
>>>>> if (a < 128) {
>>>>> /* varifier concludes "a" range now is [0, 127]. */
>>>>> ... * (p + a) ... /* test whether *(p + [0, 127]) is legal or not.
>>>>> }
>>>>>
>>>>> But verifier is not smart enough to do backtracking now, so if you have code
>>>>> unsigned a = foo();
>>>>> b = a + 10;
>>>>> if (b < 128) {
>>>>> ... * (p + a) ... /* b will be [10, 127], but a is [0, UINT_MAX]. */
>>>>> }
>>>>> The verifier may reject the above code as (p + a) may be out of bound
>>>>> of legal permitted memory range.
>>>>>
>>>>> Why verifier did not then implement backtracking? There are a few reasons:
>>>>> - This is actually not very common. As long as the program does not
>>>>> have control flow like the above triggering a particular instcombine
>>>>> transformation, the compiler actually did a good job avoiding the
>>>>> above pattern to avoid a second copy of the same variable. But if this
>>>>> actually happens, it often cause a lot of
>>>>> frustration for developers as their codes are perfectly okay and they
>>>>> often do not know how to please
>>>>> the verifier.
>>>>> - verifier is already very complex. Adding generic backtracking
>>>>> will add a lot of complexity to verifier.
>>>>> More states may need to be kept during verification process and this
>>>>> will require more kernel memory. The verifier will become slower too.
>>>>>
>>>>> The same for speculative code motion. Yes, the current verifier may
>>>>> reject a pointer arithmetic (say pkt_ptr += UINT_MAX). But in really,
>>>>> this pkt_ptr += UINT_MAX may just a speculative move and later on
>>>>> pkt_ptr = original_legal_pkt_ptr + 10 will assign proper value before
>>>>> actual memory access. In this case, the verifier will be wrong to
>>>>> reject the program.
>>>>>
>>>>> Not all speculative code motion will cause problems. Some scalar
>>>>> speculative code motion totally fine. Or if the speculative code
>>>>> motion still within ptr valid range, it will be fine too.
>>>>>
>>>>> My original proposal to disable certain optimizations are a big
>>>>> hammer, just to get it work so the developer can move on. I totally
>>>>> agree that using func attribute to disable certain specific
>>>>> optimizations may happen to work for some cases, but it may not work
>>>>> for all cases since other optimizations may perform optimizations.
>>>> Thanks, this is helpful. One question: Is the information used by the
>>>> verifier available at compile time? It seems like what you want here is
>>> We have identified some compiler optimizations in instcombine and
>>> simplifyCFG may generate verifier unfriendly code. Yes, we could do
>>> IR analysis to identify these patterns, may not be 100% but should be
>>> close.
>>>
>>> Completely modelling what verifier did in llvm is complex. Verifier
>>> interacts with kernel subsystem with helper functions and these
>>> helper functions having some semantics utilized with the verifier.
>>> Bringing kernel helpers to llvm does not sound a good idea.
>>> Verifier implements a path sensitive verification framework. To replicate
>>> in llvm probably not a good idea as there are constant changes to
>>> verifier and it is very hard to keep them sync. So here, in llvm the
>>> best probably
>>> to identify known IR patterns where verifier unfriendly code may be
>>> generated and prevent them from generating (preferred) or undo them
>>> if already generated.
>>
>> Honestly, I don't think that you're thinking about this in the
>> most-useful way. Obviously, we strive for code reuse and reduced
>> duplication. However, imagine if this were, not a VM with a
>> software-programmed loader, but a hardware architecture. The verifier
>> would be part of the hardware, and it's semantics would be part of the
>> hardware ISA. In this case, updates to the verifier would, without
>> particular debate, necessitate changes to the compiler. When the
>> hardware is updated, the compiler needs to be updated. Keeping the two
>> in sync is just part of the normal compiler-development process. Will
>> this be difficult? Maybe. But, as Eli mentioned, you can use
>> conservative approximations where modeling the details doesn't matter
>> too much. Regardless, at a high level, yes, you should duplicate the
>> relevant parts of the verifier logic in the backend. You can then use
>> that to move code that is correct only assuming speculation is
>> side-effect free (consistent with LLVM's semantics) into the blocks with
>> the eventual users (which is required by the verifier).
>>
>> The idea that you're doing to disable particular optimizations to fix
>> this problem is not robust or sound. Any number of other transformations
>> could introduce similar issues. Are you going to constantly audit
>> everything? The input source-language code could have the same problem.
>> You can, of course, say that the source-language inputs are not really
>> C/C++ code, but some similar-looking language with different semantics,
>> but frankly, you'll be fighting an uphill battle the whole way. I
>> understand that you already do this to some extent (e.g., by disallowing
>> unbounded loops), but these semantics are far more pervasive (it seems
>> to me, anyway, as qualitatively different). As you can probably tell, I
>> really don't recommend trying to go down this path.
> I agree that disabling optimization is not nice. It is just a big
> hammer approach
> and it may not work if optimization is changed. What I do not want to do
> is to implement the whole verifier logic in LLVM. But as you and Eli mentioned,
> we can just implement a portion of verifier logic, focusing solely on what
> verifier does not like.
>
> One of my previous suggestions is to implement some verifier logic in clang
> code generation pass. I am not 100% sure how easy it is. Hence I am
> proposing a target influenced PassManager in clang so BPF target can have
> a say.
>
> But later Eli pointed out that we could use
> addCoroutinePassesToExtensionPoints()
> approach to have an early IR pass to encode verifier logic to IR so downstream
> optimization can stay as is. This effectively very similar to my clang
> target influenced
> PassManager, already implemented in llvm.
> I think this is a sound approach and I will experiment with it.

I don't understand what you're planning to try. How would the pass
encode the verifier constraints in the IR?

The following two examples, which I described in one of my previous emails,
show what I want to do in the IR transformation:

What we really want is to "serialize" variable uses to avoid backtracking
and speculative code motion. The following is a specific example:
    a = bar();
    if (a >= 0) {
        if (a <= 16) {
            ...
        }
    }
We would like to generate code like:
// var_barrier(a) both defines and uses "a", so the compiler can neither
// merge the two conditions nor move code across the barrier.
#define var_barrier(a) asm volatile ("" : "=r"(a) : "0"(a))
    a = bar();
    var_barrier(a);
    if (a >= 0) {
        var_barrier(a);
        if (a <= 16) {
            ...
        }
    }
The above code will effectively disable the control-flow optimization and
generate verifier-friendly code.

For the speculative code motion part, given
    a = bar();
    if (a < 10) {
        ... *(p + a) ...
    }
we can do
    a = bar();
    var_barrier(a);
    if (a < 10) {
        var_barrier(a);
        ... *(p + a) ...
    }
This will prevent speculative code motion as well: the dereference now
uses the result of a volatile asm inside the guarded block, so it cannot
be hoisted above the condition.

Basically, we selectively insert memory or variable barriers to restrict
certain reorderings and condition merging. As experiments go on, we will
see whether this is sufficient or not.
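
To connect this to the IR level, a minimal sketch (illustrative only, not
the actual patch; the helper name is made up) of how a pass could
materialize such a value barrier, using an empty inline asm whose output
is tied to its input so that later uses have to go through the barrier
result:

    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/InlineAsm.h"
    #include "llvm/IR/Instructions.h"

    using namespace llvm;

    // Illustrative only: insert an empty inline-asm "barrier" right after
    // the defining instruction (assumed to produce a value and not be a
    // terminator).  The asm ties its output to its input ("=r,0"), so it
    // yields the same value but the optimizer cannot see through it.
    static Value *insertValueBarrier(Instruction *Def) {
      IRBuilder<> B(Def->getNextNode()); // insert immediately after Def
      FunctionType *FTy = FunctionType::get(Def->getType(), {Def->getType()},
                                            /*isVarArg=*/false);
      InlineAsm *IA = InlineAsm::get(FTy, /*AsmString=*/"",
                                     /*Constraints=*/"=r,0",
                                     /*hasSideEffects=*/true);
      CallInst *Barrier = B.CreateCall(FTy, IA, {Def});
      // Route all other uses of Def through the barrier, then restore the
      // barrier's own operand (which replaceAllUsesWith just rewrote).
      Def->replaceAllUsesWith(Barrier);
      Barrier->setArgOperand(0, Def);
      return Barrier;
    }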

Backends, as you're probably aware, commonly add IR level passes that
run prior to instruction selection (BPFPassConfig::addIRPasses already
does so).

Yes, I am aware of this and we already use it. I hope to use
adjustPassManager() so BPF can "preprocess" the IR before the major
optimizations happen.

So, if I understand correctly, eBPF requires all memory accesses to be statically proven inbounds, so that it’s impossible to trigger a runtime-abort condition in the resulting execution via a bad memory dereference operation. In fact, there is no ‘exceptional abort’ even possible in eBPF: all programs will execute all the way through, and return a result code.

However, the desire is to allow writing “normal” C code, compiled with a “normal” C compiler, so the source language type system does not actually express these constraints directly. This is not made part of the type system, nor is there any explicit “bounds check” semantics as far as the compiler is concerned. Users expect to write typical C code, full of usual C pointer arithmetic, and math and comparisons. Then, they compile their code. Finally, they try to load it, and only then does anything start checking whether it’s well-formed, according to these hidden rules of well-formedness. The verifier goes to a lot of trouble to attempt to track down all possible states that program execution may result in, to show that all of them will result in only inbounds pointer accesses, and that loops terminate, etc. (If you’ve done it wrong, the only place you get an error is from the kernel verifier failing to load the program, never from the compiler.)

That’s an interesting model, and I guess it almost works in practice, but I’m really not sure it can feasibly be made sound in a theoretical sense. Having the compiler know something about the verifier doesn’t seem likely to be sufficient. Given that the verifier tracks potential states cross-function/cross-TU, basically any computation could be in-scope for being potentially important to prove bounds from the compiler’s POV. It may be that the best you can really do is to hack up problematic optimizations as they are discovered to try to work around the issues.

On the other hand, if bounded-pointer types and constrained range integers were tracked as first-class concepts at the source level, this would seem more feasible. The compiler could ensure that there were no unchecked bounds possible, instead of making you wait for a verifier error. And by proving the boundedness at a high level, the compiler could explicitly ensure the operations contributing to the boundedness-check are emitted in a simplistic manner, without affecting optimization of the rest of the program. Alternatively, the backend could ensure that if any optimization passes do result in an instruction stream which no longer proves the bounds in a verifier-understandable way, it could insert an extraneous min/max to constrain to the bounds it had already proven to itself.

But making such a fundamental shift in strategy would seem quite tricky. Tricky to specify, probably tricky to implement, and would result in an obviously unusual dialect of C. (Instead of the current secretly unusual dialect which pretends to be normal C until you try to run the program.) There is some pre-existing research in this area, like Microsoft’s Checked-C (https://www.microsoft.com/en-us/research/project/checked-c/). But I don’t know how closely that aligns.

General comment after reading most of the thread:

I guess there are two opposing ways to approach this:

  1. try to “preserve” the restrictions assuming the input is valid, or

  2. try to enforce the restrictions at the end of the pipeline, i.e., legalize the IR.

I would strongly recommend 2) if at all possible.

One way would be:

Run normally, then try to legalize with an approximation of the requirements.

If the legalization fails, report it to the user.

Maybe fail and force the user to lower the optimization level, or automatically do so.

The thing is, any form of IR annotation that is semantics-bearing needs to be known to a lot of places, and a lot of places do similar things.

Disabling some concept like “code motion” not only affects various places but is also a grey area w.r.t. the definition of “code motion”.

This will be the same for any “transformation”, I expect at least.

Cheers,

Johannes

So, if I understand correctly, eBPF requires all memory accesses to be statically proven inbounds, so that it's impossible to trigger a runtime-abort condition in the resulting execution via a bad memory dereference operation. In fact, there is no 'exceptional abort' even possible in eBPF: all programs will execute all the way through, and return a result code.

However, the desire is to allow writing "normal" C code, compiled with a "normal" C compiler, so the source language type system does not actually express these constraints directly. This is not made part of the type system, nor is there any explicit "bounds check" semantics as far as the compiler is concerned. Users expect to write typical C code, full of usual C pointer arithmetic, and math and comparisons. Then, they compile their code. Finally, they try to load it, and only then does anything start checking whether it's well-formed, according to these hidden rules of well-formedness. The verifier goes to a lot of trouble to attempt to track down all possible states that program execution may result in, to show that all of them will result in only inbounds pointer accesses, and that loops terminate, etc. (If you've done it wrong, the only place you get an error is from the kernel verifier failing to load the program, never from the compiler.)

That is true.

That's an interesting model, and I guess it almost works in practice, but I'm really not sure it can feasibly be made sound in a theoretical sense. Having the compiler know something about the verifier doesn't seem

There actually exists a paper on this:
https://seahorn.github.io/papers/ebpf-pldi19.pdf
Note that the kernel verifier has since improved, including verification
of bounded loops, cross-function calls, etc.

likely to be sufficient. Given that the verifier tracks potential states cross-function/cross-TU, basically any computation could be in-scope for being potentially important to prove bounds from the compiler's POV. It

This is totally true. Currently the verifier is able to track state
across functions. The verifier tries to keep value ranges as precise as
possible. Currently, it uses a tnum (ternary number) to represent value
ranges. This sometimes cannot precisely capture distinct values and may
cause verification failure.
https://github.com/torvalds/linux/blob/master/kernel/bpf/tnum.c
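
For readers unfamiliar with it, the representation is roughly the
following (paraphrased from include/linux/tnum.h in the kernel tree; the
comments and the example here are illustrative):

    #include <cstdint>

    // Each 64-bit quantity is tracked as the value of its known bits plus
    // a mask of unknown bits, so some sets of distinct values cannot be
    // described precisely.
    struct tnum {
      uint64_t value; // value of the known bits (0 wherever mask bit is 1)
      uint64_t mask;  // 1 for every bit whose value is unknown
    };

    // Example: after "if (a < 128)", "a" is roughly {value = 0, mask = 0x7f}:
    // the low 7 bits are unknown and all higher bits are known to be zero.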

may be the best you can really do is to hack up problematic optimizations as they are discovered to try to work around the issues.

This is what I am trying to do in the initial phase. We know of a few
code patterns where llvm optimizations may generate verifier-unfriendly
code, and we will try to change the IR to effectively prevent such
optimizations. This may have some false positives, meaning the code may
be totally fine even with the optimization but we restrict it anyway. We
are fine with this, as it prevents verifier failures, which are the main
road block for bpf developers.