Adding diversity for security (and testing)

Greetings LLVM Devs!

I am a PhD student in the Secure Systems and Software Lab at UC
Irvine. We have been working on adding randomness into code generation
to create a diverse population of binaries. This diversity prevents
code-reuse attacks such as return-oriented programming (ROP) by
denying the attacker information about the exact code layout. ROP has
been used in several recent high-profile attacks, and has also been
used as a jailbreaking avenue. We believe our transformations would
provide a significant security benefit for LLVM users who choose to
use diversity. For more details see [1] (although we are currently
proposing to upstream only a simplified subset of our work).

We would like to contribute some of our work back to the community,
and are preparing a small patch adding two new features: NOP insertion
and schedule randomization. The NOP insertion pass randomly adds NOPs
after each MachineInstr according to a command-line
parameter. Currently NOP insertion is implemented for X86, and we are
adding support for ARM. The schedule randomizer randomly picks a valid
instruction to schedule at every point, bypassing the scheduling
heuristics. These passes result in a binary which, while slightly
slower, is far more secure against code-reuse attacks. In addition,
schedule randomization may be useful for randomized compiler and
micro-architecture testing.
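
As a rough sketch of the shape of the NOP insertion pass (illustrative
flag names and pass boilerplate only, with std::mt19937_64 standing in
for the secure RNG discussed below; the exact TII accessor spelling
varies across LLVM versions):

  #include "llvm/CodeGen/MachineFunctionPass.h"
  #include "llvm/Support/CommandLine.h"
  #include "llvm/Target/TargetInstrInfo.h"
  #include <iterator>
  #include <random>

  using namespace llvm;

  static cl::opt<unsigned> NopPercent(
      "nop-insertion-percent", cl::init(0),
      cl::desc("Chance (0-100) of inserting a NOP after each MachineInstr"));

  static cl::opt<unsigned long long>
      Seed("rng-seed", cl::init(0),
           cl::desc("Seed for randomized transformations"));

  namespace {
  struct NOPInsertion : public MachineFunctionPass {
    static char ID;
    NOPInsertion() : MachineFunctionPass(ID) {}

    bool runOnMachineFunction(MachineFunction &MF) override {
      // std::mt19937_64 stands in for the seeded, cryptographically
      // secure RNG described below.
      std::mt19937_64 RNG(Seed);
      std::uniform_int_distribution<unsigned> Roll(0, 99);

      const TargetInstrInfo *TII = MF.getTarget().getInstrInfo();
      bool Changed = false;
      for (MachineBasicBlock &MBB : MF) {
        for (MachineBasicBlock::iterator MI = MBB.begin(); MI != MBB.end();
             ++MI) {
          if (Roll(RNG) < NopPercent) {
            // insertNoop is implemented by targets such as X86.
            TII->insertNoop(MBB, std::next(MI));
            ++MI; // step over the NOP we just inserted
            Changed = true;
          }
        }
      }
      return Changed;
    }
  };
  } // namespace

  char NOPInsertion::ID = 0;
  // (Pass registration and pipeline wiring omitted.)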

We would also include a secure random number generator which links
against OpenSSL. This would of course be an optional module disabled
by default, but is necessary so the randomization is cryptographically
secure and useful in security applications.

We are in the process of writing test cases and double checking
formatting to produce a useful patch, but would like to solicit
feedback on our proposed changes before submitting patches for
detailed consideration.

Thanks,
Stephen Crane
Secure Systems and Software Lab
UC Irvine

[1] A. Homescu, S. Neisius, P. Larsen, S. Brunthaler, and M. Franz,
“Profile-guided Automated Software Diversity,” in Proceedings of the
2013 International Symposium on Code Generation and Optimization
(CGO 2013), Shenzhen, China, February 2013.

Hi Stephen,

Greetings LLVM Devs!

I am a PhD student in the Secure Systems and Software Lab at UC
Irvine. We have been working on adding randomness into code generation
to create a diverse population of binaries. This diversity prevents
code-reuse attacks such as return-oriented programming (ROP) by
denying the attacker information about the exact code layout. ROP has
been used in several recent high-profile attacks, and has also been
used as a jailbreaking avenue. We believe our transformations would
provide a significant security benefit for LLVM users who choose to
use diversity. For more details see [1] (although we are currently
proposing to upstream only a simplified subset of our work).

I think that this is very interesting and I would like LLVM to have a "randomness" feature. I think that it is useful for other aspects of security as well.

We would like to contribute some of our work back to the community,
and are preparing a small patch adding two new features: NOP insertion
and schedule randomization. The NOP insertion pass randomly adds NOPs
after each MachineInstr according to a command-line
parameter. Currently NOP insertion is implemented for X86, and we are
adding support for ARM.

Okay.

The schedule randomizer randomly picks a valid
instruction to schedule at every point, bypassing the scheduling
heuristics. These passes result in a binary which, while slightly
slower, is far more secure against code-reuse attacks. In addition,
schedule randomization may be useful for randomized compiler and
micro-architecture testing.

Which scheduler did you modify? The plan is to disable the SelectionDAG scheduler and move to the MI Scheduler soon.

Also, have you looked at randomizing register allocation?

We would also include a secure random number generator which links
against OpenSSL. This would of course be an optional module disabled
by default, but is necessary so the randomization is cryptographically
secure and useful in security applications.

I am not sure why you need this feature. You can provide LLVM with a SEED value that can be controlled from the command line. A wrapper (such as a build-script) can control this value.

We are in the process of writing test cases and double checking
formatting to produce a useful patch, but would like to solicit
feedback on our proposed changes before submitting patches for
detailed consideration.

Please make sure that the LLVM nightly test suite passes with randomization enabled.

Hi Nadav,

Thanks for your interest!

Which scheduler did you modify? The plan is to disable the SelectionDAG scheduler and move to the MI Scheduler soon. Also, have you looked at randomizing register allocation?

Yes, we modified the SelectionDAG scheduler. This was before the MI scheduler was around, but we will look into porting our ideas over to the new scheduler.
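
Conceptually, the randomization replaces the scheduler's heuristic pick from the ready list with a uniform random pick over nodes whose dependencies are already satisfied. A sketch of the core idea, with hypothetical names rather than our actual code:

  #include <random>
  #include <utility>
  #include <vector>

  // SUnitT stands in for the scheduler's node type (SUnit in LLVM).
  template <typename SUnitT>
  SUnitT *pickNextNode(std::vector<SUnitT *> &ReadyQueue,
                       std::mt19937_64 &RNG) {
    if (ReadyQueue.empty())
      return nullptr;
    // Every node in ReadyQueue already has its dependencies satisfied,
    // so any choice is valid; randomizing changes only the order, not
    // correctness.
    std::uniform_int_distribution<size_t> Dist(0, ReadyQueue.size() - 1);
    size_t Idx = Dist(RNG);
    SUnitT *Chosen = ReadyQueue[Idx];
    std::swap(ReadyQueue[Idx], ReadyQueue.back());
    ReadyQueue.pop_back();
    return Chosen;
  }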

Register allocation randomization is in fact another of our existing transformations. We thought we would propose just a few simple transforms initially, but we can certainly include register randomization as well if there is enough interest.

We would also include a secure random number generator which links
against OpenSSL. This would of course be an optional module disabled
by default, but is necessary so the randomization is cryptographically
secure and useful in security applications.

I am not sure why you need this feature. You can provide LLVM with a SEED value that can be controlled from the command line. A wrapper (such as a build-script) can control this value.

We do in fact seed the RNG with a command-line parameter (we reuse the -frandom-seed param that gcc implemented). However, we need a reproducible, cryptographically secure source of randomness for each random decision made during our transformations. System randomness (/dev/random) is unsuitable for this purpose because reproducible builds (given the secret seed) are preferable. The only way to provide this reproducible stream of randomness is to have a process-specific RNG, which we implement on top of OpenSSL for simplicity.
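
To illustrate the kind of stream we mean (a sketch only, assuming a modern OpenSSL EVP interface; the class and method names are hypothetical, not our actual code): AES-128 in counter mode, keyed from the seed, yields a deterministic stream -- same seed, same bytes.

  #include <openssl/evp.h>
  #include <cstdint>
  #include <cstring>
  #include <stdexcept>

  class SeededAESStream {
    EVP_CIPHER_CTX *Ctx;

  public:
    // Key and IV would be derived from the -frandom-seed value; they are
    // taken directly here for brevity.
    SeededAESStream(const unsigned char Key[16], const unsigned char IV[16]) {
      Ctx = EVP_CIPHER_CTX_new();
      if (!Ctx ||
          EVP_EncryptInit_ex(Ctx, EVP_aes_128_ctr(), nullptr, Key, IV) != 1)
        throw std::runtime_error("AES-CTR init failed");
    }
    ~SeededAESStream() { EVP_CIPHER_CTX_free(Ctx); }

    // In CTR mode, encrypting zero bytes yields the raw keystream, which
    // serves as the pseudorandom output.
    uint64_t next() {
      unsigned char Zero[8] = {0}, Out[8];
      int Len = 0;
      if (EVP_EncryptUpdate(Ctx, Out, &Len, Zero, sizeof(Zero)) != 1)
        throw std::runtime_error("AES-CTR update failed");
      uint64_t V;
      std::memcpy(&V, Out, sizeof(V));
      return V;
    }
  };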

Please make sure that the LLVM nightly test suite passes with randomization enabled.

Of course. Our patched version currently passes the existing test suite on x86_64, and after adding additional tests we will certainly make sure that the final patch passes the latest test suite.

Thanks,
Stephen

How is the "diverse population" of binaries generated and delivered? The tradition
software development model is to qualify one “golden master” which is then
duplicated to all customers.

-Nick

> We would also include a secure random number generator which links
> against OpenSSL. This would of course be an optional module disabled
> by default, but is necessary so the randomization is cryptographically
> secure and useful in security applications.

I am not sure why you need this feature. You can provide LLVM with a
SEED value that can be controlled from the command line. A wrapper (such
as a build-script) can control this value.

(disclaimer: I was a member of Stephen's research group and worked on this
project.)

To my knowledge, the pseudorandom number generator in LLVM is not
cryptographically secure. In this use case, the intent is to make it
difficult to get the random number seed or the generator's state at any
point in the random number stream by looking at the output. If the state
of the generator can be compromised, then an attacker can predict the
output of the generator by playing it forward. If an attacker can play the
generator forward, then the attacker can reproduce the rest of the
executable, including the randomized components that are no longer random
to the attacker. Reproducing the executable means that diversification
isn't going to work because the attacker can plan around it.

For reproducibility, such as for debugging, a pure software generator is a
good idea. This also prevents blocking read operations in the generator,
slowing down the compiler. Software generators can be optimized for speed.
These are reasons to avoid /dev/random et al.

Personally, I think it is necessary to go for the strongest random number
generator possible. Cryptographically secure pseudorandom number
generators have good properties that make them resistant to compromise. In
the case of the generator I think Stephen is proposing -- a generator based
on running AES-128 in Counter Mode and described in one of NIST's documents
-- the security comes from strong crypto building blocks, while still
suitable for embedding in a compiler.

Putting on my security hat (as opposed to my lurking-on-compiler-mailing-lists hat), note that artificial software heterogeneity doesn't actually prevent ROP; it makes it harder, in a qualitatively similar way to ASLR. With ASLR, the attacker needs to discover a single memory address in order to construct a return-oriented program. What you're proposing requires reading out more of the address space.

A recent paper at Oakland proposed a "just-in-time code reuse" attack that repeatedly uses a memory disclosure and JITs an attack based on the results [1]. So while it does make the attacker's job harder, it doesn't prevent such attacks.

1. Kevin Z. Snow, Fabian Monrose, Lucas Davi, Alexandra Dmitrienko, Christopher Liebchen, and Ahmad-Reza Sadeghi. Just-In-Time Code Reuse: On the Effectiveness of Fine-Grained Address Space Layout Randomization. In Proceedings of the IEEE Symposium on Security and Privacy ("Oakland") 2013. <http://cs.unc.edu/~fabian/papers/oakland2013.pdf>

Hi Nick,

How is the "diverse population" of binaries generated and delivered? The tradition software development model is to qualify one “golden master” which is then duplicated to all customers. -Nick

Yes indeed. Adding diversity at compilation requires that the code producer create a population of variants. However, by introducing diversity at compile-time, we have much greater freedom in transforming the end result with lower performance impact. In addition, producing variants during distribution allows the distributor to use diversity to provide a certain amount of watermarking and protection against client-side tampering (jailbreaking, etc.).

We initially foresee that this could be especially useful where security is of the utmost concern. Open-source end users often have the option of compiling from source, and could create their own diversified versions, especially for critical services. We think this would be an ideal situation to begin adopting diversity for security.

Of course, in the testing use-case, creating various versions is fairly trivial. These versions could then be compared for testing for micro-architectural and compiler corner cases, as well as performance.

Finally, for wide-spread adoption, we are currently researching ways to cache the bulk of the compilation effort and create many variants for distribution in a cost-effective manner using cloud compilation. In addition, error reporting can be normalized with knowledge of the secret random seed by regeneration of the particular binary.

Hope this helps to explain our ideas in more clarity.

- stephen

Hi Stephen,

Great to see a fellow security researcher lurking on this list as well!

You raise a very valid point here. I quite agree that diversity is not guaranteed to prevent ROP. Very sorry -- I wasn't as careful in my original wording as I certainly should have been. In practice, we believe introducing variation significantly raises the bar for attackers, which is a big step in the right direction. For many applications this may be more than sufficient. However, diversity may not be good enough for all applications, particularly in the presence of possible remote memory disclosures.

- stephen

> We would also include a secure random number generator which links

> against OpenSSL. This would of course be an optional module disabled
> by default, but is necessary so the randomization is cryptographically
> secure and useful in security applications.

I am not sure why you need this feature. You can provide LLVM with a
SEED value that can be controlled from the command line. A wrapper (such
as a build-script) can control this value.

(disclaimer: I was a member of Stephen's research group and worked on
this project.)

To my knowledge, the pseudorandom number generator in LLVM is not
cryptographically secure. In this use case, the intent is to make it
difficult to get the random number seed or the generator's state at any
point in the random number stream by looking at the output. If the state
of the generator can be compromised, then an attacker can predict the
output of the generator by playing it forward. If an attacker can play the
generator forward, then the attacker can reproduce the rest of the
executable, including the randomized components that are no longer random
to the attacker. Reproducing the executable means that diversification
isn't going to work because the attacker can plan around it.

I should think that the choices at each decision point of the randomized

code-generation effect would require only a few bits from the output of
each run of the RNG, and you can run the RNG again for each decision point.
Because the vast majority of the RNG output is therefore not available to
the attacker, it's really really really hard to reconstruct the sequence.
Even if it's not a crypto-based RNG.

Any RNG with adequately long cycles, reasonable bit-width, and
minimum-width enforcement on the seed value would be fine. And
computationally a lot cheaper than a crypto-based RNG.

For reproducibility, such as for debugging, a pure software generator is a
good idea. This also prevents blocking read operations in the generator,
slowing down the compiler. Software generators can be optimized for speed.
These are reasons to avoid /dev/random et al.

Personally, I think it is necessary to go for the strongest random number
generator possible. Cryptographically secure pseudorandom number
generators have good properties that make them resistant to compromise. In
the case of the generator I think Stephen is proposing -- a generator based
on running AES-128 in Counter Mode and described in one of NIST's documents
-- the security comes from strong crypto building blocks, while still
suitable for embedding in a compiler.

Security comes from careful threat analysis and establishing
counter-measures appropriate to the threats, which might or might not
warrant crypto. My house would be "more secure" if I put 24x7 armed guards
around it, but the threat level doesn't justify the cost.

As for using AES-128, I see buzzword value, but no real technical need.
(No question that "crypto == good" syndrome comes into play here; it's
rare that you have to defend using crypto even if it isn't warranted.
Until you run into a cranky-pants like me!) In any case you need a
fallback for when OpenSSL isn't available. I'm not claiming what LLVM has
now is adequate for you (looks like it uses rand(2)) but AES-128 seems like
overkill. (I've lost track of the general crypto-export-control state of
things, but just a reminder that LLVM avoids anything that could possibly
be export-controlled.)

Pogo

This is a very good point. It may help to clarify your threat model here. Let's think about who the attackers are. Some possibilities:

1. Local attacker who can read the contents of the binary. This defense doesn't really buy you anything given automated attack creation frameworks like Q [1].

2. Local attacker who cannot read the contents of the binary. (This is a pretty strange one, but it's possible.) The attacker is forced to rely on side channel information such as timing channels in an attempt to discover the length of the inserted NOP sleds. This sounds like an extraordinarily difficult task, but possibly doable. With a weak PRNG like an LCG, for example, you may have sufficient information to break it [2] (I believe there are better attacks, but this is the first that came up with a quick search).

3. Remote attacker who can use a memory disclosure bug to read the contents of the memory. Like attacker 1, the defense can be bypassed.

4. Remote attacker who cannot read out memory. This is similar to 2 but would seem to be far more difficult since your timing channel is far more noisy and the delay added by the NOPs is minuscule. Given how many sessions the latest DTLS attack took [3], even for attackers on the same LAN, and the fact that the difference in timing was in the number of blocks to be MAC'd, which takes at least two orders of magnitude more time than some NOPs (500 to 1000 cycles according to AlFardan and Paterson), I'm doubtful the number of NOPs in even a single sled could be recovered.

For attackers 1 and 3, any PRNG is as good as any other. For 4, I'd be shocked if anything could be recovered using any PRNG (for whatever that's worth). Attacker 2 seems like the only situation where choice of PRNG might actually matter in practice.

Are there other classes of attackers you have in mind?

Of course, AES is _really_ fast [4] and in general, for a security application, I can't think of a good reason not to use a CSPRNG when random numbers are warranted.

1. http://users.ece.cmu.edu/~ejschwar/papers/usenix11.pdf
2. http://www.reteam.org/papers/e59.pdf
3. http://www.isg.rhul.ac.uk/tls/TLStiming.pdf
4. http://cr.yp.to/aes-speed/aesspeed-20080926.pdf

> We would also include a secure random number generator which links

> against OpenSSL. This would of course be an optional module disabled
> by default, but is necessary so the randomization is cryptographically
> secure and useful in security applications.

I am not sure why you need this feature. You can provide LLVM with a
SEED value that can be controlled from the command line. A wrapper (such
as a build-script) can control this value.

(disclaimer: I was a member of Stephen's research group and worked on
this project.)

To my knowledge, the pseudorandom number generator in LLVM is not
cryptographically secure. In this use case, the intent is to make it
difficult to get the random number seed or the generator's state at any
point in the random number stream by looking at the output. If the state
of the generator can be compromised, then an attacker can predict the
output of the generator by playing it forward. If an attacker can play the
generator forward, then the attacker can reproduce the rest of the
executable, including the randomized components that are no longer random
to the attacker. Reproducing the executable means that diversification
isn't going to work because the attacker can plan around it.

I should think that the choices at each decision point of the randomized

code-generation effect would require only a few bits from the output of
each run of the RNG, and you can run the RNG again for each decision point.
Because the vast majority of the RNG output is therefore not available to
the attacker, it's really really really hard to reconstruct the sequence.
Even if it's not a crypto-based RNG.

Yes and no. My primary concern was RNG state compromise.

You are correct in that the majority of the RNG output is probably not
available to the attacker. However, it is still possible to reconstruct
the state of the RNG depending on how it's designed. I did an experiment
where I used a linear congruential generator (I know they're not great
generators either, but it was a place to start when defining requirements)
to add random pads to the stack frame, and then tried to reconstruct the
seed from observing the resulting executable. I was able to do this in a
few minutes on a Pentium 4, using three values of the RNG in order. I can
provide details on the experiment off list if people are interested.
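
To illustrate the underlying weakness with a toy example (far simpler than my actual experiment, which only saw the state indirectly through pad sizes): for an LCG that emits its full state, one observed output is the entire internal state, so an observer can replay the rest of the stream exactly.

  #include <cassert>
  #include <cstdint>

  // Classic Numerical Recipes LCG parameters; modulus 2^32 is implicit
  // in the unsigned overflow.
  static uint32_t lcgNext(uint32_t &State) {
    State = State * 1664525u + 1013904223u;
    return State;
  }

  int main() {
    uint32_t Secret = 0xDEADBEEF;        // the compiler's hidden RNG state
    uint32_t Observed = lcgNext(Secret); // attacker observes one output

    // For a full-output LCG the output IS the new state, so the attacker
    // can now predict every subsequent "random" decision.
    uint32_t Replay = Observed;
    for (int I = 0; I < 3; ++I)
      assert(lcgNext(Secret) == lcgNext(Replay));
    return 0;
  }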

My requirements for this use case became:
1) Reproducible and deterministic. This is a side effect of researchers
thinking I was insane for proposing a RNG inside of a compiler, saying that
output couldn't be debugged since it was random. The intent is that if you
put in the same seed and source code, you will always get the same
executable.
2) Resistant to state compromise.
3) Small enough to embed into a compiler that's open source, without making
assumptions about underlying hardware instructions. That's why I used AES
in counter mode -- people have optimized the heck out of it to run on
multiple platforms. Compilers already take long enough to build on some
platforms, and I didn't want to add a whole new library.
4) Generates good enough randomness. I trusted NIST on this one. Linear
congruential generators don't fit. Last thing I wanted was a generator that
"looks" random, biases or invalidates results, and gives a false sense of
security.

Any compliant RNG will do. I used a crypto based RNG because it was
available, I was unaware of any other resistant generators, and I was being
conservative, not because I was looking for "buzzwords".

Any RNG with adequately long cycles, reasonable bit-width, and
minimum-width enforcement on the seed value would be fine. And
computationally a lot cheaper than a crypto-based RNG.

Maybe. State compromise still matters. Sure, I could have used a hardware
generator, or Intel's AES instructions, but that didn't meet the
requirements.

As an implementation detail, I had to do key stretching to turn a 64-bit
numerical seed into something long enough to properly seed the generator.
This was the actual time sink (measured by profiling), not the generator,
as AES was fast enough for proof of concept, and my experiments didn't
generate that many random numbers. Stephen's version may vary from mine.
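
As a sketch of the idea only (assuming OpenSSL's SHA-256; my actual stretching scheme may have differed), one can hash the 64-bit seed under distinct labels to derive the AES-128 key and counter IV. Note that stretching cannot add entropy; the output space is still bounded by the 2^64 possible seeds.

  #include <openssl/sha.h>
  #include <cstdint>
  #include <cstring>

  // Hypothetical sketch: derive an AES-128 key and counter IV from a
  // 64-bit seed by hashing it under distinct domain-separation labels.
  static void stretchSeed(uint64_t Seed, unsigned char Key[16],
                          unsigned char IV[16]) {
    unsigned char Buf[sizeof(Seed) + 1];
    unsigned char Digest[SHA256_DIGEST_LENGTH];
    std::memcpy(Buf, &Seed, sizeof(Seed));

    Buf[sizeof(Seed)] = 'K'; // label: key derivation
    SHA256(Buf, sizeof(Buf), Digest);
    std::memcpy(Key, Digest, 16);

    Buf[sizeof(Seed)] = 'I'; // label: IV derivation
    SHA256(Buf, sizeof(Buf), Digest);
    std::memcpy(IV, Digest, 16);
  }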

For reproducibility, such as for debugging, a pure software generator is

a good idea. This also prevents blocking read operations in the generator,
slowing down the compiler. Software generators can be optimized for speed.
These are reasons to avoid /dev/random et al.

Personally, I think it is necessary to go for the strongest random number
generator possible. Cryptographically secure pseudorandom number
generators have good properties that make them resistant to compromise. In
the case of the generator I think Stephen is proposing -- a generator based
on running AES-128 in Counter Mode and described in one of NIST's documents
-- the security comes from strong crypto building blocks, while still
suitable for embedding in a compiler.

Security comes from careful threat analysis and establishing
counter-measures appropriate to the threats, which might or might not
warrant crypto. My house would be "more secure" if I put 24x7 armed guards
around it, but the threat level doesn't justify the cost.

I believe you've missed my point. I was aiming for a conservative proof of
concept and a secure RNG.

As for using AES-128, I see buzzword value, but no real technical need.

I'm not sure what to make of that. See above. If you're aware of an RNG
that meets the above requirements, please feel free to suggest it. It's
entirely possible I've missed something.

(No question that "crypto == good" syndrome comes into play here; it's
rare that you have to defend using crypto even if it isn't warranted.
Until you run into a cranky-pants like me!) In any case you need a
fallback for when OpenSSL isn't available. I'm not claiming what LLVM has
now is adequate for you (looks like it uses rand(2)) but AES-128 seems like
overkill. (I've lost track of the general crypto-export-control state of
things, but just a reminder that LLVM avoids anything that could possibly
be export-controlled.)

Fair point on export, but the cipher and/or generator can be substituted.

I think others more qualified than myself have sufficiently addressed why having a strong (aka crypto-secure) RNG is a good idea, so I will leave that. However, I can shed some light on a few of the practical concerns of our proposed additions.

Hi Stephen,

Thanks for spelling out the possible attack models in detail. This is pretty much the way we were thinking. However, I wanted to make a minor clarification on our proposed implementation, just to be clear. Rather than inserting NOP sleds before functions, we actually insert various NOP-like instructions randomly between existing MachineInstrs, in order to provide more fine-grained diversity. This gives the attacker as little information as possible about the exact layout of the final binary. In terms of RNG selection, we were especially concerned about partial leakage of binary contents (or leakage of some but not all binary files from a build sharing the same seed) revealing the initial seed and therefore, by recompilation, the entire binary.

- Stephen Crane

Thanks. This is really interesting -- it's hard to find good protections
against ROP attacks, and this is promising. However, I'm going to start
with the "bad news" first. I have a few questions as to whether this is
useful for our (llvm's) users in the real world:

1. I'm concerned about the deployment problem. I realize that being in the
compiler means you can transform the program in more exciting ways, but it
gives you a much worse deployment story than something which modifies the
program on disk like "prelink".

2. Does this actually fill a gap in our protections? How do we ever get
into the situation where the user is able to deploy a ROP attack against
us, without tripping asan or ubsan or something caught by our warnings or
the static analyzer or any of the other protections offered by clang and
llvm? It may suffice that there exists a niche which can't afford the
performance penalty from asan or other things, but then we'll need to
discuss what the performance impact is.

3. Reproducible builds are a must. GCC has a -frandom-seed=text flag and
you should use that here too. Given the same random seed, the compiler
shall produce the same output.

Ultimately my concern here derives from the fact that I do use clang
warnings today and I do use asan and ubsan today. If this ROP-protection
were added, I don't immediately see how I would use it. Just because I'm not
in the target audience doesn't mean the target audience doesn't exist,
but I am asking for a little justification.

And one issue for us, the llvm developers. If we're going to accept this
feature, own it and maintain it, how can we test it? We know how to test
for correctness and measure performance, we even understand what
protections -fstack-protector or ASAN offer and can write spot-tests for
them, but how can we measure the effectiveness of probability-based
ROP-protections? We don't need an answer to this right now, but over time
we'll be maintaining it, and it seems plausible that we could accept a
patch which accidentally diminishes the actual security provided (imagine us
maintaining a random number generator -- it's possible to add a weakness
which isn't easily noticed).

There's a "good news" side to this too. Over lunch I talked with one of our
security guys, and he's excited. He tells me that diversity for
ROP-protection is entirely the way to go, and speculates that it ought to
be deployable by other vendors. Furthermore, we expect that rotating the
register allocation is going to be especially painful on the exploit
authors.

Looking forward to seeing patches!

Nick

Rather than inserting NOP sleds before functions, we actually insert various NOP-like instructions randomly between existing MachineInstrs, in order to provide more fine-grained diversity.

Right. I assumed it was something like the multibyte NOPs in <http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/37204.pdf> (Section 5.4).

This gives the attacker as little information as possible about the exact layout of the final binary. In terms of RNG selection, we were especially concerned about partial leakage of binary contents (or leakage of some but not all binary files from a build sharing the same seed) revealing the initial seed and therefore, by recompilation, the entire binary.

Oh interesting. I hadn't thought about partial information in that way.

One thought I had about seed selection is if it ends up being based on time (when not chosen by command line flags), then do you have issues with files that use the __TIME__ macro? Or did you have a better seeding mechanism?

Tools like asan and ubsan don’t detect every type of problem. From my understanding of asan, it only performs checks on loads and stores and delays reuse of virtual address space for memory allocations, so it is not guaranteed to catch all use-after-free and array bounds errors. To catch everything, you would need to use a tool like Softbound + CETS. Even then, you need to exercise all the program paths, which I suspect does not happen in practice.

Static analysis tools don’t find every possible memory safety bug, either. I believe it’s theoretically impossible for them to do so.

Run-time protections are attractive because they can guarantee that some bad behavior does not happen in any execution of the program.

Is there a reason why your security guy thinks that diversity is a better protection than control-flow integrity (CFI)? Recent work on CFI [1] indicates that even very conservative call graphs remove nearly all gadgets from the list of valid function targets (hence they cannot be used). That implementation has an average overhead of 3.6% and a max overhead of 8.6%. A more flexible implementation [2] that already works for LLVM has an average overhead of 10%. There’s more recent work on CFI (including at least one paper that uses LLVM) published this month at Usenix Security, but I haven’t had time to read it yet.

Control-flow integrity should be effective in stopping ROP attacks, and we should be able to deploy it without using inter-procedural analysis (although inter-procedural analysis can improve the call graph results to make the program even more secure).

I think diversity is a nice thing to have to provide defense in depth, but I currently think that CFI will provide the most bang for the buck.

– John T.

[1]
[2]

Is there a reason why your security guy thinks that diversity is a
better protection than control-flow integrity (CFI)? Recent work on CFI
[1] indicates that even very conservative call graphs remove nearly all
gadgets from the list of valid function targets (hence they cannot be
used). That implementation has an average overhead of 3.6% and a max
overhead of 8.6%. A more flexible implementation [2] that already works
for LLVM has average overhead of 10%. There's more recent work on CFI
(including at least one paper that uses LLVM) published this month at
Usenix Security, but I haven't had time to read it yet.

I agree that CFI is a very interesting technique and the fact that the
Usenix best paper award went to a paper on CFI (via binary rewriting
however) just underscores this view. My colleague, Stephen, posted a link
to a paper on our LLVM implementation of diversity which uses the profile
guidance infrastructure to lower the overheads of diversification below the
numbers you cite for CFI: on SPEC 2006, we report an average overhead of 1%
and a max overhead of 2.5%. I think this difference matters a lot given
that single-core performance is improving much more slowly now and power
consumption matters a lot in mobile. So while both techniques offer good
security, profile-guided diversity seems to have an edge w.r.t. efficiency.

I should point out that the initial patch set we are suggesting to upstream
does not use profile guidance; we'd simply like to start out small.

I think diversity is a nice thing to have to provide defense in depth, but
I currently think that CFI will provide the most bang for the buck.

I'm glad you see the value in diversity. I think these are ultimately
complementary techniques. If a developer doesn't mind the higher overhead
from CFI and/or needs to continue distributing a single binary to all
end-users, then CFI should be offered; otherwise, LLVM should be
able to diversify. Finally, we think schedule randomization helps tease out
bugs in the compiler and novel micro-architectures.

Cheers,
Per

Greetings LLVM Devs!

I am a PhD student in the Secure Systems and Software Lab at UC
Irvine. We have been working on adding randomness into code generation
to create a diverse population of binaries. This diversity prevents
code-reuse attacks such as return-oriented programming (ROP) by
denying the attacker information about the exact code layout. ROP has
been used in several recent high-profile attacks, and has also been
used as a jailbreaking avenue. We believe our transformations would
provide a significant security benefit for LLVM users who choose to
use diversity. For more details see [1] (although we are currently
proposing to upstream only a simplified subset of our work).

We would like to contribute some of our work back to the community,
and are preparing a small patch adding two new features: NOP insertion
and schedule randomization. The NOP insertion pass randomly adds NOPs
after each MachineInstr according to a command-line
parameter. Currently NOP insertion is implemented for X86, and we are
adding support for ARM. The schedule randomizer randomly picks a valid
instruction to schedule at every point, bypassing the scheduling
heuristics. These passes result in a binary which, while slightly
slower, is far more secure against code-reuse attacks. In addition,
schedule randomization may be useful for randomized compiler and
micro-architecture testing.

We would also include a secure random number generator which links
against OpenSSL. This would of course be an optional module disabled
by default, but is necessary so the randomization is cryptographically
secure and useful in security applications.

We are in the process of writing test cases and double checking
formatting to produce a useful patch, but would like to solicit
feedback on our proposed changes before submitting patches for
detailed consideration.

Thanks. This is really interesting -- it's hard to find good protections
against ROP attacks, and this is promising. However, I'm going to start with
the "bad news" first. I have a few questions as to whether this is useful
for our (llvm's) users in the real world:

1. I'm concerned about the deployment problem. I realize that being in the
compiler means you can transform the program in more exciting ways, but it
gives you a much worse deployment story than something which modifies the
program on disk like "prelink".

2. Does this actually fill a gap in our protections? How do we ever get into
the situation where the user is able to deploy a ROP attack against us,
without tripping asan or ubsan or something caught by our warnings or the
static analyzer or any of the other protections offered by clang and llvm?
It may suffice that there exists a niche which can't afford the performance
penalty from asan or other things, but then we'll need to discuss what the
performance impact is.

Tools like asan and ubsan don't detect every type of problem. From my
understanding of asan, it only performs checks on loads and stores and
delays reuse of virtual address space for memory allocations, so it is not
guaranteed to catch all use-after-free and array bounds errors. To catch
everything, you would need to use a tool like Softbound + CETS. Even then,
you need to exercise all the program paths, which I suspect does not happen
in practice.

Static analysis tools don't find every possible memory safety bug, either.
I believe it's theoretically impossible for them to do so.

Run-time protections are attractive because they can guarantee that some bad
behavior does not happen in any execution of the program.

3. Reproducible builds are a must. GCC has a -frandom-seed=text flag and you
should use that here too. Given the same random seed, the compiler shall
produce the same output.

Ultimately my concern here derives from the fact that I do use clang
warnings today and I do use asan and ubsan today. If this ROP-protection
were added, I don't immediately see how I would use it. Just because I'm not
in the target audience doesn't mean the target audience doesn't exist,
but I am asking for a little justification.

And one issue for us, the llvm developers. If we're going to accept this
feature, own it and maintain it, how can we test it? We know how to test for
correctness and measure performance, we even understand what protections
-fstack-protector or ASAN offer and can write spot-tests for them, but how
can we measure the effectiveness of probability-based ROP-protections? We
don't need an answer to this right now, but over time we'll be maintaining
it, and it seems plausible that we could accept a patch which accidentally
diminishes the actual security provided (imagine us maintaining a random
number generator -- it's possible to add a weakness which isn't easily
noticed).

There's a "good news" side to this too. Over lunch I talked with one of our
security guys, and he's excited. He tells me that diversity for
ROP-protection is entirely the way to go, and speculates that it ought to be
deployable by other vendors. Furthermore, we expect that rotating the
register allocation is going to be especially painful on the exploit
authors.

Is there a reason why your security guy thinks that diversity is a better
protection than control-flow integrity (CFI)? Recent work on CFI [1]
indicates that even very conservative call graphs remove nearly all gadgets
from the list of valid function targets (hence they cannot be used). That
implementation has an average overhead of 3.6% and a max overhead of 8.6%.
A more flexible implementation [2] that already works for LLVM has average
overhead of 10%. There's more recent work on CFI (including at least one
paper that uses LLVM) published this month at Usenix Security, but I haven't
had time to read it yet.

Control-flow integrity should be effective in stopping ROP attacks, and we
should be able to deploy it without using inter-procedural analysis
(although inter-procedural analysis can improve the call graph results to
make the program even more secure).

I think diversity is a nice thing to have to provide defense in depth, but I
currently think that CFI will provide the most bang for the buck.

As the aforementioned security guy, let me be clear: I'm excited about
CFI and ASAN too :).

IMO, there's enough room in the world for mitigations at both the
low-cost and the high-value ends of the scale. I don't have enough
data (or patches, hint hint) to figure out if this does what's on the
tin, but my hope is that it will turn out to be a low cost technique
to deploy in places where more comprehensive tools are considered
prohibitively expensive.

Geremy Condra

1. I'm concerned about the deployment problem. I realize that being in the compiler means you can transform the program in more exciting ways, but it gives you a much worse deployment story than something which modifies the program on disk like "prelink".

Yes, definitely. Deployment is an issue which users will need to consider when deciding whether to use compiler-based diversity. However there are certainly use-cases which could benefit, especially in small deployments. Additionally, the benefits may outweigh the increase in deployment costs for larger deployments.

2. Does this actually fill a gap in our protections? How do we ever get into the situation where the user is able to deploy a ROP attack against us, without tripping asan or ubsan or something caught by our warnings or the static analyzer or any of the other protections offered by clang and llvm? It may suffice that there exists a niche which can't afford the performance penalty from asan or other things, but then we'll need to discuss what the performance impact is.

Briefly looking at ASAN again, I saw a performance penalty of 2x mentioned. Diversity could act as both defense in depth, and as a lower-impact defense for performance critical code.

3. Reproducible builds are a must. GCC has a -frandom-seed=text flag and you should use that here too. Given the same random seed, the compiler shall produce the same output.

Completely agree, and that's exactly what we have done.

And one issue for us, the llvm developers. If we're going to accept this feature, own it and maintain it, how can we test it? We know how to test for correctness and measure performance, we even understand what protections -fstack-protector or ASAN offer and can write spot-tests for them, but how can we measure the effectiveness of probability-based ROP-protections? We don't need an answer to this right now, but over time we'll be maintaining it, and it seems plausible that we could accept a patch which accidentally diminishes the actual security provided (imagine us maintaining a random number generator -- it's possible to add a weakness which isn't easily noticed).

Unfortunately testing for security is a difficult problem. We have currently been evaluating our work by comparing a population of randomized variants to make sure that no gadgets are the same across variants. While this is reasonable for research evaluation, I'm not sure that would be practical (or useful) for the compiler test suite. To prevent regressions, I think the test suite will need to cover at least three major areas: making sure that RNG results are consistent and deterministic, testing that NOPs are actually inserted and scheduling decisions randomized, and verifying that different seeds result in entirely different binaries with approximately equal amounts of diversification.

Speaking of maintenance, we plan to continue work on this in our lab. We can certainly maintain these features at least for the next year and a half, and possibly longer.

There's a "good news" side to this too. Over lunch I talked with one of our security guys, and he's excited. He tells me that diversity for ROP-protection is entirely the way to go, and speculates that it ought to be deployable by other vendors. Furthermore, we expect that rotating the register allocation is going to be especially painful on the exploit authors.

As mentioned, we have an implementation of register randomization, although it will require some reworking to be ready for prime-time. We can definitely look into getting that polished.

- stephen

I also find the prelink approach interesting, particularly if the OS is already doing prelinking. The idea is that the compiler always generates one binary, but it makes a record of permutations that could be made at prelink time. For instance, register r5 is restored in the epilog, but it is only used in three other instructions, so prelinking could switch them all to use r6. Or here is how to slide a block of code within a function. The end result of these recorded potential randomizations is to alter the ROP gadgets.

This would allow you to generate one golden binary and release it, then on each machine where it is installed, the prelinking would apply some random permutations to the code. You get your diversity across machines, but only need to ship one binary.
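
To make the idea concrete, a purely hypothetical sketch of what such a side table could record (the structure and field names are illustrative only):

  #include <cstdint>
  #include <vector>

  // The compiler would emit these records alongside the "golden" binary;
  // an installer-side prelink step would make random choices from them
  // and patch the code in place.
  struct RegisterRenameRecord {
    std::vector<uint64_t> UseOffsets; // code offsets of every use of FromReg
    uint8_t FromReg;                  // e.g. r5 in the epilog example
    std::vector<uint8_t> Candidates;  // registers proven free across all uses
  };

  struct BlockSlideRecord {
    uint64_t BlockStart; // function-relative start of a movable code block
    uint32_t BlockSize;
    uint32_t MaxSlide;   // how far it may slide without breaking references
  };

  struct DiversitySideTable {
    std::vector<RegisterRenameRecord> Renames;
    std::vector<BlockSlideRecord> Slides;
  };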

-Nick