Help adding the Bullet physics SDK benchmark to the LLVM test suite?

Hi,

We are developing the open source Bullet physics engine, used by game and movie studios,
and compiler performance tuning is important to us. See http://bullet.googlecode.com
The physics engine includes collision detection, rigid body dynamics and soft body dynamics.

I’ve been following the LLVM project for a while, and it seems the Clang C++ compiler is mature enough
to compile our source tree and benchmarks. Bullet 2.75 uses a lot of SIMD-friendly vector operations
that might make it an interesting addition to the LLVM test suite.

We have cmake and autoconf/automake build system support, and the latest benchmark is just a console application.
The SDK is under Bullet/src and the benchmarks are under Bullet/Demos/Benchmarks.

I’m not very familiar with the LLVM test-suite. Is someone interested in helping integrate Bullet into the test suite?
Thanks a lot,
Erwin

Hello, Erwin

The physics engine includes collision detection, rigid body dynamics and
soft body dynamics.

This sounds like a really promising addition to the LLVM testsuite!

to compile our source tree and benchmarks. Bullet 2.75 uses a lot of
SIMD-friendly vector operations,

Which archs are currently supported for SIMD operations?

The SDK is under Bullet/src and the benchmarks are under Bullet/Demos/Benchmarks.
I'm not very familiar with the LLVM test-suite. Is someone interested in
helping integrate Bullet into the test suite?

Yes, I am. Should I just use the 2.75 release?
Anything special I should be aware of (e.g. special "benchmark" mode,
or "small" mode for embedded systems) ?

Hi Anton,

Thanks a lot for offering help.

Bullet uses basic linear algebra with 4-way vectors, quaternions and matrices.
Although most of this is plain portable C++, perhaps LLVM can auto-vectorize some of it?

There is a little bit of hand-optimized x86 SSE code. This is only enabled on 32-bit Windows and Mac OS X Intel builds.
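
To give an idea of the style of code involved, here is a rough sketch (not the actual Bullet classes; the Vec4 name and the USE_SSE switch are made up for illustration) of a 16-byte-aligned 4-way vector with a portable path and an SSE path:

  #if defined(USE_SSE)   // hypothetical switch: in practice only the 32-bit
                         // Windows and Mac OS X Intel builds enable SSE
  #include <xmmintrin.h>
  #endif

  // Sketch only: illustrates the style of Bullet's vector math, not its real classes.
  struct __attribute__((aligned(16))) Vec4   // GCC/Clang-style 16-byte alignment
  {
  #if defined(USE_SSE)
      __m128 v;                              // one 4-float SSE register
      Vec4 operator+(const Vec4& o) const
      {
          Vec4 r;
          r.v = _mm_add_ps(v, o.v);
          return r;
      }
  #else
      float x, y, z, w;                      // plain portable C++ path
      Vec4 operator+(const Vec4& o) const
      {
          Vec4 r = { x + o.x, y + o.y, z + o.z, w + o.w };
          return r;
      }
  #endif
  };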

Should I just use the 2.75 release?

If you are interested, I think it is best to start with Bullet 2.75.
If it turns out that LLVM requires some modifications (due to current C++ limitations),
we can modify Bullet and go for an upcoming release such as Bullet 2.76 (planned around January 2010).

There is also ongoing work, together with AMD, to add OpenCL optimizations for Bullet 3.x,
but that will not be available in the short term.

Anything special I should be aware of (e.g. special “benchmark” mode,

If you compile the Bullet SDK using one of the included build systems, such as cmake or autotools,
it will try to compile all of the demos, many of which use OpenGL/GLUT graphics,
and a lot of optional ‘extras’. I suggest removing all of that (the Bullet/Extras folder, etc.).
We modified Bullet/Demos/Benchmarks so it runs as a console program, without graphics by default.

You could take Bullet/Demos/Benchmarks as a starting point, and modify it so it fits the LLVM test suite?

What hardware/platforms does the LLVM test suite run on typically?
Thanks,
Erwin

2009/12/15 Anton Korobeynikov <anton@korobeynikov.info>

Hello, Erwin

Although most of this is plain portable C++ perhaps LLVM can auto-vectorize
some of this?

Well, I doubt it, unfortunately - LLVM does not have any auto-parallelization or auto-vectorization support these days

There is a little bit of hand-optimized x86 SSE code. This is only enabled
on 32-bit Windows and Mac OS X Intel builds.

OK. What about Linux builds? Are there any other implementations,
e.g. AltiVec / NEON?

You could take Bullet/Demos/Benchmarks as a starting point, and modify it so
it fits the LLVM test suite?

Ok, will do.

What hardware/platforms does the LLVM test suite run on typically?

So far 32- and 64-bit Linux and Mac OS, and 32-bit ppc/darwin. Also, there
is a special "small" mode mainly designed for stuff like ARM.
It would be nice (in theory) to have some NEON/VFP code, since the
testsuite currently lacks any benchmark for this.

The Linux builds are not using SSE right now, but the vector data is
16-byte aligned on all platforms.
So if you port this SSE code to another platform (Linux, AltiVec,
NEON), could you contribute it back to Bullet?
The most interesting SSE part is the inner loop of the constraint
solver, in the Bullet project hosted on Google Code.
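
To give a flavor of it (a minimal sketch only, not the actual solver code), that inner loop essentially boils down to 4-wide multiply/accumulate operations on 16-byte-aligned float data:

  #include <xmmintrin.h>

  // Sketch: out += a * b for one 4-float lane; pointers assumed 16-byte aligned.
  inline void madd4(float* out, const float* a, const float* b)
  {
      __m128 va = _mm_load_ps(a);
      __m128 vb = _mm_load_ps(b);
      __m128 vo = _mm_load_ps(out);
      vo = _mm_add_ps(vo, _mm_mul_ps(va, vb));
      _mm_store_ps(out, vo);
  }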

Some developers replaced some of the linear algebra functions (in
Bullet/LinearMath) with VFP/NEON
optimizations, but haven't contributed this back.
This NEON/VFP code, part of an open source iPhone project, could be a
starting point for this:
http://tinyurl.com/y9gv3e8

Thanks,
Erwin

Speaking of which, I've been looking into the loop passes and noticed
we do alias analysis and scalar evolution only, trying to clean up the
loop as far as possible.

I suppose that, if we were to define SCCs, split them into groups and
re-arrange them into multiple loops, we would still do it in the IR,
right? Would that spoil any other pass? What passes should run
before/after such a pass?

I believe that would be a FunctionPass and registered in the
LoopDependencyAnalysis "runOnLoop()", so it can run when such a pass is
called by the PassManager. Or should it be a completely separate pass
(VectorizationPass?) so we can control it from a separate command-line
flag?
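
For the separate-pass option, I was picturing a skeleton along these lines against the legacy pass manager (the pass name and flag are made up, and the actual SCC splitting is left out):

  #include "llvm/Analysis/LoopPass.h"
  using namespace llvm;

  namespace {
    // Skeleton only: a stand-alone loop pass the PassManager can schedule,
    // controllable from its own command-line flag.
    struct SplitLoopSCCs : public LoopPass {
      static char ID;
      SplitLoopSCCs() : LoopPass(ID) {}

      virtual bool runOnLoop(Loop *L, LPPassManager &LPM) {
        // ... build the SCCs of the loop body, group them, and re-arrange
        //     them into multiple loops in the IR ...
        return false; // return true once the IR is actually modified
      }
    };
  }

  char SplitLoopSCCs::ID = 0;
  static RegisterPass<SplitLoopSCCs>
      X("split-loop-sccs", "Split loop SCCs into separate loops (sketch)");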

cheers,
--renato


Hello, Erwin

So if you port this SSE code to another platform (Linux, Altivec,
NEON), you could contribute it back to Bullet?

I believe this should work as-is on Linux. Am I missing something?

optimizations, but haven't contributed this back.
This NEON/VFP code, part of an open source iPhone project, could be a
starting point for this:
http://tinyurl.com/y9gv3e8

Ok.

It might compile as-is on Linux, if the SIMD syntax is the same; I haven’t tried it.

Has anyone experimented with PS-DSWP to auto-parallelize?
http://liberty.princeton.edu/videos/step-by-step.php

Thanks,
Erwin

By the way, I’m not sure if I mentioned it before, but there are optional components
in Bullet that can benefit from multithreading, using pthreads (or Win32 threads).
It requires a few minor changes in the benchmark; I can help with enabling this.
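
As a rough illustration of what that involves (the task layout and function names below are made up, not Bullet's actual threading code):

  #include <pthread.h>

  // Sketch: spawn a few pthread workers, each solving one batch of constraints.
  struct SolverTask { int firstConstraint, lastConstraint; };

  static void* solverWorker(void* arg)
  {
      SolverTask* task = static_cast<SolverTask*>(arg);
      // ... solve constraints in [task->firstConstraint, task->lastConstraint) ...
      (void)task;
      return 0;
  }

  void runSolverThreads(SolverTask* tasks, int numThreads) // numThreads <= 16
  {
      pthread_t threads[16];
      for (int i = 0; i < numThreads; ++i)
          pthread_create(&threads[i], 0, solverWorker, &tasks[i]);
      for (int i = 0; i < numThreads; ++i)
          pthread_join(threads[i], 0);
  }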

2009/12/16 Anton Korobeynikov <anton@korobeynikov.info>

The Linux builds are not using SSE right now, but the vector data is
16-byte aligned on all platforms.
So if you port this SSE code to another platform (Linux, AltiVec,
NEON), could you contribute it back to Bullet?
The most interesting SSE part is the inner loop of the constraint
solver, in the Bullet project hosted on Google Code.

Sounds like a very interesting SSE test.

Some developers replaced some of the linear algebra functions (in
Bullet/LinearMath) with VFP/NEON
optimizations, but haven't contributed this back.
This NEON/VFP code, part of an open source iPhone project, could be a
starting point for this:
http://tinyurl.com/y9gv3e8

It's very unfortunate that it's using inline asm. Are there developers who are interested in using NEON intrinsics?
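
For comparison, an intrinsics-based version of that kind of 4-wide multiply/accumulate would look roughly like this (just a sketch, not code from that project):

  #include <arm_neon.h>

  // Sketch: out += a * b for one 4-float lane, using NEON intrinsics instead of inline asm.
  inline void madd4_neon(float* out, const float* a, const float* b)
  {
      float32x4_t va = vld1q_f32(a);
      float32x4_t vb = vld1q_f32(b);
      float32x4_t vo = vld1q_f32(out);
      vo = vmlaq_f32(vo, va, vb);   // vo += va * vb
      vst1q_f32(out, vo);
  }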

Evan

Hello, Everyone

Sounds like a very interesting SSE test.

I'm working on it. Hopefully it will be added today or tomorrow to
the LLVM testsuite.

Hello, Everyone

Sounds like a very interesting SSE test.

I'm working on it. Hopefully it will be added today or tomorrow to
the LLVM testsuite.

Awesome! Thanks.

Evan

Hello, Erwin

If you are interested, I think it is best to start with Bullet 2.75.
If it turns out that LLVM requires some modifications (due to current C++
limitations),
we can modify Bullet and go for an upcoming release such as Bullet 2.76
(planned around January 2010).

I added Bullet to the LLVM testsuite. Basically I had to flatten the source
directories since this is a current requirement of the LLVM testsuite
harness.
Some include path tweaks were required due to this. Also, I disabled
the time reports, since otherwise we cannot compare the outputs.

Bullet compiled with LLVM appeared to be ~20% slower for me than with gcc 4.2.4, so
definitely something should be worked on :)

One question though: is it possible to "verify" the results of all
the computations somehow? We need to care not only about speed, but
about correctness too :)

This is an excellent addition, thanks a lot Anton!

So far it looks like it's working on ppc/x86_64 and some ARM testers.
It's not working on smoosh-01 yet (the buildbot-driven nightly
tester); I will try to investigate...

- Daniel

Hi Anton, and happy new year all,

One question though: is it possible to “verify” the results of all
the computations somehow?

Good point; there is no automated way currently, but we can work on that.
Note that the simulation suffers from the ‘butterfly effect’: the smallest change anywhere
(CPU, compiler, etc.) diverges into totally different results after a while.

There are a few ways of verification I can think of:

  1. verifying by adding unit tests for all stages in the physics pipeline (broadphase acceleration structures, closest point computation, constraint solver)
    Given known input and output we can check if the solution is within a certain tolerance.

  2. using the benchmark simulation and verifying the results frame by frame and check for unusual behaviour

  3. modify the benchmark so that it is easier to test the end result, even though it might be different (see the sketch after this list).
    For example, we can drop a number of boxes above a bowl, and after a while make sure all boxes are ‘in’ the bowl in a resting pose.
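
For option 3, the end-state check could be a small function along these lines (a sketch against the Bullet API; the bowl-as-a-sphere test, the tolerance and the function name are made up for illustration):

  #include "btBulletDynamicsCommon.h"

  // Sketch of the end-state check: every box must have come to rest inside the bowl.
  bool allBoxesAtRestInsideBowl(const btAlignedObjectArray<btRigidBody*>& boxes,
                                const btVector3& bowlCenter,
                                btScalar bowlRadius,
                                btScalar linVelTolerance)
  {
      for (int i = 0; i < boxes.size(); i++)
      {
          btRigidBody* body = boxes[i];
          const btVector3 pos = body->getWorldTransform().getOrigin();
          if (body->getLinearVelocity().length() > linVelTolerance)
              return false;                     // still moving, not at rest yet
          if ((pos - bowlCenter).length() > bowlRadius)
              return false;                     // ended up outside the bowl
      }
      return true;
  }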

What are your thoughts?
Thanks,
Erwin

2009/12/19 Anton Korobeynikov <anton@korobeynikov.info>

Hi Anton, and happy new year all,

>>One question though: is it possible to "verify" the results of all
>>the computations somehow?

Good point; there is no automated way currently, but we can work on that.
Note that the simulation suffers from the 'butterfly effect': the smallest change
anywhere (CPU, compiler, etc.) diverges into totally different results after a while.

I haven't been following this thread, but this sounds like a typical
unstable algorithm problem. Are you always operating that close to
the tolerance level of the algorithm or are there some sets of inputs
that will behave reasonably?

If not, the code doesn't seem very useful to me. How could anyone rely
on the results, ever?

In the worst case, you could experiment with different optimization levels
and/or Pass combinations to find something that is reasonably stable.

Perhaps LLVM needs a flag to disable sometimes-undesirable transformations,
like anything involving floating-point calculations. Compiler changes should
not affect codes so horribly unless the user tells them to. :) The Cray
compiler provides various -Ofp (-Ofp0, -Ofp1, etc.) levels for this very
reason.

There are a few ways of verification I can think of:

1) verifying by adding unit tests for all stages in the physics pipeline
(broadphase acceleration structures, closest point computation, constraint
solver)
Given known input and output we can check if the solution is within a
certain tolerance.

At each stage? That's reasonable. It could also help identify the parts of
the pipeline that are unstable (if not already known).

2) using the benchmark simulation and verifying the results frame by frame
and check for unusual behaviour

Sounds expensive.

3) modify the benchmark so that it is easier to test the end result, even
though it might be different.

We really don't want to do this. Either LLVM needs to be fixed to respect
floating-point evaluation in unstable cases or the benchmark and upstream code
needs to be fixed to be more stable.

                                       -Dave