Scalar replacement of aggregates slower in LLVM 3.0?

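I just upgraded our optimizer from LLVM 2.8 to LLVM 3.0 and noticed that the scalar replacement of aggregates (SROA) pass takes much longer on some code. In the pass timing reports below, the two SROA entries shown total about 0.73 s of wall time under 3.0 versus about 0.24 s under 2.8. Has there been a performance regression in this pass, or does it now do more work?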

LLVM 3.0:

   Total Execution Time: 1.0600 seconds (1.0526 wall clock)

    ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
    0.5100 ( 49.5%) 0.0000 ( 0.0%) 0.5100 ( 48.1%) 0.5099 ( 48.4%) Scalar Replacement of Aggregates (SSAUp)
    0.1900 ( 18.4%) 0.0300 (100.0%) 0.2200 ( 20.8%) 0.2156 ( 20.5%) Scalar Replacement of Aggregates (DT)
    0.1200 ( 11.7%) 0.0000 ( 0.0%) 0.1200 ( 11.3%) 0.1158 ( 11.0%) VEX Constant Propagation
    0.0200 ( 1.9%) 0.0000 ( 0.0%) 0.0200 ( 1.9%) 0.0196 ( 1.9%) Simplify the CFG
    0.0200 ( 1.9%) 0.0000 ( 0.0%) 0.0200 ( 1.9%) 0.0181 ( 1.7%) Module Verifier
...

LLVM 2.8:

   Total Execution Time: 0.6500 seconds (0.6489 wall clock)

    ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
    0.1400 ( 21.9%) 0.0000 ( 0.0%) 0.1400 ( 21.5%) 0.1379 ( 21.3%) Scalar Replacement of Aggregates
    0.1200 ( 18.7%) 0.0000 ( 0.0%) 0.1200 ( 18.5%) 0.1208 ( 18.6%) VEX Constant Propagation
    0.1000 ( 15.6%) 0.0000 ( 0.0%) 0.1000 ( 15.4%) 0.1050 ( 16.2%) Scalar Replacement of Aggregates
    0.0400 ( 6.3%) 0.0000 ( 0.0%) 0.0400 ( 6.2%) 0.0481 ( 7.4%) Combine redundant instructions
    0.0200 ( 3.1%) 0.0000 ( 0.0%) 0.0200 ( 3.1%) 0.0235 ( 3.6%) Preliminary module verification
...
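
To compare the two reports without adding up rows by hand, something like the following could be used. It is only a rough sketch: it assumes each row keeps the four "seconds ( percent%)" columns followed by the pass name, exactly as pasted above, and it only sees the rows shown here (the elided "..." rows are not included). The helper names `pass_times` and `total_for` are made up for this example and are not part of LLVM.

```python
import re

# One report row: four "seconds ( percent%)" columns, then the pass name.
ROW = re.compile(r"^\s*((?:\d+\.\d+\s*\(\s*\d+\.\d+%\)\s+){4})(.+?)\s*$")

def pass_times(report):
    """Return (pass name, user+system seconds) for every row in the report."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            # Columns: user, system, user+system, wall. Keep the third one.
            seconds = re.findall(r"(\d+\.\d+)\s*\(", m.group(1))
            rows.append((m.group(2), float(seconds[2])))
    return rows

def total_for(report, needle):
    """Sum user+system time over all passes whose name contains `needle`."""
    return sum(t for name, t in pass_times(report) if needle in name)

# Rows copied from the two reports in this message.
LLVM_30 = """
    0.5100 ( 49.5%) 0.0000 ( 0.0%) 0.5100 ( 48.1%) 0.5099 ( 48.4%) Scalar Replacement of Aggregates (SSAUp)
    0.1900 ( 18.4%) 0.0300 (100.0%) 0.2200 ( 20.8%) 0.2156 ( 20.5%) Scalar Replacement of Aggregates (DT)
    0.1200 ( 11.7%) 0.0000 ( 0.0%) 0.1200 ( 11.3%) 0.1158 ( 11.0%) VEX Constant Propagation
    0.0200 ( 1.9%) 0.0000 ( 0.0%) 0.0200 ( 1.9%) 0.0196 ( 1.9%) Simplify the CFG
    0.0200 ( 1.9%) 0.0000 ( 0.0%) 0.0200 ( 1.9%) 0.0181 ( 1.7%) Module Verifier
"""

LLVM_28 = """
    0.1400 ( 21.9%) 0.0000 ( 0.0%) 0.1400 ( 21.5%) 0.1379 ( 21.3%) Scalar Replacement of Aggregates
    0.1200 ( 18.7%) 0.0000 ( 0.0%) 0.1200 ( 18.5%) 0.1208 ( 18.6%) VEX Constant Propagation
    0.1000 ( 15.6%) 0.0000 ( 0.0%) 0.1000 ( 15.4%) 0.1050 ( 16.2%) Scalar Replacement of Aggregates
    0.0400 ( 6.3%) 0.0000 ( 0.0%) 0.0400 ( 6.2%) 0.0481 ( 7.4%) Combine redundant instructions
    0.0200 ( 3.1%) 0.0000 ( 0.0%) 0.0200 ( 3.1%) 0.0235 ( 3.6%) Preliminary module verification
"""

for label, report in (("LLVM 3.0", LLVM_30), ("LLVM 2.8", LLVM_28)):
    sroa = total_for(report, "Scalar Replacement of Aggregates")
    print(f"{label}: SROA user+system = {sroa:.2f} s")
```

Matching on the pass name rather than on row position means both SROA runs in each pipeline are counted, whatever order the report lists them in.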

I’ve patched SROA in a way that may have made it slower. Do you have a testcase we can look at?

Nick