Reassociate loses parallelism

I compile this very simple C program:

#define T unsigned
T foo(T a, T b, T c, T d) {
  return (a+b)+(c+d);
}

Before reassociate, the first two adds in the IR are independent and can execute in parallel:

%add = add i16 %a, %b
%add1 = add i16 %c, %d
%add2 = add i16 %add, %add1
ret i16 %add2

After reassociate, the adds have been serialized:

%add1 = add i16 %b, %a
%add = add i16 %add1, %c
%add2 = add i16 %add, %d
ret i16 %add2

It seems to me that RewriteExprTree() does this, and it contains this comment:
// Not the last operation. The left-hand side will be a sub-expression
// while the right-hand side will be the current element of Ops.
So I gather the serialization is a result of this algorithm.
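
To put numbers on the difference, here is a minimal sketch (not LLVM code) that counts the dependent adds on the critical path for the two shapes. The function names chain_depth and balanced_depth are hypothetical, made up for this illustration, and the only assumption is that every add has the same latency.

```c
#include <assert.h>

/* Number of dependent adds on the critical path when n operands are
   summed as a left-leaning chain: ((x0 + x1) + x2) + ...
   This is the shape seen in the IR after reassociate. */
unsigned chain_depth(unsigned n) {
    assert(n >= 1);
    return n - 1;
}

/* Number of dependent adds on the critical path when n operands are
   summed as a balanced tree, i.e. ceil(log2(n)).
   This is the shape of the original (a+b)+(c+d). */
unsigned balanced_depth(unsigned n) {
    assert(n >= 1);
    unsigned depth = 0;
    while (n > 1) {
        n = (n + 1) / 2;  /* each level pairs up the remaining values */
        ++depth;
    }
    return depth;
}
```

For the four operands above, chain_depth(4) is 3 and balanced_depth(4) is 2, which matches the two IR listings: three dependent adds after reassociate versus two levels before.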

Now, my question is whether the reassociate pass is supposed to care about the
depth of expression trees, or if a conscious tradeoff has been made to not

(I made a quick hack to bail out if the depth of the original expression
would increase in RewriteExprTree(). Our benchmark suite had the hack kick
in a few times, with a clear improvement in one benchmark and another
benchmark being better in unweighted cycles but worse in loop weighted