I have been working on something along similar lines:
translating Alpha machine code into LLVM. Alpha instructions
are probably much simpler to handle than x86 instructions.
Translating individual instructions is easy; more analysis
is needed to eliminate things like register-allocated
variables and function call setup, prologues, and epilogues.
Machine specific operations can be encoded using intrinsic
functions. As for how suitable the LLVM IR is for doing
optimizations, I think it should be able to do most of the
basic optimizations, provided that we do a good job of
translation.
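
To make the "easy part" concrete, here is a minimal sketch in Python of a naive per-instruction translation. It is not the translator described above, only an illustration under assumptions: each Alpha integer register is modelled as an i64 stack slot named %rN (assumed to be created with alloca at function entry), only a handful of three-register operate instructions are handled, and @alpha.umulh is a hypothetical stand-in name for an intrinsic function encoding a machine-specific operation. The output uses current textual LLVM IR syntax.

```python
import re

# Alpha operate-format instructions: "op Ra, Rb, Rc" computes Rc = Ra op Rb.
BINOPS = {"addq": "add", "subq": "sub", "mulq": "mul",
          "and": "and", "bis": "or", "xor": "xor"}

# Hypothetical intrinsic name, used only for illustration.
INTRINSICS = {"umulh": "@alpha.umulh"}

PATTERN = re.compile(r"(\w+)\s+(\$\d+)\s*,\s*(\$\d+)\s*,\s*(\$\d+)")


def slot(reg):
    """Name of the stack slot that models an Alpha register, e.g. $3 -> %r3."""
    return "%r" + reg.lstrip("$")


def translate(asm_lines):
    """Translate a list of Alpha instructions into a list of LLVM IR lines."""
    out, counter = [], 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"%t{counter}"

    def read(reg):
        if reg == "$31":          # $31 always reads as zero on Alpha
            return "0"
        tmp = fresh()
        out.append(f"{tmp} = load i64, ptr {slot(reg)}")
        return tmp

    for line in asm_lines:
        m = PATTERN.fullmatch(line.strip())
        if not m or (m.group(1) not in BINOPS and m.group(1) not in INTRINSICS):
            raise ValueError(f"unsupported instruction: {line!r}")
        op, ra, rb, rc = m.groups()
        a, b = read(ra), read(rb)
        result = fresh()
        if op in BINOPS:
            out.append(f"{result} = {BINOPS[op]} i64 {a}, {b}")
        else:
            out.append(f"{result} = call i64 {INTRINSICS[op]}(i64 {a}, i64 {b})")
        if rc != "$31":           # writes to $31 are discarded
            out.append(f"store i64 {result}, ptr {slot(rc)}")
    return out


if __name__ == "__main__":
    print("\n".join(translate(["addq $1, $2, $3", "umulh $3, $4, $5"])))
```

Every register access becomes a load or store here, so the result still looks like machine code. Turning those slots back into ordinary values and recovering calling conventions is where the additional analysis comes in.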
> I have been working on something along similar lines:
> translating Alpha machine code into LLVM.
Cool. I'll be interested to see how this goes; please send mail to this
list when you have reportable results.
Transforming machine code in ways that leave its data layout intact is not
that hard; people have been doing this for a long time. I'm particularly
interested in machine code transformations that change data layout. This
appears to be difficult to do in a sound way since it boils down to alias
analysis of machine code, which boils down to something like type safety.
Consider, for example, a program that (possibly on purpose) overwrites a
return value on its own call stack. It might make sense to bail out if
you can prove that a program does this, but it's not so easy to prove that
a program does not do this. On the other hand, if you assume somewhat
well-behaved code that was produced by a compiler and where pointer
accesses are in bounds, the problem gets pretty easy again.
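
The asymmetry can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the Store record, the classify function, and the idea that a function's stores (other than the prologue's own save of the return address) have already been summarized as a base register plus an optional constant offset. A store with a known offset from the stack pointer is easy to check against the return-address slot; a store through any other pointer cannot be ruled out unless you assume it stays in bounds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Store:
    base: str               # "sp" for the stack pointer, or any other register
    offset: Optional[int]   # constant byte offset, or None if not known statically
    size: int               # number of bytes written


def classify(stores, ret_slot_offset, slot_size=8, assume_well_behaved=False):
    """Return "clobbers", "unknown", or "safe" for the return-address slot.

    ret_slot_offset is the (assumed known) offset from sp at which the
    prologue saved the return address.
    """
    verdict = "safe"
    for s in stores:
        if s.base != "sp" or s.offset is None:
            # Target address is not known statically. Without further alias
            # information we cannot prove it misses the slot.
            if not assume_well_behaved:
                verdict = "unknown"
            continue
        overlaps = (s.offset < ret_slot_offset + slot_size
                    and ret_slot_offset < s.offset + s.size)
        if overlaps:
            return "clobbers"   # provable overwrite: bail out
    return verdict


if __name__ == "__main__":
    stores = [Store("sp", 16, 8), Store("$1", None, 8)]
    print(classify(stores, ret_slot_offset=0))
    print(classify(stores, ret_slot_offset=0, assume_well_behaved=True))
```

Proving the overwrite takes a single overlapping store, while proving its absence requires knowing where every other pointer can point, which is the alias analysis problem again.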