Request for comments on optimizing assembler

Hi Colin,

> Hi everyone, we've been prototyping an optimizing assembler for Hexagon for
> the purpose of updating legacy assembly for new architectures, packet rules,
> and instruction latencies. It seems like others would be interested in
> using this and we're looking for any related feedback: has it been attempted
> before, who's interested, or any general suggestions.

Regarding the "has it been attempted before" question -- if you're looking for general (not necessarily LLVM-specific) sources of inspiration or related projects, there have been a few that may be of interest.
Some of the following projects & references fall into the related category of binary rewriting for optimization purposes, which may also be interesting to explore on its own.

Robert Hundt, Easwaran Raman, Martin Thuresson, and Neil Vachharajani. 2011. MAO -- An extensible micro-architectural optimizer. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '11). IEEE Computer Society, Washington, DC, USA, 1-10. https://research.google.com/pubs/archive/37077.pdf
- presents MAO, an extensible micro-architectural assembly-to-assembly optimizer for x86/64 processors

Bruno De Bus, Daniel Kästner, Dominique Chanet, Ludo Van Put, and Bjorn De Sutter. 2003. Post-pass compaction techniques. Commun. ACM 46, 8 (August 2003), 41-46.
- refers to another work by the authors: "To illustrate the potential of these techniques, three existing post-pass optimizers developed by the authors—the assembly optimizer aiPop and the link-time optimizers Squeeze++ and Diablo—are evaluated in three sidebars appearing at the end of this article." [. . .] "aiPop (see www.AbsInt.com/aipop and [5]) is a commercial assembly-based post-pass optimizer for the C16x/ST10 processor family that performs a wide range of code optimizations. Quick retargeting to other processors is supported by an underlying hardware specification mechanism."
- with [5] in the above referring to: Ferdinand, C. Post-pass code compaction at the assembly level for C16x. Infineon Technologies Development Tool Partners Magazine (2001). https://www.absint.com/aipop/aiPop_c3935.pdf

More research by these authors may also be of interest -- particularly the works listed under "Whole-Program Compiler Techniques & Binary Rewriting" on Bjorn De Sutter's "all publications" page.
For instance:
- Bjorn De Sutter, Bruno De Bus, and Koen De Bosschere. 2005. Link-time binary rewriting techniques for program compaction. ACM Trans. Program. Lang. Syst. 27, 5 (September 2005), 882-945.
http://users.elis.ugent.be/~brdsutte/research/publications/2005TOPLASdesutter.pdf
- De Sutter, B., De Bus, B., De Bosschere, K., Keyngnaert, P., Demoen, B.: On the static analysis of indirect control transfers in binaries. In: H.R. Arabnia (ed.) International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2000. CSREA Press (2000) http://users.elis.ugent.be/~brdsutte/research/publications/2000PDPTAdesutter.pdf

Other projects:
- MAQAO (Modular Assembly Quality Analyzer and Optimizer): http://www.maqao.org/, http://maqao.bordeaux.inria.fr/
- The alto Project: Link-time Code Optimization - cf. "Software Power Optimization via Post-Link-Time Binary Rewriting," by Saumya Debray, Robert Muth, and Scott Watterson. Draft, Feb. 2001. http://www2.cs.arizona.edu/~debray/Publications/poweropt.pdf
- An optimizing decompiler - http://zneak.github.io/fcd - allows writing custom optimization passes: http://zneak.github.io/fcd/2016/02/21/csaw-wyvern.html

Best,

Matt