NVPTX codegen surprisingly slow on some functions

-debug-pass=Structure will show you the full pipeline of passes that gets run.
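If you want to compare the pipeline across several of the slow functions, something like the following might help. It is only a rough sketch: it assumes you have each function extracted into its own IR file, that `llc` is on your PATH, and that `nvptx64-nvidia-cuda` is the triple you are targeting. The helper names and the file name are made up for illustration.

```python
import os
import shutil
import subprocess


def build_llc_command(ir_path, llc="llc", triple="nvptx64-nvidia-cuda"):
    """Return an llc command line that prints the pass structure.

    ir_path, llc and triple are illustrative placeholders, not values
    taken from the original report.
    """
    return [
        llc,
        f"-mtriple={triple}",
        "-debug-pass=Structure",  # the option mentioned above
        ir_path,
        "-o", os.devnull,         # discard the generated PTX
    ]


def dump_pass_structure(ir_path, llc="llc"):
    """Run llc and return its diagnostic text, or None if llc is not found."""
    if shutil.which(llc) is None:
        return None
    result = subprocess.run(
        build_llc_command(ir_path, llc),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams instead of assuming which one llc writes to
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `dump_pass_structure("kernel.ll")` (with `kernel.ll` standing in for your own file) returns the text llc printed, which you can then diff between a fast and a slow function.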

The LoadStoreVectorizer does have some quadratic behavior in it.