[RFC] Adding support for #pragma clang loop [no]prefetch() to control data prefetching

I think I need to redesign this pragma. I intend to complete this work with the following four pragmas.

  1. #pragma clang loop prefetch(disable) The LoopDataPrefetch pass is enabled by default on some CPU architectures, such as AArch64. However, it cannot always choose the best prefetch policy, and a poorly chosen prefetch can degrade program performance. This pragma disables data prefetching for a specific loop. I think this will make performance debugging easier for developers.
  2. #pragma clang loop prefetch(enable) The LoopDataPrefetch pass is allowed to insert prefetches for as many loads/stores in the loop as possible. The actual prefetch behavior depends on what LoopDataPrefetch is capable of. I have not yet settled on how to implement this pragma, so I may add support for it last.
  3. #pragma clang loop prefetch(variable[[, <r/w>, <trip count ahead>, <cache level>]; variable[, …];…]) The variable argument is required; the <r/w>, <trip count ahead>, and <cache level> arguments are optional. This pragma is proposed for two reasons. First, it takes full advantage of the LoopDataPrefetch pass. Second, it is difficult for downstream compilers to extend __builtin_prefetch: if the design of __builtin_prefetch differs between compilers, there will be compatibility problems and programs will fail to compile. As a result, the migration cost for projects that use __builtin_prefetch is very high. I think a pragma avoids this problem well.
  4. #pragma clang loop noprefetch(variable) Provides fine-grained control over data prefetching. Currently the compiler offers no way to disable data prefetching for a particular variable. This pragma is intended to provide one.
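
To make pragma 3 more concrete, here is a rough sketch of how the same intent is written by hand today with __builtin_prefetch, which GCC and Clang both provide. The function name sum_indirect, the distance of 16 iterations, and the pragma spelling in the comment are assumptions for illustration only; the proposed pragma is not implemented, so it appears only as a comment.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in loop iterations (illustrative value). */
#define PREFETCH_AHEAD 16

/*
 * Sums a[idx[i]] and manually prefetches the indirect load.
 *
 * Under the proposal, the same request might be written on the loop as
 *   #pragma clang loop prefetch(a, r, 16, 3)
 * This spelling is an assumption; the pragma does not exist yet.
 */
long sum_indirect(const long *a, const int *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Stay in bounds: only prefetch while idx[i + PREFETCH_AHEAD] exists. */
        if (i + PREFETCH_AHEAD < n) {
            /* Second argument 0 = read access; third argument 3 = high
               temporal locality. Both must be compile-time constants. */
            __builtin_prefetch(&a[idx[i + PREFETCH_AHEAD]], 0, 3);
        }
        sum += a[idx[i]];
    }
    return sum;
}
```

The manual version ties the source to one builtin's argument conventions and mixes the prefetch logic into the loop body. The pragma form would leave the loop body untouched and let LoopDataPrefetch decide how to emit the prefetch.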