Performance analysis of basic operations across the TPU memory hierarchy

I am trying to do a performance analysis of basic operations on a TPU and to benchmark the different levels of its memory hierarchy. I am running the code below on Cloud TPUs.
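Roughly, the kind of microbenchmark I have in mind looks like the sketch below (assuming JAX on a Cloud TPU VM; the `jnp.dot` workload, sizes, and the `bench` helper are only illustrative, not the exact code I am running):

```python
# Illustrative microbenchmark sketch (assumes JAX on a Cloud TPU VM;
# the workload and sizes are placeholders, not the original code).
import time

import jax
import jax.numpy as jnp


def bench(fn, *args, iters=10):
    """Time a jitted function, excluding compilation (first call warms up)."""
    fn(*args).block_until_ready()          # warm-up / trigger compilation
    start = time.perf_counter()
    for _ in range(iters):
        out = fn(*args)
    out.block_until_ready()                # wait for async dispatch to finish
    return (time.perf_counter() - start) / iters


@jax.jit
def matmul(a, b):
    return jnp.dot(a, b)


key = jax.random.PRNGKey(0)
for n in (256, 1024, 4096):                # sweep sizes to cross on-chip/HBM boundaries
    a = jax.random.normal(key, (n, n), dtype=jnp.bfloat16)
    b = jax.random.normal(key, (n, n), dtype=jnp.bfloat16)
    t = bench(matmul, a, b)
    print(f"n={n}: {t * 1e3:.3f} ms per call on {jax.devices()[0].device_kind}")
```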

I am wondering whether TPUs have a memory-type classification like GPUs do, e.g. local memory, global memory, texture memory, or register memory.

If there is, what kind of HLO representation do I need to use to target each of them?
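For concreteness, this is the kind of inspection I know how to do from JAX: dumping the post-optimization HLO and the buffer-size summary for a compiled function (a sketch assuming a recent JAX release; `memory_analysis()` may return `None` on some backends, and I do not see an explicit memory-space classification in this output, which is what prompts the question):

```python
# Sketch of inspecting the XLA-compiled program for a TPU (assumes a recent
# JAX release where lower()/compile()/as_text() are available).
import jax
import jax.numpy as jnp


def matmul(a, b):
    return jnp.dot(a, b)


a = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
b = jnp.ones((1024, 1024), dtype=jnp.bfloat16)

lowered = jax.jit(matmul).lower(a, b)      # StableHLO, before XLA optimization
compiled = lowered.compile()               # TPU-optimized executable

print(compiled.as_text()[:2000])           # optimized HLO text (fusions, layouts)
print(compiled.memory_analysis())          # argument/output/temp buffer sizes, if available

# Alternatively, XLA_FLAGS=--xla_dump_to=/tmp/hlo dumps the HLO passes to disk.
```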

This may be better answered on the TensorFlow Discourse forum (or in the openxla/xla repo) than here.