I am wondering how the LLVM PTX backend finds out the constraints on
GPU thread/block/grid sizes (e.g. a block can have at most 1024
threads). Can anyone point me to the code? I need this information in
the optimizer; how can I get it?
You specify the shader model, bit size, and other architecture-specific
parameters through -march, -mattr and -mcpu, but AFAIK the PTX backend
does not use the GPU thread/block/grid size information in optimization yet.
But does it have default values?
I don't think so, but you should check the source code.
The backend currently does not carry any information like this explicitly. The goal is to represent a single thread in LLVM IR, and generate the appropriate thread-level PTX code. At that level, the number of threads per block and blocks per grid is irrelevant. If you need that information, you’re better off feeding it directly into your compiler/optimizer. We can (and should) carry around explicitly stated block size constraints representable in PTX, but that’s a bit different.
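To illustrate the "feed it directly into your optimizer" suggestion, here is a minimal sketch in C++. It is not code from the backend: LaunchLimits, BlockShape, exampleLimits and fitsLimits are hypothetical names, and the per-dimension limits are assumed example values. Only the 1024 threads-per-block figure comes from the question above. On the PTX side, explicitly stated block size constraints correspond to directives such as .maxntid and .reqntid.

```cpp
#include <cstdint>

// Hypothetical description of the launch limits of a target GPU.
// The optimizer receives this from its caller; nothing here is
// queried from the LLVM PTX backend.
struct LaunchLimits {
    std::uint64_t maxThreadsPerBlock;  // total threads allowed in one block
    std::uint64_t maxBlockDim[3];      // per-dimension limits (x, y, z)
};

// Hypothetical block shape chosen by the optimizer or the user.
struct BlockShape {
    std::uint64_t x, y, z;
};

// Example limits. 1024 threads per block is the figure quoted in the
// question; the per-dimension values are assumptions for illustration.
inline LaunchLimits exampleLimits() {
    return LaunchLimits{1024, {1024, 1024, 64}};
}

// Returns true when the block shape respects every limit.
// The per-dimension checks run first, which also keeps the product
// below small enough to avoid overflow for realistic limits.
inline bool fitsLimits(const BlockShape &b, const LaunchLimits &lim) {
    if (b.x == 0 || b.y == 0 || b.z == 0)
        return false;
    if (b.x > lim.maxBlockDim[0] || b.y > lim.maxBlockDim[1] ||
        b.z > lim.maxBlockDim[2])
        return false;
    return b.x * b.y * b.z <= lim.maxThreadsPerBlock;
}
```

With these example limits, fitsLimits({32, 32, 1}, exampleLimits()) is true (exactly 1024 threads), while fitsLimits({1024, 2, 1}, exampleLimits()) is false because the block would hold 2048 threads. Keeping the limits in a plain struct means they can be filled in from whatever source your tool trusts, such as a command-line option or a device query, without depending on the backend.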