CUDA Support for clang-tidy

Hi!
I’m currently doing an internship at Facebook, and we are looking to add support for device-specific and host-specific checks for CUDA code in clang-tidy. Right now, the plan is to do the following:

  • Add a cleaner external-facing API for querying information about AST node annotations (primarily to check whether a function declaration is a host or a device function)
  • Add CUDA-specific matchers to the AST matcher API (e.g. hostFunctionDecl, deviceFunctionDecl, globalFunctionDecl, and analogous ones for variable declarations); a usage sketch follows this list
  • Go through the existing checks, amend the ones that do not behave correctly on CUDA code, and add new CUDA-specific checks (e.g. warning about the use of control flow in kernels)
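To make the intent concrete, here is a minimal sketch of what point 2 would buy a check, spelled with today's generic hasAttr matcher. loopInKernel is a hypothetical helper name for illustration; the comment marks where a proposed matcher would slot in.

#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Basic/AttrKinds.h"

using namespace clang::ast_matchers;

// Match loops nested inside __global__ kernels, as a "control flow in
// kernels" check might. Today this needs the generic attribute matcher;
// the proposed globalFunctionDecl() would replace the hasAttr() line.
static auto loopInKernel() {
  return functionDecl(
      hasAttr(clang::attr::CUDAGlobal),
      forEachDescendant(
          stmt(anyOf(whileStmt(), forStmt(), doStmt())).bind("loop")));
}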

My question for now is whether there is any obvious mistake or missing piece in this plan, and who might be a possible reviewer for such changes?


I am not particularly knowledgeable about CUDA, but aren’t these bits of information expressed with language-specific “keywords” that translate to Attrs (attributes) in the AST? In that case, perhaps hasAttr should suffice?

Indeed, from the AST Matcher Reference:

Matches declaration that has a given attribute.

Given

__attribute__((device)) void f() { ... }

decl(hasAttr(clang::attr::CUDADevice)) matches the function declaration of f. If the matcher is used from clang-query, the attr::Kind parameter should be passed as a quoted string, e.g. hasAttr("attr::CUDADevice").

This suggests that point 2 (adding new top-level matchers that hoist these attribute predicates into the matcher library) should suffice, without having to change the AST structure itself.
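As a rough illustration of that hoisting (the name isCudaDeviceFunction is a placeholder, not a proposed final spelling), such a matcher could be little more than:

#include "clang/AST/Attr.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/ASTMatchers/ASTMatchersMacros.h"

namespace clang {
namespace ast_matchers {

// Narrows functionDecl() to __device__ functions; behaviorally the same as
// functionDecl(hasAttr(attr::CUDADevice)), just under a reusable name.
AST_MATCHER(FunctionDecl, isCudaDeviceFunction) {
  return Node.hasAttr<CUDADeviceAttr>();
}

} // namespace ast_matchers
} // namespace clang

Usage would then read functionDecl(isCudaDeviceFunction()).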


I’m not familiar with clang-tidy internals, so the following drive-by suggestions are just to highlight potential sources of problems related to CUDA compilation in general.

One issue the API may need to deal with is whether we want only the functions with explicit host/device attributes, or also the implicitly attributed ones (e.g. clang treats lambdas and constexpr functions as implicitly __host__ __device__ functions).
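If both flavors matter, here is a sketch of how the distinction could surface in a matcher, assuming Attr::isImplicit() carries the information (the matcher name is made up for illustration):

#include "clang/AST/Attr.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/ASTMatchers/ASTMatchersMacros.h"

namespace clang {
namespace ast_matchers {

// Matches only functions where __device__ was written in the source,
// skipping declarations that clang marked as device implicitly
// (lambdas, constexpr functions, etc.).
AST_MATCHER(FunctionDecl, hasExplicitDeviceAttr) {
  const auto *A = Node.getAttr<CUDADeviceAttr>();
  return A && !A->isImplicit();
}

} // namespace ast_matchers
} // namespace clang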

Another thing to consider is that CUDA compilation is actually N different compilations under the hood: one for the host and one per targeted GPU. The AST for each of these compilations will be different.
The host-side compilation will likely provide the largest subset of the AST that is interesting for the checks, but we may want some sort of knob to run the checking on the sources as seen by a specific sub-compilation, or by a subset of them.
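For reference, the clang driver already exposes flags that could serve as such a knob when passed through to clang-tidy; a hypothetical invocation (the file name is made up, the flags are existing clang driver options):

# Host-side view of the TU (the "largest subset" mentioned above):
clang-tidy kernel.cu -- -x cuda --cuda-host-only

# The same TU as seen by one specific GPU sub-compilation:
clang-tidy kernel.cu -- -x cuda --cuda-device-only --cuda-gpu-arch=sm_70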


Are you sure the ASTs differ? As far as I know, the frontend is shared between the compilations and the dispatch happens in the backend. Isn’t that the primary difference between how clang and nvcc handle CUDA compilation, and why clang can do function overloading on the host/device attributes while nvcc can’t?

@Artem-B has been right about some clang CUDA questions in the past. I’d trust him :wink:


Yup. Each compilation will have the __CUDA_ARCH__ macro defined to a GPU-specific value (GPU sub-compilation) or left undefined (host). Quite a few of CUDA’s own headers, which are included by all TUs, and some user code use that macro.
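A contrived example of what that means for the AST: each branch below exists only in the AST of the sub-compilation whose preprocessor state selects it.

__host__ __device__ int flavor() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
  return 3; // only visible to sm_70-and-newer device compilations
#elif defined(__CUDA_ARCH__)
  return 2; // only visible to older GPU sub-compilations
#else
  return 1; // only visible to the host compilation
#endif
}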

While the differences are likely to be inconsequential for clang-tidy in most cases, they do exist and will matter for some users. Using the host compilation will provide a decent view of the source code, but will likely miss GPU-only code that gets preprocessed away.


Alright, thanks a lot for the heads-up :slight_smile: