Question about Clang codegen tests

Hello, everyone!

I’m a newcomer and would like to contribute to the LLVM community. Is this the right place to ask for help? Please forgive me if I get things wrong; I’m unfamiliar with the LLVM workflow :slight_smile:

A few days ago I noticed an issue in Bugzilla, https://bugs.llvm.org/show_bug.cgi?id=52444, which I thought would be suitable for me to work on. But I have run into some problems:

  • What do the capitalized words in test comments mean?

For example, in llvm-project/clang/test/CodeGen/builtins-elementwise-math.c:


// RUN: %clang_cc1 -triple x86_64-apple-darwin %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
// CHECK-LABEL: define void @test_builtin_elementwise_abs(
// CHECK: [[F1:%.+]] = load float, float* %f1.addr, align 4
// CHECK-NEXT: call float @llvm.fabs.f32(float [[F1]])

What do RUN and CHECK mean? I guess they are some kind of test instructions, so I looked briefly through the docs, but I couldn’t find anything useful :frowning: Sorry, but there is so much documentation that it’s hard to find something when I don’t know what it is called.

  • How can I write my own test for codegen?

The actual code change is not that hard; the problems came when I wanted to write some tests. I’m unfamiliar with LLVM IR, so I don’t know how to write the expected output…

Thanks so much to anyone who can help!

Best regards,

Jun Zhang

Hi,
see if this helps you to understand the test itself: FileCheck - Flexible pattern matching file verifier (LLVM 16.0.0git documentation)
And once you have a C/C++ input test file with a RUN: line in it, the CHECK lines that verify the result can usually be generated with the llvm-project/llvm/utils/update_cc_test_checks.py script.
cheers

Diogo Sampaio
Senior Compiler Engineer • Kalray
dsampaio@kalrayinc.com • [ https://www.kalrayinc.com/ | www.kalrayinc.com ]


FileCheck (which you can see in the RUN lines) is the tool that consumes the CHECK lines (or lines with other prefixes, if specified with --check-prefix in the FileCheck invocation in the RUN line). The thing that runs the RUN lines is lit: https://llvm.org/docs/CommandGuide/lit.html
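To make the CHECK semantics a bit more concrete, here is a rough Python sketch of how the three directives from the example behave. It is only a toy model, not how FileCheck is actually implemented: the function names (parse_checks, to_regex, run_checks) and the sample IR are made up for illustration, and it ignores things the real tool does, such as whitespace canonicalization, {{regex}} blocks and CHECK-NOT. The idea it shows: CHECK scans forward for a matching line, CHECK-NEXT must match the very next line, [[F1:%.+]] captures whatever the regex matched into F1, and [[F1]] later requires that same text.

```python
import re

# Toy model of three FileCheck directives (CHECK-LABEL, CHECK, CHECK-NEXT).
# Assumption: this is an illustration only, not the real FileCheck algorithm.
DIRECTIVE = re.compile(r"//\s*(CHECK(?:-LABEL|-NEXT)?):\s*(.*)")
TOKEN = re.compile(r"\[\[(\w+)(?::(.*?))?\]\]")  # [[NAME:regex]] or [[NAME]]

TEST_SOURCE = """\
// RUN: %clang_cc1 -triple x86_64-apple-darwin %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
// CHECK-LABEL: define void @test_builtin_elementwise_abs(
// CHECK: [[F1:%.+]] = load float, float* %f1.addr, align 4
// CHECK-NEXT: call float @llvm.fabs.f32(float [[F1]])
"""

# Hypothetical compiler output, hand-written for this sketch.
GOOD_IR = """\
define void @test_builtin_elementwise_abs(float %f1) {
entry:
  %f1.addr = alloca float, align 4
  store float %f1, float* %f1.addr, align 4
  %0 = load float, float* %f1.addr, align 4
  %elt.abs = call float @llvm.fabs.f32(float %0)
  ret void
}
"""


def parse_checks(test_source):
    """Collect (directive, pattern) pairs from the comments of a test file."""
    matches = (DIRECTIVE.search(line) for line in test_source.splitlines())
    return [(m.group(1), m.group(2).strip()) for m in matches if m]


def to_regex(pattern, variables):
    """Literal text is escaped; [[NAME:re]] captures, [[NAME]] substitutes."""
    parts, pos = [], 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name, regex = m.group(1), m.group(2)
        if regex is not None:
            parts.append(f"(?P<{name}>{regex})")      # define the variable
        else:
            parts.append(re.escape(variables[name]))  # use its captured text
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def run_checks(checks, output):
    """Return True if every directive matches the output, in order."""
    lines = output.splitlines()
    variables, cursor = {}, 0  # cursor: first line not consumed yet
    for directive, pattern in checks:
        rx = to_regex(pattern, variables)
        if directive == "CHECK-NEXT":
            candidates = [cursor] if cursor < len(lines) else []
        else:  # CHECK and CHECK-LABEL scan forward from the cursor
            candidates = range(cursor, len(lines))
        for i in candidates:
            m = rx.search(lines[i])
            if m:
                variables.update(m.groupdict())
                cursor = i + 1
                break
        else:
            return False
    return True


print(run_checks(parse_checks(TEST_SOURCE), GOOD_IR))
```

If another instruction is placed between the load and the call in the sample IR, the CHECK-NEXT line no longer matches and run_checks returns False, which is the point of CHECK-NEXT as opposed to plain CHECK.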

Hi,
Thanks for your help!

Cheers
Jun Zhang

Hi,

Thanks to everyone who helped me, but I’m still a little confused about running the llvm-project/llvm/utils/update_cc_test_checks.py script. I replied to you directly rather than opening a new thread; I hope you don’t mind :slight_smile:

First of all, the file I want to update, llvm-project/clang/test/CodeGen/builtins-elementwise-math.c, has a RUN line like:

// RUN: %clang_cc1 -triple x86_64-apple-darwin %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s

I found that I cannot generate CHECK lines unless I remove -triple x86_64-apple-darwin. The script even removed the CHECK lines the file used to have. FYI, I’m running Ubuntu on my machine.

Another thing that bothers me is that although I did manage to generate CHECK lines, they are very weird…

I expect:


void test_builtin_elementwise_abs(float f1, float f2, double d1, double d2,
                                  float4 vf1, float4 vf2, si8 vi1, si8 vi2,
                                  long long int i1, long long int i2, short si) {
  // CHECK-LABEL: define void @test_builtin_elementwise_abs(
  // CHECK:      [[F1:%.+]] = load float, float* %f1.addr, align 4
  // CHECK-NEXT:  call float @llvm.fabs.f32(float [[F1]])
  f2 = __builtin_elementwise_abs(f1);

  // CHECK:      [[D1:%.+]] = load double, double* %d1.addr, align 8
  // CHECK-NEXT: call double @llvm.fabs.f64(double [[D1]])
  d2 = __builtin_elementwise_abs(d1);
...

However, what I got was:


// CHECK-NEXT:    [[CONV:%.*]] = sext i16 [[TMP8]] to i32
// CHECK-NEXT:    [[ELT_ABS8:%.*]] = call i32 @llvm.abs.i32(i32 [[CONV]], i1 false)
// CHECK-NEXT:    [[CONV9:%.*]] = trunc i32 [[ELT_ABS8]] to i16
// CHECK-NEXT:    store i16 [[CONV9]], i16* [[SI_ADDR]], align 2
// CHECK-NEXT:    ret void
//
void test_builtin_elementwise_abs(float f1, float f2, double d1, double d2,
                                  float4 vf1, float4 vf2, si8 vi1, si8 vi2,
                                  long long int i1, long long int i2, short si) {
  f2 = __builtin_elementwise_abs(f1);

  d2 = __builtin_elementwise_abs(d1);

  vf2 = __builtin_elementwise_abs(vf1);

The comments are all above the function declaration, which is not what I want.

The command I used to update the checks is:


python3 ~/dev/cpp-projects/llvm-project/llvm/utils/update_cc_test_checks.py --llvm-bin=/home/jun/dev/cpp-projects/llvm-project/build/bin CodeGen/builtins-elementwise-math.c

Best regards,

Jun Zhang

---- On Thursday, 25 November 2021 19:02:48 +0800, Diogo Sampaio dsampaio@kalray.eu wrote ----

Hi,
Thanks to everyone who helped me, but I'm still a little confused about running the `llvm-project/llvm/utils/update_cc_test_checks.py` script. I replied to you directly rather than opening a new thread; I hope you don't mind :slight_smile:

First of all, the file I want to update, `llvm-project/clang/test/CodeGen/builtins-elementwise-math.c`, has a RUN line like:
`// RUN: %clang_cc1 -triple x86_64-apple-darwin %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s`.
I found that I cannot generate `CHECK` lines unless I remove `-triple x86_64-apple-darwin`.

That’s not right, but I can reproduce the issue. The problem is that update_cc_test_checks uses Clang’s JSON AST output to determine the names of the functions in the IR so that it can extract them into CHECK lines. To support C++ it uses the mangled name, since _Z3foov etc. is the name used in the IR. On Darwin, however, names have an extra leading underscore, even for C. That underscore appears in the AST’s mangledName but not in the IR. So foo shows up as foo/_Z3foov (C/C++) in the IR but as _foo/__Z3foov in the AST and the final object file; there is no match, and update_cc_test_checks concludes there is no IR for that function.

Given that D69564 added the mangled name to the JSON AST specifically for update_cc_test_checks’s use, I don’t know whether we should just alter it to match what the IR will contain or add a separate field for the IR name. Both are useful to have, and arguably the current output is correct, since the mangled name does have the extra underscore in the final object file. So even though we probably make no guarantees about the stability of -ast-dump=json, it’s probably better to do the latter.
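The mismatch can be sketched in a few lines of Python. This is a toy model only, not the script's actual code: the helper names (ast_mangled_name, ir_name, functions_with_ir) are hypothetical, and the only mangling rule modelled is the leading underscore on Darwin for a C function.

```python
def ast_mangled_name(c_name, triple):
    """Toy model of the AST's mangledName for a C function.

    Assumption: only the Darwin leading underscore is modelled here.
    """
    return "_" + c_name if "apple-darwin" in triple else c_name


def ir_name(c_name, triple):
    """The IR omits the Darwin underscore, so a C function keeps its plain name."""
    return c_name


def functions_with_ir(c_names, triple):
    """Keep the functions whose AST name is also found among the IR names."""
    ir_functions = {ir_name(name, triple) for name in c_names}
    return [name for name in c_names
            if ast_mangled_name(name, triple) in ir_functions]


# With a Darwin triple the lookup key is "_foo" but the IR only has "foo".
print(functions_with_ir(["foo"], "x86_64-apple-darwin"))
# With a Linux triple both names agree, so the function is found.
print(functions_with_ir(["foo"], "x86_64-unknown-linux-gnu"))
```

With the Darwin triple no function is matched, so no CHECK lines are produced for it, which fits with generation working once the -triple option is removed on a Linux host.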

The script even removed the `CHECK` lines the file used to have.

That is expected: it regenerates all the CHECK lines by removing the existing ones and adding new ones. If the CHECK lines were previously generated by the script, the resulting diff is just the diff in the IR. If they weren’t, the diff has more noise in it, as the unchanged IR will give rise to different CHECK lines than the hand-written ones.

FYI, I'm running Ubuntu on my machine.

Clang can generate IR for any OS and architecture on any machine that Clang can run on, so that shouldn’t be relevant.

Another thing that bothers me is that although I did manage to generate `CHECK` lines, they are very weird...
I expect:

void test_builtin_elementwise_abs(float f1, float f2, double d1, double d2,
                                  float4 vf1, float4 vf2, si8 vi1, si8 vi2,
                                  long long int i1, long long int i2, short si) {
  // CHECK-LABEL: define void @test_builtin_elementwise_abs(
  // CHECK:      [[F1:%.+]] = load float, float* %f1.addr, align 4
  // CHECK-NEXT:  call float @llvm.fabs.f32(float [[F1]])
  f2 = __builtin_elementwise_abs(f1);

  // CHECK:      [[D1:%.+]] = load double, double* %d1.addr, align 8
  // CHECK-NEXT: call double @llvm.fabs.f64(double [[D1]])
  d2 = __builtin_elementwise_abs(d1);
...

However, what I got was:

// CHECK-NEXT:    [[CONV:%.*]] = sext i16 [[TMP8]] to i32
// CHECK-NEXT:    [[ELT_ABS8:%.*]] = call i32 @llvm.abs.i32(i32 [[CONV]], i1 false)
// CHECK-NEXT:    [[CONV9:%.*]] = trunc i32 [[ELT_ABS8]] to i16
// CHECK-NEXT:    store i16 [[CONV9]], i16* [[SI_ADDR]], align 2
// CHECK-NEXT:    ret void
//
void test_builtin_elementwise_abs(float f1, float f2, double d1, double d2,
                                  float4 vf1, float4 vf2, si8 vi1, si8 vi2,
                                  long long int i1, long long int i2, short si) {
  f2 = __builtin_elementwise_abs(f1);

  d2 = __builtin_elementwise_abs(d1);

  vf2 = __builtin_elementwise_abs(vf1);

The comments are all above the function declaration, which is not what I want.

That is how the script works. It therefore encourages each test to be its own function rather than a large test-all-the-things function, which is a pain to deal with when the test fails *anyway*, so that best practice is nothing new.
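As a rough illustration of that behaviour (existing CHECK lines are dropped, and the new ones are emitted as one block above each function), here is a small Python sketch. It is a toy under stated assumptions, not the real update_cc_test_checks.py: the regenerate function, the FUNC_DEF heuristic and the sample IR lines are all hypothetical, and the exact text the real script emits may differ.

```python
import re

# Toy heuristic for the start of a C function definition such as "void foo(".
# Assumption: definitions start in column 0; the real script relies on the AST.
FUNC_DEF = re.compile(r"^\w[\w\s\*]*?\b(\w+)\s*\(")


def regenerate(source_lines, ir_by_function):
    """Drop old CHECK comments and put fresh ones above each known function."""
    out = []
    for line in source_lines:
        if line.lstrip().startswith("// CHECK"):
            continue  # old CHECK lines, hand-written or generated, are removed
        m = FUNC_DEF.match(line)
        if m and m.group(1) in ir_by_function:
            name = m.group(1)
            out.append(f"// CHECK-LABEL: @{name}(")
            out.extend(f"// CHECK-NEXT:    {ir}" for ir in ir_by_function[name])
            out.append("//")
        out.append(line)
    return out


source = [
    "void test_abs(float f1, float f2) {",
    "  // CHECK-LABEL: define void @test_abs(",
    "  // CHECK: call float @llvm.fabs.f32(",
    "  f2 = __builtin_elementwise_abs(f1);",
    "}",
]
# Placeholder IR patterns, invented for this sketch.
ir = {"test_abs": ["[[ELT_ABS:%.*]] = call float @llvm.fabs.f32(float [[TMP0:%.*]])",
                   "ret void"]}

print("\n".join(regenerate(source, ir)))
```

Because every function gets exactly one block of checks, splitting a large test into one small function per builtin keeps each block short and next to the code it covers.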

Jess

Hi,

clang -disable-O0-optnone | opt -S -mem2reg is a common pattern for avoiding that issue, if you’re just talking about the naive alloca/load/store CodeGen. There are also Clang tests that use -O1 for similar reasons.

In fact, that pattern produces cleaner IR than is being CHECK’ed today: Compiler Explorer

Jess