[RFC] warning about unparenthized macro replacement lists (CERT/MISRA compliance)

Hello!

I have written a simple compiler warning for unparenthised macro replacement lists. Such macros violates both MISRA and CERT rules.

A compiler warning is written when there is dangerous usage.

For example it writes a warning here:

    #define A 1+2
    int x = A * 2; // <- warning

My intention was to make sure that code is compliant to this CERT rule:
https://www.securecoding.cert.org/confluence/display/c/PRE02-C.+Macro+replacement+lists+should+be+parenthesized

However there are some interesting false positives..

I've tested it on these projects so far:
a52dec_0.7.4 aalib_1.4p5 aaphoto_0.43.1 abcm2ps_7.8.9 abcmidi_20141016 abe_1.1 acct_6.5.5 ace-of-penguins_1.3 achilles_2 acovea_5.1.1 acpi_1.7 acpitool_0.5.1 acr_0.9.9 acr38_1.7.11 adjtimex_1.29 adns_1.5.0~rc1 adolc_2.5.2 adtool_1.3.3 aephea_12-248 aespipe_2.4c aft_5.098 agedu_9723 aggregate_1.6 aiksaurus_1.2.1+dev-0.12 ajaxterm_0.10 ale_0.9.0.3 algol68g_2.8 allegro4.2_4.2.2 alsa-lib_1.0.28 alsaplayer_0.99.81 ample_0.5.7 amule-emc_0.5.2 anthy_9100h antlr_2.7.7 anubis_4.1.1+dfsg1 anyremote_6.4 anyremote2html_1.4 aolserver4_4.5.1 apache-mod-auth-ntlm-winbind_0.0.0.lorikeet+svn+801 apcupsd_3.14.8 aplus-fsf_4.22.1 apparix_07-261 apr_1.5.1 aprsd_2.2.5-13 aprsdigi_3.5.1 apsfilter_7.2.6 aptitude-robot_1.3.4 arc-gui-clients_0.4.6 archmbox_4.10.0 argtable2_9 aria2_1.18.8 arpack++_2.3 ascii2binary_2.14 asciidoc_8.6.9
babl_0.1.12 backintime_1.0.36 backupninja_1.0.1 bar_1.11.1 barcode_0.98+debian bash_4.3 bash-completion_2.1 bashdb_4.2.0.8 bbe_0.2.2 bc_1.06.95 bcpp_0.0.20050725 bdfresize_1.5 beecrypt_4.2.1 bfbtester_2.0.1 bibclean_2.11.4 biblememorizer_0.6.4 bibutils_5.4 biff_0.17.pre20000412 binfmtc_0.17 binutils_2.25 binutils-h8300-hms_2.16.1 bison_3.0.2.dfsg bison++_1.21.11 bld_0.3.4.1 blitz++_0.9 blktool_4 blm_0.9.3 bluez-hcidump_2.4 bmake_20140620 bmf_0.9.4 bml_0.6.1 boa_0.94.14rc21 bodr_9 boolstuff_0.1.14 bopm_3.1.3 bottlerocket_0.05b3 bristol_0.60.5 brltty_5.2~20141018 bsd-finger_0.17
m17n-docs_1.6.2 m17n-lib_1.6.4 m4_1.4.17 macchanger_1.7.0 mach_0.9.1 madlib_1.3.0 magic_7.5.241 magicfilter_1.2 magicrescue_1.1.9 mailfilter_0.8.3 mailtextbody_0.1.3 mailutils_2.99.98 makebootfat_1.4 makedepf90_2.8.8 makepp_2.0.98.5 maloc_0.2 maq_0.7.1 maradns_2.0.09 marisa_0.2.4 matanza_0.13+ds1 matchbox-themes-extra_0.3 maude_2.6 mauve_20140821 mboxgrep_0.7.9 mbr_1.1.11 mbuffer_20140310 mcl_14-137 mcpp_2.7.2 md5deep_4.3 mdds_0.5.4 mdf2iso_0.3.1 mecab_0.996 medicalterms_20110608 medusa_2.1.1 mell_1.0.0 mergelog_4.5.1 meschach_1.2b metastudent-data_1.0.1 metastudent-data-2_1.0.0 mew_6.6 mew-beta_7.0.50~6.6+0.20140902 mhash_0.9.9.9 mig_1.4 mimetic_0.9.8 min12xxw_0.0.9 mined_2000.15.4 minicom_2.7 minidjvu_0.8.svn.2010.05.06+dfsg minisapserver_0.3.6 miredo_1.2.6 miscfiles_1.5+dfsg missidentify_1.0 mkcue_1 mkelfimage_2.7 mlmmj_1.2.18.1 mlocate_0.26 mlt_0.9.6 mm_1.4.2 mm-common_0.9.7 mmdb_1.25.5 mmorph_2.3.4.2 moap_0.2.7 mobile-broadband-provider-info_20140317 mod-wsgi_4.3.0 module-init-tools_3.12 mona_1.4-15 monitoring-plugins_2.1.1
oath-toolkit_2.4.1 ocaml_4.02.1 ocl-icd_2.2.3 ocrad_0.24 octave-benchmark_1.1.1 octave-bioinfo_0.1.2 octave-combinatorics_1.0.9 octave-fixed_0.7.10 octave-ga_0.9.7 octave-ident_1.0.7 octave-informationtheory_0.1.8 octave-integration_1.0.7 octave-irsa_1.0.7 octave-mapping_1.0.7 octave-missing-functions_1.0.2 octave-multicore_0.2.15 octave-nlwing2_1.1.1 octave-outliers_0.13.9 octave-pdb_1.0.7 octave-physicalconstants_0.1.7 octave-secs2d_0.0.8 octave-simp_1.1.0 octave-struct_1.0.7 octave-symband_1.0.10 octave-time_1.0.9 octave-xraylib_1.0.8 octave-zenity_0.5.7 ocurl_0.5.3 ode_0.11.1 odyssey_0.4 oggvideotools_0.8a ogmtools_1.5 oidentd_2.0.8 omnievents_2.6.2 omniorb-dfsg_4.2.0 onak_0.4.5 onioncat_0.2.2+svn559 op_1.32 opari2_1.0.7+dfsg opencore-amr_0.1.3 opendchub_0.8.2 openfst_1.3.3 openh323_1.18.0.dfsg openload_0.1.2 openmama_2.2.2.1 openmpi_1.6.5 openmsx_0.8.2 openntpd_5.7p3 openocd_0.8.0 openresolv_3.5.2 openslp-dfsg_1.2.1

I have looked at some of the warnings....

There was warnings in abcm2ps_7.8.9. Example code:

    #define CM * 28.35 /* factor to transform cm to pt */
    ...
    f->topmargin = 1.0 CM;
    ...

=> I believe the macro usage is intentional here. So it's likely a FP from the user perspective.

There was warnings in bc_1.06.95. There was code generated by flex like:

    #define BEGIN (yy_start) = 1 + 2 *
    ..
    BEGIN(slcomment);

=> imho stylistically this is code from hell but it's autogenerated. I guess users have to disable various warnings for autogenerated code?

In binutils there was such code:

    #define ILF_DATA_SIZE + SIZEOF_ILF_SYMS + ...
    ...
    ptr = (bfd_byte *) bfd_zmalloc ((bfd_size_type) ILF_DATA_SIZE);
    vars.bim->buffer = ptr;
    vars.bim->size = ILF_DATA_SIZE;

=> It would not hurt to add parentheses to the macro. For information, bfd_size_type is a type.

In brighton there was this code:

    #define K6W 1000 / 43
    ..
    {"", 2, 2 * K6W, KIH, K6W, KIL, 0, 1, 0,

=> hmm.. might be intended. If parentheses are used there is more truncation.
    
About abcm2ps_7.8.9 and brighton. The warnings are technically correct - the code violates MISRA/CERT rules. A user who wants his code to be compliant to the CERT rule would want to have these warnings. But I also understand that in these projects these warnings would be seen as FP. Do you think these are FP I should fix?

Technical question:
My checker has false negatives when the dangerous macro usage is inside a macro.
Currently my checker uses isMacroID(). It only tells me if an expression comes from some macro or not. I want to know if two expressions come from different macros.

Best regards,
Daniel Marjamäki

..................................................................................................................
Daniel Marjamäki Senior Engineer
Evidente ES East AB Warfvinges väg 34 SE-112 51 Stockholm Sweden

Mobile: +46 (0)709 12 42 62
E-mail: Daniel.Marjamaki@evidente.se

www.evidente.se

cert-pre2.patch (4.62 KB)

This seems like it would have too many false positives and MISRA/CERT is a specific coding standard that not everyone uses. It would probably make sense in a clang-tidy check that enforces MISRA/CERT guidelines.

I can see utility in such a diagnostic, but the false positives may be
problematic. What is the true positive rate that you're finding from
this check?

I think that having a series of CERT (and/or MISRA) checkers that can
be enabled as a group would be a fantastic idea, but the design might
be slightly tricky since those checks are likely to be most naturally
placed in drastically different parts of the compiler (Sema, Analysis,
the static analyzer, ubsan, clang-tidy, etc).

~Aaron

Hello!

Thanks for the feedback.

This seems like it would have too many false positives and MISRA/CERT is a specific coding standard that not everyone uses.

Yes I agree, not everyone uses MISRA/CERT. I am currently not trying to enforce strict MISRA/CERT compliance.

But I thought this would be an interesting rule in general.

It would probably make sense in a clang-tidy check that enforces MISRA/CERT guidelines

Yes maybe.

I can see utility in such a diagnostic, but the false positives may be problematic. What is the true positive rate that you're finding from this check?

If we classify all warnings about macros that violate MISRA/CERT as TP then it seems to me the rate is good.

But that would not be ideal for the compiler I guess. To be honest.. most warnings are not bugs => false positives right now.

For most projects there are no warnings at all.

I will give you more details.. but not immediately.. I am in the process of creating a testsuite right now and am rerunning the buildscripts frequently. I'll try to give you more details in a couple of days.

I hope that I can also add some heuristics to avoid some false positives.

Best regards,
Daniel Marjamäki

..................................................................................................................
Daniel Marjamäki Senior Engineer
Evidente ES East AB Warfvinges väg 34 SE-112 51 Stockholm Sweden

Mobile: +46 (0)709 12 42 62
E-mail: Daniel.Marjamaki@evidente.se

www.evidente.se

Hello!

Here are the statistics right now..

I have scanned ~900 projects. All these projects are in Debian (so they are relatively mature).

I got warnings in 35 projects.

FP:
1 lex generated code: 18 projects
2 "*123" macros 3 projects
3 "+1" macros 1 project
4 dangerous macro, no bug 7 projects
5 libcap: a->x(y) => a->blk[y] |= .. 1 project

TP:
6 brighton: division in macro 1 project
7 dangerous macro, mistake 1 project
8 real bugs: 4 projects

there was both TPs and FPs in binutils so this project was counted twice.

In my first email I said there was FP for a cast in binutils. I have personally changed my mind and now classify it as a TP. The cast is only applied to the first operand in the macro and not the result, I doubt that is intentional.

In my first email I said there was FP for a division in brighton as it might prevent some truncation. I have changed my mind about this particular case and classify this as a TP. If the programmer wanted to avoid truncation then some expressions are not well written.

About "lex generated code". It is the exact same case for every project. It is the BEGIN macro it warns about. I don't know how.. but maybe it's possible to write some good heuristic about this.

About "dangerous macro, no bug". I used this classification when the macro is dangerous but the execution order does not really matter. I believe that these are likely TP for many users - I do not think this is by design.
But it would be relatively safe to skip these warnings when the execution order does not seem to matter.
Example:
#define X 2 + 3
int i = 1 + X;

In the "dangerous macro, mistake" case the programmer has clearly tried to enclose the macro in parentheses:
#define bcd2dec(x) (((x)&0x70)>>4)*10 + ((x)&0x0F)
However, one more () is needed. If the "dangerous macro, no bug" false positives are fixed the warning about this macro will also go away I am afraid. It is only used in a addition so there is no danger.

I had thought that there would be more true positives.

I still personally think it would make sense to add this in the compiler if we can deal with the false positives good enough. It does detect real bugs.

But if you think that the clang-tidy would be more appropriate I am open to that also.

Best regards,
Daniel Marjamäki
..................................................................................................................
Daniel Marjamäki Senior Engineer
Evidente ES East AB Warfvinges väg 34 SE-112 51 Stockholm Sweden

Mobile: +46 (0)709 12 42 62
E-mail: Daniel.Marjamaki@evidente.se

www.evidente.se

Hello!

Here are the statistics right now..

I have scanned ~900 projects. All these projects are in Debian (so they are relatively mature).

I got warnings in 35 projects.

FP:
1 lex generated code: 18 projects
2 "*123" macros 3 projects
3 "+1" macros 1 project
4 dangerous macro, no bug 7 projects
5 libcap: a->x(y) => a->blk[y] |= .. 1 project

TP:
6 brighton: division in macro 1 project
7 dangerous macro, mistake 1 project
8 real bugs: 4 projects

there was both TPs and FPs in binutils so this project was counted twice.

In my first email I said there was FP for a cast in binutils. I have personally changed my mind and now classify it as a TP. The cast is only applied to the first operand in the macro and not the result, I doubt that is intentional.

In my first email I said there was FP for a division in brighton as it might prevent some truncation. I have changed my mind about this particular case and classify this as a TP. If the programmer wanted to avoid truncation then some expressions are not well written.

About "lex generated code". It is the exact same case for every project. It is the BEGIN macro it warns about. I don't know how.. but maybe it's possible to write some good heuristic about this.

About "dangerous macro, no bug". I used this classification when the macro is dangerous but the execution order does not really matter. I believe that these are likely TP for many users - I do not think this is by design.
But it would be relatively safe to skip these warnings when the execution order does not seem to matter.
Example:
#define X 2 + 3
int i = 1 + X;

In the "dangerous macro, mistake" case the programmer has clearly tried to enclose the macro in parentheses:
#define bcd2dec(x) (((x)&0x70)>>4)*10 + ((x)&0x0F)
However, one more () is needed. If the "dangerous macro, no bug" false positives are fixed the warning about this macro will also go away I am afraid. It is only used in a addition so there is no danger.

I had thought that there would be more true positives.

I still personally think it would make sense to add this in the compiler if we can deal with the false positives good enough. It does detect real bugs.

But if you think that the clang-tidy would be more appropriate I am open to that also.

I think that this does catch real bugs, and definitely has value. The
lex-generated code has me concerned about the false positive rate
though, since that's a fairly widely-used code generation tool. I
think this may be best implemented within clang-tidy instead of the
compiler, especially given that the FP to TP rate is 5:1 with your
sample set.

Thank you for working on this!

~Aaron