Static Analyzer for template / floating point precision audit?

Hello!

I had an idea yesterday about a possible use of clang that I'd like to discuss and hopefully contribute towards. I want a static analysis tool that tells me which types get used in the instantiation of C++ templates. Perhaps this exists somewhere already but I haven't found it yet. Basically I want to see the results of template generation, and verify that types aren't getting converted by mistake when I'm passing values around between templates and non-template functions. Additionally, I want to be able to see where floating point values go in and out of the extended precision registers on Intel. To be frank, I don't know much about how this works.
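
To make that concrete, here is a made-up sketch (all names invented) of the kind of quiet conversion I'd want such a tool to report:

    #include <cstdio>

    // A generic accumulator; T is fixed explicitly at the call site.
    template <typename T>
    T accumulate3(T a, T b, T c) {
        return a + b + c;
    }

    float scale = 0.1f;   // declared as float, perhaps by mistake

    double total(double x, double y) {
        // 'scale' is silently widened from float to double here, but the
        // value it carries is the float rounding of 0.1, not the double
        // one; that is exactly the sort of conversion I'd like surfaced
        // per instantiation.
        return accumulate3<double>(x, y, scale);
    }

    int main() {
        std::printf("%.17g\n", total(1.0, 2.0));  // prints ~3.1000000014901161
        return 0;
    }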

My rudimentary understanding of C++ internals is that the template declarations and instantiations exist as language elements in the AST, but that the actual template 'code' is generated later. Is this correct?

What is the proper nomenclature for all this? I'm looking for words to differentiate between the instantiation in the source (i.e. std::pair<int, int>) and the generated 'code' resulting from lookup, which I assume is not actual source. What is that called?

Do template instantiations modify the AST, do they generate LLVM IR, or something else in between?

If the generation step produces something other than AST, does this mean my problem is outside of the scope of the static analyzer?

I think this problem might be best addressed via LLVM IR, because I want to see how well inlined, nested templates flatten out after optimization. Is it even possible to get the original source line for a piece of optimized IR?

Regarding the floating point precision analysis, it sounds like a tall order, but I'd like to hear what you all think. For one thing, it's architecture dependent. At what point does the compiled code lose its original source identity?

For an example of the issues I'm talking about, here is an arbitrary precision geometry library that gets betrayed by extended precision: Robust Predicates on Pentium CPUs. I talked to some compiler engineers at Apple over a year ago, and they were proud of how well gcc handles floating point precision, by keeping intermediate calculations in the extended registers. Nevertheless, it seems to me that sooner or later these values have to get reduced down to regular precision. The ideal tool I have in mind would be able to show me where the precision loss is happening.

Please let me know if this is something that would be of interest to clang developers, and the static analyzer in particular.

Thanks,

George

> My rudimentary understanding of C++ internals is that the template
> declarations and instantiations exist as language elements in the AST,
> but that the actual template 'code' is generated later. Is this
> correct?

Yes, template instantiation is a purely AST-based operation; later
passes can essentially ignore it if they don't care.
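
You can see this directly in the AST dump. A quick sketch (the exact
spelling of the dump option can vary between clang versions):

    // tmpl.cpp
    template <typename T>
    T twice(T x) { return x + x; }

    int user(int n) { return twice(n); }   // instantiates twice<int>

    // Dump the AST without generating any code:
    //   clang++ -fsyntax-only -Xclang -ast-dump tmpl.cpp
    // Under the FunctionTemplateDecl for 'twice' you should find a
    // FunctionDecl for the <int> specialization with its TemplateArgument
    // recorded; that node is the instantiation, and it exists before any
    // IR is generated.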

> What is the proper nomenclature for all this? I'm looking for words to
> differentiate between the instantiation in the source (i.e.
> std::pair<int, int>) and the generated 'code' resulting from lookup,
> which I assume is not actual source. What is that called?

Just call it the instantiation, I think; I've never heard another term for it.

> Do template instantiations modify the AST, do they generate LLVM IR,
> or something else in between?

Purely an AST-based operation.

> I think this problem might be best addressed via LLVM IR, because I
> want to see how well inlined, nested templates flatten out after
> optimization. Is it even possible to get the original source line for
> a piece of optimized IR?

If you generate debug information (command-line option "-g"), you can
get a rough approximation; optimizations have a tendency to mess with
debug information, though.
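
A sketch of what that looks like (exact metadata spellings differ
across LLVM versions):

    // fp.cpp
    double square(double x) { return x * x; }

    // Emit optimized IR with debug info attached:
    //   clang++ -O2 -g -S -emit-llvm fp.cpp -o fp.ll
    // Each instruction in fp.ll can carry a "!dbg" attachment pointing at
    // a debug-location entry with the original file/line/column (and,
    // after inlining, the call site it was inlined from).  When an
    // optimization drops or merges those attachments, the mapping back to
    // source becomes approximate, which is the caveat above.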

> Regarding the floating point precision analysis, it sounds like a tall
> order, but I'd like to hear what you all think. For one thing, it's
> architecture dependent. At what point does the compiled code lose its
> original source identity?

There are essentially two large transformation steps: AST to LLVM IR,
and LLVM IR to assembly. Looking at the code in between those steps
is easy; correlating the representations, or trying to see what
transformation is happening to a particular node, is extremely
difficult.
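
Concretely, a sketch (the exact instructions depend on the target):

    // stages.cpp
    double mix(double a, float b) { return a + b; }

    // The representations at the two boundaries are easy to dump:
    //   clang++ -S -emit-llvm stages.cpp -o stages.ll   (LLVM IR)
    //   clang++ -S            stages.cpp -o stages.s    (assembly)
    // In stages.s, 32-bit x87 code shows fld/fadd/fstp while SSE2 code
    // shows cvtss2sd/addsd; which one you get is only visible at this
    // last stage, which is part of why correlating it back to a
    // particular AST node is hard.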

> For an example of the issues I'm talking about, here is an arbitrary
> precision geometry library that gets betrayed by extended precision:
> Robust Predicates on Pentium CPUs.
> I talked to some compiler engineers at Apple over a year ago, and
> they were proud of how well gcc handles floating point precision, by
> keeping intermediate calculations in the extended registers.
> Nevertheless, it seems to me that sooner or later these values have to
> get reduced down to regular precision. The ideal tool I have in mind
> would be able to show me where the precision loss is happening.

With normal code generation where SSE/SSE2 is disabled, it's
essentially impossible to determine where exactly this does/does not
happen; various transformations can introduce/remove extra precision
in unexpected places, and whether a value in a narrower type loses
precision isn't determined until deep in register allocation. See also
http://www.nondot.org/sabre/LLVMNotes/FloatingPointChanges.txt .
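
If you just want to observe the effect itself, a small experiment along
these lines usually shows it; this is only a sketch, and what you see
depends on the target and on whether x87 or SSE code is generated
(compare, e.g., a 32-bit build with and without gcc's -mfpmath=sse or
-ffloat-store):

    #include <cstdio>

    // 'volatile' keeps the compiler from folding the arithmetic away.
    volatile double one  = 1.0;
    volatile double tiny = 1e-17;   // below half an ulp of 1.0 in double

    int main() {
        // In pure double precision, one + tiny rounds back to exactly 1.0,
        // so r is 0.  If the intermediate sum is kept in an 80-bit x87
        // register, the tiny part survives and r comes out near 1e-17.
        double r = (one + tiny) - one;
        std::printf("r = %.20g\n", r);
        return 0;
    }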

-Eli