Proposal: Debug information improvement - keep the line number with optimizations

Hi Patel,

Thanks for your comments; some replies are below. (This is the first
part; I'll send the second part later.)

2.1 Verification Flow
> The most important goal of this project is to make sure debug
> information does not block any optimization by LLVM transform
> passes. Here I propose a way to determine whether codegen is being
> impacted by debug info. This is also useful for scanning the LLVM
> transform pass list to find which passes need to be updated to work
> with debug information.
>
> From Chris: Add a -strip-debug pass that removes all debug info from
> the LLVM IR. Given this, it would allow us to do:
> $ llvm-gcc -O3 -c -o - | llc > good.s
> $ llvm-gcc -O3 -c -g -o - | opt -strip-debug | llc > test.s
> $ diff good.s test.s
  

> If the two .s files differed, then badness happened.
    
This may not work perfectly because the presence of debug info may
influence compiler-generated symbol names and label numbers.
  
I see. But if the optimizations do the right thing with debug info, the
two .s files should be very similar, and the labels in the .s file
basically correspond to the basic blocks in the .ll file.
I think we can find some way to work around this, like adding a filter
before the diff.
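
For example, here is a rough sketch of such a filter in Python. The
label shapes it matches (.LBB1_2, .Ltmp3, L7) are only my assumption
about what the llc output looks like and would need adjusting per
target, and normalize() is just a name I made up. It renames local
labels in order of first appearance, so two files that differ only in
label numbering come out identical.

```python
import re

# Assumed shapes of compiler-local labels; not checked against real llc output.
LABEL_RE = re.compile(r"(?<![\w.])\.?L(?:BB\d+_\d+|tmp\d+|\d+)\b")


def normalize(lines):
    """Rename local labels to L0, L1, ... in order of first appearance."""
    names = {}

    def rename(match):
        # Reuse the canonical name if this label was seen before,
        # otherwise hand out the next free one.
        return names.setdefault(match.group(0), "L%d" % len(names))

    return [LABEL_RE.sub(rename, line.rstrip()) for line in lines]
```

Both good.s and test.s would go through normalize() before the diff.
This only hides renumbering; a label that is added or removed still
shows up as a difference, which is what we want.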

I'm not sure how debug info influences the symbol names in the assembly
file. Could you give an example?

> This obviously only catches badness that happens in the LLVM
> optimizer,
    
There is an established way to check this. See
  http://llvm.org/docs/SourceLevelDebugging.html#debugopt

That's great, thanks.

> if the code generator is broken, we'll need something more
> sophisticated that strips debug info out of the .s file. In any
> case, this is a good place to start, and should be turned into a
> llvm-test TEST/report.
>
> Incidentally, we have to go through codegen, we can't diff .ll files
> after debug info is stripped out. This is because debug info is
> allowed to (and probably does) impact local names within functions,
> but these names are removed at codegen and are not important to
> preserve. End
>
>
>
> 2.2 A Pass to clean up the debug info
> LLVM already has a transform pass "-strip-debug", which removes all
> the debug information. But for the first half of this project, we
> want to keep just the line number information (stop points) in the
> optimized code. So we need a new transform pass that removes only
> the variable declaration information.
    
FWIW, mem2reg already does this.
  

It seems mem2reg now works very well with the line number information.

> Pass "-strip-debug" also doesn't clean up the dead variables and
> function calls for debug information; it assumes other passes like
> "-dce" or "-globaldce" can handle this.
    
Yes.

> But as we are also going to update those passes, we can't use them
> in the verification flow, otherwise, it may output incorrect check
> results.
    
I am not sure I follow this.

>
> The new pass "-strip-debug-pro" should have the following functions:
> 1. Just remove the variable declaration information and
> clean up the dead debug information.
    
These are two separate tasks.
  1) Remove variable declaration info.
    This is already done (indirectly) by mem2reg, but a separate pass
    to do so won't hurt either.
  2) Remove dead debug information.
    This is very useful as a separate pass and can be used while
    debugging non-optimized code (for example, to remove type info for
    types that are not used at all).

> 2. Just remove the line number information and clean up the
> dead debug information.
    
I am not sure what the purpose of this is.
  

Eh...just forget this.

  

> 3. Remove all the debug information and clean up.
    
That's what -strip-debug ("Remove Debug Info") does.

If you put -strip-debug + -dce in one pass then you're not comparing
apples to apples in your 2.1-style verification. Or am I missing
something?

I don't remember why I wrote this; skip it.