Working out source ranges for expressions involving macros

I am building a program transformation tool that performs various source code mutations. I only want these mutations to be applied to source regions in the main translation unit file, but I do want to be able to transform expressions and statements that involve top-level macro invocations.

I am trying to write a procedure that, given an AST node, returns a source range that begins and ends in the main source file, outside of any macro, provided such a range tightly captures the AST node. If no such range exists, it should return an invalid source range.

I have been experimenting with the functionality provided by SourceManager, SourceLocation and Preprocessor, but I have ended up a bit confused: the solutions I’ve come up with so far are very complex, and I’m wondering whether folks here can suggest something more straightforward and elegant.

To give some concrete examples, suppose I want to replace an integer literal expression with 0, if the expression is wholly contained in the main source file.

For this program, I would like to identify that the expression “1” is enclosed in the range that starts and ends at line 3, column 3 (inclusive). That is, the function I’m trying to write should return that range when presented with the AST node corresponding to the expression “1”:

#define X 1
void foo() {
  X;
}

For this program, I would like my function to return an invalid source range, since it’s not possible to replace the 1 with 0 by only modifying text in the main source file, outside of any macros:

#define X 1;
void foo() {
  X
}

Here’s a more complex, contrived example:

#define G(A) A

#define F(A, B, C) G(B)
void foo() {
  F(10, 20, 1);
}

I would like my function to return the range starting at line 5, column 3 and ending at line 5, column 14, inclusive, i.e. the whole invocation F(10, 20, 1).

And finally, for this example I would like an invalid source range to be returned:

#define STUFF void foo() { 1; }

STUFF

because the expression “1” comes from a macro body, and that macro body contributes more than just the expression.

I’ve been wondering if functions like SourceManager::isAtStartOfImmediateMacroExpansion and SourceManager::isAtEndOfImmediateMacroExpansion could help, but I haven’t managed to use them to get what I want.

Any advice would be much appreciated!

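I came up with the function below, which seems to do what I want. It relies on Preprocessor::isAtStartOfMacroExpansion and Preprocessor::isAtEndOfMacroExpansion rather than the SourceManager functions mentioned above. I thought I’d post it in case it is useful to others.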

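#include "clang/Basic/SourceLocation.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Lex/Preprocessor.h"

// Returns a range in the main file that tightly covers |ast_node|, or an
// invalid range if no such range exists. As with other Clang source ranges,
// the end location refers to the start of the last token.
template <typename HasSourceRange>
[[nodiscard]] clang::SourceRange GetSourceRangeInMainFile(
    const clang::Preprocessor& preprocessor, const HasSourceRange& ast_node) {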
  const clang::SourceManager& source_manager = preprocessor.getSourceManager();
  auto main_file_id = source_manager.getMainFileID();

  clang::SourceLocation begin_loc_in_main_file;
  clang::SourceLocation end_loc_in_main_file;
  clang::SourceLocation macro_expansion_location;
  {
    clang::SourceLocation begin_loc = ast_node.getSourceRange().getBegin();
    auto begin_file_id = source_manager.getFileID(begin_loc);
    if (begin_file_id == main_file_id) {
      begin_loc_in_main_file = begin_loc;
    } else if (begin_loc.isMacroID() &&
               preprocessor.isAtStartOfMacroExpansion(
                   begin_loc, &macro_expansion_location) &&
               source_manager.getFileID(macro_expansion_location) ==
                   main_file_id) {
      begin_loc_in_main_file = macro_expansion_location;
    } else {
      // There is no location in the main file corresponding to the start of the
      // AST node.
      return {};
    }
  }
  {
    clang::SourceLocation end_loc = ast_node.getSourceRange().getEnd();
    auto end_file_id = source_manager.getFileID(end_loc);
    if (end_file_id == main_file_id) {
      end_loc_in_main_file = end_loc;
    } else if (end_loc.isMacroID() &&
               preprocessor.isAtEndOfMacroExpansion(
                   end_loc, &macro_expansion_location) &&
               source_manager.getFileID(macro_expansion_location) ==
                   main_file_id) {
      end_loc_in_main_file = macro_expansion_location;
    } else {
      // There is no location in the main file corresponding to the end of the
      // AST node.
      return {};
    }
  }
  return {begin_loc_in_main_file, end_loc_in_main_file};
}
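
In case it helps to see the decision rule on its own, here is a small Clang-free model of what the function above does for each end of the node. This is only a sketch: Pos, Endpoint, Range and the helper functions are hypothetical names made up for illustration, not Clang types, and the at_edge_of_expansion flag simply stands in for whatever isAtStartOfMacroExpansion / isAtEndOfMacroExpansion would report.

```cpp
// Hypothetical stand-ins for illustration only; none of these are Clang types.
struct Pos {
  int line;    // 1-based; 0 means "invalid", like a default SourceLocation
  int column;  // 1-based
};

const Pos kInvalid = {0, 0};

bool IsValid(Pos p) { return p.line > 0; }

// What is known about one end (first or last token) of an AST node.
struct Endpoint {
  bool spelled_in_main_file;  // token is written directly in the main file
  bool at_edge_of_expansion;  // first/last token of the outermost expansion
  Pos pos;  // spelling position, or the matching edge of the macro invocation
            // (kInvalid if that invocation is not in the main file)
};

Endpoint InMainFile(Pos spelling) { return Endpoint{true, false, spelling}; }

Endpoint FromMacro(bool at_edge, Pos invocation_edge) {
  return Endpoint{false, at_edge, invocation_edge};
}

// Maps one end of the node to a main-file position, or kInvalid.
Pos Resolve(const Endpoint& e) {
  if (e.spelled_in_main_file) return e.pos;
  if (e.at_edge_of_expansion && IsValid(e.pos)) return e.pos;
  return kInvalid;
}

struct Range {
  Pos begin;
  Pos end;  // inclusive
};

bool IsValid(const Range& r) { return IsValid(r.begin) && IsValid(r.end); }

// Both ends must resolve; otherwise the whole range is invalid.
Range RangeInMainFile(const Endpoint& begin, const Endpoint& end) {
  Range r = {Resolve(begin), Resolve(end)};
  if (!IsValid(r)) return Range{kInvalid, kInvalid};
  return r;
}
```

Under this model, the first example (X; with #define X 1) is RangeInMainFile(FromMacro(true, {3, 3}), FromMacro(true, {3, 3})), which gives a valid range, while the second example (#define X 1;) has an end token that is not the last token of the expansion, so the result is invalid.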