My take: conceptualize this as an analysis checker. The property you want to verify is that the source is well indented. For example, a declaration whose leading int token doesn't start in column 0 is poorly indented. Indentation should always start at column 0. This could be changed later, but take it as a given for now. You check that the column of the int token is 0, and if it is not, you issue an indentation warning. You can also issue a fixit hint (see the fixit testsuite for what those look like; that will lead you to the code that implements them). Once that works, the fixit should be able to rewrite the declaration so that it starts in column 0.
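To make the column-0 check and its fixit concrete, here is a minimal standalone sketch. It deliberately does not use clang's real diagnostic or FixItHint classes; FixIt, IndentWarning, checkColumnZero and applyFixIt are hypothetical names, columns are 0-based, and a tab counts as a single column.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for a fixit hint (NOT clang's FixItHint):
// remove a run of characters from the line.
struct FixIt {
  unsigned removeFromColumn;  // 0-based column where the removal starts
  unsigned removeLength;      // number of characters to remove
};

// Hypothetical indentation warning carrying its suggested fix.
struct IndentWarning {
  unsigned column;  // 0-based column where the first token actually starts
  FixIt fix;        // suggestion that moves the token to column 0
};

// Column of the first non-blank character, or nothing for a blank line.
std::optional<unsigned> firstTokenColumn(const std::string &line) {
  std::size_t pos = line.find_first_not_of(" \t");
  if (pos == std::string::npos) return std::nullopt;
  return static_cast<unsigned>(pos);
}

// The rule from the text: the first token must start in column 0.
std::optional<IndentWarning> checkColumnZero(const std::string &line) {
  std::optional<unsigned> col = firstTokenColumn(line);
  if (!col || *col == 0) return std::nullopt;
  return IndentWarning{*col, FixIt{0, *col}};
}

// Applies the fixit, producing the rewritten line.
std::string applyFixIt(const std::string &line, const FixIt &fix) {
  assert(fix.removeFromColumn + fix.removeLength <= line.size());
  return line.substr(0, fix.removeFromColumn) +
         line.substr(fix.removeFromColumn + fix.removeLength);
}
```

For a hypothetical input such as "  int i;", the check reports column 2, and applying the fix yields "int i;". In a real checker the column would come from clang's SourceManager and the fix would be attached to the emitted diagnostic.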
The lexer knows when a token is the first one on a line, and you only want to apply the check if the token you're looking at is the first on its line. You might have to preserve or recreate that information. I'd recommend not tying directly into the parser, as that would be a layering violation: people who want a fast parser without this feature shouldn't have to pay for it. Also, I'd nix the idea of doing this as a semantic check.
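The sketch below shows how a lexer can record the "first on its line" bit and how the check can then be limited to those tokens. clang's own Token carries a similar flag (Token::isAtStartOfLine()), but this toy code does not use it: Tok, lexWords and misindentedLineStarts are hypothetical, and the "lexer" just splits on whitespace rather than tokenizing C.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record with a start-of-line bit.
struct Tok {
  std::string text;
  unsigned line;     // 1-based
  unsigned column;   // 0-based
  bool startOfLine;  // true if this is the first token on its line
};

// Toy whitespace-splitting "lexer": enough to show where the bit comes from.
std::vector<Tok> lexWords(const std::string &src) {
  std::vector<Tok> toks;
  unsigned line = 1, col = 0;
  bool atLineStart = true;
  std::size_t i = 0;
  while (i < src.size()) {
    char c = src[i];
    if (c == '\n') {  // new line: reset column and the start-of-line bit
      ++line;
      col = 0;
      atLineStart = true;
      ++i;
      continue;
    }
    if (std::isspace(static_cast<unsigned char>(c))) {
      ++col;
      ++i;
      continue;
    }
    std::size_t start = i;
    while (i < src.size() && !std::isspace(static_cast<unsigned char>(src[i])))
      ++i;
    toks.push_back({src.substr(start, i - start), line, col, atLineStart});
    col += static_cast<unsigned>(i - start);
    atLineStart = false;  // later tokens on this line are not first
  }
  // Invariant: the very first token lexed always starts a line.
  assert(toks.empty() || toks.front().startOfLine);
  return toks;
}

// Only tokens that begin a line are subject to the column-0 check.
std::vector<Tok> misindentedLineStarts(const std::vector<Tok> &toks) {
  std::vector<Tok> bad;
  for (const Tok &t : toks)
    if (t.startOfLine && t.column != 0) bad.push_back(t);
  return bad;
}
```

Keeping the bit on the token, rather than asking the parser, is one way to respect the layering concern above: clients that don't run the check never look at it.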
I'd recommend doing this as an analysis checker (lib/Analysis). That framework makes it easy to add and remove checks. You'll be walking ASTs, starting from the top-level forms. At first, just check that the first top-level form starts in column 0.
That's step 1.
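Step 1 can be sketched over a miniature AST. This is an assumption-laden stand-in: Node, Warning and checkFirstTopLevel are hypothetical, and a real checker would instead iterate over clang's top-level Decls and ask the SourceManager for each one's column.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical miniature AST node; column is the 0-based column of its
// first token.
struct Node {
  std::string kind;  // e.g. "FunctionDecl", "VarDecl"
  unsigned column;
  std::vector<Node> children;
};

struct Warning {
  std::string kind;   // kind of the offending node
  unsigned column;    // where it actually starts
  unsigned expected;  // where it should start
};

// Step 1: examine only the first top-level form.
std::optional<Warning> checkFirstTopLevel(const std::vector<Node> &topLevel) {
  if (topLevel.empty()) return std::nullopt;
  const Node &first = topLevel.front();
  if (first.column == 0) return std::nullopt;
  return Warning{first.kind, first.column, 0};
}
```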
After that, you can start walking down the AST, working out how the nesting depth increases as you go. If you put checks at the start of the common forms (declarations, statements and expressions), you'll get 90% of what you need. The CFG code has an example of something that walks the AST; you can crib ideas from it.
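A recursive walk that tracks nesting depth might look like the following. It is a sketch under stated assumptions: the AST is the same kind of hypothetical Node as before (plus a startsLine flag so only line-initial nodes are checked), and the nesting rule, "children of a CompoundStmt are one level deeper, and each level is width columns", is an illustrative choice, not clang's behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical AST node for the sketch.
struct Node {
  std::string kind;            // e.g. "FunctionDecl", "CompoundStmt"
  unsigned column;             // 0-based column of its first token
  bool startsLine;             // only line-initial nodes are checked
  std::vector<Node> children;
};

struct Warning {
  std::string kind;
  unsigned column;
  unsigned expected;
};

// Assumed rule: a "{ ... }" block increases the nesting depth by one.
bool opensBlock(const Node &n) { return n.kind == "CompoundStmt"; }

void walk(const Node &n, unsigned depth, unsigned width,
          std::vector<Warning> &out) {
  unsigned expected = depth * width;
  if (n.startsLine && n.column != expected)
    out.push_back({n.kind, n.column, expected});
  unsigned childDepth = opensBlock(n) ? depth + 1 : depth;
  for (const Node &c : n.children) walk(c, childDepth, width, out);
}

// Checks every top-level form and everything nested under it.
std::vector<Warning> checkIndentation(const std::vector<Node> &topLevel,
                                      unsigned width) {
  assert(width > 0);
  std::vector<Warning> out;
  for (const Node &n : topLevel) walk(n, 0, width, out);
  return out;
}
```

For a function whose body holds one statement at column 2 and one at column 4, a width of 2 produces a single warning for the second statement, with an expected column of 2.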
My idea would be for you to check this into clang as you develop it. I think it'd be cool to have. Eventually, lib/Analysis may get a plugin architecture, and in time this could just be a plugin. One interesting option would be a fairly permissive mode that merely checks for consistency. That way, people could just use it without having to spell out their formatting rules. One of the hard parts will be handling headers, which can follow a different formatting convention and can also come from outside the code being checked.
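One way the permissive, consistency-only mode could work is to infer the indent unit from the file itself and then flag lines that break it. This is only one possible heuristic, and every name here (LineIndent, leadingSpaces, inferIndentUnit, inconsistentLines) is hypothetical. The unit is taken from the first increase in indentation; a line is flagged if its indentation is not a multiple of that unit or if it goes more than one unit deeper than the previous non-blank line. Only leading spaces are measured; a tab counts as a non-blank character.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct LineIndent {
  unsigned line;    // 1-based line number in the source
  unsigned indent;  // number of leading spaces
};

// Leading-space count for every non-blank line.
std::vector<LineIndent> leadingSpaces(const std::string &src) {
  std::vector<LineIndent> result;
  std::istringstream in(src);
  std::string text;
  unsigned lineNo = 0;
  while (std::getline(in, text)) {
    ++lineNo;
    std::size_t pos = text.find_first_not_of(' ');
    if (pos == std::string::npos) continue;  // blank line: no opinion
    result.push_back({lineNo, static_cast<unsigned>(pos)});
  }
  return result;
}

// The inferred unit is the first increase between consecutive lines.
std::optional<unsigned> inferIndentUnit(const std::vector<LineIndent> &ls) {
  for (std::size_t i = 1; i < ls.size(); ++i)
    if (ls[i].indent > ls[i - 1].indent)
      return ls[i].indent - ls[i - 1].indent;
  return std::nullopt;
}

// Line numbers that are inconsistent with the inferred style.
std::vector<unsigned> inconsistentLines(const std::string &src) {
  std::vector<LineIndent> ls = leadingSpaces(src);
  std::vector<unsigned> bad;
  std::optional<unsigned> unit = inferIndentUnit(ls);
  if (!unit) return bad;  // nothing is indented, so nothing to compare
  assert(*unit > 0);      // inferred from a strict increase, never zero
  for (std::size_t i = 0; i < ls.size(); ++i) {
    bool offGrid = ls[i].indent % *unit != 0;
    bool jump = i > 0 && ls[i].indent > ls[i - 1].indent + *unit;
    if (offGrid || jump) bad.push_back(ls[i].line);
  }
  return bad;
}
```

Because the rules are inferred rather than configured, a file indented consistently with 2 or 4 spaces passes unchanged. Headers from a different formatting domain would presumably need their own inferred unit.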
Sounds cool, and welcome.