[RFC] Lightweight LLVM IR Checkpointing

As I understand it, the main concerns are the following (please correct me if I am wrong):

  • Checkpointing coverage, and whether supporting a reduced set of the IR is a viable approach.
    • I think supporting less than 100% of the IR is inevitable, at least in the beginning. The following scenario shows the problem this can cause: a pass calls an external function foo() within a code region where checkpointing is active. If a developer who is unaware of the limitations of checkpointing later modifies foo() so that it touches an unsupported part of the IR, that change can cause a crash.
      I don’t think we have a good way to avoid this problem completely, but we can mitigate it with coverage that is good enough to make it unlikely.
  • Adding new functionality to the IR and forgetting to support it in checkpointing.
    • We cannot guarantee that this will never happen. The tools we have to mitigate it are: (i) refactoring the code that is likely to change so that accesses are funneled through centralized APIs, and (ii) relying on existing tests to catch any such change.
  • ValueHandles
    • I think the safest option is to prohibit their use while checkpointing is active, by triggering a crash if there are any registered listeners. The reasoning is that listeners may hold state that cannot be safely reverted, so a rollback may leave a listener in a state that differs from what is expected. Down the road we may enable their use for specific analyses whose state can be reverted.
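
To make the points above concrete, here is a minimal toy sketch of the idea. It is not the proposed implementation and uses no LLVM classes. The names ToyInstr, ToyCheckpoint, and setOperand are hypothetical and exist only for illustration. The sketch shows three things: mutations funneled through a centralized setter get an undo entry, a mutation that bypasses the setter is not reverted (the coverage gap), and a checkpoint refuses to start while listeners are registered. The text above suggests crashing in that last case; the sketch returns false instead so that it stays easy to test.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR object; not an LLVM class.
struct ToyInstr {
  std::string Name;
  int Operand;
};

// Hypothetical tracker that keeps one undo action per tracked mutation.
class ToyCheckpoint {
  std::vector<std::function<void()>> UndoLog;
  bool Active = false;

public:
  // Starts tracking. Refuses to start if any listeners are registered,
  // mirroring the "prohibit ValueHandles while active" option (a real
  // implementation might crash here instead of returning false).
  bool save(std::size_t NumRegisteredListeners) {
    if (NumRegisteredListeners != 0)
      return false;
    UndoLog.clear();
    Active = true;
    return true;
  }

  bool isActive() const { return Active; }

  // Records an undo action; ignored when no checkpoint is active.
  void record(std::function<void()> Undo) {
    if (Active)
      UndoLog.push_back(std::move(Undo));
  }

  // Replays the undo actions in reverse order, then stops tracking.
  void rollback() {
    for (auto It = UndoLog.rbegin(); It != UndoLog.rend(); ++It)
      (*It)();
    UndoLog.clear();
    Active = false;
  }

  // Keeps the changes and stops tracking.
  void accept() {
    UndoLog.clear();
    Active = false;
  }
};

// Centralized setter: every operand write goes through here, so the
// tracker sees it. Code that writes the field directly is not tracked.
void setOperand(ToyInstr &I, int NewVal, ToyCheckpoint &CP) {
  ToyInstr *Ptr = &I;
  int Old = I.Operand;
  CP.record([Ptr, Old] { Ptr->Operand = Old; });
  I.Operand = NewVal;
}
```

With this sketch, calling setOperand and then rollback() restores the old operand, whereas a direct write such as I.Name = "x" survives the rollback. That second case is the kind of silent gap that centralized APIs and existing tests are meant to catch.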