I think the main concerns have to do with (please correct me if I am wrong):
- Checkpointing coverage, and whether supporting a reduced set of the IR is a viable approach.
  - I think not supporting 100% of the IR is inevitable, at least in the beginning. The scenario that highlights the problem this could cause is the following: consider a pass that calls an external function `foo()` within a code region where checkpointing is active. If at some point `foo()` is modified by a developer who is not aware of the limitations of checkpointing, that change can cause a crash.
    I don’t think we have a good way to avoid this problem completely, but we can mitigate it with coverage that is good enough to make it less likely to happen.
- Adding new functionality to the IR and forgetting to support it in checkpointing.
  - We cannot guarantee that this will never happen. The tools we have to mitigate it are: (i) refactoring the code that is likely to change so that accesses are funneled through centralized APIs, and (ii) relying on existing tests to catch any such change.
- ValueHandles
  - I think the safest option is to prohibit their use while checkpointing is active, by triggering a crash if any listeners are registered. The reasoning is that listeners may hold state that cannot be safely reverted, so after a rollback a listener’s state may differ from what is expected. Down the road we may enable their use for specific analyses whose state can be reverted.
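To make the mitigations above concrete, here is a minimal toy model written in Python. It is not LLVM’s API: every name in it (`Tracker`, `Value`, `save()`, `revert()`, `accept()`, `record()`, `drop_all_operands()`, `add_listener()`, `CheckpointError`) is hypothetical. It assumes that (i) all IR mutations are funneled through a few setters that each record an undo action, (ii) a mutation the tracker does not yet cover fails loudly while checkpointing is active, which is the crash in the `foo()` scenario, and (iii) checkpointing refuses to start, and listeners refuse to register, whenever ValueHandle-like listeners would be involved.

```python
from typing import Callable, List


class CheckpointError(RuntimeError):
    """Stands in for the assertion/crash a real implementation would trigger."""


class Tracker:
    """Toy checkpoint tracker: one undo action is recorded per IR change."""

    def __init__(self) -> None:
        self.active = False
        self.undo_log: List[Callable[[], None]] = []
        self.num_listeners = 0  # stand-in for registered ValueHandle-like listeners

    def save(self) -> None:
        # Refuse to start if any listener exists: its state may not be revertible.
        if self.active:
            raise CheckpointError("nested checkpoints are not modeled here")
        if self.num_listeners:
            raise CheckpointError("listeners registered while checkpointing")
        self.active = True

    def record(self, undo: Callable[[], None]) -> None:
        # Called by every mutator; only logs while checkpointing is active.
        if self.active:
            self.undo_log.append(undo)

    def revert(self) -> None:
        # Undo recorded changes newest-first, then stop checkpointing.
        while self.undo_log:
            self.undo_log.pop()()
        self.active = False

    def accept(self) -> None:
        # Keep all changes and stop checkpointing.
        self.undo_log.clear()
        self.active = False

    def add_listener(self) -> None:
        if self.active:
            raise CheckpointError("cannot register a listener while checkpointing")
        self.num_listeners += 1


class Value:
    """Toy IR value whose state is only mutated through tracked setters."""

    def __init__(self, tracker: Tracker, name: str) -> None:
        self._tracker = tracker
        self._name = name
        self._operands: List["Value"] = []

    @property
    def name(self) -> str:
        return self._name

    @property
    def operands(self) -> List["Value"]:
        return list(self._operands)  # a copy, so callers cannot bypass tracking

    def set_name(self, new_name: str) -> None:
        old = self._name
        self._tracker.record(lambda: setattr(self, "_name", old))
        self._name = new_name

    def append_operand(self, v: "Value") -> None:
        self._operands.append(v)
        self._tracker.record(self._operands.pop)  # undo: drop the appended operand

    def set_operand(self, idx: int, v: "Value") -> None:
        old = self._operands[idx]
        self._tracker.record(lambda: self._operands.__setitem__(idx, old))
        self._operands[idx] = v

    def drop_all_operands(self) -> None:
        # A mutation the tracker cannot undo yet: fail loudly rather than
        # silently produce IR that a rollback would not restore.
        if self._tracker.active:
            raise CheckpointError("drop_all_operands() unsupported while checkpointing")
        self._operands.clear()


t = Tracker()
a, b, add = Value(t, "a"), Value(t, "b"), Value(t, "add")
add.append_operand(a)  # not recorded: no checkpoint is active yet

t.save()
add.set_name("sum")
add.append_operand(b)
add.set_operand(0, b)
t.revert()  # restores the name "add" and the operand list [a]
```

Recording one undo closure per change and replaying the log in reverse keeps rollback independent of which pass (or which external `foo()`) made the change. The cost is that every new mutator must remember to call `record()`, which is exactly the “forgetting to support it in checkpointing” risk that centralized APIs and existing tests are meant to catch.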