Hi all,
As a way to get my feet wet with Clang, I'm interested in solving what I think is a bug in array literals under ARC. I have a start on a patch, but I'd love for someone to tell me if I'm going in a reasonable direction. The motivating code is:
@implementation AClass {
id ivar;
}
- (void)test {
id local = ivar;
[self aMethod];
@[local, ivar];
}
//...
The bug is that the object stored in `local`, which is retained before the call to `aMethod`, is released before the call to build the array (in -O1). If `aMethod` modifies `ivar`, then this can cause the `local` value to be deallocated before it's passed to the array. It seems that the array literal code gen produces a local buffer that the objects are stored in, and that buffer is passed to +[NSArray arrayWithObjects:count:]. Between the store to that buffer and the call, there's a window where those objects could be released, since the ARC optimizer doesn't know about the value stored in the temporary buffer.
The solution I've put together is to mark that temporary buffer as strong and have a cleanup for it afterwards. (And ditto for dictionary literals and their key buffer.) This seems like it's the "right thing" to do, but will be extra work at runtime. The alternatives seem to be to make ARC understand that the value is still needed, or to require `local` in the above code be marked with objc_precise_lifetime, which seems surprising to me.
I'm attaching a (tiny) patch with my approach and the emitted LLVM before and after. I'd love to hear any feedback about the solution in general and any particulars about the code. If it seems reasonable, I'll work on adjusting the array literal tests so they pass again in the hopes of getting it committed.
Thanks!
Jesse
before_and_after.txt (4.5 KB)
patch_wip.txt (1.57 KB)
This is very interesting.
If we look at this under the formal model (but break down the
array literal into lower-level language concepts), then it's clear that
we're violating the model. The object is held in a variable with
imprecise lifetime, but the last use of that variable was to copy its
contents into a cell of an __unsafe_unretained buffer, at which point
dependence is cut and we're no longer required to keep it alive.
So something needs to change here.
However, we'd really like to avoid retaining and releasing every value
in the array — it'd be different if we could transfer retains into the
NSArray, but that's not how the API we're using works. In general,
it's likely to be very difficult for the optimizer to remove most of those
retains and releases — and if it can, it'll probably turn it right back
into the code it's getting right now.
So instead I would suggest just creating a "barrier" to tell the ARC
optimizer that the function still formally depends on the values stored
in the array until after that call is complete. This should be quite
easy: just collect all the objects in the array and then emit a call
like this right after the message-send:
call void @clang.arc.use_objects(i8* %value_from_local, i8* %value_from_ivar) nounwind
The ARC optimizer would then see that the value is still used and know
not to move the release before this call.
You would then remove all calls to that function in the late ARC pass.
John.
It looks nicer than the actual way to extend the lifetime of an object (by using attribute to force precise lifetime).
If you go that way, wouldn't it be interesting to have something like __builtin_clang_arc_use_objects() to be able to use this constructs explicitly too ?
-- Jean-Daniel
No, I think precise lifetime is the right tool for end-users.
John.
Thanks for the detailed response; this makes a lot of sense to me. So, clang.arc.use_objects is a non-existent function which is removed after the main ARC optimizer runs, but the use of those objects is sufficient to prevent ARC from moving the release up. I'll try this out and put together a patch.
Cheers,
Jesse
I've actually done the LLVM side of this as r177769, although the function is named @clang.arc.use now.
John.
Ah, great. I had something similar coming together, but I had two concerns that I was still working through:
First, I think the ARC contract phase only runs under -O1 or higher, so calls to @clang.arc.use will remain under -O0 and cause link-time errors. I was thinking of either changing EmitAssemblyHelper::AddEmitPasses to always include the contract pass, or else move the erasure logic elsewhere (perhaps to a new, later ARC pass).
Second, I was also concerned with the interpreter, which seems to use IntrinsicLowering to evaluate intrinsics; do we need to add some logic to IntrinsicLowering::LowerIntrinsicCall to handle this function?
The other option which might solve both of these is to make this a "real" LLVM intrinsic and give it an empty body.
Let me know if you have any thoughts about the above; I was planning on finishing up a first patch this weekend.
Cheers,
Jesse
Thanks for the detailed response; this makes a lot of sense to me. So, clang.arc.use_objects is a non-existent function which is removed after the main ARC optimizer runs, but the use of those objects is sufficient to prevent ARC from moving the release up. I'll try this out and put together a patch.
I've actually done the LLVM side of this as r177769, although the function is named @clang.arc.use now.
Ah, great. I had something similar coming together, but I had two concerns that I was still working through:
First, I think the ARC contract phase only runs under -O1 or higher, so calls to @clang.arc.use will remain under -O0 and cause link-time errors. I was thinking of either changing EmitAssemblyHelper::AddEmitPasses to always include the contract pass, or else move the erasure logic elsewhere (perhaps to a new, later ARC pass).
Oh, you're underestimating how hacky the interaction between IR-gen and the optimizer is here.
The ARC contract pass is essentially an ARC-specific codegen prepare pass that cleans up things that we want the optimizer to see (or not see). For example, it inserts the objc_retainAutoreleaseReturnValue assembly markers, which the frontend is responsible for at -O0 but which would really annoy/disable the optimizer if we emitted them at -O1 or higher.
So the solution is to just not emit these intrinsic calls at -O0.
Second, I was also concerned with the interpreter, which seems to use IntrinsicLowering to evaluate intrinsics; do we need to add some logic to IntrinsicLowering::LowerIntrinsicCall to handle this function?
The interpreter needs to be either running -O0 code without optimization or queuing up the ARC contract pass. I actually don't know what it's currently doing.
The other option which might solve both of these is to make this a "real" LLVM intrinsic and give it an empty body.
Such an intrinsic would probably promise too much — we don't actually need these values to be live across the call in the general compiler sense. We really do want ARC-specific semantics here.
Let me know if you have any thoughts about the above; I was planning on finishing up a first patch this weekend.
I'm actually going to land a patch for a different but similar miscompile soon — this one with the writeback-from-out-parameter operation. So the frontend will have the basic infrastructure for this already.
John.
First, I think the ARC contract phase only runs under -O1 or higher, so calls to @clang.arc.use will remain under -O0 and cause link-time errors. I was thinking of either changing EmitAssemblyHelper::AddEmitPasses to always include the contract pass, or else move the erasure logic elsewhere (perhaps to a new, later ARC pass).
Oh, you're underestimating how hacky the interaction between IR-gen and the optimizer is here.
The ARC contract pass is essentially an ARC-specific codegen prepare pass that cleans up things that we want the optimizer to see (or not see). For example, it inserts the objc_retainAutoreleaseReturnValue assembly markers, which the frontend is responsible for at -O0 but which would really annoy/disable the optimizer if we emitted them at -O1 or higher.
So the solution is to just not emit these intrinsic calls at -O0.
Ah! I had considered that, but I was worried that omitting those calls would bring back the original problem. Thinking through it more, though, there can't be an early release under -O0, since there's no motion of the objc_releases. That does seem a *bit* hacky. 
I'm actually going to land a patch for a different but similar miscompile soon — this one with the writeback-from-out-parameter operation. So the frontend will have the basic infrastructure for this already.
OK, great. I'll wait for your patch and go from there. Thanks again for all the exposition.
- Jesse