Extracting values from tokens


Now that we have the token type (http://reviews.llvm.org/rL245029), I need an operation that will “extract” a non-token value from a token. I know people have several use cases in mind for tokens, so I wanted to solicit feedback on how general the solution should be (so I’ve cc’ed the people from the review of the token change). I’m also interested in getting consensus so that as "extraction"s get added for each use case they have similar look-and-feel.

My particular need here is very narrow: I need the ‘catchpad’ operation to define a value which is a pointer to the on-heap exception object it catches (which my target’s personality routine will supply to the handler code). Since the ‘catchpad’ operation is defined as producing a token, in order to get at the exception pointer I need some operation that can take that token as input and produce the exception pointer as output.

Going fully general, I could imagine having an operator with a name like ‘tokenextract’ that is parameterized by the type it produces and accepts one argument of type token plus zero or more arguments of arbitrary type which indicate what is being extracted. If we’re ever going to want to support orthogonal kinds of extractions operating on the same token value, I think that approach would break down because it doesn’t give a good way to specify which kind of extraction is being performed. On the other hand, I think it’s entirely plausible that each token-producing operator will only ever have a fixed set of extractions that make sense for it, so this could be a workable solution under the assumption that the way to interpret ‘%x = tokenextract %tok, ty1 %arg1, ty2 %arg2’ (for the sake of e.g. lowering out some construct that is represented using token linkage) is to first look at the operator defining %tok, and then interpret the selector args in the context of that operation. This in turn implies that each token-producing operator’s definition (in the Lang Ref) should spell out what can be extracted from it and what its convention for selector args is. To my mind, that’s a bit too convoluted, and the informal description of an operator’s selector arg convention really seems like something that one ought to be able to specify as typing rules.
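To make the "look at the producer first, then interpret the selector args" step concrete, here is a minimal toy model in Python. It is not LLVM code. The `Token` class, the two convention functions, and the selector spellings (`"result"`, `"relocate"`) are all hypothetical, invented only to illustrate the lookup that a generic ‘tokenextract’ would force on every consumer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Toy stand-in for a token value, tagged with the operator that defined it."""
    producer: str


def _catchpad_convention(selectors):
    # Assumed convention: a catchpad token takes no selector args and the
    # only thing that can be extracted is the exception pointer.
    if selectors:
        raise ValueError("catchpad: expected no selector args")
    return "exception_pointer"


def _statepoint_convention(selectors):
    # Assumed convention: ("result",) or ("relocate", <index>).
    if selectors == ("result",):
        return "call_result"
    if len(selectors) == 2 and selectors[0] == "relocate" and isinstance(selectors[1], int):
        return f"relocated_pointer[{selectors[1]}]"
    raise ValueError(f"statepoint: unrecognized selector args {selectors!r}")


# One informal convention per token-producing operator. In the generic design
# this table would live as prose in each operator's documentation.
CONVENTIONS = {
    "catchpad": _catchpad_convention,
    "statepoint": _statepoint_convention,
}


def interpret_tokenextract(tok, *selectors):
    """Model of '%x = tokenextract %tok, <selector args>'."""
    # Step 1: look at the operator that defined the token.
    convention = CONVENTIONS.get(tok.producer)
    if convention is None:
        raise ValueError(f"no extraction convention for producer {tok.producer!r}")
    # Step 2: interpret the selector args in that operator's context.
    return convention(selectors)
```

The same selector list means different things (or nothing at all) depending on the producer, and nothing in the signature of `interpret_tokenextract` says which selectors are legal. That is the part I would rather express as typing rules.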

So I find myself arguing against a fully general solution here. I think instead it makes sense for each kind of extraction to specify an intrinsic that represents it, with the argument/return types specified in the usual way as the signature of the intrinsic. And on a case-by-case basis any intrinsic could be replaced with an instruction, following the same process that any other operation follows as it finds its way into the IR.

Ironically, the intrinsic approach that I’m advocating is awkward for my actual use case of extracting an exception pointer from a catchpad – the argument and return types should really be dictated by the personality routine, and so can vary from function to function, but intrinsics only support a limited form of overloading. But I think it would be ok to start with an intrinsic (called @llvm.eh.get_pad_param or something) that can be overloaded to return anyptr (or maybe anyptr + anyint) and not worry about more overloading until/unless we have more use cases.




After reading your description, I find myself with no strong opinion in either direction. Your discussion of the pros and cons of each approach covers the topic well. I’d be perfectly willing to go either way, given the lack of a compelling argument for one over the other. I’d probably lean towards the generic version myself, but I’m happy to defer to the people actually working on using the mechanism at the moment.

Seems reasonable to me.

We could also go with a generic mechanism based on a variadic intrinsic if we wanted. We have all of the building blocks for this between gc.result and gc.statepoint. If we combined a variadic argument list with an anyany result, we’d get an intrinsic with close to the semantics of the instruction you were considering. We could potentially use this to prototype both approaches and see which one appears less ugly.

I think you’re right that intrinsics are better than ‘tokenextract’.

The intrinsic tells you what kind of data you want out of the token, and codegen will fail in an obvious way when you use an intrinsic on the wrong kind of token. For example, if we tried to extract the SEH exception code from a statepoint, codegen can abort rather than perhaps working accidentally.
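As a rough illustration of that failure mode, here is a small Python toy model (not LLVM code). The `Token` and `ExtractionIntrinsic` classes are hypothetical; `llvm.eh.get_pad_param` is the tentative name floated earlier in the thread, and `seh_exception_code` is a made-up placeholder name. Each extraction kind carries its own result type and the producer it expects, so a mismatch is rejected up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Toy stand-in for a token value, tagged with the operator that defined it."""
    producer: str


@dataclass(frozen=True)
class ExtractionIntrinsic:
    """One intrinsic per extraction kind, with a fixed signature."""
    name: str
    expected_producer: str
    result_type: str

    def lower(self, tok):
        # Abort loudly if the token came from the wrong kind of operator,
        # instead of silently producing something that works by accident.
        if tok.producer != self.expected_producer:
            raise TypeError(
                f"{self.name}: expected a {self.expected_producer} token, "
                f"got a {tok.producer} token"
            )
        return f"{self.result_type} from {tok.producer}"


# Tentative name from this thread; the result type here is an assumption.
GET_PAD_PARAM = ExtractionIntrinsic("llvm.eh.get_pad_param", "catchpad", "ptr")

# Placeholder name for an SEH exception code extraction (assumed to read
# from a catchpad token and return a 32-bit integer).
SEH_EXCEPTION_CODE = ExtractionIntrinsic("seh_exception_code", "catchpad", "i32")
```

With this shape, `SEH_EXCEPTION_CODE.lower(Token("statepoint"))` raises a `TypeError` naming both the intrinsic and the offending producer, which is the "fail in an obvious way" behavior described above.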

Sounds like a plan. :)