Variadic runtime functions

Hi Jim,

When we talked about the variadic runtime functions before, especially "__kmpc_fork_call", I argued for keeping them because of their benefits.
I now think there are more reasons to replace them with a "pthread_create"-like model, that is, a struct passed as payload. Given that I
have to teach LLVM to optimize that scheme anyway, the benefits of the variadic functions will eventually go away. This "simpler",
payload-based model should let us use the same interface for target and non-target code. That is, we would not need to keep
track of whether we are in a target region when we generate the "__kmpc_fork_call" calls. My hope is that this will cut down the front-end logic
further and simplify IR generation at the same time.
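
To make the payload-based model concrete, here is a minimal sketch in C. All names in it (region_payload, work_fn_t, fork_call_payload, sum_region, demo_payload_call) are made up for illustration and are not part of the runtime; the stand-in entry point simply runs the work function once on the calling thread, where a real runtime would hand it to a thread team.

```c
#include <stddef.h>

/* Hypothetical payload: everything the outlined region needs, packed into
 * one struct instead of being passed as variadic arguments. */
struct region_payload {
    const int *input; /* shared, read-only data */
    size_t     count; /* number of elements in input */
    long       sum;   /* result written by the work function */
};

/* Same shape as a pthread_create start routine: one opaque pointer in. */
typedef void (*work_fn_t)(void *payload);

/* Outlined region body: unpacks the payload and does the work. */
static void sum_region(void *payload) {
    struct region_payload *p = payload;
    p->sum = 0;
    for (size_t i = 0; i < p->count; ++i)
        p->sum += p->input[i];
}

/* Hypothetical stand-in for the runtime entry point. It only runs the work
 * function once on the calling thread; a real runtime would fork a team. */
static void fork_call_payload(work_fn_t fn, void *payload) {
    fn(payload);
}

/* Caller side: the front-end only has to build one struct and pass it. */
long demo_payload_call(void) {
    static const int data[] = {10, 12, 20};
    struct region_payload p = { data, sizeof data / sizeof data[0], 0 };
    fork_call_payload(sum_region, &p);
    return p.sum;
}
```

The point of the sketch is that the call site looks the same no matter how many values are captured, which is what would allow one interface for target and non-target code.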

If we agree on removing the variadic version, we should discuss an alternative that combines the requirements of target and host parallelism,
maybe something like the declaration below, where RequiredRTFeatures is used as a bitfield:

void __kmpc_parallel(ident_t *Ident, uint64_t RequiredRTFeatures, uint16_t NumThreads, ParallelWorkFnTy WorkFn,
                     void *SharedValues, uint16_t SharedValuesBytes, void *PrivateValues, uint16_t PrivateValuesBytes);
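
A rough sketch of how a call through such an interface could look, under several assumptions that are mine and not settled anywhere: the ParallelWorkFnTy signature, the RT_FEATURE_* bits, and the idea that each thread gets its own copy of the private values are all hypothetical. The stub is named sketch_kmpc_parallel to make clear it is not the real runtime function, and it runs the "threads" one after another on the calling thread.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct ident ident_t; /* opaque here; only passed through */

/* Hypothetical work function type: thread id plus the two payload pointers. */
typedef void (*ParallelWorkFnTy)(uint16_t ThreadId, void *SharedValues,
                                 void *PrivateValues);

/* Hypothetical feature bits for the RequiredRTFeatures bitfield. */
#define RT_FEATURE_HOST   (UINT64_C(1) << 0)
#define RT_FEATURE_TARGET (UINT64_C(1) << 1)

/* Stand-in for the proposed entry point. It is serial: each "thread" runs in
 * turn and receives a fresh copy of the private values (an assumption). */
static void sketch_kmpc_parallel(ident_t *Ident, uint64_t RequiredRTFeatures,
                                 uint16_t NumThreads, ParallelWorkFnTy WorkFn,
                                 void *SharedValues, uint16_t SharedValuesBytes,
                                 void *PrivateValues, uint16_t PrivateValuesBytes) {
    (void)Ident;
    (void)SharedValuesBytes; /* a device runtime might use this to copy data */
    /* The same call serves both cases; the bitfield carries the difference. */
    int IsTarget = (RequiredRTFeatures & RT_FEATURE_TARGET) != 0;
    (void)IsTarget;

    void *PrivateCopy = NULL;
    if (PrivateValuesBytes > 0) {
        PrivateCopy = malloc(PrivateValuesBytes);
        if (!PrivateCopy)
            return;
    }
    for (uint16_t T = 0; T < NumThreads; ++T) {
        if (PrivateCopy)
            memcpy(PrivateCopy, PrivateValues, PrivateValuesBytes);
        WorkFn(T, SharedValues, PrivateCopy);
    }
    free(PrivateCopy);
}

struct shared_vals  { int Total; };
struct private_vals { int Increment; };

/* Example region body: changes its private copy, accumulates into shared. */
static void add_work(uint16_t ThreadId, void *Shared, void *Private) {
    struct shared_vals *S = Shared;
    struct private_vals *P = Private;
    P->Increment += ThreadId; /* visible only in this thread's copy */
    S->Total += P->Increment;
}

int demo_kmpc_parallel(void) {
    struct shared_vals S = {0};
    struct private_vals P = {10};
    sketch_kmpc_parallel(NULL, RT_FEATURE_HOST, 4, add_work,
                         &S, (uint16_t)sizeof S, &P, (uint16_t)sizeof P);
    return S.Total; /* 10 + 11 + 12 + 13 */
}
```

Passing the byte sizes alongside the pointers is what would let the runtime copy or map the payloads without knowing their layout, which matters most for the target case.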

Please, everybody, let me know what you think.