Jump Threading/GVN bug - moving discussion to llvm-dev

Hello,

I encountered a problem triggered by the Jump Threading optimization. This pass creates an unreachable block containing an instruction that is not well formed, which then causes the subsequent GVN pass to enter an infinite loop.

I have submitted a bug report and a proposed fix to llvm-commits. This bug opened a can of worms; I was asked to move the discussion to llvm-dev to reach a wider audience.

Can we move the general discussion to llvm-dev? This probably warrants a wider audience.

Here is the original post and a couple of replies from last week:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150216/261330.html

Here is a thread of replies from today:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150223/261463.html

Katya.

So, first things first.

Handling unreachable code is annoying. I agree. However, it’s important to characterize how annoying. Most code is not unreachable, and we’re (obviously) fine just bailing on such crazy code paths. So in practice the common cost is keeping a local set to detect cycles in the graph where we aren’t expecting them. We are essentially always traversing a linked list representing this graph. Because these linked list nodes don’t have any particular reason to have high locality, we have a cache-hostile traversal where we have to do primarily local and cache-friendly book-keeping at each step to catch duplication. I suspect that this has a very small (but non-zero) impact on compile time.
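
To make that cost concrete, here is a minimal sketch (a hypothetical helper, not code from any particular LLVM pass) of the pattern: a use-def walk through pointer casts whose visited set exists only because unreachable code can contain cycles, like the self-referential instruction that comes up later in this thread.

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Walk through bitcasts to the underlying value. Reachable IR can never
// form a cycle here, so the set is pure book-keeping against unreachable
// code.
static Value *stripBitCastsCycleSafe(Value *V) {
  SmallPtrSet<Value *, 8> Visited;
  Visited.insert(V);
  while (auto *BC = dyn_cast<BitCastOperator>(V)) {
    V = BC->getOperand(0);
    if (!Visited.insert(V).second)
      break; // hit a cycle: bail conservatively
  }
  return V;
}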

Handling unreachable code is also error prone. We have had a long history of bugs here. So if it were free to get rid of unreachable code, that would be super awesome.

The first problem to realize is that for a large chunk of LLVM, we can’t actually get rid of handling unreachable code. Passes are quite likely to create unreachable code while running, and that will mean that all of the utilities will end up needing to be conservatively correct and handle unreachable code even when they don’t need to. =/

The second problem is that cleaning up unreachable code isn’t free. The code which is most likely to create unreachable code is similarly likely to destroy the analysis information we would use to remove it cheaply. And even then, just walking lists of basic blocks as we go out of the function is going to dirty cache lines that we might not have any other reason to look at. I can genuinely imagine cases where batching this will be beneficial. Will it outstrip the cost of handling it? I don’t know. But I think it will mitigate the gains, especially if the gains aren’t as significant as we might naively think.

The third problem I have is that this makes it much harder to constructively produce a correct IR transformation pass. When transforming the IR we must never forget about regions we have made unreachable. A single mistake here will cascade to a verification failure. This is the reverse of the problem we have today where your pass must accept unreachable IR. But I think the constructive reasoning is much easier. It makes it harder to have a bug through omission of a case.

The fourth problem I have is related to the third problem. How do we gain confidence in the correctness of any IR transformation? We must construct test cases that we believe will exercise the system. It seems incredibly easier to construct test cases with interesting unreachable IR patterns that should all be handled, than to construct test cases where the pass will happen to form unreachable IR patterns in each way and ensure that none escape the pass. One is essentially covering input diversity, the other has to cover output diversity, and must prove the negative. We fundamentally must gain confidence that the pass never produces unreachable IR. This seems much harder than demonstrating that it does handle unreachable IR.

The fifth problem I have is also related to the third and fourth problems. Precluding unreachable code seems likely to make tools like delta test case reduction dramatically less effective. Now, when cutting edges in the CFG to minimize a test case they must actually build the graph and prune all unreachable code.

All of these problems seem surmountable, but in aggregate, they make me strongly doubt that the benefit is worth the cost.

-Chandler

The other thing I meant to include here but forgot...

Another reason why handling these cases has a tendency to be less expensive
than you might expect is that we very often already have a set-like
structure in place to remove *duplication*, and it is often very easy to
re-use it to avoid cycles. So we won't be able to completely remove many of
the sets you might expect to remove; they would just be narrower in scope
due to not needing to handle cycles. My strong suspicion is that that
narrower scope does not in fact translate to a significant efficiency gain.

From: "Chandler Carruth" <chandlerc@google.com>
To: "Katya Romanova" <Katya_Romanova@playstation.sony.com>, "Nick Lewycky" <nlewycky@google.com>
Cc: llvmdev@cs.uiuc.edu, "Philip Reames" <listmail@philipreames.com>, "Hal Finkel" <hfinkel@anl.gov>, "Rafael
Espindola" <rafael_espindola@playstation.sony.com>, "Sanjoy Das" <sanjoy@playingwithpointers.com>, "David Majnemer"
<david.majnemer@gmail.com>
Sent: Monday, February 23, 2015 10:32:30 PM
Subject: Re: Jump Threading/GVN bug - moving discussion to llvm-dev

So, first things first.

Handling unreachable code is annoying. I agree. However, it's
important to characterize how annoying. Most code is not
unreachable, and we're (obviously) fine just bailing on such crazy
code paths. So in practice the common cost is keeping a local set to
detect cycles in the graph where we aren't expecting them. We are
essentially always traversing a linked list representing this graph.
Because these linked list nodes don't have any particular reason to
have high locality, we have a cache-hostile traversal where we have
to do primarily local and cache-friendly book-keeping at each step
to catch duplication. I suspect that this has a very small (but
non-zero) impact on compile time.

Handling unreachable code is also error prone. We have had a long
history of bugs here. So if it were free to get rid of unreachable
code, that would be super awesome.

The first problem to realize is that for a large chunk of LLVM, we
can't actually get rid of handling unreachable code. Passes are
quite likely to create unreachable code while running, and that will
mean that all of the utilities will end up needing to be
conservatively correct and handle unreachable code even when they
don't need to. =/

The second problem is that cleaning up unreachable code isn't free.
The code which is most likely to create unreachable code is
similarly likely to destroy the analysis information we would use to
remove it cheaply. And even then, just walking lists of basic blocks
as we go out of the function is going to dirty cache lines that we
might not have any other reason to look at. I can genuinely imagine
cases where batching this will be beneficial. Will it outstrip the
cost of handling it? I don't know. But I think it will mitigate the
gains, especially if the gains aren't as significant as we might
naively think.

The third problem I have is that this makes it much harder to
constructively produce a correct IR transformation pass. When
transforming the IR we must never forget about regions we have made
unreachable. A single mistake here will cascade to a verification
failure. This is the reverse of the problem we have today where your
pass must *accept* unreachable IR. But I think the constructive
reasoning is much easier. It makes it harder to have a bug through
omission of a case.

The fourth problem I have is related to the third problem. How do we
gain confidence in the correctness of any IR transformation? We must
construct test cases that we believe will exercise the system. It
seems *incredibly* easier to construct test cases with interesting
unreachable IR patterns that should all be handled, than to
construct test cases where the pass will happen to form unreachable
IR patterns in each way and ensure that none escape the pass. One is
essentially covering input diversity, the other has to cover
*output* diversity, and must prove the negative. We fundamentally
must gain confidence that the pass *never* produces unreachable IR.
This seems much harder than demonstrating that it does handle
unreachable IR.

The fifth problem I have is also related to the third and fourth
problems. Precluding unreachable code seems likely to make tools
like delta test case reduction dramatically less effective. Now,
when cutting edges in the CFG to minimize a test case they must
actually build the graph and prune all unreachable code.

All of these problems seem surmountable, but in aggregate, they make
me strongly doubt that the benefit is worth the cost.

I don't disagree with this, but I think there are two (separable) issues here:

1. Should transformations produce unreachable code? On the face of it, they must be able to, because the problem of determining dynamic reachability is generally undecidable. Should passes produce trivially dead code? Well, probably not, if they can avoid it. Even this is not locally decidable, and so you're right, removing it will add extra expense (but also provide savings, so we'd need to experiment).

2. Should unreachable code be allowed to contain nonsense (like instructions that depend on themselves)? To this, I believe the answer is no. We currently permit this, and I think that a lot of the bugs regarding unreachable code stem from this. I've yet to hear a good argument why, for example, JumpThreading must produce self-referential instructions in trivially-dead regions.

-Hal

Handling unreachable code is annoying. I agree. However, it's important to
characterize how annoying. Most code is not unreachable, and we're
(obviously) fine just bailing on such crazy code paths. So in practice the
common cost is keeping a local set to detect cycles in the graph where we
aren't expecting them. We are essentially always traversing a linked list
representing this graph. Because these linked list nodes don't have any
particular reason to have high locality, we have a cache-hostile traversal
where we have to do primarily local and cache-friendly book-keeping at each
step to catch duplication. I suspect that this has a very small (but
non-zero) impact on compile time.

Why not have an analysis pass that does a DFS starting from the entry
block? Is that likely to be very costly? Having an analysis pass is
attractive because that allows some transformation passes to put in
some extra work to preserve it if they so desire.
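
Something like the following (a rough sketch with made-up names, not an existing LLVM analysis) is the kind of thing I have in mind: one plain DFS over successor edges starting from the entry block.

#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Collect every block reachable from the entry block. Anything not in the
// returned set is unreachable and could be ignored (or cleaned up) by a
// client pass.
static SmallPtrSet<const BasicBlock *, 32>
computeReachableBlocks(const Function &F) {
  SmallPtrSet<const BasicBlock *, 32> Reachable;
  for (const BasicBlock *BB : depth_first(&F.getEntryBlock()))
    Reachable.insert(BB);
  return Reachable;
}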

The analysis pass does not have to be clever at all, since the
verifier (rightly, I think) considers a block reachable even if it is
predicated on a literal false -- the following IR does not verify:

define void @f() {
entry:
  br i1 false, label %t, label %f

t:
  ret void

f:
  %m = add i32 %m, 1
  ret void
}

-- Sanjoy

  br i1 false, label %t, label %f

Typo -- should be "br i1 true ..."

Programmers don't usually write code like this directly, but it is common
for it to happen as a result of the expansion of macros or inline
functions. You would not want to require that a compiler front end *not*
produce this.

Programmers don't usually write code like this directly, but it is common
for it to happen as a result of the expansion of macros or inline functions.
You would not want to require that a compiler front end *not* produce this.

I meant to say that whatever mechanism we use to track dead blocks
that may exhibit edge-cases of SSA def-use behavior does *not* need to
be clever enough to deal with this example, since the verifier rejects
this snippet (def of %m does not dominate its use). If all we care
about are blocks that have to follow def-dominates-use in the
intuitive sense then a super simple succ_begin()/succ_end() based DFS
is sufficient. What I don't know is whether such a thing will be fast
enough.

In any case, I don't have enough experience with LLVM to have a strong
/ defensible opinion on this, and I'll defer to the decision taken by
people who do.

-- Sanjoy

So, first things first.

Handling unreachable code is annoying. I agree. However, it's important to
characterize how annoying. Most code is not unreachable, and we're
(obviously) fine just bailing on such crazy code paths. So in practice the
common cost is keeping a local set to detect cycles in the graph where we
aren't expecting them. We are essentially always traversing a linked list
representing this graph. Because these linked list nodes don't have any
particular reason to have high locality, we have a cache-hostile traversal
where we have to do primarily local and cache-friendly book-keeping at each
step to catch duplication. I suspect that this has a very small (but
non-zero) impact on compile time.

So it's worse than this.

It also means that doing something like "walk the graph" is no longer
sufficient to generate correct code. It's not just about cycles.

Let's take an example:

PromoteMemoryToRegisters has a rename pass. This rename pass walks blocks
from entry, following successors, renaming values to use the nearest
definition. After this is done, it's done.

Except, it isn't.
Because you see, then it has to go and see whether it missed some allocas,
because they were unreachable.

And then it has to go and see if any of the blocks ended up with phi nodes
that have predecessors that are unreachable, because it has to replace
those with undef.
And so on.
(It actually turns out this is not enough, and this handling is buggy).
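
(For illustration only: the extra phi fix-up step amounts to something like the sketch below, assuming dominator information is available; mem2reg's real code is structured differently.)

#include "llvm/IR/Constants.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Rewrite phi operands flowing in from unreachable predecessors to undef,
// because the renaming walk from the entry block never visited those
// blocks and so never supplied a real value for them.
static void fixPhisFromUnreachablePreds(BasicBlock &BB,
                                        const DominatorTree &DT) {
  for (Instruction &I : BB) {
    auto *PN = dyn_cast<PHINode>(&I);
    if (!PN)
      break; // phis are always grouped at the top of a block
    for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
      if (!DT.isReachableFromEntry(PN->getIncomingBlock(i)))
        PN->setIncomingValue(i, UndefValue::get(PN->getType()));
  }
}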

Handling unreachable code is also error prone. We have had a long history
of bugs here. So if it were free to get rid of unreachable code, that would
be super awesome.

The first problem to realize is that for a large chunk of LLVM, we can't
actually get rid of handling unreachable code. Passes are quite likely to
create unreachable code while running

Which passes, exactly?

I am going to assert, based on experience writing almost all of the opt
passes llvm has (except a vectorizer :P), that it is entirely possible, and
not even difficult, to avoid creating unreachable code in these passes.
I'm also going to point out that *roughly all the other compilers in the
world* do not allow it either :)
Or at least, they do not have it at the point at which you need to call a
utility or another pass.

So I would say your assertion that things are quite likely to create it is
wrong. Things may temporarily create it, but there is simply no need, at
all, to expose it to *anything else*.

, and that will mean that all of the utilities will end up needing to be

conservatively correct and handle unreachable code even when they don't
need to. =/

So I strongly disagree with this.
This is an assertion based on the way the world is now, where things
willy-nilly create unreachable code and expect the rest of the world to
deal with it. I don't believe it is even that difficult to get to a world
where this isn't true.

The second problem is that cleaning up unreachable code isn't free. The
code which is most likely to create unreachable code is similarly likely to
destroy the analysis information we would use to remove it cheaply.

I disbelieve this :)

And even then, just walking lists of basic blocks as we go out of the
function is going to dirty cache lines that we might not have any other
reason to look at. I can genuinely imagine cases where batching this will
be beneficial. Will it outstrip the cost of handling it? I don't know. But
I think it will mitigate the gains, especially if the gains aren't as
significant as we might naively think.

The third problem I have is that this makes it much harder to
constructively produce a correct IR transformation pass. When transforming
the IR we must never forget about regions we have made unreachable.

Almost all algorithms I can think of already expect to have to clean up
unreachable regions and delete dead blocks. It's only LLVM that doesn't do
it.

A single mistake here will cascade to a verification failure. This is the
reverse of the problem we have today where your pass must *accept*
unreachable IR.

But it is also *much* easier to verify, because the number of points at which
predecessors are modified, or edges redirected, is actually not that
large. At worst, *those* are the only places that can create
forward-unreachable code. So you have a bounded set to test. On the
other hand, accepting "whatever input" is not a bounded problem. People
can come up with crazier and crazier input you must accept.

But I think the constructive reasoning is much easier. It makes it harder
to have a bug through omission of a case.

The fourth problem I have is related to the third problem. How do we gain
confidence in the correctness of any IR transformation? We must construct
test cases that we believe will exercise the system. It seems *incredibly*
easier to construct test cases with interesting unreachable IR patterns
that should all be handled, than to construct test cases where the pass
will happen to form unreachable IR patterns in each way and ensure that
none escape the pass.

I fundamentally do not understand why you believe this. The number of
places that create unreachable code is finite, bounded, and testable. You
cannot create unreachable code out of magic.
You are right that it is easier to create test cases with interesting
unreachable IR patterns, but there is an infinite number of them.

One is essentially covering input diversity, the other has to cover
*output* diversity, and must prove the negative. We fundamentally must gain
confidence that the pass *never* produces unreachable IR. This seems much
harder than demonstrating that it does handle unreachable IR.

The fifth problem I have is also related to the third and fourth problems.
Precluding unreachable code seems likely to make tools like delta test case
reduction dramatically less effective. Now, when cutting edges in the CFG
to minimize a test case they must actually build the graph and prune all
unreachable code.

Again, this is just "not hard". All the other compilers in the world do
this, and do it cheaply.

One is essentially covering input diversity, the other has to cover

*output* diversity, and must prove the negative. We fundamentally must gain
confidence that the pass *never* produces unreachable IR. This seems much
harder than demonstrating that it does handle unreachable IR.

Missed a spot :)

Except, in this case, output diversity is finite (each pass has a limited
set of ways it can produce unreachable code, and unreachable code is not
valid input to the next pass), but input diversity is pretty much
infinite, because you need to know *all* possible unreachable code that
could be produced by all passes combined (since unreachable code is a valid
input into each pass, you are testing the combination of all possible
unreachable code transforms by any order of passes that could possibly
exist up to that point).

I would much rather have the former than the latter.

You are also "proving the negative" either way.
"There is no unreachable code this does not handle" vs. "This does not
produce unreachable code".

BTW, one is formally verifiable, the other is pretty much not.

(IE I can formally verify that a pass removes all dead blocks it creates,
whereas you pretty much cannot formally verify there is no set of dead
input cfgs that affects an algorithm)

Yes. In fact, for each pass, at worst, even if the pass had no idea how to clean up unreachable code as it went (and again, most do, because most compiler algorithms are built with the idea that they need to clean up after themselves), your algorithm is what the entirety of “find unreachable code” looks like in GCC.
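
(And in LLVM the batched version of that cleanup already exists as a utility; a pass that wanted to clean up after itself in one shot could do roughly the following at the end of its run. This is a sketch only, since the exact signature of the utility has varied across releases.)

#include "llvm/IR/Function.h"
#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;

// Delete every block that cannot be reached from the entry block and update
// phi nodes in the blocks that remain. Returns true if anything changed.
static bool cleanupAfterTransform(Function &F) {
  return removeUnreachableBlocks(F);
}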

As for compile time, we already do it (see simplifycfg) :)

We just don’t try to prevent things like jump threading from messing it up.

And again, the number of places that will mess it up is very small.

Most algorithms were built to track what dead code they were adding (whether LLVM implemented that part of the algorithm or not).

As an example, look at GVN.cpp’s addDeadBlock, SCCP’s code to avoid messing up the CFG (IPSCCP knows how to delete dead blocks, SCCP preserves CFG so it just doesn’t mess with the terminators, etc), etc

The hardest pass to make stop generating unreachable code tends to be jump threading, but this is more because “nobody has written it down in a book” than “it is hard to do” :)

Is it fair (and correct) to say that the main issue with unreachable
code is this: if we allow unreachable code then the dom tree is no
longer a tree, it can have components (disconnected from the root)
that are cycles. So no pass that wants to just work with unreachable
code can depend on the dom tree actually being a tree.

Or are there more fundamental issues (I realize the dom tree not being
a tree is rather fundamental itself)?

-- Sanjoy

Will try to reply to the larger thread again soon, but a quick reply on a
bit of a tangent...

Is it fair (and correct) to say that the main issue with unreachable
code is this: if we allow unreachable code then the dom tree is no
longer a tree, it can have components (disconnected from the root)
that are cycles. So no pass that wants to just work with unreachable
code can depend on the dom tree actually being a tree.

Or are there more fundamental issues (I realize the dom tree not being
a tree is rather fundamental itself)?

You can't rely on walking successors to generate correct code (IE you may
have to later fix up code that you didn't see).
You can't rely on basic blocks you look at actually having otherwise-valid
code.
You can't rely on phi nodes having predecessors that are valid (IE if you
walk up the graph, you can validly hit invalid code).
etc

Essentially, you have *no guarantee*, at all, what is in an unreachable
block.

You get instructions like this from jump threading:

  %2 = getelementptr inbounds i8* %2, i64 1

It's not clear what one is supposed to do with such an instruction, and in
fact, a lot of the existing passes (besides GVN) will enter an infinite loop
if you convince them to look at it.

It seems quite bad to say "well, you know, everywhere else you have
well-formed code. But if you hit an unreachable block (and PS, you'll have
to check yourself to see if you are in one of these magic blocks all of a
sudden), all bets are off, and you should be prepared to hit anything".

Will try to reply to the larger thread again soon, but a quick reply on a
bit of a tangent...

2. Should unreachable code be allowed to contain nonsense (like
instructions that depend on themselves)? To this, I believe the answer is
no. We currently permit this, and I think that a lot of the bugs regarding
unreachable code stem from this. I've yet to hear a good argument why, for
example, JumpThreading must produce self-referential instructions in
trivially-dead regions.

I could easily see tightening the rules on what passes the verifier within
unreachable code to try to make it substantially cheaper, easier, and less
error prone to handle unreachable code. It seems likely that there are good
pragmatic compromises here that would actually cost nothing.

One such compromise (which you didn't seem fond of) was that "unreachable
blocks have to contain nothing but a terminator".
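
(A hedged sketch of what such a rule could look like as a standalone check, using a freshly computed dominator tree for reachability; the real verifier is structured differently.)

#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Proposed rule: a block that is unreachable from the entry block may
// contain nothing but its terminator.
static bool unreachableBlocksAreTrivial(Function &F) {
  DominatorTree DT(F);
  for (BasicBlock &BB : F)
    if (!DT.isReachableFromEntry(&BB) && &BB.front() != BB.getTerminator())
      return false;
  return true;
}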

Earlier, you said "The code which is most likely to create unreachable code
is similarly likely to destroy the analysis information we would use to
remove it cheaply. " and "When transforming the IR we must never forget
about regions we have made unreachable. "

I would like to challenge this assertion with actual evidence:
Right now the state of the world is actually "the vast majority of llvm
passes try to clean up after themselves, try not to create weird
unreachable code".

GVN removes dead blocks that replacement of values/equality propagation
creates
SCCP deletes all instructions in dead blocks except terminators (because it
preserves CFG)
ADCE does the same (it never removes terminators)
DCE does the same
IPSCCP actually removes dead blocks, because it does not preserve CFG
JumpThreading removes unreachable blocks on input but not output (and in
fact, as you'll see, it's the only pass that seems to really be a problem)
SROA cleans up unreachable code it generates in its utility class
SimplifyCFG removes unreachable blocks
CorrelatedValuePropagation does the same through a utility (it calls
ConstantFoldTerminator, which in turn takes great pains to clean up the CFG
properly)
etc
I didn't look at the loop passes, but outside of jump threading, it appears
there is little work to be done to be in a state where things clean up
after themselves.

That is, *right* now, we know for at least a ton of the passes and their
analysis, it is not "destroying the analysis information we would use to
remove it" because they are in fact removing it.
They already "do not forget about regions they have made unreachable",
because they already clean up after themselves.

For giggles, I stopped two of them that I happened to know pretty well (GVN,
CorrelatedValuePropagation) from removing unreachable blocks, just leaving
them unreachable with instructions still in them.
While it does nothing to cause verifier crashes, it causes a *bunch* of
brand new crashes *in* passes in the standard pass ordering, which it
wouldn't if we were really living in a world where passes were really
handling unreachable code properly already.

Regardless of what the outcome is (and I'm pretty clearly on one side), we
shouldn't go half-way. Right now we are already in a world where, because
we have some passes cleaning up and some not, we have passes that can't
handle unreachable code as input, and some that can.
So if you change the pass order, and those passes do something, you get
wonderful crashes.
Either we should be testing passes with unreachable code as input, or
expecting there not to be any.
(and the side-issue of "in a world where anything can be in an unreachable
block, all of this cleanup and carefulness is a waste of time")

An obvious example is, as you highlight, self-referential non-phi instructions.

FWIW, I am not really much of a contributor to LLVM.

The points Chandler makes for why passes must be allowed to create dead code are excellent. Dead code elimination is a general compiler cleanup pass, to be run once in a while to get rid of code that for whatever reason is dead. Every pass should not have to police itself not to produce dead code.

This strategy (of having independent passes to do well-defined specific things) is pretty much (it seems to me) one of the founding principles of LLVM design, and one of its very key strengths. Dead code elimination should not be built into all other passes.

Hal gets to the heart of the matter (in my opinion) when he writes:

2. Should unreachable code be allowed to contain nonsense (like instructions that depend on themselves)? To this, I believe the answer is no. We currently permit this, and I think that a lot of the bugs regarding unreachable code stem from this. I've yet to hear a good argument why, for example, JumpThreading must produce self-referential instructions in trivially-dead regions.

To me this looks like an example of incorrect SSA IR. SSA form is not supposed to allow a use of a def at a point the def cannot reach. And my understanding was that “undef” LLVM IR was created exactly for specifying cases where a def was required for proper SSA form, but where none was available.

Clearly (to me at least), an instruction cannot use in its inputs a def that occurs later in the same basic block (and that includes the def the instruction itself defines).

That would either need to be an undef, or a phi that merges an undef and the def that occurs later.

Years ago (late 90s, early 2000s) the Intel compiler had quite a few problems with SSA form that could be traced to somewhat undisciplined handling of “undefined” defs, PHI removal, and this type of degenerate SSA. We did as is being suggested currently for LLVM, and made such IR illegal, and changed our internal verifiers to catch those. After a short time, this shook loose some remaining bugs with mishandling of SSA form, and improved compiler stability in this regard. I think that is an excellent direction for LLVM infrastructure to go in.

Kevin B. Smith

Intel Compiler Team

Whatever we end up deciding, I think we do need to preserve the ability of a language frontend to generate complete garbage in the form of unreachable code. If we decide to move in the direction of disallowing the production of unreachable code, we need to write a canonicalization pass that runs first thing in the pipeline. A frontend should be able to produce unreachable code. This enables important simplifications in the frontend.

I happen to agree with Chandler, but I don’t have a strong opinion either way. His testing point in particular is one that resonates with me. I’ll, in practice, take testability over verification any day. :)

Sanjoy’s suggestion of having an “unreachable code analysis” might be a way to split the difference. Every pass could depend on it. If it doesn’t report “no unreachable code”, the pass calls a cleanup transform utility. Any pass that wanted to spend time to “preserve” the analysis (by making sure there was no unreachable code), could do so. Everything would “just work”.
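
(Roughly, and with entirely made-up names, a pass under that scheme might start out like the sketch below; the “analysis” here is just a fresh reachability scan, and the cleanup falls back to the existing utility.)

#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;

// Hypothetical guard a pass could run first: if any block is unreachable
// from the entry block, run the existing cleanup utility before doing any
// real work, so the body of the pass can assume a fully reachable CFG.
static void ensureNoUnreachableCode(Function &F) {
  DominatorTree DT(F);
  for (BasicBlock &BB : F)
    if (!DT.isReachableFromEntry(&BB)) {
      removeUnreachableBlocks(F);
      return;
    }
}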

Do we have a set of unreachable, but otherwise valid, code examples? It doesn’t seem too hard to write extra RUN lines for each pass. The corpus of valid but problematic examples should be common to all passes.

Philip

From: "Philip Reames" <listmail@philipreames.com>
To: "Chandler Carruth" <chandlerc@google.com>, "Katya Romanova" <Katya_Romanova@playstation.sony.com>, "Nick Lewycky"
<nlewycky@google.com>
Cc: llvmdev@cs.uiuc.edu, "Hal Finkel" <hfinkel@anl.gov>, "Rafael Espindola" <rafael_espindola@playstation.sony.com>,
"Sanjoy Das" <sanjoy@playingwithpointers.com>, "David Majnemer" <david.majnemer@gmail.com>
Sent: Tuesday, February 24, 2015 12:45:51 PM
Subject: Re: Jump Threading/GVN bug - moving discussion to llvm-dev

Whatever we end up deciding, I think we do need to preserve the
ability of a language frontend to generate complete garbage in the
form of unreachable code.

And this is exactly what I don't understand. Why would a frontend need to produce "complete garbage" in unreachable code?

-Hal

I agree with Hal here. Fixing (2) does not necessarily involve changing (1). Can someone sketch out how we'd ever get an instance of (2)? I'd naively expect to at least see a PHI in the cycle.

Philip

The only use of dominance information I can find in JT/LVI is the handling for @llvm.assume in ValueTracking. Adding a DT->isReachableFromEntry check to this path seems downright simple.
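
(Something on the order of the following, with hypothetical names, is all that check would amount to.)

#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Only let an @llvm.assume call influence value tracking when the call
// itself sits in a block reachable from the entry block.
static bool assumeIsUsable(const CallInst &Assume, const DominatorTree *DT) {
  return !DT || DT->isReachableFromEntry(Assume.getParent());
}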

On the surface, I don't see why the LVI analysis would care about unreachable blocks. It's only looking at the edges in the CFG and would treat an unreachable region the same as a reachable one.

Philip