Re: [llvm-dev] [RFC][PIR] Parallel LLVM IR -- Stage 0 -- IR extension


>>> Ping.
>>>
>>> PS.
>>>
>>> Are there actually people interested in this?
>
>
> I’m definitely interested too. I will have some high-level comments after next week.
>
>
>>> We will continue working anyway but it might not make sense to put it
>>> on reviews and announce it on the ML if nobody cares.
>>
>> I expect this to take many months, unless it is on the critical path of some core devs, because it seems to me to be a huge feature to consider and quite a time sink (which is why even people who care haven't been able to invest enough time into this).
>
>
> There are enough commercial and open-source projects working actively on this that I think the issue is reaching critical mass.

Absolutely!

To be clear, I was not trying to downplay the importance of this work
or the interest in it (I find it pretty interesting indeed!), I was
just trying to provide one possible answer to what I perceived to be
Johannes’ frustration with “my patches aren’t reviewed / discussed
extensively” :wink:

I did not say that, nor did I mean it. I was merely asking because
there had been almost no comments in months.

Sorry if it sounded like I am upset in any way. My point was basically
that preparing the emails and patches is work we could avoid if there
is no interest :wink:

Anyway, it seems to have worked, since people are looking, or are going
to look, at it :slight_smile:

I don't know who pointed it out first, but Mehdi made me aware of it at
CGO. I'll try to explain it briefly.

Given the following situation (in pseudo code):

   alloc A[100];
   parallel_for(i = 0; i < 100; i++)
     A[i] = f(i);

   acc = 1;
   for(i = 0; i < 100; i++)
     acc = acc * A[i];
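
For concreteness, here is a minimal C rendering of the same program,
assuming an OpenMP-style source annotation (f is just a placeholder for
the per-element computation; the exact frontend construct does not matter):

   #include <stdio.h>

   /* Placeholder for the (pure) per-element computation. */
   static double f(int i) { return i + 1.0; }

   int main(void) {
     double A[100];

     /* Parallel initialization: each iteration writes a distinct A[i]. */
     #pragma omp parallel for
     for (int i = 0; i < 100; i++)
       A[i] = f(i);

     /* Serial reduction that reads every element. */
     double acc = 1.0;
     for (int i = 0; i < 100; i++)
       acc = acc * A[i];

     printf("acc = %g\n", acc);
     return 0;
   }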

AFAIK, with your parallel regions there won't be a CFG loop for the
parallel initialization, right? Instead, there are some intrinsics that
annotate the parallel region and then a single initialization. I imagine
it somewhat like this:

   token = llvm.parallel.for.start(0, 100, 1)
   i = llvm.parallel.for.iterator(token)
   A[i] = f(i)
   llvm.parallel.for.end(token)

Under serial semantics, this would allow an optimization to remove the
second loop, since it would appear to use uninitialized (undef) values.
In other words, it is a problem if the optimizer assumes that only one
A[i] was initialized while the rest were not.
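
To spell the hazard out, here is a sketch, in plain C rather than IR, of
what the region looks like under that serial reading; the single store
mirrors the one initialization between the hypothetical start/end
intrinsics above:

   /* Serial reading of the region: the body between the start/end
    * intrinsics executes exactly once, for one unspecified i. */
   static double f(int i) { return i + 1.0; }   /* placeholder, as before */

   double serial_view(void) {
     double A[100];

     int i = 0;       /* "the" single iteration; which value is irrelevant */
     A[i] = f(i);     /* only one element appears to be initialized        */

     /* 99 of these loads read uninitialized memory. Treating them as
      * undef, an optimizer may legally fold this loop, and everything
      * that depends on acc, away. */
     double acc = 1.0;
     for (int j = 0; j < 100; j++)
       acc = acc * A[j];

     return acc;
   }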

@Mehdi Does that reflect what you explained to some degree?

Yes, this is essentially the issue that Sanjoy pointed out on the mailing
list (as well). We definitely need semantics that do not have this problem.

Xinmin said there is a loop, so why does this problem exist? I assumed
there is none (as shown above).

Regardless of whether or not there is a CFG loop for a parallel loop, there is no loop for a pure parallel region, and the problem still occurs there.
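
For illustration, here is a loop-free variant of the same hazard, sketched
with an OpenMP-style parallel region (omp_get_thread_num is just a stand-in
for however the work gets distributed; the single-execution misreading is
the same):

   #include <omp.h>
   #include <stdio.h>

   #define N 8  /* assume the region is run by (at least) N threads */

   int main(void) {
     double A[N];

     /* A pure parallel region: there is no loop anywhere; each thread
      * initializes exactly one element of A. */
     #pragma omp parallel num_threads(N)
     {
       int t = omp_get_thread_num();
       A[t] = t + 1.0;
     }

     /* If the region were (mis)read as executing its body only once,
      * then only one A[t] would be initialized and this sum would read
      * uninitialized values, inviting the same undef-based folding. */
     double sum = 0.0;
     for (int i = 0; i < N; i++)
       sum += A[i];

     printf("sum = %g\n", sum);
     return 0;
   }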

  -Hal


Got it, thanks!


Absolutely!

To be clear, I was not trying to downplay the importance of this work or the interest in it (I find it pretty interesting indeed!), I was just trying to provide one possible answer to what I perceived to be Johannes’ frustration with “my patches aren’t reviewed / discussed extensively” :wink:

Right, I got that! Thanks,

—Vikram