RFC: Are we ready to completely move away from the optionality of a DataLayout?

I’ve just wasted a day chasing my tail because of subtleties introduced to handle the optionality of the DataLayout. I would like to never do this again. =]

We now have this attached to the Module with just a flimsy faked-up pass to keep APIs consistent. So, is there any problem with beginning down the path of:

  1. Synthesizing a “default” boring DataLayout for all modules that don’t specify one.
  2. Changing the APIs to make it clear that this can never be missing and is always available.
  3. Start ripping out all of the complexity in the compiler dealing with this.

If there isn’t, I’m willing to do some of the legwork here.
-Chandler
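
For readers skimming the thread, steps 1 and 2 can be sketched roughly as follows. This is only a toy model under stated assumptions: `ToyDataLayout`, `ToyModule`, `getDataLayoutOrNull`, and `pointerSizeInBytes` are hypothetical names invented for illustration, not LLVM's real classes or API, and the "boring" default values are guesses.

```cpp
#include <cassert>

// Toy stand-in for a data layout; NOT LLVM's real class.
struct ToyDataLayout {
  bool LittleEndian = true;   // assumed "boring" default
  unsigned PointerBits = 64;  // assumed "boring" default
};

// Toy stand-in for a module that may or may not specify a layout.
class ToyModule {
  bool HasExplicitLayout = false;
  ToyDataLayout Layout;  // holds the synthesized default until overridden

public:
  ToyModule() = default;
  explicit ToyModule(const ToyDataLayout &DL)
      : HasExplicitLayout(true), Layout(DL) {}

  // "Before" shape: every caller has to remember the nullptr case.
  const ToyDataLayout *getDataLayoutOrNull() const {
    return HasExplicitLayout ? &Layout : nullptr;
  }

  // "After" shape: returning a reference states that a layout always exists.
  const ToyDataLayout &getDataLayout() const { return Layout; }
};

// A client written against the reference-returning accessor needs no
// special case for a missing layout.
unsigned pointerSizeInBytes(const ToyModule &M) {
  return M.getDataLayout().PointerBits / 8;
}
```

With the reference-returning accessor, a default-constructed `ToyModule` answers layout queries using the synthesized default, so clients such as `pointerSizeInBytes` carry no null check at all.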

I've also recently had to chase down test case failures caused by assuming that it was safe to dereference the DataLayout, so a hearty 'yes please!' from me!

David

I would love to see a mandatory data layout. Thanks for working on this!

From: "Chandler Carruth" <chandlerc@gmail.com>
To: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>, nicholas@mxc.ca, "Rafael Ávila de Espíndola"
<rafael.espindola@gmail.com>
Sent: Sunday, October 19, 2014 3:22:26 AM
Subject: [LLVMdev] RFC: Are we ready to completely move away from the optionality of a DataLayout?

> I've just wasted a day chasing my tail because of subtleties
> introduced to handle the optionality of the DataLayout. I would like
> to never do this again. =]

I agree; while I've heard of use cases for an optional DataLayout, I don't personally feel that, at this stage, supporting them is worth the maintenance burden. Realistically, we just don't have a good way to test the no-data-layout code paths (the great majority of our testing coverage comes from frontends that always add a DataLayout). I also vote for making it mandatory.

-Hal

> I've just wasted a day chasing my tail because of subtleties introduced to handle the optionality of the DataLayout. I would like to never do this again. =]

Agreed, it's a pain to do this.

> We now have this attached to the Module with just a flimsy faked-up pass to keep APIs consistent. So, is there any problem with beginning down the path of:
>
> 1) Synthesizing a "default" boring DataLayout for all modules that don't specify one.
> 2) Changing the APIs to make it clear that this can never be missing and is always available.
> 3) Start ripping out all of the complexity in the compiler dealing with this.

Sounds like a good plan.

One more thing I'd like us to consider after this is where the struct layout map should live. Currently it's in DataLayout, which feels right until you consider that DataLayout lives in the module but caches based on pointers in the context.

It makes me feel like DataLayout should live in the context, but then LTO is an issue with linking modules with different layouts (is that even allowed?). I can think of a bunch of ways it could fail with struct layouts of the same struct on different DataLayouts.

Pete
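
To make the caching concern concrete, here is a small toy sketch. Everything in it is hypothetical (`ToyStructType`, `ToyLayout`, `structSize`, and the simplified "natural alignment with a cap" rule); it is not how LLVM computes struct layouts. It only shows that one type object shared through the context can legitimately have different layouts under two modules' data layouts, which is why a per-layout cache keyed on the type pointer works while a single context-wide cache keyed the same way would not.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy struct type, imagined as owned by a context shared across modules.
// Field sizes are assumed to be nonzero powers of two, in bytes.
struct ToyStructType {
  std::vector<unsigned> FieldBytes;
};

// Toy per-module layout: fields get natural alignment, capped at MaxAlign.
struct ToyLayout {
  unsigned MaxAlign;
  // Cache keyed on type pointers that are shared between modules.
  mutable std::map<const ToyStructType *, uint64_t> SizeCache;

  explicit ToyLayout(unsigned Cap) : MaxAlign(Cap) {}

  uint64_t structSize(const ToyStructType *Ty) const {
    auto It = SizeCache.find(Ty);
    if (It != SizeCache.end())
      return It->second;  // cache hit

    uint64_t Offset = 0;
    unsigned StructAlign = 1;
    for (unsigned Bytes : Ty->FieldBytes) {
      unsigned Align = Bytes < MaxAlign ? Bytes : MaxAlign;
      Offset = (Offset + Align - 1) / Align * Align;  // pad up to alignment
      Offset += Bytes;
      if (Align > StructAlign)
        StructAlign = Align;
    }
    // Pad the tail so that arrays of the struct stay aligned.
    Offset = (Offset + StructAlign - 1) / StructAlign * StructAlign;
    SizeCache[Ty] = Offset;
    return Offset;
  }
};
```

In this toy model a struct with a 1-byte field followed by an 8-byte field occupies 16 bytes under a layout capped at 8-byte alignment and 12 bytes under one capped at 4, even though both answers are keyed on the same pointer.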

From: "Pete Cooper" <peter_cooper@apple.com>
To: "Chandler Carruth" <chandlerc@gmail.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Sunday, October 19, 2014 3:34:59 PM
Subject: Re: [LLVMdev] RFC: Are we ready to completely move away from the optionality of a DataLayout?

It makes me feel like DataLayout should live in the context, but then
LTO is an issue with linking modules with different layouts (is that
even allowed?

I think that, generally speaking, this does not make sense. You could imagine linking together two modules where one data layout was a "subset" of the other (one is missing details of the vector types, for example, in a module that used no vector types), but even that seems tenuous.

-Hal
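
As a rough illustration of the "subset" idea, one could compare the two layout strings spec by spec. The sketch below is purely textual and hypothetical (`splitSpecs` and `isSubsetLayout` are made-up helpers, and the example strings in the test are illustrative). It assumes only that a layout string is a list of specs separated by `-`, and it ignores the defaults a real parser would fill in, so it should not be read as a proposed linker rule.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Split a layout string such as "e-p:64:64-i64:64" into its individual specs.
std::set<std::string> splitSpecs(const std::string &Layout) {
  std::set<std::string> Specs;
  std::stringstream In(Layout);
  std::string Spec;
  while (std::getline(In, Spec, '-'))
    if (!Spec.empty())
      Specs.insert(Spec);
  return Specs;
}

// True if every spec written in Small also appears verbatim in Big.
// This is a textual comparison only; implied defaults are not modeled.
bool isSubsetLayout(const std::string &Small, const std::string &Big) {
  const std::set<std::string> SmallSpecs = splitSpecs(Small);
  const std::set<std::string> BigSpecs = splitSpecs(Big);
  for (const std::string &Spec : SmallSpecs)
    if (BigSpecs.count(Spec) == 0)
      return false;
  return true;
}
```

A module that omits a vector spec would pass this check against one that includes it, while two modules that disagree on endianness would not.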

If you're suggesting that a given context should only support modules
with a single common data layout, that doesn't make sense to me.

Even if we don't want to support *linking* modules with different data
layouts, why wouldn't we support loading them both from bitcode in the
same context? Seems like an awkward limitation.

Is this a concern over memory usage, when there are multiple modules
with the same data layout? If so, could that be solved by uniquing
in the context?

From: "Duncan P. N. Exon Smith" <dexonsmith@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Pete Cooper" <peter_cooper@apple.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Sunday, October 19, 2014 4:20:15 PM
Subject: Re: [LLVMdev] RFC: Are we ready to completely move away from the optionality of a DataLayout?

If you're suggesting that a given context should only support modules
with a single common data layout, that doesn't make sense to me.

Me either. I was simply saying that linking them together likely does not make sense.

-Hal

Hi,

I have a question:

> 1) Synthesizing a "default" boring DataLayout for all modules that don't
> specify one.

What is a default boring value for endianness?

-- Sanjoy

If you're suggesting that a given context should only support modules
with a single common data layout, that doesn't make sense to me.

Me either. I was simply saying that linking them together likely does not make sense.

-Hal

Okay, cool. I think I agree with you there.

Little. Sorry, but LE won here.

I mean, we could make the default big-endian just to test the less common
scenario, but I think it would just result in bugs in people's test cases
rather than teasing out actual bugs in their code.

Just as a heads up, I’m hearing widespread support and no concerns with this. I’ll probably start poking it forward, although it’s not likely to be a top priority at any time. I’ll at least try to update the documentation where I can find it so that we stop fixing bugs with missing datalayout and just delete that code path.

From: "Chandler Carruth" <chandlerc@gmail.com>
To: "Sanjoy Das" <sanjoy@playingwithpointers.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Sunday, October 19, 2014 5:11:58 PM
Subject: Re: [LLVMdev] RFC: Are we ready to completely move away from the optionality of a DataLayout?

>> I have a question:
>>
>>> 1) Synthesizing a "default" boring DataLayout for all modules that
>>> don't specify one.
>>
>> What is a default boring value for endianness?
>
> Little. Sorry, but LE won here.
>
> I mean, we could make the default big-endian just to test the less
> common scenario, but I think it would just result in bugs in
> people's test cases rather than teasing out actual bugs in their
> code.

No ;) -- little endian should be the default.

-Hal

Makes sense. I was curious because the current DataLayout analysis
pass chooses big endian by default, and I've had at least one
hard-to-diagnose miscompile because of that. :)

-- Sanjoy
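
As a generic illustration of why a wrong endianness default can turn into a silent miscompile (this is not a reconstruction of the bug mentioned above), consider folding a one-byte load from the address of a 32-bit constant. The helper name `firstByteInMemory` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value of the byte stored at the lowest address when a 32-bit integer is
// written to memory under the given byte order.
uint8_t firstByteInMemory(uint32_t Value, bool LittleEndian) {
  return LittleEndian ? static_cast<uint8_t>(Value & 0xFF)   // low byte first
                      : static_cast<uint8_t>(Value >> 24);   // high byte first
}
```

For the constant `0x11223344`, a little-endian layout gives `0x44` and a big-endian one gives `0x11`, so a pass that assumes the wrong default folds the load to a different value without any diagnostic.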

From: "Sanjoy Das" <sanjoy@playingwithpointers.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chandler Carruth" <chandlerc@gmail.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Sunday, October 19, 2014 5:48:01 PM
Subject: Re: [LLVMdev] RFC: Are we ready to completely move away from the optionality of a DataLayout?

Awesome ;) -- Yeah, we'll likely want to change that.

-Hal

> Just as a heads up, I'm hearing widespread support

After 12 hours on a Sunday? You people need Real Lives. :)
--paulr

P.S. This won't make multiarch stuff (arm/thumb, or the stuff
Eric will be talking about at the dev meeting) more painful,
right? If it's already module-level that seems unlikely but I
felt like I should ask.

>> Just as a heads up, I'm hearing widespread support
>
> After 12 hours on a Sunday? You people need Real Lives. :)

Hah. Probably true, but honestly the only reason I'm not waiting a lot
longer is that I suspect this is extremely low risk and low controversy.
The writing has been on the wall here for years.

> --paulr
>
> P.S. This won't make multiarch stuff (arm/thumb, or the stuff
> Eric will be talking about at the dev meeting) more painful,
> right? If it's already module-level that seems unlikely but I
> felt like I should ask.

Not at all, if anything it will make it easier by removing variables. I
agree messing with any of that stuff would be cause for concern.

I think linking modules with different vector types makes perfect sense.
Consider a larger program that includes optimised SSE2 vs AVX routines,
switching between them at run time.

Joerg

> I've just wasted a day chasing my tail because of subtleties introduced to handle the optionality of the DataLayout. I would like to never do this again. =]

I'm wondering how large the engineering tradeoff actually is. I'm biased towards making DataLayout mandatory, but it does break legitimate use cases. Target-independent bitcode is not in the best shape, but this change would kill it off entirely, so we'd better make sure the maintenance is causing enough pain to justify the change. I debugged missing 'DL != nullptr' checks a couple of times; not the most pleasant task in the world, but also not a big hassle.

- Ben

> I'm biased towards making DataLayout mandatory, but it does break legitimate use cases. Target-independent bitcode is not in the best shape, but this change would kill it off entirely, so we'd better make sure the maintenance is causing enough pain to justify the change.

Target-independent bitcode exists in the form of things like SPIR and PNaCl. These all have a DataLayout. The IR already implicitly depends on some of these things (e.g. pointer size), so making it explicit doesn't break things.

> I debugged missing 'DL != nullptr' checks a couple of times; not the most pleasant task in the world, but also not a big hassle.

In the case of one of the things that I have in our local branch, the !DL case does the wrong thing (or, at least, probably does the wrong thing). It's easy to make sure that the !DL case does *something*, but it's hard to be sure that that something is actually correct.

David
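
The "does *something*, but is it correct?" problem can be sketched with a toy helper. The names `ToyDataLayout` and `pointerArrayBytes`, and the 64-bit fallback, are assumptions made up for this example; they are not taken from LLVM or from any local branch mentioned in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a data layout; NOT LLVM's real class.
struct ToyDataLayout {
  unsigned PointerBits;
};

// Hypothetical helper written in the "DL may be null" style.
uint64_t pointerArrayBytes(const ToyDataLayout *DL, uint64_t Count) {
  // With no layout the code still has to return something, so it guesses.
  // The 64-bit fallback is an assumption, and it is wrong for 32-bit targets.
  unsigned Bits = DL ? DL->PointerBits : 64;
  return Count * (Bits / 8);
}
```

Both paths return a plausible number, which is the trap: for a module actually destined for a 32-bit target, the null path reports 32 bytes for four pointers where the real answer is 16, and nothing in the test suite is likely to notice.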